Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achieved in controlled real-world scenarios, these methods often struggle with compliance and adaptability when encountering environments or disturbances unseen during training, potentially resulting in extreme or unsafe behaviors. Inspired by how animals achieve smooth and adaptive movements by controlling muscle extension and contraction, torque-based policies offer a promising alternative by enabling precise and direct control of the actuators in torque space. In principle, this approach facilitates more effective interactions with the environment, resulting in safer and more adaptable behaviors. However, challenges such as a highly nonlinear state space and inefficient exploration during training have hindered their broader adoption. To address these limitations, we propose SATA, a bio-inspired framework that mimics key biomechanical principles and adaptive learning mechanisms observed in animal locomotion. Our approach effectively addresses the inherent challenges of learning torque-based policies by significantly improving early-stage exploration, leading to high-performance final policies. Remarkably, our method achieves zero-shot sim-to-real transfer. Our experimental results indicate that SATA demonstrates remarkable compliance and safety, even in challenging environments such as soft/slippery terrain or narrow passages, and under significant external disturbances, highlighting its potential for practical deployments in human-centric and safety-critical scenarios.
@article{li2025sata,title={SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning},author={Li, Peizhuo and Li, Hongyi and Sun, Ge and Cheng, Jin and Yang, Xinrong and Bellegarda, Guillaume and Shafiee, Milad and Cao, Yuhong and Ijspeert, Auke and Sartoretti, Guillaume},journal={2025 Robotics: Science and Systems (RSS)},year={2025}}
MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments
Jimmy Chiun, Shizhe Zhang, Yizhuo Wang, Yuhong Cao, and Guillaume Sartoretti
2025 IEEE International Conference on Robotics and Automation (ICRA), 2025
In multi-robot exploration, a team of mobile robots is tasked with efficiently mapping an unknown environment. While most exploration planners assume omnidirectional sensors like LiDAR, this is impractical for small robots such as drones, where lightweight, directional sensors like cameras may be the only option due to payload constraints. These sensors have a constrained field-of-view (FoV), which adds complexity to the exploration problem, requiring not only optimal robot positioning but also sensor orientation during movement. In this work, we propose MARVEL, a neural framework that leverages graph attention networks, together with a novel frontier and orientation feature fusion technique, to develop a collaborative, decentralized policy using multi-agent reinforcement learning (MARL) for robots with constrained FoV. To handle the large action space of viewpoint planning, we further introduce a novel information-driven action pruning strategy. MARVEL improves multi-robot coordination and decision-making in challenging large-scale indoor environments, while adapting to various team sizes and sensor configurations (i.e., FoV and sensor range) without additional training. Our extensive evaluation shows that MARVEL’s learned policies exhibit effective coordinated behaviors, outperforming state-of-the-art exploration planners across multiple metrics. We experimentally demonstrate MARVEL’s generalizability in large-scale environments of up to 90 m by 90 m, and validate its practical applicability through successful deployment on a team of real drones.
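To make the information-driven action pruning concrete, here is a minimal sketch (in Python, with illustrative names and a simple frontier-count score standing in for the paper's actual information measure): each candidate viewpoint is a position plus a sensor heading, it is scored by how many frontier cells fall inside the constrained FoV, and only the top-k candidates are kept for the policy to evaluate.

```python
import numpy as np

def prune_viewpoints(candidates, frontiers, fov_deg=90.0, sensor_range=5.0, keep_k=8):
    """Keep the k candidate viewpoints that see the most frontier cells.

    candidates: array of shape (N, 3) -- (x, y, heading in radians)
    frontiers:  array of shape (M, 2) -- unexplored-boundary cell centers
    Returns the indices of the retained candidates (illustrative heuristic only).
    """
    half_fov = np.deg2rad(fov_deg) / 2.0
    scores = np.zeros(len(candidates))
    for i, (x, y, heading) in enumerate(candidates):
        delta = frontiers - np.array([x, y])               # vectors to frontier cells
        dist = np.linalg.norm(delta, axis=1)
        bearing = np.arctan2(delta[:, 1], delta[:, 0]) - heading
        bearing = (bearing + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
        visible = (dist <= sensor_range) & (np.abs(bearing) <= half_fov)
        scores[i] = visible.sum()                          # crude information proxy
    return np.argsort(scores)[::-1][:keep_k]

# toy usage: 100 random viewpoints, 500 frontier cells
rng = np.random.default_rng(0)
cands = np.column_stack([rng.uniform(0, 20, (100, 2)), rng.uniform(-np.pi, np.pi, 100)])
front = rng.uniform(0, 20, (500, 2))
print(prune_viewpoints(cands, front))
```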
@article{chiun2025marvel,title={MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments},author={Chiun, Jimmy and Zhang, Shizhe and Wang, Yizhuo and Cao, Yuhong and Sartoretti, Guillaume},journal={2025 IEEE International Conference on Robotics and Automation (ICRA)},year={2025},publisher={IEEE}}
SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding
The Multi-Agent Path Finding (MAPF) problem aims to determine the shortest, collision-free paths for multiple agents in a known, potentially obstacle-ridden environment, and is a core challenge for robotic deployments in large-scale logistics and transportation. Decentralized learning-based approaches have shown great potential for addressing the MAPF problem, offering more reactive and scalable solutions. However, existing learning-based MAPF methods usually rely on agents making decisions based on a limited field of view (FOV), resulting in short-sighted policies and inefficient cooperation in complex scenarios. Here, a critical challenge is to achieve consensus on potential movements between agents based on limited observations and communications. To tackle this challenge, we introduce a new framework that applies sheaf theory to decentralized deep reinforcement learning, enabling agents to learn geometric cross-dependencies between each other through local consensus and utilize them for tightly cooperative decision-making. In particular, sheaf theory provides a mathematical proof of conditions for achieving global consensus through local observation. Inspired by this, we incorporate a neural network to approximately model the consensus in latent space based on sheaf theory and train it through self-supervised learning. During the task, in addition to the features commonly used by previous learning-based MAPF works, each agent distributedly reasons about a learned consensus feature, leading to efficient cooperation on pathfinding and collision avoidance. As a result, our proposed method demonstrates significant improvements over state-of-the-art learning-based MAPF planners, especially in relatively large and complex scenarios, showing its superiority over baselines in various simulations and real-world robot experiments.
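One way to picture the sheaf-informed consensus feature is as a disagreement penalty over the communication graph: learned restriction maps project neighboring agents' latent features into a shared edge space, and a self-supervised loss pushes the projections to agree. The sketch below follows that generic sheaf-Laplacian reading under assumed shapes and a single shared pair of maps; it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SheafConsensus(nn.Module):
    """Learned restriction maps plus a sheaf-style disagreement loss (illustrative)."""
    def __init__(self, feat_dim=32, edge_dim=16):
        super().__init__()
        # One shared pair of restriction maps; a full model could learn per-edge maps.
        self.rho_src = nn.Linear(feat_dim, edge_dim, bias=False)
        self.rho_dst = nn.Linear(feat_dim, edge_dim, bias=False)

    def forward(self, features, edges):
        """features: (num_agents, feat_dim); edges: list of (i, j) pairs within comm range."""
        loss = 0.0
        for i, j in edges:
            # Disagreement of the two endpoints once mapped into the shared edge space.
            diff = self.rho_src(features[i]) - self.rho_dst(features[j])
            loss = loss + (diff ** 2).sum()
        return loss / max(len(edges), 1)

# toy self-supervised step: pull neighboring agents' latent features toward consensus
feats = torch.randn(4, 32, requires_grad=True)
model = SheafConsensus()
opt = torch.optim.Adam(list(model.parameters()) + [feats], lr=1e-2)
loss = model(feats, edges=[(0, 1), (1, 2), (2, 3)])
loss.backward()
opt.step()
print(float(loss))
```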
@article{liao2025sigma,title={SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding},author={Liao, Shuhao and Xia, Weihang and Cao, Yuhong and Dai, Weiheng and He, Chengyang and Wu, Wenjun and Sartoretti, Guillaume},journal={2025 IEEE International Conference on Robotics and Automation (ICRA)},year={2025},publisher={IEEE}}
DARE: Diffusion Policy for Autonomous Robot Exploration
Yuhong Cao, Jeric Lew, Jingsong Liang, Jin Cheng, and Guillaume Sartoretti
2025 IEEE International Conference on Robotics and Automation (ICRA), 2025
Autonomous robot exploration requires a robot to efficiently explore and map unknown environments. Compared to conventional methods that can only optimize paths based on the current robot belief, learning-based methods show the potential to achieve improved performance by drawing on past experiences to reason about unknown areas. In this paper, we propose DARE, a novel generative approach that leverages diffusion models trained on expert demonstrations, which can explicitly generate an exploration path through one-time inference. We build DARE upon an attention-based encoder and a diffusion policy model, and introduce ground truth optimal demonstrations for training to learn better patterns for exploration. The trained planner can reason about the partial belief to recognize the potential structure in unknown areas and consider these areas during path planning. Our experiments demonstrate that DARE achieves on-par performance with both conventional and learning-based state-of-the-art exploration planners, as well as good generalizability in both simulations and real-life scenarios.
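The one-shot generative step can be pictured as a standard DDPM-style reverse process over a fixed-length waypoint sequence, conditioned on an embedding of the robot's partial belief. In the sketch below, the horizon H, step count T, noise schedule, and the small conditioning network are all placeholder assumptions; only the overall denoise-into-a-path structure mirrors the description above.

```python
import torch
import torch.nn as nn

H, T = 16, 50            # waypoints per path, diffusion steps (illustrative values)
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

class NoiseNet(nn.Module):
    """Predicts the noise added to a (H, 2) waypoint path, conditioned on the belief."""
    def __init__(self, belief_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(H * 2 + belief_dim + 1, 256),
                                 nn.ReLU(), nn.Linear(256, H * 2))
    def forward(self, path, belief, t):
        x = torch.cat([path.flatten(1), belief, t.float().unsqueeze(1) / T], dim=1)
        return self.net(x).view(-1, H, 2)

@torch.no_grad()
def sample_path(model, belief):
    """Reverse diffusion: start from noise, iteratively denoise into a path."""
    path = torch.randn(belief.shape[0], H, 2)
    for t in reversed(range(T)):
        eps = model(path, belief, torch.full((belief.shape[0],), t))
        a, ab = alphas[t], alpha_bar[t]
        mean = (path - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        path = mean + (torch.sqrt(betas[t]) * torch.randn_like(path) if t > 0 else 0)
    return path  # (batch, H, 2) waypoints in the robot's local frame

model = NoiseNet()
print(sample_path(model, torch.randn(1, 64)).shape)
```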
@article{cao2025dare,title={DARE: Diffusion Policy for Autonomous Robot Exploration},author={Cao, Yuhong and Lew, Jeric and Liang, Jingsong and Cheng, Jin and Sartoretti, Guillaume},journal={2025 IEEE International Conference on Robotics and Automation (ICRA)},year={2025},publisher={IEEE}}
Heterogeneous Multi-robot Task Allocation and Scheduling via Reinforcement Learning
Weiheng Dai, Utkarsh Rai, Jimmy Chiun, Yuhong Cao, and Guillaume Sartoretti
IEEE Robotics and Automation Letters, 2025
Many multi-robot applications require allocating a team of heterogeneous agents (robots) with different abilities to cooperatively complete a given set of spatially distributed tasks as quickly as possible. We focus on tasks that can only be initiated when all required agents are present; otherwise, agents that have already arrived must wait idly. Agents need to not only execute a sequence of tasks by dynamically forming and disbanding teams to satisfy/match the diverse ability requirements of each task, but also account for the schedules of other agents to minimize unnecessary idle time. Conventional methods such as mixed-integer programming generally require centralized scheduling and a long optimization time, which limits their potential for real-world applications. In this work, we propose a reinforcement learning framework to train a decentralized policy applicable to heterogeneous agents. To address the challenge of learning complex cooperation, we further introduce a constrained flashforward mechanism to guide/constrain the agents’ exploration and help them make better predictions. Through an attention mechanism that reasons about both short-term cooperation and long-term scheduling dependency, agents learn to reactively choose their next tasks (and subsequent coalitions) to avoid wasting abilities and to shorten the overall task completion time (makespan). We compare our method with state-of-the-art heuristic and mixed-integer programming methods, demonstrating its generalization ability and showing that it closely matches or outperforms these baselines while remaining at least two orders of magnitude faster.
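As a toy illustration of the coalition constraint (not the learned policy), the snippet below masks out tasks to which an agent cannot usefully contribute, i.e., tasks whose unmet ability requirements do not overlap the agent's own abilities; a decentralized policy would then only score the surviving candidates. The array layout and names are hypothetical.

```python
import numpy as np

# rows: tasks, cols: ability types; entries: how many agents with that ability are still needed
remaining_req = np.array([[1, 0, 2],    # task 0 still needs 1x ability-A and 2x ability-C
                          [0, 0, 0],    # task 1 is fully staffed
                          [0, 2, 0]])   # task 2 still needs 2x ability-B

agent_abilities = np.array([1, 0, 1])   # this agent provides abilities A and C

def candidate_tasks(remaining_req, agent_abilities):
    """A task is a candidate only if the agent's abilities cover some unmet requirement."""
    useful = (remaining_req * agent_abilities) > 0   # ability still needed AND owned by the agent
    return np.where(useful.any(axis=1))[0]

print(candidate_tasks(remaining_req, agent_abilities))   # -> [0]: only task 0 benefits from this agent
```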
@article{dai2025heterogeneous,title={Heterogeneous Multi-robot Task Allocation and Scheduling via Reinforcement Learning},author={Dai, Weiheng and Rai, Utkarsh and Chiun, Jimmy and Cao, Yuhong and Sartoretti, Guillaume},journal={IEEE Robotics and Automation Letters},year={2025},publisher={IEEE}}
2024
HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks
In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must adaptively optimize its trajectory to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical attention networks to empower the robot to reason about its belief across multiple spatial scales and sequence collaborative decisions, where our networks decompose long-term objectives into short-term informative task assignments and informative path planning. We further propose a contrastive learning-based joint optimization to enhance the robustness of HDPlanner. We empirically demonstrate that HDPlanner significantly outperforms state-of-the-art conventional and learning-based baselines on an extensive set of simulations, including hundreds of test maps and large-scale, complex Gazebo environments. Notably, HDPlanner achieves real-time planning with travel distances reduced by up to 35.7% compared to exploration benchmarks and by up to 16.5% compared to navigation benchmarks. Furthermore, we validate our approach on hardware, where it generates high-quality, adaptive trajectories in both indoor and outdoor environments, highlighting its real-world applicability without additional training.
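The contrastive learning-based joint optimization can be illustrated with a generic InfoNCE-style objective that pulls together embeddings of a task assignment and the path planned toward it for the same state, while pushing apart mismatched pairs across the batch. The pairing scheme and dimensions below are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """anchor, positive: (batch, dim) embeddings; matching rows form the positive pairs."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(a.shape[0])         # the diagonal holds the true pairs
    return F.cross_entropy(logits, labels)

# toy usage: embeddings of the selected subgoal vs. the path planned toward it
task_emb = torch.randn(32, 128)
path_emb = torch.randn(32, 128)
print(float(info_nce(task_emb, path_emb)))
```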
@article{liang2024hdplanner,title={HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks},author={Liang, Jingsong and Cao, Yuhong and Ma, Yixiao and Zhao, Hanqi and Sartoretti, Guillaume},journal={IEEE Robotics and Automation Letters},year={2024},publisher={IEEE},}
ViPER: Visibility-based Pursuit-Evasion via Reinforcement Learning
Yizhuo Wang*, Yuhong Cao*, Jimmy Chiun, Koley Subhadeep, Pham Mandy, and Guillaume Sartoretti
In Conference on Robot Learning (CoRL), 2024
In visibility-based pursuit-evasion tasks, a team of mobile pursuer robots with limited sensing capabilities is tasked with detecting all evaders in a multiply-connected planar environment, whose map may or may not be known to pursuers beforehand. This requires tight coordination among multiple agents to ensure that the omniscient and potentially arbitrarily fast evaders are guaranteed to be detected by the pursuers. Whereas existing methods typically rely on a relatively large team of agents to clear the environment, we propose ViPER, a neural solution that leverages a graph attention network to learn a coordinated yet distributed policy via multi-agent reinforcement learning (MARL). We experimentally demonstrate that ViPER significantly outperforms other state-of-the-art non-learning planners, showcasing its emergent coordinated behaviors and adaptability to more challenging scenarios and various team sizes, and finally deploy its learned policies on hardware in an aerial search task.
@inproceedings{wang2024viper,title={ViPER: Visibility-based Pursuit-Evasion via Reinforcement Learning},author={Wang*, Yizhuo and Cao*, Yuhong and Chiun, Jimmy and Subhadeep, Koley and Mandy, Pham and Sartoretti, Guillaume},booktitle={Conference on Robot Learning},year={2024}}
Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration
Yixiao Ma, Jingsong Liang, Yuhong Cao, Derek Ming Siang Tan, and Guillaume Sartoretti
In International Symposium on Distributed Autonomous Robotic Systems (DARS), 2024
Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and privileged reinforcement learning to achieve a significant reduction in bandwidth consumption, while minimally sacrificing exploration efficiency. Specifically, our approach allows robots to learn to embed the most salient information from their individual belief (partial map) over the environment into fixed-sized messages. Robots then reason about their own belief as well as received messages to distributedly explore the environment while avoiding redundant work. In doing so, we employ privileged learning and learned attention mechanisms to endow the critic (i.e., teacher) network with ground truth map knowledge to effectively guide the policy (i.e., student) network during training. Compared to relevant baselines, our model allows the team to reduce communication by up to two orders of magnitude, while only sacrificing a marginal 2.4% in total travel distance, paving the way for efficient, distributed multi-robot exploration in bandwidth-limited scenarios.
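The privileged-learning idea can be sketched as an asymmetric actor-critic: the policy (student) consumes only the robot's partial-belief embedding plus the fixed-size messages it received, while the critic (teacher) is additionally conditioned on a ground-truth map embedding during training and is discarded at deployment. Layer sizes and feature dimensions below are placeholders.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deployment-time network: partial belief + fixed-size messages -> action logits."""
    def __init__(self, belief_dim=64, msg_dim=16, n_actions=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(belief_dim + msg_dim, 128),
                                 nn.ReLU(), nn.Linear(128, n_actions))
    def forward(self, belief, msgs):
        return self.net(torch.cat([belief, msgs], dim=-1))

class PrivilegedCritic(nn.Module):
    """Training-only network: also conditioned on a ground-truth map embedding."""
    def __init__(self, belief_dim=64, msg_dim=16, gt_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(belief_dim + msg_dim + gt_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 1))
    def forward(self, belief, msgs, gt_map):
        return self.net(torch.cat([belief, msgs, gt_map], dim=-1))

actor, critic = Actor(), PrivilegedCritic()
belief, msgs, gt = torch.randn(4, 64), torch.randn(4, 16), torch.randn(4, 64)
logits, value = actor(belief, msgs), critic(belief, msgs, gt)
print(logits.shape, value.shape)   # the privileged critic is only used during training
```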
@inproceedings{ma2024privileged,title={Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration},author={Ma, Yixiao and Liang, Jingsong and Cao, Yuhong and Tan, Derek Ming Siang and Sartoretti, Guillaume},booktitle={International Symposium on Distributed Autonomous Robotic Systems (DARS)},year={2024}}
IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity
Derek Ming Siang Tan, Yixiao Ma, Jingsong Liang, Yi Cheng Chng, Yuhong Cao, and Guillaume Sartoretti
In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths and significantly improved mapped area consistency among robots when compared to state-of-the-art baselines.
@inproceedings{tan2024IR2,title={IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity},author={Tan, Derek Ming Siang and Ma, Yixiao and Liang, Jingsong and Chng, Yi Cheng and Cao, Yuhong and Sartoretti, Guillaume},booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},year={2024}}
STAR: Swarm Technology for Aerial Robotics Research
Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, and Guillaume Sartoretti
The International Conference on Control, Automation, and Systems, 2024
In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges, we present STAR (Swarm Technology for Aerial Robotics Research), a framework developed explicitly to improve the accessibility of aerial swarm research experiments. Our framework introduces a swarm architecture based on the Crazyflie, a low-cost, open-source, palm-sized aerial platform, well suited for experimental swarm algorithms. To augment cost-effectiveness and mitigate the limitations of employing low-cost robots in experiments, we propose a landmark-based localization module leveraging fiducial markers. This module, also serving as a target detection module, enhances the adaptability and versatility of the framework. Additionally, collision and obstacle avoidance are implemented through velocity obstacles. The presented work strives to bridge the gap between theoretical advances and tangible implementations, thus fostering progress in the field.
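The velocity-obstacle collision avoidance mentioned above follows the classic cone construction: a relative velocity is unsafe if it points into the cone subtended by the inflated obstacle, and the robot picks the sampled velocity closest to its preferred one that stays outside every cone. The sketch below is that textbook test for a single neighbor, simplified and not taken from the STAR codebase.

```python
import numpy as np

def in_velocity_obstacle(v_a, v_b, p_a, p_b, combined_radius):
    """True if A's velocity v_a (given B's velocity v_b) leads into the collision cone."""
    rel_p = p_b - p_a
    dist = np.linalg.norm(rel_p)
    if dist <= combined_radius:
        return True                                   # already overlapping
    rel_v = v_a - v_b
    # Half-angle of the cone from A toward the disc of radius combined_radius around B.
    cone_half_angle = np.arcsin(combined_radius / dist)
    angle_to_b = np.arctan2(rel_p[1], rel_p[0])
    angle_of_v = np.arctan2(rel_v[1], rel_v[0])
    diff = np.abs((angle_of_v - angle_to_b + np.pi) % (2 * np.pi) - np.pi)
    return diff < cone_half_angle and np.linalg.norm(rel_v) > 0

def pick_velocity(pref_v, v_b, p_a, p_b, radius=0.3, n_samples=200, v_max=1.0):
    """Sample velocities and return the collision-free one closest to the preferred velocity."""
    rng = np.random.default_rng(1)
    samples = rng.uniform(-v_max, v_max, size=(n_samples, 2))
    safe = [v for v in samples if not in_velocity_obstacle(v, v_b, p_a, p_b, radius)]
    if not safe:
        return np.zeros(2)                            # stop if nothing is safe
    return min(safe, key=lambda v: np.linalg.norm(v - pref_v))

print(pick_velocity(np.array([1.0, 0.0]), np.array([-0.5, 0.0]),
                    np.array([0.0, 0.0]), np.array([2.0, 0.0])))
```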
@article{chiun2024star,title={STAR: Swarm Technology for Aerial Robotics Research},author={Chiun, Jimmy and Tan, Yan Rui and Cao, Yuhong and Tan, John and Sartoretti, Guillaume},journal={The International Conference on Control, Automation, and Systems},year={2024}}
Deep Reinforcement Learning-based Large-scale Robot Exploration
In this work, we propose a deep reinforcement learning (DRL) based reactive planner to solve large-scale LiDAR-based autonomous robot exploration problems in 2D action space. Our DRL-based planner allows the agent to reactively plan its exploration path by making implicit predictions about unknown areas, based on a learned estimation of the underlying transition model of the environment. To this end, our approach relies on learned attention mechanisms for their powerful ability to capture long-term dependencies at different spatial scales to reason about the robot’s entire belief over known areas. Our approach relies on ground truth information (i.e., privileged learning) to guide the environment estimation during training, as well as on a graph rarefaction algorithm, which allows models trained in small-scale environments to scale to large-scale ones. Simulation results show that our model exhibits better exploration efficiency (12% in path length, 6% in makespan) and lower planning time (60%) than the state-of-the-art planners in a 130 m x 100 m benchmark scenario. We also validate our learned model on hardware.
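The role of graph rarefaction is to keep the belief graph small enough for attention to scale; a minimal stand-in for it is shown below, which greedily keeps a spatially well-separated subset of nodes and re-links kept nodes whenever an original edge connected their neighborhoods. The radius rule and names are illustrative assumptions, not the algorithm from the paper.

```python
import numpy as np

def rarefy(nodes, edges, min_separation=2.0):
    """nodes: (N, 2) positions; edges: set of (i, j) index pairs.
    Greedily keep nodes at least `min_separation` apart, then re-link kept nodes
    whenever some original edge connected their neighborhoods."""
    kept = []
    for i, p in enumerate(nodes):
        if all(np.linalg.norm(p - nodes[k]) >= min_separation for k in kept):
            kept.append(i)
    # map every original node to its nearest kept representative
    rep = {i: min(kept, key=lambda k: np.linalg.norm(nodes[i] - nodes[k]))
           for i in range(len(nodes))}
    new_edges = {(rep[i], rep[j]) for i, j in edges if rep[i] != rep[j]}
    return kept, new_edges

rng = np.random.default_rng(2)
pts = rng.uniform(0, 10, size=(60, 2))
dense_edges = {(i, j) for i in range(60) for j in range(i + 1, 60)
               if np.linalg.norm(pts[i] - pts[j]) < 1.5}
kept, sparse_edges = rarefy(pts, dense_edges)
print(len(pts), "->", len(kept), "nodes;", len(dense_edges), "->", len(sparse_edges), "edges")
```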
@article{cao2024deep,title={Deep Reinforcement Learning-based Large-scale Robot Exploration},author={Cao, Yuhong and Zhao, Rui and Wang, Yizhuo and Xiang, Bairan and Sartoretti, Guillaume},journal={IEEE Robotics and Automation Letters},year={2024},publisher={IEEE}}
2023
Context-Aware Deep Reinforcement Learning for Autonomous Robotic Navigation in Unknown Area
Jingsong Liang*, Zhichen Wang*, Yuhong Cao*, Jimmy Chiun, Mengqi Zhang, and Guillaume Adrien Sartoretti
In Conference on Robot Learning (CoRL), 2023
Mapless navigation refers to a challenging task where a mobile robot must rapidly navigate to a predefined destination using its partial knowledge of the environment, which is updated online along the way, instead of a prior map of the environment. Inspired by the recent developments in deep reinforcement learning (DRL), we propose a learning-based framework for mapless navigation, which employs a context-aware policy network to achieve efficient decision-making (i.e., maximize the likelihood of finding the shortest route towards the target destination), especially in complex and large-scale environments. Specifically, our robot learns to form a context of its belief over the entire known area, which it uses to reason about long-term efficiency and sequence short-term movements. Additionally, we propose a graph rarefaction algorithm to enable more efficient decision-making in large-scale applications. We empirically demonstrate that our approach reduces average travel time by up to 61.4% and average planning time by up to 88.2% compared to benchmark planners (D* Lite and BIT) on hundreds of test scenarios. We also validate our approach both in high-fidelity Gazebo simulations as well as on hardware, highlighting its promising applicability in the real world without further training/tuning.
@inproceedings{liang2023context,title={Context-Aware Deep Reinforcement Learning for Autonomous Robotic Navigation in Unknown Area},author={Liang*, Jingsong and Wang*, Zhichen and Cao*, Yuhong and Chiun, Jimmy and Zhang, Mengqi and Sartoretti, Guillaume Adrien},booktitle={Conference on Robot Learning},pages={1425--1436},year={2023},organization={PMLR}}
Intent-based Deep Reinforcement Learning for Multi-agent Informative Path Planning
Tianze Yang, Yuhong Cao, and Guillaume Sartoretti
In 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2023
In multi-agent informative path planning (MAIPP), agents must collectively construct a global belief map of an underlying distribution of interest (e.g., gas concentration, light intensity, or pollution levels) over a given domain, based on measurements taken along their trajectory. They must frequently replan their path to balance the distributed exploration of new areas and the collective, meticulous exploitation of known high-interest areas, to maximize the information gained within a predefined budget (e.g., path length or working time). A common approach to achieving such cooperation relies on planning the agents’ paths reactively, conditioned on other agents’ future actions. However, as the agent’s belief is updated continuously, these predicted future actions may not end up being the ones executed by agents, introducing a form of noise/inaccuracy in the system and often decreasing performance. In this work, we propose a decentralized deep reinforcement learning (DRL) approach to MAIPP, which relies on an attention-based neural network, where agents optimize long-term individual and cooperative objectives by explicitly sharing their intent (i.e., medium-/long-term future positions distribution, obtained from their individual policy) in a reactive, asynchronous manner. That is, in our work, intent sharing allows agents to learn to claim/avoid broader areas of the world. Moreover, since our approach relies on learned attention over these shared intents, agents are able to learn to recognize the useful portion(s) of these (imperfect) predictions to maximize cooperation even in the presence of imperfect information. Our comparison experiments demonstrate the performance of our approach compared to its variants and high-quality baselines over a large set of MAIPP simulations.
@inproceedings{yang2023intent,title={Intent-based Deep Reinforcement Learning for Multi-agent Informative Path Planning},author={Yang, Tianze and Cao, Yuhong and Sartoretti, Guillaume},booktitle={2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS)},pages={71--77},year={2023},organization={IEEE}}
Spatio-temporal attention network for persistent monitoring of multiple mobile targets
Yizhuo Wang, Yutong Wang, Yuhong Cao, and Guillaume Sartoretti
In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
This work focuses on the persistent monitoring problem, where a set of targets moving based on an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target’s position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. In doing so, the main challenge is to strike a balance between exploitation, i.e., re-visiting previously-located targets, and exploration, i.e., finding new targets or re-acquiring lost ones. Encouraged by recent advances in deep reinforcement learning, we introduce an attention-based neural solution to the persistent monitoring problem, where the agent can learn the inter-dependencies between targets, i.e., their spatial and temporal correlations, conditioned on past measurements. This endows the agent with the ability to determine which target, time, and location to attend to across multiple scales, which we show also helps relax the usual limitations of a finite target set with prior positional information. We experimentally demonstrate that our method outperforms other baselines in terms of number of target visits and average estimation error in complex environments. Finally, we implement and validate our model in a drone-based simulation experiment to monitor mobile ground targets in a high-fidelity simulator.
@inproceedings{wang2023spatio,title={Spatio-temporal attention network for persistent monitoring of multiple mobile targets},author={Wang, Yizhuo and Wang, Yutong and Cao, Yuhong and Sartoretti, Guillaume},booktitle={2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},pages={3903--3910},year={2023},organization={IEEE}}
ARiADNE: A reinforcement learning approach using attention-based deep networks for exploration
In autonomous robot exploration tasks, a mobile robot needs to actively explore and map an unknown environment as fast as possible. Since the environment is being revealed during exploration, the robot needs to frequently re-plan its path online, as new information is acquired by onboard sensors and used to update its partial map. While state-of-the-art exploration planners are frontier- and sampling-based, encouraged by the recent development in deep reinforcement learning (DRL), we propose ARiADNE, an attention-based neural approach to obtain real-time, non-myopic path planning for autonomous exploration. ARiADNE is able to learn dependencies at multiple spatial scales between areas of the agent’s partial map, and implicitly predict potential gains associated with exploring those areas. This allows the agent to sequence movement actions that balance the natural trade-off between exploitation/refinement of the map in known areas and exploration of new areas. We experimentally demonstrate that our method outperforms both learning and non-learning state-of-the-art baselines in terms of average trajectory length to complete exploration in hundreds of simplified 2D indoor scenarios. We further validate our approach in high-fidelity Robot Operating System (ROS) simulations, where we consider a real sensor model and a realistic low-level motion controller, toward deployment on real robots.
@inproceedings{cao2023ariadne,title={Ariadne: A reinforcement learning approach using attention-based deep networks for exploration},author={Cao, Yuhong and Hou, Tianxiang and Wang, Yizhuo and Yi, Xian and Sartoretti, Guillaume},booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},pages={10219--10225},year={2023},organization={IEEE}}
2022
CAtNIPP: Context-aware attention-based network for informative path planning
Informative path planning (IPP) is an NP-hard problem, which aims at planning a path allowing an agent to build an accurate belief about a quantity of interest throughout a given search domain, within constraints on resource budget (e.g., path length for robots with limited battery life). IPP requires frequent online replanning as this belief is updated with every new measurement (i.e., adaptive IPP), while balancing short-term exploitation and longer-term exploration to avoid suboptimal, myopic behaviors. Encouraged by the recent developments in deep reinforcement learning, we introduce CAtNIPP, a fully reactive, neural approach to the adaptive IPP problem. CAtNIPP relies on self-attention for its powerful ability to capture dependencies in data at multiple spatial scales. Specifically, our agent learns to form a context of its belief over the entire domain, which it uses to sequence local movement decisions that optimize short- and longer-term search objectives. We experimentally demonstrate that CAtNIPP significantly outperforms state-of-the-art non-learning IPP solvers in terms of solution quality and computing time once trained, and present experimental results on hardware.
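For intuition on the adaptive IPP setting itself, a common non-learning baseline maintains a Gaussian-process belief over the quantity of interest and greedily moves to the reachable node with the highest predictive uncertainty per unit travel cost until the budget runs out. The sketch below implements that baseline (with an assumed kernel, budget, and scoring rule), purely to make the problem concrete; it is not CAtNIPP.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def greedy_ipp(candidates, true_field, start_idx=0, budget=8.0):
    """Greedy adaptive IPP baseline: repeatedly move to the reachable candidate node
    with the highest GP predictive std per unit travel cost, until the budget runs out."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    visited, measurements = [start_idx], [true_field(candidates[start_idx])]
    pos, remaining = candidates[start_idx], budget
    while True:
        gp.fit(candidates[visited], measurements)
        _, std = gp.predict(candidates, return_std=True)
        dists = np.linalg.norm(candidates - pos, axis=1)
        scores = np.where((dists > 1e-6) & (dists <= remaining), std / dists, -np.inf)
        nxt = int(np.argmax(scores))
        if not np.isfinite(scores[nxt]):
            break                                # nothing reachable within the budget
        remaining -= dists[nxt]
        pos = candidates[nxt]
        visited.append(nxt)
        measurements.append(true_field(pos))     # take a new measurement and replan
    return visited

rng = np.random.default_rng(3)
nodes = rng.uniform(0, 5, size=(40, 2))
field = lambda p: np.sin(p[0]) * np.cos(p[1])    # unknown quantity of interest
print(greedy_ipp(nodes, field))
```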
@inproceedings{cao2023catnipp,title={CAtNIPP: Context-aware attention-based network for informative path planning},author={Cao, Yuhong and Wang, Yizhuo and Vashisth, Apoorva and Fan, Haolin and Sartoretti, Guillaume Adrien},booktitle={Conference on Robot Learning},pages={1928--1937},year={2022},organization={PMLR}}
Distributed reinforcement learning for robot teams: A review
Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, or search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review aims to provide an analysis of the state-of-the-art in distributed MARL for multi-robot cooperation. Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the “centralized training, decentralized execution” paradigm, recent MARL approaches include independent learning, centralized critic, value decomposition, and communication learning approaches. Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning. This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches. We present benchmarks and robotic applications along with a discussion on current open avenues for research.
@article{wang2022distributed,title={Distributed reinforcement learning for robot teams: A review},author={Wang, Yutong and Damani, Mehul and Wang, Pamela and Cao, Yuhong and Sartoretti, Guillaume},journal={Current Robotics Reports},volume={3},number={4},pages={239--257},year={2022},publisher={Springer}}
DAN: Decentralized attention-based neural network for the MinMax multiple traveling salesman problem
Yuhong Cao, Zhanhong Sun, and Guillaume Sartoretti
In International Symposium on Distributed Autonomous Robotic Systems, 2022
The multiple traveling salesman problem (mTSP) is a well-known NP-hard problem with numerous real-world applications. In particular, this work addresses MinMax mTSP, where the objective is to minimize the max tour length among all agents. Many robotic deployments require recomputing potentially large mTSP instances frequently, making the natural trade-off between computing time and solution quality of great importance. However, exact and heuristic algorithms become inefficient as the number of cities increases, due to their computational complexity. Encouraged by the recent developments in deep reinforcement learning (dRL), this work approaches the mTSP as a cooperative task and introduces DAN, a decentralized attention-based neural method that aims at tackling this key trade-off. In DAN, agents learn fully decentralized policies to collaboratively construct a tour, by predicting each other’s future decisions. Our model relies on attention mechanisms and is trained using multi-agent RL with parameter sharing, providing natural scalability to the number of agents and cities. Our experimental results on small- to large-scale mTSP instances (50 to 1000 cities and 5 to 20 agents) show that DAN is able to match or outperform state-of-the-art solvers while keeping planning times low. In particular, given the same computation time budget, DAN outperforms all conventional and dRL-based baselines on larger-scale instances (more than 100 cities, more than 5 agents), and exhibits enhanced agent collaboration.
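The MinMax objective is easy to make concrete: the cost of a solution is the longest single-agent tour. The sketch below is a naive decentralized construction baseline in which the agent with the currently shortest tour extends it to its nearest unvisited city; it only illustrates the problem and objective that DAN learns to optimize, not the learned attention policy.

```python
import numpy as np

def greedy_minmax_mtsp(cities, depot, n_agents=3):
    """Naive MinMax mTSP construction: the agent with the shortest tour so far extends
    its tour to the nearest unvisited city; all agents return to the depot at the end."""
    unvisited = set(range(len(cities)))
    positions = [depot.copy() for _ in range(n_agents)]
    lengths = [0.0] * n_agents
    tours = [[] for _ in range(n_agents)]
    while unvisited:
        a = int(np.argmin(lengths))                       # the lagging agent moves next
        nxt = min(unvisited, key=lambda c: np.linalg.norm(cities[c] - positions[a]))
        lengths[a] += np.linalg.norm(cities[nxt] - positions[a])
        positions[a] = cities[nxt]
        tours[a].append(nxt)
        unvisited.remove(nxt)
    for a in range(n_agents):                             # close every tour at the depot
        lengths[a] += np.linalg.norm(depot - positions[a])
    return tours, max(lengths)                            # MinMax objective = longest tour

rng = np.random.default_rng(4)
tours, makespan = greedy_minmax_mtsp(rng.uniform(0, 1, size=(50, 2)), np.zeros(2))
print(makespan)
```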
@inproceedings{cao2022dan,title={Dan: Decentralized attention-based neural network for the minmax multiple traveling salesman problem},author={Cao, Yuhong and Sun, Zhanhong and Sartoretti, Guillaume},booktitle={International Symposium on Distributed Autonomous Robotic Systems},pages={202--215},year={2022},organization={Springer}}