End-to-end imitation learning (IL) has shown great success in learning complex robot manipulation tasks. However, there remains a need for practical safety methods to justify widespread deployment. In particular, it is important to certify that a system obeys hard constraints on unsafe behavior in settings where it is unacceptable to trade off safety against performance by tuning the policy (i.e., soft constraints). This leads to the question: how does enforcing hard constraints impact the performance (meaning safely completing tasks) of an IL policy?
To answer this question, this paper builds a reachability-based safety filter to enforce hard constraints on IL, which we call Reachability-Aided Imitation Learning (RAIL). Through evaluations with state-of-the-art IL policies on mobile robot and manipulation tasks, we make two key findings. First, the highest-performing policies are sometimes only so because they frequently violate constraints, and they lose significant performance under hard constraints. Second, surprisingly, hard constraints on lower-performing policies can occasionally increase their ability to perform tasks safely. Finally, hardware evaluation confirms that the method operates in real time.
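To make the filter-style structure concrete, below is a minimal sketch of how a reachability-based safety filter can wrap an IL policy. The function names (propose_plan, is_plan_verified_safe, backup_plan) are hypothetical placeholders used only for illustration, not the authors' API; the real method relies on reachability analysis to verify the proposed plan and on a certified fallback when verification fails.

```python
# Minimal sketch of a reachability-based safety filter wrapping an IL policy.
# All names here are hypothetical placeholders, not the authors' implementation.
import numpy as np


def propose_plan(obs: np.ndarray, horizon: int = 16) -> np.ndarray:
    """Stand-in for an IL policy (e.g., a diffusion policy) that proposes a
    short-horizon action chunk from the current observation."""
    return np.zeros((horizon, 2))  # placeholder actions


def is_plan_verified_safe(obs: np.ndarray, plan: np.ndarray) -> bool:
    """Stand-in for reachability analysis: over-approximate the states reached
    while executing `plan` and check them against the hard constraints."""
    return True  # placeholder verdict


def backup_plan(obs: np.ndarray, horizon: int = 16) -> np.ndarray:
    """Stand-in for a certified fallback (e.g., braking to a stop) that keeps
    the robot inside a known safe set."""
    return np.zeros((horizon, 2))


def filtered_step(obs: np.ndarray) -> np.ndarray:
    """Execute the IL plan only if it is verified safe; otherwise intervene."""
    plan = propose_plan(obs)
    if is_plan_verified_safe(obs, plan):
        return plan           # no intervention (blue circle in the videos)
    return backup_plan(obs)   # safety intervention (green circle in the videos)
```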
In this PointMaze task, the mobile robot must navigate to its goal in a maze without colliding with the walls.
We report the key metric, Safe Success Rate: the percentage of episodes, out of 100, that are completed successfully with no safety violations. RAIL significantly increases the safe success rate by combining the long-horizon reasoning capability of the IL policy with the safety guarantee of reachability analysis.
| Method | Success Rate | Safe Success Rate | Collision Rate |
|---|---|---|---|
| Diffusion Policy | 100.00% | 15.00% | 5.66% |
| Safety-guided Diffusion Policy | 100.00% | 18.00% | 4.85% |
| Model-based Planner | 49.00% | 49.00% | 0.00% |
| Diffusion Policy + RAIL (Ours) | 95.00% | 95.00% | 0.00% |
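For clarity on the key metric, here is a minimal sketch of how Safe Success Rate could be computed from per-episode logs. The `success` and `violated_safety` field names are assumptions for illustration, not the authors' evaluation code.

```python
# Sketch: Safe Success Rate = fraction of episodes that succeed with no safety violation.
def summarize(episodes: list[dict]) -> dict:
    n = len(episodes)
    success = sum(e["success"] for e in episodes)
    safe_success = sum(e["success"] and not e["violated_safety"] for e in episodes)
    return {
        "success_rate": 100.0 * success / n,
        "safe_success_rate": 100.0 * safe_success / n,  # key metric
    }


episodes = [
    {"success": True,  "violated_safety": False},
    {"success": True,  "violated_safety": True},   # succeeds but violates safety
    {"success": False, "violated_safety": False},
    {"success": True,  "violated_safety": False},
]
print(summarize(episodes))  # {'success_rate': 75.0, 'safe_success_rate': 50.0}
```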
If the robot collides with a wall, the screen flashes red. For Diffusion Policy + RAIL, a green circle appears when the safety filter intervenes and a blue circle appears when it operates without intervention. Video speed: 1x
In this PickPlace task, the single-arm manipulator must pick up the can and place it in the target bin. Safety requires avoiding collisions between the robot and the environment while obeying joint position and velocity constraints. Gripper-can collisions are ignored during picking, but collisions between the can and the environment are considered unsafe during placing.
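The stage-dependent safety criterion above can be summarized with the following sketch. It is a plain-Python illustration of the rule, with hypothetical boolean inputs, not the authors' implementation.

```python
# Sketch of the stage-dependent hard constraints for the PickPlace task
# (illustrative only; the inputs are hypothetical flags, not the authors' API).
def pickplace_safe(robot_env_collision: bool,
                   joint_limits_violated: bool,
                   can_env_collision: bool,
                   stage: str) -> bool:
    """Return True iff the current state satisfies the task's hard constraints."""
    if robot_env_collision or joint_limits_violated:
        return False  # robot-environment collisions and joint-limit violations are always unsafe
    if stage == "placing" and can_env_collision:
        return False  # can-environment collisions only count as violations while placing
    # gripper-can contact during picking is intentionally not treated as a violation
    return True


# Example: can-environment contact is tolerated while picking, but not while placing.
print(pickplace_safe(False, False, True, stage="picking"))  # True
print(pickplace_safe(False, False, True, stage="placing"))  # False
```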
RAIL increases the Safe Success Rate by 9.25% over Diffusion Policy and by 2.00% over Action Chunking Transformer by finding safe, successful modes of the IL policy. We illustrate some notable behaviors of RAIL:
- RAIL adjusts the orientation of the gripper to descend and safely pick up the can.
- RAIL repositions the can to avoid collisions between the bin and the gripper during pick-up.
- RAIL corrects the placing path to prevent collisions between the can and the cereal box.
If safety is violated, the screen flashes red. For Diffusion Policy + RAIL, a green circle appears when the safety filter intervenes and a blue circle appears when it operates without intervention. Video speed: (Left) 5x; (Right) 2.5x when notable behaviors are observed, 10x otherwise
On hardware, RAIL verifies safety and computes the plan in 0.42 seconds on average.
@article{jung2024rail,
  author  = {Jung, Wonsuhk and Anthony, Dennis and Mishra, Utkarsh and Arachchige, Nadun and Bronars, Matthew and Xu, Danfei and Kousik, Shreyas},
  title   = {RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution},
  journal = {arXiv preprint arXiv:2409.19190},
  year    = {2024},
}