End-to-end imitation learning (IL) has shown great success in learning complex robot manipulation tasks. However, there remains a need for practical safety methods to justify widespread deployment. In particular, it is important to certify that a system obeys hard constraints on unsafe behavior in settings where it is unacceptable to trade off safety against performance by tuning the policy (i.e., soft constraints). This leads to the question: how does enforcing hard constraints impact the performance (meaning safely completing tasks) of an IL policy?
To answer this question, this paper builds a reachability-based safety filter to enforce hard constraints on IL, which we call Reachability-Aided Imitation Learning (RAIL). Through evaluations with state-of-the-art IL policies on mobile robot and manipulation tasks, we make two key findings. First, the highest-performing policies are sometimes only so because they frequently violate constraints, and they lose significant performance under hard constraints. Second, surprisingly, hard constraints on lower-performing policies can occasionally increase their ability to perform tasks safely. Finally, hardware evaluation confirms that the method operates in real time.
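To make the filter-style structure concrete, below is a minimal sketch of how a reachability-based safety filter can wrap an IL policy. The function names (propose_plan, is_plan_verified_safe, backup_plan) are hypothetical placeholders used only for illustration, not the authors' API; the real method relies on reachability analysis to verify the proposed plan and on a certified fallback when verification fails.

```python
# Minimal sketch of a reachability-based safety filter wrapping an IL policy.
# All names here are hypothetical placeholders, not the authors' implementation.
import numpy as np


def propose_plan(obs: np.ndarray, horizon: int = 16) -> np.ndarray:
    """Stand-in for an IL policy (e.g., a diffusion policy) that proposes a
    short-horizon action chunk from the current observation."""
    return np.zeros((horizon, 2))  # placeholder actions


def is_plan_verified_safe(obs: np.ndarray, plan: np.ndarray) -> bool:
    """Stand-in for reachability analysis: over-approximate the states reached
    while executing `plan` and check them against the hard constraints."""
    return True  # placeholder verdict


def backup_plan(obs: np.ndarray, horizon: int = 16) -> np.ndarray:
    """Stand-in for a certified fallback (e.g., braking to a stop) that keeps
    the robot inside a known safe set."""
    return np.zeros((horizon, 2))


def filtered_step(obs: np.ndarray) -> np.ndarray:
    """Execute the IL plan only if it is verified safe; otherwise intervene."""
    plan = propose_plan(obs)
    if is_plan_verified_safe(obs, plan):
        return plan           # no intervention (blue circle in the videos)
    return backup_plan(obs)   # safety intervention (green circle in the videos)
```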
In this PointMaze task, the mobile robot must navigate to its goal in a maze without colliding with the walls.
We report the key metric, Safe Success Rate: the percentage of episodes, out of 100, that are completed successfully with no safety violations. RAIL significantly increases the safe success rate by combining the long-horizon reasoning capability of the IL policy with the safety guarantee of reachability analysis.
| Method | Success Rate | Safe Success Rate | Collision Rate |
|---|---|---|---|
| Diffusion Policy | 100.00% | 15.00% | 5.66% |
| Safety-guided Diffusion Policy | 100.00% | 18.00% | 4.85% |
| Model-based Planner | 49.00% | 49.00% | 0.00% |
| Diffusion Policy + RAIL (Ours) | 95.00% | 95.00% | 0.00% |
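For clarity on the key metric, here is a minimal sketch of how Safe Success Rate could be computed from per-episode logs. The `success` and `violated_safety` field names are assumptions for illustration, not the authors' evaluation code.

```python
# Sketch: Safe Success Rate = fraction of episodes that succeed with no safety violation.
def summarize(episodes: list[dict]) -> dict:
    n = len(episodes)
    success = sum(e["success"] for e in episodes)
    safe_success = sum(e["success"] and not e["violated_safety"] for e in episodes)
    return {
        "success_rate": 100.0 * success / n,
        "safe_success_rate": 100.0 * safe_success / n,  # key metric
    }


episodes = [
    {"success": True,  "violated_safety": False},
    {"success": True,  "violated_safety": True},   # succeeds but violates safety
    {"success": False, "violated_safety": False},
    {"success": True,  "violated_safety": False},
]
print(summarize(episodes))  # {'success_rate': 75.0, 'safe_success_rate': 50.0}
```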
If the robot collides with a wall, the screen flashes red. For Diffusion Policy + RAIL, a green circle appears when the safety filter intervenes and a blue circle appears when it operates without intervention. Video speed: 1x
In this PickPlace task, the single-arm manipulator must pick up the can and place it in the target bin. Safety requires avoiding collisions between the robot and the environment while obeying joint position and velocity constraints. Gripper-can collisions are ignored during picking, but collisions between the can and the environment are considered unsafe during placing.
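The stage-dependent safety criterion above can be summarized with the following sketch. It is a plain-Python illustration of the rule, with hypothetical boolean inputs, not the authors' implementation.

```python
# Sketch of the stage-dependent hard constraints for the PickPlace task
# (illustrative only; the inputs are hypothetical flags, not the authors' API).
def pickplace_safe(robot_env_collision: bool,
                   joint_limits_violated: bool,
                   can_env_collision: bool,
                   stage: str) -> bool:
    """Return True iff the current state satisfies the task's hard constraints."""
    if robot_env_collision or joint_limits_violated:
        return False  # robot-environment collisions and joint-limit violations are always unsafe
    if stage == "placing" and can_env_collision:
        return False  # can-environment collisions only count as violations while placing
    # gripper-can contact during picking is intentionally not treated as a violation
    return True


# Example: can-environment contact is tolerated while picking, but not while placing.
print(pickplace_safe(False, False, True, stage="picking"))  # True
print(pickplace_safe(False, False, True, stage="placing"))  # False
```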
RAIL increases the Safe Success Rate by 9.25% over Diffusion Policy and by 2.00% over Action Chunking Transformer by finding safe, successful modes of the IL policy. We illustrate some notable behaviors of RAIL:
- RAIL adjusts the orientation of the gripper to descend and safely pick up the can.
- RAIL repositions the can to avoid collisions between the bin and the gripper during pick-up.
- RAIL corrects the placing path to prevent collisions between the can and the cereal box.
If safety is violated, the screen flashes red. For Diffusion Policy + RAIL, a green circle appears when the safety filter intervenes and a blue circle appears when it operates without intervention. Video speed: (Left) 5x; (Right) 2.5x when notable behaviors are observed, 10x otherwise
On hardware, RAIL verifies safety and computes the plan in 0.42 seconds on average.
@article{jung2024rail,
  author  = {Jung, Wonsuhk and Anthony, Dennis and Mishra, Utkarsh and Arachchige, Nadun and Bronars, Matthew and Xu, Danfei and Kousik, Shreyas},
  title   = {RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution},
  journal = {arXiv preprint arXiv:2409.19190},
  year    = {2024},
}