Operating smoothly in contact-rich environments is crucial for robots to effectively perform daily tasks. Leveraging contact force information can enhance the smoothness of interactive operations; therefore, force control should be addressed throughout the entire process of autonomous policy development, from data collection and training to deployment.
We propose an admittance imitation learning visuomotor policy framework that reduces mean contact force and force fluctuations. Our framework takes RGB images, robot joint positions, end-effector poses, and contact forces as inputs, and employs a diffusion model to generate future end-effector trajectories and contact forces.
An admittance controller is employed to track these trajectories, enabling effective force control across various tasks. Furthermore, a low-cost hand-arm teleoperation system with interactive feedback is designed for data collection. An evaluation of our teleoperation system and policy framework was conducted on five contact-rich manipulation tasks, each representing an action primitive. Results show that our framework achieves the highest success rate and demonstrates smoother contact than other methods. In particular, for the door-opening task, the average contact force is reduced by 53.92%, while the standard deviation of contact force fluctuations is diminished by 76.51%.
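To make the tracking step concrete, below is a minimal single-axis admittance-control sketch. The paper's actual controller, gains, and interfaces are not specified here, so the mass/damping/stiffness values and the class API are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class AdmittanceController:
    """Minimal 1-DoF admittance controller sketch (illustrative only).

    Integrates M*a + D*v + K*(x - x_ref) = f_meas - f_des with explicit
    Euler steps, so the commanded pose yields compliantly to contact force.
    All gains below are assumed values, not taken from the paper.
    """

    def __init__(self, m=1.0, d=20.0, k=100.0, dt=0.01):
        self.m, self.d, self.k, self.dt = m, d, k, dt
        self.x = 0.0  # current compliant position command
        self.v = 0.0  # its velocity

    def step(self, x_ref, f_meas, f_des=0.0):
        """Return the compliant desired position for one control tick.

        x_ref:  reference position from the policy's trajectory
        f_meas: measured contact force
        f_des:  desired contact force from the policy's trajectory
        """
        a = (f_meas - f_des
             - self.d * self.v
             - self.k * (self.x - x_ref)) / self.m
        self.v += a * self.dt
        self.x += self.v * self.dt
        return self.x
```

With zero contact force the command converges to `x_ref`; under a sustained force `f` it settles at `x_ref + f / k`, which is the compliance that smooths contact. A 6-DoF version would use matrix gains and operate on the full pose error.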
AdmitDiff Policy Framework. Left: During inference, the observations from the previous two steps are encoded as inputs for noise estimation, and the student model outputs actions for the next 8 time steps; K denotes the number of denoising iterations required by the diffuser. The arm's force-position trajectory is passed to the admittance controller to compute the desired pose. Middle: The teacher model is trained with 100 denoising steps; its parameters are then frozen, and the student model is trained with a consistency loss for single-step denoising. Right: Data collection, including contact force information, is performed using the teleoperation system designed in this work.
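The middle panel's teacher-student distillation can be sketched as a consistency loss: the frozen teacher takes one ODE step between adjacent noise levels, and the student is trained so that its denoised outputs at the two levels agree. The function signatures and the squared-error distance below are assumptions for illustration; the paper's exact losses, schedules, and networks are not reproduced here.

```python
import numpy as np


def consistency_loss(student, teacher_ode_step, x_next, t_next, t_cur):
    """Schematic consistency-distillation objective (assumed details).

    student:          callable (x, t) -> denoised sample estimate
    teacher_ode_step: frozen teacher's one-step ODE solver from t_next to t_cur
    x_next:           noisy action sample at noise level t_next (t_next > t_cur)
    """
    # Teacher takes one probability-flow ODE step down the noise schedule.
    x_cur = teacher_ode_step(x_next, t_next, t_cur)
    # The student's outputs at adjacent noise levels should match;
    # in practice the target branch uses an EMA copy with stopped gradients.
    pred_next = student(x_next, t_next)
    target = student(x_cur, t_cur)
    return np.mean((pred_next - target) ** 2)
```

Once this loss is driven to zero across the schedule, the student maps any noise level directly to the clean action, which is what allows single-step denoising at inference time.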
There are three main components that provide critical data for the teleoperation task.
Open Box
Rotate Box
Drag
AdmitDiff Policy consistently demonstrates the most stable contact force control, with lower variance and a more controlled median force, especially in precision-demanding tasks such as Insertion and Wiping. In contrast, Diffusion Policy and Consistency Policy exhibit higher variance, particularly in force-intensive tasks such as Door Opening, indicating less stable control. Across the five tasks, AdmitDiff Policy reduces the mean contact force by approximately 48.8% and its standard deviation by approximately 52.0% compared with the other methods.
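For clarity, the reported reduction percentages can be computed from logged force readings as below. The exact logging and averaging protocol used in the evaluation is not stated here, so this helper is an assumed formulation, not the authors' evaluation script.

```python
import numpy as np


def force_reduction(baseline_forces, ours_forces):
    """Percent reduction in mean contact force and in its standard deviation.

    Both inputs are 1-D sequences of force magnitudes logged during a task.
    Returns (mean_reduction_pct, std_reduction_pct).
    """
    b_mean, b_std = np.mean(baseline_forces), np.std(baseline_forces)
    o_mean, o_std = np.mean(ours_forces), np.std(ours_forces)
    return (100.0 * (b_mean - o_mean) / b_mean,
            100.0 * (b_std - o_std) / b_std)
```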
We also present some failure cases of AdmitDiff Policy. Among them, for tasks involving repetitive behaviors, the policy occasionally fails to detect the task-completion condition, repeating the same set of actions even after the task has been completed.
This might be due to insufficient extraction of visual features, leading to reduced generalization. For tasks with high precision requirements, failures may occur due to occasional grasp misalignments, which subsequently lead to downstream action failures. This could be attributed to a lack of demonstration trajectories that include operations under specific grasping poses necessary for task completion.
We plan to further improve the robustness and generalization of the model in complex tasks by enhancing the visual feature extraction module and expanding the diversity of demonstration data.
Insert Failure: Policy failed to identify misalignments.
Insert Failure: Grasp failure led to a wrong insertion pose.
Wipe Failure: Policy failed to identify insufficient wipe.
Wipe Failure: Policy failed to recognize task completion.
@article{zhou2024admitdiff,
  author  = {Bo Zhou and Ruixuan Jiao and Yi Li and Xiaogang Yuan and Fang Fang and Shihua Li},
  title   = {Admittance Visuomotor Policy Learning for General-Purpose Contact-Rich Manipulations},
  journal = {arXiv preprint arXiv:2409.14440},
  year    = {2024},
}