Motion retargeting is pivotal in executing complex tasks by transferring human motion to robotic systems with precision. However, existing motion retargeting approaches often require elaborate human motion capture setups and fail to account for the scale discrepancies between hands and bodies, resulting in suboptimal accuracy. Their retargeting methods also face challenges such as high computational complexity, susceptibility to local minima, and inaccuracies in mapping unseen motions. Moreover, these methods are typically tailored to specific robot platforms and lack generalization capability. In this paper, we propose a novel vision-guided motion retargeting framework to address these limitations. Our framework employs a dual-stream architecture that processes hands and bodies separately to mitigate scale discrepancies, thereby enhancing precision. The proposed retargeting method integrates a graph encoder network to generate meaningful initial embeddings, which are subsequently optimized in the latent space; this strategy significantly reduces complexity while avoiding local minima. Crucially, by graphically modeling human poses and robot description files, our method eliminates the need for paired datasets, enabling broad applicability across diverse robots. Experiments successfully replicate human motions and validate the feasibility and accuracy of human-robot motion retargeting both in simulation on RMC-DA, YuMi, and Unitree H1 and on the real-world RMC-DA, underscoring the practical value of our vision-guided motion retargeting framework.
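To make the "initial embedding plus latent optimization" idea concrete, here is a minimal, self-contained sketch. All names are illustrative, not the paper's actual API: a random linear map plus `tanh` stands in for the learned decoder, a random warm start stands in for the graph encoder's initial embedding, and finite-difference gradient descent stands in for the latent-space optimizer.

```python
import numpy as np

# Toy stand-in for GNNRetarget's latent-space refinement (names hypothetical).
rng = np.random.default_rng(0)
DOF = 7      # robot degrees of freedom (assumed)
LATENT = 4   # latent dimension (assumed)

W = rng.standard_normal((DOF, LATENT))

def decode(z):
    # latent code -> joint angles; stands in for the learned decoder
    return np.tanh(W @ z)

# Toy target pose derived from the (human) motion to be retargeted.
target = np.array([0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.0])

def loss(z):
    # retargeting error measured in joint space (simplified)
    return float(np.sum((decode(z) - target) ** 2))

# Encoder-style warm start instead of a zero/random cold start,
# then refine z by simple finite-difference gradient descent.
z0 = rng.standard_normal(LATENT) * 0.1
z = z0.copy()
eps, lr = 1e-5, 0.05
for _ in range(500):
    grad = np.array([
        (loss(z + eps * np.eye(LATENT)[i]) - loss(z - eps * np.eye(LATENT)[i])) / (2 * eps)
        for i in range(LATENT)
    ])
    z = z - lr * grad
```

Optimizing over a low-dimensional latent code rather than the full joint vector is what keeps the search cheap; the warm start from an encoder is what helps it avoid poor local minima.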
We estimate the human pose from RGB images using an off-the-shelf pose estimator, and feed our proposed GNNRetarget model with the predicted human pose. Thanks to its lightweight design and high retargeting accuracy, our model can directly control the real RMC-DA robot.
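The resulting control pipeline can be outlined as follows. This is a hedged sketch with stub functions; `estimate_pose` and `gnn_retarget` are hypothetical placeholders for the off-the-shelf pose estimator and our retargeting model, and the keypoint/DOF counts are assumptions.

```python
# Sketch of the vision-to-control loop (all names and sizes are illustrative).
def estimate_pose(frame):
    # stand-in for an off-the-shelf RGB human pose estimator
    return [0.0] * 17  # e.g. 17 keypoints (assumed layout)

def gnn_retarget(human_pose, dof=7):
    # stand-in for the learned retargeting model
    return [0.0] * dof  # joint-angle command for the robot

def control_loop(frames):
    commands = []
    for frame in frames:
        pose = estimate_pose(frame)   # RGB frame -> human pose
        q = gnn_retarget(pose)        # human pose -> robot joints
        commands.append(q)            # would be sent to the robot controller
    return commands

cmds = control_loop([None, None, None])  # three dummy frames
```

Because the model is lightweight, this loop can run per-frame fast enough for direct real-robot control.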
We take multiple human motion sequences and map them onto different robots through our motion retargeting algorithm, transferring the captured human movements onto robots with varying kinematic structures and degrees of freedom.
Here is a comparison video of the motion retargeting methods evaluated in our paper, contrasting the NLO method with our approach.
Here are various demonstrations that further showcase the dexterity of hand motion retargeting.
Schematic of the overall design framework for motion retargeting. The upper block represents the two-stream visual motion retargeting framework, while the lower block depicts the structure of the graph neural network-based retargeting method, GNNRetarget. The RGB camera captures visual information, which is then processed using visual extraction techniques to obtain the operator’s pose. This pose data is subsequently fed into the GNNRetarget module for latent optimization, ultimately yielding the optimized pose for robot control.
author = {Lai, Yuanchuan and Gao, Qing and Zhang, Xin and Ju, Zhaojie},
title = {GNNRetarget: Vision-Guided Motion Retargeting Based on Graph Neural Network for Dexterous Robot},
journal = {IEEE Transactions on Automation Science and Engineering},
year = {2025},