Robust and Expressive Humanoid Motion Retargeting via Optimization-Based Rig Unification

Korea University1, CINAMON2, Rainbow Robotics3, NAVER LABS4,
University of Illinois Urbana-Champaign5


Accepted to IROS 2025

This video shows motion retargeted from a human motion video to the AMBIDEX robot. AMBIDEX performs the movements smoothly, with no issues such as collisions. The video was filmed at the NAVER 1784 building.

Abstract

Humanoid robots are increasingly being developed for seamless interaction with humans in diverse domains, yet generating expressive and physically-feasible motions remains a core challenge. We propose a robust and automated pipeline for motion retargeting that enables the generation of natural, stable, and highly expressive motions for a wide variety of humanoid robots using different motion data sources, including noisy pose estimations. To ensure robustness, our approach unifies motions from different kinematic structures into a common canonical rig, systematically refines the motion trajectory to address infeasible poses, enforces foot-contact constraints, and enhances stability. The retargeted motion is then refined to closely follow the source motion while respecting each robot's physical limits. Through extensive experiments on 12 simulated robots and validation on three real robots, we show that our methodology reliably produces expressive upper-body movements with consistent foot contact. This work represents an important step towards automating robust and expressive motion generation for humanoid robots, enabling deployment in various real-world scenarios.

Overview


(A) Human motion data is first extracted from motion capture systems or 3D pose estimation methods. These inputs may include high-fidelity MoCap recordings or noisy video-based estimations.

(B) The common-rigging process converts various human skeleton structures into a single unified rig. This step includes pre-rigging (to match proportions and joint constraints) and post-rigging (to correct foot placement, align the center of mass, and eliminate self-collisions).

(C) The refined motion is then retargeted to diverse robots using a direction-vector-based approach. Joint trajectories are optimized through inverse kinematics, enforcing robot-specific constraints such as joint limits and collision avoidance. The final result is a physically feasible robot motion that preserves the expressiveness of the original human movement.

Common-rigging for motion refinement

Common-Rigging is a critical step that unifies motion data from different skeleton structures into a single, standardized rig. In the Pre-rigging stage, various human skeletons are retargeted to a predefined common rig using inverse kinematics. This rig incorporates a rigid-body structure and physical properties such as mass and moment of inertia, which help handle self-collisions and refine noisy poses. The Post-rigging stage uses these physical properties to refine the motion further: enforcing foot-ground contact, aligning the center of mass (COM), and smoothing out physically implausible artifacts. As a result, the final trajectory is both physically feasible and robot-executable.
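As a concrete illustration of the foot-contact part of post-rigging, the sketch below shows one simplified way such a correction can be implemented. The function name, the ground plane at z = 0, and the `contact_eps` threshold are assumptions made for this example, not the paper's actual formulation.

```python
import numpy as np

def enforce_foot_contact(foot_z, contact_eps=0.02):
    """Frames where the foot is within contact_eps of the ground are
    treated as contact frames and pinned to z = 0, which removes both
    ground penetration and small hovering/foot-skate artifacts."""
    z = np.asarray(foot_z, dtype=float).copy()
    contact = z < contact_eps             # near-ground (or penetrating) frames
    z[contact] = 0.0                      # snap the foot onto the ground plane
    return z, contact

# Foot heights over four frames: hovering, near-contact, penetrating, swinging.
z, contact = enforce_foot_contact([0.05, 0.01, -0.01, 0.30])
# z -> [0.05, 0.0, 0.0, 0.30]
```

In a full pipeline this per-frame snap would be combined with the COM alignment and smoothing steps described above, so that pinning the foot does not introduce new discontinuities elsewhere in the body.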

Flexible motion retargeting to diverse robots

Our motion retargeting pipeline adapts the unified motion from the common rig to various robot platforms with different kinematic structures and physical constraints.

The process begins by identifying joints of interest (JOI) for each robot, which correspond to key joints in the common rig. A direction-vector-based approach then computes the robot's target pose by scaling each link's direction vector to the robot's corresponding link length.
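The direction-vector scaling step can be sketched as follows. This is an illustrative toy version, not the paper's implementation: the function and joint names are invented for the example, and it assumes the common rig's joint positions are given per frame.

```python
import numpy as np

def retarget_targets(rig_joints, chain, robot_link_lengths):
    """Walk a kinematic chain, rescaling each source-link direction
    vector to the robot's link length to get robot target positions."""
    targets = {chain[0]: np.asarray(rig_joints[chain[0]], dtype=float)}
    for parent, child, length in zip(chain[:-1], chain[1:], robot_link_lengths):
        d = np.asarray(rig_joints[child], float) - np.asarray(rig_joints[parent], float)
        d /= np.linalg.norm(d)                      # unit direction on the common rig
        targets[child] = targets[parent] + length * d
    return targets

# Toy example: a 2-link arm whose robot links are shorter than the rig's.
rig = {"shoulder": [0.0, 0.0, 0.0], "elbow": [0.0, 0.0, -0.30], "wrist": [0.0, 0.20, -0.30]}
tgt = retarget_targets(rig, ["shoulder", "elbow", "wrist"], [0.15, 0.10])
# tgt["elbow"] -> [0, 0, -0.15]; tgt["wrist"] -> [0, 0.10, -0.15]
```

Because only directions are transferred and lengths come from the robot, the same source motion yields proportionally correct targets for robots of very different sizes.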

The resulting target pose is used as input to an inverse kinematics (IK) solver, which computes joint angles that respect robot-specific constraints such as joint limits, velocity bounds, and self-collision avoidance.
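The following is a minimal, self-contained illustration of constraint-aware IK on a planar 2-link arm, using damped least squares with joint-limit clamping. The link lengths, limits, and damping value are made up for this sketch; it stands in for, and is much simpler than, the robot-specific solver described above.

```python
import numpy as np

def dls_ik_step(q, target, links=(0.3, 0.25), limits=(-2.5, 2.5), damping=0.1):
    """One damped-least-squares IK step for a planar 2-link arm, with the
    result clamped to joint limits as a stand-in for richer constraints."""
    l1, l2 = links
    c1, s1 = np.cos(q[0]), np.sin(q[0])
    c12, s12 = np.cos(q[0] + q[1]), np.sin(q[0] + q[1])
    ee = np.array([l1 * c1 + l2 * c12, l1 * s1 + l2 * s12])   # end-effector position
    J = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                  [ l1 * c1 + l2 * c12,  l2 * c12]])          # analytic Jacobian
    err = target - ee
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return np.clip(q + dq, *limits)                            # joint-limit clamp

q = np.array([0.3, 0.4])                  # initial joint angles (rad)
target = np.array([0.35, 0.25])           # reachable Cartesian target (m)
for _ in range(200):                      # iterate until the error vanishes
    q = dls_ik_step(q, target)
```

The damping term keeps the step well-behaved near singularities, which matters when noisy pose estimates place targets at the edge of the workspace.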

After solving IK, the motion trajectory is further optimized to track the original motion while ensuring physical feasibility and trajectory smoothness.
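One simplified way to picture this post-IK refinement is per-step velocity clamping followed by smoothing. The sketch below assumes these two heuristics in place of the paper's actual trajectory optimizer; the function name, time step, and velocity bound are invented for the example.

```python
import numpy as np

def refine_trajectory(q_traj, dt=0.02, v_max=2.0, window=5):
    """Clamp per-step joint velocities to v_max (rad/s), then apply a
    moving-average filter; a stand-in for a proper trajectory optimizer."""
    q = np.asarray(q_traj, dtype=float).copy()
    for t in range(1, len(q)):            # enforce velocity bounds frame to frame
        step = np.clip(q[t] - q[t - 1], -v_max * dt, v_max * dt)
        q[t] = q[t - 1] + step
    pad = window // 2                     # edge padding preserves the endpoints
    qp = np.pad(q, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.stack([np.convolve(qp[:, j], kernel, mode="valid")
                     for j in range(q.shape[1])], axis=1)

traj = np.zeros((50, 2))
traj[25:] = 1.0                           # step change: an infeasible jump
out = refine_trajectory(traj)             # becomes a bounded, smooth ramp
```

Because a moving average of velocity-bounded steps stays velocity-bounded, the smoothing pass cannot reintroduce violations of the clamp that precedes it.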

This flexible and generalizable pipeline enables the generation of expressive, physically-valid motions across a diverse set of humanoid robots, as demonstrated in the accompanying visual materials.

Real Robot Experiments

Robust Robot Motion Retargeting Pipeline: Real-Time Execution

This video illustrates the end-to-end workflow of our robust robot motion retargeting pipeline.

The process begins with capturing human motion using a video camera. The recorded footage is processed by a state-of-the-art 3D pose estimation algorithm that extracts joint-level motion trajectories from the monocular RGB input.

The estimated motion is then passed through the common-rigging module, where skeletal inconsistencies are resolved and physical feasibility is enforced, preparing the motion for robot deployment.

Next, the refined motion is retargeted to a target robot via our flexible pipeline, which accounts for the robot's unique kinematic configuration and physical constraints, such as joint limits and collision boundaries.

Finally, the resulting trajectory is executed in real time on the robot, demonstrating smoothness, stability, and expressiveness that faithfully reflect the original human motion.

This seamless integration, from human video input to real-world robotic execution, highlights the practicality and deployability of our framework in real-time human-robot interaction scenarios.

Dance with AMBIDEX

BibTeX

@inproceedings{jeong2025robust,
  title     = {Robust and Expressive Humanoid Motion Retargeting via Optimization-Based Rig Unification},
  author    = {Jeong, Taemoon and Byun, Taehyun and Kim, Jihoon and Choi, Keunjoon and Oh, Jaesung and Lee, Sungpyo and Darwish, Omar and Kim, Joohyung and Choi, Sungjoon},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2025},
  note      = {Accepted}
}