Humanoid robots are increasingly being developed for seamless interaction with humans in diverse domains, yet generating expressive and physically feasible motions remains a core challenge. We propose a robust and automated pipeline for motion retargeting that enables the generation of natural, stable, and highly expressive motions for a wide variety of humanoid robots from different motion data sources, including noisy pose estimations. To ensure robustness, our approach unifies motions from different kinematic structures into a common canonical rig, systematically refines the motion trajectory to address infeasible poses, enforces foot-contact constraints, and enhances stability. The retargeted motion is then refined to closely follow the source motion while respecting each robot's physical limits. Through extensive experiments on 12 simulated robots and validation on three real robots, we show that our methodology reliably produces expressive upper-body movements with consistent foot contact. This work represents an important step towards automating robust and expressive motion generation for humanoid robots, enabling deployment in various real-world scenarios.
(A) Human motion data is first extracted from motion capture systems or 3D pose estimation methods. These inputs may include high-fidelity MoCap recordings or noisy video-based estimations.
(B) The common-rigging process converts various human skeleton structures into a single unified rig. This step includes pre-rigging (to match proportions and joint constraints) and post-rigging (to correct foot placement, align the center of mass, and eliminate self-collisions).
(C) The refined motion is then retargeted to diverse robots using a direction-vector-based approach. Joint trajectories are optimized through inverse kinematics, enforcing robot-specific constraints such as joint limits and collision avoidance. The final result is a physically feasible robot motion that preserves the expressiveness of the original human movement.
Common-Rigging is a critical step that unifies motion data from different skeleton structures into a single, standardized rig. In the Pre-rigging stage, various human skeletons are retargeted to a predefined common rig using inverse kinematics. This rig incorporates a rigid body structure and physical properties such as mass and moment of inertia, which help handle self-collisions and refine noisy poses. The Post-rigging stage uses these physical properties to refine the motion further—enforcing foot-ground contact, aligning the center of mass (COM), and smoothing out physically implausible artifacts. As a result, the final trajectory becomes both physically feasible and robot-executable.
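One of the post-rigging refinements above is enforcing foot-ground contact. A minimal sketch of the idea, assuming a per-frame foot-height signal (the function name and the contact threshold are illustrative, not the authors' implementation):

```python
# Hedged sketch of a foot-contact correction pass: frames whose foot
# height is within a small tolerance of the ground are snapped to zero,
# removing both floating feet and ground penetration.
CONTACT_EPS = 0.02  # metres; illustrative threshold, not from the paper

def enforce_foot_contact(foot_heights, eps=CONTACT_EPS):
    """Snap near-ground foot heights to exactly zero."""
    return [0.0 if h < eps else h for h in foot_heights]

# A noisy trajectory: small float/penetration early, a genuine lift later.
trajectory = [0.015, -0.004, 0.010, 0.120, 0.300]
print(enforce_foot_contact(trajectory))  # near-ground frames snap to 0.0
```

In practice this kind of correction would be combined with the COM alignment and smoothing steps described above, so that snapping a foot does not introduce discontinuities elsewhere in the body.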
Our motion retargeting pipeline adapts the unified motion from the common rig to various robot platforms with different kinematic structures and physical constraints.
The process begins by identifying joints of interest (JOI) for each robot, which correspond to key joints in the common rig. A direction-vector-based approach is then used to compute the robot's target pose by scaling directional vectors according to each robot's link lengths.
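The direction-vector idea above can be sketched for a single kinematic chain: keep the *directions* of the human's links but replace their lengths with the robot's own. This is a minimal illustration under that assumption; the function and variable names are hypothetical, not the paper's code:

```python
import math

def unit(v):
    """Normalize a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def retarget_chain(human_joints, robot_link_lengths, root):
    """Rebuild a chain of target positions for the robot: each human link's
    direction is preserved, but its length is replaced by the robot's
    corresponding link length, accumulating from the robot's root joint."""
    positions = [root]
    for (a, b), length in zip(zip(human_joints, human_joints[1:]),
                              robot_link_lengths):
        d = unit(tuple(bi - ai for ai, bi in zip(a, b)))
        positions.append(tuple(p + length * di
                               for p, di in zip(positions[-1], d)))
    return positions

# Human arm: shoulder -> elbow -> wrist; robot has shorter links.
human_arm = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.3, -0.25, 0.0)]
robot_arm = retarget_chain(human_arm, [0.2, 0.2], root=(0.0, 0.0, 0.0))
```

The resulting target positions keep the human pose's shape while matching the robot's proportions, which is what makes the subsequent IK step well-posed.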
The resulting target pose is used as input to an inverse kinematics (IK) solver, which computes joint angles that respect robot-specific constraints such as joint limits, velocity bounds, and self-collision avoidance.
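To make the IK step concrete, here is a deliberately tiny example: closed-form IK for a planar 2-link arm, with the solution clamped to per-joint limits afterwards. This is a crude stand-in for the constrained solver described above (the real pipeline handles full 3D chains, velocity bounds, and collision avoidance); all names are illustrative:

```python
import math

def two_link_ik(x, y, l1, l2, limits):
    """Closed-form IK for a planar 2-link arm reaching (x, y), with the
    resulting joint angles clamped to per-joint (lo, hi) limits."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # guard against unreachable targets
    t2 = math.acos(c2)            # elbow angle
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    (lo1, hi1), (lo2, hi2) = limits
    return (max(lo1, min(hi1, t1)), max(lo2, min(hi2, t2)))

# Reach (1, 1) with unit links and generous joint limits.
t1, t2 = two_link_ik(1.0, 1.0, 1.0, 1.0, [(-math.pi, math.pi)] * 2)
```

When a limit binds, the clamped pose no longer hits the target exactly; a full solver instead folds the limits into the optimization itself, trading off tracking accuracy against feasibility.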
After solving IK, the motion trajectory is further optimized to track the original motion while ensuring physical feasibility and trajectory smoothness.
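A simple instance of the smoothness post-processing mentioned above is bounding the per-frame change of each joint angle. This sketch (hypothetical names, not the authors' optimizer) clamps the step between consecutive frames:

```python
def limit_joint_velocity(traj, max_step):
    """Post-process a single joint's angle trajectory so consecutive
    frames never differ by more than max_step (rad/frame)."""
    out = [traj[0]]
    for q in traj[1:]:
        prev = out[-1]
        dq = max(-max_step, min(max_step, q - prev))  # clamp the step
        out.append(prev + dq)
    return out

# A sudden 1.0 rad jump gets spread over several frames.
smoothed = limit_joint_velocity([0.0, 1.0, 1.0], max_step=0.3)
```

The actual pipeline jointly optimizes tracking error and feasibility over the whole trajectory rather than filtering each joint independently, but the effect on isolated spikes is similar.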
This flexible and generalizable pipeline enables the generation of expressive, physically valid motions across a diverse set of humanoid robots, as demonstrated in the accompanying visual materials.
To evaluate the effectiveness and practical applicability of our motion retargeting pipeline, we conducted real-world experiments on three humanoid platforms: AMBIDEX, THORMANG, and JF2.
These robots possess distinct kinematic structures and physical characteristics, such as different joint limits, link geometries, and dynamic properties, thereby validating the generalizability of our method.
The retargeted motions—generated from both high-quality MoCap data and noisy 3D pose estimations from RGB videos—were successfully deployed without additional tuning.
Our experiments confirmed that the proposed approach can produce smooth, stable, and expressive upper-body motions that closely follow the original human movement, even under foot-fixed constraints.
This video illustrates the end-to-end workflow of our robust robot motion retargeting pipeline.
The process begins with capturing human motion using a video camera. The recorded footage is processed by a state-of-the-art 3D pose estimation algorithm, which extracts joint-level motion trajectories from monocular RGB input.
The estimated motion is then passed through the common-rigging module, where skeletal inconsistencies are resolved and physical feasibility is enforced, preparing the motion for robot deployment.
Next, the refined motion is retargeted to a target robot via our flexible pipeline, which accounts for the robot's unique kinematic configuration and physical constraints, such as joint limits and collision boundaries.
Finally, the resulting trajectory is executed in real time on the robot, demonstrating smoothness, stability, and expressiveness that faithfully reflect the original human motion.
This seamless integration—from human video input to real-world robotic execution—highlights the practicality and deployability of our framework in real-time human-robot interaction scenarios.
@inproceedings{jeong2025robust,
title = {Robust and Expressive Humanoid Motion Retargeting via Optimization-Based Rig Unification},
author = {Jeong, Taemoon and Byun, Taehyun and Kim, Jihoon and Choi, Keunjoon and Oh, Jaesung and Lee, Sungpyo and Darwish, Omar and Kim, Joohyung and Choi, Sungjoon},
booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2025},
note = {Accepted}
}