We implement four representative teleoperation pipelines (monocular vision, MoCap, VR, and exoskeleton) under a unified, modular interface.
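To make the modularity concrete, the minimal Python sketch below shows one way such a unified interface could look; the class and method names are illustrative assumptions rather than the actual implementation.

```python
# Hypothetical unified interface shared by all four pipelines
# (names and signatures are illustrative assumptions, not the actual codebase).
from abc import ABC, abstractmethod
from dataclasses import dataclass
import numpy as np


@dataclass
class UpperBodyTargets:
    """Wrist targets expressed in the robot (pelvis) frame as 4x4 transforms."""
    left_wrist: np.ndarray
    right_wrist: np.ndarray


class TeleopPipeline(ABC):
    """Back ends (vision, MoCap, VR, exoskeleton) expose the same two calls."""

    @abstractmethod
    def get_upper_body_targets(self) -> UpperBodyTargets:
        """Latest wrist targets for arm IK (or direct joint mapping)."""

    @abstractmethod
    def get_hand_joint_angles(self) -> dict:
        """Per-hand finger-joint commands, e.g. {"left": q_left, "right": q_right}."""
```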
1. Vision-based
The system first performs a one-time calibration in a neutral T-pose, using SMPL to derive the operator's body shape β and link-scale factors s that align human and robot kinematics. During operation, SMPLer-X streams the operator's upper-body pose, which is rescaled by s and solved with Pink IK for arm-and-wrist motion, while MediaPipe hand keypoints refined by Dex-Retargeting drive precise finger control. Decoupling limb and hand estimation yields robust, real-time teleoperation from pure vision input.
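As a rough illustration of the calibration and rescaling steps, the sketch below computes per-link scale factors from T-pose joint positions and re-grows the streamed keypoints accordingly; joint names, the chain layout, and the dict-of-arrays convention are assumptions, and the resulting wrist target would then be passed to the IK solver.

```python
# Minimal sketch of T-pose link-scale calibration and per-frame rescaling
# (joint names and chain layout are illustrative assumptions).
import numpy as np

# Upper-body chain as (child, parent) pairs, ordered root -> leaves.
CHAIN = [("r_shoulder", "chest"), ("r_elbow", "r_shoulder"), ("r_wrist", "r_elbow")]


def link_scales(human_tpose, robot_tpose, chain=CHAIN):
    """Per-link scale factors s = robot link length / human link length."""
    s = {}
    for child, parent in chain:
        human_len = np.linalg.norm(human_tpose[child] - human_tpose[parent])
        robot_len = np.linalg.norm(robot_tpose[child] - robot_tpose[parent])
        s[(child, parent)] = robot_len / max(human_len, 1e-6)
    return s


def rescale_keypoints(human_joints, s, chain=CHAIN):
    """Re-grow the chain from the root so each link matches the robot's length."""
    scaled = dict(human_joints)
    for child, parent in chain:
        link = human_joints[child] - human_joints[parent]
        scaled[child] = scaled[parent] + s[(child, parent)] * link
    return scaled
```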
2. MoCap-based
The inertial-MoCap pipeline uses an Xsens MVN suit (23 IMUs) plus Manus Metagloves. After a one-time
calibration, the MVN stream provides the 6-DoF pose of 23 body segments, while each glove outputs 20
finger-joint DoFs. Raw limb data are first transformed from the MVN's global frame to the robot frame
(pelvis origin, forward +X, vertical +Z).
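A minimal sketch of this frame change is shown below, assuming 4x4 homogeneous poses from the MVN stream; the fixed re-axing transform is a placeholder that would depend on the calibrated MVN world frame.

```python
# Sketch of re-expressing an MVN segment pose in the pelvis-centred robot frame
# (+X forward, +Z up); T_fix is a placeholder axis re-alignment.
import numpy as np


def to_robot_frame(T_world_segment, T_world_pelvis, T_fix=np.eye(4)):
    """Return the segment pose relative to the pelvis origin of the robot frame."""
    return T_fix @ np.linalg.inv(T_world_pelvis) @ T_world_segment
```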
A joint-specific, real-time rescaling module then compensates for
human-robot link-length mismatches before Closed-Loop Inverse Kinematics (CLIK) solves the robot's
arm-and-wrist poses. For the hands, the glove's MCP, PIP, DIP, and abduction/adduction angles are mapped directly, subject to the dexterous hand's joint limits, yielding accurate, low-latency replication of both limb and finger motions.
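The finger mapping reduces to the small sketch below, which simply clamps the glove angles to the hand's joint limits; the joint ordering and limit values are placeholder assumptions.

```python
# Sketch of the direct glove-to-hand joint mapping with limit clamping
# (joint ordering and limits are placeholder assumptions).
import numpy as np

HAND_LOWER = np.zeros(20)                   # per-joint lower limits (rad)
HAND_UPPER = np.full(20, np.deg2rad(90.0))  # per-joint upper limits (rad)


def map_glove_to_hand(glove_angles):
    """Clamp MCP/PIP/DIP and abduction/adduction angles (rad) to the hand's limits."""
    return np.clip(glove_angles, HAND_LOWER, HAND_UPPER)
```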
3. VR-based
The VR-based teleoperation system includes two main components:
1. Upper-body Limb Motion Control
For upper-body limb motion control, an Apple Vision Pro tracks the operator's hands, wrists, and head in the OpenXR coordinate system. Wrist and head poses are first transformed into the robot's coordinate frame, and the wrist offset relative to the head is then converted into an offset relative to the pelvis. Only the wrist translation is fed to an IK algorithm based on Pink, which computes all degrees of freedom except the finger joints.
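The sketch below illustrates the head-to-pelvis re-anchoring, assuming 4x4 poses already expressed in the robot's axis convention; the fixed pelvis-to-head offset is a placeholder.

```python
# Sketch of converting a wrist pose into a pelvis-relative IK target
# (the pelvis-to-head transform below is a placeholder assumption).
import numpy as np

T_PELVIS_HEAD = np.eye(4)
T_PELVIS_HEAD[:3, 3] = [0.0, 0.0, 0.6]  # assumed head height above the pelvis (m)


def wrist_target_in_pelvis(T_frame_head, T_frame_wrist, T_pelvis_head=T_PELVIS_HEAD):
    """Head-relative wrist offset re-anchored at the pelvis; only the translation
    is fed to the Pink-based IK, per the pipeline above."""
    T_head_wrist = np.linalg.inv(T_frame_head) @ T_frame_wrist
    T_pelvis_wrist = T_pelvis_head @ T_head_wrist
    return T_pelvis_wrist[:3, 3]
```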
2. Hand Control
To support manual dexterity across different teleoperators, the distal-phalanx lengths of each operator's fingers are measured and scaled proportionally to match the corresponding robot finger segments. Vector-based optimizers, following the OpenTelevision approach, then generate robot-hand joint commands within AnyTeleop's dexterous-retargeting framework.
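A minimal sketch of the proportional scaling step is given below; the length values are placeholders, and the scaled wrist-to-fingertip vectors would then be handed to the vector-based retargeting optimizer.

```python
# Sketch of per-finger proportional scaling before vector-based retargeting
# (the length values are placeholder assumptions).
import numpy as np

# Measured operator distal-phalanx lengths vs. robot finger-segment lengths (m).
HUMAN_DISTAL = np.array([0.025, 0.027, 0.028, 0.026, 0.024])
ROBOT_DISTAL = np.array([0.030, 0.032, 0.032, 0.030, 0.028])
SCALE = ROBOT_DISTAL / HUMAN_DISTAL  # one factor per finger


def scale_fingertip_vectors(fingertip_vecs):
    """Scale (5, 3) wrist-to-fingertip vectors per finger before passing them
    to the vector-based optimizer."""
    return fingertip_vecs * SCALE[:, None]
```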
4. Exoskeleton-based
This exoskeleton-based teleoperation framework creates isomorphic exoskeleton systems customized to replicate a target humanoid's upper-body kinematics, based on HOMIE principles. Servo-driven joints keep operator and robot movements synchronized in real time, and integrated motion-sensing gloves with Hall-effect sensors provide 15-DoF per-hand tracking. By mapping operator kinematics directly to the humanoid's joints, this method bypasses inverse-kinematics (IK) approximations, avoiding the associated algorithmic errors and improving operational bandwidth and positional accuracy (see the sketch after the platform list below).
1. Isomorphic Exoskeleton for Unitree G1.
2. Isomorphic Exoskeleton for Fourier GR-1.
3. Isomorphic Exoskeleton for Unitree H1-2.
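As referenced above, the sketch below illustrates the direct joint mapping shared by these isomorphic exoskeletons: encoder and Hall-sensor readings are clamped to the robot's joint limits and sent as commands, with no IK in the loop; joint counts and limit values are placeholder assumptions.

```python
# Sketch of direct exoskeleton-to-robot joint mapping (no IK); joint counts
# and limits are placeholder assumptions.
import numpy as np

ARM_LOWER = np.full(14, -np.pi)     # 7 DoF per arm, lower limits (rad)
ARM_UPPER = np.full(14, np.pi)      # 7 DoF per arm, upper limits (rad)
HAND_LOWER = np.zeros(30)           # 15 DoF per hand, lower limits (rad)
HAND_UPPER = np.full(30, np.pi / 2) # 15 DoF per hand, upper limits (rad)


def exo_to_robot(exo_arm_angles, glove_hand_angles):
    """One-to-one mapping from exoskeleton encoders and Hall-sensor gloves to
    robot joint commands, clamped to joint limits."""
    q_arms = np.clip(exo_arm_angles, ARM_LOWER, ARM_UPPER)
    q_hands = np.clip(glove_hand_angles, HAND_LOWER, HAND_UPPER)
    return np.concatenate([q_arms, q_hands])
```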