Technical Report Project Page

SynManDex: From Human Pre-Grasps to Executable Dexterous Robot Grasps

A synthetic data-generation pipeline that uses human pre-grasps as semantic seeds for robot-native grasp grounding, trajectory validation, and policy-data construction.

Yanming Shao, Zanxin Chen, Wenwei Lin, Mingjie Zhou, Tianxing Chen, Xiaokang Yang, Yichen Chi*, Yao Mu*
* Corresponding authors.

Shanghai AI Lab · SJTU · Shenzhen University · Fudan University · University of Hong Kong · ZTE Corporation

SynManDex teaser showing human-like bimanual dexterous grasping scenes with XHand.
Project teaser. SynManDex grounds human-functional grasp priors into robot-native bimanual dexterous grasps.

Abstract

Synthetic human priors for robot dexterous grasping.

SynManDex decouples semantic grasp intent from robot execution constraints. An object-conditioned diffusion model samples MANO pre-grasps, GeoRT-calibrated retargeting transfers them to the robot embodiment, and force-closure optimization plus dynamic rollout validation produces demonstrations for downstream policy learning.

312 objects across 25 categories
86.4% force-closure after refinement
4.67/5 combined human-likeness audit
25/30 three-object hardware trials

Method Overview

The pipeline uses human priors as proposals, not as final robot grasps.

The central design is to preserve human-functional intent while letting the target robot embodiment decide physically valid contact. This avoids directly copying MANO contacts that are unreachable, penetrating, or unstable for the robot hand.

The figure below is intentionally large because it is the main map for the project page: every later result section corresponds to one stage of this pipeline.

Overview of the SynManDex pipeline from human-prior synthesis to robot physical grounding and policy learning.
Overview of SynManDex. A diffusion model produces MANO pre-grasp proposals; retargeting and robot-native optimization ground the proposals into valid XHand grasps; trajectory generation turns accepted keyframes into policy demonstrations.
01

Human-prior synthesis

Object-conditioned MANO samples encode approach direction, wrist orientation, and coarse finger coordination.

02

Robot-native grounding

Retargeted seeds are refined with collision, penetration, contact, and force-closure objectives on the target hand.

03

Executable demonstrations

Accepted keyframes are rolled out through approach, pre-grasp, grasp, and lift phases for policy training.
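
To make the force-closure objective in the grounding stage more concrete, the sketch below checks a deliberately simplified case: two point contacts with Coulomb friction in the plane. It uses the classical antipodal condition (the line joining the contacts must lie strictly inside both friction cones). This is a toy stand-in, not the SynManDex optimizer, which works on a multi-finger hand in 3D; the function name, arguments, and friction coefficient are all assumptions made for illustration.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def _angle_between(a: Vec2, b: Vec2) -> float:
    """Unsigned angle in radians between two non-zero 2D vectors."""
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(*a) * math.hypot(*b)
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def planar_antipodal_force_closure(
    p1: Vec2, n1: Vec2, p2: Vec2, n2: Vec2, mu: float
) -> bool:
    """Toy planar check (hypothetical helper, not the paper's objective).

    p1, p2: contact positions; n1, n2: inward-pointing surface normals;
    mu: Coulomb friction coefficient. Returns True when the segment joining
    the contacts lies strictly inside both friction cones.
    """
    if mu <= 0.0:
        return False  # two frictionless point contacts cannot give force closure
    line = (p2[0] - p1[0], p2[1] - p1[1])
    if math.hypot(*line) == 0.0 or math.hypot(*n1) == 0.0 or math.hypot(*n2) == 0.0:
        return False  # degenerate input: coincident contacts or zero normal
    half_angle = math.atan(mu)  # half-opening angle of each friction cone
    back = (-line[0], -line[1])
    return (
        _angle_between(line, n1) < half_angle
        and _angle_between(back, n2) < half_angle
    )
```

Two contacts that face each other across an object pass the check, while contacts offset far enough that the joining line leaves a friction cone fail it. A real multi-contact 3D formulation needs a wrench-space test instead.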

Experiment Studies

Each study asks what the human prior controls.

The experiments are organized as a sequence of focused studies rather than an object gallery. The first two figures ask whether a single prior can preserve a functional bimanual intent and whether changing the prior changes the unimanual approach direction.

The later figures separate stability from diversity: bottle grasps test repeated side-prior grounding, and flute grasping tests fine-grained playing-function priors. Baseline failures are placed in the same figures as the successful SynManDex results, so each study reads as a direct comparison.

Bimanual functional consistency from one human prior.

Camera and binoculars are medium-size bimanual objects where the grasp must preserve task semantics, not merely make contact. Given one photo-taking or viewing prior, SynManDex produces multiple physically grounded XHand samples that keep the two-hand functional arrangement stable.

Two-row comparison for camera and binoculars showing baseline bimanual failures beside successful SynManDex samples.
Functional consistency. Each row pairs a baseline failure with three SynManDex samples for the same object. A single human prior per object (camera or binoculars) preserves hand roles and object-use semantics while contact details vary.

Unimanual diversity from different approach priors.

Alarm clock and wine glass examples isolate diversity: the object is fixed, but the human-prior direction changes. The resulting grasps approach the same object from distinct sides and orientations while preserving a plausible human-like contact pattern.

Alarm clock and wine glass examples showing five unimanual approach-prior variants per object.
Directional diversity. Five priors per object produce distinct unimanual approach modes for alarm clock and wine glass, showing that the human prior controls approach direction rather than forcing one canonical robot grasp.

Direction stability for bottle side grasps.

The water-bottle study keeps the side-grasp prior fixed and checks whether repeated grounding remains stable. This matters because generic grasp generators often drift toward a gripper-like wrap that closes around the bottle but does not resemble a human side grasp.

Bottle comparison showing a BODex gripper-like wrap beside SynManDex side-prior samples.
Direction stability. The BODex-style result forms a gripper-like wrap, while eight SynManDex samples generated from the same bottle side prior remain side-oriented and human-like.

Fine-grained flute-holding priors preserve functional pose structure.

For flute, simple two-sided holding is not enough: the useful pose must preserve stable support while allowing local finger release. SynManDex uses the human prior as the semantic anchor, then grounds the robot hands without reducing the result to a generic bimanual support grasp.

Flute comparison showing a bimanual holding baseline beside four SynManDex flute-holding release modalities.
Flute-holding release grasps. The baseline holds the flute but does not preserve the task-motivated hand-object configuration. SynManDex preserves structured release modalities: all pressed, left-hand release, right-hand release, and cross-hand release.

Appendix taxonomy for flute fingering variants.

The appendix figure groups the prepared flute variants by release count and hand location. This makes the story explicit: changing a local release state should preserve the stable flute-holding pose while varying which fingers leave the instrument.

Finger-release taxonomy of flute-holding grasp poses grouped by release count and hand location.
Flute release taxonomy preview. The appendix taxonomy is structured around a canonical support grasp, release count, release location, and representative generated poses.

Trajectory Grounding

From optimized keyframes to executable rollouts.

The trajectory figure is kept at full text width because it is the clearest evidence that SynManDex is not only generating static hand poses. Each row shows one object-conditioned rollout with the goal pose followed by a dynamic sequence.

Only trajectories that pass the lift condition enter the imitation dataset, so the figure connects the visual result to the training data used by the policy.
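
The lift-based filter can be sketched as follows. The page does not state the actual lift condition, so the record layout (`Rollout`), the 10 cm height threshold, and the five-step hold window are all hypothetical; only the idea that failing rollouts are excluded from the imitation dataset comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """Hypothetical per-rollout log; field names are assumptions."""
    object_heights: List[float]  # object height per simulation step, in metres
    in_contact: List[bool]       # whether the hand touches the object per step


def passes_lift(rollout: Rollout, min_lift: float = 0.10, hold_steps: int = 5) -> bool:
    """Accept a rollout only if the object ends lifted and held.

    The object must sit at least `min_lift` above its starting height, with
    contact maintained, for each of the final `hold_steps` steps.
    """
    heights, contact = rollout.object_heights, rollout.in_contact
    if len(heights) != len(contact) or len(heights) < hold_steps:
        return False
    start = heights[0]
    lifted = all(h - start >= min_lift for h in heights[-hold_steps:])
    held = all(contact[-hold_steps:])
    return lifted and held


def filter_for_dataset(rollouts: List[Rollout]) -> List[Rollout]:
    """Keep only rollouts that satisfy the (assumed) lift condition."""
    return [r for r in rollouts if passes_lift(r)]
```

A rollout that lifts the object and then drops it is rejected because the check looks at the final steps rather than the peak height.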

Trajectory grounding examples for piggy bank, rose, duck, cylinder, and donut objects.
Trajectory grounding. Rows show optimized goal poses and rollout phases used to validate each demonstration before it enters the policy dataset.

Generated Data

Diverse object-conditioned demonstrations.

The dataset figure is the bridge between the method and the policy experiments. It shows that the generated demonstrations cover varied object geometry, contact style, and bimanual configurations rather than one repeated grasp family.

The supporting figures are kept full-width as separate stories so they can be inspected at page scale: one for in-grasp manipulation and one for bimanual prehensile grasping.

Gallery of generated bimanual demonstrations across diverse objects.
Generated demonstration gallery. Object classes and bimanual contact configurations vary across the dataset rather than repeating one grasp template.
In-grasp manipulation sequences generated by SynManDex.
In-grasp manipulation. Possession is maintained through object reconfiguration, connecting static grasp synthesis to dynamic control data.
Diverse bimanual prehensile grasps generated by SynManDex.
Bimanual prehensile grasps. Object-specific contact choices appear for tools, cameras, and binoculars, supporting the functional bimanual study above.

Real Robot

Zero-shot hardware validation on a bimanual UR5e-XHand system.

The simulation-trained point-cloud policy is evaluated on the same observation and control interface used during data generation. A trial succeeds only when the system establishes contact, lifts the object, and maintains stable possession through the terminal state.
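
The success criterion above is a conjunction of three conditions, which the following sketch records per trial and aggregates into a rate. The `TrialOutcome` type and its field names are hypothetical and do not describe the project's evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrialOutcome:
    """Hypothetical record of one hardware trial."""
    made_contact: bool  # the system established contact with the object
    lifted: bool        # the object left the support surface
    held_to_end: bool   # stable possession through the terminal state

    @property
    def success(self) -> bool:
        # A trial counts only when all three conditions hold.
        return self.made_contact and self.lifted and self.held_to_end


def success_rate(outcomes: Sequence[TrialOutcome]) -> float:
    """Fraction of successful trials; raises on an empty sequence."""
    if not outcomes:
        raise ValueError("no trials recorded")
    return sum(o.success for o in outcomes) / len(outcomes)
```

Under this definition a trial that lifts the object but loses it before the terminal state counts as a failure. The 25/30 figure reported on this page corresponds to a rate of about 83.3%.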

Real bimanual UR5e and XHand hardware validation platform with calibrated cameras.
Hardware validation setup. The real platform uses two UR5e arms, XHand dexterous hands, and calibrated camera observations.
Real robot rollout strips for vase, apple, and spray bottle trials.
Real-world rollouts. Successful vase, apple, and spray-bottle trials show stable contact, lift, and terminal possession.
Extended real-world keyframes for camera lifting, pick-handover-put, and pouring trials.
Extended functional trials. Camera lifting uses an additional toy camera outside the three-object benchmark, pick-handover-put aligns real execution with the simulated transfer protocol, and pouring tests stable possession through a tilted terminal state.
Camera lifting. The policy lifts an additional toy camera outside the three-object benchmark.
Pick-handover-put. The real rollout follows the simulated transfer stages from pickup through terminal placement.
Pouring. The grasp remains stable as the object reaches a tilted functional pose.

Resources

Release placeholders for code, tasks, data, and contact.

These entries reserve stable locations for the final release artifacts. They are written as compact research artifacts rather than promotional cards.

Code: Generate

Generation pipeline

Placeholder command for producing human-prior seeds, robot-grounded grasps, and trajectory demonstrations.

python tools/generate_synmandex.py \
  --object examples/mug.obj \
  --hand xhand \
  --num-seeds 240 \
  --out runs/demo_generation

Code: Task Sample

Task specification

Placeholder schema for object-centric manipulation built on validated grasp keyframes.

task:
  object: spray_bottle
  primitive: lift_and_place
  active_hand: right
  support_hand: left
  validation:
    - force_closure
    - ik
    - lift
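
One way to sanity-check a task specification of this shape before use is sketched below. It operates on an already-parsed dictionary mirroring the placeholder schema above, so no YAML library is assumed. The required fields, the set of known validation checks, and the rule that the two hands must differ are inferred from the placeholder and are assumptions, not a released format.

```python
from typing import Any, Dict, List

# Values taken from the placeholder schema; treating them as exhaustive is an assumption.
REQUIRED_FIELDS = ("object", "primitive", "active_hand", "support_hand", "validation")
HANDS = ("left", "right")
KNOWN_CHECKS = ("force_closure", "ik", "lift")


def task_spec_errors(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed task spec (empty if none)."""
    task = spec.get("task")
    if not isinstance(task, dict):
        return ["missing top-level 'task' mapping"]

    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in task]

    for key in ("active_hand", "support_hand"):
        if key in task and task[key] not in HANDS:
            errors.append(f"{key} must be one of {HANDS}")

    # Assumed rule: the active and support roles go to different hands.
    if task.get("active_hand") in HANDS and task.get("active_hand") == task.get("support_hand"):
        errors.append("active_hand and support_hand must differ")

    checks = task.get("validation", [])
    if not isinstance(checks, list):
        errors.append("validation must be a list")
    else:
        errors.extend(
            f"unknown validation check: {check}"
            for check in checks
            if check not in KNOWN_CHECKS
        )
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a spec at once.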

Dataset Sample

Demonstration record

Placeholder structure for one policy-training example with validation and provenance metadata.

sample_000127/
  object_mesh.obj
  pointcloud.npz
  trajectory.h5
  validation.json
  provenance.json
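
A loader could verify that a sample directory is complete before reading it. The sketch below assumes the five file names in the placeholder layout above are all required; the helper name `missing_files` is hypothetical, and the files' internal formats are not inspected.

```python
from pathlib import Path
from typing import List, Union

# File names copied from the placeholder layout; treating all as required is an assumption.
REQUIRED_FILES = (
    "object_mesh.obj",
    "pointcloud.npz",
    "trajectory.h5",
    "validation.json",
    "provenance.json",
)


def missing_files(sample_dir: Union[str, Path]) -> List[str]:
    """List the required files that are absent from one sample directory."""
    root = Path(sample_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty result means the directory has the expected layout. A directory that does not exist yields all five names.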

Email

Project contact

Placeholder contact for dataset access, implementation questions, and release notices.

release:
  code: pending
  dataset: pending
  contact: email

Citation

BibTeX

Use the current draft citation until the arXiv or conference metadata is finalized.

@article{shao2026synmandex,
  title   = {SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps},
  author  = {Shao, Yanming and Chen, Zanxin and Lin, Wenwei and Zhou, Mingjie and Chen, Tianxing and Yang, Xiaokang and Chi, Yichen and Mu, Yao},
  journal = {arXiv preprint},
  year    = {2026}
}