Technical Report Project Page
A synthetic data-generation pipeline that uses human pre-grasps as semantic seeds for robot-native grasp grounding, trajectory validation, and policy-data construction.
Shanghai AI Lab · SJTU · Shenzhen University · Fudan University · University of Hong Kong · ZTE Corporation
Abstract
SynManDex decouples semantic grasp intent from robot execution constraints. An object-conditioned diffusion model samples MANO pre-grasps, GeoRT-calibrated retargeting transfers them to the robot embodiment, and force-closure optimization plus dynamic rollout validation produces demonstrations for downstream policy learning.
Method Overview
The central design is to preserve human-functional intent while letting the target robot embodiment decide physically valid contact. This avoids directly copying MANO contacts that are unreachable, penetrating, or unstable for the robot hand.
The overview figure is the map for the rest of the project page: every later result section corresponds to one stage of this pipeline.
Object-conditioned MANO samples encode approach direction, wrist orientation, and coarse finger coordination.
Retargeted seeds are refined with collision, penetration, contact, and force-closure objectives on the target hand.
Accepted keyframes are rolled out through approach, pre-grasp, grasp, and lift phases for policy training.
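The refinement stage described above combines several objectives on the target hand. The sketch below is one minimal way such a combination could look, not the authors' implementation. It assumes a simplified force-closure surrogate (the squared net wrench when every contact pushes with unit force along its inward normal) and a plain weighted sum of terms. The function names, term names, and weights are all hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def force_closure_residual(points, normals):
    """Squared norm of the net wrench for unit forces along inward normals.

    A value near zero means the unit contact forces can balance each other,
    which is a relaxed proxy for force closure (assumed here, not taken
    from the source).
    """
    force = [0.0, 0.0, 0.0]
    torque = [0.0, 0.0, 0.0]
    for p, n in zip(points, normals):
        t = cross(p, n)
        for k in range(3):
            force[k] += n[k]
            torque[k] += t[k]
    return sum(v * v for v in force) + sum(v * v for v in torque)


def refinement_energy(terms, weights):
    """Weighted sum of objective terms; lower is better."""
    return sum(weights[name] * value for name, value in terms.items())


# Two opposing contacts on the x-axis balance each other.
balanced = force_closure_residual(
    points=[(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    normals=[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
)

# Two contacts pushing from the same side cannot balance.
one_sided = force_closure_residual(
    points=[(1.0, 0.0, 0.0), (1.0, 0.5, 0.0)],
    normals=[(-1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
)

# Hypothetical term values and weights for one candidate grasp.
total = refinement_energy(
    terms={"collision": 0.0, "penetration": 0.5,
           "contact": 0.25, "force_closure": one_sided},
    weights={"collision": 1.0, "penetration": 2.0,
             "contact": 1.0, "force_closure": 0.5},
)
```

In a real optimizer the contact points and normals would depend on the hand joint configuration, so the residual would be minimized jointly with the other terms rather than evaluated on fixed contacts.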
Experiment Studies
The experiments are organized as a sequence of focused studies rather than an object gallery. The first two figures ask whether a single prior can preserve a functional bimanual intent, and whether changing the prior changes the approach direction of a unimanual grasp.
The later figures separate stability from diversity: bottle grasps test repeated side-prior grounding, and flute grasping tests fine-grained playing-function priors. Baseline failures are placed inside the same figures as the successful SynManDex results, so each story is read as a direct comparison.
Camera and binoculars are medium-size bimanual objects where the grasp must preserve task semantics, not merely make contact. Given one photo-taking or viewing prior, SynManDex produces multiple physically grounded XHand samples that keep the two-hand functional arrangement stable.
Alarm clock and wine glass examples isolate diversity: the object is fixed, but the human-prior direction changes. The resulting grasps approach the same object from distinct sides and orientations while preserving a plausible human-like contact pattern.
The water-bottle study keeps the side-grasp prior fixed and checks whether repeated grounding remains stable. This matters because generic grasp generators often drift toward a gripper-like wrap that closes around the bottle but does not resemble a human side grasp.
For flute, simple two-sided holding is not enough: the useful pose must preserve stable support while allowing local finger release. SynManDex uses the human prior as the semantic anchor, then grounds the robot hands without reducing the result to a generic bimanual support grasp.
The appendix figure groups the prepared flute variants by release count and hand location. This makes the story explicit: changing a local release state should preserve the stable flute-holding pose while varying which fingers leave the instrument.
Trajectory Grounding
The trajectory figure is the clearest evidence that SynManDex generates more than static hand poses. Each row shows one object-conditioned rollout: the goal pose followed by the dynamic sequence that reaches it.
Only trajectories that pass the lift condition enter the imitation dataset, so the figure connects the visual result to the training data used by the policy.
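The filtering rule above can be illustrated with a short sketch. The source does not define the lift condition, so the version below assumes one plausible form: the object must stay at least `min_rise` metres above its starting height for the last `hold_steps` steps of the rollout. The `Rollout` type, both thresholds, and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    name: str
    object_heights: List[float]  # object height per simulation step, in metres


def passes_lift(rollout, min_rise=0.10, hold_steps=5):
    """Assumed lift condition: the object stays lifted over the final steps."""
    heights = rollout.object_heights
    if len(heights) <= hold_steps:
        return False  # too short to judge a sustained lift
    start = heights[0]
    return all(h - start >= min_rise for h in heights[-hold_steps:])


def build_dataset(rollouts, min_rise=0.10, hold_steps=5):
    """Keep only the rollouts that pass the lift condition."""
    return [r for r in rollouts if passes_lift(r, min_rise, hold_steps)]


rollouts = [
    # Lifted and held until the end.
    Rollout("mug_lift", [0.0, 0.0, 0.03, 0.08, 0.12, 0.15, 0.15, 0.15, 0.15]),
    # Lifted briefly, then dropped.
    Rollout("mug_drop", [0.0, 0.04, 0.12, 0.15, 0.11, 0.06, 0.02, 0.0, 0.0]),
    # Too few steps to evaluate.
    Rollout("mug_short", [0.0, 0.2]),
]

kept = build_dataset(rollouts)
```

Checking the final steps rather than the peak height rejects rollouts where the object is lifted and then slips, which would otherwise enter the training data as successful demonstrations.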
Generated Data
The dataset figure is the bridge between the method and the policy experiments. It shows that the generated demonstrations cover varied object geometry, contact style, and bimanual configurations rather than one repeated grasp family.
Two supporting figures are presented separately so each can be inspected at page scale: one for in-grasp manipulation and one for bimanual prehensile grasping.
Real Robot
The simulation-trained point-cloud policy is evaluated on the same observation and control interface used during data generation. A trial succeeds only when the system establishes contact, lifts the object, and maintains stable possession through the terminal state.
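The success criterion above is a conjunction of three conditions, which the following sketch makes explicit. The `TrialOutcome` record and its field names are hypothetical, and the example trials are made-up placeholders rather than reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialOutcome:
    made_contact: bool   # the hand established contact with the object
    lifted: bool         # the object left the support surface
    held_at_end: bool    # the object was still held at the terminal state

    @property
    def success(self):
        # All three conditions must hold; a lift followed by a drop fails.
        return self.made_contact and self.lifted and self.held_at_end


def success_rate(trials):
    """Fraction of trials that satisfy every success condition."""
    if not trials:
        raise ValueError("no trials recorded")
    return sum(t.success for t in trials) / len(trials)


# Placeholder outcomes for illustration only.
trials = [
    TrialOutcome(True, True, True),
    TrialOutcome(True, True, False),   # dropped before the terminal state
    TrialOutcome(True, False, False),  # contact without lift
    TrialOutcome(True, True, True),
]

rate = success_rate(trials)
```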
Resources
These entries reserve stable locations for the final release artifacts. They are written as compact research artifacts rather than promotional cards.
Placeholder command for producing human-prior seeds, robot-grounded grasps, and trajectory demonstrations.
python tools/generate_synmandex.py \
  --object examples/mug.obj \
  --hand xhand \
  --num-seeds 240 \
  --out runs/demo_generation
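Since the command above is a placeholder, the script's real interface is not yet known. As a rough sketch, a parser accepting exactly those four flags could be written with `argparse` as follows. The help strings, the requirement that every flag be present, and the positive-integer check are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def positive_int(text):
    """Parse a strictly positive integer (assumed constraint on seed count)."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def build_parser():
    parser = argparse.ArgumentParser(
        description="Hypothetical front end mirroring the placeholder command."
    )
    parser.add_argument("--object", type=Path, required=True,
                        help="path to the object mesh")
    parser.add_argument("--hand", required=True,
                        help="target robot hand, e.g. xhand")
    parser.add_argument("--num-seeds", type=positive_int, required=True,
                        help="number of human-prior seeds to sample")
    parser.add_argument("--out", type=Path, required=True,
                        help="output directory for generated data")
    return parser


# Parse the same arguments as the placeholder command.
args = build_parser().parse_args([
    "--object", "examples/mug.obj",
    "--hand", "xhand",
    "--num-seeds", "240",
    "--out", "runs/demo_generation",
])
```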
Placeholder schema for object-centric manipulation built on validated grasp keyframes.
task:
  object: spray_bottle
  primitive: lift_and_place
  active_hand: right
  support_hand: left
  validation:
    - force_closure
    - ik
    - lift
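A loader for this placeholder schema would likely want a sanity check before running a task. The sketch below works on a plain dictionary so it needs no YAML library. It assumes all fields, including `validation`, nest under `task`, and that the three checks shown are the only known ones. Both assumptions, and the `check_task` helper itself, are hypothetical.

```python
# Validation checks named in the placeholder schema (assumed to be exhaustive).
KNOWN_CHECKS = {"force_closure", "ik", "lift"}
REQUIRED_FIELDS = ("object", "primitive", "active_hand",
                   "support_hand", "validation")


def check_task(config):
    """Return a list of problems; an empty list means the config looks consistent."""
    task = config.get("task")
    if not isinstance(task, dict):
        return ["missing 'task' mapping"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in task]
    for check in task.get("validation", []):
        if check not in KNOWN_CHECKS:
            problems.append(f"unknown validation check: {check}")
    return problems


# Dictionary form of the placeholder schema.
example = {
    "task": {
        "object": "spray_bottle",
        "primitive": "lift_and_place",
        "active_hand": "right",
        "support_hand": "left",
        "validation": ["force_closure", "ik", "lift"],
    }
}

# A broken variant: one field missing and one unrecognized check.
broken = {
    "task": {
        "object": "spray_bottle",
        "primitive": "lift_and_place",
        "active_hand": "right",
        "validation": ["lift", "grasp_quality"],
    }
}

ok_problems = check_task(example)
broken_problems = check_task(broken)
```

Returning a list of problems instead of raising on the first one lets a batch job report every issue in a config at once.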
Placeholder structure for one policy-training example with validation and provenance metadata.
sample_000127/
  object_mesh.obj
  pointcloud.npz
  trajectory.h5
  validation.json
  provenance.json
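Before training, it can help to confirm that each sample directory contains every file in the layout above. The sketch below checks only for file presence, not file contents, and assumes the five file names shown are all required. The `missing_files` helper is hypothetical, and the demo builds a throwaway directory instead of touching real data.

```python
import tempfile
from pathlib import Path

# File names taken from the placeholder layout (assumed to all be required).
EXPECTED_FILES = (
    "object_mesh.obj",
    "pointcloud.npz",
    "trajectory.h5",
    "validation.json",
    "provenance.json",
)


def missing_files(sample_dir):
    """List the expected files that are absent from a sample directory."""
    root = Path(sample_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo on a temporary directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_000127"
    sample.mkdir()
    for name in EXPECTED_FILES[:-1]:
        (sample / name).touch()
    incomplete = missing_files(sample)   # provenance.json not yet created

    (sample / "provenance.json").touch()
    complete = missing_files(sample)     # all files now present
```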
Placeholder contact for dataset access, implementation questions, and release notices.
synmandex-contact@example.com
release:
  code: pending
  dataset: pending
  contact: email
Citation
Use the current draft citation until the arXiv or conference metadata is finalized.
@article{shao2026synmandex,
title = {SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps},
author = {Shao, Yanming and Chen, Zanxin and Lin, Wenwei and Zhou, Mingjie and Chen, Tianxing and Yang, Xiaokang and Chi, Yichen and Mu, Yao},
journal = {arXiv preprint},
year = {2026}
}