SynPolDex — Synergizing Fingers via Bi-Level Policy Learning

SynPolDex starts from a stricter question than generic dexterous RL: which fingers should lead, which fingers should support, and how should that assignment change across tasks, objects, and robot hands?

Figure: SynPolDex teaser showing multi-task and multi-hand dexterous manipulation. The core idea is to split dexterity into two levels: a finger-use plan at the top, and a synergy-guided control policy at the bottom.

Problem framing

Multi-finger manipulation often fails for two different reasons. First, the same task demands different coordination patterns under different hand embodiments: Allegro, Inspire, and Leap do not share the same kinematics or thumb geometry. Second, standard end-to-end RL tends to smear credit over every joint, even when only a subset of fingers should dominate the action.

SynPolDex responds by making finger roles explicit. The method asks the policy to decide not only how to move, but also how the fingers should divide labor.

Bi-level policy

At the high level, a vision-language model reads the task scene and produces a finger-use plan:

S = \pi_{\mathrm{HL}}(Q_v, \tau, \eta),

where $Q_v$ denotes visual task cues, $\tau$ is task information, and $\eta$ summarizes hand-specific or synergy-related priors. The output plan $S = [s_1, s_2, \dots, s_M]$ specifies which fingers should lead, stabilize, or stay secondary.
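
One way to picture the plan $S$ is as one role per finger, mapped to a numeric weight. The sketch below is illustrative only: the `Role` labels come from the prose (lead, stabilize, secondary), but the `FingerPlan` class, the finger names, and the numeric values in `ROLE_WEIGHT` are assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Role labels borrowed from the prose; the paper's exact encoding may differ.
    LEAD = "lead"
    STABILIZE = "stabilize"
    SECONDARY = "secondary"


# Assumed numeric weight per role. These values are placeholders for illustration.
ROLE_WEIGHT = {Role.LEAD: 1.0, Role.STABILIZE: 0.5, Role.SECONDARY: 0.1}


@dataclass(frozen=True)
class FingerPlan:
    """Hypothetical container for a finger-use plan S with M entries."""

    fingers: tuple  # finger names, length M
    roles: tuple    # one Role per finger, length M

    def __post_init__(self):
        if len(self.fingers) != len(self.roles):
            raise ValueError("need exactly one role per finger")

    def weights(self):
        """Return S = [s_1, ..., s_M] as a list of floats."""
        return [ROLE_WEIGHT[role] for role in self.roles]


plan = FingerPlan(
    fingers=("thumb", "index", "middle", "ring"),
    roles=(Role.LEAD, Role.LEAD, Role.STABILIZE, Role.SECONDARY),
)
print(plan.weights())  # [1.0, 1.0, 0.5, 0.1]
```

Keeping the roles symbolic and converting to weights only at the last step means the same plan object can be read by a person and consumed by a controller.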

At the low level, the controller conditions on that plan:

a_t \sim \pi_{\phi}(a_t \mid o_t, S).
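
Conditioning on the plan can be as simple as feeding $S$ to the policy alongside the observation. The sketch below uses a linear-Gaussian policy as a stand-in; the actual architecture of $\pi_{\phi}$ is not described here, and `policy_input`, `sample_action`, and all numeric values are hypothetical.

```python
import math
import random


def policy_input(obs, plan_weights):
    """Concatenate the observation o_t with the plan S so the policy sees both."""
    return list(obs) + list(plan_weights)


def sample_action(obs, plan_weights, weight_rows, log_std, rng):
    """Sample a_t from a diagonal Gaussian whose mean is linear in [o_t, S].

    This is an assumed toy parameterization, not the paper's network.
    """
    x = policy_input(obs, plan_weights)
    for row in weight_rows:
        if len(row) != len(x):
            raise ValueError("each weight row must match len(obs) + len(plan)")
    means = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    std = math.exp(log_std)
    return [rng.gauss(mean, std) for mean in means]


rng = random.Random(0)                  # seeded for repeatability
obs = [0.2, -0.1]                       # toy observation o_t
plan = [1.0, 0.5, 0.1]                  # toy plan weights S
weight_rows = [[0.1] * 5, [-0.2] * 5]   # two action dimensions
action = sample_action(obs, plan, weight_rows, log_std=-1.0, rng=rng)
print(len(action))  # 2
```

Because $S$ is part of the input, changing the plan shifts the action distribution even when the observation is unchanged.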

A compact way to summarize the low-level mechanism is:

\tilde r_t = \sum_{k=1}^{M} s_k \, r_t^{(k)}, \qquad \tilde A_t = P(S)\, A_t,

so reward reshaping and advantage assignment concentrate gradient on the fingers that matter for the chosen role allocation. Here $r_t^{(k)}$ is the reward term paired with plan entry $s_k$, and $P(S)$ is a plan-dependent weighting applied to the advantage $A_t$. This is the clean part of the design: the high level tells the policy which fingers should be important, and the low level changes credit assignment accordingly.
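
The two formulas above can be sketched in a few lines. The weighted reward sum follows the equation directly. The form of $P(S)$ is not given in this summary, so `plan_scaling` below assumes one plausible choice (plan weights normalized to mean 1, applied per finger); all function names are hypothetical.

```python
def reshape_reward(plan_weights, finger_rewards):
    """Weighted sum of per-finger reward terms: sum_k s_k * r_t^(k)."""
    if len(plan_weights) != len(finger_rewards):
        raise ValueError("need one reward term per plan entry")
    return sum(s * r for s, r in zip(plan_weights, finger_rewards))


def plan_scaling(plan_weights):
    """Assumed form of P(S): plan weights rescaled so that their mean is 1."""
    mean = sum(plan_weights) / len(plan_weights)
    if mean == 0:
        raise ValueError("plan must give nonzero weight to at least one finger")
    return [s / mean for s in plan_weights]


def weight_advantage(plan_weights, advantage):
    """Spread a scalar advantage A_t over fingers in proportion to the plan."""
    return [p * advantage for p in plan_scaling(plan_weights)]


plan = [1.0, 0.5, 0.0]       # toy plan: lead, support, unused
rewards = [2.0, 4.0, 10.0]   # toy per-finger reward terms

print(reshape_reward(plan, rewards))   # 4.0
print(weight_advantage(plan, 3.0))     # [6.0, 3.0, 0.0]
```

In this toy case the third finger earns a large raw reward but contributes nothing to the reshaped signal, because the plan gives it zero weight.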

Figure: SynPolDex system figure with high-level planning and low-level policy learning. The pipeline is explicitly split into multi-turn finger planning and synergy-guided low-level policy learning.

Figure: SynPolDex qualitative examples on multiple objects and hands. The same principle transfers across containers, faucets, buckets, and bottles rather than only a single benchmark object.

Why the decomposition helps

This bi-level split is useful for three separate reasons:

it gives a typed interface between language-level planning and continuous control;
it improves credit assignment by avoiding all-fingers-are-equal updates;
it makes cross-embodiment generalization easier, because the plan can adapt before the controller does.
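
The "typed interface" point can be made concrete by validating the planner's output before it reaches the controller. The sketch below assumes the planner returns a JSON object mapping finger names to roles; that output format, the `parse_plan` helper, and the finger lists are all hypothetical and are not described in the paper summary.

```python
import json

ALLOWED_ROLES = {"lead", "stabilize", "secondary"}

# Illustrative finger sets for two generic embodiments (not real hand specs).
HAND_FINGERS = {
    "four_finger_hand": ("thumb", "index", "middle", "ring"),
    "five_finger_hand": ("thumb", "index", "middle", "ring", "pinky"),
}


def parse_plan(raw_json, hand_fingers):
    """Validate planner output against one hand and return roles in finger order."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object of finger -> role")
    if set(data) != set(hand_fingers):
        raise ValueError("plan fingers do not match this hand")
    for finger, role in data.items():
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role {role!r} for finger {finger!r}")
    return [data[finger] for finger in hand_fingers]


raw = '{"thumb": "lead", "index": "lead", "middle": "stabilize", "ring": "secondary"}'
print(parse_plan(raw, HAND_FINGERS["four_finger_hand"]))
# ['lead', 'lead', 'stabilize', 'secondary']
```

The same plan is rejected for `five_finger_hand` because it says nothing about the fifth finger, which is the kind of mismatch an explicit interface catches before any control step runs.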

In other words, SynPolDex is not only a stronger policy. It is a cleaner way to say where symbolic structure ends and where motor learning begins.

Results and generalization

SynPolDex is evaluated on the DexArt-m benchmark across four manipulation tasks and three hands:

Task	Hand	Success Rate
Toilet lid	Allegro	90%
Faucet	Allegro	82%
Faucet	Leap	76%
Laptop lid	Allegro	58%
Bucket	Allegro	56%
Toilet lid	Inspire	52%
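
For a quick reader-side summary of the table above, the rows can be averaged per hand. The means below are simple arithmetic over the six listed rows, not figures reported by the paper, and the hands are not evaluated on the same set of tasks, so the per-hand numbers are not directly comparable. The helper name `mean_by_hand` is hypothetical.

```python
from collections import defaultdict

# (task, hand, success rate in percent), copied from the table above.
RESULTS = [
    ("Toilet lid", "Allegro", 90),
    ("Faucet", "Allegro", 82),
    ("Faucet", "Leap", 76),
    ("Laptop lid", "Allegro", 58),
    ("Bucket", "Allegro", 56),
    ("Toilet lid", "Inspire", 52),
]


def mean_by_hand(results):
    """Average the listed success rates for each hand."""
    buckets = defaultdict(list)
    for _task, hand, rate in results:
        buckets[hand].append(rate)
    return {hand: sum(rates) / len(rates) for hand, rates in buckets.items()}


per_hand = mean_by_hand(RESULTS)
overall = sum(rate for _task, _hand, rate in RESULTS) / len(RESULTS)

print(per_hand)  # {'Allegro': 71.5, 'Leap': 76.0, 'Inspire': 52.0}
print(overall)   # 69.0
```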

The broader result reported in the paper is that SynPolDex outperforms the PPO baseline by 26% in task success and by a factor of 2.6 in human-likeness evaluation. The second number matters: the method not only completes tasks more often, but does so with finger usage that is easier to interpret.

Figure: SynPolDex result panels across different objects and embodiments. The result panel highlights two kinds of generalization: new object shapes and new poses within the same task family.

Takeaway

The paper makes a narrow but important claim: dexterous manipulation improves when finger roles are treated as explicit latent structure rather than hidden side effects of PPO. That claim is modest enough to test, strong enough to transfer across hands, and concrete enough to be useful in future embodied-agent systems.

Venue

Published at the 2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids).

Authors: Yanming Shao, Yan Ding*, Chenxi Xiao*