ResearchAct2Goal

Act2Goal :From World Model to General Goal-conditioned Policy

Jan 1, 2026

System overview of Act2Goal

System overview of Act2Goal

Generalization

Real-world scenarios that are difficult to specify precisely via language and require fine-grained control for successful execution.

In-Domain

  • Write

    The robot writes the English word shown in the goal image on a whiteboard using a marker pen.

  • Dessert

    The robot performs dessert plating with reference to a given goal image.

  • Plug-In

    Following the goal image's guidance, the robot picks up bearing workpieces and inserts them into small holes one by one.

Out of Domain

  • Write

    The robot writes words unseen during training on a whiteboard using a marker pen.

  • Dessert

    The robot performs dessert plating with reference to a given goal image, using a plate and dessert types unseen in the training data.

  • Plug-In

    The robot completes a plug-in task unseen in training: inserting a cylindrical drink bottle into a cup holder.

Real-World Tasks Results

Model/Task Whiteboard Word Writing Dessert Plating Plug-In Operation
ID DP-GC 0.00 0.10 0.00
π0.5-GC 0.23 0.18 0.00
HyperGoalNet 0.00 0.08 0.00
Act2Goal 0.93 0.75 0.45
OOD DP-GC 0.00 0.00 0.00
π0.5-GC 0.20 0.05 0.00
HyperGoalNet 0.00 0.00 0.00
Act2Goal 0.90 0.48 0.30

Simulated Benchmarks

Four tasks from the RoboTwin 2.0 benchmark with both easy (no noise) and hard (with noise) environment settings.

Model/Task Move Can Pick Bottles Place Cup Place Shoe
Easy DP-GC 0.18 0.04 0.03 0.04
π0.5-GC 0.54 0.13 0.16 0.30
HyperGoalNet 0.11 0.08 0.08 0.01
Act2Goal 0.62 0.80 0.64 0.52
Hard DP-GC 0.00 0.00 0.00 0.00
π0.5-GC 0.42 0.06 0.04 0.06
HyperGoalNet 0.00 0.00 0.00 0.00
Act2Goal 0.13 0.43 0.13 0.15

Autonomous Improvement

Rapid improvement of success rate by Online-training.

Real-World Examples

  • Drawing Unseen Pattern 1

    The robot learns to draw a cross (unseen pattern in training) through 27-min of online training.

  • Drawing Unseen Pattern 2

    The robot learns to draw a paper clip (unseen pattern in training) through 20-min of online training.

  • Plug bottle into cup holder

    The robot learns to accurately place beverages into cup holders through 17-min of online training.

Simulated Benchmarks

Online-training effectiveness test in the Robotwin Simulator.
Figure on the left shows success rates improvement for all four challenging scenarios.
Figure in the middle shows success rates improvement for different replay buffer settings.
Video on the right demonstrates the progress of the robot's movements during online simulation learning.

(A: Move Can Pot, B: Pick Bottles, C: Place Cup, D: Place Shoe)

Online training results comparison

Online training results comparison

Ablation Study for Multi-Scale Temporal Hashing

The MSTH-based prediction strategy significantly outperforms the widely-used fixed-horizon action chunking approach in goal-conditioned tasks, particularly for long-horizon scenarios.

Model Short Medium Long
ID w/o MSTH 0.95 0.35 0.10
w/ MSTH 0.95 0.90 0.90
OOD w/o MSTH 0.60 0.20 0.00
w/ MSTH 0.93 0.90 0.88

Goal-conditioned Video generation via MSTH

The video is divided into two parts: a short-horizon proximal segment with densely and uniformly sampled future states for fine-grained dynamics, and a long-horizon distal segment with logarithmically spaced samples from the remaining path to the goal, enabling coarser, goal-directed planning.

Contact Us

If you are interested in collaborating, please reach out. research@agibot.com

We are also hiring. Join Us