PRIMAL

Physically Reactive and Interactive Motor Model for Avatar Learning

ICCV 2025


Yan Zhang1, Yao Feng1,3, Alpár Cseke1, Nitin Saini1, Nathan Bajandas1, Nicolas Heron1, Michael J. Black2

1 Meshcapade, 2 Max Planck Institute for Intelligent Systems, Tübingen, 3 Stanford University



We formulate the motor system of an interactive avatar as a generative motion model that can drive the body to move through 3D space in a perpetual, realistic, controllable, and responsive manner. Although human motion generation has been extensively studied, many existing methods lack the responsiveness and realism of real human movements. Inspired by recent advances in foundation models, we propose PRIMAL, which is learned with a two-stage paradigm. In the pretraining stage, the model learns body movements from a large number of sub-second motion segments, providing a generative foundation from which more complex motions are built. This training is fully unsupervised and requires no annotations. Given a single-frame initial state during inference, the pretrained model not only generates unbounded, realistic, and controllable motion, but also enables the avatar to respond to induced impulses in real time. In the adaptation stage, we employ a novel ControlNet-like adaptor to fine-tune the base model efficiently, adapting it to new tasks such as few-shot personalized action generation and spatial target reaching. Evaluations show that our proposed method outperforms state-of-the-art baselines. We leverage the model to create a real-time character animation system in Unreal Engine that feels highly responsive and natural.
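To make this rollout behaviour concrete, below is a minimal sketch of chaining sub-second segments autoregressively from a single-frame state, with an optional impulse perturbation mid-rollout. The model class, state dimension, and 15-frame segment length (0.5 s at an assumed 30 fps) are illustrative assumptions, not the released PRIMAL implementation.

import torch
import torch.nn as nn
from typing import Optional

STATE_DIM = 64      # assumed size of a single-frame body state (pose + velocities)
SEG_LEN = 15        # 0.5 s per segment at an assumed 30 fps

class TinyMotorModel(nn.Module):
    """Placeholder for the pretrained diffusion-based motor model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(STATE_DIM, SEG_LEN * STATE_DIM)

    @torch.no_grad()
    def generate_segment(self, state: torch.Tensor) -> torch.Tensor:
        # The real model runs iterative denoising conditioned on `state`;
        # a single linear map stands in here just to make the loop concrete.
        return self.net(state).view(SEG_LEN, STATE_DIM)

def rollout(model: nn.Module, init_state: torch.Tensor, n_segments: int,
            impulse: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Chain 0.5 s segments: the last frame of one segment seeds the next."""
    frames, state = [], init_state
    for i in range(n_segments):
        if impulse is not None and i == n_segments // 2:
            state = state + impulse   # external perturbation; the model reacts from here
        segment = model.generate_segment(state)
        frames.append(segment)
        state = segment[-1]           # single-frame state for the next segment
    return torch.cat(frames, dim=0)

motion = rollout(TinyMotorModel(), torch.zeros(STATE_DIM), n_segments=8)
print(motion.shape)                   # torch.Size([120, 64]): 4 seconds at 30 fps

In this sketch the impulse is just an additive state perturbation; in the full system, induced physical impulses and classifier-based guidance play this role.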

Demo Videos


Few-shot fine-tuning and animation

PRIMAL supports few-shot action personalization. Given a few seconds of video, we fine-tune the base model on the motion captured from it (via Mocapade). Motion realism and responsiveness are preserved, while the avatar's motion style is personalized. In this demo, we retarget the generated motion to the Unitree G1 character in Unreal Engine; a sketch of the underlying adaptor fine-tuning idea follows the clips below.

Walk

Run

Poke while limping
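The sketch below illustrates the ControlNet-like adaptation idea behind this few-shot personalization: the pretrained base stays frozen, and a small trainable branch with a zero-initialized output layer adds a residual on top, so fine-tuning starts from the unchanged base behaviour. All module names, dimensions, and the training objective are assumptions for illustration, not the released PRIMAL code.

import torch
import torch.nn as nn

class Adaptor(nn.Module):
    """Trainable branch added on top of a frozen base layer, ControlNet-style."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())
        self.zero_out = nn.Linear(dim, dim)
        nn.init.zeros_(self.zero_out.weight)   # zero init: fine-tuning starts from
        nn.init.zeros_(self.zero_out.bias)     # the unchanged base behaviour

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.zero_out(self.branch(h))

base = nn.Linear(256, 256)                     # stand-in for one frozen base layer
for p in base.parameters():
    p.requires_grad_(False)

adaptor = Adaptor()
opt = torch.optim.AdamW(adaptor.parameters(), lr=1e-4)

for step in range(100):                        # few-shot loop over a handful of clips
    feats = torch.randn(16, 256)               # stand-in features from seconds of mocap
    loss = (adaptor(base(feats)) - feats).pow(2).mean()   # placeholder objective
    opt.zero_grad(); loss.backward(); opt.step()

The zero-initialized output layer is what keeps the base model's realism intact at the start of fine-tuning, so a few seconds of data only need to shift the motion style.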


Text-to-motion generation in real time

PRIMAL also supports real-time text-to-motion generation. By fine-tuning the base model with text annotations, we can control the avatar's actions via text prompts typed in the terminal. The following demo runs on a MacBook Pro. To ensure real-time performance on a single Apple M3 Pro chip, we use a set of deliberately suboptimal hyper-parameters; the model performs better on a more powerful machine.
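As a rough illustration (not the actual demo code; the function name, step count, and timings are hypothetical), such a terminal loop boils down to reading a prompt and generating the next motion segment within the frame budget, trading denoising steps for latency:

import time

def generate_from_text(prompt: str, num_denoise_steps: int) -> list:
    """Stand-in for the text-conditioned motor model; returns dummy frames."""
    time.sleep(0.005 * num_denoise_steps)       # pretend cost grows with step count
    return [prompt] * 15                        # one 0.5-second segment of 15 frames

NUM_STEPS = 8                                   # fewer denoising steps: lower latency, some quality loss
while True:
    prompt = input("action> ").strip()
    if not prompt:
        break
    t0 = time.time()
    segment = generate_from_text(prompt, NUM_STEPS)
    print(f"generated {len(segment)} frames in {(time.time() - t0) * 1000:.0f} ms")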


Advantages of the 0.5-second atomic action

The key novelty is the formulation that generates a 0.5-second motion segment given a single initial state. This contrasts with prior work that generates a long future motion conditioned on a window of past motion. The benefits include less overfitting, easier model training, and an avatar that reacts to impulses and to classifier-based guidance.

To better understand the benefits of our formulation, we compare two settings that are identical except for the motion lengths: ours generates 15 frames given 1 frame, while the baseline generates 40 frames given 20 frames. For the baseline, we replace in-context conditioning with cross-attention to handle the multi-frame condition (see the sketch after the clips below). Both models are successfully overfit to a 229-frame ballet sequence, and each reproduces the ballet motion when given its first frame(s).

Ballet motion for training.

Ours, given the first frame.

Baseline, given the first 20 frames.
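For context, the sketch below contrasts the two conditioning schemes: in-context conditioning folds the single condition frame into the same token sequence as the frames being generated, while the baseline's cross-attention keeps the 20 past frames in a separate stream. Tensor shapes and module choices are assumptions for illustration, not the actual training code.

import torch
import torch.nn as nn

D = 256                                           # token dimension (assumed)

# Ours: in-context conditioning; 1 condition frame + 15 generated frames share one sequence.
self_attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
cond = torch.randn(1, 1, D)                       # single-frame initial state
noisy = torch.randn(1, 15, D)                     # 15 frames being denoised
tokens = torch.cat([cond, noisy], dim=1)          # (1, 16, D)
out_incontext, _ = self_attn(tokens, tokens, tokens)

# Baseline: cross-attention conditioning; 20 past frames condition 40 future frames.
cross_attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
past = torch.randn(1, 20, D)                      # 20-frame motion history
future = torch.randn(1, 40, D)                    # 40 frames being denoised
out_cross, _ = cross_attn(future, past, past)     # queries: future; keys/values: past
print(out_incontext.shape, out_cross.shape)       # (1, 16, 256) and (1, 40, 256)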

First, we generate 780 future frames given the final frame(s) of the ballet sequence, and measure foot skating with ASR (lower is better; a generic sketch of this kind of metric follows the clips below). Ours keeps producing ballet stably, whereas the baseline gradually degrades as time progresses.

Ours, ASR = 0.08.

Baseline, ASR = 0.12.
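For reference, a generic foot-skating ratio can be computed as the fraction of frames in which a foot joint is in ground contact yet still slides horizontally. The sketch below uses assumed contact and velocity thresholds and is not necessarily the exact ASR definition used in the paper.

import numpy as np

def skating_ratio(foot_pos: np.ndarray, fps: float = 30.0,
                  height_thresh: float = 0.05, vel_thresh: float = 0.10) -> float:
    """foot_pos: (T, J, 3) foot-joint world positions, z-up, in metres (assumed)."""
    vel = np.linalg.norm(np.diff(foot_pos[:, :, :2], axis=0), axis=-1) * fps  # (T-1, J) horizontal speed, m/s
    contact = foot_pos[1:, :, 2] < height_thresh                              # (T-1, J) near-ground frames
    skating = contact & (vel > vel_thresh)                                    # in contact but still sliding
    return float(skating.any(axis=1).mean())                                  # fraction of frames with skating

# Example on random data; in practice, feed foot-joint trajectories of the generated motion.
print(skating_ratio(np.random.rand(780, 2, 3)))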

Second, we generate 156 frames conditioned on frames from a different walking sequence. Ours produces a fast and natural transition into ballet, whereas the baseline produces severe artifacts.

Ours, ASR = 0.06.

Baseline, ASR = 0.3.

These results indicate that our formulation makes the model generalize better with respect to both motion length and motion semantics.

License and Citation

This software, data, and material, together with the associated license, are for academic research purposes only. They are also available for commercial licensing through Meshcapade.com; for commercial use, please email sales@meshcapade.com. All rights reserved on the videos presented on this page.

   @inproceedings{zhang2025primal,
      title={PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning},
      author={Zhang, Yan and Feng, Yao and Cseke, Alp{\'a}r and Saini, Nitin and Bajandas, Nathan and Heron, Nicolas and Black, Michael J},
      booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
      year={2025}
   }