WildActor: Unconstrained Identity-Preserving Video Generation

Qin Guo, Tianyu Yang, Xuanhua He, Fei Shen, Yong Zhang, Zhuoliang Kang, Xiaoming Wei, Dan Xu

Published: February 28, 2026
Authors: 8
Word Count: 6,810

Actor-18M enables production-ready human video generation with consistent identity across viewpoints and motions.

Abstract

Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints and motions, a setting that remains challenging for existing methods. Prior methods often suffer from face-centric behavior that neglects body-level consistency, or produce copy-paste artifacts where subjects appear rigid due to pose locking. We present Actor-18M, a large-scale human video dataset designed to capture identity consistency under unconstrained viewpoints and environments. Actor-18M comprises 1.6M videos with 18M corresponding human images, covering both arbitrary views and canonical three-view representations. Leveraging Actor-18M, we propose WildActor, a framework for any-view conditioned human video generation. We introduce an Asymmetric Identity-Preserving Attention mechanism coupled with a Viewpoint-Adaptive Monte Carlo Sampling strategy that iteratively re-weights reference conditions by marginal utility for balanced manifold coverage. Evaluated on the proposed Actor-Bench, WildActor consistently preserves body identity under diverse shot compositions, large viewpoint transitions, and substantial motions, surpassing existing methods in these challenging settings.

Key Takeaways

1. WildActor introduces Actor-18M, a 1.6M video dataset with 18M images addressing viewpoint bias in human video generation.
2. The framework uses Asymmetric Identity-Preserving Attention to maintain full-body consistency across dynamic viewpoints and motions.
3. Viewpoint-Adaptive Monte Carlo Sampling re-weights reference images by marginal utility for balanced identity coverage during generation.
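
To make the idea of re-weighting references by marginal utility concrete, here is a minimal sketch. It is not the paper's algorithm, which this summary does not specify. It assumes each reference image carries a single azimuth angle in degrees, and it defines a hypothetical utility: a candidate's weight is its angular distance to the nearest view already selected. The names `angular_distance` and `sample_references` are illustrative only.

```python
import random


def angular_distance(a, b):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def sample_references(azimuths, k, rng=None):
    """Pick up to k reference indices, re-weighting candidates after each draw.

    Assumed utility (hypothetical, not from the paper): a candidate's weight
    is its angular distance to the nearest already-selected view, so views
    that add viewpoint coverage are more likely to be drawn.
    """
    rng = rng or random.Random()
    k = min(k, len(azimuths))
    if k == 0:
        return []

    # The first draw is uniform because nothing has been covered yet.
    selected = [rng.randrange(len(azimuths))]

    while len(selected) < k:
        remaining = [i for i in range(len(azimuths)) if i not in selected]
        # Marginal utility: distance to the closest selected viewpoint.
        weights = [
            min(angular_distance(azimuths[i], azimuths[j]) for j in selected)
            for i in remaining
        ]
        # If every remaining view duplicates a selected one, fall back to uniform.
        if sum(weights) == 0:
            weights = [1.0] * len(remaining)
        selected.append(rng.choices(remaining, weights=weights, k=1)[0])

    return selected


if __name__ == "__main__":
    views = [0.0, 5.0, 10.0, 90.0, 180.0, 270.0]
    print(sample_references(views, 3, random.Random(0)))
```

Because the weights are recomputed after every draw, near-duplicate views of an already-selected angle become unlikely picks, which is one simple way to push the selected set toward balanced viewpoint coverage.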

Limitations

Existing methods suffer from face-centric behavior neglecting body consistency or rigid pose-locking causing copy-paste artifacts.
Prior datasets rely on expensive studio captures lacking scalability to real-world unconstrained environments and diverse viewpoints.

Keywords

human video generation, identity consistency, viewpoint adaptation, asymmetric identity-preserving attention, viewpoint-adaptive monte carlo sampling, manifold coverage, body identity preservation
