4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation

Chaoyang Wang1,* Ashkan Mirzaei1,* Vidit Goel1 Willi Menapace1 Aliaksandr Siarohin1 Avalon Vinella1 Michael Vasilkovsky1 Ivan Skorokhodov1 Vladislav Shakhrai1 Sergey Korolev1 Sergey Tulyakov1 Peter Wonka1,2
1Snap Inc. 2KAUST

4Real-Video-V2 computes a 4D spatio-temporal grid of video frames, together with 3D Gaussian particles for each time step, in a single feedforward pass. Its architecture has two main components: a 4D video diffusion model and a feedforward reconstruction model.
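
For concreteness, here is a minimal sketch of that two-stage pipeline. The function and argument names are illustrative assumptions, not the released interface:

```python
# Minimal sketch of the two-stage pipeline described above; model names and
# call signatures are illustrative assumptions, not the released interface.
def generate_4d_scene(prompt, diffusion_model, reconstruction_model,
                      num_views=8, num_frames=16):
    # Stage 1: the 4D video diffusion model samples a view-by-time grid of
    # frames with shape (num_views, num_frames, C, H, W).
    frame_grid = diffusion_model.sample(prompt, num_views=num_views,
                                        num_frames=num_frames)

    # Stage 2: for each time step, the feedforward reconstruction model lifts
    # the multi-view frames at that time step to 3D Gaussian particles,
    # with no per-scene optimization.
    gaussians_per_step = [reconstruction_model(frame_grid[:, t])
                          for t in range(num_frames)]
    return frame_grid, gaussians_per_step
```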

This represents a major upgrade over 4Real-Video, introducing a new 4D video diffusion model architecture that adds no additional parameters to the base video model. The key to the new design is a sparse attention pattern in which each token attends only to tokens in the same frame, at the same timestamp, or from the same viewpoint. As a result, the design scales easily to large pre-trained video models, is efficient to train, and generalizes well.
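
To illustrate the sparse pattern, the sketch below builds a boolean attention mask over a view-by-time token grid in which a token may attend to others sharing its view or its timestamp (tokens in the same frame share both). This is a simplified assumption of how such a mask could be realized, not the authors' implementation:

```python
import torch

def fused_view_time_mask(num_views: int, num_frames: int,
                         tokens_per_frame: int) -> torch.Tensor:
    """Boolean (N, N) mask for the sparse view-time attention pattern,
    where N = num_views * num_frames * tokens_per_frame. Illustrative only."""
    # Assign each token a (view, time) index; spatial tokens within one frame
    # share both indices.
    view_id = torch.arange(num_views).repeat_interleave(num_frames * tokens_per_frame)
    time_id = torch.arange(num_frames).repeat_interleave(tokens_per_frame).repeat(num_views)

    same_view = view_id[:, None] == view_id[None, :]
    same_time = time_id[:, None] == time_id[None, :]

    # A token attends to another if they share a viewpoint or a timestamp
    # (same-frame pairs satisfy both conditions).
    return same_view | same_time
```

A dense mask like this is only for exposition; in practice the same pattern would be applied as separate attention passes over view-aligned and time-aligned token groups so that no extra parameters are added to the base video model.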

Generate 4D Videos from Text

Videos generated by the 4D video diffusion model, shown as paired fixed-view and frozen-time renderings.

Animating Real 3D Scenes

Videos generated by the 4D video diffusion model from real 3D scenes, shown as paired fixed-view and frozen-time renderings.

Interactive Renderings of Dynamic Gaussians

Animating 3D Assets

Videos generated by the proposed 4D video diffusion model from 3D assets, shown as paired fixed-view and frozen-time renderings.

Comparing with Prior Multi-View Video Generation Methods

Comparing with 4Real-Video on 3D Asset Animation

Side-by-side fixed-view and frozen-time comparisons between our method and 4Real-Video.

Architecture Comparison on Objaverse

Visual comparison of different architectures on sample Objaverse scenes.

Each example compares: Ours, Parallel, Sequential, SV4D.

Acknowledgement

We extend our gratitude to Tuan Duc Ngo, Sherwin Bahmani, Jiahao Luo, Hanwen Liang and Guochen Qian for their valuable assistance with data preparation and model training.