4Real-Video-V2 computes a 4D spatio-temporal grid of video frames, together with 3D Gaussian particles for each time step, using a feed-forward architecture. The architecture has two main components: a 4D video diffusion model and a feed-forward reconstruction model.
This is a major upgrade over 4Real-Video, introducing a new 4D video diffusion model architecture that adds no parameters to the base video model. The key to the new design is a sparse attention pattern in which tokens attend to others in the same frame, at the same timestamp, or from the same viewpoint. This design scales easily to large pre-trained video models, is efficient to train, and generalizes well.
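To make the attention pattern concrete, the sketch below builds such a mask for a toy token grid. This is a minimal illustration, not the released implementation: the view-major token layout, the toy sizes, and the helper name build_sparse_4d_mask are assumptions made for the example.

```python
import torch

def build_sparse_4d_mask(V: int, T: int, S: int) -> torch.Tensor:
    """Boolean mask of shape (N, N), with N = V*T*S tokens.

    Entry (i, j) is True when token i may attend to token j, i.e. when the
    two tokens share a viewpoint, share a timestamp, or (as a special case
    of both) lie in the same frame. Layout assumption: view-major, then
    time, then spatial patches.
    """
    view = torch.arange(V).repeat_interleave(T * S)        # view index per token
    time = torch.arange(T).repeat_interleave(S).repeat(V)  # timestamp per token
    same_view = view[:, None] == view[None, :]
    same_time = time[:, None] == time[None, :]
    return same_view | same_time                           # same frame satisfies both

# Usage: pass the mask to scaled dot-product attention.
V, T, S = 4, 5, 16                    # toy sizes: 4 views, 5 timestamps, 16 patches per frame
mask = build_sparse_4d_mask(V, T, S)  # (320, 320) boolean mask
q = k = v = torch.randn(1, 8, V * T * S, 64)  # (batch, heads, tokens, dim)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because the mask only reshapes which tokens see each other, it can be dropped into a pre-trained video model's attention layers without adding parameters, which is the property the design relies on.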
Explore videos generated by the 4D video diffusion model. Click on a thumbnail to view the corresponding fixed-view and frozen-time video demonstrations.
Fixed View:
Frozen Time:
Visual comparison of different architectures on sample Objaverse scenes: ours vs. 4Real-Video, each shown as fixed-view and frozen-time videos.