Each row contains frames sharing the same timestep, while each column contains frames from the same viewpoint.
Given a casually captured real-world scene, 4Real-Video can transform it into 4D animations driven by text prompts.
 
4Real-Video can also animate 3D assets seamlessly across multiple views.
 
Finally, 4Real-Video can create 4D videos directly from text input.
 
Comparison videos (left to right): 4Real-Video (Ours), MotionCtrl, SV4D.