VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

¹University of Toronto ²Vector Institute ³Snap Inc. ⁴SFU

arXiv 2024


Camera Input

Reference Trajectory Video

Camera-Controlled Generation

A trio of fashionable, beret-clad cats sips coffee at a chic Parisian cafe
An astronaut cooking with a pan and fire in the kitchen
A cat sits at a grand piano, its paws gracefully tapping the keys
A huge dinosaur skeleton is walking in a golden wheat field on a bright sunny day
A man with a skull face in flames walking around Piccadilly circus
Otters with chef hats, skillfully preparing a miniature sushi feast on a lily pad
An astronaut feeding ducks on a sunny afternoon, reflection from the water
Cats engage in a strategic chess match on an ornate board
In a chic urban kitchen, a cat donned in a chef's hat expertly kneads dough on a marble countertop
In a potter's studio, skilled hands mold clay into a delicate sculpture
Melting ice cream dripping down the cone
3 sheep enjoying spaghetti together
A burning volcano
A cat wearing sunglasses and working as a lifeguard at a pool
A cat, dressed in detective attire, examines a magnifying glass over a crime scene made of scattered yarn
A chihuahua in astronaut suit and sunglasses floating in space
A couple enjoys a romantic gondola ride in Venice, Italy
A cute golden hamster throwing punches wearing pair of boxing gloves in a boxing ring
A hamster wearing virtual reality headsets is a dj in a disco
A Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat
A mouse in renaissance clothing eating a cheese slice
A snowman in a Venetian gondola ride
A squirrel eating a burger
Across a sun-soaked desert, two sports cars engage in a high-speed chase

Citation

@article{bahmani2024vd3d,
  author = {Bahmani, Sherwin and Skorokhodov, Ivan and Siarohin, Aliaksandr and Menapace, Willi and Qian, Guocheng and Vasilkovsky, Michael and Lee, Hsin-Ying and Wang, Chaoyang and Zou, Jiaxu and Tagliasacchi, Andrea and Lindell, David B. and Tulyakov, Sergey},
  title = {VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control},
  journal = {arXiv preprint arXiv:2407.12781},
  year = {2024},
}

Website template from DreamFusion and MVDream. We thank the authors for the open-source code.