Snap Inc.¹, University of Trento², UC Merced³, Fondazione Bruno Kessler⁴
*Work performed while interning at Snap Inc.
We devise a hierarchical generation strategy to increase video duration and framerate. To condition the video generator on previously generated frames, we adopt the reconstruction guidance method of "Video Diffusion Models". We define a hierarchy of progressively increasing framerates and first autoregressively generate a video of the desired length at the lowest framerate, conditioning each step on the last generated frame. Then, for each successive framerate in the hierarchy, we autoregressively generate a video of the same length, conditioning the model on all frames that have already been generated at the lower framerates.
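The ordering of this hierarchical strategy can be illustrated with a minimal scheduling sketch. It only plans which frame indices each model call would generate and which already-generated frames it would be conditioned on. It does not implement the diffusion model or reconstruction guidance. The function name `hierarchical_schedule`, the `window` parameter (frames handled per model call), the one-frame overlap between consecutive windows, and the representation of framerates as integer strides on the final frame grid are all assumptions made for illustration, not details taken from the method itself.

```python
def hierarchical_schedule(num_frames, strides, window):
    """Plan a hierarchical, autoregressive generation order (illustrative only).

    num_frames: number of frames at the final (highest) framerate.
    strides:    hypothetical hierarchy, coarse to fine, e.g. [4, 2, 1];
                stride s keeps every s-th frame of the final grid.
    window:     hypothetical number of frame slots handled per model call.

    Returns a list of (stride, conditioning_frames, new_frames) tuples.
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    done = set()   # frame indices generated so far, at any level
    steps = []
    for stride in strides:
        grid = list(range(0, num_frames, stride))  # frames at this framerate
        start = 0
        while True:
            chunk = grid[start:start + window]
            # Frames in the window that already exist become the conditioning.
            cond = [i for i in chunk if i in done]
            new = [i for i in chunk if i not in done]
            if new:
                steps.append((stride, cond, new))
                done.update(new)
            if start + window >= len(grid):
                break
            # Overlap consecutive windows by one frame so that, at the lowest
            # framerate, each call sees the last generated frame.
            start += window - 1
    return steps


# Example: 16 final frames, three levels, 3 frame slots per call.
for stride, cond, new in hierarchical_schedule(16, [4, 2, 1], 3):
    print(f"stride={stride} condition_on={cond} generate={new}")
```

In this sketch, the first call at the lowest framerate has no conditioning, each later call at that level is conditioned on the single last generated frame, and calls at higher framerates are conditioned on the frames inside their window that were produced earlier.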
We show a selection of 32-frame videos sampled at 12 fps.
Hover the cursor over a video to reveal its prompt.