Ziyi Wu1,2,3,
Anil Kag1,
Ivan Skorokhodov1,
Willi Menapace1,
Ashkan Mirzaei1,
Igor Gilitschenski2,3,*,
Sergey Tulyakov1,*,
Aliaksandr Siarohin1,*
1Snap Research
2University of Toronto
3Vector Institute
* Equal Supervision
DenseDPO is a post-training method tailored to video diffusion models. Compared to vanilla DPO, it improves both the construction of paired data and the granularity of preference labels, yielding videos with better visual quality and motion strength while using only one-third of the data.
Caption | Pre-trained Model | VanillaDPO | DenseDPO (Ours) |
---|---|---|---|
A woman doing push-up exercise. | | | |
In a studio, a popping dancer creates precise isolation movements. | | | |
A panda breakdancing in a neon-lit urban alley. | | | |
(a) VanillaDPO compares videos generated from independent random noise and assigns only a single binary preference per pair, which biases annotators toward artifact-free slow-motion videos.
(b) DenseDPO generates structurally similar videos from partially noised real videos and labels dense, segment-level preferences (e.g., for every 1s subclip).
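The two ideas in (b) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a linear blend between a real video and Gaussian noise as the partial-noising step, and a DPO-style logistic loss applied per segment to per-frame denoising errors from the trained model and a frozen reference model. The names `partially_noise`, `segment_means`, and `dense_dpo_loss`, and the parameters `noise_level`, `frames_per_segment`, and `beta`, are hypothetical.

```python
import numpy as np


def partially_noise(video, noise_level, rng):
    """Blend a real video with Gaussian noise.

    Assumption: a linear interpolation stands in for the forward diffusion
    process; noise_level=0 returns the video, noise_level=1 returns pure noise.
    Two samples denoised from the same partially noised video would share
    its coarse structure.
    """
    noise = rng.standard_normal(video.shape)
    return (1.0 - noise_level) * video + noise_level * noise


def segment_means(per_frame, frames_per_segment):
    """Average a per-frame quantity over consecutive fixed-length segments."""
    num_segments = per_frame.shape[0] // frames_per_segment
    trimmed = per_frame[: num_segments * frames_per_segment]
    return trimmed.reshape(num_segments, frames_per_segment).mean(axis=1)


def dense_dpo_loss(err_a, err_b, ref_err_a, ref_err_b, labels,
                   frames_per_segment, beta=1.0):
    """Segment-level DPO-style loss (illustrative sketch).

    err_a, err_b:         per-frame denoising errors of the trained model
    ref_err_a, ref_err_b: per-frame denoising errors of the reference model
    labels:               one entry per segment: +1 if video A is preferred,
                          -1 if video B is preferred, 0 for a tie (ignored)
    """
    # How much the trained model improves over the reference on each video.
    gain_a = segment_means(ref_err_a - err_a, frames_per_segment)
    gain_b = segment_means(ref_err_b - err_b, frames_per_segment)
    # Positive margin means the model favors the preferred segment.
    margin = labels * (gain_a - gain_b)
    # -log(sigmoid(beta * margin)), computed stably.
    losses = np.logaddexp(0.0, -beta * margin)
    mask = labels != 0
    if not mask.any():
        return 0.0
    return float(losses[mask].mean())
```

With a single label per video (one segment spanning all frames), this reduces to one binary preference per pair, as in (a); shorter segments let one video win in some subclips and lose in others.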
We show text prompts and generated videos from the pre-trained, VanillaDPO-aligned, and our DenseDPO-aligned models. For more results and comparisons with SFT and StructuralDPO, see here.
Caption | Pre-trained Model | VanillaDPO | DenseDPO (Ours) |
---|---|---|---|
A young woman dances in the night bustle against the backdrop of a glowing fanfare. | | | |
A young adult male doing a handstand on the beach. | | | |
A weightlifter performs a deadlift with perfect form in a concrete garage gym. | | | |
A man exercising with battle ropes at a gym. | | | |
A woman dancing in a gym. The woman is spinning around repeatedly. | | | |
A monkey performs a jump on a skateboard at the skate park, landing smoothly. | | | |
A giraffe stepping gingerly along a tightrope above a city plaza, drawing gasps from the crowd below. | | | |
Fingers press into a shimmering slime ball. | | | |
Close-up of a sushi chef slicing sashimi with deliberate, smooth movements. | | | |
Water poured into a glass. | | | |
A goat balancing on a large circus ball. | | | |
A bear wobbling slightly as it rides a bicycle down a forest trail, its paws gripping the seat for balance. | | | |
A raccoon rollerblading in a skate park, performing small jumps off the ramps. | | | |
@article{wu2025densedpo,
title={{DenseDPO}: Fine-Grained Temporal Preference Optimization for Video Diffusion Models},
author={Wu, Ziyi and Kag, Anil and Skorokhodov, Ivan and Menapace, Willi and Mirzaei, Ashkan and Gilitschenski, Igor and Tulyakov, Sergey and Siarohin, Aliaksandr},
journal={arXiv},
year={2025}
}
[1] Wallace, Bram, et al. "Diffusion Model Alignment Using Direct Preference Optimization." CVPR. 2024.
[2] Meng, Chenlin, et al. "SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations." ICLR. 2022.