DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

NeurIPS 2025 (Spotlight)

Ziyi Wu¹,²,³, Anil Kag¹, Ivan Skorokhodov¹, Willi Menapace¹, Ashkan Mirzaei¹,
Igor Gilitschenski²,³,*, Sergey Tulyakov¹,*, Aliaksandr Siarohin¹,*
¹Snap Research ²University of Toronto ³Vector Institute
* Equal Supervision

[Paper] [Twitter Thread]

TL;DR:

DenseDPO is a post-training method tailored towards video diffusion models. Compared to vanilla DPO, it improves the paired data construction and the preference label granularity, leading to videos with better visual quality and motion strength using only 1/3 of the data.

Caption Pre-trained Model VanillaDPO DenseDPO (Ours)

A woman doing push-up exercise.

In a studio, a popping dancer creates precise isolation movements.

A panda breakdancing in a neon-lit urban alley.

Method

(a) VanillaDPO compares videos generated from independent random noise and assigns only a single binary preference per pair, which biases annotators toward artifact-free slow-motion videos.
(b) DenseDPO generates structurally similar videos from partially noised real videos, and labels dense segment-level preferences (e.g., for every 1s subclip).
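The two ideas in (b) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear noise blend in `partially_noise`, the tie label `0`, the helper names, and the Diffusion-DPO-style per-frame objective in `dense_dpo_loss` are all hypothetical choices made for the example. The per-frame denoising errors are assumed to be computed elsewhere by the trained model and a frozen reference model.

```python
import numpy as np

def partially_noise(video, noise_level, rng):
    """Blend a real video with Gaussian noise (assumed linear schedule).

    noise_level=0 returns the real video; noise_level=1 returns pure noise.
    Two calls with different noise give two starting points that share the
    real video's coarse structure.
    """
    noise = rng.standard_normal(video.shape)
    return (1.0 - noise_level) * video + noise_level * noise

def expand_segment_labels(segment_labels, frames_per_segment):
    """Repeat each segment label so every frame carries its segment's label."""
    return np.repeat(np.asarray(segment_labels), frames_per_segment)

def dense_dpo_loss(err_a, err_b, ref_err_a, ref_err_b, frame_labels, beta=1.0):
    """Per-frame DPO-style loss, averaged over labeled frames.

    err_a, err_b:         per-frame denoising errors of the trained model, shape (T,)
    ref_err_a, ref_err_b: the same errors under the frozen reference model
    frame_labels:         +1 if video A is preferred, -1 if B, 0 for a tie (ignored)
    """
    err_a, err_b = np.asarray(err_a, float), np.asarray(err_b, float)
    ref_err_a, ref_err_b = np.asarray(ref_err_a, float), np.asarray(ref_err_b, float)
    frame_labels = np.asarray(frame_labels, float)

    # How much more the model improved over the reference on A than on B
    # (negative means A's error dropped more).
    margin = (err_a - ref_err_a) - (err_b - ref_err_b)
    logits = -beta * frame_labels * margin
    per_frame = np.logaddexp(0.0, -logits)  # equals -log(sigmoid(logits))

    mask = frame_labels != 0  # drop tied segments
    if not mask.any():
        return 0.0
    return float(per_frame[mask].mean())

# Usage: 6 frames, 3 segments of 2 frames each; the middle segment is a tie.
rng = np.random.default_rng(0)
real_video = rng.standard_normal((6, 4, 4))
start_a = partially_noise(real_video, 0.5, rng)
start_b = partially_noise(real_video, 0.5, rng)
labels = expand_segment_labels([1, 0, -1], frames_per_segment=2)
```

With a single label for the whole clip, every frame would receive the same sign. The per-segment labels let the loss favor video A in one subclip and video B in another within the same pair.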

Qualitative Results

We show text prompts and videos generated by the pre-trained model, the VanillaDPO-aligned model, and our DenseDPO-aligned model. For more results and comparisons with SFT and StructuralDPO, see here.

The pre-trained model often generates deformed limbs and objects.
VanillaDPO reduces these distortions, but its videos have significantly less motion.
Our DenseDPO model generates high-quality videos with correct object dynamics and realistic details.

Caption Pre-trained Model VanillaDPO DenseDPO (Ours)

A young woman dances in the night bustle against the backdrop of a glowing fanfare.

A young adult male doing a handstand on the beach.

A weightlifter performs a deadlift with perfect form in a concrete garage gym.

A man exercising with battle ropes at a gym.

A woman dancing in a gym. The woman is spinning around repeatedly.

A monkey performs a jump on a skateboard at the skate park, landing smoothly.

A giraffe stepping gingerly along a tightrope above a city plaza, drawing gasps from the crowd below.

Fingers press into a shimmering slime ball.

Close-up of a sushi chef slicing sashimi with deliberate, smooth movements.

Water poured into a glass.

A goat balancing on a large circus ball.

A bear wobbling slightly as it rides a bicycle down a forest trail, its paws gripping the seat for balance.

A raccoon rollerblading in a skate park, performing small jumps off the ramps.

BibTeX

If you find our work useful, please consider citing our paper:


@article{wu2025densedpo,
  title={{DenseDPO}: Fine-Grained Temporal Preference Optimization for Video Diffusion Models},
  author={Wu, Ziyi and Kag, Anil and Skorokhodov, Ivan and Menapace, Willi and Mirzaei, Ashkan and Gilitschenski, Igor and Tulyakov, Sergey and Siarohin, Aliaksandr},
  journal={NeurIPS},
  year={2025}
}
