TL;DR - We propose a new technique for personalizing text-to-video models, enabling them to capture, manipulate and combine Dynamic Concepts.
Unlike Static Concepts, which are defined solely by their appearance, Dynamic Concepts are characterized by both appearance and motion.
We introduce Set and Sequence, a framework that imposes a spatio-temporal weight space on a DiT-based video model, enabling it to capture both the appearance and motion of dynamic concepts.
We decompose a dynamic concept into two components:
- Set – an unordered collection of frames, representing appearance.
- Sequence – a temporally coherent video, capturing motion.
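As a concrete illustration of this split, the snippet below builds the two training views from one clip; the function name and the (T, C, H, W) tensor layout are our own assumptions, not from the paper.

```python
import torch

def decompose_video(video: torch.Tensor):
    """Split one video into the two views described above (layout assumed).

    video: (T, C, H, W) tensor of frames.
    Returns (frame_set, frame_sequence): the same frames, once with the
    temporal order destroyed (appearance only) and once kept intact (motion).
    """
    perm = torch.randperm(video.shape[0])
    frame_set = video[perm]    # unordered set: order carries no signal
    frame_sequence = video     # ordered sequence: motion preserved
    return frame_set, frame_sequence
```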
Our method consists of two key steps (a code sketch follows the list):
1. LoRA Set Encoding – we train a LoRA on the unordered set of frames, focusing solely on learning appearance.
2. LoRA Sequence Encoding – we freeze the LoRA basis learned in the first step and augment its coefficients with those of a LoRA trained on the temporal sequence, capturing motion dynamics.
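To make the two stages concrete, here is a minimal PyTorch sketch of a LoRA layer with a shared basis and stage-specific coefficients. The class name, rank, and initialization are illustrative assumptions; the paper's exact parameterization of the spatio-temporal weight space may differ.

```python
import torch
import torch.nn as nn

class SetAndSequenceLoRA(nn.Module):
    """Illustrative two-stage LoRA layer (names and shapes assumed).

    Stage "set":      train `basis` and `coeff_set` on the unordered
                      frame set (appearance only).
    Stage "sequence": freeze `basis` and `coeff_set`, train `coeff_seq`
                      on the ordered video (motion dynamics).
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained DiT weights stay frozen
        self.basis = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.coeff_set = nn.Parameter(torch.zeros(base.out_features, rank))
        self.coeff_seq = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank residual: coefficients mix the shared basis directions.
        delta = (self.coeff_set + self.coeff_seq) @ self.basis  # (out, in)
        return self.base(x) + x @ delta.T

    def set_stage(self, stage: str) -> None:
        assert stage in ("set", "sequence")
        self.basis.requires_grad_(stage == "set")
        self.coeff_set.requires_grad_(stage == "set")
        self.coeff_seq.requires_grad_(stage == "sequence")
```

Stage 1 calls `set_stage("set")` and optimizes on the shuffled frames; stage 2 calls `set_stage("sequence")`, so only the sequence coefficients are updated and the appearance learned in stage 1 is preserved while motion is added on top.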
We thank Gordon Guocheng Qian and Kuan-Chieh (Jackson) Wang for their feedback and support.
@misc{abdal2025dynamic,
  title={Dynamic Concepts Personalization from Single Videos},
  author={Rameen Abdal and Or Patashnik and Ivan Skorokhodov and Willi Menapace and Aliaksandr Siarohin and Sergey Tulyakov and Daniel Cohen-Or and Kfir Aberman},
  year={2025},
  eprint={2502.14844},
  archivePrefix={arXiv},
  primaryClass={cs.GR}
}