TL;DR - We propose a new technique for personalizing text-to-video models, enabling them to capture, manipulate and combine Dynamic Concepts.
Unlike Static Concepts, which are defined solely by their appearance, Dynamic Concepts are characterized by both appearance and motion.
We introduce Set and Sequence, a framework that imposes a spatio-temporal weight space on a DiT-based video model, enabling it to capture both the appearance and motion of dynamic concepts.
We decompose a dynamic concept into two components:
- Set – an unordered collection of frames, representing appearance.
- Sequence – a temporally coherent video, capturing motion.
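As a concrete illustration of this split, the snippet below builds the two training views from one clip; the function name and the (T, C, H, W) tensor layout are our own assumptions, not from the paper.

```python
import torch

def decompose_video(video: torch.Tensor):
    """Split one video into the two views described above (layout assumed).

    video: (T, C, H, W) tensor of frames.
    Returns (frame_set, frame_sequence): the same frames, once with the
    temporal order destroyed (appearance only) and once kept intact (motion).
    """
    perm = torch.randperm(video.shape[0])
    frame_set = video[perm]    # unordered set: order carries no signal
    frame_sequence = video     # ordered sequence: motion preserved
    return frame_set, frame_sequence
```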
Our method consists of two key steps (a code sketch follows the list):
1. LoRA Set Encoding – we train a LoRA on the unordered set of frames, focusing solely on learning appearance.
2. LoRA Sequence Encoding – we freeze the LoRA basis learned in the first step and augment its coefficients with those of a LoRA trained on the temporal sequence, capturing motion dynamics.
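To make the two stages concrete, here is a minimal PyTorch sketch of a LoRA layer with a shared basis and stage-specific coefficients. The class name, rank, and initialization are illustrative assumptions; the paper's exact parameterization of the spatio-temporal weight space may differ.

```python
import torch
import torch.nn as nn

class SetAndSequenceLoRA(nn.Module):
    """Illustrative two-stage LoRA layer (names and shapes assumed).

    Stage "set":      train `basis` and `coeff_set` on the unordered
                      frame set (appearance only).
    Stage "sequence": freeze `basis` and `coeff_set`, train `coeff_seq`
                      on the ordered video (motion dynamics).
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained DiT weights stay frozen
        self.basis = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.coeff_set = nn.Parameter(torch.zeros(base.out_features, rank))
        self.coeff_seq = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank residual: coefficients mix the shared basis directions.
        delta = (self.coeff_set + self.coeff_seq) @ self.basis  # (out, in)
        return self.base(x) + x @ delta.T

    def set_stage(self, stage: str) -> None:
        assert stage in ("set", "sequence")
        self.basis.requires_grad_(stage == "set")
        self.coeff_set.requires_grad_(stage == "set")
        self.coeff_seq.requires_grad_(stage == "sequence")
```

Stage 1 calls `set_stage("set")` and optimizes on the shuffled frames; stage 2 calls `set_stage("sequence")`, so only the sequence coefficients are updated and the appearance learned in stage 1 is preserved while motion is added on top.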
We thank Gordon Guocheng Qian and Kuan-Chieh (Jackson) Wang for their feedback and support.
@misc{abdal2025dynamic,
  title={Dynamic Concepts Personalization from Single Videos},
  author={Rameen Abdal and Or Patashnik and Ivan Skorokhodov and Willi Menapace and Aliaksandr Siarohin and Sergey Tulyakov and Daniel Cohen-Or and Kfir Aberman},
  year={2025},
  eprint={2502.14844},
  archivePrefix={arXiv},
  primaryClass={cs.GR}
}