TL;DR - A zero-shot, feed-forward method for personalized video generation that enables manipulation and combination of Dynamic Concepts.
Test-time fine-tuning requires training a separate LoRA per video, making it slow and non-generalizable. Our zero-shot personalization of Dynamic Concepts [Abdal et al. 2025] removes this need, capturing identity and motion in a single feed-forward pass and enabling fast, expressive edits such as adding smoke or lights.
To enable feed-forward generation, we use a 2×2 grid layout whose cells hold the input and output concepts. This structure supports both editing and composition in a zero-shot manner with a single shared model.
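The grid layout above can be sketched as a simple tiling of four equal-size video clips into one composite video. This is a minimal sketch, assuming videos are arrays of shape (T, H, W, C); the function name `make_grid` is illustrative and not from the paper.

```python
import numpy as np

def make_grid(cells):
    """Tile four videos [top-left, top-right, bottom-left, bottom-right]
    into a single 2x2 grid video of shape (T, 2H, 2W, C).

    Sketch only: assumes all four clips share the same shape (T, H, W, C).
    """
    tl, tr, bl, br = cells
    top = np.concatenate([tl, tr], axis=2)        # join along width
    bottom = np.concatenate([bl, br], axis=2)
    return np.concatenate([top, bottom], axis=1)  # join along height

# Example: four dummy 8-frame, 64x64 RGB clips
videos = [np.zeros((8, 64, 64, 3)) for _ in range(4)]
grid = make_grid(videos)
print(grid.shape)  # (8, 128, 128, 3)
```

In practice the model would operate on such grids in latent space rather than pixel space, but the spatial arrangement is the same.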
The grids shown here are generated using Dynamic Concepts [Abdal et al. 2025].
Our method consists of three key stages:
Multi-DC LoRA learns a shared, token-conditioned representation of multiple dynamic concepts (appearance + motion) across videos.
Grid LoRA is trained on 2×2 grids generated using Multi-DC LoRA to learn layout-aware composition from scratch via attention masking.
Grid-Fill LoRA uses partially-filled grids from Grid LoRA as input to learn inpainting and editing, enabling zero-shot personalization from limited inputs.
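The attention masking mentioned in the Grid LoRA stage can be illustrated with a toy mask that restricts attention by grid cell. This is a rough sketch of the general idea only; the paper's exact masking scheme is not specified here, and the helper names are hypothetical.

```python
import numpy as np

def cell_ids(h, w):
    """Label each spatial token in an h x w frame with its 2x2 cell id (0-3)."""
    rows = (np.arange(h) >= h // 2).astype(int)   # 0 = top half, 1 = bottom half
    cols = (np.arange(w) >= w // 2).astype(int)   # 0 = left half, 1 = right half
    return (rows[:, None] * 2 + cols[None, :]).reshape(-1)

def cell_attention_mask(h, w):
    """Boolean (h*w, h*w) mask: True where query and key tokens
    lie in the same grid cell, so attention stays layout-aware."""
    ids = cell_ids(h, w)
    return ids[:, None] == ids[None, :]

mask = cell_attention_mask(4, 4)
print(mask.shape)  # (16, 16)
```

A real implementation would relax this mask to let designated output cells attend to the input cells, which is what allows composition across concepts.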
@misc{abdal2025zeroshotdynamicconceptpersonalization,
  title={Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA},
  author={Rameen Abdal and Or Patashnik and Ekaterina Deyneka and Hao Chen and Aliaksandr Siarohin and Sergey Tulyakov and Daniel Cohen-Or and Kfir Aberman},
  year={2025},
  eprint={2507.17963},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2507.17963},
}