ECCV 2022 Tutorial on

Video Synthesis: Early Days and New Developments

Date: 8:50 am - 12:30 pm (IST), October 24, 2022. Hall: N&G (80)

All recordings are available at this link.


Overview



The introduction of generative adversarial networks in 2014 had a profound impact on video synthesis. Initial works generated videos with plain backgrounds and simple motions. Image synthesis advanced rapidly over the following years, and multiple works in video synthesis capitalized on this success. Various subfields of video synthesis emerged: prediction, animation, retargeting, manipulation, and stylization. Many of them led to practical applications, democratizing video editing for inexperienced users and sparking start-ups. With the introduction of language-based models, image-based diffusion, and large-scale datasets, video synthesis is seeing substantial improvement, and students, researchers, and practitioners want to enter and contribute to the domain. Our tutorial will help them acquire the necessary knowledge, understand the challenges and benchmarks, and choose a promising research direction. For practitioners, the tutorial provides a detailed overview of the domain. We expect attendees to have intermediate knowledge of computer vision and machine learning.


Organizers


Program

Introduction [Recording]

Sergey Tulyakov, 8:50 - 9:00

Advances in Image Synthesis [Recording]

Backbone architectures and frameworks that are necessary for further topics: GANs, diffusion, generative transformers, and quantized representations.
Stéphane Lathuilière, 9:00 - 9:40

Unconditional Video Synthesis [Recording]

Early and recent frameworks for synthesizing frames conditioned on noise, actions, and images.
Sergey Tulyakov, 9:50 - 10:40

Image Animation [Recording] [Recording]

Methods for unsupervised and supervised animation. The former supports a variety of object categories, while the latter requires an object-specific prior, such as a morphable face or body model, or 2D or 3D keypoints.
Jian Ren, Aliaksandr Siarohin, 10:50 - 11:30

New Trends in Video Synthesis [Recording] [Recording]

Multimodal video synthesis: methods conditioned on text, sketches, images, or other modalities. Interactive video synthesis: a recently emerged group of works that enables user interaction while the video is being generated.
Jian Ren, Aliaksandr Siarohin, 11:40 - 12:25

Closing Remarks

Sergey Tulyakov, 12:25 - 12:30

About the speakers

Sergey Tulyakov is a Principal Research Scientist at Snap Inc., where he leads the Creative Vision team. His work focuses on creating methods for manipulating the world via computer vision and machine learning. This includes human and object understanding, photorealistic manipulation and animation, video synthesis, prediction, and retargeting. He pioneered the unsupervised image animation domain with MonkeyNet and the First Order Motion Model, which sparked a number of startups in the domain. His work on Interactive Video Stylization received the Best in Show Award at SIGGRAPH Real-Time Live! 2020. He has published 30+ top conference papers, journal articles, and patents, resulting in multiple innovative products, including Snapchat Pet Tracking, OurBaby, Real-time Neural Lenses (gender swap, baby face, aging lens, face animation), and many others. Before joining Snap Inc., Sergey was with Carnegie Mellon University, Microsoft, and NVIDIA. He holds a PhD from the University of Trento, Italy.

Jian Ren is a Research Scientist in the Creative Vision team at Snap Research. He received his Ph.D. in Computer Engineering from Rutgers University in 2019. He is interested in image and video generation and manipulation, and efficient neural networks. Before joining Snap Inc., Jian did internships at Adobe, Snap, and ByteDance.

Stéphane Lathuilière is an associate professor (maître de conférences) at Telecom Paris, France, in the multimedia team. Until October 2019, he was a post-doctoral fellow at the University of Trento in the Multimedia and Human Understanding Group, led by Prof. Nicu Sebe and Prof. Elisa Ricci. He received the M.Sc. degree in applied mathematics and computer science from ENSIMAG, Grenoble Institute of Technology (Grenoble INP), France, in 2014. He completed his master's thesis at the International Research Institute MICA (Hanoi, Vietnam). He worked towards his Ph.D. in mathematics and computer science in the Perception Team at Inria under the supervision of Dr. Radu Horaud, and obtained it from Université Grenoble Alpes (France) in 2018. His research interests cover machine learning for computer vision problems (e.g., domain adaptation, continual learning) and deep models for image and video generation. He has published papers in the most prestigious computer vision conferences (CVPR, ICCV, ECCV, NeurIPS) and top journals (T-PAMI).

Aliaksandr Siarohin is a Research Scientist in the Creative Vision team at Snap Research. Previously, he was a Ph.D. student at the University of Trento, where he worked under the supervision of Nicu Sebe in the Multimedia and Human Understanding Group (MHUG). His research interests include machine learning for image animation, video generation, generative adversarial networks, and domain adaptation. His work has been published in top computer vision and machine learning conferences. He also did internships at Snap Inc. and Google, and was a 2020 Snap Research Fellow.


Please contact Sergey Tulyakov (stulyakov@snap.com) if you have questions. The webpage template is courtesy of the awesome Georgia.