CVPR 2021 Tutorial on

Unlocking Creativity with Computer Vision: Representations for Animation, Stylization and Manipulation

Slides and recorded videos will be provided on this webpage.
Time: TBD


Creativity, the ability to create using imagination and original ideas, requires a command of a diverse set of skills, the availability of creative tools, a great deal of effort, and, most importantly, a creative mind. Stylization or manipulation of an object requires an artist to understand the object's structure and factors of variation. Animation further requires knowledge of the object's rigid and non-rigid motion patterns. Such complicated manipulations can be achieved by computer vision systems that use an appropriate representation.

We will walk attendees through designing and learning representations for building creative tools. Choosing the right representation and building a framework to learn it is often the key to unlocking creativity. We will look at 2D and volumetric object representations; image and video representations; and content, style, and motion representations. Some representations can be learned in a supervised fashion when labelled data is available; otherwise, self-supervision can be adopted. Furthermore, we distinguish between explicit, explainable representations and implicit ones. We show that better representations lead to a better understanding of the data, which in turn leads to higher-quality generated content, eventually forming a loop.


Tentative Schedule

30 mins. Preliminaries Stéphane Lathuilière

Representations for controllable image and video synthesis

15 mins. Image synthesis and manipulation Ming-Yu Liu

15 mins. Video synthesis and manipulation Sergey Tulyakov

Object representations for manipulation

15 mins. Manipulating hair Menglei Chai, Kyle Olszewski

15 mins. Representations for modeling human bodies Zeng Huang, Kyle Olszewski

15 mins. Volumetric implicit representations for object manipulation Kyle Olszewski

15 mins. Manipulating objects via GAN-inversion Hsin-Ying Lee

Content and motion representations for animation

15 mins. Supervised and few-shot animation Jian Ren

15 mins. Unsupervised animation of diverse objects Aliaksandr Siarohin, Sergey Tulyakov

Representations for stylization

15 mins. Appearance & geometry stylization for face images Menglei Chai

15 mins. Interactive video stylization Menglei Chai

About the speakers

Stéphane Lathuilière is an associate professor (maître de conférences) at Telecom Paris, France, in the multimedia team. Until October 2019, he was a post-doctoral fellow at the University of Trento in the Multimedia and Human Understanding Group, led by Prof. Nicu Sebe and Prof. Elisa Ricci. He received the M.Sc. degree in applied mathematics and computer science from ENSIMAG, Grenoble Institute of Technology (Grenoble INP), France, in 2014. He completed his master's thesis at the International Research Institute MICA (Hanoi, Vietnam). He worked towards his Ph.D. in mathematics and computer science in the Perception Team at Inria under the supervision of Dr. Radu Horaud, and obtained it from Université Grenoble Alpes (France) in 2018. His research interests cover machine learning for computer vision problems (e.g., domain adaptation, continual learning) and deep models for image and video generation. He has published papers in the most prestigious computer vision conferences (CVPR, ICCV, ECCV, NeurIPS) and top journals (T-PAMI).

Ming-Yu Liu is a Distinguished Research Scientist and Manager at NVIDIA Research. Before joining NVIDIA in 2016, he was a Principal Research Scientist at Mitsubishi Electric Research Labs (MERL). He received his Ph.D. from the Department of Electrical and Computer Engineering at the University of Maryland, College Park, in 2012. Ming-Yu Liu has won several prestigious awards in his field. He is a recipient of the 2014 R&D 100 Award by R&D Magazine for his robotic bin picking system. At SIGGRAPH 2019, he won the Best in Show Award and Audience Choice Award in the Real-Time Live track for his GauGAN work. His GauGAN work also won the Best of What's New Award from Popular Science magazine in 2019. His research interest is in generative image modeling. His goal is to enable machines' human-like imagination capability.

Sergey Tulyakov is a Lead Research Scientist heading the Creative Vision team at Snap Research. His work focuses on creating methods for manipulating the world via computer vision and machine learning. This includes style transfer, photorealistic object manipulation and animation, video synthesis, prediction, and retargeting. His work has been published in 20+ top papers and patents, resulting in multiple innovative projects, including Snapchat Pet Tracking, the OurBaby Snappable, and real-time neural lenses (gender swap, baby face, and aging lenses), among others. His work on Interactive Video Stylization received the Best in Show award at SIGGRAPH 2020 Real-Time Live!. Before joining Snap Inc., Sergey was with Carnegie Mellon University, Microsoft, and NVIDIA. He holds a Ph.D. degree from the University of Trento, Italy.

Menglei Chai is a Senior Research Scientist in the Creative Vision team at Snap Research. He received his Ph.D. degree from Zhejiang University in 2017, supervised by Professor Kun Zhou. His research lies at the intersection of computer vision and computer graphics, focusing on human digitization, image manipulation, 3D reconstruction, and physics-based animation.

Kyle Olszewski is a Research Scientist in the Creative Vision team at Snap Research. His research interests include real-time facial expression tracking for emerging domains such as AR/VR telepresence, intuitive interfaces for interactive photorealistic image synthesis, human body performance and appearance capture, and scene understanding for 3D reconstruction and image manipulation. He has published papers in venues such as SIGGRAPH, SIGGRAPH Asia, CVPR, ICCV, and ECCV, and his work on Volumetric Human Teleportation was voted Best in Show at SIGGRAPH 2020 Real-Time Live!. He received his Ph.D. from the University of Southern California, where he worked in the Geometric Capture Lab and the Institute for Creative Technologies. Before joining Snap Research, he was a Senior Software Engineer at NVIDIA and a research intern at Facebook/Oculus, Adobe, and Microsoft Research, and was a recipient of the 2018 Snap Research Fellowship.

Zeng Huang is a Research Scientist in the Creative Vision team at Snap Research. His research efforts focus on 3D reconstruction and human digitization, specifically for consumer-level devices, allowing easy digital content creation for everyone. His work leverages a combination of computer graphics, vision, and machine learning. His work on real-time full-body digitization was awarded Best in Show at SIGGRAPH 2020 Real-Time Live!. He holds a Ph.D. degree from the University of Southern California.

Hsin-Ying Lee is a Research Scientist in the Creative Vision team at Snap Research. He received his Ph.D. degree from the University of California, Merced, in 2020. His work focuses on applying generative models to various content creation tasks, including image-to-image translation, dance modeling, design generation, and image editing. Before joining Snap Inc., Hsin-Ying was an intern at Google Research and NVIDIA.

Jian Ren is a Research Scientist in the Creative Vision team at Snap Research. He received his Ph.D. in Computer Engineering from Rutgers University in 2019. He is interested in image and video generation and manipulation, and in efficient neural networks. Before joining Snap Inc., Jian did internships at Adobe, Snap, and ByteDance.

Aliaksandr Siarohin received the M.Sc. degree in computer science from the University of Trento, Italy, in 2017. He is currently a Ph.D. student in the Multimedia and Human Understanding Group at the University of Trento. His primary research interests are domain adaptation, image and video generation, and generative adversarial networks.

Please contact Sergey Tulyakov if you have questions. The webpage template is courtesy of the awesome Georgia.