CVPR 2024 Tutorial on 3D/4D Generation and Modeling with Generative Priors
In the ever-expanding metaverse, where the physical and digital worlds seamlessly merge, the ability to capture, represent, and analyze three-dimensional structures is crucial. Advances in 3D and 4D generation technologies have transformed gaming, augmented reality (AR), and virtual reality (VR), offering unprecedented immersion and interaction. Bridging the gap between reality and virtuality, 3D modeling enables realistic simulations, immersive gaming experiences, and AR overlays. Adding the temporal dimension enhances these experiences further, enabling lifelike animations, object tracking, and an understanding of complex spatiotemporal relationships, reshaping digital interactions in entertainment, education, and beyond.
Traditionally, 3D generation involved directly manipulating 3D data and attempting to recover 3D details from 2D data. Recent breakthroughs in 2D diffusion models have significantly improved 3D generation. Methods using 2D priors from diffusion models have emerged, enhancing the quality and diversity of 3D asset generation. These methods range from inpainting-based approaches and optimization-based techniques like Score Distillation Sampling (SDS) to recent feed-forward generation using multi-view images as an auxiliary medium.
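As a pointer to the optimization-based material, the core of SDS (introduced in DreamFusion) can be summarized by its gradient, shown here as a brief sketch in standard notation:

\[ \nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\Big[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\Big], \]

where \(x = g(\theta)\) is a rendering of the 3D representation with parameters \(\theta\), \(x_t\) is the noised rendering at timestep \(t\), \(y\) is the conditioning prompt, \(\hat{\epsilon}_\phi\) is the noise predicted by the frozen 2D diffusion model, and \(w(t)\) is a timestep weighting. In effect, the 2D diffusion model scores renderings from sampled viewpoints and nudges the 3D parameters toward images it considers plausible.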
On the other hand, challenges persist in extending 3D asset generation to full scenes and in mitigating the biases of 2D priors for realistic synthesis in real-world settings. Addressing these issues, our tutorial delves into 3D scene generation, exploring techniques for diverse scene scales, compositionality, and realism. Finally, we also cover recent advancements in 4D generation using image and video models as priors, which is crucial for applications like augmented reality. Attendees will gain insights into various paradigms of 3D/4D generation, from training on 3D data to leveraging knowledge from 2D diffusion models, resulting in a comprehensive understanding of contemporary 3D modeling approaches.
In conclusion, our tutorial provides a comprehensive exploration of 3D/4D generation and modeling, covering fundamental techniques to cutting-edge advancements. By navigating scene-level generation intricacies and leveraging 2D priors for enhanced realism, attendees will emerge equipped with a nuanced understanding of the evolving landscape of 3D modeling in the metaverse era.
Introduction
Hsin-Ying Lee | 08:30 - 08:40
3D Generation w/o Large-Scale 2D Priors
Introducing conventional ways of training 3D generation models using 2D and 3D data, without large-scale image and video diffusion models.
Hsin-Ying Lee | 08:40 - 09:00
Bridging 2D and 3D: From Optimization to Feedforward
Introducing two ways of performing 3D generation with the help of large-scale 2D diffusion models: optimization-based methods that distill knowledge with Score Distillation Sampling (SDS) and its variants, and feedforward methods aided by multi-view image generation.
Peiye Zhuang | 09:10 - 10:00
3D Scene Generation
Introducing recent advances and challenges in 3D scene generation.
Hsin-Ying Lee | 10:10 - 10:40
4D Generation and Reconstruction
Introducing recent advancements in 4D generation as well as generation via reconstruction.
Chaoyang Wang | 10:50 - 11:35
Closing Remarks
Hsin-Ying Lee | 11:35 - 11:45
Hsin-Ying Lee is a Senior Research Scientist on the Creative Vision team at Snap Research. His research focuses on content generation, specifically image/video/3D/4D generation and manipulation. He has published 50+ papers in top conferences and journals. Hsin-Ying received his Ph.D. from the University of California, Merced. Before joining Snap Inc., he interned at Google and Nvidia.
Peiye Zhuang is a Research Scientist in the Creative Vision group at Snap Research. Her research focuses on foundation generative models and various content creation applications, including 2D/3D/video generation and editing. Before joining Snap, Peiye received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 2023. She also spent time at Stanford University and interned with Apple, Google Brain, Facebook (now Meta), and Adobe.
Chaoyang Wang is a Research Scientist in the Creative Vision group at Snap Research. His research focuses on 3D/4D reconstruction and its applications in photo-realistic novel view synthesis and content generation. He received his Ph.D. from the Robotics Institute at Carnegie Mellon University. Before joining Snap Inc., he interned at Nvidia, Adobe, Microsoft, and Argo AI.
Please contact Hsin-Ying Lee if you have questions. The webpage template is courtesy of the awesome Georgia.