CVPR 2024 Tutorial on

3D/4D Generation and Modeling with Generative Priors

Date: Tuesday, June 18th, 8:30 a.m. - noon PDT.

Location: Summit 440-441


Recorded Video


Overview


[Teaser: results from GTR, SceneTex, and 4Real]

In the ever-expanding metaverse, where the physical and digital worlds seamlessly merge, the ability to capture, represent, and analyze three-dimensional structures is crucial. Advancements in 3D and 4D generation technologies have transformed gaming, augmented reality (AR), and virtual reality (VR), offering unprecedented immersion and interaction. Bridging the gap between reality and virtuality, 3D modeling enables realistic simulations, immersive gaming experiences, and AR overlays. Adding the temporal dimension enhances these experiences further, enabling lifelike animations, object tracking, and the understanding of complex spatiotemporal relationships, reshaping digital interactions in entertainment, education, and beyond.

Traditionally, 3D generation involved directly manipulating 3D data or attempting to recover 3D details from 2D data. Recent breakthroughs in 2D diffusion models have significantly improved 3D generation. Methods using 2D priors from diffusion models have emerged, enhancing the quality and diversity of 3D asset generation. These methods range from inpainting-based approaches and optimization-based techniques like Score Distillation Sampling (SDS), to recent feed-forward generation using multi-view images as an auxiliary medium.
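
As a brief pointer for attendees less familiar with SDS, the core idea from DreamFusion can be sketched as follows (the notation below is ours): a differentiable 3D representation with parameters θ is rendered into an image x = g(θ), and a frozen 2D diffusion model, conditioned on a text prompt y, supplies a gradient of roughly the form

$$ \nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) \approx \mathbb{E}_{t,\epsilon}\Big[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \Big], $$

where $x_t$ is the rendering noised to timestep $t$, $\epsilon$ is the injected noise, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction, and $w(t)$ is a weighting term. Optimizing θ with this gradient pulls renderings from all sampled viewpoints toward images the 2D prior considers likely; the optimization-based methods covered in the program build on this mechanism and its variants.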

On the other hand, challenges persist in extending 3D asset generation to scenes and in mitigating biases in 2D priors for realistic synthesis in real-world settings. Addressing these issues, our tutorial delves into 3D scene generation, exploring techniques for diverse scene scales, compositionality, and realism. Finally, we also cover recent advancements in 4D generation using image and video models as priors, crucial for applications like augmented reality. Attendees will gain insights into various paradigms of 3D/4D generation, from training on 3D data to leveraging 2D diffusion model knowledge, resulting in a comprehensive understanding of contemporary 3D modeling approaches.

In conclusion, our tutorial provides a comprehensive exploration of 3D/4D generation and modeling, covering fundamental techniques to cutting-edge advancements. By navigating scene-level generation intricacies and leveraging 2D priors for enhanced realism, attendees will emerge equipped with a nuanced understanding of the evolving landscape of 3D modeling in the metaverse era.


Organizers


Program

Introduction

Hsin-Ying Lee 08:30 - 08:40
PDF

3D Generation w/o Large-Scale 2D Priors

Introducing conventional ways of training 3D generation models using 2D and 3D data without large-scale image and video diffusion models.
Hsin-Ying Lee 08:40 - 09:00
PDF

Bridging 2D and 3D: From Optimization to Feedforward

Introducing two ways of performing 3D generation with the help of large-scale 2D diffusion models: optimization-based methods that distill knowledge with Score Distillation Sampling (SDS) and its variants, and feedforward methods that use multi-view image generation as an intermediate step.
Peiye Zhuang 09:10 - 10:00
PDF

3D Scene Generation

Introducing the recent advances and challenges in 3D scene generation.
Hsin-Ying Lee 10:10 - 10:40
PDF

4D Generation and Reconstruction

Introducing recent advancements in 4D generation, as well as generation via reconstruction.
Chaoyang Wang 10:50 - 11:35
PDF

Closing Remarks

Hsin-Ying Lee 11:35 - 11:45

About the Speakers

Hsin-Ying Lee is a Senior Research Scientist in the Creative Vision team at Snap Research. His research focuses on content generation, specifically image/video/3D/4D generation and manipulation. He has published 50+ papers in top conferences and journals. Hsin-Ying received his Ph.D. from the University of California, Merced. Before joining Snap Inc., he did internships at Google and NVIDIA.

Peiye Zhuang is a Research Scientist in the Creative Vision group at Snap Research. Her research focuses on foundation generative models and various content creation applications, including 2D/3D/video generation and editing. Before joining Snap, Peiye received her Ph.D. in Computer Science from the University of Illinois Urbana-Champaign (UIUC) in 2023. She also spent time at Stanford University and interned with Apple, Google Brain, Facebook (now Meta), and Adobe.

Chaoyang Wang is a Research Scientist in the Creative Vision group at Snap Research. His research focuses on 3D/4D reconstruction and its applications in photo-realistic novel view synthesis and content generation. He received his Ph.D. from the Robotics Institute at Carnegie Mellon University. Before joining Snap Inc., Chaoyang did internships at NVIDIA, Adobe, Microsoft, and Argo AI.


Please contact Hsin-Ying Lee if you have any questions. The webpage template is courtesy of the awesome Georgia.