CVPR 2024 Tutorial on

3D/4D Generation and Modeling with Generative Priors



Recorded Video

The recorded video will be available here.


In the ever-expanding metaverse, where the physical and digital worlds seamlessly merge, the ability to capture, represent, and analyze three-dimensional structures is crucial. Advances in 3D and 4D generation technologies have transformed gaming, augmented reality (AR), and virtual reality (VR), offering unprecedented immersion and interaction. Bridging the gap between reality and virtuality, 3D modeling enables realistic simulations, immersive gaming experiences, and AR overlays. Adding the temporal dimension enhances these experiences further, enabling lifelike animations, object tracking, and the understanding of complex spatiotemporal relationships, reshaping digital interactions in entertainment, education, and beyond.

Traditionally, 3D generation involved directly manipulating 3D data, evolving alongside advances in 2D generation techniques. Recent breakthroughs in 2D diffusion models have substantially improved 3D generation by leveraging priors learned from large-scale image datasets. Methods built on these 2D priors have emerged, ranging from inpainting-based approaches to techniques such as Score Distillation Sampling (SDS), improving the quality and diversity of generated 3D assets. However, limitations in scalability and realism remain, due to biases in 2D priors and the scarcity of comprehensive 3D data.
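To make the SDS idea concrete, the sketch below shows the core update in a toy NumPy setting: noise a differentiable render, ask a (here hypothetical, hand-written) denoiser for its noise estimate, and use the weighted residual as a gradient on the render, skipping the denoiser Jacobian as in the original SDS formulation. A real system would replace `toy_denoiser` with a pretrained text-conditioned 2D diffusion model and backpropagate the gradient through the renderer into the 3D scene parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    # Hypothetical stand-in for a pretrained diffusion model's noise
    # prediction eps_phi(x_t; y, t). Its fixed point pushes samples
    # toward zero, which acts as the toy "prior mode" here.
    return 0.9 * x_t

def sds_gradient(render, t, alpha_bar):
    """One Score Distillation Sampling gradient for a rendered image.

    grad = w(t) * (eps_hat - eps), omitting the denoiser Jacobian.
    """
    eps = rng.standard_normal(render.shape)            # sampled noise
    x_t = np.sqrt(alpha_bar) * render + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_denoiser(x_t, t)                     # model's noise estimate
    w_t = 1.0                                          # weighting, often a function of t
    return w_t * (eps_hat - eps)

# A pixel array standing in for a differentiable render of the scene;
# repeated SDS steps pull it toward the toy prior's mode.
render = rng.standard_normal((4, 4))
for _ in range(200):
    render -= 0.05 * sds_gradient(render, t=0.5, alpha_bar=0.5)
print(np.abs(render).mean())
```

In practice the gradient is averaged over randomly sampled timesteps and camera views, which is what distills the 2D model's knowledge into a consistent 3D asset.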

Challenges persist in extending 3D asset generation to scenes and mitigating biases in 2D priors for realistic synthesis in real-world settings. Addressing these issues, our tutorial delves into 3D scene generation, exploring techniques for diverse scene scales, compositionality, and realism. We also cover recent advancements in 3D and 4D reconstruction from images and videos, crucial for applications like augmented reality. Attendees will gain insights into various paradigms of 3D/4D generation, from training on 3D data to leveraging 2D diffusion model knowledge, resulting in a comprehensive understanding of contemporary 3D modeling approaches.

In conclusion, our tutorial provides a comprehensive exploration of 3D/4D generation and modeling, covering fundamental techniques to cutting-edge advancements. By navigating scene-level generation intricacies and leveraging 2D priors for enhanced realism, attendees will emerge equipped with a nuanced understanding of the evolving landscape of 3D modeling in the metaverse era.




Schedule

08:30 – Opening Remarks
Hsin-Ying Lee

08:40 – 3D Generation with 3D Data
Hsin-Ying Lee
Introducing conventional approaches to training 3D generative models directly on 3D data, including VAEs, GANs, transformers, and diffusion models.

09:15 – Bridging 2D and 3D: Inpainting with Depth Estimation and Knowledge Distillation
Peiye Zhuang
Introducing the two major branches of 3D generation aided by large-scale 2D diffusion models: adopting 2D priors via inpainting and depth estimation, and distilling knowledge with Score Distillation Sampling (SDS) and its variants.

10:15 – 3D Scene Generation
Hsin-Ying Lee
Introducing recent advances and open challenges in 3D scene generation.

11:10 – 3D and 4D Reconstruction
Chaoyang Wang
Introducing 3D and 4D reconstruction from images and videos, and recent work leveraging generative priors, including 2D diffusion models.

11:55 – Closing Remarks
Hsin-Ying Lee

About the Speakers

Hsin-Ying Lee is a Senior Research Scientist in the Creative Vision team at Snap Research. His research focuses on content generation, specifically image/video/3D/4D generation and manipulation. He has published 50+ papers in top conferences and journals. Hsin-Ying received his Ph.D. from the University of California, Merced. Before joining Snap Inc., he interned at Google and Nvidia.

Peiye Zhuang is a Research Scientist in the Creative Vision group at Snap Research. Her research focuses on foundation generative models and various content creation applications, including 2D/3D/video generation and editing. Before joining Snap, Peiye received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 2023. She also spent time at Stanford University and interned at Apple, Google Brain, Facebook (now Meta), and Adobe.

Chaoyang Wang is a Research Scientist in the Creative Vision group at Snap Research. His research focuses on 3D/4D reconstruction and its applications in photorealistic novel view synthesis and content generation. He received his Ph.D. from the Robotics Institute at Carnegie Mellon University. Before joining Snap Inc., he interned at Nvidia, Adobe, Microsoft, and Argo AI.

Please contact Hsin-Ying Lee if you have questions. The webpage template is courtesy of the awesome Georgia.