Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

ACM Transactions on Graphics

Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas,
Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

Work performed while interning at Snap Inc.


Camera Manipulation Results

Starting from the first frame of each video, we generate novel views and show the corresponding depth maps.

Minecraft

Novel View

Depth

Tennis

The ball is not visible in some depth maps due to video compression artifacts. Because the Tennis dataset contains no camera translation, depth for static elements cannot be reconstructed and is replaced by a planar prior.

Novel View

Depth
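
The note on the Tennis depth maps relies on a standard pinhole-camera fact: a camera that only rotates produces no parallax, so the depth of static elements is not observable. The sketch below illustrates this with a toy example and is not code from this project. The intrinsics `K`, the helper names `project` and `rot_y`, and the chosen points and motions are all assumptions made for illustration.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D point X to pixel coordinates with a pinhole camera."""
    x_cam = R @ X + t          # point in the camera frame
    uvw = K @ x_cam            # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]    # perspective division

def rot_y(angle):
    """Rotation about the vertical (y) axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Two static points on the same viewing ray of the first camera,
# at depths 5 and 50. They project to the same pixel in the first frame.
ray = np.array([0.1, -0.05, 1.0])
near, far = 5.0 * ray, 50.0 * ray

identity, zero = np.eye(3), np.zeros(3)

# Pure rotation (5 degrees): both points still land on the same pixel,
# so the two depths cannot be told apart.
R = rot_y(np.deg2rad(5.0))
rot_gap = np.linalg.norm(project(K, R, zero, near) - project(K, R, zero, far))

# Pure translation (0.5 units sideways): the points separate in the image,
# and the size of the gap depends on their depths.
t = np.array([0.5, 0.0, 0.0])
trans_gap = np.linalg.norm(project(K, identity, t, near) - project(K, identity, t, far))

print(f"pixel gap under rotation:    {rot_gap:.6f}")
print(f"pixel gap under translation: {trans_gap:.6f}")
```

In this toy setup the gap under rotation is zero up to floating-point error, while the gap under translation is clearly nonzero. With no translation in the footage, a prior such as a plane is one way to fill in depth that the video itself cannot constrain.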