Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

ACM Transactions on Graphics

Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas,
Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

Work performed while interning at Snap Inc.


Synthesis module comparison to baselines

We compare our synthesis model against the Playable Environments baseline (PE) and against a version of it retrained at the 1024x576 px resolution used by our model (PE+).

Minecraft

PE

Note the low resolution and the inability to correctly model the orientation of players.

PE+

Note the lack of players in the video.

Ours small

The full version of our model, trained with a reduced computational budget that matches the one used for the baselines.

Ours

The full version of our model.

Tennis

PE

Note the low resolution and the texture-sticking checkerboard artifacts that appear during camera movements as a result of the CNN upsamplers. 2D UI elements are not removed and produce noticeable artifacts.

PE+

Note the lack of players in the video.

Ours small

The full version of our model, trained with a reduced computational budget that matches the one used for the baselines.

Ours

The full version of our model.