Additional Results:
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov
Since the audio for a given video can take many forms, we demonstrate our model's ability to control the generated sound using input text prompts. To do so, we selected videos from the Movie Gen Benchmarks that feature temporal actions.