Snap Research
Since the same video can be plausibly accompanied by many different soundtracks, we demonstrate our model's ability to control the generated audio through input text prompts. To this end, we selected videos from the Movie Gen Benchmarks that feature temporal actions.