Task Description: We evaluate our method against baselines on the newly released Movie Gen Benchmark, which contains AI-generated videos. For each baseline, we replace the audio of the video with the audio generated by that baseline. The Movie Gen license can be found here. Input: video → audio.
Baselines: We compare our method with state-of-the-art approaches: FoleyCrafter, Diff-Foley, and Frieren. We also include the videos released by Movie Gen, which were generated with text prompts. We observe that all baselines lack precise temporal alignment. All videos are downsampled and cropped to 5 s for consistency.
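The cropping and downsampling step mentioned above could be sketched roughly as follows. This is only an illustration, not the preprocessing actually used here: the helper name `build_crop_command`, the target height of 256 pixels, the frame rate of 8 fps, and the choice to keep the first 5 seconds and drop the original audio are all assumptions. The function only builds an `ffmpeg` argument list, so it can be inspected without running anything.

```python
def build_crop_command(src: str, dst: str, duration_s: float = 5.0,
                       height: int = 256, fps: int = 8) -> list[str]:
    """Build an ffmpeg argument list that trims and downsamples one video.

    All default values are hypothetical; the page only states that videos
    are downsampled and cropped to 5 s.
    """
    return [
        "ffmpeg",
        "-y",                                    # overwrite dst if it already exists
        "-i", src,                               # input video
        "-t", f"{duration_s:g}",                 # keep the first duration_s seconds (assumed)
        "-vf", f"scale=-2:{height},fps={fps}",   # reduce resolution and frame rate (assumed values)
        "-an",                                   # drop the original audio track (assumed)
        dst,
    ]


if __name__ == "__main__":
    # Print the command for a hypothetical file pair instead of executing it.
    print(" ".join(build_crop_command("input.mp4", "output_5s.mp4")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, provided `ffmpeg` is installed.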
Dataset: Movie Gen Benchmark is a recently released benchmark by Meta featuring AI-generated videos. From the 527 released videos, we select those that display distinct temporal actions.
Video ID | Ours | MovieGen w/ prompt | FoleyCrafter | Diff-Foley | Frieren |
---|---|---|---|---|---|
320 | |||||
117 | |||||
57 | |||||
16 | |||||
28 | |||||
30 | |||||
39 | |||||
61 | |||||
100 | |||||
240 | |||||
256 | |||||
323 | |||||
324 | |||||
355 | |||||
377 | |||||
438 | |||||
537 |
Task Description: We evaluate our method against baselines on the newly released Movie Gen Benchmark, which contains AI-generated videos. For each baseline, we replace the audio of the video with the audio generated by that baseline. The Movie Gen license can be found here. Input: video + audio text prompt → audio.
Baselines: We compare our method with state-of-the-art approaches: Movie Gen A2V, FoleyCrafter, Diff-Foley, and Seeing and Hearing. We observe that all baselines lack precise temporal alignment. All videos are downsampled and cropped to 5s for consistency.
Dataset: Movie Gen Benchmark is a recently released benchmark by Meta featuring AI-generated videos. From the 527 released videos, we select those that display distinct temporal actions.
Video ID | Text Prompt | Ours | MovieGen | FoleyCrafter | Seeing & Hearing |
---|---|---|---|---|---|
115 | electric guitar power chords ringing out loudly and resonating. | ||||
519 | muddy splatter and sucking sounds with each step. | ||||
56 | wheels spinning rapidly and scraping against the concrete floor, and slamming sound as the skateboard lands on the concrete floor. | ||||
32 | footsteps echoing loudly off the walls and floor. | ||||
35 | clip-clop of the horse's hooves hitting the ground. | ||||
83 | bubbles rising to the surface, and school of fish splashing and swimming. | ||||
106 | gurgling of coffee flowing from thermos. | ||||
108 | bright blue button clicking with a crisp, sharp sound. | ||||
113 | drums are hit with loud and intense thuds and crashes. | ||||
155 | footsteps muffled by the carpet. | ||||
164 | footsteps crunching on dirt path. | ||||
247 | water splashing and slapping against the shore. | ||||
279 | water splashing and rippling as the bear moves through the stream. | ||||
299 | Gentle strumming of the guitar with a soaring sound. | ||||
320 | ice cracking with sharp snapping sound, and metal tool scraping against the ice surface. | ||||
334 | snow crunches and compresses under heavy footsteps. | ||||
385 | Broom bristles scratch against the hardwood floor. |