Taming Data and Transformers for Audio Generation


Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

Rice University
Snap Research
Samples from our proposed dataset, AutoReCap, with their captions:
A loud bang
Clicking and rustling
A crowd of people chanting
A car engine revs and tires squeal as a crowd cheers
A crowd of people chanting and cheering
Some objects are being opened and closed
A man yells followed by a loud bang and a man yelling
Fireworks are going off
Rain falls onto a hard surface
A person breathes heavily
Paper is being crumpled
A dog barks
A gun is fired and a man yells
A motorcycle engine is running
A violin is playing a note
A motorcycle engine revving
A beep followed by a beep
A vehicle engine is idling and then revving up
A power tool drilling
A gun is fired several times
Some objects are crumpled
Multiple dogs howling
Waves crash against a shoreline
Traffic passes by in the distance
A large motor vehicle engine idles and then accelerates
Birds chirp in the distance, followed by a small motor running
A power tool motor is running
Fireworks are going off
A crowd of people are screaming and cheering, and the wind is blowing
Some objects are moved around
Food sizzles in a pan
A man yells and a motor vehicle passes by