☨ Work performed while interning at Snap Inc.
Results for the task of predicting all video frames given the first frame and a text action for each frame.