| Split | Samples | Duration | Storage |
|---|---|---|---|
| Training [full] (2.01 GB) | 70.7M | 167 khrs | ~36 TB |
| Training [10M] (381 MB) | 10.5M | 37.0 khrs | ~8.0 TB |
| Training [2M] (86.5 MB) | 2.4M | 7.56 khrs | ~1.6 TB |
| Validation (803 KB) | 6000 | 18.5 hrs | ~4.0 GB |
| Testing (803 KB) | 6000 | 18.5 hrs | ~4.0 GB |
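As a quick sanity check on the split statistics above, the average clip length of each split can be estimated by dividing total duration by sample count. The sketch below copies the rounded figures from the listing, so the derived values are only approximate; the dictionary layout and the helper name `mean_clip_seconds` are illustrative and not part of any released tooling.

```python
# Rounded figures copied from the split listing: (number of samples, total hours).
SPLITS = {
    "Training [full]": (70.7e6, 167e3),
    "Training [10M]": (10.5e6, 37.0e3),
    "Training [2M]": (2.4e6, 7.56e3),
    "Validation": (6000, 18.5),
    "Testing": (6000, 18.5),
}


def mean_clip_seconds(samples: float, hours: float) -> float:
    """Average clip length in seconds, given a sample count and total hours."""
    return hours * 3600.0 / samples


for name, (samples, hours) in SPLITS.items():
    print(f"{name}: ~{mean_clip_seconds(samples, hours):.1f} s per clip")
```

Under these rounded inputs, the full training split works out to roughly 8.5 seconds per clip, while the smaller subsets and the validation and testing splits average somewhat longer clips.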
The video samples are collected from a publicly available dataset.
Users must comply with that dataset's license when using these video samples.
We first collect 3.8M long videos from the HD-VILA-100M dataset and split them into 70.8M semantically coherent clips (blue). Next, we use several teacher models with different multimodal inputs to generate multiple candidate captions for each video clip (green). Lastly, we finetune a fine-grained retrieval model to select the candidate caption that best describes the video clip and use it as the annotation (yellow).
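The final selection step can be pictured as an argmax over candidate captions. The sketch below is only an illustration of that idea, not the released implementation: the teacher names, the example captions, and `toy_score` (a word-overlap stand-in for the finetuned retrieval model) are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple


def select_caption(
    candidates: Dict[str, str],
    score_fn: Callable[[str], float],
) -> Tuple[str, str]:
    """Return (teacher_name, caption) for the highest-scoring candidate."""
    if not candidates:
        raise ValueError("need at least one candidate caption")
    best = max(candidates, key=lambda name: score_fn(candidates[name]))
    return best, candidates[best]


def toy_score(video_tags: Set[str]) -> Callable[[str], float]:
    """Hypothetical scorer: fraction of caption words that match the video tags.

    A real pipeline would score each (video, caption) pair with a retrieval model.
    """
    def score(caption: str) -> float:
        words = set(caption.lower().split())
        return len(words & video_tags) / max(len(words), 1)
    return score


# Hypothetical captions from three teacher models for one clip.
candidates = {
    "teacher_a": "a person is talking",
    "teacher_b": "a chef slices an onion in a kitchen",
    "teacher_c": "a video of food",
}
scorer = toy_score({"chef", "onion", "kitchen", "knife"})
winner, caption = select_caption(candidates, scorer)
print(winner, "->", caption)
```

Keeping the scorer as a plain callable means the toy overlap function could be swapped for any model that maps a caption to a relevance score for the clip without changing the selection logic.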
We demo our splitting and captioning algorithm on long videos (scroll to view more). The results are shown in the subtitles:
We show the value of Panda-70M on three downstream tasks. We compare models trained on existing datasets with models trained on the proposed dataset. For a fair comparison, we use the same model architecture, the same training configuration, and the same amount of training data in all comparisons. For more details:
We sincerely thank everyone who contributed to the meaningful discussions, and we extend our gratitude to Snap Inc. for providing the computational resources and fostering a conducive research environment. 🤗 🙏 👻
Copyright © Snap Inc. 2024. This dataset is made available by Snap Inc. for informational purposes only. No license, whether implied or otherwise, is granted in or to such dataset (including any rights to copy, modify, publish, distribute and/or commercialize such dataset), unless you have entered into a separate agreement for such rights. Such dataset is provided as-is, without warranty of any kind, express or implied, including any warranties of merchantability, title, fitness for a particular purpose, non-infringement, or that such dataset is free of defects, errors or viruses. In no event will Snap Inc. be liable for any damages or losses of any kind arising from the dataset or your use thereof.