Our Exclusive AI Dataset


Technical Documentation for the Text-to-Video Dataset “VidData”

This dataset contains 1006 annotated videos of everyday scenes, used for training and evaluating AI models in video generation and recognition. It is structured to meet the needs of Text-to-Video models and motion analysis.

2. Dataset Specifications

2.1. Generation Criteria

  • Maximum video duration: 10 seconds
  • Video themes:
    • Walking
    • Exercising
    • Writing
    • Shopping
    • Sleeping
    • Meditating
    • Working
    • Studying
    • Driving
    • Washing
    • Gardening
    • Calling
    • Listening
    • Organizing
    • Planning
    • Relaxing
    • Teaching
  • Video size: 512×512 pixels
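
The duration limit above can be spot-checked against the metadata. The following is a minimal sketch using only the standard library; it assumes the duration column is named duration_seconds (as in the sample entry in Section 7), and find_overlong is a hypothetical helper, not part of the dataset tooling.

```python
# Hypothetical helper: flag metadata rows that exceed the 10-second limit.
MAX_DURATION_SECONDS = 10.0  # taken from the generation criteria


def find_overlong(rows, max_seconds=MAX_DURATION_SECONDS):
    """Return the rows whose duration exceeds max_seconds.

    Each row is a dict; 'duration_seconds' is an assumed column name.
    """
    return [row for row in rows if float(row["duration_seconds"]) > max_seconds]


# Illustrative rows only; these file names are made up for the example.
rows = [
    {"video_name": "a.mp4", "duration_seconds": "6.5"},
    {"video_name": "b.mp4", "duration_seconds": "12.0"},
]
print([row["video_name"] for row in find_overlong(rows)])
```

A video of exactly 10 seconds is treated as valid here; tighten the comparison if the limit is meant to be exclusive.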

2.2. Dataset Organization

The dataset follows this structure:

VidData
├─ data
│   └─ train
│       └─ VidData.csv
├─ video
│   ├─ ---_iRTHryQ_13_0to241.mp4
│   ├─ ---agFLYkbY_7_0to303.mp4
│   └─ --0ETtekpw0_2_18to486.mp4
└─ readme.md
        
  • data/train/: Contains CSV files with metadata associated with the videos.
  • video/: Contains the video files.
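
Given this layout, each metadata row can be mapped to a file under video/. The sketch below assumes the video column holds the bare file name and that the dataset root is a local directory; video_path and missing_videos are hypothetical helper names.

```python
from pathlib import Path


def video_path(root, video_name):
    """Build the expected path of a video file under the dataset root."""
    return Path(root) / "video" / video_name


def missing_videos(root, video_names):
    """Return the names listed in the metadata that have no file on disk."""
    return [name for name in video_names if not video_path(root, name).is_file()]


# Example: check one of the file names from the listing above.
print(missing_videos("VidData", ["---_iRTHryQ_13_0to241.mp4"]))
```

Running a check like this before training helps catch incomplete downloads early.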

3. Data Structure

The dataset is stored as a CSV file and includes the following columns:

Column                      Type     Description
video                       string   Video file name
caption                     string   Textual description of the video
temporal consistency score  float64  Temporal consistency score
fps                         float64  Frames per second
frame                       int64    Number of frames in the video
seconds                     float64  Video duration in seconds
motion score                float64  Motion score
camera motion               string   Type of camera motion (e.g., pan_left)
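
The frame count, frame rate, and duration columns are presumably related by frames ≈ fps × seconds; the sample entry in Section 7 satisfies this (30 × 6.5 = 195). The sketch below shows one way to test that relation per row. The one-frame tolerance is an assumption, and frames_consistent is a hypothetical helper.

```python
def frames_consistent(fps, frames, seconds, tolerance=1.0):
    """Check that the frame count matches fps * seconds within a tolerance (in frames)."""
    return abs(fps * seconds - frames) <= tolerance


# Values taken from the sample entry: 30 fps, 195 frames, 6.5 seconds.
print(frames_consistent(fps=30, frames=195, seconds=6.5))  # True
```

Rows that fail this check may point to a truncated file or a rounding difference in the recorded duration.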

4. Libraries Used

4.1. Library Examples

Here are some example libraries that can be used when analyzing this data:

  • OpenCV: Video manipulation and processing (reading, writing, frame extraction, contour detection, filtering, etc.).
  • Scikit-Image: Calculating the Structural Similarity Index (SSIM) for image quality evaluation and various image transformations (segmentation, filtering, etc.).
  • NumPy: Efficient manipulation of matrices and arrays, essential for calculations on images and videos.
  • Pandas: Managing and structuring metadata associated with videos (e.g., file names, timestamps, annotations).
  • Matplotlib/Seaborn: Visualizing analysis results as graphs.

4.2. Installing Dependencies

Follow the instructions below to install the required libraries:

  1. Create a requirements.txt file and add the following:
    opencv-python==4.8.1.78 # Video manipulation and processing
    scikit-image==0.22.0 # SSIM calculation and image transformations
    numpy==1.26.2 # Efficient manipulation of matrices and arrays
    pandas==2.1.4 # Managing and structuring metadata
    matplotlib==3.8.2 # Visualizing analysis results
    seaborn==0.12.2 # Advanced visualization with enhanced graphics
                    
  2. Run the command: pip install -r requirements.txt

Note: Only include the libraries you need in requirements.txt.
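
After installation, the installed versions can be compared with the pins above. This is a minimal sketch using importlib.metadata from the standard library (Python 3.8+); installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as they appear in requirements.txt.
packages = ["opencv-python", "scikit-image", "numpy", "pandas", "matplotlib", "seaborn"]
for name, ver in installed_versions(packages).items():
    print(f"{name}: {ver or 'not installed'}")
```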

5. Using the Dataset

5.1. Primary Applications

5.1.1. Text-to-Video Generation

  • Train models to generate video based on textual input.
  • Benchmark performance by comparing generated videos against dataset entries.

5.1.2. Video Description Models

  • Evaluate models designed to generate textual descriptions from videos.

5.1.3. Temporal Consistency Analysis

  • Test a model's ability to maintain smoothness and coherence across frames in generated video.
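
As a rough illustration of what a frame-to-frame smoothness check can look like, the sketch below averages the absolute pixel change between consecutive frames. This is not how the dataset's temporal consistency score is computed (the method is not specified here); it is a deliberately simple proxy, and a perceptual measure such as SSIM would be a more typical choice in practice. Frames are assumed to be flat sequences of grayscale values.

```python
def mean_abs_frame_difference(frames):
    """Average absolute pixel change between consecutive frames.

    `frames` is a sequence of equally sized, non-empty flat pixel sequences
    (for example grayscale values in 0..255). Lower values mean smoother video.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total = 0.0
    count = 0
    for previous, current in zip(frames, frames[1:]):
        if len(previous) != len(current):
            raise ValueError("frames must have the same size")
        total += sum(abs(a - b) for a, b in zip(previous, current))
        count += len(current)
    return total / count


# Toy data: a static clip and a clip that flickers between black and white.
static = [[10, 10, 10, 10]] * 3
flicker = [[0, 0, 0, 0], [255, 255, 255, 255], [0, 0, 0, 0]]
print(mean_abs_frame_difference(static), mean_abs_frame_difference(flicker))
```

The static clip scores 0.0 and the flickering clip scores 255.0, the two extremes of this proxy for 8-bit frames.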

5.2. Example Workflow

Load the dataset using Python:

import pandas as pd

# Load the metadata; the path assumes the working directory is the VidData root.
dataset = pd.read_csv('data/train/VidData.csv')
print(dataset.head())

# Access video metadata (column names follow the header shown in Section 7):
video = dataset.iloc[0]  # First entry
print(f"Video Name: {video['video_name']}")
print(f"Caption: {video['caption']}")
print(f"Duration: {video['duration_seconds']} seconds")

# Filter videos by motion score:
high_motion_videos = dataset[dataset['motion_score'] > 1.0]
print(high_motion_videos)
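
Beyond filtering, a quick summary of the metadata can help when choosing thresholds. The sketch below counts videos per camera motion label and averages the motion score using only the standard library. The column names camera_motion and motion_score are assumed from Section 7, summarize is a hypothetical helper, and the rows (including the pan_right label) are made up for illustration.

```python
from collections import Counter
from statistics import mean


def summarize(rows):
    """Count videos per camera motion label and average the motion score."""
    by_camera = Counter(row["camera_motion"] for row in rows)
    average_motion = mean(float(row["motion_score"]) for row in rows)
    return by_camera, average_motion


# Illustrative rows only, not taken from the dataset.
rows = [
    {"camera_motion": "pan_left", "motion_score": "0.5"},
    {"camera_motion": "pan_left", "motion_score": "1.5"},
    {"camera_motion": "pan_right", "motion_score": "1.0"},
]
by_camera, average_motion = summarize(rows)
print(by_camera.most_common(), average_motion)
```

With the DataFrame from the workflow above, dataset['camera_motion'].value_counts() gives the same per-label counts.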
        

6. File Format

The dataset metadata is delivered in CSV format: each row represents one video, and each column holds one metadata field.
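
For environments without pandas, the same file can be read with the standard csv module. The sketch below converts the numeric columns while reading. The column names and types are assumptions based on Sections 3 and 7, read_metadata is a hypothetical helper, and the inline CSV text stands in for the real file.

```python
import csv
import io

# Assumed numeric columns and their types (see Sections 3 and 7).
COLUMN_TYPES = {
    "temporal_consistency_score": float,
    "fps": float,
    "frames": int,
    "duration_seconds": float,
    "motion_score": float,
}


def read_metadata(handle):
    """Yield one dict per video, converting numeric columns to numbers."""
    for row in csv.DictReader(handle):
        for column, cast in COLUMN_TYPES.items():
            if column in row:
                row[column] = cast(row[column])
        yield row


# Inline stand-in for the CSV file, using a subset of the columns.
sample = io.StringIO(
    "video_name,caption,fps,frames,duration_seconds\n"
    'E_1.mp4,"The video shows a soccer player kicking a soccer ball.",30,195,6.5\n'
)
rows = list(read_metadata(sample))
print(rows[0]["frames"], rows[0]["duration_seconds"])
```

To read the real file, pass an open file handle instead, for example open(path, newline="", encoding="utf-8").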

7. Sample Entry

Column                      Value
video_name                  E_1.mp4
caption                     The video shows a soccer player kicking a soccer ball.
temporal_consistency_score  0.948826
fps                         30
frames                      195
duration_seconds            6.5
motion_score                0.826522
camera_motion               1.105807

8. Contact

For inquiries, please contact:

9. Hugging Face

The complete dataset, including the videos, is available from our dataset publication on Hugging Face.