What models can I use for Video Classification?

The MCG-NJU/videomae-base-finetuned-kinetics and microsoft/xclip-base-patch32 models can be used for Video Classification.

What datasets can I use for Video Classification?

The kinetics400 dataset can be used for Video Classification.

What metrics can I use for Video Classification?

The accuracy, recall, precision, and f1 metrics can be used for Video Classification.

Video Classification

Video classification is the task of assigning a label or class to an entire video. Each video is expected to have only one class. Video classification models take a video as input and return a prediction of which class the video belongs to.

Example: a Video Classification Model takes a video as input and outputs a score for each class, for instance Playing Guitar (0.514), Playing Tennis (0.193) and Cooking (0.068).
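Each score in the example is the model's estimated probability for one class. A common way to get such a ranked list is to apply a softmax to the raw logits the model returns and keep the k highest-scoring classes; because each video has one class, the top entry is the prediction. The sketch below assumes NumPy, a hypothetical four-label set and made-up logit values, so the printed probabilities are illustrative rather than the numbers shown above.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]


# Hypothetical label set and logits, for illustration only.
labels = ["playing guitar", "playing tennis", "cooking", "dancing"]
logits = [2.1, 1.1, 0.1, -0.5]

for label, prob in top_k(logits, labels, k=3):
    print(f"{label}: {prob:.3f}")
```

Because softmax is monotonic, sorting the probabilities gives the same order as sorting the logits; the softmax is only needed when the scores themselves are displayed.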

About Video Classification

Use Cases

Video classification models can be used to categorize what a video is all about.

Activity Recognition

Video classification models can be used for activity recognition, which is useful in fitness applications. Activity recognition can also help vision-impaired individuals, especially when they are commuting.

Video Search

Video classification models can improve the user experience by organizing and categorizing video galleries, on a phone or in the cloud, by multiple keywords or tags.

Inference

Below is code for running inference with a pre-trained video classification model.

import torch
from transformers import VideoMAEImageProcessor, VideoMAEForVideoClassification
from pytorchvideo.transforms import UniformTemporalSubsample
from pytorchvideo.data.encoded_video import EncodedVideo

# Load the pre-trained image processor and model (fine-tuned on Kinetics-400).
model_ckpt = "MCG-NJU/videomae-base-finetuned-kinetics"
image_processor = VideoMAEImageProcessor.from_pretrained(model_ckpt)
model = VideoMAEForVideoClassification.from_pretrained(model_ckpt)
model.eval()

# Load the video and decode the first 4 seconds as a (C, T, H, W) tensor.
video = EncodedVideo.from_path("path_to_video.mp4")
video_data = video.get_clip(start_sec=0, end_sec=4.0)["video"]

# Sub-sample the number of frames the model was configured for and
# convert them to a (T, H, W, C) NumPy array.
num_frames = model.config.num_frames
subsampler = UniformTemporalSubsample(num_frames)
subsampled_frames = subsampler(video_data)
video_data_np = subsampled_frames.numpy().transpose(1, 2, 3, 0)

# Preprocess the video frames (resize, crop, normalize).
inputs = image_processor(list(video_data_np), return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# The model predicts one of the 400 Kinetics-400 classes.
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
# `eating spaghetti` (if you chose this video:
# https://hf.co/datasets/nielsr/video-demo/resolve/main/eating_spaghetti.mp4)
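The `UniformTemporalSubsample` step picks evenly spaced frames along the time axis, and the `transpose(1, 2, 3, 0)` call reorders a (C, T, H, W) clip into a stack of (H, W, C) frames. The NumPy sketch below reproduces that index selection under our reading of pytorchvideo's behavior (evenly spaced positions from `linspace`, truncated to integers); the helper names and the tiny synthetic clip are hypothetical, and pytorchvideo's exact rounding may differ between versions.

```python
import numpy as np


def uniform_subsample_indices(num_total: int, num_samples: int) -> np.ndarray:
    """Evenly spaced frame indices from 0 to num_total - 1, truncated to integers."""
    if num_total <= 0 or num_samples <= 0:
        raise ValueError("num_total and num_samples must be positive")
    return np.linspace(0, num_total - 1, num_samples).astype(np.int64)


def subsample_to_frames(video_cthw: np.ndarray, num_samples: int) -> np.ndarray:
    """Select frames along T of a (C, T, H, W) array and return (num_samples, H, W, C)."""
    idx = uniform_subsample_indices(video_cthw.shape[1], num_samples)
    clip = np.take(video_cthw, idx, axis=1)  # (C, num_samples, H, W)
    return clip.transpose(1, 2, 3, 0)        # (num_samples, H, W, C)


# Hypothetical clip: 120 frames (4 s at 30 fps) of tiny 8x8 RGB images.
# Every pixel of frame t holds the value t, so the selected frames are easy to check.
video = np.zeros((3, 120, 8, 8), dtype=np.float32)
video[:] = np.arange(120, dtype=np.float32).reshape(1, 120, 1, 1)

frames = subsample_to_frames(video, num_samples=16)
print(uniform_subsample_indices(120, 16))
print(frames.shape)  # (16, 8, 8, 3)
```

Spacing the indices across the whole clip, rather than taking the first 16 frames, keeps the sampled frames representative of the full 4-second window.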

Useful Resources

Creating your own video classifier in minutes

Fine-tuning tutorial notebook (PyTorch)

Video Classification demo

Models for Video Classification

MCG-NJU/videomae-base-finetuned-kinetics

A strong video classification model trained on the Kinetics-400 dataset.

microsoft/xclip-base-patch32

A strong video classification model trained on the Kinetics-400 dataset.

Spaces using Video Classification

nateraw/lavila

An application that classifies a video at different timestamps.

fcakyon/video-classification

An application that classifies videos.

Metrics for Video Classification

accuracy: Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed as Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP is the number of true positives, TN true negatives, FP false positives and FN false negatives.

recall: Recall is the fraction of the positive examples that were correctly labeled by the model as positive. It can be computed with the equation Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

precision: Precision is the fraction of correctly labeled positive examples out of all of the examples that were labeled as positive. It can be computed with the equation Precision = TP / (TP + FP), where TP is the number of true positives (examples correctly labeled as positive) and FP is the number of false positives (examples incorrectly labeled as positive).

f1: The F1 score is the harmonic mean of the precision and recall. It can be computed with the equation: F1 = 2 * (precision * recall) / (precision + recall)
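The formulas above are stated for a binary positive/negative split. Video classification is usually multi-class, so a common approach (assumed here, not prescribed by this page) is to compute accuracy as the fraction of correctly labeled videos, and to compute precision, recall and F1 one class at a time by treating that class as positive and all others as negative, then macro-average. The sketch below uses plain Python with hypothetical labels; the helper names are illustrative, and libraries such as scikit-learn offer other averaging modes (micro, weighted).

```python
from typing import Hashable, Sequence, Tuple


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of videos whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def one_vs_rest_counts(y_true, y_pred, positive) -> Tuple[int, int, int, int]:
    """TP, FP, FN, TN when `positive` is treated as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually another class
        elif t == positive:
            fn += 1  # actually positive, predicted another class
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean."""
    tp, fp, fn, _ = one_vs_rest_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over every label that appears (labels must be sortable)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


# Hypothetical ground truth and predictions for six videos.
y_true = ["guitar", "tennis", "cooking", "guitar", "tennis", "guitar"]
y_pred = ["guitar", "guitar", "cooking", "guitar", "tennis", "tennis"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for label in ["guitar", "tennis", "cooking"]:
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro-averaging weights every class equally, so a rare class with poor recall lowers the score as much as a common one; a frequency-weighted average would hide that.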