Video Classification
Video classification is the task of assigning a label or class to an entire video. Videos are expected to have only one class for each video. Video classification models take a video as input and return a prediction about which class the video belongs to.
About Video Classification
Use Cases
Video classification models can be used to categorize what a video is all about.
Activity Recognition
Video classification models are used to perform activity recognition which is useful for fitness applications. Activity recognition is also helpful for vision-impaired individuals especially when they're commuting.
Video Search
Models trained in video classification can improve user experience by organizing and categorizing video galleries on the phone or in the cloud, on multiple keywords or tags.
Inference
Below you can find code for inferring with a pre-trained video classification model.
from transformers import VideoMAEFeatureExtractor, VideoMAEForVideoClassification
from pytorchvideo.transforms import UniformTemporalSubsample
from pytorchvideo.data.encoded_video import EncodedVideo
# Load the video.
video = EncodedVideo.from_path("path_to_video.mp4")
video_data = video.get_clip(start_sec=0, end_sec=4.0)["video"]
# Sub-sample a fixed set of frames and convert them to a NumPy array.
num_frames = 16
subsampler = UniformTemporalSubsample(num_frames)
subsampled_frames = subsampler(video_data)
video_data_np = subsampled_frames.numpy().transpose(1, 2, 3, 0)
# Preprocess the video frames.
inputs = feature_extractor(list(video_data_np), return_tensors="pt")
# Run inference
with torch.no_grad():
outputs = model(**inputs)
logits = outputs.logits
# Model predicts one of the 400 Kinetics 400 classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
# `eating spaghetti` (if you chose this video:
# https://hf.co/datasets/nielsr/video-demo/resolve/main/eating_spaghetti.mp4)
Useful Resources
- Developing a simple video classification model
- Video classification with Transformers
- Building a video archive
- Video classification task guide
Creating your own video classifier in minutes
No example widget is defined for this task.
Note Contribute by proposing a widget for this task !
Note Strong Video Classification model trained on the Kinects 400 dataset.
Note Strong Video Classification model trained on the Kinects 400 dataset.
No example dataset is defined for this task.
Note Contribute by proposing a dataset for this task !
Note An application that classifies video at different timestamps.
Note An application that classifies video.
- accuracy
- Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with: Accuracy = (TP + TN) / (TP + TN + FP + FN) Where: TP: True positive TN: True negative FP: False positive FN: False negative
- recall
- Recall is the fraction of the positive examples that were correctly labeled by the model as positive. It can be computed with the equation: Recall = TP / (TP + FN) Where TP is the true positives and FN is the false negatives.
- precision
- Precision is the fraction of correctly labeled positive examples out of all of the examples that were labeled as positive. It is computed via the equation: Precision = TP / (TP + FP) where TP is the True positives (i.e. the examples correctly labeled as positive) and FP is the False positive examples (i.e. the examples incorrectly labeled as positive).
- f1
- The F1 score is the harmonic mean of the precision and recall. It can be computed with the equation: F1 = 2 * (precision * recall) / (precision + recall)