Datasets:

Bingsu
/

Human_Action_Recognition

Tasks:

Image Classification

Languages: English

Size Categories: 10K<n<100K

Source Datasets: original

License: odbl

Dataset card Files Files and versions Community

Dataset Viewer

Auto-converted to Parquet

Go to dataset viewer

Viewer

image image	labels class label 15 classes
	11 (sitting)
	14 (using_laptop)
	7 (hugging)
	12 (sleeping)
	14 (using_laptop)
	12 (sleeping)
	4 (drinking)
	7 (hugging)
	1 (clapping)
	3 (dancing)
	2 (cycling)
	4 (drinking)
	1 (clapping)
	0 (calling)
	12 (sleeping)
	4 (drinking)
	0 (calling)
	8 (laughing)
	14 (using_laptop)
	14 (using_laptop)
	1 (clapping)
	5 (eating)
	6 (fighting)
	9 (listening_to_music)
	3 (dancing)
	4 (drinking)
	2 (cycling)
	8 (laughing)
	4 (drinking)
	3 (dancing)
	9 (listening_to_music)
	2 (cycling)
	11 (sitting)
	11 (sitting)
	12 (sleeping)
	5 (eating)
	10 (running)
	10 (running)
	7 (hugging)
	7 (hugging)
	0 (calling)
	14 (using_laptop)
	9 (listening_to_music)
	2 (cycling)
	12 (sleeping)
	7 (hugging)
	13 (texting)
	12 (sleeping)
	14 (using_laptop)
	10 (running)
	7 (hugging)
	0 (calling)
	14 (using_laptop)
	4 (drinking)
	7 (hugging)
	2 (cycling)
	8 (laughing)
	0 (calling)
	1 (clapping)
	8 (laughing)
	13 (texting)
	11 (sitting)
	8 (laughing)
	14 (using_laptop)
	12 (sleeping)
	9 (listening_to_music)
	2 (cycling)
	1 (clapping)
	13 (texting)
	13 (texting)
	5 (eating)
	2 (cycling)
	4 (drinking)
	0 (calling)
	10 (running)
	4 (drinking)
	13 (texting)
	11 (sitting)
	0 (calling)
	7 (hugging)
	5 (eating)
	12 (sleeping)
	14 (using_laptop)
	14 (using_laptop)
	10 (running)
	0 (calling)
	14 (using_laptop)
	14 (using_laptop)
	9 (listening_to_music)
	0 (calling)
	3 (dancing)
	8 (laughing)
	2 (cycling)
	1 (clapping)
	14 (using_laptop)
	0 (calling)
	3 (dancing)
	4 (drinking)
	13 (texting)
	6 (fighting)

Dataset Summary

A dataset from kaggle. origin: https://dphi.tech/challenges/data-sprint-76-human-activity-recognition/233/data

Introduction

The dataset features 15 different classes of Human Activities.
The dataset contains about 12k+ labelled images including the validation images.
Each image has only one human activity category and are saved in separate folders of the labelled classes

PROBLEM STATEMENT

Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications, and therefore has been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenarios.
Consequently, lots of existing works have attempted to investigate different types of approaches for HAR using various modalities.
Your Task is to build an Image Classification Model using CNN that classifies to which class of activity a human is performing.

About Files

Train - contains all the images that are to be used for training your model. In this folder you will find 15 folders namely - 'calling', ’clapping’, ’cycling’, ’dancing’, ‘drinking’, ‘eating’, ‘fighting’, ‘hugging’, ‘laughing’, ‘listeningtomusic’, ‘running’, ‘sitting’, ‘sleeping’, texting’, ‘using_laptop’ which contain the images of the respective human activities.
Test - contains 5400 images of Human Activities. For these images you are required to make predictions as the respective class names -'calling', ’clapping’, ’cycling’, ’dancing’, ‘drinking’, ‘eating’, ‘fighting’, ‘hugging’, ‘laughing’, ‘listeningtomusic’, ‘running’, ‘sitting’, ‘sleeping’, texting’, ‘using_laptop’.
Testing_set.csv - this is the order of the predictions for each image that is to be submitted on the platform. Make sure the predictions you download are with their image’s filename in the same order as given in this file.
sample_submission: This is a csv file that contains the sample submission for the data sprint.

Data Fields

The data instances have the following fields:

image: A PIL.Image.Image object containing the image. Note that when accessing the image column: dataset[0]["image"] the image file is automatically decoded. Decoding of a large number of image files might take a significant amount of time. Thus it is important to first query the sample index before the "image" column, i.e. dataset[0]["image"] should always be preferred over dataset["image"][0].
labels: an int classification label. All test data is labeled 0.

Class Label Mappings:

{
    'calling': 0,
    'clapping': 1,
    'cycling': 2,
    'dancing': 3,
    'drinking': 4,
    'eating': 5,
    'fighting': 6,
    'hugging': 7,
    'laughing': 8,
    'listening_to_music': 9,
    'running': 10,
    'sitting': 11,
    'sleeping': 12,
    'texting': 13,
    'using_laptop': 14
}

Data Splits

	train	test
# of examples	12600	5400

Data Size

download: 311.96 MiB
generated: 312.59 MiB
total: 624.55 MiB

>>> from datasets import load_dataset

>>> ds = load_dataset("Bingsu/Human_Action_Recognition")
>>> ds
DatasetDict({
    test: Dataset({
        features: ['image', 'labels'],
        num_rows: 5400
    })
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 12600
    })
})

>>> ds["train"].features
{'image': Image(decode=True, id=None),
 'labels': ClassLabel(num_classes=15, names=['calling', 'clapping', 'cycling', 'dancing', 'drinking', 'eating', 'fighting', 'hugging', 'laughing', 'listening_to_music', 'running', 'sitting', 'sleeping', 'texting', 'using_laptop'], id=None)}
 
>>> ds["train"][0]
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=240x160>,
 'labels': 11}