Datasets:

ehcalabres
/

ravdess_speech

Tasks:

Audio Classification

Languages: English

Multilinguality: monolingual

Size Categories: 1K<n<10K

Language Creators: found

Annotations Creators: no-annotation

Source Datasets: original

License: cc-by-nc-sa-4.0

Dataset card Files Files and versions Community

The dataset is currently empty. Upload or create new data files. Then, you will be able to explore them in the Dataset Viewer.

YAML Metadata Warning: The task_ids "speech-emotion-recognition" is not in the official list: acceptability-classification, entity-linking-classification, fact-checking, intent-classification, language-identification, multi-class-classification, multi-label-classification, multi-input-text-classification, natural-language-inference, semantic-similarity-classification, sentiment-classification, topic-classification, semantic-similarity-scoring, sentiment-scoring, sentiment-analysis, hate-speech-detection, text-scoring, named-entity-recognition, part-of-speech, parsing, lemmatization, word-sense-disambiguation, coreference-resolution, extractive-qa, open-domain-qa, closed-domain-qa, news-articles-summarization, news-articles-headline-generation, dialogue-generation, dialogue-modeling, language-modeling, text-simplification, explanation-generation, abstractive-qa, open-domain-abstractive-qa, closed-domain-qa, open-book-qa, closed-book-qa, slot-filling, masked-language-modeling, keyword-spotting, speaker-identification, audio-intent-classification, audio-emotion-recognition, audio-language-identification, multi-label-image-classification, multi-class-image-classification, face-detection, vehicle-detection, instance-segmentation, semantic-segmentation, panoptic-segmentation, image-captioning, grasping, task-planning, tabular-multi-class-classification, tabular-multi-label-classification, tabular-single-column-regression, rdf-to-text, multiple-choice-qa, multiple-choice-coreference-resolution, document-retrieval, utterance-retrieval, entity-linking-retrieval, fact-checking-retrieval, univariate-time-series-forecasting, multivariate-time-series-forecasting, visual-question-answering, document-question-answering

Dataset Card for ravdess_speech

Dataset Summary

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. The conditions of the audio files are: 16bit, 48kHz .wav.

Supported Tasks and Leaderboards

audio-classification: The dataset can be used to train a model for Audio Classification tasks, which consists in predict the latent emotion presented on the audios.

Languages

The audios available in the dataset are in English spoken by actors in a neutral North American accent.

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

[Needs More Information]

Data Splits

[Needs More Information]

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

The RAVDESS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0

Commercial licenses for the RAVDESS can also be purchased. For more information, please visit our license fee page, or contact us at [email protected].

Citation Information

Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

Downloads last month: 173

Edit dataset card Evaluate models HF Leaderboard

Models trained or fine-tuned on ehcalabres/ravdess_speech

unaidedelf87777/age_emotion_gender-classifier-for-audio-wav2vec2

Audio Classification • Updated May 9 • 63 • 4