Datasets:
The dataset viewer is not available for this subset.
Need help to make the dataset viewer work? Open a discussion for direct support.
Dataset Card for RAVDESS
Dataset Summary
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) Speech audio-only files (16bit, 48kHz .wav) from the RAVDESS. Full dataset of speech and song, audio and video (24.8 GB) available from Zenodo. Construction and perceptual validation of the RAVDESS is described in our Open Access paper in PLoS ONE.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
English
Dataset Structure
The dataset repository contains only preprocessing scripts. When loaded and a cached version is not found, the dataset will be automatically downloaded and a .tsv file created with all data instances saved as rows in a table.
Data Instances
[More Information Needed]
Data Fields
- "audio": a datasets.Audio representation of the spoken utterance,
- "text": a datasets.Value string representation of spoken utterance,
- "labels": a datasets.ClassLabel representation of the emotion label,
- "speaker_id": a datasets.Value string representation of the speaker ID,
- "speaker_gender": a datasets.Value string representation of the speaker gender
Data Splits
All data is in the train partition.
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Original Data from the Zenodo release of the RAVDESS Dataset:
Files
This portion of the RAVDESS contains 1440 files: 60 trials per actor x 24 actors = 1440. The RAVDESS contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech emotions includes calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression.
File naming convention
Each of the 1440 files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 03-01-06-01-02-01-12.wav). These identifiers define the stimulus characteristics:
Filename identifiers
Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
Vocal channel (01 = speech, 02 = song).
Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
Repetition (01 = 1st repetition, 02 = 2nd repetition).
Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
Filename example: 03-01-06-01-02-01-12.wav
Audio-only (03) Speech (01) Fearful (06) Normal intensity (01) Statement "dogs" (02) 1st Repetition (01) 12th Actor (12) Female, as the actor ID number is even.
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
(CC BY-NC-SA 4.0)[https://creativecommons.org/licenses/by-nc-sa/4.0/]
Citation Information
How to cite the RAVDESS
Academic citation
If you use the RAVDESS in an academic publication, please use the following citation: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.
All other attributions
If you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NA-SC 4.0.
Contributions
Thanks to @narad for adding this dataset.
- Downloads last month
- 1,609