Dataset: cdminix/libritts-r-aligned
Tasks: Automatic Speech Recognition, Text-to-Speech
Languages: English
Annotations Creators: crowdsourced
Tags: speech, audio, automatic-speech-recognition
License: cc-by-4.0


This dataset is identical to cdminix/libritts-aligned, except that it uses the newly released LibriTTS-R corpus. Please cite Y. Koizumi et al., "LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus", Interspeech 2023.

When using this dataset to download LibriTTS-R, make sure you agree to the terms at https://www.openslr.org.

Dataset Card for LibriTTS-R with Forced Alignments (and Measures)

This dataset downloads LibriTTS-R and preprocesses it on your machine, creating alignments with the Montreal Forced Aligner. You need to run pip install alignments phones torchaudio before using this dataset. The first run can take an hour or two, but subsequent runs are much faster.

Requirements

pip install alignments phones torchaudio (required)
pip install speech-collator (optional)

Note: this corpus requires alignments version 0.0.15 or later.

Example Item

{
    'id': '100_122655_000073_000002.wav',
    'speaker': '100',
    'text': 'the day after, diana and mary quitted it for distant b.',
    'start': 0.0,
    'end': 3.6500000953674316, 
    'phones': ['[SILENCE]', 'ð', 'ʌ', '[SILENCE]', 'd', 'eɪ', '[SILENCE]', 'æ', 'f', 't', 'ɜ˞', '[COMMA]', 'd', 'aɪ', 'æ', 'n', 'ʌ', '[SILENCE]', 'æ', 'n', 'd', '[SILENCE]', 'm', 'ɛ', 'ɹ', 'i', '[SILENCE]', 'k', 'w', 'ɪ', 't', 'ɪ', 'd', '[SILENCE]', 'ɪ', 't', '[SILENCE]', 'f', 'ɜ˞', '[SILENCE]', 'd', 'ɪ', 's', 't', 'ʌ', 'n', 't', '[SILENCE]', 'b', 'i', '[FULL STOP]'], 
    'phone_durations': [5, 2, 4, 0, 5, 13, 0, 16, 7, 5, 20, 2, 6, 9, 15, 4, 2, 0, 11, 3, 5, 0, 3, 8, 9, 8, 0, 13, 3, 5, 3, 6, 4, 0, 8, 5, 0, 9, 5, 0, 7, 5, 6, 7, 4, 5, 10, 0, 3, 35, 9],
    'audio': '/dev/shm/metts/train-clean-360-alignments/100/100_122655_000073_000002.wav'
}

The phones are IPA phones, and the phone durations are given in frames, assuming a hop length of 256 samples, a sample rate of 22050 Hz and a window length of 1024 samples. These settings can be changed with the hop_length, sample_rate and window_length arguments to LibriTTSAlign.
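As a sanity check on these units, the sketch below converts frame counts back to seconds (seconds = frames × hop_length / sample_rate) using the phone_durations of the example item above. It assumes the default hop length of 256 and sample rate of 22050 Hz; frames_to_seconds and phone_spans are hypothetical helpers for illustration, not part of the loading script.

```python
# Default framing parameters documented for this dataset (assumed unchanged).
HOP_LENGTH = 256      # samples between consecutive frames
SAMPLE_RATE = 22050   # samples per second


def frames_to_seconds(frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Each frame advances hop_length samples, so frames * hop / sr gives seconds."""
    return frames * hop_length / sample_rate


def phone_spans(phones, durations, offset=0.0):
    """Hypothetical helper: (phone, start_s, end_s) tuples from frame durations."""
    if len(phones) != len(durations):
        raise ValueError("phones and phone_durations must have the same length")
    spans, frame = [], 0
    for phone, dur in zip(phones, durations):
        start = offset + frames_to_seconds(frame)
        frame += dur
        spans.append((phone, start, offset + frames_to_seconds(frame)))
    return spans


# phone_durations copied from the example item
durations = [5, 2, 4, 0, 5, 13, 0, 16, 7, 5, 20, 2, 6, 9, 15, 4, 2, 0, 11, 3, 5,
             0, 3, 8, 9, 8, 0, 13, 3, 5, 3, 6, 4, 0, 8, 5, 0, 9, 5, 0, 7, 5, 6,
             7, 4, 5, 10, 0, 3, 35, 9]

total_frames = sum(durations)
print(total_frames, round(frames_to_seconds(total_frames), 3))  # 314 3.646
print(phone_spans(["[SILENCE]", "ð", "ʌ"], durations[:3]))
```

The 314 frames correspond to about 3.646 s, which is within one frame (256 / 22050 ≈ 0.0116 s) of the item's end time of 3.65 s.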

Data Collator

This dataset comes with a data collator that can be used to create training batches. Install it with pip install speech-collator (MiniXC/speech-collator) and use it as follows:

import json
from datasets import load_dataset
from speech_collator import SpeechCollator
from torch.utils.data import DataLoader

dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# Load the speaker and phone index mappings
with open("speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("phone2idx.json") as f:
    phone2idx = json.load(f)

collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
)
dataloader = DataLoader(dataset, collate_fn=collator.collate_fn, batch_size=8)
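A collator's job is to turn a list of variable-length items into fixed-size batches. The sketch below is a hypothetical, simplified stand-in written for illustration (toy_collate is not part of speech-collator): it maps phones to indices with a fallback unknown index, pads phone IDs and durations to the longest item, and returns a mask marking real positions. The actual SpeechCollator output may differ in keys and types.

```python
def toy_collate(batch, phone2idx, unk_idx=0, pad_idx=0):
    """Hypothetical padding collate function (not speech_collator's implementation)."""
    # Map each phone to its index; unseen phones fall back to unk_idx.
    phone_ids = [[phone2idx.get(p, unk_idx) for p in item["phones"]] for item in batch]
    durations = [list(item["phone_durations"]) for item in batch]
    max_len = max(len(ids) for ids in phone_ids)

    def pad(seq, value):
        return seq + [value] * (max_len - len(seq))

    return {
        "phones": [pad(ids, pad_idx) for ids in phone_ids],
        # Padded positions last zero frames
        "phone_durations": [pad(d, 0) for d in durations],
        # 1 for real phones, 0 for padding (needed because pad_idx may equal unk_idx)
        "mask": [pad([1] * len(ids), 0) for ids in phone_ids],
    }


# Two toy items with the same keys as the dataset's example item
batch = [
    {"phones": ["[SILENCE]", "b", "i"], "phone_durations": [5, 3, 35]},
    {"phones": ["ð", "ʌ"], "phone_durations": [2, 4]},
]
phone2idx = {"[SILENCE]": 1, "b": 2, "i": 3, "ð": 4}  # "ʌ" deliberately missing
out = toy_collate(batch, phone2idx)
print(out["phones"])  # [[1, 2, 3], [4, 0, 0]]
```

Because DataLoader passes a list of dataset items to collate_fn, a function like this could be plugged in with collate_fn=lambda b: toy_collate(b, phone2idx).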

You can either download the speaker2idx.json and phone2idx.json files from here or create them yourself using the following code:

import json
from datasets import load_dataset
from speech_collator import create_speaker2idx, create_phone2idx

dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# Create speaker2idx and phone2idx (unk_idx is the index used for unknown entries)
speaker2idx = create_speaker2idx(dataset, unk_idx=0)
phone2idx = create_phone2idx(dataset, unk_idx=0)

# Save both mappings to JSON
with open("speaker2idx.json", "w") as f:
    json.dump(speaker2idx, f)
with open("phone2idx.json", "w") as f:
    json.dump(phone2idx, f)
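To show what such mappings look like, the sketch below builds comparable dictionaries from in-memory items. build_index is a hypothetical helper, not speech_collator's create_speaker2idx or create_phone2idx, whose ordering and handling of unk_idx may differ; here unk_idx is simply reserved and never assigned to a real symbol. It also checks that the mappings survive a JSON round trip, since the keys are strings and the values are integers.

```python
import json


def build_index(symbols, unk_idx=0):
    """Hypothetical: assign each distinct symbol an integer, skipping unk_idx."""
    index, next_idx = {}, 0
    for symbol in sorted(set(symbols)):
        if next_idx == unk_idx:  # keep unk_idx free for unseen symbols
            next_idx += 1
        index[symbol] = next_idx
        next_idx += 1
    return index


# Toy items shaped like the dataset's example item
items = [
    {"speaker": "100", "phones": ["[SILENCE]", "ð", "ʌ"]},
    {"speaker": "103", "phones": ["ð", "b", "i"]},
]
speaker2idx = build_index(item["speaker"] for item in items)
phone2idx = build_index(p for item in items for p in item["phones"])

print(speaker2idx)  # {'100': 1, '103': 2}
# JSON object keys must be strings; speaker IDs and phones already are.
assert json.loads(json.dumps(phone2idx)) == phone2idx
```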

Measures

When using speech-collator, you can also use the measures argument to specify which measures to compute. The following example extracts pitch and energy on the fly.

import json
from torch.utils.data import DataLoader
from datasets import load_dataset
from speech_collator import SpeechCollator
from speech_collator.measures import PitchMeasure, EnergyMeasure

dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# Adjust these paths to wherever you saved the mappings
with open("data/speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("data/phone2idx.json") as f:
    phone2idx = json.load(f)

# Create a SpeechCollator that computes pitch and energy on the fly
speech_collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
    measures=[PitchMeasure(), EnergyMeasure()],
    return_keys=["measures"],
)

# Create DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=8,
    collate_fn=speech_collator.collate_fn,
)

COMING SOON: Detailed documentation on how to use the measures at MiniXC/speech-collator.
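Pitch and energy are frame-level signals, while the alignments are phone-level. One common way to relate the two in duration-based TTS models is to average a frame-level track over each phone's frames using phone_durations. The sketch below assumes the track was computed with the same hop length as the durations; phone_average is a hypothetical helper and does not necessarily reflect how speech-collator's measures are implemented.

```python
import math


def phone_average(frame_values, durations):
    """Hypothetical: average a frame-level track per phone; zero-length phones get NaN."""
    if sum(durations) > len(frame_values):
        raise ValueError("durations cover more frames than the track provides")
    averages, pos = [], 0
    for dur in durations:
        segment = frame_values[pos:pos + dur]
        averages.append(sum(segment) / dur if dur > 0 else math.nan)
        pos += dur
    # Any trailing frames beyond the last phone are ignored.
    return averages


# Toy energy track of 6 frames and three phones lasting 2, 0 and 3 frames
energy = [1.0, 3.0, 5.0, 7.0, 9.0, 0.5]
print(phone_average(energy, [2, 0, 3]))  # [2.0, nan, 7.0]
```

Zero-duration phones, such as the [SILENCE] entries with duration 0 in the example item, receive NaN here; a real pipeline would need its own policy for them.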

Splits

This dataset has the following splits:

train: All the training data except one sample per speaker, which is held out for validation.
dev: The validation data, one sample per speaker.
train.clean.100: Training set derived from the original materials of the train-clean-100 subset of LibriSpeech.
train.clean.360: Training set derived from the original materials of the train-clean-360 subset of LibriSpeech.
train.other.500: Training set derived from the original materials of the train-other-500 subset of LibriSpeech.
dev.clean: Validation set derived from the original materials of the dev-clean subset of LibriSpeech.
dev.other: Validation set derived from the original materials of the dev-other subset of LibriSpeech.
test.clean: Test set derived from the original materials of the test-clean subset of LibriSpeech.
test.other: Test set derived from the original materials of the test-other subset of LibriSpeech.
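The relationship between train and dev can be illustrated with a one-sample-per-speaker holdout. The sketch below is hypothetical: the card does not say which sample is held out for each speaker, so this version simply takes the item with the smallest id.

```python
from collections import defaultdict


def hold_out_one_per_speaker(items):
    """Hypothetical: move one item per speaker into dev, keep the rest in train."""
    by_speaker = defaultdict(list)
    for item in items:
        by_speaker[item["speaker"]].append(item)
    train, dev = [], []
    for speaker in sorted(by_speaker):
        group = sorted(by_speaker[speaker], key=lambda it: it["id"])
        dev.append(group[0])      # assumption: the smallest id is held out
        train.extend(group[1:])
    return train, dev


items = [
    {"id": "100_a.wav", "speaker": "100"},
    {"id": "100_b.wav", "speaker": "100"},
    {"id": "103_a.wav", "speaker": "103"},
]
train, dev = hold_out_one_per_speaker(items)
print([it["id"] for it in dev])    # ['100_a.wav', '103_a.wav']
print([it["id"] for it in train])  # ['100_b.wav']
```

Under this rule, a speaker with a single recording contributes nothing to train.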

Environment Variables

The following environment variables can be set:

LIBRITTS_VERBOSE: If set, prints more information about the dataset creation process.
LIBRITTS_MAX_WORKERS: The number of workers used to create the alignments. Defaults to cpu_count().
LIBRITTS_PATH: The path LibriTTS-R is downloaded to. Defaults to the value of HF_DATASETS_CACHE.
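The sketch below shows one way to read these variables with the documented defaults. libritts_settings is a hypothetical helper for illustration; the loading script's own parsing (for example, how it treats an empty LIBRITTS_VERBOSE) may differ. In practice, set the variables in your shell or via os.environ before calling load_dataset.

```python
import os


def libritts_settings(env=None):
    """Hypothetical reader for the documented LIBRITTS_* variables."""
    env = os.environ if env is None else env
    return {
        # "If set": in this sketch any value, even an empty string, enables it
        "verbose": "LIBRITTS_VERBOSE" in env,
        # Documented default: cpu_count(); os.cpu_count() can return None
        "max_workers": int(env.get("LIBRITTS_MAX_WORKERS", os.cpu_count() or 1)),
        # Documented default: the value of HF_DATASETS_CACHE (None here if unset)
        "path": env.get("LIBRITTS_PATH", env.get("HF_DATASETS_CACHE")),
    }


settings = libritts_settings({"LIBRITTS_MAX_WORKERS": "4", "HF_DATASETS_CACHE": "/tmp/hf"})
print(settings)  # {'verbose': False, 'max_workers': 4, 'path': '/tmp/hf'}
```

For example, launching your training script with LIBRITTS_MAX_WORKERS=8 LIBRITTS_VERBOSE=1 in the environment would request eight alignment workers and verbose output.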

Citation

When using LibriTTS-R, please cite the following papers:

When using the measures, please cite the following paper (ours):

Evaluating and reducing the distance between synthetic and real speech distributions
