This dataset is identical to cdminix/libritts-aligned except it uses the newly released LibriTTS-R corpus. Please cite Y. Koizumi, et al., "LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus", Interspeech 2023
When using this dataset to download LibriTTS-R, make sure you agree to the terms on https://www.openslr.org
Dataset Card for LibriTTS-R with Forced Alignments (and Measures)
This dataset downloads LibriTTS-R and preprocesses it on your machine to create alignments using the Montreal Forced Aligner.
You need to run pip install alignments phones before using this dataset.
When running this the first time, it can take an hour or two, but subsequent runs will be lightning fast.
Requirements
- pip install alignments phones (required)
- pip install speech-collator (optional)
Note: version >=0.0.15 of alignments is required for this corpus.
Example Item
{
'id': '100_122655_000073_000002.wav',
'speaker': '100',
'text': 'the day after, diana and mary quitted it for distant b.',
'start': 0.0,
'end': 3.6500000953674316,
'phones': ['[SILENCE]', 'ð', 'ʌ', '[SILENCE]', 'd', 'eɪ', '[SILENCE]', 'æ', 'f', 't', 'ɜ˞', '[COMMA]', 'd', 'aɪ', 'æ', 'n', 'ʌ', '[SILENCE]', 'æ', 'n', 'd', '[SILENCE]', 'm', 'ɛ', 'ɹ', 'i', '[SILENCE]', 'k', 'w', 'ɪ', 't', 'ɪ', 'd', '[SILENCE]', 'ɪ', 't', '[SILENCE]', 'f', 'ɜ˞', '[SILENCE]', 'd', 'ɪ', 's', 't', 'ʌ', 'n', 't', '[SILENCE]', 'b', 'i', '[FULL STOP]'],
'phone_durations': [5, 2, 4, 0, 5, 13, 0, 16, 7, 5, 20, 2, 6, 9, 15, 4, 2, 0, 11, 3, 5, 0, 3, 8, 9, 8, 0, 13, 3, 5, 3, 6, 4, 0, 8, 5, 0, 9, 5, 0, 7, 5, 6, 7, 4, 5, 10, 0, 3, 35, 9],
'audio': '/dev/shm/metts/train-clean-360-alignments/100/100_122655_000073_000002.wav'
}
The phones are IPA phones, and the phone durations are in frames (assuming a hop length of 256, sample rate of 22050 and window length of 1024). These attributes can be changed using the hop_length, sample_rate and window_length arguments to LibriTTSAlign.
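As a quick check of the frame convention above, the following minimal sketch converts the phone_durations of the example item to seconds, assuming the default hop length of 256 and sample rate of 22050. The helper name frames_to_seconds is illustrative and is not part of the dataset or of the alignments package.

```python
HOP_LENGTH = 256      # default hop length stated in this card
SAMPLE_RATE = 22050   # default sample rate stated in this card

# phone_durations of the example item above, in frames
phone_durations = [
    5, 2, 4, 0, 5, 13, 0, 16, 7, 5, 20, 2, 6, 9, 15, 4, 2, 0, 11, 3,
    5, 0, 3, 8, 9, 8, 0, 13, 3, 5, 3, 6, 4, 0, 8, 5, 0, 9, 5, 0,
    7, 5, 6, 7, 4, 5, 10, 0, 3, 35, 9,
]


def frames_to_seconds(frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Convert a frame count to seconds (illustrative helper)."""
    return frames * hop_length / sample_rate


# Duration of each phone in seconds, and the total utterance length.
durations_sec = [frames_to_seconds(d) for d in phone_durations]
total_sec = frames_to_seconds(sum(phone_durations))
print(f"{sum(phone_durations)} frames = {total_sec:.3f} s")
```

The total comes out close to the item's 'end' value of about 3.65 seconds; the small difference is expected because durations are rounded to whole frames.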
Data Collator
This dataset comes with a data collator which can be used to create batches of data for training.
It can be installed using pip install speech-collator (MiniXC/speech-collator) and can be used as follows:
import json
from datasets import load_dataset
from speech_collator import SpeechCollator
from torch.utils.data import DataLoader

# the LibriTTS-R variant described in this card
dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# load the speaker and phone index mappings
with open("speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("phone2idx.json") as f:
    phone2idx = json.load(f)

collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
)
dataloader = DataLoader(dataset, collate_fn=collator.collate_fn, batch_size=8)
You can either download the speaker2idx.json and phone2idx.json files from here or create them yourself using the following code:
import json
from datasets import load_dataset
from speech_collator import create_speaker2idx, create_phone2idx

# the LibriTTS-R variant described in this card
dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# Create speaker2idx and phone2idx
speaker2idx = create_speaker2idx(dataset, unk_idx=0)
phone2idx = create_phone2idx(dataset, unk_idx=0)

# save to json
with open("speaker2idx.json", "w") as f:
    json.dump(speaker2idx, f)
with open("phone2idx.json", "w") as f:
    json.dump(phone2idx, f)
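To illustrate what an index mapping with a reserved unknown index can look like, here is a plain-Python sketch. It is an assumption about the general idea only: the exact format produced by create_speaker2idx and create_phone2idx is not shown in this card, and the function build_index below is hypothetical.

```python
def build_index(values, unk_idx=0):
    """Map each distinct value to an integer, skipping the index reserved for unknowns.

    Hypothetical helper for illustration; not the speech-collator implementation.
    """
    index = {}
    next_idx = 0
    for value in sorted(set(values)):
        if next_idx == unk_idx:
            next_idx += 1  # leave unk_idx free for unseen values
        index[value] = next_idx
        next_idx += 1
    return index


speakers = ["100", "200", "100", "150"]
speaker_index = build_index(speakers, unk_idx=0)

# Unseen speakers fall back to the reserved unknown index.
unknown = speaker_index.get("999", 0)
print(speaker_index, unknown)
```

Reserving one index for unknown values lets a model handle speakers or phones at inference time that never appeared in the training split.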
Measures
When using speech-collator you can also use the measures argument to specify which measures to use. The following example extracts Pitch and Energy on the fly.
import json
from torch.utils.data import DataLoader
from datasets import load_dataset
from speech_collator import SpeechCollator
from speech_collator.measures import PitchMeasure, EnergyMeasure

# the LibriTTS-R variant described in this card
dataset = load_dataset("cdminix/libritts-r-aligned", split="train")

# load the speaker and phone index mappings
with open("data/speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("data/phone2idx.json") as f:
    phone2idx = json.load(f)

# Create SpeechCollator
speech_collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
    measures=[PitchMeasure(), EnergyMeasure()],
    return_keys=["measures"],
)

# Create DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=8,
    collate_fn=speech_collator.collate_fn,
)
COMING SOON: Detailed documentation on how to use the measures at MiniXC/speech-collator.
Splits
This dataset has the following splits:
- train: All the training data, except one sample per speaker which is used for validation.
- dev: The validation data, one sample per speaker.
- train.clean.100: Training set derived from the original materials of the train-clean-100 subset of LibriSpeech.
- train.clean.360: Training set derived from the original materials of the train-clean-360 subset of LibriSpeech.
- train.other.500: Training set derived from the original materials of the train-other-500 subset of LibriSpeech.
- dev.clean: Validation set derived from the original materials of the dev-clean subset of LibriSpeech.
- dev.other: Validation set derived from the original materials of the dev-other subset of LibriSpeech.
- test.clean: Test set derived from the original materials of the test-clean subset of LibriSpeech.
- test.other: Test set derived from the original materials of the test-other subset of LibriSpeech.
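The relationship between the train and dev splits (one sample per speaker held out for validation) can be sketched as follows. This is only an illustration of the idea: the card does not say which sample is chosen for each speaker, so holding out the first one seen is an assumption, and split_one_per_speaker is a hypothetical helper.

```python
def split_one_per_speaker(items):
    """Hold out the first item seen for each speaker as validation data.

    Hypothetical sketch; the dataset's own selection rule is not documented here.
    """
    seen_speakers = set()
    train, dev = [], []
    for item in items:
        if item["speaker"] in seen_speakers:
            train.append(item)
        else:
            seen_speakers.add(item["speaker"])
            dev.append(item)
    return train, dev


# Toy items using the same 'id' and 'speaker' keys as the example item above.
items = [
    {"id": "a_1.wav", "speaker": "a"},
    {"id": "a_2.wav", "speaker": "a"},
    {"id": "b_1.wav", "speaker": "b"},
    {"id": "a_3.wav", "speaker": "a"},
    {"id": "b_2.wav", "speaker": "b"},
]
train_items, dev_items = split_one_per_speaker(items)
print(len(train_items), len(dev_items))
```

With this scheme every speaker in the training data also appears in the validation data, which suits models that rely on a fixed set of known speakers.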
Environment Variables
There are a few environment variables which can be set.
- LIBRITTS_VERBOSE: If set, will print out more information about the dataset creation process.
- LIBRITTS_MAX_WORKERS: The number of workers to use when creating the alignments. Defaults to cpu_count().
- LIBRITTS_PATH: The path to download LibriTTS to. Defaults to the value of HF_DATASETS_CACHE.
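These variables can also be set from Python before the dataset is loaded, as in the minimal sketch below. The values are illustrative assumptions: the card only says LIBRITTS_VERBOSE takes effect "if set", so "1" is an arbitrary choice, and the path is a hypothetical location.

```python
import os

# Set the variables before the dataset is loaded so the loading script can read them.
os.environ["LIBRITTS_VERBOSE"] = "1"             # any value; the card only says "if set"
os.environ["LIBRITTS_MAX_WORKERS"] = "4"         # use 4 workers instead of cpu_count()
os.environ["LIBRITTS_PATH"] = "/tmp/libritts-r"  # hypothetical download location

# The dataset would then be loaded as in the examples above, e.g.:
# from datasets import load_dataset
# dataset = load_dataset(..., split="train")
```

Environment variables are always strings, so numeric settings such as the worker count are passed as text.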
Citation
When using LibriTTS-R please cite the following papers:
- LibriTTS-R: Restoration of a Large-Scale Multi-Speaker TTS Corpus
- LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
- Montreal Forced Aligner: Trainable text-speech alignment using Kaldi
When using the Measures please cite the following paper (ours):