Dataset Card for asvspoof2019
Dataset Summary
This is the database used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge, ASVspoof 2019 for short (http://www.asvspoof.org), organized in 2019 by Junichi Yamagishi, Massimiliano Todisco, Md Sahidullah, Héctor Delgado, Xin Wang, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Ville Vestman, and Andreas Nautsch.
Supported Tasks and Leaderboards
[Needs More Information]
Languages
English
Dataset Structure
Data Instances
{'speaker_id': 'LA_0091',
'audio_file_name': 'LA_T_8529430',
'audio': {'path': 'D:/Users/80304531/.cache/huggingface/datasets/downloads/extracted/8cabb6d5c283b0ed94b2219a8d459fea8e972ce098ef14d8e5a97b181f850502/LA/ASVspoof2019_LA_train/flac/LA_T_8529430.flac',
'array': array([-0.00201416, -0.00234985, -0.0022583 , ..., 0.01309204,
0.01339722, 0.01461792], dtype=float32),
'sampling_rate': 16000},
'system_id': 'A01',
'key': 1}
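A quick sanity check on an instance like the one above is to derive the clip length from the decoded array and the sampling rate. The following is a minimal sketch: `clip_duration_seconds` is a hypothetical helper (not part of the dataset script or the `datasets` library), and `example_audio` is a synthetic stand-in with the same keys as the `audio` field, not real data from the corpus.

```python
def clip_duration_seconds(audio: dict) -> float:
    """Return the clip length in seconds from a decoded audio dict."""
    # "array" holds the decoded waveform; "sampling_rate" is in Hz.
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in shaped like the "audio" field shown above:
# 32000 zero-valued samples at 16 kHz, with a made-up file name.
example_audio = {
    "path": "hypothetical_clip.flac",
    "array": [0.0] * 32000,
    "sampling_rate": 16000,
}

print(clip_duration_seconds(example_audio))  # 2.0
```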
Data Fields
Logical access (LA):
- `speaker_id`: `LA_****`, a 4-digit speaker ID
- `audio_file_name`: name of the audio file
- `audio`: A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when accessing the audio column, `dataset[0]["audio"]`, the audio file is automatically decoded and resampled to `dataset.features["audio"].sampling_rate`. Decoding and resampling a large number of audio files might take a significant amount of time. It is therefore important to query the sample index before the `"audio"` column, i.e. `dataset[0]["audio"]` should always be preferred over `dataset["audio"][0]`.
- `system_id`: ID of the speech spoofing system (A01 - A19); for bonafide speech, `system_id` is left blank ('-')
- `key`: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
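The access-order advice for the `audio` field can be illustrated with a toy model. The sketch below does not use the `datasets` library, and the library's internals may differ between versions; `LazyAudioColumn` is a hypothetical class that simply counts how many files would be decoded under row-first versus column-first access.

```python
class LazyAudioColumn:
    """Hypothetical stand-in for a dataset that decodes audio on access."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0  # how many files have been "decoded"

    def _decode(self, path):
        # Placeholder for the expensive decode-and-resample step.
        self.decode_calls += 1
        return {"path": path, "array": [0.0], "sampling_rate": 16000}

    def row(self, index):
        # Row-first access, analogous to dataset[0]["audio"]:
        # only the requested file is decoded.
        return {"audio": self._decode(self.paths[index])}

    def column(self):
        # Column-first access, analogous to dataset["audio"]:
        # every file is decoded before any indexing happens.
        return [self._decode(p) for p in self.paths]


toy = LazyAudioColumn(f"clip_{i}.flac" for i in range(1000))

toy.row(0)["audio"]
row_first_calls = toy.decode_calls  # one decode

toy.decode_calls = 0
toy.column()[0]
column_first_calls = toy.decode_calls  # one decode per file

print(row_first_calls, column_first_calls)  # 1 1000
```

In this toy model, both access patterns return the same first clip, but the column-first pattern pays for decoding all 1000 placeholder files first.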
Physical access (PA):
- `speaker_id`: `PA_****`, a 4-digit speaker ID
- `audio_file_name`: name of the audio file
- `audio`: A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when accessing the audio column, `dataset[0]["audio"]`, the audio file is automatically decoded and resampled to `dataset.features["audio"].sampling_rate`. Decoding and resampling a large number of audio files might take a significant amount of time. It is therefore important to query the sample index before the `"audio"` column, i.e. `dataset[0]["audio"]` should always be preferred over `dataset["audio"][0]`.
- `environment_id`: a triplet (S, R, D_s), each element of which takes one letter in the set {a, b, c} as a categorical value, defined as

| | a | b | c |
|---|---|---|---|
| S: Room size (square meters) | 2-5 | 5-10 | 10-20 |
| R: T60 (ms) | 50-200 | 200-600 | 600-1000 |
| D_s: Talker-to-ASV distance (cm) | 10-50 | 50-100 | 100-150 |

- `attack_id`: a duple (D_a, Q), each element of which takes one letter in the set {A, B, C} as a categorical value, defined as

| | A | B | C |
|---|---|---|---|
| D_a: Attacker-to-talker distance (cm) | 10-50 | 50-100 | > 100 |
| Q: Replay device quality | perfect | high | low |

  For bonafide speech, `attack_id` is left blank ('-').
- `key`: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
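The categorical letters in `environment_id` and `attack_id` can be mapped back to the physical ranges listed above. The sketch below assumes the identifiers are stored as concatenated letter strings (for example `"abc"` for the environment and `"BC"` for the attack), which this card does not state explicitly. The function names and dictionary keys are hypothetical.

```python
# Ranges copied from the definitions above; keys are the categorical letters.
ROOM_SIZE_M2 = {"a": "2-5", "b": "5-10", "c": "10-20"}
T60_MS = {"a": "50-200", "b": "200-600", "c": "600-1000"}
TALKER_TO_ASV_CM = {"a": "10-50", "b": "50-100", "c": "100-150"}
ATTACKER_TO_TALKER_CM = {"A": "10-50", "B": "50-100", "C": "> 100"}
REPLAY_QUALITY = {"A": "perfect", "B": "high", "C": "low"}


def decode_environment_id(environment_id: str) -> dict:
    """Decode a three-letter (S, R, D_s) string, assumed format e.g. 'abc'."""
    s, r, d_s = environment_id  # raises ValueError if not exactly 3 letters
    return {
        "room_size_m2": ROOM_SIZE_M2[s],
        "t60_ms": T60_MS[r],
        "talker_to_asv_cm": TALKER_TO_ASV_CM[d_s],
    }


def decode_attack_id(attack_id: str):
    """Decode a two-letter (D_a, Q) string; '-' marks bonafide speech."""
    if attack_id == "-":
        return None  # bonafide speech has no attack configuration
    d_a, q = attack_id  # raises ValueError if not exactly 2 letters
    return {
        "attacker_to_talker_cm": ATTACKER_TO_TALKER_CM[d_a],
        "replay_quality": REPLAY_QUALITY[q],
    }


print(decode_environment_id("abc"))
print(decode_attack_id("BC"))
print(decode_attack_id("-"))
```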
Data Splits
| | Training set | Development set | Evaluation set |
|---|---|---|---|
| Bonafide | 2580 | 2548 | 7355 |
| Spoof | 22800 | 22296 | 63882 |
| Total | 25380 | 24844 | 71237 |
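The counts above imply a strong class imbalance, which can be checked with a few lines of arithmetic. This is a minimal sketch: the split names `train`, `dev`, and `eval` and the helper `bonafide_fraction` are hypothetical labels chosen for illustration, and the numbers are copied from the table.

```python
# Utterance counts copied from the table above.
SPLIT_COUNTS = {
    "train": {"bonafide": 2580, "spoof": 22800},
    "dev": {"bonafide": 2548, "spoof": 22296},
    "eval": {"bonafide": 7355, "spoof": 63882},
}


def bonafide_fraction(counts: dict) -> float:
    """Share of bonafide utterances among all utterances in a split."""
    return counts["bonafide"] / (counts["bonafide"] + counts["spoof"])


for name, counts in SPLIT_COUNTS.items():
    total = counts["bonafide"] + counts["spoof"]
    print(f"{name}: total={total}, bonafide share={bonafide_fraction(counts):.1%}")
```

In every split roughly one utterance in ten is bonafide, so plain accuracy is a weak metric here, and class weighting or balanced sampling may be worth considering during training.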
Dataset Creation
Curation Rationale
[Needs More Information]
Source Data
Initial Data Collection and Normalization
[Needs More Information]
Who are the source language producers?
[Needs More Information]
Annotations
Annotation process
[Needs More Information]
Who are the annotators?
[Needs More Information]
Personal and Sensitive Information
[Needs More Information]
Considerations for Using the Data
Social Impact of Dataset
[Needs More Information]
Discussion of Biases
[Needs More Information]
Other Known Limitations
[Needs More Information]
Additional Information
Dataset Curators
[Needs More Information]
Licensing Information
This ASVspoof 2019 dataset is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0/
Citation Information
@InProceedings{Todisco2019,
Title = {{ASV}spoof 2019: {F}uture {H}orizons in {S}poofed and {F}ake {A}udio {D}etection},
Author = {Todisco, Massimiliano and
Wang, Xin and
Sahidullah, Md and
Delgado, H{\'e}ctor and
Nautsch, Andreas and
Yamagishi, Junichi and
Evans, Nicholas and
Kinnunen, Tomi and
Lee, Kong Aik},
booktitle = {Proc. of Interspeech 2019},
Year = {2019}
}