Dataset preview (the audio column is not shown in the rows below):
id (string) | path (string) | transcription (string) | duration (float32, 0.14 to 15) | language (string) | original_speaker_id (int64, 1 to 26) | session_id (int64, 1 to 4) | topic (string) |
---|---|---|---|---|---|---|---|
"00000" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L2_0.560_1.560.wav" | "我刚刚开始record" | 1.56 | "mixed" | 1 | 1 | "persona" |
|
"00001" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L4_2.440_4.160.wav" | "嗯hello我的名字叫徐妍" | 4.16 | "mixed" | 1 | 1 | "persona" |
|
"00002" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L6_6.720_3.320.wav" | "嗯初次见面nice to meet you嗯" | 3.32 | "mixed" | 1 | 1 | "persona" |
|
"00003" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L8_10.240_5.700.wav" | "今天呢我非常希望能够通过这个机会去跟你make friends" | 5.7 | "mixed" | 1 | 1 | "persona" |
|
"00004" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L10_16.020_2.020.wav" | "嗯你知道就是" | 2.02 | "zh" | 1 | 1 | "persona" |
|
"00005" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L12_18.080_7.680.wav" | "我们平时能够遇见其他stranger的机会其实不是很多所以其实这样的一个机会我还是觉得" | 7.68 | "mixed" | 1 | 1 | "persona" |
|
"00006" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L14_25.880_1.280.wav" | "很honour的" | 1.28 | "mixed" | 1 | 1 | "persona" |
|
"00007" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L16_27.770_4.890.wav" | "对然后嗯我是来自中国北方的一个小城市" | 4.89 | "zh" | 1 | 1 | "persona" |
|
"00008" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L20_34.200_4.120.wav" | "we have the sea shore in the city and we have a lot of" | 4.12 | "en" | 1 | 1 | "persona" |
|
"00009" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L22_38.900_1.480.wav" | "delicious sea food" | 1.48 | "en" | 1 | 1 | "persona" |
|
"00010" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L24_40.680_7.140.wav" | "嗯我不知道好像我不太确定你家是不是也是来自一个similar city所以" | 7.14 | "mixed" | 1 | 1 | "persona" |
|
"00011" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L26_47.920_9.440.wav" | "嗯接下来我们也可以再接下来我们可以讨论一下吃海鲜啊also sea food also the sea shore also the sunlight something like that" | 9.44 | "mixed" | 1 | 1 | "persona" |
|
"00012" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L28_57.380_1.200.wav" | "嗯" | 1.2 | "zh" | 1 | 1 | "persona" |
|
"00013" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L30_58.600_2.680.wav" | "我的hobby是读书" | 2.68 | "mixed" | 1 | 1 | "persona" |
|
"00014" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L32_61.320_2.120.wav" | "um watch some movies" | 2.12 | "en" | 1 | 1 | "persona" |
|
"00015" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L34_63.460_4.740.wav" | "and我也很喜欢哈outdoor那些运动" | 4.74 | "mixed" | 1 | 1 | "persona" |
|
"00016" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L36_68.220_2.320.wav" | "比如说go hiking啊之类的" | 2.32 | "mixed" | 1 | 1 | "persona" |
|
"00017" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L38_70.920_2.640.wav" | "嗯所以啊" | 2.64 | "zh" | 1 | 1 | "persona" |
|
"00018" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L40_73.580_2.050.wav" | "what about your what about you" | 2.05 | "en" | 1 | 1 | "persona" |
|
"00019" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L42_86.8090_1.6600.wav" | "ok嗯" | 1.66 | "mixed" | 2 | 1 | "persona" |
|
"00020" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L44_88.4890_2.7600.wav" | "南方其实也能吃海鲜啊其实我" | 2.76 | "zh" | 2 | 1 | "persona" |
|
"00021" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L46_91.5290_3.0600.wav" | "小时候当然吃的不多但还是吃得到" | 3.06 | "zh" | 2 | 1 | "persona" |
|
"00022" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L48_94.6690_2.7200.wav" | "吃到那种大螃蟹又超超级好吃" | 2.72 | "zh" | 2 | 1 | "persona" |
|
"00023" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L50_97.4090_0.9600.wav" | "I really like" | 0.96 | "en" | 2 | 1 | "persona" |
|
"00024" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L52_98.6490_3.4400.wav" | "so um about my hobby I" | 3.44 | "en" | 2 | 1 | "persona" |
|
"00025" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L54_102.1090_4.7800.wav" | "i like to do some sports with my friends and" | 4.78 | "en" | 2 | 1 | "persona" |
|
"00026" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L56_108.1890_4.4800.wav" | "我有时候也会比较喜欢和朋友一起打游戏一起这种之类的事情" | 4.48 | "zh" | 2 | 1 | "persona" |
|
"00027" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L58_112.7690_4.8600.wav" | "然后哦忘了告诉你哦我刚已经告诉你了我是来自南南方" | 4.86 | "zh" | 2 | 1 | "persona" |
|
"00028" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L60_117.7090_1.5600.wav" | "I come from south" | 1.56 | "en" | 2 | 1 | "persona" |
|
"00029" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L62_119.3290_2.5700.wav" | "uh is it funny no" | 2.57 | "en" | 2 | 1 | "persona" |
|
"00030" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L64_121.9690_2.0600.wav" | "um so um" | 2.06 | "en" | 2 | 1 | "persona" |
|
"00031" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L66_125.7890_4.3000.wav" | "呃我我有我的其他的关于我其他的一些hobby啊" | 4.3 | "mixed" | 2 | 1 | "persona" |
|
"00032" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L68_130.3890_1.1600.wav" | "i like reading" | 1.16 | "en" | 2 | 1 | "persona" |
|
"00033" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L70_131.5690_6.6200.wav" | "a bit but depends um for some for some books very interesting books i can really like" | 6.62 | "en" | 2 | 1 | "persona" |
|
"00034" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L72_138.9290_3.0800.wav" | "呃花好长的时间大概" | 3.08 | "zh" | 2 | 1 | "persona" |
|
"00035" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L74_142.0290_1.8400.wav" | "连续会看三四个小时" | 1.84 | "zh" | 2 | 1 | "persona" |
|
"00036" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L76_144.1890_0.7400.wav" | "嗯" | 0.74 | "zh" | 2 | 1 | "persona" |
|
"00037" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L78_144.9490_1.8400.wav" | "but but depends" | 1.84 | "en" | 2 | 1 | "persona" |
|
"00038" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L80_147.1090_3.5600.wav" | "i like i like武侠小小说超超级喜欢" | 3.56 | "mixed" | 2 | 1 | "persona" |
|
"00039" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L82_150.9090_4.6000.wav" | "um but when i very busy maybe i i" | 4.6 | "en" | 2 | 1 | "persona" |
|
"00040" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L84_155.6690_1.7900.wav" | "will not spend too much time on this" | 1.79 | "en" | 2 | 1 | "persona" |
|
"00041" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L86_157.8490_5.3800.wav" | "i remember呃在我本科的时候我花过挺长的时间" | 5.38 | "mixed" | 2 | 1 | "persona" |
|
"00042" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L88_163.3090_2.8800.wav" | "去看完了好几本金庸的小说" | 2.88 | "zh" | 2 | 1 | "persona" |
|
"00043" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L90_167.3290_1.0000.wav" | "嗯" | 1 | "zh" | 2 | 1 | "persona" |
|
"00044" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L92_169.1090_2.0600.wav" | "关于其他的事情嗯" | 2.06 | "zh" | 2 | 1 | "persona" |
|
"00045" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L94_173.4490_0.9000.wav" | "哦对" | 0.9 | "zh" | 2 | 1 | "persona" |
|
"00046" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L96_175.5290_3.4600.wav" | "嗯你可以你你可能有相相似的爱好" | 3.46 | "zh" | 2 | 1 | "persona" |
|
"00047" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L98_170.480_3.340.wav" | "嗯对啊对啊我也很喜欢reading books" | 3.34 | "mixed" | 1 | 1 | "persona" |
|
"00048" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L100_173.900_3.420.wav" | "呃当然我本来刚刚想问你喜欢" | 3.42 | "zh" | 1 | 1 | "persona" |
|
"00049" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L102_177.340_4.920.wav" | "看谁的呃一些novel所以你也跟我说了你喜欢看金庸的嘛" | 4.92 | "mixed" | 1 | 1 | "persona" |
|
"00050" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L104_182.320_1.020.wav" | "但是" | 1.02 | "zh" | 1 | 1 | "persona" |
|
"00051" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L106_183.360_5.080.wav" | "嗯actually i read i didn't read a lot of books about jin yong" | 5.08 | "mixed" | 1 | 1 | "persona" |
|
"00052" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L108_188.460_4.860.wav" | "because嗯其实金庸他呃相对来说" | 4.86 | "mixed" | 1 | 1 | "persona" |
|
"00053" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L110_193.380_2.140.wav" | "嗯比较old" | 2.14 | "mixed" | 1 | 1 | "persona" |
|
"00054" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L112_195.620_0.800.wav" | "对" | 0.8 | "zh" | 1 | 1 | "persona" |
|
"00055" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L114_196.440_3.000.wav" | "然后对于我来说嘛女孩子" | 3 | "zh" | 1 | 1 | "persona" |
|
"00056" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L116_199.460_2.220.wav" | "for girls we don't like that" | 2.22 | "en" | 1 | 1 | "persona" |
|
"00057" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L118_201.700_3.360.wav" | "cruel thing like武打呀" | 3.36 | "mixed" | 1 | 1 | "persona" |
|
"00058" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L120_205.080_3.760.wav" | "like呃江湖啊" | 3.76 | "mixed" | 1 | 1 | "persona" |
|
"00059" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L122_209.120_5.020.wav" | "呃something like that所以我基本上我看到小的时候看的都是一种" | 5.02 | "mixed" | 1 | 1 | "persona" |
|
"00060" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L124_214.480_1.420.wav" | "爱情类小说" | 1.42 | "zh" | 1 | 1 | "persona" |
|
"00061" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L126_215.920_2.420.wav" | "love stories或者是" | 2.42 | "mixed" | 1 | 1 | "persona" |
|
"00062" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L128_218.600_1.780.wav" | "一些玄幻类的" | 1.78 | "zh" | 1 | 1 | "persona" |
|
"00063" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L130_220.600_3.040.wav" | "呃比如说关于一些神啊关于god" | 3.04 | "mixed" | 1 | 1 | "persona" |
|
"00064" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L132_223.660_4.920.wav" | "关于一些其他的奇奇怪怪的religions some something like that" | 4.92 | "mixed" | 1 | 1 | "persona" |
|
"00065" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L134_228.600_2.900.wav" | "嗯所以我们还是挺不一样的" | 2.9 | "zh" | 1 | 1 | "persona" |
|
"00066" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L136_242.8290_1.7000.wav" | "哦我可以问一下你" | 1.7 | "zh" | 2 | 1 | "persona" |
|
"00067" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L138_244.6690_3.0000.wav" | "具体哪一些类型嘛可能我也会喜欢像" | 3 | "zh" | 2 | 1 | "persona" |
|
"00068" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L140_247.6890_3.3600.wav" | "像一些神话类的有一些小说也蛮有意思" | 3.36 | "zh" | 2 | 1 | "persona" |
|
"00069" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L142_241.820_5.120.wav" | "嗯说实话我读的最神话的可能就是北欧神话" | 5.12 | "zh" | 1 | 1 | "persona" |
|
"00070" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L144_247.400_3.920.wav" | "对但是这个是我很很很很久之后才读的但是" | 3.92 | "zh" | 1 | 1 | "persona" |
|
"00071" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L146_251.380_2.740.wav" | "嗯关于玄幻类的比如说three body" | 2.74 | "mixed" | 1 | 1 | "persona" |
|
"00072" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L148_254.140_2.460.wav" | "in Chinese is 三体" | 2.46 | "mixed" | 1 | 1 | "persona" |
|
"00073" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L150_256.620_1.600.wav" | "啊是" | 1.6 | "zh" | 1 | 1 | "persona" |
|
"00074" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L152_258.540_1.640.wav" | "刘慈欣的作品" | 1.64 | "zh" | 1 | 1 | "persona" |
|
"00075" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L154_260.200_3.600.wav" | "对我非常非常的喜欢他关于一些比如说" | 3.6 | "zh" | 1 | 1 | "persona" |
|
"00076" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L156_263.820_3.520.wav" | "嗯black forest theory" | 3.52 | "mixed" | 1 | 1 | "persona" |
|
"00077" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L158_267.460_4.120.wav" | "something like that i really think is oh creative" | 4.12 | "en" | 1 | 1 | "persona" |
|
"00078" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L160_271.820_4.020.wav" | "他真的太我就觉得我觉得他真的太有想象力了" | 4.02 | "zh" | 1 | 1 | "persona" |
|
"00079" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L162_275.860_2.580.wav" | "他的imagination is so good" | 2.58 | "mixed" | 1 | 1 | "persona" |
|
"00080" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L164_289.3090_4.2800.wav" | "哦我也看过三体小说的一个情节" | 4.28 | "zh" | 2 | 1 | "persona" |
|
"00081" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L166_293.9090_2.9400.wav" | "但我现在已经忘记掉就是我知道它是一本" | 2.94 | "zh" | 2 | 1 | "persona" |
|
"00082" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L168_296.8690_2.9400.wav" | "very very interesting and very" | 2.94 | "en" | 2 | 1 | "persona" |
|
"00083" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L170_300.0490_1.3400.wav" | "appearing book" | 1.34 | "en" | 2 | 1 | "persona" |
|
"00084" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L172_302.1390_3.6300.wav" | "嗯我当时没有看他的原因好像是" | 3.63 | "zh" | 2 | 1 | "persona" |
|
"00085" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L174_306.1090_2.2800.wav" | "当时有一段时间还挺忙的" | 2.28 | "zh" | 2 | 1 | "persona" |
|
"00086" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L176_308.5690_3.6400.wav" | "哦对然后关于小说的话" | 3.64 | "zh" | 2 | 1 | "persona" |
|
"00087" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L178_312.2490_4.8200.wav" | "um sometimes i will prefer to watch movie or tv series" | 4.82 | "en" | 2 | 1 | "persona" |
|
"00088" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L180_317.0890_2.8800.wav" | "because i think is much better than reading books" | 2.88 | "en" | 2 | 1 | "persona" |
|
"00089" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L182_320.1690_1.0200.wav" | "but uh" | 1.02 | "en" | 2 | 1 | "persona" |
|
"00090" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L184_321.5090_1.5400.wav" | "depends be uh" | 1.54 | "en" | 2 | 1 | "persona" |
|
"00091" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L186_323.0690_3.7400.wav" | "um i sometimes i [UNK] start reading for a while" | 3.74 | "en" | 2 | 1 | "persona" |
|
"00092" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L188_326.8290_2.2600.wav" | "嗯可能我就会" | 2.26 | "zh" | 2 | 1 | "persona" |
|
"00093" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L190_329.3890_4.0800.wav" | "就是花更多的时间就是越来越去看这些书" | 4.08 | "zh" | 2 | 1 | "persona" |
|
"00094" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L192_333.8890_2.7000.wav" | "这就比方说我已经" | 2.7 | "zh" | 2 | 1 | "persona" |
|
"00095" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L194_336.6090_2.0000.wav" | "看了这本书的第一个章节" | 2 | "zh" | 2 | 1 | "persona" |
|
"00096" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L196_338.6690_3.2600.wav" | "and then I may be very likely" | 3.26 | "en" | 2 | 1 | "persona" |
|
"00097" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L198_342.1690_2.1400.wav" | "to keep reading more" | 2.14 | "en" | 2 | 1 | "persona" |
|
"00098" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk2_L200_344.7090_3.6800.wav" | "because i was already被被这本书吸引" | 3.68 | "mixed" | 2 | 1 | "persona" |
|
"00099" | "/storage/hf-datasets-cache/all/datasets/16739474757983-config-parquet-and-info-CAiRE-ASCEND-5c1abf9c/downloads/extracted/f0790e45797bd654a35ecd1eb4865fa761f1cbd842b674e0defb6812ae8cffbf/waves/ses1_spk1_L202_340.740_2.720.wav" | "嗯对我有的时候也是这样所以" | 2.72 | "zh" | 1 | 1 | "persona" |
Dataset Card for ASCEND
Dataset Summary
ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality corpus of spontaneous, multi-turn conversational Chinese-English code-switching speech collected in Hong Kong. It comprises 10.62 hours of spontaneous speech, totaling about 12.3K utterances. The corpus is split into training, validation, and test sets at a ratio of 8:1:1, with a balanced gender proportion maintained in each set.
Supported Tasks and Leaderboards
Code-switching
Languages
Chinese and English
Usage
To obtain the full dataset (train, validation, and test splits), run:
import datasets
dataset = datasets.load_dataset("CAiRE/ASCEND")
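Once the data is loaded, the `language` field (`zh`, `en`, or `mixed` in the preview rows) can be tallied with plain Python. The following is a minimal sketch rather than part of the official card: `count_languages` is a hypothetical helper name, and the sample labels are invented for illustration, not real corpus statistics.

```python
from collections import Counter
from typing import Iterable


def count_languages(labels: Iterable[str]) -> Counter:
    """Count how many utterances carry each language label."""
    return Counter(labels)


# Stand-in labels for illustration only; these are not real corpus counts.
sample_labels = ["mixed", "mixed", "zh", "en", "mixed", "zh"]
language_counts = count_languages(sample_labels)
print(language_counts.most_common())
```

With the real corpus, a call such as `count_languages(dataset["train"]["language"])` should behave the same way, assuming the split is named `train` and that column access yields the label strings.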
Dataset Structure
A typical data point comprises the path to the audio file, the loaded audio array, and its transcription. Additional fields include datapoint id, duration, language, speaker id, session id, and topic.
{
    'id': '00644',
    'path': '.cache/huggingface/datasets/downloads/extracted/f0b33b5266cd9452ee310eef3577cf7adb7f29aa54dbff74b9a8ee406a55d614/waves/ses2_spk3_L13101_189.900_5.490.wav',
    'audio': {
        'path': '.cache/huggingface/datasets/downloads/extracted/f0b33b5266cd9452ee310eef3577cf7adb7f29aa54dbff74b9a8ee406a55d614/waves/ses2_spk3_L13101_189.900_5.490.wav',
        'array': array([-6.1035156e-05, -1.8310547e-04, 3.0517578e-05, ...,
                        0.0000000e+00, -3.0517578e-05, 0.0000000e+00], dtype=float32),
        'sampling_rate': 16000
    },
    'transcription': '因为你不可能邀你的female friends去说走我们去play basketball',
    'duration': 5.489999771118164,
    'language': 'mixed',
    'original_speaker_id': 3,
    'session_id': 2,
    'topic': 'sports'
}
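Because each data point carries both a decoded `array` and its `sampling_rate`, the length of a clip can be recomputed and compared with the `duration` field. The sketch below assumes that `duration` is measured in seconds and roughly equals the number of samples divided by the sampling rate; the card does not state this explicitly. `audio_seconds` is a hypothetical helper, and the example dict is synthetic silence, not real data.

```python
def audio_seconds(example: dict) -> float:
    """Length of the decoded waveform in seconds: samples / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic example shaped like the data point above (silence, not real audio).
fake_example = {
    "audio": {"array": [0.0] * 8000, "sampling_rate": 16000},
    "duration": 0.5,
}

seconds = audio_seconds(fake_example)
# Allow a small tolerance, since stored durations are float32 values.
matches = abs(seconds - fake_example["duration"]) < 0.01
print(seconds, matches)
```

A check like this can help spot truncated or mismatched audio files before training.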
Data Splits
Number of utterances: 9,869 train, 1,130 validation, and 1,315 test.
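As a quick sanity check, the utterance counts above can be compared with the stated 8:1:1 ratio and the total of about 12.3K utterances. The variable names in this small sketch are illustrative only.

```python
# Utterance counts per split, as listed in this card.
split_sizes = {"train": 9869, "validation": 1130, "test": 1315}

total = sum(split_sizes.values())
# Share of each split as a percentage, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}

print(total)   # 12314
print(shares)  # {'train': 80.1, 'validation': 9.2, 'test': 10.7}
```

The shares are close to, but not exactly, 80/10/10, which is consistent with the split being described as a ratio rather than an exact partition.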
Additional Information
For comprehensive explanations, please check our paper.
Licensing Information
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Citation Information
If you use our dataset, please cite us:
@inproceedings{lovenia2022ascend,
  title={ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation},
  author={Lovenia, Holy and Cahyawijaya, Samuel and Winata, Genta Indra and Xu, Peng and Yan, Xu and Liu, Zihan and Frieske, Rita and Yu, Tiezheng and Dai, Wenliang and Barezi, Elham J and others},
  booktitle={Proceedings of the 13th Language Resources and Evaluation Conference (LREC)},
  year={2022}
}