Here is a collective list of the instruction datasets used for NeuralChat fine-tuning. In total, the collection contains about 1.1M instruction samples and 3M tokens.

| Type | Language | Dataset | Number of samples |
|---|---|---|---|
| HC3 | en | HC3 | 24K |
| dolly | en | databricks-dolly-15k | 15K |
| alpaca-zh | zh | tigerbot-alpaca-zh-0.5m | 500K |
| alpaca-en | en | TigerResearch/tigerbot-alpaca-en-50k | 50K |
| math | en | tigerbot-gsm-8k-en | 8K |
| general | en | tigerbot-stackexchange-qa-en-0.5m | 500K |
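
The approximate total of 1.1M samples can be checked by summing the per-dataset counts in the table. The sketch below is only an arithmetic check: the dictionary and the helper name `total_samples` are illustrative, and the table's counts are themselves rounded to the nearest thousand, so the result is approximate as well.

```python
# Per-dataset sample counts copied from the table, in thousands (K).
# These values are rounded, so the total is an approximation.
SAMPLE_COUNTS_K = {
    "HC3": 24,
    "databricks-dolly-15k": 15,
    "tigerbot-alpaca-zh-0.5m": 500,
    "TigerResearch/tigerbot-alpaca-en-50k": 50,
    "tigerbot-gsm-8k-en": 8,
    "tigerbot-stackexchange-qa-en-0.5m": 500,
}


def total_samples(counts_k: dict[str, int]) -> int:
    """Hypothetical helper: sum counts given in thousands and return units."""
    return sum(counts_k.values()) * 1000


total = total_samples(SAMPLE_COUNTS_K)
print(f"{total:,} samples (~{total / 1e6:.1f}M)")  # 1,097,000 samples (~1.1M)
```

The sum is 1,097K, which rounds to the 1.1M figure quoted above; the two 500K Tigerbot subsets account for roughly 91% of it.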

The collective dataset has been validated on multiple LLMs (such as MPT and LLaMA) by the NeuralChat team (Kaokao Lv, Wenxin Zhang, Xuhui Ren, and Haihao Shen) from Intel/SATG/AIA/AIPT. Thanks to Hello-SimpleAI, databricks, and TigerResearch/TigerBot for releasing the open-source instruction datasets.