Here is a collective list of the instruction datasets used for Neural Chat fine-tuning. The total numbers of instruction samples and tokens are about 1.1M and 3M, respectively.
Type | Language | Dataset | Number of samples |
---|---|---|---|
HC3 | en | HC3 | 24K |
dolly | en | databricks-dolly-15k | 15K |
alpaca-zh | zh | tigerbot-alpaca-zh-0.5m | 500K |
alpaca-en | en | TigerResearch/tigerbot-alpaca-en-50k | 50K |
math | en | tigerbot-gsm-8k-en | 8K |
general | en | tigerbot-stackexchange-qa-en-0.5m | 500K |
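
As a quick sanity check on the stated total, the per-dataset counts from the table can be summed. This is a minimal sketch: the figures are the rounded values listed above (in thousands), not exact row counts, and the variable names are illustrative only.

```python
# Approximate sample counts in thousands, copied from the table above.
# These are the rounded figures from the card, not exact row counts.
sample_counts_k = {
    "HC3": 24,
    "databricks-dolly-15k": 15,
    "tigerbot-alpaca-zh-0.5m": 500,
    "TigerResearch/tigerbot-alpaca-en-50k": 50,
    "tigerbot-gsm-8k-en": 8,
    "tigerbot-stackexchange-qa-en-0.5m": 500,
}

# Sum the rounded counts and express the result in millions.
total_k = sum(sample_counts_k.values())
total_m = total_k / 1000

print(f"{total_k}K samples (~{total_m:.1f}M)")
```

The rounded counts add up to 1097K, which agrees with the figure of about 1.1M instruction samples.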
The collective dataset has been validated on multiple LLMs (such as MPT and LLaMA) by the NeuralChat team (Kaokao Lv, Wenxin Zhang, Xuhui Ren, and Haihao Shen) from Intel/SATG/AIA/AIPT. Thanks to Hello-SimpleAI, databricks, and TigerResearch/TigerBot for releasing their open-source instruction datasets.