Tasks:
Text Generation
Languages:
English
Size Categories:
10K<n<100K
Source Datasets:
nampdn-ai/tiny-en
License:
cc-by-sa-4.0
Tiny Lessons
This dataset is designed to help causal language models learn more effectively from raw web text. It is built by augmenting public web text and contains two key components: theoretical concepts and practical examples.
The theoretical concepts provide a foundation for understanding the underlying principles and ideas behind the information contained in the raw web text. The practical examples demonstrate how these theoretical concepts can be applied in real-world situations.
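As a rough illustration of how the two components might be combined into a single training sequence for a causal language model, here is a minimal sketch. The `Lesson` type, the field names `theory` and `example`, the helper `to_training_text`, and the blank-line separator are all hypothetical; the dataset's actual schema may differ, so check the real column names before adapting this.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    # Hypothetical record layout; the real dataset columns may be named differently.
    theory: str
    example: str


def to_training_text(lesson: Lesson, sep: str = "\n\n") -> str:
    """Join the theoretical part and the practical part into one string.

    A causal language model reads text left to right, so the theory
    comes first and the example that applies it follows.
    """
    parts = [lesson.theory.strip(), lesson.example.strip()]
    # Skip empty components so no leading or trailing separator is emitted.
    return sep.join(p for p in parts if p)


# Made-up sample record, for illustration only (not taken from the dataset).
sample = Lesson(
    theory="A palindrome reads the same forwards and backwards.",
    example="'level' reversed is 'level', so it is a palindrome.",
)
text = to_training_text(sample)
```

Placing the theory before the example is one possible ordering; whether it suits a given training setup is something to verify empirically.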
The dataset is intended for ML researchers working with causal language models. I hope you find it useful, and I welcome any feedback or suggestions.