Tiny Lessons

This dataset is designed to help causal language models learn more effectively from raw web text. It was built by augmenting public web text and contains two key components: theoretical concepts and practical examples.

The theoretical concepts explain the principles and ideas underlying the information in the raw web text. The practical examples show how those concepts apply in real-world situations.

This dataset is intended for ML researchers working with causal language models. I hope you find it useful, and I welcome any feedback or suggestions.

Dataset: nampdn-ai/tiny-lessons

You need to agree to share your contact information to access this dataset