Adding new datasets
Any Model Database user can create a dataset! You can start by creating your dataset repository and choosing one of the following methods to upload your dataset:
- Add files manually to the repository through the UI
- Push files with the push_to_hub method from 🤗 Datasets
- Use Git to commit and push your dataset files
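As a rough illustration of the push_to_hub option, the sketch below builds a tiny in-memory dataset and uploads it. It assumes the datasets library is installed and that you are already authenticated with the Hub. The repository id username/dataset_name, the helper names make_columns and upload, and the sample rows are all hypothetical placeholders, not part of the library.

```python
def make_columns():
    """Return a tiny column-oriented table to upload (illustrative data only)."""
    return {"text": ["hello", "world"], "label": [0, 1]}


def upload(repo_id):
    """Build a Dataset from the sample columns and push it to the Hub.

    repo_id is a placeholder such as "username/dataset_name".
    """
    # Imported lazily so this file can be read without the library installed.
    from datasets import Dataset

    dataset = Dataset.from_dict(make_columns())
    # Uploads the data to the given dataset repository on the Hub.
    dataset.push_to_hub(repo_id)


# Usage (not executed here, since it needs network access and credentials):
# upload("username/dataset_name")
```

Nothing is uploaded until upload is called, so the file can be imported safely while you prepare your data.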
In many cases you can simply add raw data to your dataset repo in any supported format (JSON, CSV, Parquet, text, images, audio files, …). For some large datasets, however, you may want to create a loading script. This script defines the different configurations and splits of your dataset, as well as how to download and process the data.
Datasets outside a namespace
Datasets outside a namespace are maintained by the Model Database team. Unlike the naming convention used for community datasets (username/dataset_name or org/dataset_name), datasets outside a namespace can be referenced directly by their name (e.g. glue). If you find that an improvement is needed, use their “Community” tab to open a discussion or submit a PR on the Hub to propose edits.
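To make the naming convention concrete, here is a minimal sketch that tells the two kinds of identifier apart. The helper split_dataset_id is hypothetical and written only for illustration; it is not a function provided by the Hub or by 🤗 Datasets.

```python
def split_dataset_id(dataset_id):
    """Split a dataset identifier into (namespace, name).

    The namespace is None when the identifier has no "username/" or "org/"
    prefix, which is the case for datasets outside a namespace.
    """
    namespace, sep, name = dataset_id.rpartition("/")
    return (namespace if sep else None, name)


print(split_dataset_id("glue"))                   # (None, 'glue')
print(split_dataset_id("username/dataset_name"))  # ('username', 'dataset_name')
```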