Check dataset validity
Before you download a dataset from the Hub, it is helpful to know whether the specific dataset you're interested in is available. Datasets Server provides the /is-valid endpoint to check if a specific dataset works without any errors.
The API endpoint will return an error for datasets that cannot be loaded with the 🤗 Datasets library, for example, because the data hasn’t been uploaded or the format is not supported.
See the preview field in the response of /is-valid to check if a dataset is partially supported.
This guide shows you how to check dataset validity programmatically, but feel free to try it out with Postman, RapidAPI, or ReDoc.
Check if a dataset is valid
/is-valid checks whether a specific dataset loads without any errors. This endpoint requires the dataset query parameter, which specifies the name of the dataset:
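```python
import requests

API_TOKEN = "<your Hugging Face access token>"  # replace with your own token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=rotten_tomatoes"

def query():
    # Send a GET request to /is-valid and decode the JSON response body
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()
```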
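The request above hard-codes the dataset name in API_URL. If you want to check arbitrary datasets, the following is a minimal sketch of a reusable helper; build_is_valid_url and check_validity are hypothetical names, and this version uses only the Python standard library (urllib) instead of requests. The token is optional here, on the assumption that you only need it for gated datasets.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co/is-valid"


def build_is_valid_url(dataset: str) -> str:
    # urlencode percent-encodes the value, so a namespaced name such as
    # "user/my_dataset" is sent as "user%2Fmy_dataset"
    return f"{BASE_URL}?{urlencode({'dataset': dataset})}"


def check_validity(dataset: str, token: Optional[str] = None) -> dict:
    # Only send an Authorization header when a token is provided
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(build_is_valid_url(dataset), headers=headers)
    # urlopen raises urllib.error.HTTPError if the server answers with an error status
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, check_validity("rotten_tomatoes") returns the parsed response as a Python dict; pass token=... when checking a gated dataset you have been granted access to.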
The response looks like this if a dataset is valid:
```json
{
  "viewer": true,
  "preview": true
}
```
If only the first rows of a dataset are available, then the response looks like:
```json
{
  "viewer": false,
  "preview": true
}
```
Finally, if the dataset is not valid at all, then the response is:
```json
{
  "viewer": false,
  "preview": false
}
```
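The three responses above can be folded into a single status in your own code. This is a minimal sketch, assuming the argument is the parsed JSON dict returned by /is-valid and treating a true viewer field as full support; describe_validity and its status labels are hypothetical.

```python
def describe_validity(result: dict) -> str:
    # A missing field is treated as false
    viewer = result.get("viewer", False)
    preview = result.get("preview", False)
    if viewer:
        return "valid"          # the whole dataset works
    if preview:
        return "preview only"   # only the first rows are available
    return "not valid"          # neither the viewer nor the preview works
```

Checking viewer before preview mirrors the order of the responses above, so a dataset is only reported as "preview only" when the full viewer is unavailable.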
Some cases where a dataset is not valid are:
- the dataset viewer is disabled
- the dataset is gated but access has not been granted: no token was passed, or the token passed is not authorized
- the dataset is private
- the dataset contains no data or the data format is not supported