Preview a dataset
Datasets Server provides a /first-rows endpoint for visualizing the first 100 rows of a dataset. This’ll give you a good idea of the data types and example data contained in a dataset.
This guide shows you how to use Datasets Server’s /first-rows endpoint to preview a dataset. Feel free to also try it out with Postman, RapidAPI, or ReDoc.
The /first-rows endpoint accepts three query parameters:
- dataset: the dataset name, for example glue or mozilla-foundation/common_voice_10_0
- config: the configuration name, for example cola
- split: the split name, for example train
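import requests

# Placeholder: replace with your own Hugging Face API token.
API_TOKEN = "YOUR_API_TOKEN"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/first-rows?dataset=duorc&config=SelfRC&split=train"

def query():
    # Send a GET request to the /first-rows endpoint and decode the JSON body.
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()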
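As a minimal sketch, the query string can also be assembled from the three parameters rather than hard-coded. The helper name build_first_rows_url is hypothetical (it is not part of Datasets Server), and the example relies only on urlencode from Python's standard library, which percent-encodes characters such as the / in namespaced dataset names.

```python
from urllib.parse import urlencode

# Base endpoint taken from the request example in this guide.
BASE_URL = "https://datasets-server.huggingface.co/first-rows"


def build_first_rows_url(dataset, config, split):
    """Build a /first-rows URL from the three query parameters.

    Hypothetical helper for illustration only.
    """
    query = urlencode({"dataset": dataset, "config": config, "split": split})
    return f"{BASE_URL}?{query}"


url = build_first_rows_url("duorc", "SelfRC", "train")
print(url)
```

The resulting URL can be passed to requests.get in place of a hard-coded API_URL.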
The endpoint response is a JSON object. Its two main keys are:
- The features of a dataset, including each column’s name and data type.
- The first 100 rows of a dataset and the content contained in each column of a specific row.
For example, here are the features and the first 100 rows of the duorc/SelfRC train split:
{
"dataset": "duorc",
"config": "SelfRC",
"split": "train",
"features": [
{
"feature_idx": 0,
"name": "plot_id",
"type": { "dtype": "string", "_type": "Value" }
},
{
"feature_idx": 1,
"name": "plot",
"type": { "dtype": "string", "_type": "Value" }
},
{
"feature_idx": 2,
"name": "title",
"type": { "dtype": "string", "_type": "Value" }
},
{
"feature_idx": 3,
"name": "question_id",
"type": { "dtype": "string", "_type": "Value" }
},
{
"feature_idx": 4,
"name": "question",
"type": { "dtype": "string", "_type": "Value" }
},
{
"feature_idx": 5,
"name": "answers",
"type": {
"feature": { "dtype": "string", "_type": "Value" },
"_type": "Sequence"
}
},
{
"feature_idx": 6,
"name": "no_answer",
"type": { "dtype": "bool", "_type": "Value" }
}
],
"rows": [
{
"row_idx": 0,
"row": {
"plot_id": "/m/03vyhn",
"plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
"title": "Ghosts of Mars",
"question_id": "b440de7d-9c3f-841c-eaec-a14bdff950d1",
"question": "How did the police arrive at the Mars mining camp?",
"answers": ["They arrived by train."],
"no_answer": false
},
"truncated_cells": []
},
{
"row_idx": 1,
"row": {
"plot_id": "/m/03vyhn",
"plot": "200 years in the future, Mars has been colonized by a high-tech company.\nMelanie Ballard (Natasha Henstridge) arrives by train to a Mars mining camp which has cut all communication links with the company headquarters. She's not alone, as she is with a group of fellow police officers. They find the mining camp deserted except for a person in the prison, Desolation Williams (Ice Cube), who seems to laugh about them because they are all going to die. They were supposed to take Desolation to headquarters, but decide to explore first to find out what happened.They find a man inside an encapsulated mining car, who tells them not to open it. However, they do and he tries to kill them. One of the cops witnesses strange men with deep scarred and heavily tattooed faces killing the remaining survivors. The cops realise they need to leave the place fast.Desolation explains that the miners opened a kind of Martian construction in the soil which unleashed red dust. Those who breathed that dust became violent psychopaths who started to build weapons and kill the uninfected. They changed genetically, becoming distorted but much stronger.The cops and Desolation leave the prison with difficulty, and devise a plan to kill all the genetically modified ex-miners on the way out. However, the plan goes awry, and only Melanie and Desolation reach headquarters alive. Melanie realises that her bosses won't ever believe her. However, the red dust eventually arrives to headquarters, and Melanie and Desolation need to fight once again.",
"title": "Ghosts of Mars",
"question_id": "a9f95c0d-121f-3ca9-1595-d497dc8bc56c",
"question": "Who has colonized Mars 200 years in the future?",
"answers": [
"A high-tech company has colonized Mars 200 years in the future."
],
"no_answer": false
},
"truncated_cells": []
}
...
],
"truncated": false
}
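To work with such a response, one possible approach is to map each column name to its type and collect the row contents. The sketch below assumes a response shaped like the example above. summarize_first_rows is a hypothetical helper, and the small sample dictionary stands in for a real response.

```python
def summarize_first_rows(data):
    """Return (column name -> type label, list of row dicts).

    Hypothetical helper; assumes the "features" and "rows" keys
    follow the structure of the example response.
    """
    columns = {}
    for feature in data["features"]:
        feature_type = feature["type"]
        # Value features carry a dtype; for other feature types
        # (such as Sequence) this sketch falls back to the _type label.
        columns[feature["name"]] = feature_type.get("dtype", feature_type["_type"])
    rows = [item["row"] for item in data["rows"]]
    return columns, rows


# Shortened stand-in for a real /first-rows response.
sample = {
    "features": [
        {"feature_idx": 0, "name": "title",
         "type": {"dtype": "string", "_type": "Value"}},
        {"feature_idx": 1, "name": "answers",
         "type": {"feature": {"dtype": "string", "_type": "Value"},
                  "_type": "Sequence"}},
    ],
    "rows": [
        {"row_idx": 0,
         "row": {"title": "Ghosts of Mars",
                 "answers": ["They arrived by train."]},
         "truncated_cells": []},
    ],
}

columns, rows = summarize_first_rows(sample)
print(columns)
print(rows[0]["title"])
```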
Truncated responses
For some datasets, the response size from /first-rows may exceed 1MB, in which case rows are dropped from the response until its size is under 1MB. When this happens, the response may contain fewer than 100 rows, and the truncated field is set to true.
In some cases, if even the first few rows generate a response that exceeds 1MB, some of the columns are truncated and converted to a string. You’ll see these listed in the truncated_cells field.
For example, the ett dataset only returns 10 rows, and the target and feat_dynamic_real columns are truncated:
...
"rows": [
{
"row_idx": 0,
"row": {
"start": "2016-07-01T00:00:00",
"target": "[38.6619987487793,38.222999572753906,37.34400177001953,37.124000549316406,37.124000549316406,36.9039",
"feat_static_cat": [0],
"feat_dynamic_real": "[[41.130001068115234,39.62200164794922,38.86800003051758,35.518001556396484,37.52799987792969,37.611",
"item_id": "OT"
},
"truncated_cells": ["target", "feat_dynamic_real"]
},
{
"row_idx": 1,
"row": {
"start": "2016-07-01T00:00:00",
"target": "[38.6619987487793,38.222999572753906,37.34400177001953,37.124000549316406,37.124000549316406,36.9039",
"feat_static_cat": [0],
"feat_dynamic_real": "[[41.130001068115234,39.62200164794922,38.86800003051758,35.518001556396484,37.52799987792969,37.611",
"item_id": "OT"
},
"truncated_cells": ["target", "feat_dynamic_real"]
},
...
],
"truncated": true
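A quick way to check whether a response was cut down is to inspect the top-level truncated flag together with each row's truncated_cells list. The following is a sketch under the assumption that the response follows the structure shown above. truncation_report is a hypothetical helper, and the sample values are shortened placeholders.

```python
def truncation_report(data):
    """Summarize truncation in a /first-rows response.

    Hypothetical helper; assumes a top-level "truncated" flag and a
    "truncated_cells" list on every row, as in the example response.
    """
    truncated_columns = set()
    for item in data["rows"]:
        truncated_columns.update(item["truncated_cells"])
    return {
        "rows_truncated": data["truncated"],
        "num_rows": len(data["rows"]),
        "truncated_columns": sorted(truncated_columns),
    }


# Shortened stand-in for a truncated response.
sample_truncated = {
    "rows": [
        {"row_idx": 0,
         "row": {"target": "[38.66,38.22", "item_id": "OT"},
         "truncated_cells": ["target", "feat_dynamic_real"]},
        {"row_idx": 1,
         "row": {"target": "[38.66,38.22", "item_id": "OT"},
         "truncated_cells": ["target", "feat_dynamic_real"]},
    ],
    "truncated": True,
}

report = truncation_report(sample_truncated)
print(report)
```

Cells listed in truncated_columns hold partial strings rather than the original values, so they should not be parsed as complete data.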