Dataset Card for GEM/e2e_nlg
Link to Main Data Card
You can find the main data card on the GEM Website.
Dataset Summary
The E2E NLG dataset is an English benchmark dataset for data-to-text models that verbalize a set of 2-9 key-value attribute pairs in the restaurant domain. The version used for GEM is the cleaned E2E NLG dataset, which filters examples with hallucinations and outputs that don't fully cover all input attributes.
You can load the dataset via:
import datasets
data = datasets.load_dataset('GEM/e2e_nlg')
The data loader can be found here.
website
paper
First data release, Detailed E2E Challenge writeup, Cleaned E2E version
authors
Jekaterina Novikova, Ondrej Dusek and Verena Rieser
Dataset Overview
Where to find the Data and its Documentation
Webpage
Download
Paper
First data release, Detailed E2E Challenge writeup, Cleaned E2E version
BibTex
@inproceedings{e2e_cleaned,
address = {Tokyo, Japan},
title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation}},
url = {https://www.aclweb.org/anthology/W19-8652/},
booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
year = {2019},
pages = {421--426},
}
Contact Name
Ondrej Dusek
Contact Email
Has a Leaderboard?
no
Languages and Intended Use
Multilingual?
no
Covered Dialects
Dialect-specific data was not collected and the language is general British English.
Covered Languages
English
Whose Language?
The original dataset was collected using the CrowdFlower (now Appen) platform using native English speakers (self-reported). No demographic information was provided, but the collection was geographically limited to English-speaking countries.
License
cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International
Intended Use
The dataset was collected to test neural models on a very well-specified realization task.
Primary Task
Data-to-Text
Communicative Goal
Producing a text informing/recommending a restaurant, given all and only the attributes specified on the input.
Credit
Curation Organization Type(s)
academic
Curation Organization(s)
Heriot-Watt University
Dataset Creators
Jekaterina Novikova, Ondrej Dusek and Verena Rieser
Funding
This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).
Who added the Dataset to GEM?
Simon Mille wrote the initial data card and Yacine Jernite the data loader. Sebastian Gehrmann migrated the data card to the v2 format and moved the data loader to the hub.
Dataset Structure
Data Fields
The data is in a CSV format, with the following fields:
mr -- the meaning representation (MR, input)
ref -- reference, i.e., the corresponding natural-language description (output)
There are additional fields (fixed, orig_mr) indicating whether the data was modified in the cleaning process and what the original MR was before cleaning, but these aren't used for NLG.
The MR has a flat structure -- attribute-value pairs are comma-separated, with values enclosed in square brackets (see the example instance below). There are 8 attributes:
name -- restaurant name
near -- a landmark close to the restaurant
area -- location (riverside, city centre)
food -- food type / cuisine (e.g. Japanese, Indian, English etc.)
eatType -- restaurant type (restaurant, coffee shop, pub)
priceRange -- price range (low, medium, high, <£20, £20-30, >£30)
rating -- customer rating (low, medium, high, 1/5, 3/5, 5/5)
familyFriendly -- is the restaurant family-friendly (yes/no)
The same MR is often repeated multiple times with different synonymous references.
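Because the MR format is flat, a single regular expression is enough to read it. The sketch below turns an MR string into an attribute-to-value dictionary and groups the synonymous references that share one MR. It assumes values never contain a closing bracket; the function names are hypothetical helpers, not part of the GEM loader.

```python
import re
from collections import defaultdict

# One "attribute[value]" pair; assumes values never contain "]".
_PAIR = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr: str) -> dict:
    """Parse a flat MR string into an {attribute: value} dict (hypothetical helper)."""
    return {attr: value.strip() for attr, value in _PAIR.findall(mr)}


def group_references(rows):
    """Collect all references that share the same MR string.

    `rows` is an iterable of (mr, ref) pairs, e.g. taken from the mr/ref CSV columns.
    """
    refs_by_mr = defaultdict(list)
    for mr, ref in rows:
        refs_by_mr[mr].append(ref)
    return dict(refs_by_mr)


mr = "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]"
print(parse_mr(mr))
# {'name': 'Alimentum', 'area': 'riverside', 'familyFriendly': 'yes', 'near': 'Burger King'}
```

Grouping by MR matters mainly for evaluation, where every reference written for the same MR can serve as a valid target.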
How were labels chosen?
The source MRs were generated automatically at random from a set of valid attribute values. The labels (references) were crowdsourced and are natural-language texts.
Example Instance
{
"input": "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]",
"target": "Alimentum is a kids friendly place in the riverside area near Burger King."
}
Data Splits
Split | MRs | Distinct MRs | References |
---|---|---|---|
Training | 12,568 | 8,362 | 33,525 |
Development | 1,484 | 1,132 | 4,299 |
Test | 1,847 | 1,358 | 4,693 |
Total | 15,899 | 10,852 | 42,517 |
“Distinct MRs” are MRs that remain distinct even if restaurant/place names (attributes name, near) are delexicalized, i.e., replaced with a placeholder.
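The card does not specify the placeholder text or whether attribute order matters when comparing MRs. A minimal sketch of counting distinct MRs, assuming a single made-up placeholder `X` for both name and near, and treating an MR as an unordered set of attribute-value pairs (the example MRs other than the first are toy inputs):

```python
import re

_PAIR = re.compile(r"(\w+)\[([^\]]*)\]")
# Attributes holding restaurant/place names, per the card; "X" is a made-up placeholder.
DELEX_ATTRS = ("name", "near")


def delexicalize(mr: str) -> tuple:
    """Canonical delexicalized form: sorted (attribute, value) pairs,
    with name/near values replaced by the placeholder."""
    pairs = [
        (attr, "X" if attr in DELEX_ATTRS else value.strip())
        for attr, value in _PAIR.findall(mr)
    ]
    return tuple(sorted(pairs))


def count_distinct_mrs(mrs) -> int:
    """Number of MRs that stay distinct after delexicalization."""
    return len({delexicalize(mr) for mr in mrs})


mrs = [
    "name[Alimentum], area[riverside], near[Burger King]",
    "name[Hypothetical Bistro], area[riverside], near[Some Landmark]",  # same after delex
    "name[Hypothetical Bistro], area[city centre], near[Some Landmark]",
]
print(count_distinct_mrs(mrs))  # → 2
```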
Splitting Criteria
The data are divided so that MRs in different splits do not overlap.
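One way to sanity-check this property on loaded data is to intersect the MR sets of every pair of splits. This is a hypothetical sketch: the split names are assumptions, and it compares exact MR strings only.

```python
from itertools import combinations


def overlapping_mrs(splits: dict) -> dict:
    """For each pair of splits, return the MR strings they share (expected: none).

    `splits` maps a split name to a list of MR strings.
    """
    mr_sets = {name: set(mrs) for name, mrs in splits.items()}
    return {
        (a, b): mr_sets[a] & mr_sets[b]
        for a, b in combinations(sorted(mr_sets), 2)
    }


# Toy MRs; split names are hypothetical.
splits = {
    "train": ["name[Alimentum], area[riverside]"],
    "validation": ["name[Alimentum], area[city centre]"],
    "test": ["name[Alimentum], near[Burger King]"],
}
assert not any(overlapping_mrs(splits).values())
```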
Dataset in GEM
Rationale for Inclusion in GEM
Why is the Dataset in GEM?
The E2E dataset is one of the largest limited-domain NLG datasets and is frequently used as a data-to-text generation benchmark. The E2E Challenge included 20 systems of very different architectures, with system outputs available for download.
Similar Datasets
yes
Unique Language Coverage
no
Difference from other GEM datasets
The dataset is much cleaner than comparable datasets, and it is also a relatively easy task, making for a straightforward evaluation.
Ability that the Dataset measures
Surface realization.
GEM-Specific Curation
Modified for GEM?
yes
Additional Splits?
yes
Split Information
Four special test sets for E2E were added to the GEM evaluation suite:
- We created subsets of the training and development sets of ~500 randomly selected inputs each.
- We applied input scrambling on a subset of 500 randomly selected test instances; the order of the input properties was randomly reassigned.
- For the input size, we created subpopulations based on the number of restaurant properties in the input.
Input length | Frequency (English) |
---|---|
2 | 5 |
3 | 120 |
4 | 389 |
5 | 737 |
6 | 1187 |
7 | 1406 |
8 | 774 |
9 | 73 |
10 | 2 |
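Input scrambling and the input-length subpopulations can both be sketched from the list of attribute-value pairs. This is an illustration, not the GEM generation script: the helper names are hypothetical, and it assumes "input length" means the number of attribute-value pairs in the MR (the card does not spell out how length was counted).

```python
import random
import re
from collections import Counter

# Whole "attribute[value]" pairs, kept as strings so they can be reordered.
_PAIR = re.compile(r"\w+\[[^\]]*\]")


def split_pairs(mr: str) -> list:
    """Split an MR into its 'attribute[value]' pairs, in original order."""
    return _PAIR.findall(mr)


def scramble_mr(mr: str, rng: random.Random) -> str:
    """Randomly reorder the attribute-value pairs of an MR; content is unchanged."""
    pairs = split_pairs(mr)
    rng.shuffle(pairs)
    return ", ".join(pairs)


def input_length(mr: str) -> int:
    """Assumed definition: number of attribute-value pairs in the MR."""
    return len(split_pairs(mr))


def length_histogram(mrs) -> Counter:
    """Frequency of each input length, as in the table above."""
    return Counter(input_length(mr) for mr in mrs)


rng = random.Random(0)  # fixed seed so the scrambled test set is reproducible
mr = "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]"
print(scramble_mr(mr, rng))
print(input_length(mr))  # → 4
```

Seeding the random generator is a design choice worth keeping: it makes the scrambled subset reproducible across evaluation runs.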
Split Motivation
Generalization and robustness
Getting Started with the Task
Previous Results
Previous Results
Measured Model Abilities
Surface realization.
Metrics
BLEU, METEOR, ROUGE
Proposed Evaluation
The official evaluation script combines the MT-Eval and COCO Captioning libraries and computes the following metrics:
- BLEU
- CIDEr
- NIST
- METEOR
- ROUGE-L
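Since each MR has several references, BLEU is computed against all references for the same MR. The sketch below is a simplified corpus-level BLEU meant only to show that multi-reference mechanism (n-gram counts clipped by the maximum count in any single reference, brevity penalty against the closest reference length). It is not the official script: it uses lowercased whitespace tokenization and no smoothing, so its scores will not match published E2E numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified multi-reference corpus BLEU (not the official E2E metric).

    hypotheses: list of strings; references: list of lists of strings,
    i.e. all references written for the MR of each hypothesis.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_toks = hyp.lower().split()
        ref_toks = [r.lower().split() for r in refs]
        hyp_len += len(hyp_toks)
        # Closest reference length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(hyp_toks)), len(r)) for r in ref_toks)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_toks, n)
            # Counter union keeps the max count of each n-gram over all references.
            max_ref = Counter()
            for r in ref_toks:
                max_ref |= ngrams(r, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

For example, `corpus_bleu([hyp], [[ref_1, ref_2]])` scores one system output against two references for the same MR; an output identical to one of its references scores 1.0.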
Previous results available?
yes
Other Evaluation Approaches
Most previous results, including the shared task results, used the library provided by the dataset creators. The shared task also conducted a human evaluation using the following two criteria:
Quality: When collecting quality ratings, system outputs were presented to crowd workers together with the corresponding meaning representation, which implies that correctness of the NL utterance relative to the MR should also influence this ranking. The crowd workers were asked: “How do you judge the overall quality of the utterance in terms of its grammatical correctness, fluency, adequacy and other important factors?”
Naturalness: When collecting naturalness ratings, system outputs were presented to crowd workers without the corresponding meaning representation. The crowd workers were asked: “Could the utterance have been produced by a native speaker?”
Relevant Previous Results
The shared task writeup contains in-depth evaluations of the participating systems (https://www.sciencedirect.com/science/article/pii/S0885230819300919).
Dataset Curation
Original Curation
Original Curation Rationale
The dataset was collected to showcase/test neural NLG models. It is larger and contains more lexical richness and syntactic variation than previous closed-domain NLG datasets.
Communicative Goal
Producing a text informing/recommending a restaurant, given all and only the attributes specified on the input.
Sourced from Different Sources
no
Language Data
How was Language Data Obtained?
Crowdsourced
Where was it crowdsourced?
Other crowdworker platform
Language Producers
Human references describing the MRs were collected by crowdsourcing on the CrowdFlower (now Appen) platform, with either textual or pictorial MRs shown to workers as the stimulus. Pictorial MRs were used in 20% of cases; they yield higher lexical variation but introduce noise.
Topics Covered
The dataset is focused on descriptions of restaurants.
Data Validation
validated by data curator
Data Preprocessing
There were basic checks (length, valid characters, repetition).
Was Data Filtered?
algorithmically
Filter Criteria
The cleaned version of the dataset used in GEM was algorithmically filtered. The cleaning procedure used regular expressions to re-annotate human-generated references with a more accurate input MR wherever attributes had been hallucinated or dropped. Additionally, train-test overlap stemming from this transformation was removed. As a result, the data is much cleaner than the original dataset but not perfect (about 20% of instances may have misaligned slots, compared to 40% of the original data).
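The actual cleaning regular expressions are not reproduced in this card. As a rough illustration of the idea only, the sketch below flags "dropped" attributes whose value does not literally appear in the reference. It is a hypothetical and much cruder stand-in: it checks only attributes whose values tend to be copied verbatim (an assumed list), and it cannot detect hallucinated attributes, which would require matching against every possible attribute value.

```python
import re

_PAIR = re.compile(r"(\w+)\[([^\]]*)\]")
# Assumed: attributes whose values usually appear verbatim in the text.
VERBATIM_ATTRS = ("name", "near", "food", "area")


def missing_slots(mr: str, ref: str, attrs=VERBATIM_ATTRS) -> list:
    """Return attributes from `attrs` whose value is not found in the reference."""
    text = ref.lower()
    missing = []
    for attr, value in _PAIR.findall(mr):
        pattern = r"\b" + re.escape(value.strip().lower()) + r"\b"
        if attr in attrs and not re.search(pattern, text):
            missing.append(attr)
    return missing


mr = "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]"
print(missing_slots(mr, "Alimentum is a kids friendly place in the riverside area."))
# ['near']
```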
Structured Annotations
Additional Annotations?
none
Annotation Service?
no
Consent
Any Consent Policy?
yes
Consent Policy Details
Since a crowdsourcing platform was used, the involved raters waived their rights to the data and are aware that the produced annotations can be publicly released.
Private Identifying Information (PII)
Contains PII?
no PII
Justification for no PII
The dataset is artificial and does not contain any description of people.
Maintenance
Any Maintenance Plan?
no
Broader Social Context
Previous Work on the Social Impact of the Dataset
Usage of Models based on the Data
no
Impact on Under-Served Communities
Addresses needs of underserved Communities?
no
Discussion of Biases
Any Documented Social Biases?
no
Are the Language Producers Representative of the Language?
The source data is generated randomly, so it should not contain biases. The human references may be biased by the workers' demographics, but this was not investigated during data collection.
Considerations for Using the Data
PII Risks and Liability
Licenses
Copyright Restrictions on the Dataset
open license - commercial use allowed
Copyright Restrictions on the Language Data
open license - commercial use allowed
Known Technical Limitations
Technical Limitations
The cleaned version still has data points with hallucinated or omitted attributes.
Unsuited Applications
The data only pertains to the restaurant domain and the included attributes. A model cannot be expected to handle other domains or attributes.