gem_id (string) | gem_parent_id (string) | input (list) | target (string) | references (list) | category (string) | webnlg_id (string) |
---|---|---|---|---|---|---|
"web_nlg_en-train-0" | "web_nlg_en-train-0" | [
"Aarhus_Airport | cityServed | \"Aarhus, Denmark\""
] | "The Aarhus is the airport of Aarhus, Denmark." | [] | "Airport" | "train/Airport/1/Id1" |
"web_nlg_en-train-1" | "web_nlg_en-train-1" | [
"Aarhus_Airport | cityServed | \"Aarhus, Denmark\""
] | "Aarhus Airport serves the city of Aarhus, Denmark." | [] | "Airport" | "train/Airport/1/Id1" |
"web_nlg_en-train-2" | "web_nlg_en-train-2" | [
"Aarhus_Airport | cityServed | Aarhus"
] | "Aarhus airport serves the city of Aarhus." | [] | "Airport" | "train/Airport/1/Id2" |
"web_nlg_en-train-3" | "web_nlg_en-train-3" | [
"Aarhus_Airport | elevationAboveTheSeaLevel | 25.0"
] | "Aarhus Airport is 25 metres above sea level." | [] | "Airport" | "train/Airport/1/Id3" |
"web_nlg_en-train-4" | "web_nlg_en-train-4" | [
"Aarhus_Airport | elevationAboveTheSeaLevel | 25.0"
] | "Aarhus airport is at an elevation of 25 metres above seal level." | [] | "Airport" | "train/Airport/1/Id3" |
"web_nlg_en-train-5" | "web_nlg_en-train-5" | [
"Aarhus_Airport | elevationAboveTheSeaLevel | 25.0"
] | "Aarhus Airport is 25.0 metres above the sea level." | [] | "Airport" | "train/Airport/1/Id3" |
"web_nlg_en-train-6" | "web_nlg_en-train-6" | [
"Aarhus_Airport | location | Tirstrup"
] | "Aarhus Airport is located in Tirstrup." | [] | "Airport" | "train/Airport/1/Id4" |
"web_nlg_en-train-7" | "web_nlg_en-train-7" | [
"Aarhus_Airport | location | Tirstrup"
] | "The location of Aarhus Airport is Tirstrup." | [] | "Airport" | "train/Airport/1/Id4" |
"web_nlg_en-train-8" | "web_nlg_en-train-8" | [
"Aarhus_Airport | operatingOrganisation | \"Aarhus Lufthavn A/S\""
] | "Aarhus Airport is operated by Aarhus Lufthavn A/S." | [] | "Airport" | "train/Airport/1/Id5" |
"web_nlg_en-train-9" | "web_nlg_en-train-9" | [
"Aarhus_Airport | operatingOrganisation | \"Aarhus Lufthavn A/S\""
] | "Aarhus Lufthavn A/S is the operation organisation of Aarhus Airport." | [] | "Airport" | "train/Airport/1/Id5" |
"web_nlg_en-train-10" | "web_nlg_en-train-10" | [
"Aarhus_Airport | operatingOrganisation | Aktieselskab"
] | "Aktieselskab is the operating organisation for Aarhus Airport." | [] | "Airport" | "train/Airport/1/Id6" |
"web_nlg_en-train-11" | "web_nlg_en-train-11" | [
"Aarhus_Airport | operatingOrganisation | Aktieselskab"
] | "Aktieselskab operates Aarhus Airport." | [] | "Airport" | "train/Airport/1/Id6" |
"web_nlg_en-train-12" | "web_nlg_en-train-12" | [
"Aarhus_Airport | operatingOrganisation | Aktieselskab"
] | "Aarhus Airport is operated by the Aktieselskab organisation." | [] | "Airport" | "train/Airport/1/Id6" |
"web_nlg_en-train-13" | "web_nlg_en-train-13" | [
"Aarhus_Airport | runwayLength | 2776.0"
] | "Aarhus Airport runway length is 2776.0." | [] | "Airport" | "train/Airport/1/Id7" |
"web_nlg_en-train-14" | "web_nlg_en-train-14" | [
"Aarhus_Airport | runwayLength | 2776.0"
] | "Aarhus Airport has a runway length of 2776.0." | [] | "Airport" | "train/Airport/1/Id7" |
"web_nlg_en-train-15" | "web_nlg_en-train-15" | [
"Aarhus_Airport | runwayLength | 2776.0"
] | "The length of the runway at Aarhus Airport is 2776.0." | [] | "Airport" | "train/Airport/1/Id7" |
"web_nlg_en-train-16" | "web_nlg_en-train-16" | [
"Aarhus_Airport | runwayLength | 2777.0"
] | "The runway length at Aarhus Airport is 2777.0 meters." | [] | "Airport" | "train/Airport/1/Id8" |
"web_nlg_en-train-17" | "web_nlg_en-train-17" | [
"Aarhus_Airport | runwayLength | 2777.0"
] | "The runway length at Aarhus airport is 2777.0." | [] | "Airport" | "train/Airport/1/Id8" |
"web_nlg_en-train-18" | "web_nlg_en-train-18" | [
"Aarhus_Airport | runwayName | \"10L/28R\""
] | "Aarhus Airport runway name is 10L/28R." | [] | "Airport" | "train/Airport/1/Id9" |
"web_nlg_en-train-19" | "web_nlg_en-train-19" | [
"Aarhus_Airport | runwayName | \"10L/28R\""
] | "10L/28R is the runway name of the Aarhus Airport." | [] | "Airport" | "train/Airport/1/Id9" |
"web_nlg_en-train-20" | "web_nlg_en-train-20" | [
"Aarhus_Airport | runwayName | \"10L/28R\""
] | "The runway name of Aarhus Airport is 10L/28R." | [] | "Airport" | "train/Airport/1/Id9" |
"web_nlg_en-train-21" | "web_nlg_en-train-21" | [
"Aarhus_Airport | runwayName | \"10R/28L\""
] | "The runway name at Aarhus Airport is "10R/28L"." | [] | "Airport" | "train/Airport/1/Id10" |
"web_nlg_en-train-22" | "web_nlg_en-train-22" | [
"Aarhus_Airport | runwayName | \"10R/28L\""
] | "10R/28L is the runway name at Aarhus airport." | [] | "Airport" | "train/Airport/1/Id10" |
"web_nlg_en-train-23" | "web_nlg_en-train-23" | [
"Aarhus_Airport | runwayName | \"10R/28L\""
] | "The runway name of Aarhus Airport is 10R/28L." | [] | "Airport" | "train/Airport/1/Id10" |
"web_nlg_en-train-24" | "web_nlg_en-train-24" | [
"Abilene,_Texas | country | United_States"
] | "Abilene, Texas is in the United States." | [] | "Airport" | "train/Airport/1/Id11" |
"web_nlg_en-train-25" | "web_nlg_en-train-25" | [
"Abilene,_Texas | country | United_States"
] | "Abilene Texas is in the United States." | [] | "Airport" | "train/Airport/1/Id11" |
"web_nlg_en-train-26" | "web_nlg_en-train-26" | [
"Abilene,_Texas | country | United_States"
] | "Abilene, Texas is located in the United States." | [] | "Airport" | "train/Airport/1/Id11" |
"web_nlg_en-train-27" | "web_nlg_en-train-27" | [
"Abilene,_Texas | isPartOf | Jones_County,_Texas"
] | "Abilene, Texas is part of Jones County, Texas." | [] | "Airport" | "train/Airport/1/Id12" |
"web_nlg_en-train-28" | "web_nlg_en-train-28" | [
"Abilene,_Texas | isPartOf | Jones_County,_Texas"
] | "Abilene is part of Jones County, Texas." | [] | "Airport" | "train/Airport/1/Id12" |
"web_nlg_en-train-29" | "web_nlg_en-train-29" | [
"Abilene,_Texas | isPartOf | Taylor_County,_Texas"
] | "Abilene, Texas is part of Taylor County, Texas." | [] | "Airport" | "train/Airport/1/Id13" |
"web_nlg_en-train-30" | "web_nlg_en-train-30" | [
"Abilene,_Texas | isPartOf | Taylor_County,_Texas"
] | "Abilene is a part of Taylor County, Texas." | [] | "Airport" | "train/Airport/1/Id13" |
"web_nlg_en-train-31" | "web_nlg_en-train-31" | [
"Abilene,_Texas | isPartOf | Texas"
] | "Abilene, Texas is part of Texas." | [] | "Airport" | "train/Airport/1/Id14" |
"web_nlg_en-train-32" | "web_nlg_en-train-32" | [
"Abilene,_Texas | isPartOf | Texas"
] | "Abilene is part of Texas." | [] | "Airport" | "train/Airport/1/Id14" |
"web_nlg_en-train-33" | "web_nlg_en-train-33" | [
"Abilene_Regional_Airport | 1stRunwayLengthFeet | 3678"
] | "The length of the 1st runway at Abilene Regional airport is 3678 feet." | [] | "Airport" | "train/Airport/1/Id15" |
"web_nlg_en-train-34" | "web_nlg_en-train-34" | [
"Abilene_Regional_Airport | 1stRunwaySurfaceType | Asphalt"
] | "The first runway at Abilene Regional Airport is made from asphalt." | [] | "Airport" | "train/Airport/1/Id16" |
"web_nlg_en-train-35" | "web_nlg_en-train-35" | [
"Abilene_Regional_Airport | 1stRunwaySurfaceType | Asphalt"
] | "The 1st runway at Abilene Regional Airport is made of Asphalt." | [] | "Airport" | "train/Airport/1/Id16" |
"web_nlg_en-train-36" | "web_nlg_en-train-36" | [
"Abilene_Regional_Airport | 3rdRunwayLengthFeet | 7202"
] | "The third runway at Abilene Regional Airport is 7,202 feet long." | [] | "Airport" | "train/Airport/1/Id17" |
"web_nlg_en-train-37" | "web_nlg_en-train-37" | [
"Abilene_Regional_Airport | 3rdRunwayLengthFeet | 7202"
] | "The 3rd runway at Abilene Regional airport is 7202 feet." | [] | "Airport" | "train/Airport/1/Id17" |
"web_nlg_en-train-38" | "web_nlg_en-train-38" | [
"Abilene_Regional_Airport | 3rdRunwayLengthFeet | 7202"
] | "The Abilene Regional Airport's 3rd runway length is ft is 7202." | [] | "Airport" | "train/Airport/1/Id17" |
"web_nlg_en-train-39" | "web_nlg_en-train-39" | [
"Abilene_Regional_Airport | icaoLocationIdentifier | \"KABI\""
] | "Abilene Regional Airport ICAO Location Identifier is KABI." | [] | "Airport" | "train/Airport/1/Id18" |
"web_nlg_en-train-40" | "web_nlg_en-train-40" | [
"Abilene_Regional_Airport | icaoLocationIdentifier | \"KABI\""
] | "KABI is the ICAO location identifier of Abilene Regional Airport." | [] | "Airport" | "train/Airport/1/Id18" |
"web_nlg_en-train-41" | "web_nlg_en-train-41" | [
"Abilene_Regional_Airport | icaoLocationIdentifier | \"KABI\""
] | "The ICAO Location Identifier of Abilene Regional Airport is KABI." | [] | "Airport" | "train/Airport/1/Id18" |
"web_nlg_en-train-42" | "web_nlg_en-train-42" | [
"Abilene_Regional_Airport | elevationAboveTheSeaLevel | 546"
] | "Abilene Regional Airport elevation above the sea level in metres is 546." | [] | "Airport" | "train/Airport/1/Id19" |
"web_nlg_en-train-43" | "web_nlg_en-train-43" | [
"Abilene_Regional_Airport | elevationAboveTheSeaLevel | 546"
] | "The Abilene Regional Airport is 546 metres above sea level." | [] | "Airport" | "train/Airport/1/Id19" |
"web_nlg_en-train-44" | "web_nlg_en-train-44" | [
"Abilene_Regional_Airport | elevationAboveTheSeaLevel | 546"
] | "Abilene Regional Airport is located 546 metres above sea level." | [] | "Airport" | "train/Airport/1/Id19" |
"web_nlg_en-train-45" | "web_nlg_en-train-45" | [
"Abilene_Regional_Airport | locationIdentifier | \"ABI\""
] | "The location identifier for Abilene Regional airport is ABI." | [] | "Airport" | "train/Airport/1/Id20" |
"web_nlg_en-train-46" | "web_nlg_en-train-46" | [
"Abilene_Regional_Airport | locationIdentifier | \"ABI\""
] | "The Abilene Regional Airport's location id is "ABI"." | [] | "Airport" | "train/Airport/1/Id20" |
"web_nlg_en-train-47" | "web_nlg_en-train-47" | [
"Abilene_Regional_Airport | locationIdentifier | \"ABI\""
] | "The location Identifier of Abilene Regional Airport is ABI." | [] | "Airport" | "train/Airport/1/Id20" |
"web_nlg_en-train-48" | "web_nlg_en-train-48" | [
"Abilene_Regional_Airport | runwayLength | 1121.0"
] | "The runway length of Abilene Regional Airport is 1,121." | [] | "Airport" | "train/Airport/1/Id21" |
"web_nlg_en-train-49" | "web_nlg_en-train-49" | [
"Abilene_Regional_Airport | runwayLength | 1121.0"
] | "Abilene Regional airport has a runway length of 1121.0." | [] | "Airport" | "train/Airport/1/Id21" |
"web_nlg_en-train-50" | "web_nlg_en-train-50" | [
"Abilene_Regional_Airport | runwayLength | 1121.0"
] | "The runway length of Abilene Regional Airport is 1121.0." | [] | "Airport" | "train/Airport/1/Id21" |
"web_nlg_en-train-51" | "web_nlg_en-train-51" | [
"Abilene_Regional_Airport | runwayLength | 2194.0"
] | "The runway length of Abilene Regional Airport is 2194.0." | [] | "Airport" | "train/Airport/1/Id22" |
"web_nlg_en-train-52" | "web_nlg_en-train-52" | [
"Abilene_Regional_Airport | runwayLength | 2195.0"
] | "The runway length of Abilene Regional Airport is 2,195." | [] | "Airport" | "train/Airport/1/Id23" |
"web_nlg_en-train-53" | "web_nlg_en-train-53" | [
"Abilene_Regional_Airport | runwayLength | 2195.0"
] | "The runway length of Abilene Regional Airport is 2195.0." | [] | "Airport" | "train/Airport/1/Id23" |
"web_nlg_en-train-54" | "web_nlg_en-train-54" | [
"Abilene_Regional_Airport | runwayName | \"17L/35R\""
] | "Abilene Regional Airport runway name is 17L/35R." | [] | "Airport" | "train/Airport/1/Id24" |
"web_nlg_en-train-55" | "web_nlg_en-train-55" | [
"Abilene_Regional_Airport | runwayName | \"17L/35R\""
] | "17L/35R is the runway name of Abilene Regional Airport." | [] | "Airport" | "train/Airport/1/Id24" |
"web_nlg_en-train-56" | "web_nlg_en-train-56" | [
"Abilene_Regional_Airport | runwayName | \"17L/35R\""
] | "Abilene Regional Airport has the runway name 17L/35R." | [] | "Airport" | "train/Airport/1/Id24" |
"web_nlg_en-train-57" | "web_nlg_en-train-57" | [
"Abilene_Regional_Airport | runwayName | \"17R/35L\""
] | "17R/35L is the runway name at Abilene Regional airport." | [] | "Airport" | "train/Airport/1/Id25" |
"web_nlg_en-train-58" | "web_nlg_en-train-58" | [
"Abilene_Regional_Airport | runwayName | \"17R/35L\""
] | "The name of the runway at Abilene Regional Airport is 17R/35L." | [] | "Airport" | "train/Airport/1/Id25" |
"web_nlg_en-train-59" | "web_nlg_en-train-59" | [
"Abilene_Regional_Airport | runwayName | \"17R/35L\""
] | "The runway name of Abilene Regional Airport is 17R/35L." | [] | "Airport" | "train/Airport/1/Id25" |
"web_nlg_en-train-60" | "web_nlg_en-train-60" | [
"Adirondack_Regional_Airport | 1stRunwayLengthFeet | 6573"
] | "The length of the first runway at Adirondack Regional Airport is 6,573 feet." | [] | "Airport" | "train/Airport/1/Id26" |
"web_nlg_en-train-61" | "web_nlg_en-train-61" | [
"Adirondack_Regional_Airport | 1stRunwayLengthFeet | 6573"
] | "6573 feet is the length of the first runway at Adirondack Regional Airport." | [] | "Airport" | "train/Airport/1/Id26" |
"web_nlg_en-train-62" | "web_nlg_en-train-62" | [
"Adirondack_Regional_Airport | 1stRunwayLengthFeet | 6573"
] | "The 1st runway length in feet of Adirondack Regional Airport is 6573." | [] | "Airport" | "train/Airport/1/Id26" |
"web_nlg_en-train-63" | "web_nlg_en-train-63" | [
"Adirondack_Regional_Airport | cityServed | Lake_Placid,_New_York"
] | "Lake Placid, N.Y. is served by the Adirondack Regional Airport." | [] | "Airport" | "train/Airport/1/Id27" |
"web_nlg_en-train-64" | "web_nlg_en-train-64" | [
"Adirondack_Regional_Airport | cityServed | Lake_Placid,_New_York"
] | "Adirondack Regional Airport serves the city of Lake Placid, New York." | [] | "Airport" | "train/Airport/1/Id27" |
"web_nlg_en-train-65" | "web_nlg_en-train-65" | [
"Adirondack_Regional_Airport | cityServed | Saranac_Lake,_New_York"
] | "Adirondack Regional Airport serves the city of Saranac Lake, New York." | [] | "Airport" | "train/Airport/1/Id28" |
"web_nlg_en-train-66" | "web_nlg_en-train-66" | [
"Adirondack_Regional_Airport | locationIdentifier | \"SLK\""
] | "Adirondack Regional Airport location identifier is SLK." | [] | "Airport" | "train/Airport/1/Id29" |
"web_nlg_en-train-67" | "web_nlg_en-train-67" | [
"Adirondack_Regional_Airport | locationIdentifier | \"SLK\""
] | "SLK is the I.D. of the Adirondack Regional Airport." | [] | "Airport" | "train/Airport/1/Id29" |
"web_nlg_en-train-68" | "web_nlg_en-train-68" | [
"Adirondack_Regional_Airport | locationIdentifier | \"SLK\""
] | "The Adirondack Regional Airport location identifier is SLK." | [] | "Airport" | "train/Airport/1/Id29" |
"web_nlg_en-train-69" | "web_nlg_en-train-69" | [
"Adirondack_Regional_Airport | runwayLength | 1219.0"
] | "The runway length of Adirondack Regional Airport is 1,219." | [] | "Airport" | "train/Airport/1/Id30" |
"web_nlg_en-train-70" | "web_nlg_en-train-70" | [
"Adirondack_Regional_Airport | runwayLength | 1219.0"
] | "The runway length at Adirondack Regional Airport is 1219.0." | [] | "Airport" | "train/Airport/1/Id30" |
"web_nlg_en-train-71" | "web_nlg_en-train-71" | [
"Adirondack_Regional_Airport | runwayLength | 1219.0"
] | "The runway length of Adirondack Regional Airport is 1219.0." | [] | "Airport" | "train/Airport/1/Id30" |
"web_nlg_en-train-72" | "web_nlg_en-train-72" | [
"Adirondack_Regional_Airport | runwayLength | 2003.0"
] | "The runway length of Adirondack Regional Airport is 2003.0." | [] | "Airport" | "train/Airport/1/Id31" |
"web_nlg_en-train-73" | "web_nlg_en-train-73" | [
"Adirondack_Regional_Airport | runwayLength | 2003.0"
] | "The length of the runway at Adirondack Regional Airport is 2003.0." | [] | "Airport" | "train/Airport/1/Id31" |
"web_nlg_en-train-74" | "web_nlg_en-train-74" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | elevationAboveTheSeaLevel | 610.0"
] | "Adolfo Suárez Madrid-Barajas Airport has an elevation of 610.0 metres above sea level." | [] | "Airport" | "train/Airport/1/Id32" |
"web_nlg_en-train-75" | "web_nlg_en-train-75" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | elevationAboveTheSeaLevel | 610.0"
] | "Adolfo Suárez Madrid–Barajas Airport is elevated 610 metres above sea level." | [] | "Airport" | "train/Airport/1/Id32" |
"web_nlg_en-train-76" | "web_nlg_en-train-76" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Alcobendas"
] | "Adolfo Suárez Madrid–Barajas Airport is in Alcobendas." | [] | "Airport" | "train/Airport/1/Id33" |
"web_nlg_en-train-77" | "web_nlg_en-train-77" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Alcobendas"
] | "Adolfo Suárez Madrid Barajas Airport is found in Alcobendas." | [] | "Airport" | "train/Airport/1/Id33" |
"web_nlg_en-train-78" | "web_nlg_en-train-78" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Alcobendas"
] | "Adolfo Suárez Madrid–Barajas Airport is located in Alcobendas." | [] | "Airport" | "train/Airport/1/Id33" |
"web_nlg_en-train-79" | "web_nlg_en-train-79" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Madrid"
] | "Adolfo Suárez Madrid–Barajas Airport is found in Madrid." | [] | "Airport" | "train/Airport/1/Id34" |
"web_nlg_en-train-80" | "web_nlg_en-train-80" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Madrid"
] | "The Adolfo Suárez Madrid–Barajas Airport is in Madrid." | [] | "Airport" | "train/Airport/1/Id34" |
"web_nlg_en-train-81" | "web_nlg_en-train-81" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Madrid"
] | "Adolfo Suarez Madrid-Barajas Airport is located in Madrid." | [] | "Airport" | "train/Airport/1/Id34" |
"web_nlg_en-train-82" | "web_nlg_en-train-82" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Paracuellos_de_Jarama"
] | "Adolfo Suárez Madrid–Barajas Airport can be found in Paracuellos de Jarama." | [] | "Airport" | "train/Airport/1/Id35" |
"web_nlg_en-train-83" | "web_nlg_en-train-83" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Paracuellos_de_Jarama"
] | "Adolfo Suarez Madrid-Barajas airport is located at Paracuellos de Jarama." | [] | "Airport" | "train/Airport/1/Id35" |
"web_nlg_en-train-84" | "web_nlg_en-train-84" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | location | Paracuellos_de_Jarama"
] | "The Adolfo Suárez Madrid–Barajas Airport is in Paracuellos de Jarama." | [] | "Airport" | "train/Airport/1/Id35" |
"web_nlg_en-train-85" | "web_nlg_en-train-85" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | operatingOrganisation | ENAIRE"
] | "The Adolfo Suárez Madrid–Barajas Airport is operated by ENAIRE." | [] | "Airport" | "train/Airport/1/Id36" |
"web_nlg_en-train-86" | "web_nlg_en-train-86" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | operatingOrganisation | ENAIRE"
] | "ENAIRE is the operating organisation for Adolfo Suarez Madrid-Barajas airport." | [] | "Airport" | "train/Airport/1/Id36" |
"web_nlg_en-train-87" | "web_nlg_en-train-87" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | operatingOrganisation | ENAIRE"
] | "Adolfo Suarez Madrid-Barajas Airport is operated by ENAIRE." | [] | "Airport" | "train/Airport/1/Id36" |
"web_nlg_en-train-88" | "web_nlg_en-train-88" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 3500.0"
] | "The runway length of Adolfo Suárez Madrid–Barajas Airport is 3,500." | [] | "Airport" | "train/Airport/1/Id37" |
"web_nlg_en-train-89" | "web_nlg_en-train-89" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 3500.0"
] | "The runway length at Adolfo Suarez Madrid-Barajas airport is 3500.0." | [] | "Airport" | "train/Airport/1/Id37" |
"web_nlg_en-train-90" | "web_nlg_en-train-90" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 3500.0"
] | "The Adolfo Suárez Madrid–Barajas Airport's runway length is 3500." | [] | "Airport" | "train/Airport/1/Id37" |
"web_nlg_en-train-91" | "web_nlg_en-train-91" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4100.0"
] | "The runway length of Adolfo Suárez Madrid–Barajas Airport is 4,100." | [] | "Airport" | "train/Airport/1/Id38" |
"web_nlg_en-train-92" | "web_nlg_en-train-92" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4100.0"
] | "The runway length of Adolfo Suarez Madrid-Barajas airport is 4100.0." | [] | "Airport" | "train/Airport/1/Id38" |
"web_nlg_en-train-93" | "web_nlg_en-train-93" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4100.0"
] | "The length of the runway at Adolfo Suarez Madrid Barajas Airport is 4100.0." | [] | "Airport" | "train/Airport/1/Id38" |
"web_nlg_en-train-94" | "web_nlg_en-train-94" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4349.0"
] | "The runway length of Adolfo Suárez Madrid–Barajas Airport is 4,349." | [] | "Airport" | "train/Airport/1/Id39" |
"web_nlg_en-train-95" | "web_nlg_en-train-95" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4349.0"
] | "Adolfo Suárez Madrid–Barajas Airport has a runway that is 4349 metres long." | [] | "Airport" | "train/Airport/1/Id39" |
"web_nlg_en-train-96" | "web_nlg_en-train-96" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayLength | 4349.0"
] | "The runway length of Adolfo Suarez Madrid-Barajas Airport is 4349.0." | [] | "Airport" | "train/Airport/1/Id39" |
"web_nlg_en-train-97" | "web_nlg_en-train-97" | [
"Adolfo_Suárez_Madrid–Barajas_Airport | runwayName | \"18R/36L\""
] | "18R/36L is the runway name of the Adolfo Suárez Madrid-Barajas Airport." | [] | "Airport" | "train/Airport/1/Id40" |
"web_nlg_en-train-98" | "web_nlg_en-train-98" | [
"Afonso_Pena_International_Airport | elevationAboveTheSeaLevelInFeet | 2988"
] | "Afonso Pena International Airport is elevated 2988 feet above sea level." | [] | "Airport" | "train/Airport/1/Id41" |
"web_nlg_en-train-99" | "web_nlg_en-train-99" | [
"Afonso_Pena_International_Airport | elevationAboveTheSeaLevelInFeet | 2988"
] | "Afonso Pena International Airport has an elevation above the sea level (in feet) of 2988." | [] | "Airport" | "train/Airport/1/Id41" |
Dataset Card for GEM/web_nlg
Link to Main Data Card
You can find the main data card on the GEM Website.
Dataset Summary
WebNLG is a bilingual dataset (English, Russian) of parallel DBpedia triple sets and short texts that cover about 450 different DBpedia properties. The WebNLG data was originally created to promote the development of RDF verbalisers able to generate short text and to handle micro-planning (i.e., sentence segmentation and ordering, referring expression generation, aggregation). The task is to generate text from 1 to 7 input triples that share entities, so each input is a connected knowledge graph. The dataset contains about 17,000 triple sets and 45,000 crowdsourced texts in English, and 7,000 triple sets and 19,000 crowdsourced texts in Russian. A challenging test set section, with entities and/or properties not seen at training time, is also available.
You can load the dataset via:
import datasets
# The loader exposes one configuration per language: 'en' (English) or 'ru' (Russian).
data = datasets.load_dataset('GEM/web_nlg', 'en')
The data loader can be found here.
website
paper
First Dataset Release, WebNLG Challenge 2017 Report, WebNLG Challenge 2020 Report
authors
The principal curator of the dataset is Anastasia Shimorina (Université de Lorraine / LORIA, France). Throughout the WebNLG releases, several people contributed to their construction: Claire Gardent (CNRS / LORIA, France), Shashi Narayan (Google, UK), Laura Perez-Beltrachini (University of Edinburgh, UK), Elena Khasanova, and Thiago Castro Ferreira (Federal University of Minas Gerais, Brazil).
Dataset Overview
Where to find the Data and its Documentation
Webpage
Download
Paper
First Dataset Release, WebNLG Challenge 2017 Report, WebNLG Challenge 2020 Report
BibTeX
Initial release of the dataset:
@inproceedings{gardent2017creating,
author = "Gardent, Claire
and Shimorina, Anastasia
and Narayan, Shashi
and Perez-Beltrachini, Laura",
title = "Creating Training Corpora for NLG Micro-Planners",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "179--188",
location = "Vancouver, Canada",
doi = "10.18653/v1/P17-1017",
url = "http://www.aclweb.org/anthology/P17-1017"
}
The latest version 3.0:
@inproceedings{castro-ferreira20:bilin-bi-direc-webnl-shared,
title={The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task Overview and Evaluation Results (WebNLG+ 2020)},
author={Castro Ferreira, Thiago and
Gardent, Claire and
Ilinykh, Nikolai and
van der Lee, Chris and
Mille, Simon and
Moussallem, Diego and
Shimorina, Anastasia},
booktitle = {Proceedings of the 3rd WebNLG Workshop on Natural Language Generation from the Semantic Web (WebNLG+ 2020)},
pages = "55--76",
year = 2020,
address = {Dublin, Ireland (Virtual)},
publisher = {Association for Computational Linguistics}}
Contact Email
Has a Leaderboard?
yes
Leaderboard Link
Leaderboard Details
The model outputs are evaluated against the crowdsourced references; the leaderboard reports BLEU-4, METEOR, chrF++, TER, BERTScore and BLEURT scores.
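To illustrate what scoring against several crowdsourced references involves, here is a minimal corpus-level BLEU-4 sketch in plain Python. It is not the scorer behind the leaderboard: it assumes whitespace tokenisation and applies no smoothing, so its numbers are not comparable to reported results.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU with several references per hypothesis (sketch)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        h = hyp.split()  # assumption: plain whitespace tokenisation
        rs = [r.split() for r in refs]
        hyp_len += len(h)
        # Effective reference length: the reference closest in length (ties -> shorter).
        ref_len += min((abs(len(r) - len(h)), len(r)) for r in rs)[1]
        for n in range(1, max_n + 1):
            # Clip each hypothesis n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in rs:
                max_ref |= ngram_counts(r, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match, so unsmoothed BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Each hypothesis is paired with the list of all references for its input, which is why the clipping step takes the maximum n-gram count over references rather than their sum.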
Languages and Intended Use
Multilingual?
yes
Covered Languages
Russian, English
License
cc-by-nc-4.0: Creative Commons Attribution Non Commercial 4.0 International
Intended Use
The WebNLG dataset was created to promote the development (i) of RDF verbalisers and (ii) of microplanners able to handle a wide range of linguistic constructions. The dataset aims at covering knowledge in different domains ("categories"). The same properties and entities can appear in several categories.
Primary Task
Data-to-Text
Communicative Goal
A model should verbalize all and only the provided input triples in natural language.
Credit
Curation Organization Type(s)
academic
Curation Organization(s)
Université de Lorraine / LORIA, France, CNRS / LORIA, France, University of Edinburgh, UK, Federal University of Minas Gerais, Brazil
Dataset Creators
The principal curator of the dataset is Anastasia Shimorina (Université de Lorraine / LORIA, France). Throughout the WebNLG releases, several people contributed to their construction: Claire Gardent (CNRS / LORIA, France), Shashi Narayan (Google, UK), Laura Perez-Beltrachini (University of Edinburgh, UK), Elena Khasanova, and Thiago Castro Ferreira (Federal University of Minas Gerais, Brazil).
Funding
The dataset construction was funded by the French National Research Agency (ANR).
Who added the Dataset to GEM?
Simon Mille and Sebastian Gehrmann added the dataset and wrote the data card.
Dataset Structure
Data Fields
entry: a data instance of the benchmark. Each entry has five attributes: a DBpedia category (category), an entry ID (eid), a shape (shape), a shape type (shape_type), and a triple set size (size).
shape: a string representation of the RDF tree with nested parentheses, where X is a node (see Newick tree format).
shape_type: the type of the tree shape. We identify three types of tree shapes: chain (the object of one triple is the subject of the other); sibling (triples with a shared subject); mixed (both chain and sibling types present).
eid: an entry ID. It is unique only within a category and a size.
category: a DBpedia category (Astronaut, City, MusicalWork, Politician, etc.).
size: the number of RDF triples in a set, ranging from 1 to 7.
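The shape_type definitions can be checked mechanically. The sketch below classifies a triple set from its (subject, property, object) tuples and renders a nested-parentheses shape string. The function names are hypothetical; returning None for a single triple is a choice of this sketch, and the shape string format is an assumption that reproduces the example instance's "(X (X) (X) (X) (X))" for the sibling case but has not been checked against the corpus for chains.

```python
from collections import Counter
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def shape_type(triples: List[Triple]) -> Optional[str]:
    """Classify a triple set as 'chain', 'sibling' or 'mixed' using the definitions above."""
    subjects = [s for s, _, _ in triples]
    subject_set = set(subjects)
    # chain: the object of one triple is the subject of another triple
    has_chain = any(o in subject_set for _, _, o in triples)
    # sibling: at least two triples share a subject
    has_sibling = any(count > 1 for count in Counter(subjects).values())
    if has_chain and has_sibling:
        return "mixed"
    if has_chain:
        return "chain"
    if has_sibling:
        return "sibling"
    return None  # e.g. a single triple; this sketch assigns no type


def shape_string(triples: List[Triple]) -> str:
    """Render the triple tree as nested parentheses with one X per node (assumed format)."""
    children = {}
    for s, _, o in triples:
        children.setdefault(s, []).append(o)
    objects = {o for _, _, o in triples}
    roots = [s for s in children if s not in objects]
    if len(roots) != 1:
        raise ValueError("this sketch expects a tree with exactly one root subject")

    def render(node: str) -> str:
        kids = children.get(node, [])
        if not kids:
            return "(X)"
        return "(X " + " ".join(render(kid) for kid in kids) + ")"

    return render(roots[0])
```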
Each entry has three fields: originaltripleset, modifiedtripleset, and lexs.
originaltripleset
: a set of RDF triples as extracted from DBpedia. Each set of RDF triples is a tree. Triples have the subject-predicate-object structure.
modifiedtripleset
: a set of RDF triples as presented to crowdworkers (for more details on modifications, see below).
Original and modified triples serve different purposes: the original triples link the data to a knowledge base (DBpedia), whereas the modified triples ensure consistency and homogeneity throughout the data. To train models, use the modified triples.
lexs (short for lexicalisations): natural language texts verbalising the triples. Each lexicalisation has two attributes: a comment (comment) and a lexicalisation ID (lid). By default, comments have the value good, except in rare cases where they were manually marked as toFix. This was done during corpus creation, whenever a lexicalisation was found not to match its triple set exactly.
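If the rare toFix lexicalisations should be excluded, a filter on the comment attribute is enough. This is a minimal sketch assuming the JSON layout of the example instance (an entry dict with a lexs list whose items carry comment and lex keys); good_texts is a hypothetical helper, and dropping toFix texts is a modelling choice, not a rule of the dataset.

```python
from typing import Dict, List


def good_texts(entry: Dict) -> List[str]:
    """Return the texts of an entry's lexicalisations whose comment is 'good'.

    A missing comment is treated as 'good' (assumption: 'good' is the default value).
    """
    return [lex["lex"] for lex in entry["lexs"] if lex.get("comment", "good") == "good"]
```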
Russian data has additional optional fields compared to English:
<dbpedialinks>: RDF triples extracted from DBpedia between English and Russian entities by means of the property sameAs.
<links>: RDF triples created manually for some entities to serve as pointers for translators. There are two types: with sameAs (Spaniards | sameAs | испанцы) and with includes (Tomatoes, guanciale, cheese, olive oil | includes | гуанчиале). These were mostly created for string literals, to translate parts of them.
Lexicalisations in the Russian WebNLG have a new parameter lang (values: en, ru) because the original English texts were kept in the Russian version (see the example above).
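Because the Russian version keeps the original English texts, a Russian training set usually needs the lexicalisations split by lang. The sketch below groups them; texts_by_lang is a hypothetical helper, and treating a lexicalisation without a lang attribute as English is an assumption (the English example instance has no lang attribute at all).

```python
from collections import defaultdict
from typing import Dict, List


def texts_by_lang(entry: Dict) -> Dict[str, List[str]]:
    """Group an entry's lexicalisation texts by their lang value ('en' or 'ru')."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lex in entry["lexs"]:
        # Assumption: a lexicalisation without a lang attribute is English.
        groups[lex.get("lang", "en")].append(lex["lex"])
    return dict(groups)
```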
Example Instance
{
"entry": {
"category": "Company",
"size": "4",
"shape": "(X (X) (X) (X) (X))",
"shape_type": "sibling",
"eid": "Id21",
"lexs": [
{
"comment": "good",
"lex": "Trane, which was founded on January 1st 1913 in La Crosse, Wisconsin, is based in Ireland. It has 29,000 employees.",
"lid": "Id1"
}
],
"modifiedtripleset": [
{
"subject": "Trane",
"property": "foundingDate",
"object": "1913-01-01"
},
{
"subject": "Trane",
"property": "location",
"object": "Ireland"
},
{
"subject": "Trane",
"property": "foundationPlace",
"object": "La_Crosse,_Wisconsin"
},
{
"subject": "Trane",
"property": "numberOfEmployees",
"object": "29000"
}
],
"originaltriplesets": {
"originaltripleset": [
{
"subject": "Trane",
"property": "foundingDate",
"object": "1913-01-01"
},
{
"subject": "Trane",
"property": "location",
"object": "Ireland"
},
{
"subject": "Trane",
"property": "foundationPlace",
"object": "La_Crosse,_Wisconsin"
},
{
"subject": "Trane",
"property": "numberOfEmployees",
"object": "29000"
}
]
}
}
}
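The input column in the viewer preview at the top of this page shows triples as "subject | property | object" strings. A minimal sketch of turning an entry like the one above into such inputs and (input, target) pairs follows; linearize and to_pairs are hypothetical helpers, and this card does not state that the GEM loader builds its inputs exactly this way.

```python
from typing import Dict, List, Tuple


def linearize(entry: Dict) -> List[str]:
    """One 'subject | property | object' string per modified triple, in the given order."""
    return [
        f'{t["subject"]} | {t["property"]} | {t["object"]}'
        for t in entry["modifiedtripleset"]
    ]


def to_pairs(entry: Dict) -> List[Tuple[List[str], str]]:
    """One (linearized input, target text) pair per lexicalisation of the entry."""
    triples = linearize(entry)
    return [(triples, lex["lex"]) for lex in entry["lexs"]]
```

Applied to the "entry" object of the example instance, this gives one pair whose input starts with "Trane | foundingDate | 1913-01-01".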
The XML-formatted example is here.
Data Splits
English (v3.0) | Train | Dev | Test |
---|---|---|---|
triple sets | 13,211 | 1,667 | 1,779 |
texts | 35,426 | 4,464 | 5,150 |
properties | 372 | 290 | 220 |
Russian (v3.0) | Train | Dev | Test |
---|---|---|---|
triple sets | 5,573 | 790 | 1,102 |
texts | 14,239 | 2,026 | 2,780 |
properties | 226 | 115 | 192 |
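Summing the split tables is a quick consistency check on the rounded figures in the Dataset Summary. The snippet only does arithmetic on the numbers copied from the two tables; SPLITS and summarize are illustrative names.

```python
# Split sizes copied from the English and Russian v3.0 tables: (train, dev, test).
SPLITS = {
    "English": {"triple sets": (13211, 1667, 1779), "texts": (35426, 4464, 5150)},
    "Russian": {"triple sets": (5573, 790, 1102), "texts": (14239, 2026, 2780)},
}


def summarize(rows):
    """Return (total triple sets, total texts, texts per triple set)."""
    sets_total = sum(rows["triple sets"])
    texts_total = sum(rows["texts"])
    return sets_total, texts_total, texts_total / sets_total


for lang, rows in SPLITS.items():
    sets_total, texts_total, ratio = summarize(rows)
    print(f"{lang}: {sets_total} triple sets, {texts_total} texts, {ratio:.2f} texts per set")
```

The totals are 16,657 triple sets and 45,040 texts for English and 7,465 triple sets and 19,045 texts for Russian, i.e. roughly 2.7 and 2.55 texts per triple set, in line with the rounded figures in the summary.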
Dataset in GEM
Rationale for Inclusion in GEM
Why is the Dataset in GEM?
Due to the constrained generation task, this dataset can be used to evaluate very specific and narrow generation capabilities.
Similar Datasets
yes
Unique Language Coverage
yes
Difference from other GEM datasets
The RDF-triple format is unique to WebNLG.
Ability that the Dataset measures
surface realization
GEM-Specific Curation
Modified for GEM?
yes
GEM Modifications
other
Modification Details
No changes were made to the main content of the dataset. Version 3.0 of the dataset is used.
Additional Splits?
yes
Split Information
23 special test sets for WebNLG were added to the GEM evaluation suite, 12 for English and 11 for Russian. For both languages, we created subsets of the training and development sets of ~500 randomly selected inputs each. The inputs were sampled proportionally from each category.
Two types of transformations have been applied to WebNLG: (i) input scrambling (English and Russian) and (ii) numerical value replacement (English); in both cases, a subset of about 500 inputs was randomly selected. For (i), the order of the triples was randomly reassigned, while each triple kept its internal Subject-Property-Object order. For (ii), each cardinal value was replaced with a new random value in the same format (e.g., alpha, integer, or floating-point). The new number is bounded below by zero and above by the next power of 10 of the original value (e.g., replacing 54 yields a random value between 0 and 100). Floating-point values keep their degree of precision.
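Both transformations can be sketched in a few lines. This is an illustration under stated assumptions, not the GEM implementation: scramble and replace_number are hypothetical names, the upper bound is taken as 10 raised to the number of digits in the integer part (so 54 maps into [0, 100]), both bounds are treated as inclusive, and values written in words (the alpha format) or dates are left unchanged.

```python
import random
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

_NUMBER = re.compile(r"(\d+)(?:\.(\d+))?")


def scramble(triples: List[Triple], rng: random.Random) -> List[Triple]:
    """Input scrambling: shuffle the triple order, keeping each triple's S-P-O order intact."""
    shuffled = list(triples)
    rng.shuffle(shuffled)
    return shuffled


def replace_number(value: str, rng: random.Random) -> str:
    """Numerical replacement for a plain integer or decimal literal (sketch).

    The new value lies in [0, 10**k], where k is the number of digits of the
    integer part; decimals keep their number of fractional digits.
    Anything that is not a plain number (dates, words) is returned unchanged.
    """
    match = _NUMBER.fullmatch(value)
    if match is None:
        return value
    int_part, frac_part = match.groups()
    upper = 10 ** len(str(int(int_part)))
    if frac_part is None:
        return str(rng.randint(0, upper))
    return f"{rng.uniform(0, upper):.{len(frac_part)}f}"
```

Passing an explicit random.Random instance keeps the perturbations reproducible when the same seed is reused.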
For both languages, we identified subsets of the test set that can be compared to each other, to give a better understanding of the results. There are currently 8 selections:
Selection 1 (size): input length. This selection corresponds to the number of predicates in the input. By comparing inputs of different lengths, we can see to what extent NLG systems are able to handle different input sizes. The table below provides the relevant frequencies. Please be aware that comparing selections with fewer than 100 items may result in unreliable comparisons.
Input length | Frequency English | Frequency Russian |
---|---|---|
1 | 369 | 254 |
2 | 349 | 200 |
3 | 350 | 214 |
4 | 305 | 214 |
5 | 213 | 159 |
6 | 114 | 32 |
7 | 79 | 29 |
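A minimal sketch of how Selection 1 could be computed, assuming each input is a list of `subject | property | object` strings; the function name and the `min_items` parameter are illustrative, with the default of 100 taken from the reliability note above.

```python
from collections import Counter


def size_selection(inputs, min_items=100):
    """Group inputs by their number of triples.

    Returns {size: (count, reliable)}, where reliable is False for
    buckets with fewer than min_items inputs.
    """
    counts = Counter(len(triples) for triples in inputs)
    return {size: (n, n >= min_items) for size, n in sorted(counts.items())}
```

Applied to the English test set with the default threshold, the 7-triple bucket (79 inputs) would be flagged as unreliable.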
Selection 2 (frequency): seen/unseen single predicates. This selection corresponds to the inputs with only one predicate. We compare which predicates are seen/unseen in the training data. The table below provides the relevant frequencies. Note that the comparison is only valid for English, not for Russian, since Russian has only one example of an unseen single predicate.
Predicate seen in training? | Frequency English | Frequency Russian |
---|---|---|
Seen | 297 | 253 |
Unseen | 72 | 1 |
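The seen/unseen labelling of single-predicate inputs can be sketched as follows. This assumes that a predicate counts as seen if it occurs in any training triple, regardless of the size of the training input; the card does not say whether GEM used this rule, and the function names are hypothetical.

```python
def predicate_of(triple):
    """Return the property of a "subject | property | object" string."""
    return triple.split(" | ", 2)[1]


def single_predicate_selection(train_inputs, test_inputs):
    """Label every single-triple test input as "seen" or "unseen".

    Returns {test_index: label}; inputs with more than one triple are skipped.
    """
    seen = {predicate_of(t) for triples in train_inputs for t in triples}
    return {
        i: "seen" if predicate_of(triples[0]) in seen else "unseen"
        for i, triples in enumerate(test_inputs)
        if len(triples) == 1
    }
```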
Selection 3 (frequency): seen/unseen combinations of predicates. For each input, this selection checks whether its combination of predicates has been seen in the training data. For example, if the combination of predicates A and B is seen, there is an input in the training data consisting of two triples, where one triple uses predicate A and the other uses predicate B. If the combination is unseen, no such training input exists. The counts (1,410 for English and 848 for Russian) match the number of test inputs with at least two triples in Selection 1, so single-triple inputs are not part of this selection. The table below provides the relevant frequencies.
Combination seen in training? | Frequency English | Frequency Russian |
---|---|---|
unseen | 1295 | 354 |
seen | 115 | 494 |
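A sketch of the combination check, under two assumptions not stated in the card: a combination is seen only if some training input has exactly the same predicates (compared as a sorted tuple, so repeated properties are kept), and single-triple inputs are skipped. The helper names are hypothetical.

```python
def predicate_combination(triples):
    """Sorted tuple of an input's predicates (repeated predicates are kept)."""
    return tuple(sorted(t.split(" | ", 2)[1] for t in triples))


def combination_selection(train_inputs, test_inputs):
    """Label test inputs with 2+ triples by whether some training input
    has exactly the same combination of predicates."""
    seen = {predicate_combination(triples) for triples in train_inputs}
    return {
        i: "seen" if predicate_combination(triples) in seen else "unseen"
        for i, triples in enumerate(test_inputs)
        if len(triples) >= 2
    }
```

Sorting makes the check independent of triple order, so an input listing predicate B before A still matches a training input with A before B.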
Selection 4 (frequency): seen/unseen arguments. For each input, this selection checks whether its arg1s and arg2s have been seen during the training phase. For this selection, Seen is the default: the arg1s of an input are counted as unseen only if all of its arg1 instances are unseen, and the same holds for arg2. So "seen" here means that at least some of the input's arg1s (or arg2s) were seen in training. The table below provides the relevant frequencies. Note that the comparison is only valid for English, not for Russian, since Russian has very few examples of unseen arguments.
Arguments seen in training? | Frequency English | Frequency Russian |
---|---|---|
both_seen | 518 | 1075 |
both_unseen | 1177 | 4 |
arg1_unseen | 56 | 19 |
arg2_unseen | 28 | 4 |
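The "seen by default" rule can be sketched as below. The sketch assumes that arg1 is the triple subject and arg2 the object, that each slot is compared only against training values in the same slot, and that entities are matched by exact string; the function names are hypothetical.

```python
def training_arguments(train_inputs):
    """Collect the subjects (arg1) and objects (arg2) seen in training."""
    arg1, arg2 = set(), set()
    for triples in train_inputs:
        for triple in triples:
            subject, _, obj = triple.split(" | ", 2)
            arg1.add(subject)
            arg2.add(obj)
    return arg1, arg2


def argument_selection(triples, train_arg1, train_arg2):
    """Label one input as both_seen, both_unseen, arg1_unseen or arg2_unseen."""
    parts = [t.split(" | ", 2) for t in triples]
    # A slot is unseen only if *all* of its values are unseen in training.
    arg1_unseen = all(s not in train_arg1 for s, _, _ in parts)
    arg2_unseen = all(o not in train_arg2 for _, _, o in parts)
    if arg1_unseen and arg2_unseen:
        return "both_unseen"
    if arg1_unseen:
        return "arg1_unseen"
    if arg2_unseen:
        return "arg2_unseen"
    return "both_seen"
```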
Selection 5 (shape): repeated subjects. For this selection, the subsets are based on the number of times a subject is repeated in the input. Only the maximum number of repetitions is taken into account: if a subject appears 3 times in an input and a different subject 2 times, the input is placed in the "3_subjects_same" split. "unique_subjects" means that all subjects are different.
Max num. of repeated subjects | Frequency English | Frequency Russian |
---|---|---|
unique_subjects | 453 | 339 |
2_subjects_same | 414 | 316 |
3_subjects_same | 382 | 217 |
4_subjects_same | 251 | 143 |
5_subjects_same | 158 | 56 |
6_subjects_same | 80 | 19 |
7_subjects_same | 41 | 12 |
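The maximum-repetition rule can be sketched as follows, assuming `subject | property | object` strings and exact string matching of subjects; the function names are hypothetical, while the returned labels follow the split names in the table.

```python
from collections import Counter


def max_repeats(triples, position):
    """Largest number of times any value occurs at the given position
    (0 = subject, 1 = property, 2 = object) across one input's triples."""
    counts = Counter(t.split(" | ", 2)[position] for t in triples)
    return max(counts.values())


def subject_selection(triples):
    n = max_repeats(triples, 0)
    return "unique_subjects" if n == 1 else f"{n}_subjects_same"
```

With the example from the text (one subject appearing 3 times, another 2 times), the input lands in `3_subjects_same`.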
Selection 6 (shape): repeated objects. Same as for subjects above, but for objects. There are far fewer cases of repeated objects, so there are only two categories for this selection, unique_objects and some_objects_same; for the latter, we have up to 3 coreferring objects in English, and XXX in Russian.
Max num. of repeated objects | Frequency English | Frequency Russian |
---|---|---|
unique_objects | 1654 | 1099 |
some_objects_same | 125 | 3 |
Selection 7 (shape): repeated properties. Same as for objects above, but for properties; up to two properties can be the same in English, up to XXX in Russian.
Max num. of repeated properties | Frequency English | Frequency Russian |
---|---|---|
unique_properties | 1510 | 986 |
some_properties_same | 269 | 116 |
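Selections 6 and 7 use the same binary rule on different triple positions. A minimal sketch, assuming exact string matching of objects and properties; the function name and its `position`/`name` parameters are illustrative.

```python
from collections import Counter


def repetition_selection(triples, position, name):
    """Binary split used for objects (position 2) and properties (position 1)."""
    counts = Counter(t.split(" | ", 2)[position] for t in triples)
    return f"unique_{name}" if max(counts.values()) == 1 else f"some_{name}_same"
```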
Selection 8 (shape): entities that appear both as subject and object. For this selection, we separated the inputs in which no entity appears as both subject and object from the inputs in which one or more entities appear both as subject and as object. We found up to two such entities per input in English, and up to XXX in Russian.
Entities appearing as both subject and object | Frequency English | Frequency Russian |
---|---|---|
none | 1322 | 642 |
one or more | 457 | 460 |
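Finding the entities behind Selection 8 amounts to intersecting the set of subjects with the set of objects of an input. The sketch below assumes exact string matching; in practice subjects (e.g. `Aarhus_Airport`) and objects (which may be quoted or use spaces) might need normalization first. The function name is hypothetical.

```python
def entities_in_both_roles(triples):
    """Entities that occur as a subject and as an object within one input."""
    subjects, objects = set(), set()
    for triple in triples:
        subject, _, obj = triple.split(" | ", 2)
        subjects.add(subject)
        objects.add(obj)
    return subjects & objects
```

An input falls in the "none" row when the returned set is empty and in the "one or more" row otherwise.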
Split Motivation
Robustness
Getting Started with the Task
Pointers to Resources
Dataset construction: main dataset paper, RDF triple extraction, Russian translation
WebNLG Challenge 2017: webpage, paper
WebNLG Challenge 2020: webpage, paper
Enriched version of WebNLG: repository, paper
Related research papers: webpage
Previous Results
Previous Results
Proposed Evaluation
For both languages, the participating systems are automatically evaluated in a multi-reference scenario. Each English hypothesis is compared to a maximum of 5 references, and each Russian one to a maximum of 7 references. On average, English data has 2.89 references per test instance, and Russian data has 2.52 references per instance.
For human evaluation, examples are sampled uniformly across triple-set sizes, and the following dimensions are assessed (on MTurk and Yandex.Toloka):
- Data Coverage: Does the text include descriptions of all predicates presented in the data?
- Relevance: Does the text describe only predicates (with related subjects and objects) that are found in the data?
- Correctness: When describing predicates that are found in the data, does the text mention the correct objects and adequately introduce the subject for this specific predicate?
- Text Structure: Is the text grammatical, well-structured, and written in acceptable English?
- Fluency: Does the text progress naturally, form a coherent whole, and is it easy to understand?
For additional information, such as the evaluation instructions, please refer to the original paper.
Previous results available?
yes
Other Evaluation Approaches
We evaluated a wide range of models as part of the GEM benchmark.
Relevant Previous Results
Results can be found on the GEM website.
Broader Social Context
Previous Work on the Social Impact of the Dataset
Usage of Models based on the Data
yes - related tasks
Social Impact Observations
We do not foresee any negative social impact in particular from this dataset or task.
Positive outlooks: being able to generate good-quality text from RDF data would make it possible, e.g., to make this data more accessible to lay users, to enrich existing text with information drawn from knowledge bases such as DBpedia, or to describe, compare and relate entities present in these knowledge bases.
Impact on Under-Served Communities
Addresses needs of underserved Communities?
no
Discussion of Biases
Any Documented Social Biases?
yes
Links and Summaries of Analysis Work
This dataset is created using DBpedia RDF triples, which naturally exhibit biases that have been found in Wikipedia, e.g., some forms of gender bias.
The choice of entities, described by RDF trees, was not controlled. As such, they may contain gender biases; for instance, all the astronauts described by RDF triples are male. Hence, the pronouns he/him/his occur more often in the texts. Similarly, entities may be related to Western culture more often than to other cultures.
Are the Language Producers Representative of the Language?
In English, the dataset is limited to the language that the crowdworkers speak. In Russian, the language is heavily biased by translationese, since the texts are machine translations post-edited by crowdworkers.
Considerations for Using the Data
PII Risks and Liability
Potential PII Risk
There is no PII in this dataset.
Licenses
Copyright Restrictions on the Dataset
non-commercial use only
Copyright Restrictions on the Language Data
public domain
Known Technical Limitations
Technical Limitations
The quality of the crowdsourced references is limited, in particular in terms of fluency/naturalness of the collected texts.
Russian data was machine-translated and then post-edited by crowdworkers, so some examples may still exhibit issues related to bad translations.
Unsuited Applications
Only a limited number of domains are covered in this dataset. As a result, it cannot be used as a general-purpose realizer.