Dataset Card for "amazon_us_reviews"
Dataset Summary
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In the more than two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.
More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).
Each dataset contains the following columns:
- marketplace: 2-letter country code of the marketplace where the review was written.
- customer_id: Random identifier that can be used to aggregate reviews written by a single author.
- review_id: The unique ID of the review.
- product_id: The unique Product ID the review pertains to. In the multilingual dataset, the reviews for the same product in different countries can be grouped by the same product_id.
- product_parent: Random identifier that can be used to aggregate reviews for the same product.
- product_title: Title of the product.
- product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
- star_rating: The 1-5 star rating of the review.
- helpful_votes: Number of helpful votes.
- total_votes: Number of total votes the review received.
- vine: Review was written as part of the Vine program.
- verified_purchase: The review is on a verified purchase.
- review_headline: The title of the review.
- review_body: The review text.
- review_date: The date the review was written.
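The file format described above (one review per line, tab delimited, no quote or escape characters) can be read with Python's standard `csv` module by turning quoting off. The following is a minimal sketch under stated assumptions: the column order is taken from the list in this card, the sample line is hand-built from the Apparel example shown later in the card, and the raw `Y`/`N` flag values and the choice to skip malformed lines are assumptions rather than documented behaviour.

```python
import csv
import io

# Column order as listed in this card (assumed to match the file layout).
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]


def parse_reviews(tsv_text):
    """Yield one dict per review line.

    Quoting is disabled (csv.QUOTE_NONE) because the files use no quote or
    escape characters, so quote marks inside titles are kept as plain text.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # assumption: silently skip lines with a wrong field count
        yield dict(zip(COLUMNS, row))


# Hand-built sample line; the review body is shortened and the Y/N flag
# values are placeholders, not copied from a real file.
SAMPLE_LINE = "\t".join([
    "US", "45223824", "R1N3Z13931J3O9", "B016PUU3VO", "893588059",
    "Fruit of the Loom Boys' A-Shirt (Pack of 4)", "Apparel",
    "2", "0", "0", "Y", "N",
    "Sizes not correct, too big overall and WAY too long",
    "I'll be returning these...",
    "2015-01-01",
])

for review in parse_reviews(SAMPLE_LINE):
    print(review["review_id"], review["star_rating"], review["product_title"])
```

All values come back as strings; numeric columns such as star_rating would still need an explicit `int(...)` conversion.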
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
Apparel_v1_00
- Size of downloaded dataset files: 648.64 MB
- Size of the generated dataset: 2254.36 MB
- Total amount of disk used: 2903.00 MB
An example of 'train' looks as follows.
{
"customer_id": "45223824",
"helpful_votes": 0,
"marketplace": "US",
"product_category": "Apparel",
"product_id": "B016PUU3VO",
"product_parent": "893588059",
"product_title": "Fruit of the Loom Boys' A-Shirt (Pack of 4)",
"review_body": "I ordered the same size as I ordered last time, and these shirts were much larger than the previous order. They were also about 6 inches longer. It was like they sent men's shirts instead of boys' shirts. I'll be returning these...",
"review_date": "2015-01-01",
"review_headline": "Sizes not correct, too big overall and WAY too long",
"review_id": "R1N3Z13931J3O9",
"star_rating": 2,
"total_votes": 0,
"verified_purchase": 1,
"vine": 0
}
Automotive_v1_00
- Size of downloaded dataset files: 582.15 MB
- Size of the generated dataset: 1518.88 MB
- Total amount of disk used: 2101.03 MB
An example of 'train' looks as follows.
{
"customer_id": "16825098",
"helpful_votes": 0,
"marketplace": "US",
"product_category": "Automotive",
"product_id": "B000E4PCGE",
"product_parent": "694793259",
"product_title": "00-03 NISSAN SENTRA MIRROR RH (PASSENGER SIDE), Power, Non-Heated (2000 00 2001 01 2002 02 2003 03) NS35ER 963015M000",
"review_body": "Product was as described, new and a great look. Only bad thing is that one of the screws was stripped so I couldn't tighten all three.",
"review_date": "2015-08-31",
"review_headline": "new and a great look. Only bad thing is that one of ...",
"review_id": "R2RUIDUMDKG7P",
"star_rating": 3,
"total_votes": 0,
"verified_purchase": 1,
"vine": 0
}
Baby_v1_00
- Size of downloaded dataset files: 357.40 MB
- Size of the generated dataset: 956.30 MB
- Total amount of disk used: 1313.70 MB
An example of 'train' looks as follows.
This example was too long and was cropped:
{
"customer_id": "23299101",
"helpful_votes": 2,
"marketplace": "US",
"product_category": "Baby",
"product_id": "B00SN6F9NG",
"product_parent": "3470998",
"product_title": "Rhoost Nail Clipper for Baby - Ergonomically Designed and Easy to Use Baby Nail Clipper, Natural Wooden Bamboo - Baby Health and Personal Care Kits",
"review_body": "\"This is an absolute MUST item to have! I was scared to death to clip my baby's nails. I tried other baby nail clippers and th...",
"review_date": "2015-08-31",
"review_headline": "If fits so comfortably in my hand and I feel like I have ...",
"review_id": "R2DRL5NRODVQ3Z",
"star_rating": 5,
"total_votes": 2,
"verified_purchase": 1,
"vine": 0
}
Beauty_v1_00
- Size of downloaded dataset files: 914.08 MB
- Size of the generated dataset: 2397.39 MB
- Total amount of disk used: 3311.47 MB
An example of 'train' looks as follows.
{
"customer_id": "24655453",
"helpful_votes": 1,
"marketplace": "US",
"product_category": "Beauty",
"product_id": "B00SAQ9DZY",
"product_parent": "292127037",
"product_title": "12 New, High Quality, Amber 2 ml (5/8 Dram) Glass Bottles, with Orifice Reducer and Black Cap.",
"review_body": "These are great for small mixtures for EO's, especially for traveling. I only gave this 4 stars because of the orifice reducer. The hole is so small it is hard to get the oil out. Just needs to be slightly bigger.",
"review_date": "2015-08-31",
"review_headline": "Good Product",
"review_id": "R2A30ALEGLMCGN",
"star_rating": 4,
"total_votes": 1,
"verified_purchase": 1,
"vine": 0
}
Books_v1_00
- Size of downloaded dataset files: 2740.34 MB
- Size of the generated dataset: 7193.86 MB
- Total amount of disk used: 9934.20 MB
An example of 'train' looks as follows.
This example was too long and was cropped:
{
"customer_id": "49735028",
"helpful_votes": 0,
"marketplace": "US",
"product_category": "Books",
"product_id": "0664254969",
"product_parent": "248307276",
"product_title": "Presbyterian Creeds: A Guide to the Book of Confessions",
"review_body": "\"The Presbyterian Book of Confessions contains multiple Creeds for use by the denomination. This guidebook helps he lay person t...",
"review_date": "2015-08-31",
"review_headline": "The Presbyterian Book of Confessions contains multiple Creeds for use ...",
"review_id": "R2G519UREHRO8M",
"star_rating": 3,
"total_votes": 1,
"verified_purchase": 1,
"vine": 0
}
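As an illustration of how the vote fields in these records relate, the sketch below computes the share of votes marked helpful. The function name `helpful_ratio` is hypothetical and the ratio itself is not part of the dataset; the three inputs reuse the helpful_votes and total_votes values from the Apparel, Baby, and Books examples above.

```python
def helpful_ratio(review):
    """Return helpful_votes / total_votes, or None when nobody has voted."""
    total = review["total_votes"]
    if total == 0:
        return None  # avoid dividing by zero for reviews without votes
    return review["helpful_votes"] / total


# Vote counts copied from the example records above.
apparel = {"helpful_votes": 0, "total_votes": 0}
baby = {"helpful_votes": 2, "total_votes": 2}
books = {"helpful_votes": 0, "total_votes": 1}

for name, record in [("Apparel", apparel), ("Baby", baby), ("Books", books)]:
    print(name, helpful_ratio(record))
```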
Data Fields
The data fields are the same among all splits.
Apparel_v1_00
- marketplace: a string feature.
- customer_id: a string feature.
- review_id: a string feature.
- product_id: a string feature.
- product_parent: a string feature.
- product_title: a string feature.
- product_category: a string feature.
- star_rating: an int32 feature.
- helpful_votes: an int32 feature.
- total_votes: an int32 feature.
- vine: a classification label, with possible values including Y (0), N (1).
- verified_purchase: a classification label, with possible values including Y (0), N (1).
- review_headline: a string feature.
- review_body: a string feature.
- review_date: a string feature.
Automotive_v1_00
- marketplace: a string feature.
- customer_id: a string feature.
- review_id: a string feature.
- product_id: a string feature.
- product_parent: a string feature.
- product_title: a string feature.
- product_category: a string feature.
- star_rating: an int32 feature.
- helpful_votes: an int32 feature.
- total_votes: an int32 feature.
- vine: a classification label, with possible values including Y (0), N (1).
- verified_purchase: a classification label, with possible values including Y (0), N (1).
- review_headline: a string feature.
- review_body: a string feature.
- review_date: a string feature.
Baby_v1_00
- marketplace: a string feature.
- customer_id: a string feature.
- review_id: a string feature.
- product_id: a string feature.
- product_parent: a string feature.
- product_title: a string feature.
- product_category: a string feature.
- star_rating: an int32 feature.
- helpful_votes: an int32 feature.
- total_votes: an int32 feature.
- vine: a classification label, with possible values including Y (0), N (1).
- verified_purchase: a classification label, with possible values including Y (0), N (1).
- review_headline: a string feature.
- review_body: a string feature.
- review_date: a string feature.
Beauty_v1_00
- marketplace: a string feature.
- customer_id: a string feature.
- review_id: a string feature.
- product_id: a string feature.
- product_parent: a string feature.
- product_title: a string feature.
- product_category: a string feature.
- star_rating: an int32 feature.
- helpful_votes: an int32 feature.
- total_votes: an int32 feature.
- vine: a classification label, with possible values including Y (0), N (1).
- verified_purchase: a classification label, with possible values including Y (0), N (1).
- review_headline: a string feature.
- review_body: a string feature.
- review_date: a string feature.
Books_v1_00
- marketplace: a string feature.
- customer_id: a string feature.
- review_id: a string feature.
- product_id: a string feature.
- product_parent: a string feature.
- product_title: a string feature.
- product_category: a string feature.
- star_rating: an int32 feature.
- helpful_votes: an int32 feature.
- total_votes: an int32 feature.
- vine: a classification label, with possible values including Y (0), N (1).
- verified_purchase: a classification label, with possible values including Y (0), N (1).
- review_headline: a string feature.
- review_body: a string feature.
- review_date: a string feature.
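Because vine and verified_purchase are stored as integer class labels, a record has to be decoded before the flags read as Y or N. The sketch below does this in plain Python, assuming the index-to-name mapping exactly as listed in the field descriptions above (0 is Y, 1 is N); the names `LABEL_NAMES` and `decode_flag` are hypothetical helpers, not part of any library.

```python
# Index-to-name mapping as listed in this card: Y (0), N (1).
LABEL_NAMES = ("Y", "N")


def decode_flag(value):
    """Map an integer class label to its name, rejecting unknown indices."""
    if value not in (0, 1):
        raise ValueError(f"unexpected label index: {value!r}")
    return LABEL_NAMES[value]


# Flag values as they appear in the example records in this card.
record = {"vine": 0, "verified_purchase": 1}
decoded = {field: decode_flag(index) for field, index in record.items()}
print(decoded)
```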
Data Splits
name | train |
---|---|
Apparel_v1_00 | 5906333 |
Automotive_v1_00 | 3514942 |
Baby_v1_00 | 1752932 |
Beauty_v1_00 | 5115666 |
Books_v1_00 | 10319090 |
Books_v1_01 | 6106719 |
Books_v1_02 | 3105520 |
Camera_v1_00 | 1801974 |
Digital_Ebook_Purchase_v1_00 | 12520722 |
Digital_Ebook_Purchase_v1_01 | 5101693 |
Digital_Music_Purchase_v1_00 | 1688884 |
Digital_Software_v1_00 | 102084 |
Digital_Video_Download_v1_00 | 4057147 |
Digital_Video_Games_v1_00 | 145431 |
Electronics_v1_00 | 3093869 |
Furniture_v1_00 | 792113 |
Gift_Card_v1_00 | 149086 |
Grocery_v1_00 | 2402458 |
Health_Personal_Care_v1_00 | 5331449 |
Home_Entertainment_v1_00 | 705889 |
Home_Improvement_v1_00 | 2634781 |
Home_v1_00 | 6221559 |
Jewelry_v1_00 | 1767753 |
Kitchen_v1_00 | 4880466 |
Lawn_and_Garden_v1_00 | 2557288 |
Luggage_v1_00 | 348657 |
Major_Appliances_v1_00 | 96901 |
Mobile_Apps_v1_00 | 5033376 |
Mobile_Electronics_v1_00 | 104975 |
Music_v1_00 | 4751577 |
Musical_Instruments_v1_00 | 904765 |
Office_Products_v1_00 | 2642434 |
Outdoors_v1_00 | 2302401 |
PC_v1_00 | 6908554 |
Personal_Care_Appliances_v1_00 | 85981 |
Pet_Products_v1_00 | 2643619 |
Shoes_v1_00 | 4366916 |
Software_v1_00 | 341931 |
Sports_v1_00 | 4850360 |
Tools_v1_00 | 1741100 |
Toys_v1_00 | 4864249 |
Video_DVD_v1_00 | 5069140 |
Video_Games_v1_00 | 1785997 |
Video_v1_00 | 380604 |
Watches_v1_00 | 960872 |
Wireless_v1_00 | 9002021 |
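To work with these row counts programmatically, the pipe-delimited table can be parsed into a dictionary. This is a rough sketch: `parse_split_table` is a hypothetical helper, and the embedded text holds only the first three rows of the table above, so the total it computes covers those three configurations, not the whole dataset.

```python
# First three rows of the table above, copied verbatim.
TABLE = """\
name | train |
---|---|
Apparel_v1_00 | 5906333 |
Automotive_v1_00 | 3514942 |
Baby_v1_00 | 1752932 |
"""


def parse_split_table(text):
    """Return {configuration name: number of train rows}."""
    counts = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header and separator rows, whose second cell is not a number.
        if len(cells) < 2 or not cells[1].isdigit():
            continue
        counts[cells[0]] = int(cells[1])
    return counts


counts = parse_split_table(TABLE)
largest = max(counts, key=counts.get)
print(largest, counts[largest], sum(counts.values()))
```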
Dataset Creation
Curation Rationale
Source Data
Initial Data Collection and Normalization
Who are the source language producers?
Annotations
Annotation process
Who are the annotators?
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
https://s3.amazonaws.com/amazon-reviews-pds/LICENSE.txt
By accessing the Amazon Customer Reviews Library ("Reviews Library"), you agree that the Reviews Library is an Amazon Service subject to the Amazon.com Conditions of Use and you agree to be bound by them, with the following additional conditions:
In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Library for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Library or its contents, including use of the Reviews Library for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Library with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Library. If you violate any of the foregoing conditions, your license to access and use the Reviews Library will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.
Citation Information
No citation information.
Contributions
Thanks to @joeddav for adding this dataset.