California Burned Areas Dataset
We are working on adding more data.
Dataset Summary
This dataset contains images from Sentinel-2 satellites taken before and after a wildfire. The ground truth masks are provided by the California Department of Forestry and Fire Protection and are mapped onto the images.
Supported Tasks
The dataset is designed for binary semantic segmentation of burned vs. unburned areas.
Dataset Structure
We opted for HDF5 because it offers better portability and a smaller file size than GeoTIFF.
Dataset opening
Using the datasets library, you download only the pre-patched raw version for simplicity.
from datasets import load_dataset

# There are two available configurations: "post-fire" and "pre-post-fire".
dataset = load_dataset("DarthReca/california_burned_areas", name="post-fire")
The dataset was compressed using h5py and BZip2 from hdf5plugin. WARNING: hdf5plugin is necessary to extract the data.
Data Instances
Each matrix has a shape of 5490x5490xC, where C is 12 for pre-fire and post-fire images, while binary masks are two-dimensional (5490x5490). A pre-patched version is also provided, with matrices of size 512x512xC. In this case, only patches whose mask contains at least one positive pixel are included.
The dataset comes in two versions: raw (without any transformation) and normalized (with data scaled to the range 0-255). We suggest using the raw version so you can apply any pre-processing step you want.
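As an illustration of the kind of pre-processing you might apply to the raw version, here is a minimal per-band min-max scaling into the 0-255 range. Note this is a sketch: the exact scaling used to produce the official normalized version is not documented here, so per-band min-max is an assumption.

```python
import numpy as np

def normalize_to_uint8(patch: np.ndarray) -> np.ndarray:
    """Scale each band of an HxWxC patch into the 0-255 range."""
    p = patch.astype(np.float32)
    mins = p.min(axis=(0, 1), keepdims=True)
    maxs = p.max(axis=(0, 1), keepdims=True)
    scaled = (p - mins) / np.maximum(maxs - mins, 1e-6) * 255.0
    return scaled.round().astype(np.uint8)

# Synthetic stand-in for a raw 12-band patch.
raw = np.random.default_rng(0).integers(0, 10_000, size=(64, 64, 12))
img = normalize_to_uint8(raw)
```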
Data Fields
In each standard HDF5 file, you can find post-fire images, pre-fire images, and binary masks. The file is structured as follows:
├── foldn
│   ├── uid0
│   │   ├── pre_fire
│   │   ├── post_fire
│   │   └── mask
│   └── uid1
│       ├── post_fire
│       └── mask
│
└── foldm
    ├── uid2
    │   ├── post_fire
    │   └── mask
    └── uid3
        ├── pre_fire
        ├── post_fire
        └── mask
...
where foldn and foldm are fold names and uidn is a unique identifier for the wildfire.
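The fold/uid layout above can be walked with plain h5py. The sketch below builds a tiny in-memory file mimicking that hierarchy (group and dataset names follow the tree; the shapes and the in-memory driver are illustrative stand-ins for a real file on disk) and then iterates folds and wildfires, noting that pre_fire may be absent for some uids.

```python
import h5py
import numpy as np

# Build a minimal in-memory file with the fold/uid layout.
f = h5py.File("demo.hdf5", "w", driver="core", backing_store=False)
for fold, uid, with_pre in [("fold0", "uid0", True), ("fold0", "uid1", False)]:
    g = f.create_group(f"{fold}/{uid}")
    if with_pre:
        g.create_dataset("pre_fire", data=np.zeros((16, 16, 12), dtype=np.uint16))
    g.create_dataset("post_fire", data=np.zeros((16, 16, 12), dtype=np.uint16))
    g.create_dataset("mask", data=np.zeros((16, 16), dtype=np.uint8))

# Walk folds -> wildfires; record whether a pre-fire image exists.
samples = [
    (fold_name, uid, "pre_fire" in grp)
    for fold_name, fold_grp in f.items()
    for uid, grp in fold_grp.items()
]
f.close()
```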
For the pre-patched version, the structure is:
root
|
|-- uid0_x: {post_fire, pre_fire, mask}
|
|-- uid0_y: {post_fire, pre_fire, mask}
|
|-- uid1_x: {post_fire, mask}
|
...
The fold name is stored as an attribute.
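Since the pre-patched file is flat, fold membership is read from HDF5 attributes rather than from the group hierarchy. The sketch below builds a tiny in-memory stand-in for the patched file and filters patches by fold; the attribute name "fold" is an assumption, so inspect `grp.attrs` on the real file to confirm it.

```python
import h5py
import numpy as np

# In-memory sketch of the pre-patched layout: one group per patch,
# with the fold stored as an HDF5 attribute (attribute name assumed).
f = h5py.File("patched.hdf5", "w", driver="core", backing_store=False)
for name, fold in [("uid0_0", 0), ("uid0_1", 0), ("uid1_0", 3)]:
    g = f.create_group(name)
    g.create_dataset("post_fire", data=np.zeros((16, 16, 12), dtype=np.uint16))
    g.create_dataset("mask", data=np.zeros((16, 16), dtype=np.uint8))
    g.attrs["fold"] = fold

# Select only the patches belonging to fold 3.
fold3 = [name for name, grp in f.items() if grp.attrs["fold"] == 3]
f.close()
```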
Data Splits
There are 5 random splits whose names are: 0, 1, 2, 3, and 4.
Source Data
Data are collected directly from the Copernicus Open Access Hub through its API. The band files are aggregated into a single matrix.
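The aggregation step amounts to stacking the single-band rasters along a channel axis. A minimal sketch with synthetic arrays standing in for the twelve Sentinel-2 band files:

```python
import numpy as np

# Twelve single-band rasters (synthetic stand-ins), each HxW,
# stacked along the last axis into one HxWxC matrix.
bands = [np.full((4, 4), i, dtype=np.uint16) for i in range(12)]
cube = np.stack(bands, axis=-1)
```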
Additional Information
Licensing Information
This work is released under the OpenRAIL license.
Citation Information
If you plan to use this dataset in your work, please give credit to the Sentinel-2 mission and the California Department of Forestry and Fire Protection, and cite it using this BibTeX:
@article{cabuar,
title={Ca{B}u{A}r: California {B}urned {A}reas dataset for delineation},
author={Rege Cambrin, Daniele and Colomba, Luca and Garza, Paolo},
journal={IEEE Geoscience and Remote Sensing Magazine},
doi={10.1109/MGRS.2023.3292467},
year={2023}
}