UK PV dataset
PV solar generation data from the UK. This dataset contains data from 1311 PV systems from 2018 to 2021. Time granularity varies from 2 minutes to 30 minutes.
This data is collected from live PV systems in the UK. We have obfuscated the location of the PV systems for privacy. If you are the owner of a PV system in the dataset and do not want this data to be shared, please get in contact with [email protected].
Files
- metadata.csv: Data about the PV systems, e.g. location
- 2min.parquet: Power output for PV systems every 2 minutes.
- 5min.parquet: Power output for PV systems every 5 minutes.
- 30min.parquet: Power output for PV systems every 30 minutes.
- pv.netcdf: (legacy) Time series of PV solar generation every 5 minutes
metadata.csv
Metadata of the different PV systems.
Note that there are extra PV systems in this metadata that do not appear in the PV time-series data.
The csv columns are:
- ss_id: the id of the system
- latitude_rounded: latitude of the PV system, but rounded to approximately the nearest km
- longitude_rounded: longitude of the PV system, but rounded to approximately the nearest km
- llsoacd: TODO
- orientation: The orientation of the PV system
- tilt: The tilt of the PV system
- kwp: The capacity of the PV system
- operational_at: the datetime the PV system started working
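As a rough illustration of working with these columns, the sketch below parses a few rows in the documented layout and keeps only the systems that also have time-series data. The rows are made up, the column order and value formats are assumptions, and `load_metadata` and `ids_with_data` are hypothetical names used only for this example.

```python
import csv
import io

# Made-up rows in the documented column layout; these are not real systems.
# With the real file, one would pass open("metadata.csv", newline="") instead.
sample = io.StringIO(
    "ss_id,latitude_rounded,longitude_rounded,llsoacd,orientation,tilt,kwp,operational_at\n"
    "1001,51.50,-0.12,X1,180.0,35.0,3.5,2015-04-01\n"
    "1002,53.48,-2.24,X2,90.0,30.0,2.0,2017-09-15\n"
    "1003,55.95,-3.19,X3,270.0,20.0,4.0,2019-01-20\n"
)


def load_metadata(handle):
    """Parse metadata rows into dicts keyed by ss_id (hypothetical helper)."""
    systems = {}
    for row in csv.DictReader(handle):
        systems[int(row["ss_id"])] = {
            "latitude": float(row["latitude_rounded"]),
            "longitude": float(row["longitude_rounded"]),
            "orientation": float(row["orientation"]),
            "tilt": float(row["tilt"]),
            "kwp": float(row["kwp"]),
            # Kept as a string because the exact datetime format is not documented.
            "operational_at": row["operational_at"],
        }
    return systems


systems = load_metadata(sample)

# The metadata lists more systems than the time series contains, so filter
# down to the ids that have data (a hypothetical set for this sketch).
ids_with_data = {1001, 1003}
usable = {ss_id: info for ss_id, info in systems.items() if ss_id in ids_with_data}

# Total capacity of the systems that remain after filtering.
total_kwp = sum(info["kwp"] for info in usable.values())
```

In practice `ids_with_data` would come from the `ss_id` column of one of the parquet files rather than being written by hand.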
{2,5,30}min.parquet
Time series of solar generation for a number of systems. Each file includes the systems for which there is enough granularity. In particular, the systems in 2min.parquet and 5min.parquet are also in 30min.parquet.
The files contain 3 columns:
- ss_id: the id of the system
- timestamp: the timestamp
- generation_wh: the generated power (in kW) at the given timestamp for the given system
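The three-column layout is a long format, with one row per system and timestamp. A minimal sketch of reshaping it with pandas follows. It uses small synthetic frames in place of the real files, so every value and ss_id is made up, and the variable names are illustrative only. It also checks the stated relationship that systems in a finer-grained file appear in 30min.parquet.

```python
import pandas as pd

# Synthetic stand-ins for 5min.parquet and 30min.parquet (all values made up).
# With the real files one would call pd.read_parquet("5min.parquet") instead,
# which needs a parquet engine such as pyarrow to be installed.
five_min = pd.DataFrame({
    "ss_id": [2405, 2405, 2406, 2406],
    "timestamp": pd.to_datetime([
        "2019-06-01 12:00", "2019-06-01 12:05",
        "2019-06-01 12:00", "2019-06-01 12:05",
    ]),
    "generation_wh": [10.0, 12.0, 7.0, 8.0],
})
thirty_min = pd.DataFrame({
    "ss_id": [2405, 2406, 2407],
    "timestamp": pd.to_datetime(["2019-06-01 12:00"] * 3),
    "generation_wh": [60.0, 40.0, 25.0],
})

# Long to wide: one row per timestamp, one column per system.
wide = five_min.pivot(index="timestamp", columns="ss_id", values="generation_wh")

# Systems in the finer-grained file are expected to appear in the 30-minute file.
missing = set(five_min["ss_id"]) - set(thirty_min["ss_id"])
```

The wide layout is convenient for plotting or comparing systems side by side, while the long layout is the one stored in the files. An empty `missing` set is what the description above would lead one to expect.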
pv.netcdf (legacy)
Time series of PV solar generation, stored in a format that can be read with xarray.
The data variables are named after the 'ss_id' values in the metadata. Each data variable contains the solar generation (in kW) for that PV system. The ss_id's here are a subset of all the ss_id's in the metadata. The coordinate is labeled 'datetime', which is the datetime of the solar generation reading.
This is a subset of the more recent 5min.parquet file.
example
using Hugging Face Datasets
from datasets import load_dataset
dataset = load_dataset("openclimatefix/uk_pv")
useful links
https://huggingface.co/docs/datasets/share - this repo was made by following this tutorial