Overview
Datasets Server automatically converts public datasets on the Hub that are smaller than 5 GB to Parquet and publishes the resulting Parquet files. Parquet is a column-oriented format, so a query can read only the columns it needs, which makes it well suited to working with large datasets. Several libraries can work with the published Parquet files:
- Polars, a Rust-based DataFrame library
- Pandas, a Python library for data analysis built around the DataFrame
- DuckDB, a high-performance SQL database for analytical queries
- ClickHouse, a column-oriented database management system for online analytical processing
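Before any of these libraries can read a dataset, you need the URLs of its published Parquet files. The sketch below is a minimal, stdlib-only illustration that assumes a `/parquet` endpoint at `https://datasets-server.huggingface.co` taking a `dataset` query parameter and returning JSON with a `parquet_files` list whose entries carry `split` and `url` keys. The helper names `parquet_endpoint` and `parquet_urls_by_split`, the dataset name `user/my_dataset`, and the sample payload are all hypothetical; check the API reference for the exact response shape.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the Datasets Server API.
API_BASE = "https://datasets-server.huggingface.co"


def parquet_endpoint(dataset: str) -> str:
    """Build the URL of the (assumed) /parquet endpoint for a dataset."""
    # urlencode percent-encodes the "/" in "namespace/name" as %2F.
    return f"{API_BASE}/parquet?{urlencode({'dataset': dataset})}"


def parquet_urls_by_split(payload: str) -> dict[str, list[str]]:
    """Group Parquet file URLs by split from a /parquet JSON response.

    Assumes the response has a "parquet_files" list whose entries
    have "split" and "url" keys; a missing list yields an empty dict.
    """
    files = json.loads(payload).get("parquet_files", [])
    grouped: dict[str, list[str]] = {}
    for entry in files:
        # Preserve the order in which files appear within each split.
        grouped.setdefault(entry["split"], []).append(entry["url"])
    return grouped


if __name__ == "__main__":
    # Hypothetical response for the hypothetical dataset "user/my_dataset".
    sample = json.dumps({
        "parquet_files": [
            {"split": "train", "url": "https://example.com/train/0000.parquet"},
            {"split": "train", "url": "https://example.com/train/0001.parquet"},
            {"split": "test", "url": "https://example.com/test/0000.parquet"},
        ]
    })
    print(parquet_endpoint("user/my_dataset"))
    print(parquet_urls_by_split(sample))
```

In practice you would fetch the endpoint over HTTP and hand the resulting URLs to one of the libraries listed above, for example pandas `read_parquet`.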