dataio

Important

ARTPARK’s DataIO is in beta (v0.4.0b14). You can use it now and send issues and feedback to (sneha) / (akhil) (AT) artpark (dot) in.

Overview

ARTPARK’s DataIO is a platform for managing and sharing data. It consists of our internal API server, which manages the catalogue, and a Python SDK and CLI for users. This documentation covers the SDK, which you can use to access our data. The package is published on PyPI as dataio-artpark.

Please contact us to obtain API keys.

Quickstart

You can start by reading the Quick Start guide for the SDK, or the CLI Guide for the CLI. The package is available on PyPI, and you can install it using pip or uv.

python -m venv .venv
source .venv/bin/activate

pip install dataio-artpark

or using uv:

uv init
uv add dataio-artpark

Key Features

DataIO provides a Python SDK and a CLI for accessing and managing datasets with these core capabilities:

Dataset Discovery - List and search available datasets
Data Download - Download complete datasets or individual tables
Tag-based Filtering - Find datasets by categories like “Livestock”
Shapefile Support - Download geographic boundary data
Metadata Access - Get comprehensive dataset information

We are currently working on a front end to view and interact with the data. For now, you can view the list of datasets using the CLI or SDK.

CLI:

uv run dataio init
uv run dataio list-datasets

SDK:

from dataio import DataIOAPI

client = DataIOAPI()
datasets = client.list_datasets()
print(datasets)
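
Once the list is available, tag-based filtering can also be done on the client side. The sketch below is a minimal illustration only: it assumes each dataset record is a dictionary with `title` and `tags` keys. That record shape, the helper name `filter_by_tag`, and the sample records are hypothetical and not confirmed SDK behaviour, so inspect the actual return value of `list_datasets()` before relying on it.

```python
from typing import Any


def filter_by_tag(datasets: list[dict[str, Any]], tag: str) -> list[dict[str, Any]]:
    """Return the records whose tag list contains `tag`, ignoring case."""
    wanted = tag.strip().lower()
    matches = []
    for record in datasets:
        # ASSUMPTION: tags are stored as a list of strings under "tags".
        tags = [str(t).lower() for t in record.get("tags", [])]
        if wanted in tags:
            matches.append(record)
    return matches


# Hypothetical sample records standing in for client.list_datasets() output.
sample = [
    {"title": "State Livestock Census Data", "tags": ["Livestock"]},
    {"title": "District Boundaries", "tags": ["Geospatial"]},
]

livestock = filter_by_tag(sample, "livestock")
print([d["title"] for d in livestock])
```

Records without a `tags` key are skipped rather than raising an error, which keeps the helper tolerant of incomplete metadata.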

Terminology

DataIO uses the following terminology:

Term	Description	Example
Table	A table is usually a CSV file, but can also be a Parquet file. It is a collection of records for a specific topic.	Karnataka livestock census district level data
Dataset	A dataset is a collection of tables, usually related to a specific overarching topic.	State Livestock Census Data, containing tables for Karnataka and Maharashtra
Bucket Type	Either `STANDARDISED` or `PREPROCESSED`. Standardised: the data is in a standardised format, ready to be used; this is the default bucket type and the data made available to analysts. Preprocessed: the data has been preprocessed by the team and stripped of PII/sensitive information; it is not generally made available to analysts.	`STANDARDISED`
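
To make the relationship between these terms concrete, here is a small data-model sketch. It is purely illustrative: the class names `BucketType`, `Table`, and `Dataset`, their fields, and the choice to attach the bucket type to the dataset are assumptions made for this example and are not the SDK's actual types.

```python
from dataclasses import dataclass, field
from enum import Enum


class BucketType(Enum):
    """The two bucket types named in the terminology table."""
    STANDARDISED = "STANDARDISED"
    PREPROCESSED = "PREPROCESSED"


@dataclass
class Table:
    """A single file of records on one topic (hypothetical model)."""
    name: str
    file_format: str = "csv"  # usually "csv", sometimes "parquet"


@dataclass
class Dataset:
    """A collection of related tables (hypothetical model)."""
    title: str
    # ASSUMPTION: bucket type is modelled per dataset, defaulting to STANDARDISED.
    bucket_type: BucketType = BucketType.STANDARDISED
    tables: list[Table] = field(default_factory=list)


census = Dataset(
    title="State Livestock Census Data",
    tables=[Table(name="Karnataka"), Table(name="Maharashtra")],
)

print(census.bucket_type.value, [t.name for t in census.tables])
```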

Endpoints

The API endpoints are documented in the Endpoints page.

Todo

Add support for additional file types
Comply with NDAP/NITI Aayog’s Data Standardisation Protocols for all geospatial data
Add additional advanced datasets
Build a front end to view and interact with the data