SDK API Guide¶
Complete reference for the DataIOAPI client class.
DataIOAPI Class¶
from dataio import DataIOAPI
The main client class for interacting with the DataIO API.
Constructor¶
DataIOAPI(base_url=None, api_key=None, data_dir=None)¶
Initialize a new DataIO API client.
Parameters:
- base_url (str, optional): The base URL of the DataIO API. If not provided, uses the DATAIO_API_BASE_URL environment variable.
- api_key (str, optional): The API key for authentication. If not provided, uses the DATAIO_API_KEY environment variable.
- data_dir (str, optional): The directory to download the data to. If not provided, uses the DATAIO_DATA_DIR environment variable.
Raises:
- ValueError: If neither environment variables nor parameters are provided for base_url or api_key.
Example:
# Using environment variables
client = DataIOAPI()
# Passing credentials directly
client = DataIOAPI(
base_url="https://dataio.artpark.ai/api/v1",
api_key="your_api_key",
data_dir="data"
)
Dataset Methods¶
list_datasets(limit=None)¶
Get a list of all datasets available to the authenticated user.
Parameters:
- limit (int, optional): Maximum number of datasets to return. Defaults to 100 if not specified.
Returns:
- list: List of dataset dictionaries containing metadata for each dataset.
Example:
# Get all datasets (up to 100)
datasets = client.list_datasets()
# Get first 10 datasets
datasets = client.list_datasets(limit=10)
# Each dataset contains:
# - ds_id: Unique dataset identifier
# - title: Dataset title
# - description: Dataset description
# - tags: List of tag dictionaries with 'id' and 'tag_name'
# - collection: Collection information
get_dataset_details(dataset_id)¶
Get detailed metadata for a specific dataset.
Parameters:
- dataset_id (str or int): The dataset ID. Can be the full ds_id or just the numeric part.
Returns:
- dict: Complete dataset metadata including title, description, collection, and other fields.
Raises:
- ValueError: If the dataset with the specified ID is not found.
Example:
# Using full dataset ID
details = client.get_dataset_details("TS0001DS9999")
# Using just the numeric part (will be zero-padded)
details = client.get_dataset_details("9999")
details = client.get_dataset_details(9999)
list_dataset_tables(dataset_id, bucket_type="STANDARDISED")¶
Get a list of tables within a dataset, including download links.
Parameters:
- dataset_id (str): The dataset ID to get tables for.
- bucket_type (str, optional): Type of bucket. Either "STANDARDISED" or "PREPROCESSED". Defaults to "STANDARDISED".
Note: Currently, only "STANDARDISED" datasets are available. "PREPROCESSED" datasets are not yet accessible through the API.
Returns:
- list: List of table dictionaries, each containing:
  - table_name: Name of the table
  - download_link: Signed URL for downloading (expires in 1 hour)
  - metadata: Table-level metadata
Example:
# Get tables for a dataset
tables = client.list_dataset_tables("TS0001DS9999")
# Request preprocessed tables (not yet available; see note above)
tables = client.list_dataset_tables("TS0001DS9999", bucket_type="PREPROCESSED")
for table in tables:
print(f"Table: {table['table_name']}")
print(f"Download: {table['download_link']}")
download_dataset(dataset_id, **kwargs)¶
Download a complete dataset with all its tables and metadata.
Parameters:
- dataset_id (str): The dataset ID to download.
- bucket_type (str, optional): Bucket type to download. Defaults to "STANDARDISED".
- root_dir (str, optional): Root directory for downloads. Defaults to "data".
- get_metadata (bool, optional): Whether to download the metadata file. Defaults to True.
- metadata_format (str, optional): Format for metadata ("yaml" or "json"). Defaults to "yaml".
- update_sync_history (bool, optional): Whether to update the sync history. Defaults to True.
- sync_history_file (str, optional): Name of the sync history file. Defaults to "sync-history.yaml".
Returns:
- str: Path to the downloaded dataset directory.
Example:
# Basic download
path = client.download_dataset("TS0001DS9999")
# Download to custom directory with JSON metadata
path = client.download_dataset(
"TS0001DS9999",
root_dir="my_datasets",
metadata_format="json"
)
# Download without metadata
path = client.download_dataset(
"TS0001DS9999",
get_metadata=False
)
Directory Structure:
root_dir/
├── sync-history.yaml (if update_sync_history=True)
└── TS0001DS9999-Dataset_Title/
├── table1.csv
├── table2.csv
├── table3.csv
└── metadata.yaml (if get_metadata=True)
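Since the method returns the dataset directory, the downloaded tables can be loaded directly. A sketch assuming pandas is installed and the tables are CSV files, as in the layout above:
from pathlib import Path
import pandas as pd

path = client.download_dataset("TS0001DS9999")
# Read every CSV table in the dataset directory into a DataFrame
frames = {f.stem: pd.read_csv(f) for f in Path(path).glob("*.csv")}
print(list(frames))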
construct_dataset_metadata(dataset_details, bucket_type="STANDARDISED")¶
Build comprehensive metadata combining dataset and table-level information.
Parameters:
- dataset_details (dict): Dataset details from get_dataset_details().
- bucket_type (str, optional): Bucket type for table metadata. Defaults to "STANDARDISED".
Returns:
- dict: Combined metadata with dataset and table information.
Required fields in dataset_details:
- title: Dataset title
- description: Dataset description
- collection: Collection object with category_name and collection_name
Example:
dataset_details = client.get_dataset_details("TS0001DS9999")
metadata = client.construct_dataset_metadata(dataset_details)
# Metadata structure:
# - dataset_title: Title of the dataset
# - dataset_description: Description
# - category: Category name
# - collection: Collection name
# - dataset_tables: Dict of table metadata keyed by table name
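download_dataset already writes this metadata to disk when get_metadata=True; if you need to persist it yourself, a sketch using PyYAML (an assumption, any serializer works):
import yaml

dataset_details = client.get_dataset_details("TS0001DS9999")
metadata = client.construct_dataset_metadata(dataset_details)
# Serialize the combined metadata, preserving field order
with open("metadata.yaml", "w") as f:
    yaml.safe_dump(metadata, f, sort_keys=False)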
Region and Shapefile Methods¶
get_children_regions(region_id)¶
Get all direct children regions for a given parent region.
Parameters:
- region_id (str): The ID of the parent region to get children for.
Returns:
- list: List of region dictionaries containing metadata for each child region.
Example:
# Get children of a state region
children = client.get_children_regions("state_29")
for child in children:
print(f"Region ID: {child['region_id']}")
print(f"Name: {child['region_name']}")
print(f"Parent: {child['parent_region_id']}")
API Endpoint: GET /api/v1/regions/{region_id}/children
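Because every child is itself a region, the method can be applied recursively to walk a region hierarchy. A sketch, assuming leaf regions return an empty children list and using the region_id and region_name fields shown above:
def walk_regions(client, region_id, depth=0):
    # Print the region tree rooted at region_id, one level per indent
    for child in client.get_children_regions(region_id):
        print("  " * depth + f"{child['region_id']}: {child['region_name']}")
        walk_regions(client, child["region_id"], depth + 1)

walk_regions(client, "state_29")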
get_shapefile_list()¶
Get a list of all available shapefiles.
Returns:
- list: List of shapefile dictionaries containing metadata for each shapefile.
Example:
shapefiles = client.get_shapefile_list()
for shapefile in shapefiles:
print(f"Region ID: {shapefile['region_id']}")
print(f"Name: {shapefile['region_name']}")
download_shapefile(region_id, shp_folder="data/GS0012DS0051-Shapefiles_India")¶
Download a shapefile for a specific region.
Parameters:
- region_id (str): ID of the region to download the shapefile for.
- shp_folder (str, optional): Directory to save the shapefile. Defaults to "{data_dir}/GS0012DS0051-Shapefiles_India", where data_dir is derived from the API client.
Returns:
- str: Path to the downloaded GeoJSON file.
Raises:
- ValueError: If the shapefile for the specified region is not found.
Example:
# Download shapefile for a state
path = client.download_shapefile("state_29")
# Download to custom folder
path = client.download_shapefile(
"state_29",
shp_folder="my_shapefiles"
)
Note: Shapefiles are downloaded in GeoJSON format, not as traditional .shp files.
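Since the returned file is GeoJSON, it can be opened with any GeoJSON-aware tool. A sketch using geopandas, which is an assumption and not a dependency of the SDK:
import geopandas as gpd

path = client.download_shapefile("state_29")
gdf = gpd.read_file(path)  # GeoJSON loads directly into a GeoDataFrame
print(gdf.head())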
Error Handling¶
The DataIO API client raises standard Python exceptions:
- ValueError: For invalid parameters or missing data
- requests.HTTPError: For HTTP-related errors (authentication, not found, etc.)
- requests.ConnectionError: For network connectivity issues
Example:
import requests

try:
datasets = client.list_datasets()
except requests.HTTPError as e:
if e.response.status_code == 401:
print("Authentication failed - check your API key")
elif e.response.status_code == 403:
print("Access forbidden - insufficient permissions")
else:
print(f"HTTP error: {e}")
except ValueError as e:
print(f"Invalid parameter: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
Environment Variables¶
The client uses these environment variables:
- DATAIO_API_BASE_URL: Base URL for the DataIO API
- DATAIO_API_KEY: API key for authentication
- DATAIO_DATA_DIR: Directory to download the data to
Set these in a .env file:
DATAIO_API_BASE_URL=https://dataio.artpark.ai/api/v1
DATAIO_API_KEY=your_api_key_here
DATAIO_DATA_DIR=data
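The client reads these variables from the process environment, so the .env file has to be loaded before DataIOAPI() is constructed. A sketch using python-dotenv (an assumption; any mechanism that populates os.environ works):
from dotenv import load_dotenv
from dataio import DataIOAPI

load_dotenv()  # copies the .env entries into os.environ
client = DataIOAPI()  # now picks up the DATAIO_* variables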