dataio.sdk.user

Module Contents

Classes

DatasetList

DataIOAPI

API Client for interacting with the DataIO API.

API

class dataio.sdk.user.DatasetList

Bases: list

__str__()
class dataio.sdk.user.DataIOAPI(base_url: Optional[str] = None, api_key: Optional[str] = None, data_dir: Optional[str] = None)

API Client for interacting with the DataIO API.

Parameters:
  • base_url (str) – The base URL of the DataIO API. Defaults to the value of the DATAIO_API_BASE_URL environment variable.

  • api_key (str) – The API key for the DataIO API. Defaults to the value of the DATAIO_API_KEY environment variable.

  • data_dir (str) – The directory to download the data to. Defaults to the value of the DATAIO_DATA_DIR environment variable.
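The fallback order described above (an explicit argument wins, otherwise the matching environment variable is read) can be sketched as follows. `resolve_settings` is a hypothetical helper written for illustration, not part of the SDK, and the real constructor may treat a value missing from both sources differently, for example by raising an error.

```python
import os
from typing import Mapping, Optional


def resolve_settings(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    data_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: explicit arguments first, then DATAIO_* variables.

    Values found in neither place are left as None.
    """
    env = os.environ if env is None else env
    return {
        "base_url": base_url if base_url is not None else env.get("DATAIO_API_BASE_URL"),
        "api_key": api_key if api_key is not None else env.get("DATAIO_API_KEY"),
        "data_dir": data_dir if data_dir is not None else env.get("DATAIO_DATA_DIR"),
    }


# A fake environment keeps the example independent of the real shell.
fake_env = {"DATAIO_API_BASE_URL": "https://dataio.example/api", "DATAIO_API_KEY": "secret"}
settings = resolve_settings(api_key="override", env=fake_env)
print(settings)
```

In everyday use this means that, once DATAIO_API_BASE_URL, DATAIO_API_KEY and DATAIO_DATA_DIR are exported, `DataIOAPI()` can be constructed with no arguments. The URL above is a placeholder.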


_request(method, endpoint, **kwargs)

Make a request to the DataIO API.

Parameters:
  • method – The HTTP method to use.

  • endpoint – The endpoint to request.

  • kwargs – Additional keyword arguments to pass to the request.
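The request step can be pictured with the standard library as below. This is a sketch under assumptions: `build_request` is hypothetical, the API is assumed to exchange JSON, and the `X-API-Key` header name is a placeholder, since the SDK's actual authentication scheme and HTTP library are not documented here. The request is only built, not sent, so it can be inspected offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_request(
    base_url: str,
    api_key: str,
    method: str,
    endpoint: str,
    params: Optional[dict] = None,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for an API endpoint."""
    # Join base URL and endpoint without doubling or dropping the slash.
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: the key travels in a header; the header name is hypothetical.
    headers = {"Accept": "application/json", "X-API-Key": api_key}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method.upper())


req = build_request("https://dataio.example/api/", "secret", "get", "/datasets", params={"limit": 10})
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; `https://dataio.example/api` is a placeholder URL.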

list_datasets(limit=None)

Get a list of all datasets.

Parameters:

limit (int) – The maximum number of datasets to return. Defaults to None, in which case at most 100 datasets are returned.

Returns:

A list of datasets.

Return type:

list

list_dataset_tables(dataset_id, bucket_type='STANDARDISED')

Get a list of tables for a given dataset, with download links for each table.

Parameters:
  • dataset_id (str) – The ID of the dataset to get tables for. This is the ds_id field in the dataset metadata.

  • bucket_type (str) – The type of bucket to get tables for. Defaults to “STANDARDISED”. The other option is “PREPROCESSED”.

Returns:

A list of tables.

Return type:

list
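Several methods accept only “STANDARDISED” or “PREPROCESSED” as `bucket_type`. A caller-side check such as the hypothetical `check_bucket_type` below fails fast on a typo (for example the American spelling “STANDARDIZED”) instead of waiting for the API to reject it; whether the SDK performs a similar check itself is not documented here.

```python
from typing import Tuple

# The two bucket types named in this reference (note the British spelling).
VALID_BUCKET_TYPES: Tuple[str, ...] = ("STANDARDISED", "PREPROCESSED")


def check_bucket_type(bucket_type: str) -> str:
    """Hypothetical helper: return bucket_type unchanged if valid, else raise."""
    if bucket_type not in VALID_BUCKET_TYPES:
        raise ValueError(
            f"bucket_type must be one of {VALID_BUCKET_TYPES}, got {bucket_type!r}"
        )
    return bucket_type


for candidate in ("PREPROCESSED", "STANDARDIZED"):
    try:
        print(check_bucket_type(candidate), "accepted")
    except ValueError as err:
        print(err)
```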

_get_file(url)

Get a file from a URL.

Parameters:

url (str) – The URL to get the file from.

Returns:

The file content.

Return type:

bytes
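A stdlib-only sketch of fetching a file's content as bytes. `get_file` is a hypothetical stand-in for `_get_file`; the SDK may use a different HTTP library, add authentication, or stream large files to disk instead of holding them in memory.

```python
import urllib.request


def get_file(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: fetch a URL and return its raw content."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# A data: URL (base64 for "hello") lets the sketch run without network access.
content = get_file("data:text/plain;base64,aGVsbG8=")
print(content)
```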

get_dataset_details(dataset_id: Union[str, int])

Get the details of a dataset - this is the dataset level metadata.

Parameters:

dataset_id (str or int) – The ID of the dataset to get details for. This is the ds_id field in the dataset metadata.

Returns:

The dataset details.

Get download links for a dataset.

Parameters:
  • dataset_id (str) – The ID of the dataset to get download links for. This is the ds_id field in the dataset metadata.

  • bucket_type (str) – The type of bucket to get download links for. Defaults to “STANDARDISED”. The other option is “PREPROCESSED”.

Returns:

A dictionary of download links.

Return type:

dict

construct_dataset_metadata(dataset_details: Optional[dict] = None, bucket_type='STANDARDISED')

Get the metadata for a dataset. This combines dataset level metadata with table level metadata.

Parameters:
  • dataset_details (dict) – The dataset details. This will be validated for the presence of the following fields: title, description, collection, category_name, collection_name.

  • bucket_type (str) – The type of bucket to get the table metadata for. Defaults to “STANDARDISED”. The other option is “PREPROCESSED”.

Returns:

The dataset metadata. This includes the dataset title, description, category, collection, and tables with their table-level metadata.

Return type:

dict
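The field validation and merge described above might look roughly like this. The required field names come from the parameter description; the hypothetical helpers `check_dataset_details` and `combine_metadata`, the output keys, and the `table_name` key in the example table entry are assumptions about a layout the SDK does not document here.

```python
from typing import Iterable, List

# Fields that dataset_details is validated for, per the parameter description.
REQUIRED_FIELDS = ("title", "description", "collection", "category_name", "collection_name")


def check_dataset_details(details: dict, required: Iterable[str] = REQUIRED_FIELDS) -> None:
    """Raise ValueError naming every required field absent from details."""
    missing: List[str] = [field for field in required if field not in details]
    if missing:
        raise ValueError(f"dataset_details is missing: {', '.join(missing)}")


def combine_metadata(details: dict, tables: list) -> dict:
    """Merge dataset-level fields with a list of table-level metadata dicts."""
    check_dataset_details(details)
    return {
        "title": details["title"],
        "description": details["description"],
        "category": details["category_name"],
        "collection": details["collection_name"],
        "tables": tables,
    }


details = {
    "title": "Example dataset",
    "description": "Placeholder values for illustration",
    "collection": 1,
    "category_name": "Example category",
    "collection_name": "Example collection",
}
metadata = combine_metadata(details, tables=[{"table_name": "example_table"}])
print(sorted(metadata))
```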

download_dataset(dataset_id, bucket_type='STANDARDISED', root_dir=None, get_metadata=True, metadata_format='yaml', update_sync_history=True, sync_history_file='sync-history.yaml')

Download a dataset, along with its metadata.

Parameters:
  • dataset_id (str) – The unique identifier of the dataset to download. This is the ds_id field in the dataset metadata.

  • bucket_type (str (default: “STANDARDISED”)) – The type of bucket to download. Defaults to “STANDARDISED”. The other option is “PREPROCESSED”.

  • root_dir (str (default: None)) – The directory to download the dataset to. If None, the dataset is downloaded to “data”.

  • get_metadata (bool (default: True)) – Whether to download the dataset metadata along with the data. Defaults to True.

  • metadata_format (str (default: “yaml”)) – The format to download the metadata in. Defaults to “yaml”. The other option is “json”.

Returns:

The directory the dataset was downloaded to.

Return type:

str
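Writing the metadata in either supported format could be sketched as below. `write_metadata` and the `metadata.json`/`metadata.yaml` file names are hypothetical; the YAML branch needs the third-party PyYAML package, which is imported only when that format is requested.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(metadata: dict, directory: str, metadata_format: str = "yaml") -> Path:
    """Hypothetical helper: write metadata into directory and return the file path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    if metadata_format == "json":
        path = out_dir / "metadata.json"
        path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    elif metadata_format == "yaml":
        import yaml  # PyYAML (third-party), needed only for this branch

        path = out_dir / "metadata.yaml"
        path.write_text(yaml.safe_dump(metadata, sort_keys=False), encoding="utf-8")
    else:
        raise ValueError(f"metadata_format must be 'yaml' or 'json', got {metadata_format!r}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_metadata({"title": "Example dataset", "tables": []}, tmp, metadata_format="json")
    loaded = json.loads(written.read_text(encoding="utf-8"))
print(written.name, loaded)
```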

get_children_regions(region_id: str)

Get all direct child regions of a given parent region.

Parameters:

region_id (str) – The ID of the parent region to get children for.

Returns:

A list of child regions with their metadata.

Return type:

list
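Because `get_children_regions` returns only direct children, collecting every region below a parent takes repeated calls. The breadth-first helper below is a sketch: `collect_descendants` is hypothetical, and the `region_id` key used to read each child's ID is an assumption about the returned metadata. An offline dictionary stands in for the API so the example runs on its own.

```python
from collections import deque
from typing import Callable, Dict, List


def collect_descendants(
    root_id: str,
    get_children: Callable[[str], List[dict]],
    id_key: str = "region_id",
) -> List[dict]:
    """Breadth-first walk of the region hierarchy below root_id.

    get_children is expected to return only the direct children of a region,
    as get_children_regions is documented to do.
    """
    found: List[dict] = []
    queue = deque([root_id])
    seen = {root_id}
    while queue:
        parent = queue.popleft()
        for child in get_children(parent):
            child_id = child[id_key]
            if child_id in seen:  # skip duplicates and guard against cycles
                continue
            seen.add(child_id)
            found.append(child)
            queue.append(child_id)
    return found


# Made-up hierarchy used in place of API calls.
TREE: Dict[str, List[dict]] = {
    "country": [{"region_id": "state-a"}, {"region_id": "state-b"}],
    "state-a": [{"region_id": "district-a1"}],
}
regions = collect_descendants("country", lambda rid: TREE.get(rid, []))
print([r["region_id"] for r in regions])
```

Against the real client, the call would be along the lines of `collect_descendants(parent_id, client.get_children_regions)`, with `id_key` adjusted to whatever field the response actually uses.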

get_shapefile_list()

Get a list of all shapefiles.

Returns:

A list of shapefiles.

Return type:

list

download_shapefile(region_id: str, shp_folder: str = None)

Download a shapefile.

Parameters:
  • region_id (str) – The ID of the region to download the shapefile for.

  • shp_folder (str) – The folder within the data directory to download the shapefile to. Defaults to “{data_dir}/GS0012DS0051-Shapefiles_India”, where data_dir is taken from the API client.

  • compress (bool) – Whether to compress the shapefile. Defaults to True.

Returns:

The shapefile.

Return type:

bytes
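The default download location for `download_shapefile` can be expressed as a small path helper. `resolve_shp_folder` is hypothetical, and it assumes that a custom `shp_folder` is interpreted relative to the data directory, as the parameter description suggests; the SDK's exact handling is not documented here.

```python
from pathlib import Path
from typing import Optional

# Default folder name taken from the shp_folder description.
DEFAULT_SHAPEFILE_DIR = "GS0012DS0051-Shapefiles_India"


def resolve_shp_folder(data_dir: str, shp_folder: Optional[str] = None) -> Path:
    """Hypothetical helper: folder a shapefile download would be written to."""
    folder = shp_folder if shp_folder is not None else DEFAULT_SHAPEFILE_DIR
    return Path(data_dir) / folder


print(resolve_shp_folder("data").as_posix())
print(resolve_shp_folder("data", "custom_shapes").as_posix())
```

With `pathlib`, joining an absolute `shp_folder` replaces `data_dir` entirely, so this helper also accepts a full path.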