`dataio.sdk.user`

Module Contents

Classes

`DatasetList`
`DataIOAPI`	API Client for interacting with the DataIO API.

API

class dataio.sdk.user.DatasetList

Bases: list

__str__()

class dataio.sdk.user.DataIOAPI(base_url: Optional[str] = None, api_key: Optional[str] = None, data_dir: Optional[str] = None)

API Client for interacting with the DataIO API.

Parameters:

base_url (str) – The base URL of the DataIO API. Defaults to the value of the DATAIO_API_BASE_URL environment variable.
api_key (str) – The API key for the DataIO API. Defaults to the value of the DATAIO_API_KEY environment variable.
data_dir (str) – The directory to download the data to. Defaults to the value of the DATAIO_DATA_DIR environment variable.
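
The fallback from constructor arguments to environment variables can be pictured with a small sketch. This is not the SDK's own implementation: `resolve_client_config` is a hypothetical helper, and only the three environment variable names are taken from the parameter descriptions above.

```python
import os
from typing import Dict, Optional


def resolve_client_config(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    data_dir: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: explicit arguments win, otherwise read the environment."""

    def pick(value: Optional[str], env_name: str) -> Optional[str]:
        # Use the explicit value when given; fall back to the environment variable.
        return value if value is not None else os.environ.get(env_name)

    return {
        "base_url": pick(base_url, "DATAIO_API_BASE_URL"),
        "api_key": pick(api_key, "DATAIO_API_KEY"),
        "data_dir": pick(data_dir, "DATAIO_DATA_DIR"),
    }
```

Under this reading, calling `DataIOAPI()` with no arguments relies entirely on the three variables being set in the environment.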

Initialization

_request(method, endpoint, **kwargs)

Make a request to the DataIO API.

Parameters:

method – The HTTP method to use.
endpoint – The endpoint to request.
kwargs – Additional keyword arguments to pass to the request.

list_datasets(limit=None)

Get a list of all datasets.

Parameters: limit (int) – The maximum number of datasets to return. Defaults to None, in which case 100 datasets are returned.
Returns: A list of datasets.
Return type: list
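
A minimal sketch of consuming the result of `list_datasets`. The helper name `dataset_ids` is hypothetical, and the sketch assumes each returned entry is a dict-like record carrying the `ds_id` field mentioned elsewhere on this page; the actual record shape is not specified here.

```python
from typing import Any, List, Optional


def dataset_ids(client: Any, limit: Optional[int] = None) -> List[str]:
    """Return the ds_id of every dataset reported by client.list_datasets().

    Assumption: each entry behaves like a dict with a "ds_id" key.
    """
    datasets = client.list_datasets(limit=limit)
    return [str(entry["ds_id"]) for entry in datasets]
```

With a configured client, `dataset_ids(DataIOAPI(), limit=10)` would then yield up to ten identifiers suitable for passing to the other methods.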

list_dataset_tables(dataset_id, bucket_type='STANDARDISED')

Get a list of tables for a given dataset, with download links for each table.

Parameters:

dataset_id (str) – The ID of the dataset to get tables for. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to get tables for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.

Returns:

A list of tables.

Return type:

list
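
Because `bucket_type` accepts only two documented values, a caller may want to reject anything else before making a request. The wrapper below is a sketch under that assumption; `tables_for` and `VALID_BUCKET_TYPES` are illustrative names, not part of the SDK, and the SDK's own behaviour for other values is not documented here.

```python
from typing import Any, List

# The two bucket types named in the parameter description.
VALID_BUCKET_TYPES = ("STANDARDISED", "PREPROCESSED")


def tables_for(client: Any, dataset_id: str, bucket_type: str = "STANDARDISED") -> List[Any]:
    """Validate bucket_type locally, then delegate to client.list_dataset_tables()."""
    if bucket_type not in VALID_BUCKET_TYPES:
        raise ValueError(
            f"bucket_type must be one of {VALID_BUCKET_TYPES}, got {bucket_type!r}"
        )
    return client.list_dataset_tables(dataset_id, bucket_type=bucket_type)
```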

_get_file(url)

Get a file from a URL.

Parameters: url (str) – The URL to get the file from.
Returns: The file content.
Return type: bytes

get_dataset_details(dataset_id: Union[str, int])

Get the details of a dataset - this is the dataset level metadata.

Parameters: dataset_id (str or int) – The ID of the dataset to get details for. This is the ds_id field in the dataset metadata.
Returns: The dataset details.

_get_download_links(dataset_id, bucket_type='STANDARDISED')

Get download links for a dataset.

Parameters:

dataset_id (str) – The ID of the dataset to get download links for. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to get download links for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.

Returns:

A dictionary of download links.

Return type:

dict

construct_dataset_metadata(dataset_details: Optional[dict] = None, bucket_type='STANDARDISED')

Get the metadata for a dataset. This combines dataset level metadata with table level metadata.

Parameters:

dataset_details (dict) – The dataset details. This will be validated for the presence of the following fields: title, description, collection, category_name, collection_name.
bucket_type (str) – The type of bucket to get the table metadata for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.

Returns:

The dataset metadata. This includes the dataset title, description, category, collection, and tables with their table-level metadata.

Return type:

dict
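
The field validation described for `dataset_details` can be sketched as a simple presence check. This is an illustration of the idea, not the SDK's code: `missing_dataset_fields` is a hypothetical helper, and how the SDK reacts to a missing field (for example, which exception it raises) is not stated on this page.

```python
from typing import Any, Dict, List

# Field names taken from the dataset_details parameter description.
REQUIRED_FIELDS = ("title", "description", "collection", "category_name", "collection_name")


def missing_dataset_fields(dataset_details: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent from dataset_details."""
    return [field for field in REQUIRED_FIELDS if field not in dataset_details]
```

An empty result means the dictionary has every field the method is documented to check.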

download_dataset(dataset_id, bucket_type='STANDARDISED', root_dir=None, get_metadata=True, metadata_format='yaml', update_sync_history=True, sync_history_file='sync-history.yaml')

Download a dataset, along with its metadata.

Parameters:

dataset_id (str) – The unique identifier of the dataset to download. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to download. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.
root_dir (str) – The directory to download the dataset to. Defaults to “data”.
get_metadata (bool) – Whether to include metadata in the download links. Defaults to True.
metadata_format (str) – The format to download the metadata in. Defaults to “yaml”. Other option is “json”.

Returns:

The directory the dataset was downloaded to.

Return type:

str
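
As a usage sketch, the wrapper below calls `download_dataset` with only the parameters documented above and requests JSON metadata instead of the default YAML. The function name `download_with_json_metadata` is hypothetical, and the `client` argument is assumed to be a `DataIOAPI` instance.

```python
from typing import Any


def download_with_json_metadata(client: Any, dataset_id: str, root_dir: str = "data") -> str:
    """Download the standardised bucket of a dataset with JSON metadata.

    Returns whatever client.download_dataset() returns, which the
    documentation describes as the download directory.
    """
    return client.download_dataset(
        dataset_id,
        bucket_type="STANDARDISED",
        root_dir=root_dir,
        get_metadata=True,
        metadata_format="json",
    )
```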

get_children_regions(region_id: str)

Get all direct children regions for a given parent region.

Parameters: region_id (str) – The ID of the parent region to get children for.
Returns: A list of child regions with their metadata.
Return type: list
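
Since `get_children_regions` returns only direct children, collecting every descendant requires repeated calls. The breadth-first walk below is one possible sketch; `descendant_regions` is a hypothetical helper, and it assumes each child record is a dict with a `region_id` key, which this page does not confirm.

```python
from collections import deque
from typing import Any, Dict, List


def descendant_regions(client: Any, root_region_id: str) -> List[Dict[str, Any]]:
    """Collect all regions below root_region_id, breadth first.

    Assumption: every child record is a dict with a "region_id" key.
    """
    seen = {root_region_id}
    queue = deque([root_region_id])
    found: List[Dict[str, Any]] = []
    while queue:
        parent_id = queue.popleft()
        for child in client.get_children_regions(parent_id):
            child_id = child["region_id"]
            if child_id in seen:
                continue  # skip repeated entries so the walk always terminates
            seen.add(child_id)
            found.append(child)
            queue.append(child_id)
    return found
```

Each level of the hierarchy costs one API call per region, so a deep tree may be slow to traverse this way.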

get_shapefile_list()

Get a list of all shapefiles.

Returns: A list of shapefiles.
Return type: list

download_shapefile(region_id: str, shp_folder: str = None)

Download a shapefile.

Parameters:

region_id (str) – The ID of the region to download the shapefile for.
shp_folder (str) – The folder within the data directory to download the shapefile to. Defaults to “{data_dir}/GS0012DS0051-Shapefiles_India”, where data_dir is derived from the API client.
compress (bool) – Whether to compress the shapefile. Defaults to True.

Returns:

The shapefile.

Return type:

bytes
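
The default target folder described for `shp_folder` can be sketched as follows. `resolve_shp_folder` is a hypothetical helper that only mirrors the documented pattern “{data_dir}/GS0012DS0051-Shapefiles_India”; how the SDK actually builds the path internally is an assumption.

```python
from typing import Optional

# Sub-folder name taken from the shp_folder parameter description.
DEFAULT_SHP_SUBDIR = "GS0012DS0051-Shapefiles_India"


def resolve_shp_folder(data_dir: str, shp_folder: Optional[str] = None) -> str:
    """Return shp_folder when given, else the documented default under data_dir."""
    if shp_folder is not None:
        return shp_folder
    return f"{data_dir}/{DEFAULT_SHP_SUBDIR}"
```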
