dataio.sdk.user
Module Contents
Classes
DataIOAPI – API Client for interacting with the DataIO API.
API
- class dataio.sdk.user.DataIOAPI(base_url: Optional[str] = None, api_key: Optional[str] = None, data_dir: Optional[str] = None)
API Client for interacting with the DataIO API.
- Parameters:
base_url (str) – The base URL of the DataIO API. Defaults to the value of the DATAIO_API_BASE_URL environment variable.
api_key (str) – The API key for the DataIO API. Defaults to the value of the DATAIO_API_KEY environment variable.
data_dir (str) – The directory to download the data to. Defaults to the value of the DATAIO_DATA_DIR environment variable.
Initialization
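The documented fallback to environment variables can be pictured with a small standalone helper. This is a minimal sketch of the described behaviour, not the SDK's implementation: `resolve_client_config` is a hypothetical name, and only the three environment variable names come from the parameter descriptions above.

```python
import os
from typing import Dict, Optional


def resolve_client_config(
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    data_dir: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: explicit arguments win, otherwise read the environment."""

    def pick(explicit: Optional[str], env_name: str) -> Optional[str]:
        # os.environ.get returns None when the variable is not set.
        return explicit if explicit is not None else os.environ.get(env_name)

    return {
        "base_url": pick(base_url, "DATAIO_API_BASE_URL"),
        "api_key": pick(api_key, "DATAIO_API_KEY"),
        "data_dir": pick(data_dir, "DATAIO_DATA_DIR"),
    }
```

With the three variables exported in the shell, the client can presumably be created as `DataIOAPI()` with no arguments; any argument passed explicitly would replace the corresponding environment value.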
- _request(method, endpoint, **kwargs)
Make a request to the DataIO API.
- Parameters:
method – The HTTP method to use.
endpoint – The endpoint to request.
kwargs – Additional keyword arguments to pass to the request.
- list_datasets(limit=None)
Get a list of all datasets.
- Parameters:
limit (int) – The maximum number of datasets to return. Defaults to None, in which case 100 datasets are returned.
- Returns:
A list of datasets.
- Return type:
list
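Since most other methods take a ds_id, a common first step is to collect the identifiers from the listing. The sketch below assumes each entry returned by `list_datasets` is a dict carrying a `ds_id` key; the entry layout is not specified here, so treat that as an assumption. `dataset_ids` is a hypothetical helper and `client` is any object exposing the documented method.

```python
from typing import Any, List, Optional


def dataset_ids(client: Any, limit: Optional[int] = None) -> List[str]:
    """Return the ds_id of each dataset reported by client.list_datasets().

    Assumption: every list entry is a dict, and entries without a
    "ds_id" key are skipped rather than treated as errors.
    """
    datasets = client.list_datasets(limit=limit)
    return [str(entry["ds_id"]) for entry in datasets if "ds_id" in entry]


# Usage with the real client (requires network access and an API key):
#   from dataio.sdk.user import DataIOAPI
#   ids = dataset_ids(DataIOAPI(), limit=10)
```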
- list_dataset_tables(dataset_id, bucket_type='STANDARDISED')
Get a list of tables for a given dataset, with download links for each table.
- Parameters:
dataset_id (str) – The ID of the dataset to get tables for. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to get tables for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.
- Returns:
A list of tables.
- Return type:
list
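Because bucket_type accepts only two documented values, it can help to check it locally before making a request. The wrapper below is a sketch under that assumption; `tables_for` and `VALID_BUCKET_TYPES` are hypothetical names, and how the SDK itself handles an unknown bucket type is not stated here.

```python
from typing import Any, List

# The two options named in the parameter description above.
VALID_BUCKET_TYPES = ("STANDARDISED", "PREPROCESSED")


def tables_for(client: Any, dataset_id: str, bucket_type: str = "STANDARDISED") -> List[Any]:
    """Hypothetical wrapper: reject unknown bucket types before calling the API."""
    if bucket_type not in VALID_BUCKET_TYPES:
        raise ValueError(
            f"bucket_type must be one of {VALID_BUCKET_TYPES}, got {bucket_type!r}"
        )
    return client.list_dataset_tables(dataset_id, bucket_type=bucket_type)
```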
- _get_file(url)
Get a file from a URL.
- Parameters:
url (str) – The URL to get the file from.
- Returns:
The file content.
- Return type:
bytes
- get_dataset_details(dataset_id: Union[str, int])
Get the details of a dataset, i.e. the dataset-level metadata.
- Parameters:
dataset_id (str or int) – The ID of the dataset to get details for. This is the ds_id field in the dataset metadata.
- Returns:
The dataset details.
- _get_download_links(dataset_id, bucket_type='STANDARDISED')
Get download links for a dataset.
- Parameters:
dataset_id (str) – The ID of the dataset to get download links for. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to get download links for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.
- Returns:
A dictionary of download links.
- Return type:
dict
- construct_dataset_metadata(dataset_details: Optional[dict] = None, bucket_type='STANDARDISED')
Get the metadata for a dataset. This combines dataset-level metadata with table-level metadata.
- Parameters:
dataset_details (dict) – The dataset details. This will be validated for the presence of the following fields: title, description, collection, category_name, collection_name.
bucket_type (str) – The type of bucket to get the table metadata for. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.
- Returns:
The dataset metadata. This includes the dataset title, description, category, collection, and tables with their table-level metadata.
- Return type:
dict
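The field validation described for dataset_details can be mirrored with a small pre-check. This is only a sketch: the field names are copied from the parameter description above, while `missing_fields` is a hypothetical helper and the way the SDK reports a failed validation is not documented here.

```python
from typing import Any, Dict, List

# Field names taken from the dataset_details parameter description.
REQUIRED_FIELDS = ("title", "description", "collection", "category_name", "collection_name")


def missing_fields(dataset_details: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent from dataset_details."""
    return [name for name in REQUIRED_FIELDS if name not in dataset_details]
```

An empty result means the dict carries every field the method is documented to check, so it is a reasonable guard to run on the output of `get_dataset_details` before passing it on.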
- download_dataset(dataset_id, bucket_type='STANDARDISED', root_dir=None, get_metadata=True, metadata_format='yaml', update_sync_history=True, sync_history_file='sync-history.yaml')
Download a dataset, along with its metadata.
- Parameters:
dataset_id (str) – The unique identifier of the dataset to download. This is the ds_id field in the dataset metadata.
bucket_type (str) – The type of bucket to download. Defaults to “STANDARDISED”. Other option is “PREPROCESSED”.
root_dir (str) – The directory to download the dataset to. When left as None (the signature default), the documented default is “data”.
get_metadata (bool) – Whether to include metadata in the download links. Defaults to True.
metadata_format (str) – The format to download the metadata in. Defaults to “yaml”. Other option is “json”.
- Returns:
The directory the dataset was downloaded to.
- Return type:
str
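A typical call passes the documented keyword arguments explicitly so the download location and metadata format are visible at the call site. The wrapper below is a sketch with a hypothetical name (`fetch_dataset`); it uses only parameters listed in the signature above and checks metadata_format against the two documented options before calling the client.

```python
from typing import Any


def fetch_dataset(
    client: Any,
    dataset_id: str,
    root_dir: str = "data",
    metadata_format: str = "yaml",
) -> str:
    """Hypothetical wrapper around client.download_dataset().

    Returns whatever the client returns, which is documented as the
    directory the dataset was downloaded to.
    """
    if metadata_format not in ("yaml", "json"):
        raise ValueError(f"metadata_format must be 'yaml' or 'json', got {metadata_format!r}")
    return client.download_dataset(
        dataset_id,
        bucket_type="STANDARDISED",
        root_dir=root_dir,
        get_metadata=True,
        metadata_format=metadata_format,
    )


# Usage with the real client (requires network access and an API key):
#   from dataio.sdk.user import DataIOAPI
#   path = fetch_dataset(DataIOAPI(), "<ds_id>", root_dir="data")
```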
- get_children_regions(region_id: str)
Get all direct children regions for a given parent region.
- Parameters:
region_id (str) – The ID of the parent region to get children for.
- Returns:
A list of child regions with their metadata.
- Return type:
list
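Because the method returns only direct children, collecting deeper levels means calling it repeatedly. The breadth-first walk below is one way to do that. It assumes each child record is a dict with an identifier under `region_id`; that key name is a guess and is exposed as the `id_key` parameter. `descendant_regions` is a hypothetical helper.

```python
from collections import deque
from typing import Any, Dict, List


def descendant_regions(
    client: Any,
    root_id: str,
    id_key: str = "region_id",  # assumed key name, not confirmed by the docs
    max_depth: int = 2,
) -> List[Dict[str, Any]]:
    """Collect child records up to max_depth levels below root_id, breadth-first."""
    found: List[Dict[str, Any]] = []
    seen = {root_id}
    queue = deque([(root_id, 0)])
    while queue:
        region_id, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not request children below the depth limit
        for child in client.get_children_regions(region_id):
            child_id = child.get(id_key)
            if child_id is None or child_id in seen:
                continue  # skip records without an ID and avoid revisiting regions
            seen.add(child_id)
            found.append(child)
            queue.append((child_id, depth + 1))
    return found
```

The `max_depth` limit keeps the number of API calls bounded; each visited region costs one request.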
- get_shapefile_list()
Get a list of all shapefiles.
- Returns:
A list of shapefiles.
- Return type:
list
- download_shapefile(region_id: str, shp_folder: str = None)
Download a shapefile.
- Parameters:
region_id (str) – The ID of the region to download the shapefile for.
shp_folder (str) – The folder within the data directory to download the shapefile to. Defaults to “{data_dir}/GS0012DS0051-Shapefiles_India”, where data_dir is derived from the API client.
compress (bool) – Whether to compress the shapefile. Defaults to True.
- Returns:
The shapefile.
- Return type:
bytes
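Given the documented bytes return type, the payload can be written to a path of your choosing. This is a sketch, assuming the method really does return the raw file content; it may also write into shp_folder on its own, which this helper does not rely on. `save_shapefile` and `out_path` are hypothetical names.

```python
from pathlib import Path
from typing import Any


def save_shapefile(client: Any, region_id: str, out_path: str) -> Path:
    """Write the bytes returned by client.download_shapefile() to out_path."""
    payload = client.download_shapefile(region_id)
    if not isinstance(payload, (bytes, bytearray)):
        # Guard against a return type other than the documented one.
        raise TypeError(f"expected bytes, got {type(payload).__name__}")
    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    target.write_bytes(payload)
    return target
```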