dataio.sdk.admin

Admin SDK for DataIO: dataset upload and management operations.

Module Contents

Classes

DataIOAdminAPI

Admin API Client for uploading datasets to DataIO.

Data

API

dataio.sdk.admin.console

'Console(…)'

class dataio.sdk.admin.DataIOAdminAPI(base_url: Optional[str] = None, api_key: Optional[str] = None, data_dir: Optional[str] = None)[source]

Admin API Client for uploading datasets to DataIO.

Parameters:
  • base_url – The base URL of the DataIO API. Defaults to DATAIO_API_BASE_URL env var.

  • api_key – The API key for admin access. Defaults to DATAIO_API_KEY env var.

  • data_dir – The directory containing datasets. Defaults to DATAIO_DATA_DIR env var.

Initialization
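Each constructor argument falls back to an environment variable when omitted. A minimal sketch of that resolution order; `resolve_config` is an illustrative helper, not part of the SDK:

```python
import os

def resolve_config(base_url=None, api_key=None, data_dir=None):
    """Resolve constructor arguments, falling back to environment variables."""
    return {
        "base_url": base_url or os.environ.get("DATAIO_API_BASE_URL"),
        "api_key": api_key or os.environ.get("DATAIO_API_KEY"),
        "data_dir": data_dir or os.environ.get("DATAIO_DATA_DIR"),
    }

# Explicit arguments win; unset arguments come from the environment.
os.environ["DATAIO_API_BASE_URL"] = "https://dataio.example.org/api"
cfg = resolve_config(api_key="secret-key")
```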

_request(method: str, endpoint: str, **kwargs) Any[source]

Make a request to the DataIO API.

_parse_dataset_folder(folder_path: str) Dict[str, Any][source]

Parse a dataset folder and extract all required information.

Expected folder structure:

  • {ds_id}-{title}/

    • info.yml (dataset-level metadata)

    • metadata.yaml (table-level metadata)

    • *.csv (data files)
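A sketch of how a folder in this layout could be walked. The split-on-first-hyphen rule for `{ds_id}-{title}` and the `parse_folder_name` helper are assumptions for illustration, not the SDK's actual parser:

```python
import tempfile
from pathlib import Path

def parse_folder_name(folder: Path):
    """Split '{ds_id}-{title}' on the first hyphen (assumed convention)."""
    ds_id, _, title = folder.name.partition("-")
    return ds_id, title

# Build a throwaway folder matching the expected layout.
root = Path(tempfile.mkdtemp()) / "ABCDEF000001-population"
root.mkdir(parents=True)
(root / "info.yml").write_text("title: Population\n")
(root / "metadata.yaml").write_text("tables: []\n")
(root / "people.csv").write_text("id,count\n1,100\n")

ds_id, title = parse_folder_name(root)
csv_files = sorted(p.name for p in root.glob("*.csv"))
```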

create_data_owner(name: str, contact_person: Optional[str] = None, contact_person_email: Optional[str] = None) Dict[source]

Create a data owner entry.

Parameters:
  • name – Name of the data owner (must be unique)

  • contact_person – Optional contact person name

  • contact_person_email – Optional contact person email

Returns:

Created data owner object

ensure_data_owner_exists(name: str) bool[source]

Ensure a data owner exists, creating it if necessary.

Parameters:

name – Name of the data owner

Returns:

True if created, False if already existed
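The create-if-missing semantics (True on creation, False when the owner already exists) follow a common idempotent pattern. Here it is modelled against an in-memory set rather than the real API:

```python
def ensure_exists(store: set, name: str) -> bool:
    """Create-if-missing: return True if created, False if already present."""
    if name in store:
        return False
    store.add(name)
    return True

owners = set()
first = ensure_exists(owners, "Statistics Office")   # creates the owner
second = ensure_exists(owners, "Statistics Office")  # already there
```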

create_raw_dataset(rds_id: str, title: str, source: str) Dict[source]

Create a raw dataset entry.

Parameters:
  • rds_id – Raw dataset identifier

  • title – Raw dataset title

  • source – Raw dataset source (URL or description)

Returns:

Created raw dataset object

create_dataset(ds_id: str, title: str, collection_id: str, data_owner_name: str, raw_dataset_ids: List[str], description: Optional[str] = None, spatial_coverage_region_id: Optional[str] = None, spatial_resolution: Optional[str] = None, temporal_coverage_start_date: Optional[str] = None, temporal_coverage_end_date: Optional[str] = None, temporal_resolution: Optional[str] = None, access_level: str = 'NONE', tags: Optional[List[str]] = None, additional_metadata: Optional[Dict] = None) Dict[source]

Create a dataset.

Parameters:
  • ds_id – Dataset identifier (12 chars, must start with collection_id)

  • title – Dataset title

  • collection_id – Collection identifier (first 6 chars of ds_id)

  • data_owner_name – Name of data owner (must exist in database)

  • raw_dataset_ids – List of raw dataset IDs to link

  • description – Optional description

  • spatial_coverage_region_id – Optional region ID

  • spatial_resolution – Optional spatial resolution enum

  • temporal_coverage_start_date – Optional start date (YYYY or YYYY-MM-DD)

  • temporal_coverage_end_date – Optional end date

  • temporal_resolution – Optional temporal resolution enum

  • access_level – Access level (NONE|VIEW|DOWNLOAD)

  • tags – Optional list of tags

  • additional_metadata – Optional additional metadata dict

Returns:

Created dataset object
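The parameter docs state several constraints (12-character `ds_id` prefixed by a 6-character `collection_id`, a fixed access-level set, and two accepted date formats). A client-side pre-validation sketch of exactly those rules; the helper itself is not part of the SDK:

```python
import re

ACCESS_LEVELS = {"NONE", "VIEW", "DOWNLOAD"}
DATE_RE = re.compile(r"^\d{4}(-\d{2}-\d{2})?$")  # YYYY or YYYY-MM-DD

def validate_dataset_args(ds_id, collection_id, access_level="NONE", start_date=None):
    """Check the constraints stated in the create_dataset parameter docs."""
    errors = []
    if len(ds_id) != 12:
        errors.append("ds_id must be 12 characters")
    if len(collection_id) != 6 or not ds_id.startswith(collection_id):
        errors.append("collection_id must be the first 6 chars of ds_id")
    if access_level not in ACCESS_LEVELS:
        errors.append("access_level must be NONE, VIEW, or DOWNLOAD")
    if start_date is not None and not DATE_RE.match(start_date):
        errors.append("dates must be YYYY or YYYY-MM-DD")
    return errors

ok = validate_dataset_args("ABCDEF000001", "ABCDEF", "VIEW", "2020-01-01")
bad = validate_dataset_args("SHORT", "ABCDEF", "OPEN", "2020/01/01")
```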

upload_table(dataset_id: str, bucket_type: str, csv_path: pathlib.Path, table_metadata: Dict) Dict[source]

Upload a table (CSV file) to a dataset.

Parameters:
  • dataset_id – Dataset ID to upload to

  • bucket_type – Bucket type (PREPROCESSED or STANDARDISED)

  • csv_path – Path to the CSV file

  • table_metadata – Table metadata dict with table_name, description, data_dictionary

Returns:

Upload response
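The docs name only the required keys of `table_metadata` (`table_name`, `description`, `data_dictionary`); the nested shape of `data_dictionary` below is an assumption for illustration:

```python
from pathlib import Path

# Shape of the table_metadata argument; field values are illustrative.
table_metadata = {
    "table_name": "population_by_region",
    "description": "Annual population counts per region",
    "data_dictionary": [
        {"column": "region_id", "type": "string", "description": "Region code"},
        {"column": "year", "type": "integer", "description": "Calendar year"},
        {"column": "population", "type": "integer", "description": "Head count"},
    ],
}
csv_path = Path("data/ABCDEF000001-population/population_by_region.csv")
```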

upload_dataset_folder(folder_path: str, bucket_type: str = 'STANDARDISED', dry_run: bool = False) Dict[str, Any][source]

Upload an entire dataset folder (create dataset + upload all tables).

Parameters:
  • folder_path – Path to the dataset folder

  • bucket_type – Bucket type (PREPROCESSED or STANDARDISED)

  • dry_run – If True, only validate without making API calls

Returns:

Summary of upload operation

upload_all_datasets(data_dir: Optional[str] = None, bucket_type: str = 'STANDARDISED', dry_run: bool = False) List[Dict[str, Any]][source]

Upload all dataset folders in the data directory.

Parameters:
  • data_dir – Directory containing dataset folders

  • bucket_type – Bucket type for all uploads

  • dry_run – If True, only validate without making API calls

Returns:

List of upload results for each dataset
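A sketch of the overall loop: iterate the folders in a data directory and collect one result per dataset, with `dry_run` validating only and making no API calls. The validity check here (presence of `info.yml`) is a stand-in for the SDK's real validation:

```python
import tempfile
from pathlib import Path

def upload_all(data_dir: Path, dry_run: bool = True):
    """Collect a per-folder result; with dry_run=True, validate only."""
    results = []
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        has_info = (folder / "info.yml").exists()
        results.append({"folder": folder.name, "valid": has_info, "dry_run": dry_run})
    return results

data_dir = Path(tempfile.mkdtemp())
good = data_dir / "ABCDEF000001-population"
good.mkdir()
(good / "info.yml").write_text("title: Population\n")
(data_dir / "ABCDEF000002-empty").mkdir()  # missing info.yml

results = upload_all(data_dir, dry_run=True)
```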