dataio.sdk.admin¶
Admin SDK for DataIO - Dataset upload and management operations.
Module Contents¶
Classes¶
Admin API Client for uploading datasets to DataIO.
Data¶
API¶
- dataio.sdk.admin.console¶
'Console(…)'
- class dataio.sdk.admin.DataIOAdminAPI(base_url: Optional[str] = None, api_key: Optional[str] = None, data_dir: Optional[str] = None)[source]¶
Admin API Client for uploading datasets to DataIO.
- Parameters:
base_url – The base URL of the DataIO API. Defaults to DATAIO_API_BASE_URL env var.
api_key – The API key for admin access. Defaults to DATAIO_API_KEY env var.
data_dir – The directory containing datasets. Defaults to DATAIO_DATA_DIR env var.
Initialization
- _parse_dataset_folder(folder_path: str) → Dict[str, Any][source]¶
Parse a dataset folder and extract all required information.
Expected folder structure:
{ds_id}-{title}/
info.yml (dataset-level metadata)
metadata.yaml (table-level metadata)
*.csv (data files)
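The folder name itself carries the dataset identity. Below is a minimal sketch of splitting a `{ds_id}-{title}` folder name into its two parts, assuming the 12-character `ds_id` rule documented for `create_dataset` below; the helper name is hypothetical, not part of the SDK.

```python
def split_dataset_folder_name(folder_name: str) -> tuple:
    """Split a '{ds_id}-{title}' folder name into (ds_id, title).

    Assumes ds_id is exactly 12 characters (see create_dataset),
    so the separating hyphen sits at index 12. If ds_id may itself
    contain hyphens, splitting on the first '-' would be wrong,
    which is why we slice by position instead.
    """
    if len(folder_name) < 14 or folder_name[12] != "-":
        raise ValueError(f"unexpected folder name: {folder_name!r}")
    return folder_name[:12], folder_name[13:]
```

This positional slice is a design choice: it tolerates hyphens inside the title without any ambiguity.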
- create_data_owner(name: str, contact_person: Optional[str] = None, contact_person_email: Optional[str] = None) → Dict[source]¶
Create a data owner entry.
- Parameters:
name – Name of the data owner (must be unique)
contact_person – Optional contact person name
contact_person_email – Optional contact person email
- Returns:
Created data owner object
- ensure_data_owner_exists(name: str) → bool[source]¶
Ensure a data owner exists, creating it if necessary.
- Parameters:
name – Name of the data owner
- Returns:
True if created, False if already existed
- create_raw_dataset(rds_id: str, title: str, source: str) → Dict[source]¶
Create a raw dataset entry.
- Parameters:
rds_id – Raw dataset identifier
title – Raw dataset title
source – Raw dataset source (URL or description)
- Returns:
Created raw dataset object
- create_dataset(ds_id: str, title: str, collection_id: str, data_owner_name: str, raw_dataset_ids: List[str], description: Optional[str] = None, spatial_coverage_region_id: Optional[str] = None, spatial_resolution: Optional[str] = None, temporal_coverage_start_date: Optional[str] = None, temporal_coverage_end_date: Optional[str] = None, temporal_resolution: Optional[str] = None, access_level: str = 'NONE', tags: Optional[List[str]] = None, additional_metadata: Optional[Dict] = None) → Dict[source]¶
Create a dataset.
- Parameters:
ds_id – Dataset identifier (12 chars, must start with collection_id)
title – Dataset title
collection_id – Collection identifier (first 6 chars of ds_id)
data_owner_name – Name of data owner (must exist in database)
raw_dataset_ids – List of raw dataset IDs to link
description – Optional description
spatial_coverage_region_id – Optional region ID
spatial_resolution – Optional spatial resolution enum
temporal_coverage_start_date – Optional start date (YYYY or YYYY-MM-DD)
temporal_coverage_end_date – Optional end date
temporal_resolution – Optional temporal resolution enum
access_level – Access level (NONE|VIEW|DOWNLOAD)
tags – Optional list of tags
additional_metadata – Optional additional metadata dict
- Returns:
Created dataset object
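The identifier and access-level constraints above can be checked client-side before calling the API. This sketch encodes only the rules stated in the parameter list (12-character `ds_id`, `collection_id` as its first 6 characters, `access_level` one of NONE|VIEW|DOWNLOAD); the function name is hypothetical.

```python
VALID_ACCESS_LEVELS = {"NONE", "VIEW", "DOWNLOAD"}

def validate_dataset_args(ds_id, collection_id, access_level="NONE"):
    """Pre-flight check of create_dataset's documented constraints."""
    if len(ds_id) != 12:
        raise ValueError("ds_id must be exactly 12 characters")
    if len(collection_id) != 6 or not ds_id.startswith(collection_id):
        raise ValueError(
            "collection_id must be the first 6 characters of ds_id")
    if access_level not in VALID_ACCESS_LEVELS:
        raise ValueError(
            f"access_level must be one of {sorted(VALID_ACCESS_LEVELS)}")
```

Failing fast here gives a clearer error than waiting for the API to reject the payload.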
- upload_table(dataset_id: str, bucket_type: str, csv_path: pathlib.Path, table_metadata: Dict) → Dict[source]¶
Upload a table (CSV file) to a dataset.
- Parameters:
dataset_id – Dataset ID to upload to
bucket_type – Bucket type (PREPROCESSED or STANDARDISED)
csv_path – Path to the CSV file
table_metadata – Table metadata dict with table_name, description, data_dictionary
- Returns:
Upload response
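`upload_table` expects a `table_metadata` dict with `table_name`, `description`, and `data_dictionary` keys. The sketch below assembles that dict; the per-column shape of `data_dictionary` (column name mapped to a description) is an assumption, as the reference does not document it.

```python
REQUIRED_METADATA_KEYS = {"table_name", "description", "data_dictionary"}
VALID_BUCKET_TYPES = {"PREPROCESSED", "STANDARDISED"}

def make_table_metadata(table_name, description, data_dictionary):
    """Assemble the table_metadata dict for upload_table.

    data_dictionary is assumed to map column names to descriptions;
    the real schema may differ.
    """
    meta = {
        "table_name": table_name,
        "description": description,
        "data_dictionary": data_dictionary,
    }
    assert REQUIRED_METADATA_KEYS <= meta.keys()
    return meta
```

Pairing this with a `bucket_type in VALID_BUCKET_TYPES` check catches the two most likely call-site mistakes before any upload starts.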
- upload_dataset_folder(folder_path: str, bucket_type: str = 'STANDARDISED', dry_run: bool = False) → Dict[str, Any][source]¶
Upload an entire dataset folder (create dataset + upload all tables).
- Parameters:
folder_path – Path to the dataset folder
bucket_type – Bucket type (PREPROCESSED or STANDARDISED)
dry_run – If True, only validate without making API calls
- Returns:
Summary of upload operation
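A dry run validates the folder without touching the API. The stdlib-only sketch below shows what such a validation pass might check, based on the folder layout documented for `_parse_dataset_folder`; the function name is hypothetical.

```python
from pathlib import Path

def dry_run_check(folder_path):
    """Return a list of problems with a dataset folder (empty = OK).

    Checks only the documented layout: info.yml, metadata.yaml,
    and at least one *.csv data file.
    """
    folder = Path(folder_path)
    problems = []
    if not (folder / "info.yml").is_file():
        problems.append("missing info.yml")
    if not (folder / "metadata.yaml").is_file():
        problems.append("missing metadata.yaml")
    if not list(folder.glob("*.csv")):
        problems.append("no CSV data files")
    return problems
```

Returning a list of problems rather than raising on the first one lets a dry run report everything wrong with a folder in a single pass.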
- upload_all_datasets(data_dir: Optional[str] = None, bucket_type: str = 'STANDARDISED', dry_run: bool = False) → List[Dict[str, Any]][source]¶
Upload all dataset folders in the data directory.
- Parameters:
data_dir – Directory containing dataset folders
bucket_type – Bucket type for all uploads
dry_run – If True, only validate without making API calls
- Returns:
List of upload results for each dataset
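Bulk upload presumably starts by enumerating dataset folders under the data directory. A sketch of that enumeration, assuming every subdirectory follows the `{ds_id}-{title}` naming convention with a 12-character `ds_id`; the helper name is hypothetical.

```python
from pathlib import Path

def find_dataset_folders(data_dir):
    """Return subdirectories of data_dir that look like '{ds_id}-{title}'.

    Assumes the 12-character ds_id convention, so a candidate
    folder name has a hyphen at index 12. Files and other
    directories are skipped rather than treated as errors.
    """
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_dir() and len(p.name) > 13 and p.name[12] == "-"
    )
```

Each folder found this way would then go through the same path as a single `upload_dataset_folder` call.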