libera_utils.aws.s3_utilities
Module for S3 CLI utilities.
Functions
| manual_ingest_data_products | Stage data product files to the Ingest Dropbox and emit a single NewFilesAvailable event. |
| put_new_files_available_event | Emit a single NewFilesAvailable event to the SDC event bus. |
| s3_copy_cli_handler | CLI handler function for s3-utils cp CLI subcommand. |
| s3_list_archive_files | List all files in an archive S3 bucket for a given processing step. |
| s3_list_cli_handler | CLI handler function for s3-utils list CLI subcommand. |
| s3_put_cli_handler | CLI handler function for the s3-utils put subcommand. |
| verify_ingestion | Block until every staged file is confirmed fully ingested, or raise on timeout. |
- libera_utils.aws.s3_utilities._archive_object_exists(s3_client, bucket: str, key: str) → bool
Return whether an object exists at the given bucket/key (read-only head_object).
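A minimal sketch of a read-only existence check of this kind, assuming a boto3 S3 client is passed in. The function name `archive_object_exists` and the set of error codes treated as "missing" are illustrative assumptions, not the library's actual implementation.

```python
def archive_object_exists(s3_client, bucket: str, key: str) -> bool:
    """Return True if an object exists at bucket/key, using a HEAD request only."""
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.ClientError as err:
        # A missing key surfaces as a ClientError; head_object reports it as "404".
        code = err.response.get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        # Anything else (for example a 403) is a real problem, not "missing".
        raise
    return True
```

Re-raising on other error codes keeps a permissions failure from being mistaken for a file that has not been archived yet.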
- libera_utils.aws.s3_utilities._log_ingestion_verification_summary(file_specs: list[dict], pending: set) → None
Log a per-file PASS/MISSING summary of the ingestion verification checks.
- libera_utils.aws.s3_utilities._validate_filename_for_ingest(path: CloudPath | Path) → L0Filename | LiberaDataProductFilename
Validate a path as a Libera L0 or data product filename eligible for manual ingest.
Manifest and any other filename types are rejected.
- Parameters:
path (Path or S3Path) – Path to the file to validate.
- Returns:
The parsed filename object.
- Return type:
L0Filename | LiberaDataProductFilename
- libera_utils.aws.s3_utilities.manual_ingest_data_products(paths_to_files: list[Path], *, boto_session: Session) → list[L0Filename | LiberaDataProductFilename]
Stage data product files to the Ingest Dropbox and emit a single NewFilesAvailable event.
The SDC Data Ingester picks up the staged files and handles archiving them in the correct bucket as well as creating file metadata and data availability records.
- Parameters:
paths_to_files (list of Path) – Local filesystem paths to the files to ingest. Each must be a validly named Libera L0 or data product file.
boto_session (boto3.Session) – Boto3 session used for all AWS interactions. Created once by the CLI handler and passed in so that the same authenticated session is used throughout the workflow.
- Returns:
The validated filename objects for the staged files (useful for subsequent verification).
- Return type:
list[L0Filename | LiberaDataProductFilename]
- libera_utils.aws.s3_utilities.put_new_files_available_event(files: list[dict], *, boto_session: Session) → None
Emit a single NewFilesAvailable event to the SDC event bus.
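As a rough illustration of emitting one event for a batch of files, the sketch below assumes the SDC event bus is an Amazon EventBridge bus and that an EventBridge client is passed in. The function name, the `event_bus_name` and `source` parameters, and the `{"files": [...]}` payload layout are hypothetical; only the NewFilesAvailable event name comes from this page.

```python
import json


def put_new_files_event(events_client, files: list[dict], *, event_bus_name: str, source: str) -> dict:
    """Send one event whose detail lists every staged file, and return the entry sent."""
    entry = {
        "EventBusName": event_bus_name,  # assumption: supplied by the caller
        "Source": source,                # assumption: supplied by the caller
        "DetailType": "NewFilesAvailable",
        "Detail": json.dumps({"files": files}),  # assumption: payload layout
    }
    response = events_client.put_events(Entries=[entry])
    # put_events can report per-entry failures without raising an exception.
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"Event was not accepted: {response.get('Entries')}")
    return entry
```

Putting all files in a single entry means the consumer sees the batch as one unit, rather than one event per file.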
- libera_utils.aws.s3_utilities.s3_copy_cli_handler(parsed_args: Namespace) → None
CLI handler function for s3-utils cp CLI subcommand.
- libera_utils.aws.s3_utilities.s3_list_archive_files(data_product_id: str | DataProductIdentifier, *, profile_name: str = None) → list
List all files in an archive S3 bucket for a given processing step.
- libera_utils.aws.s3_utilities.s3_list_cli_handler(parsed_args: Namespace) → None
CLI handler function for s3-utils list CLI subcommand.
- libera_utils.aws.s3_utilities.s3_put_cli_handler(parsed_args: Namespace) → None
CLI handler function for the s3-utils put subcommand.
Stages one or more Libera data product files into the SDC Ingest Dropbox bucket and emits a single NewFilesAvailable event to the SDC event bus. The SDC Data Ingester service then archives the files and creates the associated file metadata and data availability records. This is the manual analog of the automated ingest that happens for files produced by SDC processing steps.
- libera_utils.aws.s3_utilities.verify_ingestion(libera_filenames: list[L0Filename | LiberaDataProductFilename], *, boto_session: Session, timeout: float = 300.0, poll_interval: float = 10.0) → None
Block until every staged file is confirmed fully ingested, or raise on timeout.
For each file, up to three read-only checks are polled until they pass:
The file exists in its expected archive bucket (at its filename-derived archive prefix).
A Data Availability record exists for the data product/version/applicable-date (skipped for L0 PDS/CR files, which the SDC does not write availability records for).
A File Metadata record exists for the file basename.
All required AWS resources are resolved once up front; finding zero or more than one of any resource raises immediately (it indicates a mismatch between Libera Utils and the deployed SDC). Checks are polled every poll_interval seconds and each check stops being polled as soon as it passes. A per-file summary is always logged; if any check is still unsatisfied at timeout, a TimeoutError is raised.
- Parameters:
libera_filenames (list of L0Filename or LiberaDataProductFilename) – The validated filenames staged for ingest (as returned by manual_ingest_data_products).
boto_session (boto3.Session) – Boto3 session used for all (read-only) AWS interactions.
timeout (float, optional) – Maximum number of seconds to wait for full ingestion. Default 300 (5 minutes).
poll_interval (float, optional) – Number of seconds between polling passes. Default 10.
- Raises:
TimeoutError – If any expected record/object is still missing when the timeout elapses.
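The polling behaviour described above (re-run only the checks that have not yet passed, stop when all pass, raise at the deadline) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `poll_until_all_pass`, the `checks` mapping of names to zero-argument callables, and the injectable `clock` and `sleep` parameters are all hypothetical.

```python
import time
from typing import Callable


def poll_until_all_pass(
    checks: dict[str, Callable[[], bool]],
    *,
    timeout: float = 300.0,
    poll_interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll each named check until it passes; raise TimeoutError if any remain."""
    pending = set(checks)
    deadline = clock() + timeout
    while True:
        # Re-run only the checks that are still pending; passed checks are dropped.
        pending = {name for name in pending if not checks[name]()}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"Still unsatisfied after {timeout}s: {sorted(pending)}")
        sleep(poll_interval)
```

Each check would wrap one read-only lookup, such as an archive object check or a record query. Injecting `clock` and `sleep` lets the loop be exercised in tests without real waiting.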