libera_utils#
High-level package initialization for libera_utils, exposing common utilities.
- class libera_utils.LiberaDataProductDefinition(*, coordinates: dict[str, LiberaVariableDefinition], variables: dict[str, LiberaVariableDefinition], attributes: dict[str, Any])#
Pydantic model for a Libera data product definition.
Used for validating existing data product Datasets with helper methods for creating valid Datasets and DataArrays.
- data_variables#
A dictionary of variable names and their corresponding LiberaVariableDefinition objects, which contain metadata and data.
- product_metadata#
The metadata associated with the data product, including dynamic metadata and spatio-temporal metadata.
- Type:
ProductMetadata | None
- Attributes:
dynamic_attributes – Return product-level attributes that are dynamically defined (null values) in the data product definition.
model_extra – Get extra fields set during validation.
model_fields_set – Returns the set of fields that have been explicitly set on this model instance.
static_attributes – Return product-level attributes that are statically defined (have values) in the data product definition.
Methods
check_dataset_conformance(dataset[, strict]) – Check the conformance of a Dataset object against a DataProductDefinition.
copy(*[, include, exclude, update, deep]) – Returns a copy of the model.
create_conforming_dataset(data[, ...]) – Create a Dataset from numpy arrays that is valid against the data product definition.
enforce_dataset_conformance(dataset) – Analyze and update a Dataset to conform to the expectations of the DataProductDefinition.
from_yaml(product_definition_filepath) – Create a DataProductDefinition from a Libera data product definition YAML file.
generate_data_product_filename(dataset, ...) – Generate a standardized Libera data product filename.
model_construct([_fields_set]) – Creates a new instance of the Model class with validated data.
model_copy(*[, update, deep]) – Returns a copy of the model.
model_dump(*[, mode, include, exclude, ...]) – Generate a dictionary representation of the model.
model_dump_json(*[, indent, ensure_ascii, ...]) – Generate a JSON representation of the model.
model_json_schema([by_alias, ref_template, ...]) – Generates a JSON schema for a model class.
model_parametrized_name(params) – Compute the class name for parametrizations of generic classes.
model_post_init(context, /) – Override this method to perform additional initialization after __init__ and model_construct.
model_rebuild(*[, force, raise_errors, ...]) – Try to rebuild the pydantic-core schema for the model.
model_validate(obj, *[, strict, extra, ...]) – Validate a pydantic model instance.
model_validate_json(json_data, *[, strict, ...]) – Validate the given JSON data against the Pydantic model.
model_validate_strings(obj, *[, strict, ...]) – Validate the given object with string data against the Pydantic model.
Deprecated Pydantic v1 methods (use the model_* equivalents above): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate.
- _check_dataset_attrs(dataset_attrs: dict[str, Any]) list[str]#
Validate the product-level attributes of a Dataset against the product definition.
Static attributes must match exactly. Some special attributes have their values checked for validity.
- static _get_static_project_attributes(file_path=None)#
Loads project-wide consistent product-level attribute metadata from a YAML file.
These global attributes are expected on every Libera data product so we store them in a global config.
- Parameters:
file_path (Path) – The path to the global attribute metadata YAML file.
- Returns:
Dictionary of key-value pairs for static product attributes.
- Return type:
dict
- classmethod _set_attributes(raw_attributes: dict[str, Any]) dict[str, Any]#
Validates product-level attributes and adds requirements for globally consistent attributes.
Any attributes defined with null values are treated as required dynamic attributes that must be set either by the user’s data product definition or dynamically on the Dataset before writing.
- check_dataset_conformance(dataset: Dataset, strict: bool = True) list[str]#
Check the conformance of a Dataset object against a DataProductDefinition
This method is responsible only for finding errors, not fixing them.
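A minimal sketch of a conformance check. The YAML path and variable names are hypothetical, and strict=False is assumed to collect error messages rather than raise, consistent with the strict kwarg documented for create_conforming_dataset:

```python
import numpy as np
import xarray as xr

from libera_utils import LiberaDataProductDefinition

# Hypothetical definition file; real paths depend on your product
definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")

ds = xr.Dataset({"radiance": ("time", np.zeros(10))})  # placeholder dataset

# strict=False returns the list of error messages instead of raising
errors = definition.check_dataset_conformance(ds, strict=False)
print(errors)  # an empty list means the dataset conforms
```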
- create_conforming_dataset(data: dict[str, ndarray], user_product_attributes: dict[str, Any] | None = None, user_variable_attributes: dict[str, dict[str, Any]] | None = None, strict: bool = True) tuple[Dataset, list[str]]#
Create a Dataset from numpy arrays that is valid against the data product definition
- Parameters:
data (dict[str, np.ndarray]) – Dictionary of variable/coordinate data keyed by variable/coordinate name.
user_product_attributes (dict[str, Any] | None) – Product-level attributes for the data product. Allows the user to specify product-level attributes that are required but not statically specified in the product definition (e.g. the algorithm version used to generate the product). Algorithm developers should not normally need this kwarg.
user_variable_attributes (dict[str, dict[str, Any]] | None) – Per-variable attributes for each variable's DataArray, keyed by variable name with an attributes dict as the value. Allows the user to specify variable-level attributes that are required but not statically defined in the product definition. Algorithm developers should not normally need this kwarg.
strict (bool) – If True (default), raise an exception on nonconformance.
- Returns:
Tuple of (Dataset, error_messages) where error_messages contains any validation problems. Empty list if the dataset is fully valid.
- Return type:
tuple[Dataset, list[str]]
Notes
We make no distinction between coordinate and data variable input data and assume that we can determine which is which based on the coordinate/variable names in the product definition.
This method is not responsible for primary validation or error reporting. We call out to check_dataset_conformance at the end for that.
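A sketch of building a dataset from raw arrays. The coordinate and variable names ("time", "radiance") and the "algorithm_version" attribute are hypothetical; real names come from your product definition YAML:

```python
import numpy as np

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")  # hypothetical path

dataset, errors = definition.create_conforming_dataset(
    data={
        "time": np.arange(10),           # hypothetical coordinate
        "radiance": np.random.rand(10),  # hypothetical data variable
    },
    # Hypothetical dynamic attribute the definition leaves null
    user_product_attributes={"algorithm_version": "1.2.3"},
    strict=False,  # return error messages rather than raising
)
```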
- property dynamic_attributes#
Return product-level attributes that are dynamically defined (null values) in the data product definition.
These attributes are required but are expected to be defined outside the data product definition.
- enforce_dataset_conformance(dataset: Dataset) tuple[Dataset, list[str]]#
Analyze and update a Dataset to conform to the expectations of the DataProductDefinition
This method is for modifying an existing xarray Dataset. If you are creating a Dataset from scratch with numpy arrays, consider using create_conforming_dataset instead.
- Parameters:
dataset (Dataset) – Possibly non-compliant dataset
- Returns:
Tuple of (updated Dataset, error_messages) where error_messages contains any problems that could not be fixed. Empty list if all problems were fixed.
- Return type:
tuple[Dataset, list[str]]
Notes
This method is responsible for trying (and possibly failing) to coerce a Dataset into a valid form with attributes and encodings. We use check_dataset_conformance to check for validation errors.
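A sketch of coercing an existing, possibly non-compliant Dataset, under the same hypothetical definition file as above:

```python
import numpy as np
import xarray as xr

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")  # hypothetical path
ds = xr.Dataset({"radiance": ("time", np.zeros(10))})  # possibly non-compliant dataset

updated, remaining = definition.enforce_dataset_conformance(ds)
if remaining:
    # Problems the method could not fix automatically
    print("\n".join(remaining))
```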
- classmethod from_yaml(product_definition_filepath: str | CloudPath | Path)#
Create a DataProductDefinition from a Libera data product definition YAML file.
- Parameters:
product_definition_filepath (str | CloudPath | Path) – Path to YAML file with product and variable definitions
- Returns:
Configured instance with loaded metadata and optional data
- Return type:
DataProductDefinition
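Loading a definition is a one-liner; the filepath here is hypothetical:

```python
from pathlib import Path

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml(
    Path("definitions/libera_l1b_radiance.yaml")  # hypothetical definition file
)
print(definition.static_attributes)   # attributes with fixed values
print(definition.dynamic_attributes)  # required attributes still to be set
```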
- generate_data_product_filename(dataset: Dataset, time_variable: str) LiberaDataProductFilename#
Generate a standardized Libera data product filename.
- Parameters:
dataset (Dataset) – The Dataset for which to create a filename. Used to extract algorithm version and start and end times.
time_variable (str) – Name of the time dimension to use for determining the start and end time.
- Returns:
Properly formatted filename object
- Return type:
LiberaDataProductFilename
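A sketch continuing the create_conforming_dataset example above; the time variable name "time" is hypothetical, and the str conversion of the filename object is an assumption:

```python
# `definition` and `dataset` are from the create_conforming_dataset sketch above
filename = definition.generate_data_product_filename(dataset, time_variable="time")
dataset.to_netcdf(str(filename))  # assumes the filename object converts cleanly to str
```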
- model_config: ClassVar[ConfigDict] = {'frozen': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- property static_attributes#
Return product-level attributes that are statically defined (have values) in the data product definition
- class libera_utils.Manifest(*, manifest_type: ManifestType, files: list[ManifestFileRecord] = <factory>, configuration: dict[str, Any] = <factory>, filename: ManifestFilename | None = None, ulid_code: ULID | None = <factory>)#
Pydantic model for a manifest file.
- Attributes:
model_extra – Get extra fields set during validation.
model_fields_set – Returns the set of fields that have been explicitly set on this model instance.
Methods
add_desired_time_range(start_datetime, ...) – Add a time range to the configuration section of the manifest.
add_files(*files) – Add files to the manifest from filenames.
check_file_structure(file_structure, ...) – Check file structure, returning True if it is good.
copy(*[, include, exclude, update, deep]) – Returns a copy of the model.
from_file(filepath) – Read a manifest file and return a Manifest object (factory method).
model_construct([_fields_set]) – Creates a new instance of the Model class with validated data.
model_copy(*[, update, deep]) – Returns a copy of the model.
model_dump(*[, mode, include, exclude, ...]) – Generate a dictionary representation of the model.
model_dump_json(*[, indent, ensure_ascii, ...]) – Generate a JSON representation of the model.
model_json_schema([by_alias, ref_template, ...]) – Generates a JSON schema for a model class.
model_parametrized_name(params) – Compute the class name for parametrizations of generic classes.
model_post_init(context, /) – Override this method to perform additional initialization after __init__ and model_construct.
model_rebuild(*[, force, raise_errors, ...]) – Try to rebuild the pydantic-core schema for the model.
model_validate(obj, *[, strict, extra, ...]) – Validate a pydantic model instance.
model_validate_json(json_data, *[, strict, ...]) – Validate the given JSON data against the Pydantic model.
model_validate_strings(obj, *[, strict, ...]) – Validate the given object with string data against the Pydantic model.
output_manifest_from_input_manifest(input_manifest) – Create an output manifest from an input manifest file path; adds the input files to the output manifest configuration.
serialize_filename(filename, _info) – Custom serializer for the manifest filename.
transform_filename(raw_filename) – Convert raw filename to ManifestFilename class if necessary.
transform_files(raw_list) – Allow for the incoming files list to have varying types.
Validate checksums of listed files.
write(out_path[, filename]) – Write a manifest file from a Manifest object (self).
Deprecated Pydantic v1 methods (use the model_* equivalents above): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate.
- _generate_filename() ManifestFilename#
Generate a valid manifest filename.
- add_desired_time_range(start_datetime: datetime, end_datetime: datetime)#
Add a time range to the configuration section of the manifest.
- Parameters:
start_datetime (datetime.datetime) – The desired start time for the range of data in this manifest
end_datetime (datetime.datetime) – The desired end time for the range of data in this manifest
- Return type:
None
- add_files(*files: str | Path | S3Path)#
Add files to the manifest from filenames.
- Parameters:
files (Union[str, Path, S3Path]) – Paths to the files to add to the manifest.
- Return type:
None
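A sketch of assembling a manifest. The ManifestType member name (INPUT) and the file paths are assumptions; check the enum for the legal values:

```python
from datetime import datetime, timezone

from libera_utils import Manifest, ManifestType

# Assumption: ManifestType has an INPUT member
manifest = Manifest(manifest_type=ManifestType.INPUT)
manifest.add_files(
    "data/packets_001.bin",                 # hypothetical local file
    "s3://my-bucket/data/packets_002.bin",  # hypothetical S3 object
)
manifest.add_desired_time_range(
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 1, 2, tzinfo=timezone.utc),
)
```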
- classmethod check_file_structure(file_structure: ManifestFileRecord, existing_names: set[str], existing_checksums: set[str]) bool#
Check file structure, returning True if it is good.
- classmethod from_file(filepath: str | Path | S3Path)#
Read a manifest file and return a Manifest object (factory method).
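Reading one back, with a hypothetical path and file extension:

```python
from libera_utils import Manifest

manifest = Manifest.from_file("s3://my-bucket/manifests/example_manifest.json")
print(manifest.manifest_type)
print(manifest.files)  # list of ManifestFileRecord entries
```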
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod output_manifest_from_input_manifest(input_manifest: Path | S3Path | Manifest) Manifest#
Create an output manifest from an input manifest file path; adds the input files to the output manifest configuration.
- Parameters:
input_manifest (Union[Path, S3Path, 'Manifest']) – An S3 or regular path to an input_manifest object, or the input manifest object itself
- Returns:
output_manifest – The newly created output manifest
- Return type:
Manifest
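A sketch of the typical processing-pipeline handoff; the input manifest path is hypothetical:

```python
from pathlib import Path

from libera_utils import Manifest

output_manifest = Manifest.output_manifest_from_input_manifest(
    Path("manifests/input_manifest.json")  # hypothetical input manifest path
)
# The input files are recorded in the output manifest's configuration
print(output_manifest.configuration)
```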
- serialize_filename(filename: str | Path | S3Path | ManifestFilename | None, _info) str#
Custom serializer for the manifest filename.
- classmethod transform_filename(raw_filename: str | ManifestFilename | None) ManifestFilename | None#
Convert raw filename to ManifestFilename class if necessary.
- classmethod transform_files(raw_list: list[dict | str | Path | S3Path | ManifestFileRecord] | None) list[ManifestFileRecord]#
Allow for the incoming files list to have varying types. Convert to a standardized list of ManifestFileRecord.
- write(out_path: str | Path | S3Path, filename: str = None) Path | S3Path#
Write a manifest file from a Manifest object (self).
- Parameters:
out_path (Union[str, Path, S3Path]) – Directory path to write to (directory being used loosely to refer also to an S3 bucket path).
filename (str, optional) – Must be a valid manifest filename. If not provided, the method uses the object's internal filename attribute; if that is not set, a filename is automatically generated.
- Returns:
The path where the manifest file is written.
- Return type:
Union[Path, S3Path]
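Writing it out; the bucket name is hypothetical, and the ManifestType member name is assumed as above:

```python
from libera_utils import Manifest, ManifestType

manifest = Manifest(manifest_type=ManifestType.INPUT)  # member name assumed
manifest.add_files("data/packets_001.bin")  # hypothetical file

# With no filename argument and no internal filename set, a valid
# manifest filename is generated automatically
written = manifest.write("s3://my-bucket/manifests/")
print(written)  # S3Path of the written manifest
```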
- class libera_utils.ManifestType(value)#
Enumerated legal manifest type values
Methods
ManifestType values are strings, so the class inherits all standard str methods (capitalize, casefold, center, count, encode, endswith, expandtabs, find, format, format_map, index, isalnum, isalpha, isascii, isdecimal, isdigit, isidentifier, islower, isnumeric, isprintable, isspace, istitle, isupper, join, ljust, lower, lstrip, maketrans, partition, removeprefix, removesuffix, replace, rfind, rindex, rjust, rpartition, rsplit, rstrip, split, splitlines, startswith, strip, swapcase, title, translate, upper, zfill).
- libera_utils.smart_copy_file(source_path: str | Path | S3Path, dest_path: str | Path | S3Path, delete: bool | None = False)#
Copy function that can handle local files or files in an S3 bucket. Returns the path to the newly created file as a Path or an S3Path, depending on the destination.
- Parameters:
source_path (Union[str, Path, S3Path]) – Path to the source file to be copied. Files residing in an s3 bucket must begin with “s3://”.
dest_path (Union[str, Path, S3Path]) – Path to the destination file to be copied to. Files residing in an s3 bucket must begin with “s3://”.
delete (bool, optional) – If True, delete the source file after copying (default False).
- Returns:
The path to the newly created file
- Return type:
Path or S3Path
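A sketch of a local-to-S3 copy; the bucket and key are hypothetical:

```python
from libera_utils import smart_copy_file

new_path = smart_copy_file(
    "data/product.nc",                     # hypothetical local source
    "s3://my-bucket/products/product.nc",  # hypothetical S3 destination
)
print(new_path)  # an S3Path, since the destination is in S3
```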
- libera_utils.smart_open(path: str | Path | S3Path, mode: str | None = 'rb', enable_gzip: bool | None = True)#
Open function that can handle local files or files in an S3 bucket. It also transparently handles gzip files, identified by a *.gz extension.
- Parameters:
path (Union[str, Path, S3Path]) – Path to the file to be opened. Files residing in an s3 bucket must begin with “s3://”.
mode (str, Optional) – Optional string specifying the mode in which the file is opened. Defaults to ‘rb’.
enable_gzip (bool, Optional) – Flag to specify that *.gz files should be opened as a GzipFile object. Setting this to False is useful when creating the md5sum of a *.gz file. Defaults to True.
- Return type:
IO or gzip.GzipFile
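A sketch of reading a gzipped object from S3; the path is hypothetical:

```python
from libera_utils import smart_open

# The .gz extension triggers transparent gzip decompression by default
with smart_open("s3://my-bucket/telemetry/packets.bin.gz") as fh:
    payload = fh.read()

# Pass enable_gzip=False to read the raw bytes, e.g. to compute an md5sum
with smart_open("s3://my-bucket/telemetry/packets.bin.gz", enable_gzip=False) as fh:
    raw = fh.read()
```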
Modules
Module for the Libera SDC utilities CLI
Configuration reader
Constants module used throughout the libera_utils package
db module for DynamoDB operations
Module containing CLI tool for creating SPICE kernels from packets
Logging utilities
Quality flag definitions
Module for mapping radiometer footprints to scene IDs
Modules for SPICE kernel creation, management, and usage
Module for dealing with time and time conventions
Module for anything related to package versioning