libera_utils#
High-level package initialization for libera_utils, exposing common utilities.
- class libera_utils.LiberaDataProductDefinition(*, coordinates: dict[str, LiberaVariableDefinition], variables: dict[str, LiberaVariableDefinition], attributes: dict[str, Any])#
Pydantic model for a Libera data product definition.
Used for validating existing data product Datasets with helper methods for creating valid Datasets and DataArrays.
- data_variables#
A dictionary of variable names and their corresponding LiberaVariableDefinition objects, which contain metadata and data.
- product_metadata#
The metadata associated with the data product, including dynamic metadata and spatio-temporal metadata.
- Type:
ProductMetadata | None
- Attributes:
dynamic_attributes – Return product-level attributes that are dynamically defined (null values) in the data product definition.
model_extra – Get extra fields set during validation.
model_fields_set – Returns the set of fields that have been explicitly set on this model instance.
static_attributes – Return product-level attributes that are statically defined (have values) in the data product definition.
Methods
check_dataset_conformance(dataset[, strict]) – Check the conformance of a Dataset object against a DataProductDefinition.
copy(*[, include, exclude, update, deep]) – Returns a copy of the model.
create_conforming_dataset(data[, ...]) – Create a Dataset from numpy arrays that is valid against the data product definition.
enforce_dataset_conformance(dataset) – Analyze and update a Dataset to conform to the expectations of the DataProductDefinition.
from_yaml(product_definition_filepath) – Create a DataProductDefinition from a Libera data product definition YAML file.
generate_data_product_filename(dataset, ...) – Generate a standardized Libera data product filename.
model_construct([_fields_set]) – Creates a new instance of the Model class with validated data.
model_copy(*[, update, deep]) – Returns a copy of the model.
model_dump(*[, mode, include, exclude, ...]) – Generate a dictionary representation of the model.
model_dump_json(*[, indent, ensure_ascii, ...]) – Generate a JSON representation of the model.
model_json_schema([by_alias, ref_template, ...]) – Generates a JSON schema for a model class.
model_parametrized_name(params) – Compute the class name for parametrizations of generic classes.
model_post_init(context, /) – Override this method to perform additional initialization after __init__ and model_construct.
model_rebuild(*[, force, raise_errors, ...]) – Try to rebuild the pydantic-core schema for the model.
model_validate(obj, *[, strict, extra, ...]) – Validate a pydantic model instance.
model_validate_json(json_data, *[, strict, ...]) – Validate the given JSON data against the Pydantic model.
model_validate_strings(obj, *[, strict, ...]) – Validate the given object with string data against the Pydantic model.
Deprecated Pydantic v1 methods (use the model_* equivalents above): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate.
- _check_dataset_attrs(dataset_attrs: dict[str, Any]) list[str]#
Validate the product-level attributes of a Dataset against the product definition.
Static attributes must match exactly. Some special attributes have their values checked for validity.
- static _get_static_project_attributes(file_path=None)#
Loads project-wide consistent product-level attribute metadata from a YAML file.
These global attributes are expected on every Libera data product so we store them in a global config.
- Parameters:
file_path (Path) – The path to the global attribute metadata YAML file.
- Returns:
Dictionary of key-value pairs for static product attributes.
- Return type:
dict
- classmethod _set_attributes(raw_attributes: dict[str, Any]) dict[str, Any]#
Validates product-level attributes and adds requirements for globally consistent attributes.
Any attributes defined with null values are treated as required dynamic attributes that must be set either by the user’s data product definition or dynamically on the Dataset before writing.
- check_dataset_conformance(dataset: Dataset, strict: bool = True) list[str]#
Check the conformance of a Dataset object against a DataProductDefinition
This method is responsible only for finding errors, not fixing them.
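A minimal sketch of a conformance check. The YAML path and variable names are hypothetical, and strict=False is assumed to collect error messages rather than raise, consistent with the strict kwarg documented for create_conforming_dataset:

```python
import numpy as np
import xarray as xr

from libera_utils import LiberaDataProductDefinition

# Hypothetical definition file; real paths depend on your product
definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")

ds = xr.Dataset({"radiance": ("time", np.zeros(10))})  # placeholder dataset

# strict=False returns the list of error messages instead of raising
errors = definition.check_dataset_conformance(ds, strict=False)
print(errors)  # an empty list means the dataset conforms
```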
- create_conforming_dataset(data: dict[str, ndarray], user_product_attributes: dict[str, Any] | None = None, user_variable_attributes: dict[str, dict[str, Any]] | None = None, strict: bool = True) tuple[Dataset, list[str]]#
Create a Dataset from numpy arrays that is valid against the data product definition
- Parameters:
data (dict[str, np.ndarray]) – Dictionary of variable/coordinate data keyed by variable/coordinate name.
user_product_attributes (dict[str, Any] | None) – Product-level attributes for the data product. Allows the user to specify product-level attributes that are required but not statically specified in the product definition (e.g. the algorithm version used to generate the product). Algorithm developers should not normally need this kwarg.
user_variable_attributes (dict[str, dict[str, Any]] | None) – Per-variable attributes for each variable's DataArray, keyed by variable name with an attributes dict as the value. Allows the user to specify variable-level attributes that are required but not statically defined in the product definition. Algorithm developers should not normally need this kwarg.
strict (bool) – If True (default), raise an exception on nonconformance.
- Returns:
Tuple of (Dataset, error_messages) where error_messages contains any validation problems. Empty list if the dataset is fully valid.
- Return type:
tuple[Dataset, list[str]]
Notes
We make no distinction between coordinate and data variable input data and assume that we can determine which is which based on the coordinate/variable names in the product definition.
This method is not responsible for primary validation or error reporting. We call out to check_dataset_conformance at the end for that.
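A sketch of building a dataset from raw arrays. The coordinate and variable names ("time", "radiance") and the "algorithm_version" attribute are hypothetical; real names come from your product definition YAML:

```python
import numpy as np

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")  # hypothetical path

dataset, errors = definition.create_conforming_dataset(
    data={
        "time": np.arange(10),           # hypothetical coordinate
        "radiance": np.random.rand(10),  # hypothetical data variable
    },
    # Hypothetical dynamic attribute the definition leaves null
    user_product_attributes={"algorithm_version": "1.2.3"},
    strict=False,  # return error messages rather than raising
)
```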
- property dynamic_attributes#
Return product-level attributes that are dynamically defined (null values) in the data product definition.
These attributes are required but are expected to be defined outside the data product definition.
- enforce_dataset_conformance(dataset: Dataset) tuple[Dataset, list[str]]#
Analyze and update a Dataset to conform to the expectations of the DataProductDefinition
This method is for modifying an existing xarray Dataset. If you are creating a Dataset from scratch with numpy arrays, consider using create_conforming_dataset instead.
- Parameters:
dataset (Dataset) – Possibly non-compliant dataset
- Returns:
Tuple of (updated Dataset, error_messages) where error_messages contains any problems that could not be fixed. Empty list if all problems were fixed.
- Return type:
tuple[Dataset, list[str]]
Notes
This method is responsible for trying (and possibly failing) to coerce a Dataset into a valid form with attributes and encodings. We use check_dataset_conformance to check for validation errors.
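A sketch of coercing an existing, possibly non-compliant Dataset, under the same hypothetical definition file as above:

```python
import numpy as np
import xarray as xr

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml("my_product.yaml")  # hypothetical path
ds = xr.Dataset({"radiance": ("time", np.zeros(10))})  # possibly non-compliant dataset

updated, remaining = definition.enforce_dataset_conformance(ds)
if remaining:
    # Problems the method could not fix automatically
    print("\n".join(remaining))
```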
- classmethod from_yaml(product_definition_filepath: str | CloudPath | Path)#
Create a DataProductDefinition from a Libera data product definition YAML file.
- Parameters:
product_definition_filepath (str | CloudPath | Path) – Path to YAML file with product and variable definitions
- Returns:
Configured instance with loaded metadata and optional data
- Return type:
DataProductDefinition
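Loading a definition is a one-liner; the filepath here is hypothetical:

```python
from pathlib import Path

from libera_utils import LiberaDataProductDefinition

definition = LiberaDataProductDefinition.from_yaml(
    Path("definitions/libera_l1b_radiance.yaml")  # hypothetical definition file
)
print(definition.static_attributes)   # attributes with fixed values
print(definition.dynamic_attributes)  # required attributes still to be set
```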
- generate_data_product_filename(dataset: Dataset, time_variable: str) LiberaDataProductFilename#
Generate a standardized Libera data product filename.
- Parameters:
dataset (Dataset) – The Dataset for which to create a filename. Used to extract algorithm version and start and end times.
time_variable (str) – Name of the time dimension to use for determining the start and end time.
- Returns:
Properly formatted filename object
- Return type:
LiberaDataProductFilename
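A sketch continuing the create_conforming_dataset example above; the time variable name "time" is hypothetical, and the str conversion of the filename object is an assumption:

```python
# `definition` and `dataset` are from the create_conforming_dataset sketch above
filename = definition.generate_data_product_filename(dataset, time_variable="time")
dataset.to_netcdf(str(filename))  # assumes the filename object converts cleanly to str
```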
- model_config: ClassVar[ConfigDict] = {'frozen': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- property static_attributes#
Return product-level attributes that are statically defined (have values) in the data product definition
- class libera_utils.Manifest(*, manifest_type: ManifestType, files: list[ManifestFileRecord] = <factory>, configuration: dict[str, Any] = <factory>, filename: ManifestFilename | None = None, ulid_code: ULID | None = <factory>)#
Pydantic model for a manifest file.
- Attributes:
model_extra – Get extra fields set during validation.
model_fields_set – Returns the set of fields that have been explicitly set on this model instance.
Methods
add_desired_time_range(start_datetime, ...) – Add a time range to the configuration section of the manifest.
add_files(*files) – Add files to the manifest from filenames.
check_file_structure(file_structure, ...) – Check file structure, returning True if it is good.
copy(*[, include, exclude, update, deep]) – Returns a copy of the model.
from_file(filepath) – Read a manifest file and return a Manifest object (factory method).
model_construct([_fields_set]) – Creates a new instance of the Model class with validated data.
model_copy(*[, update, deep]) – Returns a copy of the model.
model_dump(*[, mode, include, exclude, ...]) – Generate a dictionary representation of the model.
model_dump_json(*[, indent, ensure_ascii, ...]) – Generate a JSON representation of the model.
model_json_schema([by_alias, ref_template, ...]) – Generates a JSON schema for a model class.
model_parametrized_name(params) – Compute the class name for parametrizations of generic classes.
model_post_init(context, /) – Override this method to perform additional initialization after __init__ and model_construct.
model_rebuild(*[, force, raise_errors, ...]) – Try to rebuild the pydantic-core schema for the model.
model_validate(obj, *[, strict, extra, ...]) – Validate a pydantic model instance.
model_validate_json(json_data, *[, strict, ...]) – Validate the given JSON data against the Pydantic model.
model_validate_strings(obj, *[, strict, ...]) – Validate the given object with string data against the Pydantic model.
output_manifest_from_input_manifest(input_manifest) – Create an output manifest from an input manifest file path; adds the input files to the output manifest configuration.
serialize_filename(filename, _info) – Custom serializer for the manifest filename.
transform_filename(raw_filename) – Convert raw filename to ManifestFilename class if necessary.
transform_files(raw_list) – Allow for the incoming files list to have varying types.
Validate checksums of listed files.
write(out_path[, filename]) – Write a manifest file from a Manifest object (self).
Deprecated Pydantic v1 methods (use the model_* equivalents above): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate.
- _generate_filename() ManifestFilename#
Generate a valid manifest filename.
- add_desired_time_range(start_datetime: datetime, end_datetime: datetime)#
Add a time range to the configuration section of the manifest.
- Parameters:
start_datetime (datetime.datetime) – The desired start time for the range of data in this manifest
end_datetime (datetime.datetime) – The desired end time for the range of data in this manifest
- Return type:
None
- add_files(*files: str | Path | S3Path)#
Add files to the manifest from filenames.
- Parameters:
files (Union[str, Path, S3Path]) – Paths to the files to add to the manifest.
- Return type:
None
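A sketch of assembling a manifest. The ManifestType member name (INPUT) and the file paths are assumptions; check the enum for the legal values:

```python
from datetime import datetime, timezone

from libera_utils import Manifest, ManifestType

# Assumption: ManifestType has an INPUT member
manifest = Manifest(manifest_type=ManifestType.INPUT)
manifest.add_files(
    "data/packets_001.bin",                 # hypothetical local file
    "s3://my-bucket/data/packets_002.bin",  # hypothetical S3 object
)
manifest.add_desired_time_range(
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 1, 2, tzinfo=timezone.utc),
)
```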
- classmethod check_file_structure(file_structure: ManifestFileRecord, existing_names: set[str], existing_checksums: set[str]) bool#
Check file structure, returning True if it is good.
- classmethod from_file(filepath: str | Path | S3Path)#
Read a manifest file and return a Manifest object (factory method).
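Reading one back, with a hypothetical path and file extension:

```python
from libera_utils import Manifest

manifest = Manifest.from_file("s3://my-bucket/manifests/example_manifest.json")
print(manifest.manifest_type)
print(manifest.files)  # list of ManifestFileRecord entries
```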
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- classmethod output_manifest_from_input_manifest(input_manifest: Path | S3Path | Manifest) Manifest#
Create an output manifest from an input manifest file path; adds the input files to the output manifest configuration.
- Parameters:
input_manifest (Union[Path, S3Path, 'Manifest']) – An S3 or regular path to an input_manifest object, or the input manifest object itself
- Returns:
output_manifest – The newly created output manifest
- Return type:
Manifest
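A sketch of the typical processing-pipeline handoff; the input manifest path is hypothetical:

```python
from pathlib import Path

from libera_utils import Manifest

output_manifest = Manifest.output_manifest_from_input_manifest(
    Path("manifests/input_manifest.json")  # hypothetical input manifest path
)
# The input files are recorded in the output manifest's configuration
print(output_manifest.configuration)
```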
- serialize_filename(filename: str | Path | S3Path | ManifestFilename | None, _info) str#
Custom serializer for the manifest filename.
- classmethod transform_filename(raw_filename: str | ManifestFilename | None) ManifestFilename | None#
Convert raw filename to ManifestFilename class if necessary.
- classmethod transform_files(raw_list: list[dict | str | Path | S3Path | ManifestFileRecord] | None) list[ManifestFileRecord]#
Allow for the incoming files list to have varying types. Convert to a standardized list of ManifestFileRecord.
- write(out_path: str | Path | S3Path, filename: str = None) Path | S3Path#
Write a manifest file from a Manifest object (self).
- Parameters:
out_path (Union[str, Path, S3Path]) – Directory path to write to (directory being used loosely to refer also to an S3 bucket path).
filename (str, optional) – Must be a valid manifest filename. If not provided, the method uses the object's internal filename attribute; if that is not set, a filename is automatically generated.
- Returns:
The path where the manifest file is written.
- Return type:
Union[Path, S3Path]
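Writing it out; the bucket name is hypothetical, and the ManifestType member name is assumed as above:

```python
from libera_utils import Manifest, ManifestType

manifest = Manifest(manifest_type=ManifestType.INPUT)  # member name assumed
manifest.add_files("data/packets_001.bin")  # hypothetical file

# With no filename argument and no internal filename set, a valid
# manifest filename is generated automatically
written = manifest.write("s3://my-bucket/manifests/")
print(written)  # S3Path of the written manifest
```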
- class libera_utils.ManifestType(value)#
Enumerated legal manifest type values
Methods
ManifestType values are strings, so the class inherits all standard str methods (capitalize, casefold, center, count, encode, endswith, expandtabs, find, format, format_map, index, isalnum, isalpha, isascii, isdecimal, isdigit, isidentifier, islower, isnumeric, isprintable, isspace, istitle, isupper, join, ljust, lower, lstrip, maketrans, partition, removeprefix, removesuffix, replace, rfind, rindex, rjust, rpartition, rsplit, rstrip, split, splitlines, startswith, strip, swapcase, title, translate, upper, zfill).
- libera_utils.smart_copy_file(source_path: str | Path | S3Path, dest_path: str | Path | S3Path, delete: bool | None = False)#
Copy function that can handle local files or files in an S3 bucket. Returns the path to the newly created file as a Path or an S3Path, depending on the destination.
- Parameters:
source_path (Union[str, Path, S3Path]) – Path to the source file to be copied. Files residing in an s3 bucket must begin with “s3://”.
dest_path (Union[str, Path, S3Path]) – Path to the destination file to be copied to. Files residing in an s3 bucket must begin with “s3://”.
delete (bool, optional) – If True, delete the source file after copying (default False).
- Returns:
The path to the newly created file
- Return type:
Path or S3Path
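A sketch of a local-to-S3 copy; the bucket and key are hypothetical:

```python
from libera_utils import smart_copy_file

new_path = smart_copy_file(
    "data/product.nc",                     # hypothetical local source
    "s3://my-bucket/products/product.nc",  # hypothetical S3 destination
)
print(new_path)  # an S3Path, since the destination is in S3
```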
- libera_utils.smart_open(path: str | Path | S3Path, mode: str | None = 'rb', enable_gzip: bool | None = True)#
Open function that can handle local files or files in an S3 bucket. It also transparently handles gzip files, identified by a *.gz extension.
- Parameters:
path (Union[str, Path, S3Path]) – Path to the file to be opened. Files residing in an s3 bucket must begin with “s3://”.
mode (str, Optional) – Optional string specifying the mode in which the file is opened. Defaults to ‘rb’.
enable_gzip (bool, Optional) – Flag to specify that *.gz files should be opened as a GzipFile object. Setting this to False is useful when creating the md5sum of a *.gz file. Defaults to True.
- Return type:
IO or gzip.GzipFile
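A sketch of reading a gzipped object from S3; the path is hypothetical:

```python
from libera_utils import smart_open

# The .gz extension triggers transparent gzip decompression by default
with smart_open("s3://my-bucket/telemetry/packets.bin.gz") as fh:
    payload = fh.read()

# Pass enable_gzip=False to read the raw bytes, e.g. to compute an md5sum
with smart_open("s3://my-bucket/telemetry/packets.bin.gz", enable_gzip=False) as fh:
    raw = fh.read()
```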
Modules
Module for the Libera SDC utilities CLI
Configuration reader
Constants module used throughout the libera_utils package
db module for DynamoDB operations
Module containing CLI tool for creating SPICE kernels from packets
Logging utilities
Quality flag definitions
Module for mapping radiometer footprints to scene IDs
Modules for SPICE kernel creation, management, and usage
Module for dealing with time and time conventions
Module for anything related to package versioning