libera_utils.io.product_definition#

Data Product configuration and writing for Libera NetCDF4 data product files

Classes

LiberaDataProductDefinition(*, coordinates, ...)

Pydantic model for a Libera data product definition.

LiberaDimensionDefinition(*[, size])

Pydantic model describing rules for a Libera dimension definition.

LiberaVariableDefinition(*, dtype, ...)

Pydantic model for a Libera variable definition.

class libera_utils.io.product_definition.LiberaDataProductDefinition(*, coordinates: dict[str, LiberaVariableDefinition], variables: dict[str, LiberaVariableDefinition], attributes: dict[str, Any])#

Pydantic model for a Libera data product definition.

Used for validating existing data product Datasets with helper methods for creating valid Datasets and DataArrays.

data_variables#

A dictionary of variable names and their corresponding LiberaVariable objects, which contain metadata and data.

Type:

dict[str, LiberaVariable]

product_metadata#

The metadata associated with the data product, including dynamic metadata and spatio-temporal metadata.

Type:

ProductMetadata | None

Attributes:
dynamic_attributes

Return product-level attributes that are dynamically defined (null values) in the data product definition.

model_extra

Get extra fields set during validation.

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

static_attributes

Return product-level attributes that are statically defined (have values) in the data product definition.

Methods

check_dataset_conformance(dataset[, strict])

Check the conformance of a Dataset object against a data product definition.

copy(*[, include, exclude, update, deep])

Returns a copy of the model.

create_product_dataset(data[, ...])

Create a product Dataset from numpy arrays.

enforce_dataset_conformance(dataset)

Analyze and update a Dataset to conform to the expectations of the DataProductDefinition.

from_yaml(product_definition_filepath)

Create a DataProductDefinition from a Libera data product definition YAML file.

generate_data_product_filename(dataset, ...)

Generate a standardized Libera data product filename.

model_construct([_fields_set])

Creates a new instance of the Model class with validated data.

model_copy(*[, update, deep])

Returns a copy of the model.

model_dump(*[, mode, include, exclude, ...])

Generate a dictionary representation of the model.

model_dump_json(*[, indent, ensure_ascii, ...])

Generates a JSON representation of the model.

model_json_schema(by_alias, ref_template, ...)

Generates a JSON schema for a model class.

model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct.

model_rebuild(*[, force, raise_errors, ...])

Try to rebuild the pydantic-core schema for the model.

model_validate(obj, *[, strict, extra, ...])

Validate a pydantic model instance.

model_validate_json(json_data, *[, strict, ...])

Validate the given JSON data against the Pydantic model.

model_validate_strings(obj, *[, strict, ...])

Validate the given object with string data against the Pydantic model.

Deprecated Pydantic v1 methods (superseded by the model_* equivalents in Pydantic v2): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate

_check_dataset_attrs(dataset_attrs: dict[str, Any]) list[str]#

Validate the product-level attributes of a Dataset against the product definition.

Static attributes must match exactly. Some special attributes have their values checked for validity.

Parameters:

dataset_attrs (dict[str, Any]) – Dataset attributes to validate

Returns:

List of error messages describing problems found. Empty list if no problems.

Return type:

list[str]

static _get_static_project_attributes(file_path=None) dict[str, Any]#

Loads project-wide consistent product-level attribute metadata from a YAML file.

These global attributes are expected on every Libera data product, so we store them in a global config.

Parameters:

file_path (Path) – The path to the global attribute metadata YAML file.

Returns:

Dictionary of key-value pairs for static product attributes.

Return type:

dict[str, Any]

classmethod _set_attributes(raw_attributes: dict[str, Any]) dict[str, Any]#

Validates product level attributes and adds requirements for globally consistent attributes.

Any attributes defined with null values are treated as required dynamic attributes that must be set either by the user’s data product definition or dynamically on the Dataset before writing.

Parameters:

raw_attributes (dict[str, Any]) – The attributes specification in the product definition.

Returns:

The validated attributes dictionary, including standard defaults that we always require.

Return type:

dict[str, Any]

check_dataset_conformance(dataset: Dataset, strict: bool = True) list[str]#

Check the conformance of a Dataset object against a data product definition.

This method is responsible only for finding errors, not fixing them. It warns on every violation and logs all errors it finds at the end. If strict is True, it raises an exception if any errors are found. If strict is False, it just returns the list of error messages.

Parameters:
  • dataset (Dataset) – Dataset object to validate against expectations in the product configuration

  • strict (bool) – Default True. Raises an exception for nonconformance.

Returns:

List of error messages describing problems found. Empty list if no problems.

Return type:

list[str]
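The strict/non-strict contract described above can be sketched in a few lines of stand-in code. Nothing here is the library's implementation; `check_conformance_sketch` and its error message are invented for illustration.

```python
import warnings


def check_conformance_sketch(errors: list[str], strict: bool = True) -> list[str]:
    """Stand-in for the contract: warn on every violation, raise only if strict."""
    for msg in errors:
        warnings.warn(msg)  # every violation produces a warning
    if strict and errors:
        raise ValueError(f"{len(errors)} conformance error(s) found")
    return errors  # non-strict mode just reports the problems


# Non-strict mode returns the messages instead of raising
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    problems = check_conformance_sketch(["missing attribute 'units'"], strict=False)
```

With `strict=True` (the default), the same call would raise instead of returning the list.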

create_product_dataset(data: dict[str, ndarray], dynamic_product_attributes: dict[str, Any] | None = None, dynamic_variable_attributes: dict[str, dict[str, Any]] | None = None) Dataset#

Create a product Dataset from numpy arrays.

This method creates a Dataset from numpy arrays, setting attributes and encodings according to the product definition. This does not guarantee a fully conformant Dataset. To bring the Dataset into conformance, use enforce_dataset_conformance on the resulting Dataset and check the result with check_dataset_conformance.

Parameters:
  • data (dict[str, np.ndarray]) – Dictionary of variable/coordinate data keyed by variable/coordinate name.

  • dynamic_product_attributes (dict[str, Any] | None) – Algorithm developers should not need to use this kwarg. Product level attributes for the data product. This allows the user to specify product level attributes that are required but not statically specified in the product definition (e.g. the algorithm version used to generate the product)

  • dynamic_variable_attributes (dict[str, dict[str, Any]] | None) – Algorithm developers should not need to use this kwarg. Per-variable attributes for each variable’s DataArray. Key is variable name, value is an attributes dict. This allows the user to specify variable level attributes that are required but not statically defined in the product definition.

Returns:

The created Dataset. This Dataset is not guaranteed to be conformant and should be checked with check_dataset_conformance.

Return type:

Dataset

Notes

  • We make no distinction between coordinate and data variable input data and determine which is which based on coordinate/variable sections in the product definition.

  • This method is not responsible for primary validation or error reporting. The caller is responsible for checking the result with check_dataset_conformance and fixing any errors that arise.

property dynamic_attributes#

Return product-level attributes that are dynamically defined (null values) in the data product definition.

These attributes are _required_ but are expected to be defined externally to the data product definition.
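The static/dynamic split is driven entirely by null values in the definition: an attribute with a value is static, one defined as null is dynamic (required, supplied later). A minimal sketch with invented attribute names:

```python
# Hypothetical product-level attributes as parsed from a definition file.
# "algorithm_version" is dynamic because its value is None (null in YAML).
raw_attributes = {
    "project": "Libera",        # static: value given in the definition
    "institution": "LASP",      # static (invented for this example)
    "algorithm_version": None,  # dynamic: must be provided at creation time
}

static_attributes = {k: v for k, v in raw_attributes.items() if v is not None}
dynamic_attributes = {k: v for k, v in raw_attributes.items() if v is None}
```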

enforce_dataset_conformance(dataset: Dataset) Dataset#

Analyze and update a Dataset to conform to the expectations of the DataProductDefinition.

This method attempts to bring a Dataset into conformance with a product definition, including enforcing conformance of variable DataArrays. When making changes, the data product definition takes precedence over any existing metadata or settings on the Dataset. Logs are emitted for all changes made. When the Dataset configuration contradicts the data product definition, warnings are also issued. This method is not responsible for validating the final result and does not guarantee that the resulting Dataset will pass the validation checks because some problems simply can’t be fixed.

Parameters:

dataset (Dataset) – Possibly non-compliant dataset

Returns:

The updated Dataset. This Dataset is not guaranteed to be fully conformant and should be checked with check_dataset_conformance to verify.

Return type:

Dataset

classmethod from_yaml(product_definition_filepath: str | CloudPath | Path)#

Create a DataProductDefinition from a Libera data product definition YAML file.

Parameters:

product_definition_filepath (str | CloudPath | Path) – Path to YAML file with product and variable definitions

Returns:

Configured instance with loaded metadata and optional data

Return type:

DataProductDefinition
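As a sketch of what from_yaml consumes, here is a hypothetical product definition YAML. The top-level keys follow the model fields documented on this page (coordinates, variables, attributes), but every variable name, attribute name, and value below is illustrative, not the actual Libera schema:

```yaml
# Hypothetical Libera product definition (illustrative, not a real product)
coordinates:
  time:
    dtype: float64
    dimensions: [time]
    attributes:
      long_name: Observation time
      units: seconds since 2000-01-01T00:00:00
variables:
  radiance:
    dtype: float32
    dimensions: [time]
    attributes:
      long_name: Measured radiance
      units: W m-2 sr-1
      valid_min: null        # dynamic: must be set before writing
attributes:
  title: Example Libera product   # static product-level attribute
  algorithm_version: null         # dynamic product-level attribute
```

After loading, the workflow described above is create_product_dataset, then enforce_dataset_conformance, then check_dataset_conformance on the result.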

generate_data_product_filename(dataset: Dataset, time_variable: str) LiberaDataProductFilename#

Generate a standardized Libera data product filename.

Parameters:
  • dataset (Dataset) – The Dataset for which to create a filename. Used to extract algorithm version and start and end times.

  • time_variable (str) – Name of the time dimension to use for determining the start and end time.

Returns:

Properly formatted filename object

Return type:

LiberaDataProductFilename
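The start and end times embedded in the filename come from the named time variable on the Dataset. A stdlib-only sketch of that derivation; the filename pattern used here is invented for illustration and is not the actual Libera naming convention:

```python
from datetime import datetime, timezone

# Stand-in for a Dataset's time variable values
time_values = [
    datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 1, 0, 1, tzinfo=timezone.utc),
    datetime(2024, 3, 1, 0, 2, tzinfo=timezone.utc),
]

# Start/end times bracket the data; version string is a placeholder
start, end = min(time_values), max(time_values)
filename = f"LIBERA_L1B_RAD_{start:%Y%m%dT%H%M%S}_{end:%Y%m%dT%H%M%S}_V01-00.nc"
```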

model_config = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property static_attributes#

Return product-level attributes that are statically defined (have values) in the data product definition.

class libera_utils.io.product_definition.LiberaDimensionDefinition(*, size: int | None = None, long_name: str)#

Pydantic model describing rules for a Libera dimension definition.

size#

The size of the dimension. If None, the dimension is dynamic.

Type:

int | None

long_name#

A descriptive human-readable name for the dimension.

Type:

str

Attributes:
model_extra

Get extra fields set during validation.

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

Methods

copy(*[, include, exclude, update, deep])

Returns a copy of the model.

model_construct([_fields_set])

Creates a new instance of the Model class with validated data.

model_copy(*[, update, deep])

Returns a copy of the model.

model_dump(*[, mode, include, exclude, ...])

Generate a dictionary representation of the model.

model_dump_json(*[, indent, ensure_ascii, ...])

Generates a JSON representation of the model.

model_json_schema(by_alias, ref_template, ...)

Generates a JSON schema for a model class.

model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct.

model_rebuild(*[, force, raise_errors, ...])

Try to rebuild the pydantic-core schema for the model.

model_validate(obj, *[, strict, extra, ...])

Validate a pydantic model instance.

model_validate_json(json_data, *[, strict, ...])

Validate the given JSON data against the Pydantic model.

model_validate_strings(obj, *[, strict, ...])

Validate the given object with string data against the Pydantic model.

Deprecated Pydantic v1 methods (superseded by the model_* equivalents in Pydantic v2): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate

model_config = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class libera_utils.io.product_definition.LiberaVariableDefinition(*, dtype: str, attributes: dict[str, Any] = {}, dimensions: list[str] = [], encoding: dict = <factory>)#

Pydantic model for a Libera variable definition.

This model is the same for both data variables and coordinate variables.

dtype#

The data type of the variable’s data array, specified as a string.

Type:

str

attributes#

The attribute metadata for the variable, containing specific key-value pairs for CF metadata compliance.

Type:

VariableAttributes

dimensions#

A list of dimension names that the variable’s data array references.

Type:

list[str]

encoding#

A dictionary specifying how the variable’s data should be encoded when written to a NetCDF file.

Type:

dict

Attributes:
dynamic_attributes

Variable-level attributes defined with null values in the product definition YAML.

model_extra

Get extra fields set during validation.

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

static_attributes

Variable-level attributes defined with non-null values in the product definition YAML.

Methods

check_data_array_conformance(data_array, ...)

Check the conformance of a DataArray object against a data variable definition.

copy(*[, include, exclude, update, deep])

Returns a copy of the model.

create_variable_data_array(data, variable_name)

Create a DataArray for a single variable from a numpy array.

enforce_data_array_conformance(data_array, ...)

Update a variable or coordinate DataArray to conform to specifications in data product definition.

model_construct([_fields_set])

Creates a new instance of the Model class with validated data.

model_copy(*[, update, deep])

Returns a copy of the model.

model_dump(*[, mode, include, exclude, ...])

Generate a dictionary representation of the model.

model_dump_json(*[, indent, ensure_ascii, ...])

Generates a JSON representation of the model.

model_json_schema(by_alias, ref_template, ...)

Generates a JSON schema for a model class.

model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct.

model_rebuild(*[, force, raise_errors, ...])

Try to rebuild the pydantic-core schema for the model.

model_validate(obj, *[, strict, extra, ...])

Validate a pydantic model instance.

model_validate_json(json_data, *[, strict, ...])

Validate the given JSON data against the Pydantic model.

model_validate_strings(obj, *[, strict, ...])

Validate the given object with string data against the Pydantic model.

Deprecated Pydantic v1 methods (superseded by the model_* equivalents in Pydantic v2): construct, dict, from_orm, json, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate

classmethod _check_allowed_dimensions(raw_dimensions: list[str]) list[str]#

Validates that all dimensions used in coordinates and variables are defined in the global standard dimensions.

This is just an early preliminary check and does not check anything related to data. Verification of dimension size is done when checking conformance once data is provided.

Parameters:

raw_dimensions (list[str]) – The raw dimensions specification in the product definition.

Returns:

The validated dimensions list, unchanged.

Return type:

list[str]

_check_data_array_attributes(data_array_attrs: dict[str, Any], variable_name: str) list[str]#

Validate the variable-level attributes of a DataArray against the product definition.

All attributes must have values. Static attributes defined in the product definition must match exactly. Dynamic attributes defined in the product definition may have any value but must be present.

Parameters:
  • data_array_attrs (dict[str, Any]) – DataArray attributes to validate

  • variable_name (str) – Name of the variable being checked (for error messages)

Returns:

List of error messages describing problems found. Empty list if no problems.

Return type:

list[str]
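The rule stated above (static attributes must match exactly; dynamic attributes may hold any value but must be present) can be sketched with plain dictionaries. The function and attribute names below are invented for the example, not the library's internals:

```python
def check_attrs_sketch(defined: dict, actual: dict, variable_name: str) -> list[str]:
    """Stand-in for the check: None in `defined` marks a dynamic attribute."""
    errors = []
    for name, expected in defined.items():
        if name not in actual:
            # Dynamic and static attributes alike must be present
            errors.append(f"{variable_name}: missing required attribute '{name}'")
        elif expected is not None and actual[name] != expected:
            # Static attributes must match the definition exactly
            errors.append(
                f"{variable_name}: attribute '{name}' is {actual[name]!r}, expected {expected!r}"
            )
    return errors


definition = {"units": "K", "valid_min": None}  # valid_min is dynamic
errors = check_attrs_sketch(definition, {"units": "K"}, "temperature")
# 'valid_min' is flagged even though it is dynamic: it must still be present
```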

static _get_standard_dimensions(file_path: PathLike | None = None) dict[str, LiberaDimensionDefinition]#

Loads standard dimension metadata from a YAML file.

These standard dimensions are expected to be used in every Libera data product, so we store them in a global config.

Parameters:

file_path (PathLike | None) – The path to the standard dimension metadata YAML file.

Returns:

Dictionary of dimension name to LiberaDimensionDefinition instances.

Return type:

dict[str, LiberaDimensionDefinition]

classmethod _set_encoding(encoding: dict | None)#

Merge configured encoding with required defaults, issuing warnings on conflicts.

classmethod _validate_dtype(dtype: str) str#

Validates that the dtype specified in the product definition is a valid numpy dtype string.

Parameters:

dtype (str) – The raw dtype specification in the product definition.

Returns:

The validated dtype string, unchanged.

Return type:

str
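A dtype string can be checked against numpy directly: np.dtype() raises TypeError for strings it does not understand. This is a sketch under the assumption that the validator works roughly this way; the function name is invented:

```python
import numpy as np


def validate_dtype_sketch(dtype: str) -> str:
    """Return the dtype string unchanged if numpy accepts it, else raise."""
    try:
        np.dtype(dtype)  # raises TypeError for invalid dtype strings
    except TypeError as err:
        raise ValueError(f"Invalid numpy dtype string: {dtype!r}") from err
    return dtype
```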

check_data_array_conformance(data_array: DataArray, variable_name: str) list[str]#

Check the conformance of a DataArray object against a data variable definition.

This method is responsible only for finding errors, not fixing them. It warns on every violation and returns a list of error messages.

Notes

This does not verify that all required coordinate data exists on the DataArray. Dimensions lacking coordinates are treated as index dimensions. If coordinate data is later added to a Dataset under a dimension of the same name, the dimension will reference that coordinate data.

Parameters:
  • data_array (DataArray) – The data array to validate with this variable’s metadata configuration.

  • variable_name (str) – Name of the variable being checked (for error messages)

Returns:

List of error messages describing problems found. Empty list if no problems.

Return type:

list[str]

create_variable_data_array(data: ndarray, variable_name: str, dynamic_variable_attributes: dict[str, Any] | None = None) DataArray#

Create a DataArray for a single variable from a numpy array.

Sets encoding and attributes from product definition, adding dynamic attributes if provided.

Coordinate data is not required. Dimensions without associated coordinate data are created as index dimensions. If coordinate data is added later (e.g. to a Dataset), these dimensions will reference the coordinates whose names match the dimension names.

Parameters:
  • data (np.ndarray) – Data for the variable DataArray.

  • variable_name (str) – Name of the variable. Used for log messages and warnings.

  • dynamic_variable_attributes (dict[str, Any] | None) – Algorithm developers should not need to use this kwarg. Variable level attributes defined by the user. This allows a user to specify dynamic attributes that may be required by the definition but not statically defined in yaml.

Returns:

A minimal DataArray for the specified variable. This DataArray may not be fully conformant to the product definition. To bring it into conformance, use enforce_dataset_conformance on a Dataset containing this DataArray.

Return type:

DataArray

property dynamic_attributes: dict#

Variable-level attributes defined with null values in the product definition YAML.

These attributes are _required_ but are expected to be passed explicitly during data product creation.

Returns:

Dictionary of dynamic variable-level attributes with null values that must be set during product creation.

Return type:

dict

enforce_data_array_conformance(data_array: DataArray, variable_name: str) DataArray#

Update a variable or coordinate DataArray to conform to specifications in data product definition.

This method attempts to bring a DataArray into conformance with a variable definition. When making changes, the data variable definition takes precedence over any existing metadata or settings on the DataArray. Logs are emitted for all changes made. When the DataArray configuration contradicts the data product definition, warnings are also issued. This method is not responsible for validating the final result and does not guarantee that the resulting DataArray will pass the validation checks because some problems simply can’t be fixed.

Parameters:
  • data_array (DataArray) – The variable data array to analyze and update

  • variable_name (str) – Name of the variable being enforced (for logging)

Returns:

The updated DataArray. This DataArray is not guaranteed to be fully conformant and should be checked with check_data_array_conformance after enforcement to verify.

Return type:

DataArray

Warns:

UserWarning – If any conflicts are found between the DataArray and the product definition attributes or encoding settings.

Raises:

ValueError – Raised for problems that can’t be fixed.

model_config = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property static_attributes: dict#

Variable-level attributes defined with non-null values in the product definition YAML.

Returns:

Dictionary of static variable-level attributes with their defined values.

Return type:

dict