libera_utils.scene_id.FootprintData#

class libera_utils.scene_id.FootprintData(data: Dataset)#

Bases: object

Container for footprint data with scene identification capabilities.

Manages satellite footprint data through the complete scene identification workflow, including data extraction, preprocessing, derived field calculation, and scene classification.

Parameters:

data (xr.Dataset) – Input dataset containing required footprint variables

_data#

Internal dataset of footprint data. During scene identification, scene IDs are added as variables to this dataset.

Type:

xr.Dataset

process_ssf_and_camera(ssf_path, scene_definitions)#

Process SSF and camera data to identify scenes

process_cldpx_viirs_geos_cam_groundscene()#

Process alternative data format (not implemented)

process_clouds_groundscene()#

Process cloud/ground scene data (not implemented)

Notes

This class handles the complete pipeline from raw satellite data to scene identification, including:

1. Data extraction from NetCDF files
2. Missing value handling
3. Derived field calculation (cloud fraction, optical depth, etc.)
4. Scene ID matching based on classification rules

__init__(data: Dataset)#

Methods

export_to_netcdf(netcdf_path)

from_ceres_ssf(ssf_path)

Process SSF (Single Scanner Footprint) and camera data to identify scenes.

from_cldpx_viirs_geos_cam_groundscene()

Process cloud pixel/VIIRS/GEOS/camera/ground scene data format.

from_clouds_groundscene()

Process clouds/ground scene data format.

identify_scenes([scene_definitions, ...])

Identify and assign scene IDs to all footprints based on scene definitions.

_calculate_required_fields(result_fields: list[str])#

Calculate necessary derived fields on data from input FootprintVariables.

Computes derived atmospheric variables needed for scene identification, handling dependencies between calculated fields automatically.

Parameters:

result_fields (list of str) – List of field names to calculate (e.g., 'cloud_fraction', 'optical_depth')

Raises:

ValueError – If an unknown field is requested or if circular dependencies exist

Notes

This method modifies self._data in place to conserve memory. It automatically resolves dependencies between calculated fields (e.g., optical depth depends on cloud fraction being calculated first).

The calculation order is determined by dependency analysis and may require multiple passes. A maximum of 30 iterations is allowed to prevent infinite loops from circular dependencies.

Available calculated fields are defined in _CALCULATED_VARIABLE_MAP.
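
The multi-pass resolution described in these notes can be sketched as follows. This is a minimal standard-library illustration, not the library's implementation: `Spec` and `calculate_fields` are hypothetical stand-ins for `CalculationSpec` and the method itself, the field formulas in the usage lines are made up, and plain floats stand in for dataset variables.

```python
from typing import Callable, NamedTuple

MAX_PASSES = 30  # iteration cap stated in the notes


class Spec(NamedTuple):
    """Hypothetical stand-in for CalculationSpec: input names plus a function."""
    inputs: tuple[str, ...]
    func: Callable[..., float]


def calculate_fields(requested: list[str], specs: dict[str, Spec],
                     data: dict[str, float]) -> dict[str, float]:
    """Compute requested fields in dependency order, mutating ``data`` in place."""
    unknown = [name for name in requested if name not in specs]
    if unknown:
        raise ValueError(f"Unknown calculated fields: {unknown}")

    pending = list(dict.fromkeys(requested))  # de-duplicate, keep order
    for _ in range(MAX_PASSES):
        pending = [name for name in pending if name not in data]
        if not pending:
            return data
        next_pending: list[str] = []
        for name in pending:
            spec = specs[name]
            missing = [dep for dep in spec.inputs if dep not in data]
            if not missing:
                data[name] = spec.func(*(data[dep] for dep in spec.inputs))
                continue
            for dep in missing:
                if dep not in specs:
                    raise ValueError(f"{name!r} needs unavailable input {dep!r}")
                # Queue calculated dependencies that were not requested explicitly.
                if dep not in pending and dep not in next_pending:
                    next_pending.append(dep)
            next_pending.append(name)
        pending = next_pending

    if all(name in data for name in pending):
        return data
    raise ValueError(f"Unresolved after {MAX_PASSES} passes (circular dependency?): {pending}")


# Usage with made-up formulas: optical_depth depends on cloud_fraction.
example_specs = {
    "cloud_fraction": Spec(("clear_area",), lambda clear: 100.0 - clear),
    "optical_depth": Spec(("tau_raw", "cloud_fraction"), lambda tau, cf: tau * cf / 100.0),
}
example_data = {"clear_area": 25.0, "tau_raw": 8.0}
calculate_fields(["optical_depth"], example_specs, example_data)
```

Requesting only `optical_depth` still computes `cloud_fraction` first, because unmet dependencies that are themselves calculated fields are queued for the next pass. A cycle never empties the queue, so the pass limit converts it into a `ValueError`.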

_calculate_single_field_from_spec(spec: CalculationSpec, calculated: list[str])#

Calculate a single field from input FootprintVariables.

Applies the calculation function specified in the CalculationSpec to the input variables, creating a new variable in the dataset.

Parameters:
  • spec (CalculationSpec) – Specification defining the calculation to perform

  • calculated (list of str) – List of variable names already available in the dataset

Raises:

ValueError – If required input variables are not available in the dataset

_convert_missing_values(input_missing_value: float)#

Convert input missing values in footprint data to output missing values.

This method standardizes missing value representations by converting from the input dataset’s missing value convention to the output convention used in FootprintData processing (np.nan).

Parameters:

input_missing_value (float) – Missing value indicator used in input data (e.g., -999.0, 9.96921e+36)

Notes

Handles two cases:

- If input_missing_value is NaN: Uses np.isnan() for comparison
- If input_missing_value is numeric: Uses direct equality comparison

Modifies self._data in place, replacing all occurrences of input_missing_value with np.nan.
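
The reason for treating a NaN sentinel separately is that NaN never compares equal to itself, so an equality test would find no matches. The sketch below illustrates the two branches with plain Python lists and the `math` module. The helper names `missing_mask` and `convert_missing` are hypothetical, and the actual method operates on the xarray dataset instead.

```python
import math


def missing_mask(values: list[float], input_missing_value: float) -> list[bool]:
    """Flag entries that equal the input missing-value sentinel."""
    if math.isnan(input_missing_value):
        # NaN != NaN, so equality cannot detect a NaN sentinel.
        return [math.isnan(v) for v in values]
    return [v == input_missing_value for v in values]


def convert_missing(values: list[float], input_missing_value: float) -> list[float]:
    """Return a copy of ``values`` with sentinel entries replaced by NaN."""
    mask = missing_mask(values, input_missing_value)
    return [math.nan if is_missing else v for v, is_missing in zip(values, mask)]


converted = convert_missing([20.0, -999.0, 25.0], -999.0)
```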

Examples

>>> footprint._data = xr.Dataset({'temp': ('footprint', [20.0, -999.0, 25.0])})
>>> footprint._convert_missing_values(-999.0)
>>> footprint._data['temp'].values
array([20., nan, 25.])

static _extract_data_from_CeresSSFNOAA20FM6Ed1C(dataset: Dataset) → Dataset#

Extract data from CERES SSF file (using numpy arrays).

Parameters:
  • dataset (netCDF4.Dataset) – Open NetCDF4 dataset in CeresSSFNOAA20FM6Ed1C format

  • chunk_size (int, optional) – Number of footprints per chunk along the first dimension (parameter kept for compatibility but not used)

Returns:

Dataset containing extracted footprint variables as numpy arrays

Return type:

xr.Dataset

_fill_column_above_max_value(column_name: str, threshold: float, fill_value=nan)#

Replace values above threshold with fill value for specified column.

Parameters:
  • column_name (str) – Name of the column/variable to process

  • threshold (float) – Maximum allowed value - values above this will be replaced

  • fill_value (float, optional) – Value to use as replacement for out-of-range data. Default is NaN.

Raises:

ValueError – If the specified column is not found in the dataset
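
As a rough illustration of the behavior described above, the sketch below applies the same rule to a dictionary of plain lists. `fill_above_max` is a hypothetical helper, not part of the library, and the dictionary stands in for the xarray dataset.

```python
import math


def fill_above_max(columns: dict[str, list[float]], column_name: str,
                   threshold: float, fill_value: float = math.nan) -> None:
    """Replace values strictly above ``threshold`` in place; others are kept."""
    if column_name not in columns:
        raise ValueError(f"Column {column_name!r} not found in data")
    # NaN > threshold is False, so existing NaN entries pass through unchanged.
    columns[column_name] = [
        fill_value if value > threshold else value
        for value in columns[column_name]
    ]


columns = {"cloud_fraction": [50.0, 120.0, 100.0]}
fill_above_max(columns, "cloud_fraction", 100.0)
```

In this sketch a value exactly equal to the threshold is retained, matching the wording "values above this will be replaced".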

Examples

>>> footprint._data = xr.Dataset({'cloud_fraction': ('footprint', [50.0, 120.0, 80.0])})
>>> footprint._fill_column_above_max_value('cloud_fraction', 100.0)
>>> footprint._data['cloud_fraction'].values
array([50., nan, 80.])

classmethod from_ceres_ssf(ssf_path: Path)#

Process SSF (Single Scanner Footprint) and camera data to identify scenes.

Reads CERES SSF data, extracts relevant variables, calculates derived fields, and identifies scene classifications for each footprint.

Parameters:
  • ssf_path (pathlib.Path) – Path to the SSF NetCDF file (CeresSSFNOAA20FM6Ed1C format)

  • scene_definitions (list of SceneDefinition) – List of scene definition objects to apply for classification

Returns:

Processed footprint data object containing original variables, calculated derived fields, and scene IDs.

Return type:

FootprintData

Raises:

FileNotFoundError – If the SSF file cannot be found or opened

Notes

Processing steps:

1. Extract variables from SSF NetCDF groups
2. Apply maximum value thresholds to cloud properties
3. Calculate derived fields (cloud fraction, optical depth, wind speed, etc.)
4. Match footprints to scene IDs using provided scene definitions

Maximum value thresholds applied:

- Cloud fraction: 100%
- Cloud phase: 2 (ice)
- Optical depth: 500

Examples

>>> footprint_data = FootprintData.from_ceres_ssf(
...     Path("CERES_SSF_NOAA20_2024001.nc")
... )

classmethod from_cldpx_viirs_geos_cam_groundscene()#

Process cloud pixel/VIIRS/GEOS/camera/ground scene data format.

Raises:

NotImplementedError – This data format is not yet supported

Notes

TODO: LIBSDC-672 Implement processing for alternative data formats including:

- Cloud pixel data
- VIIRS observations
- GEOS model data
- Camera data
- Ground scene classifications

classmethod from_clouds_groundscene()#

Process clouds/ground scene data format.

Raises:

NotImplementedError – This data format is not yet supported

Notes

TODO: LIBSDC-673 Implement processing for cloud and ground scene data formats.

identify_scenes(scene_definitions: list[SceneDefinition] = [<SceneDefinition object>, <SceneDefinition object>], additional_scene_definitions_files: list[Path] | None = None)#

Identify and assign scene IDs to all footprints based on scene definitions.

Applies scene classification rules from one or more SceneDefinition objects to assign scene IDs to each footprint in the dataset.

Parameters:
  • scene_definitions (list[SceneDefinition]) – List of SceneDefinition objects from standard libera_utils definitions

  • additional_scene_definitions_files (list of pathlib.Path or None) – List of scene definition files containing classification rules for custom analysis.

Notes

This method modifies self._data in place by adding scene IDs for each row of footprint data.

For each SceneDefinition provided:

1. Validates that all required variables exist in the footprint data
2. Matches each footprint to a scene based on variable ranges
3. Adds a new variable to the dataset with the scene IDs

Footprints that don’t match any scene are assigned a scene ID of 0.
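
The matching steps above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `SceneRule` and `assign_scene_ids` are hypothetical names, the half-open range convention (lower bound inclusive, upper bound exclusive) is a guess, and the real method works on SceneDefinition objects and the xarray dataset.

```python
from typing import NamedTuple


class SceneRule(NamedTuple):
    """Hypothetical rule: a scene ID and a [low, high) range per variable."""
    scene_id: int
    ranges: dict[str, tuple[float, float]]


def assign_scene_ids(footprints: dict[str, list[float]],
                     rules: list[SceneRule]) -> list[int]:
    """Return one scene ID per footprint; 0 means no rule matched."""
    # Step 1: validate that every variable used by the rules is present.
    required = {var for rule in rules for var in rule.ranges}
    missing = required - footprints.keys()
    if missing:
        raise ValueError(f"Footprint data lacks required variables: {sorted(missing)}")

    n_footprints = len(next(iter(footprints.values()), []))
    scene_ids = [0] * n_footprints  # default for unmatched footprints

    # Step 2: the first rule whose ranges all contain the footprint wins.
    for i in range(n_footprints):
        for rule in rules:
            if all(low <= footprints[var][i] < high
                   for var, (low, high) in rule.ranges.items()):
                scene_ids[i] = rule.scene_id
                break
    return scene_ids


footprints = {
    "cloud_fraction": [5.0, 60.0, 99.0, float("nan")],
    "optical_depth": [1.0, 12.0, 600.0, 3.0],
}
rules = [
    SceneRule(1, {"cloud_fraction": (0.0, 50.0), "optical_depth": (0.0, 10.0)}),
    SceneRule(2, {"cloud_fraction": (50.0, 100.1), "optical_depth": (10.0, 500.0)}),
]
scene_ids = assign_scene_ids(footprints, rules)
```

Because every comparison against NaN is false, a footprint with a missing value falls through all rules and keeps the default ID of 0 in this sketch.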

TODO: LIBSDC-674 - Add unfiltering scene ID algorithm

Examples

>>> footprint_data = FootprintData(dataset)
>>> footprint_data.identify_scenes()