libera_utils.scene_id
Module for mapping radiometer footprints to scene IDs.
This module provides functionality for identifying and classifying atmospheric scenes based on footprint data from satellite observations.
Functions
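calculate_cloud_fraction(clear_area) | Calculate cloud fraction from clear sky area percentage.
calculate_cloud_fraction_weighted_optical_depth(...) | Calculate weighted optical depth from upper and lower cloud layers.
calculate_cloud_phase(...) | Calculate weighted cloud phase from upper and lower cloud layers.
calculate_surface_wind(surface_wind_u, surface_wind_v) | Calculate total surface wind speed from u and v vector components.
calculate_trmm_surface_type(igbp_surface_type) | Convert IGBP surface type to TRMM surface type classification.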
Classes
CalculationSpec(output_var, function, ...) | Specification for calculating a derived variable.
FootprintData(data) | Container for footprint data with scene identification capabilities.
FootprintVariables(value) | Standardized variable names for footprint data processing.
IGBPSurfaceType(value) | Enumeration of surface types used in scene classification.
TRMMSurfaceType(value) | Enumeration of TRMM surface types used in ERBE and TRMM scene classification.
- class libera_utils.scene_id.CalculationSpec(output_var: str, function: Callable, input_vars: list[str], output_datatype: type, dependent_calculations: list[str] | None = None)
Specification for calculating a derived variable.
Defines the parameters needed to calculate a derived variable from input data, including the calculation function, required inputs, and any dependencies on other calculated variables.
- function
The function to call for calculation
- Type:
Callable
- dependent_calculations
List of other calculated variables that must be computed first, or None if no dependencies exist. Default is None.
Examples
>>> spec = CalculationSpec(
...     output_var="cloud_fraction",
...     function=calculate_cloud_fraction,
...     input_vars=["clear_area"],
...     output_datatype=float
... )
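To make the role of each constructor argument concrete, the sketch below shows how a specification of this shape could be applied to input data. `CalculationSpecSketch` and `cloud_fraction_from_clear_area` are hypothetical stand-ins written for this illustration, not the library's classes, and a plain dict stands in for the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CalculationSpecSketch:
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    output_var: str
    function: Callable
    input_vars: list
    output_datatype: type
    dependent_calculations: Optional[list] = None


def cloud_fraction_from_clear_area(clear_area):
    # Documented relationship: cloud fraction = 100 - clear_area
    return 100.0 - clear_area


spec = CalculationSpecSketch(
    output_var="cloud_fraction",
    function=cloud_fraction_from_clear_area,
    input_vars=["clear_area"],
    output_datatype=float,
)

# A plain dict stands in for the dataset in this sketch
inputs = {"clear_area": 30.0}
args = [inputs[name] for name in spec.input_vars]
result = spec.output_datatype(spec.function(*args))
print(spec.output_var, result)
```

The inputs are looked up by name and passed positionally, so the order of `input_vars` has to match the parameter order of `function` in this sketch.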
- class libera_utils.scene_id.FootprintData(data: Dataset)
Container for footprint data with scene identification capabilities.
Manages satellite footprint data through the complete scene identification workflow, including data extraction, preprocessing, derived field calculation, and scene classification.
- Parameters:
data (xr.Dataset) – Input dataset containing required footprint variables
- _data
Internal dataset of footprint data. During scene identification, scene IDs are added as variables to this dataset.
- Type:
xr.Dataset
- process_ssf_and_camera(ssf_path, scene_definitions)
Process SSF and camera data to identify scenes.
- process_cldpx_viirs_geos_cam_groundscene()
Process alternative data format (not implemented).
- process_clouds_groundscene()
Process cloud/ground scene data (not implemented).
Notes
This class handles the complete pipeline from raw satellite data to scene identification, including:
1. Data extraction from NetCDF files
2. Missing value handling
3. Derived field calculation (cloud fraction, optical depth, etc.)
4. Scene ID matching based on classification rules
Methods
from_ceres_ssf(ssf_path) | Process SSF (Single Scanner Footprint) and camera data to identify scenes.
from_cldpx_viirs_geos_cam_groundscene() | Process cloud pixel/VIIRS/GEOS/camera/ground scene data format.
from_clouds_groundscene() | Process clouds/ground scene data format.
identify_scenes([scene_definitions, ...]) | Identify and assign scene IDs to all footprints based on scene definitions.
export_to_netcdf
- _calculate_required_fields(result_fields: list[str])
Calculate necessary derived fields on data from input FootprintVariables.
Computes derived atmospheric variables needed for scene identification, handling dependencies between calculated fields automatically.
- Parameters:
result_fields (list of str) – List of field names to calculate (e.g., ‘cloud_fraction’, ‘optical_depth’)
- Raises:
ValueError – If an unknown field is requested or if circular dependencies exist
Notes
This method modifies self._data in place to conserve memory. It automatically resolves dependencies between calculated fields (e.g., optical depth depends on cloud fraction being calculated first).
The calculation order is determined by dependency analysis and may require multiple passes. A maximum of 30 iterations is allowed to prevent infinite loops from circular dependencies.
Available calculated fields are defined in _CALCULATED_VARIABLE_MAP.
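The multi-pass dependency resolution described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual code: `resolve_calculation_order` and the `dependencies` mapping are hypothetical names, and only the 30-pass limit and the `ValueError` behavior come from the text above.

```python
def resolve_calculation_order(requested, dependencies, max_passes=30):
    """Return the fields in an order where each one follows its dependencies.

    `dependencies` maps each calculable field to the fields it needs first.
    """
    # Collect the requested fields plus everything they depend on
    pending = []
    queue = list(requested)
    while queue:
        name = queue.pop(0)
        if name in pending:
            continue
        if name not in dependencies:
            raise ValueError(f"Unknown calculated field: {name}")
        pending.append(name)
        queue.extend(dependencies[name])

    ordered = []
    for _ in range(max_passes):
        if not pending:
            break
        # A field is ready once all of its dependencies are already ordered
        ready = [n for n in pending if all(d in ordered for d in dependencies[n])]
        ordered.extend(ready)
        pending = [n for n in pending if n not in ready]

    if pending:
        # Fields still pending after the pass limit indicate a circular dependency
        raise ValueError(f"Could not resolve dependencies for: {pending}")
    return ordered


deps = {
    "cloud_fraction": [],
    "optical_depth": ["cloud_fraction"],
    "cloud_phase": ["cloud_fraction", "optical_depth"],
}
order = resolve_calculation_order(["cloud_phase"], deps)
print(order)
```

Bounding the number of passes keeps the loop simple at the cost of reporting a cycle only after the limit is reached; a topological sort would detect it earlier.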
- _calculate_single_field_from_spec(spec: CalculationSpec, calculated: list[str])
Calculate a single field from input FootprintVariables.
Applies the calculation function specified in the CalculationSpec to the input variables, creating a new variable in the dataset.
- Parameters:
spec (CalculationSpec) – Specification defining the calculation to perform
calculated (list of str) – List of variable names already available in the dataset
- Raises:
ValueError – If required input variables are not available in the dataset
- _convert_missing_values(input_missing_value: float)
Convert input missing values in footprint data to output missing values.
This method standardizes missing value representations by converting from the input dataset’s missing value convention to the output convention used in FootprintData processing (np.nan).
- Parameters:
input_missing_value (float) – Missing value indicator used in input data (e.g., -999.0, 9.96921e+36)
Notes
Handles two cases:
- If input_missing_value is NaN: Uses np.isnan() for comparison
- If input_missing_value is numeric: Uses direct equality comparison
Modifies self._data in place, replacing all occurrences of input_missing_value with np.nan.
Examples
>>> footprint._data = xr.Dataset({'temp': [20.0, -999.0, 25.0]})
>>> footprint._convert_missing_values(-999.0)
>>> print(footprint._data['temp'].values)
[20. nan 25.]
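The two comparison cases in the notes above exist because NaN never compares equal to itself, so an equality test cannot find NaN sentinels. A minimal NumPy sketch of that branching, using a hypothetical helper `replace_missing` on a plain array rather than the dataset:

```python
import numpy as np


def replace_missing(values, input_missing_value):
    """Return a float copy of `values` with the sentinel replaced by NaN."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(input_missing_value):
        # NaN != NaN, so equality would never match; use isnan instead
        mask = np.isnan(arr)
    else:
        mask = arr == input_missing_value
    return np.where(mask, np.nan, arr)


converted = replace_missing([20.0, -999.0, 25.0], -999.0)
print(converted)
```

Unlike the documented method, this sketch returns a new array instead of modifying data in place.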
- static _extract_data_from_CeresSSFNOAA20FM6Ed1C(dataset: Dataset) → Dataset
Extract data from CERES SSF file (using numpy arrays).
- Parameters:
dataset (netCDF4.Dataset) – Open NetCDF4 dataset in CeresSSFNOAA20FM6Ed1C format
chunk_size (int, optional) – Number of footprints per chunk along the first dimension (parameter kept for compatibility but not used)
- Returns:
Dataset containing extracted footprint variables as numpy arrays
- Return type:
xr.Dataset
- _fill_column_above_max_value(column_name: str, threshold: float, fill_value=nan)
Replace values above threshold with fill value for specified column.
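- Parameters:
column_name (str) – Name of the column (variable) in the dataset to modify
threshold (float) – Maximum allowed value; values above it are replaced
fill_value (float, optional) – Replacement value. Default is NaN.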
- Raises:
ValueError – If the specified column is not found in the dataset
Examples
>>> footprint._data = xr.Dataset({'cloud_fraction': [50, 120, 80]})
>>> footprint._fill_column_above_max_value('cloud_fraction', 100.0)
>>> print(footprint._data['cloud_fraction'].values)
[50. nan 80.]
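The thresholding itself reduces to a masked replacement. A minimal sketch on a plain NumPy array, where `fill_above_max` is a hypothetical helper and the treatment of values exactly equal to the threshold (kept) is an assumption based on the word "above":

```python
import numpy as np


def fill_above_max(values, threshold, fill_value=np.nan):
    """Return a float copy with entries strictly above `threshold` replaced."""
    arr = np.asarray(values, dtype=float)
    return np.where(arr > threshold, fill_value, arr)


filled = fill_above_max([50, 120, 80, 100], 100.0)
print(filled)
```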
- classmethod from_ceres_ssf(ssf_path: Path)
Process SSF (Single Scanner Footprint) and camera data to identify scenes.
Reads CERES SSF data, extracts relevant variables, calculates derived fields, and identifies scene classifications for each footprint.
- Parameters:
ssf_path (pathlib.Path) – Path to the SSF NetCDF file (CeresSSFNOAA20FM6Ed1C format)
scene_definitions (list of SceneDefinition) – List of scene definition objects to apply for classification
- Returns:
Processed footprint data object containing original variables, calculated derived fields, and scene IDs.
- Return type:
FootprintData
- Raises:
FileNotFoundError – If the SSF file cannot be found or opened
Notes
Processing steps:
1. Extract variables from SSF NetCDF groups
2. Apply maximum value thresholds to cloud properties
3. Calculate derived fields (cloud fraction, optical depth, wind speed, etc.)
4. Match footprints to scene IDs using provided scene definitions
Maximum value thresholds applied:
- Cloud fraction: 100%
- Cloud phase: 2 (ice)
- Optical depth: 500
Examples
>>> scene_defs = [SceneDefinition(Path("trmm.csv"))]
>>> footprint_data = FootprintData.from_ceres_ssf(
...     Path("CERES_SSF_NOAA20_2024001.nc"),
...     scene_defs
... )
- classmethod from_cldpx_viirs_geos_cam_groundscene()
Process cloud pixel/VIIRS/GEOS/camera/ground scene data format.
- Raises:
NotImplementedError – This data format is not yet supported
Notes
TODO: LIBSDC-672 Implement processing for alternative data formats including:
- Cloud pixel data
- VIIRS observations
- GEOS model data
- Camera data
- Ground scene classifications
- classmethod from_clouds_groundscene()
Process clouds/ground scene data format.
- Raises:
NotImplementedError – This data format is not yet supported
Notes
TODO: LIBSDC-673 Implement processing for cloud and ground scene data formats.
- identify_scenes(scene_definitions: list[SceneDefinition] = [<SceneDefinition object>, <SceneDefinition object>], additional_scene_definitions_files: list[Path] | None = None)
Identify and assign scene IDs to all footprints based on scene definitions.
Applies scene classification rules from one or more SceneDefinition objects to assign scene IDs to each footprint in the dataset.
- Parameters:
scene_definitions (list[SceneDefinition]) – List of SceneDefinition objects from standard libera_utils definitions
additional_scene_definitions_files (list of pathlib.Path or None) – List of scene definition files containing classification rules for custom analysis.
Notes
This method modifies self._data in place by adding scene IDs for each row of footprint data.
For each SceneDefinition provided:
1. Validates that all required variables exist in the footprint data
2. Matches each footprint to a scene based on variable ranges
3. Adds a new variable to the dataset with the scene IDs
Footprints that don’t match any scene are assigned a scene ID of 0.
TODO: LIBSDC-674 - Add unfiltering scene ID algorithm
Examples
>>> footprint_data = FootprintData(dataset)
>>> footprint_data.identify_scenes()
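The range-based matching described in the notes can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `SCENES` table, the helper `match_scene_ids`, the half-open range convention (lower bound inclusive, upper bound exclusive), and the rule that the first matching scene wins. Only the validation step and the default scene ID of 0 come from the text above.

```python
import numpy as np

# Hypothetical scene table: scene ID -> {variable name: (low, high)}
SCENES = {
    1: {"cloud_fraction": (0.0, 50.0), "surface_wind": (0.0, 10.0)},
    2: {"cloud_fraction": (50.0, 100.0), "surface_wind": (0.0, 10.0)},
}


def match_scene_ids(footprints, scenes):
    """Assign a scene ID to each footprint; unmatched footprints keep 0."""
    n = len(next(iter(footprints.values())))
    scene_ids = np.zeros(n, dtype=int)
    for scene_id, ranges in scenes.items():
        # Validate that every variable the scene needs is present
        missing = [var for var in ranges if var not in footprints]
        if missing:
            raise ValueError(f"Missing required variables: {missing}")
        match = np.ones(n, dtype=bool)
        for var, (low, high) in ranges.items():
            values = np.asarray(footprints[var], dtype=float)
            # NaN fails both comparisons, so missing data never matches
            match &= (values >= low) & (values < high)
        # Only footprints without a scene yet are assigned
        scene_ids[match & (scene_ids == 0)] = scene_id
    return scene_ids


footprints = {
    "cloud_fraction": [10.0, 75.0, np.nan, 100.0],
    "surface_wind": [5.0, 5.0, 5.0, 5.0],
}
ids = match_scene_ids(footprints, SCENES)
print(ids)
```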
- class libera_utils.scene_id.FootprintVariables(value)
Standardized variable names for footprint data processing.
This class defines consistent naming conventions for all variables used in the scene identification workflow, including both input variables from satellite data products and calculated derived fields.
- class libera_utils.scene_id.IGBPSurfaceType(value)
Enumeration of surface types used in scene classification.
These surface types are derived from IGBP (International Geosphere-Biosphere Programme) land cover classifications.
- IGBP_1 through IGBP_20
IGBP surface type categories (values: 1-20)
- Type:
int
- property trmm_surface_type: TRMMSurfaceType
Map IGBP surface type to corresponding TRMM surface type.
- Returns:
The corresponding TRMM surface type category
- Return type:
TRMMSurfaceType
Examples
>>> IGBPSurfaceType.EVERGREEN_NEEDLELEAF_FOREST.trmm_surface_type
<TRMMSurfaceType.HI_SHRUB: 1>
>>> IGBPSurfaceType.WATER_BODIES.trmm_surface_type
<TRMMSurfaceType.OCEAN: 0>
- class libera_utils.scene_id.TRMMSurfaceType(value)
Enumeration of TRMM surface types used in ERBE and TRMM scene classification.
- libera_utils.scene_id.calculate_cloud_fraction(clear_area: float | ndarray[Any, dtype[floating]]) → float | ndarray[Any, dtype[floating]]
Calculate cloud fraction from clear sky area percentage.
- Parameters:
clear_area (float or ndarray) – Clear area percentage (0-100)
- Returns:
Cloud fraction percentage (0-100), calculated as 100 - clear_area
- Return type:
float or ndarray
- Raises:
ValueError – If clear_area contains values less than 0 or greater than 100
Examples
>>> calculate_cloud_fraction(30.0)
70.0
>>> calculate_cloud_fraction(np.array([10, 25, 90]))
array([90, 75, 10])
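The documented behavior (range check, then `100 - clear_area`) can be sketched as below. `cloud_fraction_sketch` is a hypothetical re-implementation for illustration; letting NaN inputs pass through the range check is an assumption, not confirmed behavior of the library function.

```python
import numpy as np


def cloud_fraction_sketch(clear_area):
    """Cloud fraction in percent, computed as 100 - clear_area."""
    arr = np.asarray(clear_area, dtype=float)
    # NaN compares False on both tests, so missing values are not rejected here
    if np.any(arr < 0) or np.any(arr > 100):
        raise ValueError("clear_area must be between 0 and 100")
    result = 100.0 - arr
    # Return a plain float for scalar input, an array otherwise
    return float(result) if result.ndim == 0 else result


scalar_result = cloud_fraction_sketch(30.0)
array_result = cloud_fraction_sketch([10, 25, 90])
print(scalar_result, array_result)
```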
- libera_utils.scene_id.calculate_cloud_fraction_weighted_optical_depth(optical_depth_lower: float | ndarray[Any, dtype[floating]], optical_depth_upper: float | ndarray[Any, dtype[floating]], cloud_fraction_lower: float | ndarray[Any, dtype[floating]], cloud_fraction_upper: float | ndarray[Any, dtype[floating]], cloud_fraction: float | ndarray[Any, dtype[floating]]) → float | ndarray[Any, dtype[floating]]
Calculate weighted optical depth from upper and lower cloud layers.
Combines optical depth measurements from two atmospheric layers using cloud fraction weighting to produce a single representative optical depth value.
- Parameters:
optical_depth_lower (float or ndarray) – Optical depth for lower cloud layer (dimensionless)
optical_depth_upper (float or ndarray) – Optical depth for upper cloud layer (dimensionless)
cloud_fraction_lower (float or ndarray) – Cloud fraction for lower layer (0-100)
cloud_fraction_upper (float or ndarray) – Cloud fraction for upper layer (0-100)
cloud_fraction (float or ndarray) – Total cloud fraction (0-100)
- Returns:
Optical depth weighted by cloud fraction and summed across layers, or np.nan if no valid data or zero total cloud fraction
- Return type:
float or ndarray
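The exact weighting formula is not spelled out above. One plausible reading, shown here purely as an assumption, is that each layer's optical depth is multiplied by that layer's cloud fraction, the valid layers are summed, and the sum is divided by the total cloud fraction. `weighted_optical_depth_sketch` is a hypothetical name.

```python
import numpy as np


def weighted_optical_depth_sketch(od_lower, od_upper, cf_lower, cf_upper, cf_total):
    """Cloud-fraction-weighted optical depth over two layers (assumed formula)."""
    od_l, od_u, cf_l, cf_u, total = np.broadcast_arrays(
        *(np.asarray(x, dtype=float)
          for x in (od_lower, od_upper, cf_lower, cf_upper, cf_total))
    )
    # Per-layer contribution: optical depth times that layer's cloud fraction
    weighted = np.stack([od_l * cf_l, od_u * cf_u])
    valid = ~np.isnan(weighted)
    # Sum only the layers that have data
    summed = np.where(valid, weighted, 0.0).sum(axis=0)
    # Usable only with at least one valid layer and a positive total fraction
    usable = valid.any(axis=0) & (total > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(usable, summed / total, np.nan)


result = weighted_optical_depth_sketch(
    od_lower=[10.0, np.nan, 4.0],
    od_upper=[20.0, np.nan, np.nan],
    cf_lower=[30.0, 0.0, 40.0],
    cf_upper=[10.0, 0.0, 0.0],
    cf_total=[40.0, 0.0, 40.0],
)
print(result)
```

In the first footprint both layers contribute, giving (10·30 + 20·10) / 40 = 12.5; the second has no valid data, and the third uses only the lower layer.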
- libera_utils.scene_id.calculate_cloud_phase(cloud_phase_lower: float | ndarray[Any, dtype[floating]], cloud_phase_upper: float | ndarray[Any, dtype[floating]], cloud_fraction_lower: float | ndarray[Any, dtype[floating]], cloud_fraction_upper: float | ndarray[Any, dtype[floating]], cloud_fraction: float | ndarray[Any, dtype[floating]], optical_depth_lower: float | ndarray[Any, dtype[floating]], optical_depth_upper: float | ndarray[Any, dtype[floating]]) → float | ndarray[Any, dtype[floating]]
Calculate weighted cloud phase from upper and lower cloud layers.
Computes the dominant cloud phase by weighting each layer’s phase by its cloud fraction contribution and rounding to the nearest integer phase classification (1=liquid, 2=ice).
- Parameters:
cloud_phase_lower (float or ndarray) – Cloud phase for lower layer (1=liquid, 2=ice)
cloud_phase_upper (float or ndarray) – Cloud phase for upper layer (1=liquid, 2=ice)
cloud_fraction_lower (float or ndarray) – Cloud fraction for lower layer (0-100)
cloud_fraction_upper (float or ndarray) – Cloud fraction for upper layer (0-100)
cloud_fraction (float or ndarray) – Total cloud fraction (0-100)
optical_depth_lower (float or ndarray) – Optical depth for lower layer (used for NaN check)
optical_depth_upper (float or ndarray) – Optical depth for upper layer (used for NaN check)
- Returns:
Cloud phase weighted by cloud fraction and rounded to nearest integer (1=liquid, 2=ice), or np.nan if no valid data
- Return type:
float or ndarray
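As a rough illustration of fraction-weighted rounding, the sketch below averages the layer phases using the layer cloud fractions as weights and rounds the result. Several details are assumptions rather than documented behavior: a layer is skipped when its phase, fraction, or optical depth is NaN; the average is normalized by the sum of the valid layers' fractions; and `np.rint` (round half to even) does the rounding. `cloud_phase_sketch` is a hypothetical name.

```python
import numpy as np


def cloud_phase_sketch(phase_lower, phase_upper, cf_lower, cf_upper,
                       cloud_fraction, od_lower, od_upper):
    """Fraction-weighted cloud phase rounded to 1 (liquid) or 2 (ice)."""
    p_l, p_u, w_l, w_u, total, od_l, od_u = np.broadcast_arrays(
        *(np.asarray(x, dtype=float)
          for x in (phase_lower, phase_upper, cf_lower, cf_upper,
                    cloud_fraction, od_lower, od_upper))
    )
    phases = np.stack([p_l, p_u])
    weights = np.stack([w_l, w_u])
    depths = np.stack([od_l, od_u])
    # A layer counts only if phase, fraction, and optical depth are all present
    valid = ~(np.isnan(phases) | np.isnan(weights) | np.isnan(depths))
    weights = np.where(valid, weights, 0.0)
    phases = np.where(valid, phases, 0.0)
    weight_sum = weights.sum(axis=0)
    usable = (weight_sum > 0) & (total > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_phase = (phases * weights).sum(axis=0) / weight_sum
    return np.where(usable, np.rint(mean_phase), np.nan)


phase = cloud_phase_sketch(
    phase_lower=[1.0, 1.0, 2.0],
    phase_upper=[2.0, 2.0, np.nan],
    cf_lower=[30.0, 10.0, 0.0],
    cf_upper=[10.0, 30.0, 0.0],
    cloud_fraction=[40.0, 40.0, 0.0],
    od_lower=[5.0, 5.0, np.nan],
    od_upper=[5.0, 5.0, np.nan],
)
print(phase)
```

The first footprint averages to 1.25 and rounds to liquid (1), the second averages to 1.75 and rounds to ice (2), and the third has no valid layer.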
- libera_utils.scene_id.calculate_surface_wind(surface_wind_u: float | ndarray[Any, dtype[floating]], surface_wind_v: float | ndarray[Any, dtype[floating]]) → float | ndarray[Any, dtype[floating]]
Calculate total surface wind speed from u and v vector components.
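- Parameters:
surface_wind_u (float or ndarray) – U component of the surface wind vector (m/s)
surface_wind_v (float or ndarray) – V component of the surface wind vector (m/s)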
- Returns:
Total wind speed magnitude (m/s), or np.nan where input components are NaN
- Return type:
float or ndarray
Notes
Wind speed is calculated using the Pythagorean theorem: sqrt(u^2 + v^2). NaN values in either component result in NaN output for that position.
Examples
>>> calculate_surface_wind(3.0, 4.0)
5.0
>>> calculate_surface_wind(np.array([3, np.nan]), np.array([4, 5]))
array([5., nan])
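The magnitude formula in the notes maps directly onto `np.hypot`, which computes sqrt(u^2 + v^2) elementwise and propagates NaN from either input. `surface_wind_sketch` is a hypothetical helper; whether the library uses `np.hypot` or an explicit square root is not stated.

```python
import numpy as np


def surface_wind_sketch(u, v):
    """Wind speed magnitude from u and v components."""
    return np.hypot(np.asarray(u, dtype=float), np.asarray(v, dtype=float))


speed_scalar = float(surface_wind_sketch(3.0, 4.0))
speed_array = surface_wind_sketch([3.0, np.nan], [4.0, 5.0])
print(speed_scalar, speed_array)
```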
- libera_utils.scene_id.calculate_trmm_surface_type(igbp_surface_type: int | ndarray[Any, dtype[integer]]) → int | ndarray[Any, dtype[integer]]
Convert IGBP surface type to TRMM surface type classification.
- Parameters:
igbp_surface_type (int or ndarray of int) – IGBP surface type codes
- Returns:
TRMM surface type codes
- Return type:
int or ndarray of int
- Raises:
ValueError – If any input values cannot be converted to a valid IGBP surface type
Notes
The conversion uses a lookup table derived from the TRMMSurfaceType.value property. Values that don’t correspond to valid IGBP surface types raise a ValueError.
Examples
>>> calculate_trmm_surface_type(1)
5  # Maps IGBP HI_SHRUB back to TRMM type 5
>>> calculate_trmm_surface_type(np.array([1, 0]))
array([5, 17])
>>> calculate_trmm_surface_type(999)
ValueError: Cannot convert IGBP surface type value(s) to TRMM surface type: [999]
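A lookup-table conversion with validation can be sketched as follows. The table `HYPOTHETICAL_IGBP_TO_TRMM` contains placeholder codes chosen only for illustration and does not reproduce the library's mapping; `to_trmm_sketch` is likewise a hypothetical helper. Only the error message format is taken from the example above.

```python
import numpy as np

# Placeholder mapping for illustration only; NOT the library's real table
HYPOTHETICAL_IGBP_TO_TRMM = {1: 1, 2: 1, 17: 0}


def to_trmm_sketch(igbp_codes, table):
    """Map IGBP codes to TRMM codes, rejecting codes missing from `table`."""
    codes = np.atleast_1d(np.asarray(igbp_codes, dtype=int))
    invalid = sorted({int(c) for c in codes.ravel() if int(c) not in table})
    if invalid:
        raise ValueError(
            f"Cannot convert IGBP surface type value(s) to TRMM surface type: {invalid}"
        )
    converted = np.array([table[int(c)] for c in codes.ravel()], dtype=int)
    converted = converted.reshape(codes.shape)
    # Scalar in, scalar out; array in, array out
    return int(converted[0]) if np.ndim(igbp_codes) == 0 else converted


single = to_trmm_sketch(1, HYPOTHETICAL_IGBP_TO_TRMM)
several = to_trmm_sketch(np.array([1, 17]), HYPOTHETICAL_IGBP_TO_TRMM)
print(single, several)
```

Collecting all invalid codes before raising lets the error message report every offending value at once, matching the list shown in the documented error.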