# Creating Libera Data Product Files (NetCDF-4)

TODO[LIBSDC-633]: Include discussion here (or in the YAML format description below) of Dimensions, static global attributes, and dynamic attributes.

The Libera SDC provides utilities for creating NetCDF-4 data product files that conform to YAML data product definitions and a few global expectations. These definitions ensure consistency across all Libera data products, including proper metadata, coordinate systems, and data types. Libera Utils includes several ways to work with data product definitions for validation.

## Product Definition YAML Structure

TODO[LIBSDC-623]: Include discussion of dimension names being in a global namespace.

Product definition files specify the exact structure and metadata for data products:

```yaml
# Product level metadata attributes
attributes:
  ProductID: RAD-4CH
  version: 0.0.1

# Coordinate variables (dimensions)
coordinates:
  time:
    dtype: datetime64[ns]
    dimensions: ["time"]
    attributes:
      long_name: "Time of sample collection"
    encoding:
      units: nanoseconds since 1958-01-01
      calendar: standard
      dtype: int64

# Data variables
variables:
  fil_rad:
    dtype: float64
    dimensions: ["time"]
    attributes:
      long_name: Filtered Radiance
      units: W/(m^2*sr*nm)
      valid_range: [0, 1000]
```

The `LiberaDataProductDefinition` system ensures all Libera data products maintain consistent structure, metadata, and quality standards across the mission.

## Basic Usage

The simplest way to create a valid Libera data product is the `write_libera_data_product` function:

```python
import numpy as np
from pathlib import Path

from libera_utils.io.netcdf import write_libera_data_product

# Define the path to the data product definition YAML file
product_definition_path = Path("path/to/product_definition.yml")

# Create numpy data arrays for each required variable and coordinate
n_times = 100
time_data = np.datetime64("2024-01-01") + np.arange(n_times).astype("timedelta64[ns]")
lat_data = np.linspace(-90, 90, n_times)
lon_data = np.linspace(-180, 180, n_times)
fil_rad_data = np.random.rand(n_times) * 1000  # Filtered radiance values
q_flag_data = np.zeros(n_times, dtype=np.int32)  # Quality flags

# Combine all data into a dictionary
data = {
    "time": time_data,
    "lat": lat_data,
    "lon": lon_data,
    "fil_rad": fil_rad_data,
    "q_flag": q_flag_data,
}

# Call write_libera_data_product with the definition, the data, and an output directory
output_dir = Path("/path/to/output/directory")
filename = write_libera_data_product(
    data_product_definition=product_definition_path,
    data=data,
    output_path=output_dir,
    time_variable="time",  # Specify which variable contains time data
    strict=True,  # Enforce strict conformance, raising exceptions on errors
)

print(f"Created data product: {filename.path}")
# Output: Created data product: /path/to/output/directory/LIBERA_L1B_RAD-4CH_V0-0-1_20240101T000000_20240101T002739_R25280215327.nc
```

The function automatically:

- Validates data against the product definition
- Generates a standardized filename based on the time range
- Applies correct encoding and compression settings
- Ensures CF-convention compliance

## Advanced Usage

A few other APIs may be useful for debugging data product creation during development. Once a data product definition is working, the proper approach is to call `write_libera_data_product` with `strict=True` (the default).

### Creation, Enforcement, and Validation for Existing Dataset

You can use the `LiberaDataProductDefinition` class to create, enforce (modify), and validate xarray `Dataset` objects.
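The subsections below assume a definition object `dpd` and an xarray `Dataset` named `dataset` already exist. A minimal setup sketch (the file paths are placeholders):

```python
import xarray as xr
from pathlib import Path

from libera_utils.io.product_definition import LiberaDataProductDefinition

# Load the product definition used throughout the examples below
dpd = LiberaDataProductDefinition.from_yaml(Path("path/to/product_definition.yml"))

# Open an existing dataset to check against the definition
dataset = xr.open_dataset("path/to/existing_product.nc")
```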
#### Validation of Conformance

The `LiberaDataProductDefinition.check_dataset_conformance()` method takes a dataset as an argument and checks that it conforms to the data product definition. The typical usage of this method is in strict mode, which raises an exception on any validation error. For debugging, you can run with `strict=False`, in which case it returns a (possibly empty) list of informational error messages.

```python
# Call LiberaDataProductDefinition.check_dataset_conformance() with strict=False
# so we get a list of error messages rather than a raised exception.
# This checks the conformance of an existing dataset against a data product
# definition without modifying it. Note that the dataset need not be generated
# by create_conforming_dataset; this method accepts _any_ xarray Dataset object.
validation_errors = dpd.check_dataset_conformance(dataset, strict=False)

# Demonstrate how to use the return value
if validation_errors:
    print(f"\nValidation found {len(validation_errors)} issues")
    print("Attempting to fix issues with enforce_dataset_conformance...")
```
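For completeness, here is a sketch of the strict-mode usage described above. The specific exception type raised on a conformance failure is determined by `libera_utils`; the broad `except` below is for illustration only:

```python
# In production, strict mode raises as soon as a validation error is found
try:
    dpd.check_dataset_conformance(dataset, strict=True)
    print("Dataset conforms to the product definition")
except Exception as err:  # the actual exception type comes from libera_utils
    print(f"Dataset failed conformance check: {err}")
    raise
```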
#### Enforcement of Product Definition

The `LiberaDataProductDefinition.enforce_dataset_conformance()` method takes a `Dataset` as an argument and attempts to enforce all data product definition requirements by modifying any necessary attributes, data types, and encodings. Not every problem can be fixed this way, so the method also returns a list of error messages for any problems that could not be fixed.

```python
# Call dpd.enforce_dataset_conformance() to fix problems with a dataset.
# This modifies the dataset in place to fix what it can; some problems,
# such as missing variables, are not fixable.
fixed_dataset, remaining_errors = dpd.enforce_dataset_conformance(dataset)

# Demonstrate how to use the return values
if not remaining_errors:
    print("Successfully fixed all conformance issues!")
    # Save the fixed dataset
    fixed_dataset.to_netcdf("output_fixed.nc")
else:
    print("Some issues could not be automatically fixed")
    print(f"Remaining issues: {len(remaining_errors)}")
    for error in remaining_errors:
        print(f"  - {error}")
```

#### Direct Creation of Conforming Dataset

```python
import numpy as np
from pathlib import Path

from libera_utils.io.product_definition import LiberaDataProductDefinition

# Create a LiberaDataProductDefinition object from a YAML file
definition_path = Path("path/to/product_definition.yml")
dpd = LiberaDataProductDefinition.from_yaml(definition_path)

# Prepare your data as numpy arrays
n_times = 50
data = {
    "time": np.datetime64("2024-03-01") + np.arange(n_times).astype("timedelta64[ns]"),
    "lat": np.linspace(-90, 90, n_times),
    "lon": np.linspace(-180, 180, n_times),
    "fil_rad": np.random.rand(n_times) * 500,
    "q_flag": np.random.randint(0, 3, n_times, dtype=np.uint32),
}

# Create a Dataset with dpd.create_conforming_dataset() with strict=False.
# This allows creation even if some requirements aren't met. The method coerces
# the resulting dataset to conform as closely as possible to the data product
# definition by adding attributes and encodings and coercing data types where possible.
dataset, errors = dpd.create_conforming_dataset(
    data=data,
    strict=False,  # Don't raise exceptions, just report issues
)

# Demonstrate how to use the return values
if not errors:
    print("Dataset is valid and conforms to the product definition")
else:
    print(f"Dataset has {len(errors)} conformance issues:")
    for error in errors[:5]:  # Show first 5 errors
        print(f"  - {error}")
```

### Dynamic Global and Variable Attributes

_Note: L2 developers should not need this. Avoid it unless absolutely necessary!_

When creating datasets, you can specify attributes whose values are not defined in the YAML (i.e. they are null). We call these "dynamic attributes"; they are still required, but their values must be passed directly during dataset creation:

```python
# Set dynamic global attribute values (e.g. algorithm-specific metadata values
# for required attribute keys)
user_global_attrs = {
    "date_created": "2024-03-15T11:22:33",
    "algorithm_version": "2.1.0",
    "processing_level": "L1B",
}

# Set dynamic variable attribute values
# NOTE: these should be avoided unless absolutely necessary
user_var_attrs = {
    "fil_rad": {
        "calibration_date": "2024-01-01",
        "sensor_id": "LIBERA-001",
    },
    "q_flag": {
        "flag_meanings": "good questionable bad",
        "flag_values": [0, 1, 2],
    },
}

# Create dataset with custom attributes
dataset, errors = dpd.create_conforming_dataset(
    data=data,
    user_global_attributes=user_global_attrs,
    user_variable_attributes=user_var_attrs,
    strict=True,
)
```
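For reference, a dynamic attribute appears in the product definition YAML as a key with a null value. A hypothetical excerpt (treating `date_created` as dynamic for illustration):

```yaml
# Product level metadata attributes
attributes:
  ProductID: RAD-4CH
  version: 0.0.1
  date_created: null  # dynamic: value must be supplied via user_global_attributes
```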