# Creating Libera Data Product Files (NetCDF-4)

TODO[LIBSDC-633]: Include discussion here (or in the YAML format description below) of Dimensions, static global attributes, and dynamic attributes.

The Libera SDC provides utilities for creating NetCDF-4 data product files that conform to YAML data product definitions and a few global expectations. These definitions ensure consistency across all Libera data products, including proper metadata, coordinate systems, and data types. Libera Utils includes several ways to work with data product definitions for validation.

## Product Definition YAML Structure

TODO[LIBSDC-623]: Include discussion of dimension names being in a global namespace.

Product definition files specify the exact structure and metadata for data products:

```yaml
# Product level metadata attributes
attributes:
  ProductID: RAD-4CH
  version: 0.0.1

# Coordinate variables (dimensions)
coordinates:
  time:
    dtype: datetime64[ns]
    dimensions: ["time"]
    attributes:
      long_name: "Time of sample collection"
    encoding:
      units: nanoseconds since 1958-01-01
      calendar: standard
      dtype: int64

# Data variables
variables:
  fil_rad:
    dtype: float64
    dimensions: ["time"]
    attributes:
      long_name: Filtered Radiance
      units: W/(m^2*sr*nm)
      valid_range: [0, 1000]
```

The `LiberaDataProductDefinition` system ensures all Libera data products maintain consistent structure, metadata, and quality standards across the mission.

## Basic Usage

The simplest way to create a valid Libera data product is the `write_libera_data_product` function:

```python
import numpy as np
from pathlib import Path

from libera_utils.io.netcdf import write_libera_data_product

# Define the path to the data product definition YAML file
product_definition_path = Path("path/to/product_definition.yml")

# Create numpy data arrays for each required variable and coordinate
n_times = 100
time_data = np.datetime64("2024-01-01") + np.arange(n_times).astype("timedelta64[ns]")
lat_data = np.linspace(-90, 90, n_times)
lon_data = np.linspace(-180, 180, n_times)
fil_rad_data = np.random.rand(n_times) * 1000  # Filtered radiance values
q_flag_data = np.zeros(n_times, dtype=np.int32)  # Quality flags

# Combine all data into a dictionary
data = {
    "time": time_data,
    "lat": lat_data,
    "lon": lon_data,
    "fil_rad": fil_rad_data,
    "q_flag": q_flag_data,
}

# Call write_libera_data_product with the definition, the data, and an output directory
output_dir = Path("/path/to/output/directory")
filename = write_libera_data_product(
    data_product_definition=product_definition_path,
    data=data,
    output_path=output_dir,
    time_variable="time",  # Specify which variable contains time data
    strict=True,  # Enforce strict conformance, raising exceptions on errors
)

print(f"Created data product: {filename.path}")
# Output: Created data product: /path/to/output/directory/LIBERA_L1B_RAD-4CH_V0-0-1_20240101T000000_20240101T002739_R25280215327.nc
```

The function automatically:

- Validates data against the product definition
- Generates a standardized filename based on the time range
- Applies correct encoding and compression settings
- Ensures CF-convention compliance

## Advanced Usage

A few other APIs may be useful for debugging data product creation during development. Once a data product definition is working, the proper approach is to call `write_libera_data_product` with `strict=True` (the default).

### Creation, Enforcement, and Validation for Existing Dataset

You can use the `LiberaDataProductDefinition` class to create, enforce (modify), and validate xarray `Dataset` objects.
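The subsections below assume a definition object `dpd` and an xarray `Dataset` named `dataset` already exist. A minimal setup sketch (the file paths are placeholders):

```python
import xarray as xr
from pathlib import Path

from libera_utils.io.product_definition import LiberaDataProductDefinition

# Load the product definition used throughout the examples below
dpd = LiberaDataProductDefinition.from_yaml(Path("path/to/product_definition.yml"))

# Open an existing dataset to check against the definition
dataset = xr.open_dataset("path/to/existing_product.nc")
```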
#### Validation of Conformance

The `LiberaDataProductDefinition.check_dataset_conformance()` method takes a dataset as an argument and checks that it conforms to the data product definition. The typical usage of this method is in strict mode, which raises an exception on any validation error. For debugging, you can run with `strict=False`, in which case it returns a (possibly empty) list of informational error messages.

```python
# Call LiberaDataProductDefinition.check_dataset_conformance() with strict=False
# so we get a list of error messages rather than a raised exception.
# This checks the conformance of an existing dataset against a data product
# definition without modifying it. Note that the dataset need not be generated
# by create_conforming_dataset; this method accepts _any_ xarray Dataset object.
validation_errors = dpd.check_dataset_conformance(dataset, strict=False)

# Demonstrate how to use the return value
if validation_errors:
    print(f"\nValidation found {len(validation_errors)} issues")
    print("Attempting to fix issues with enforce_dataset_conformance...")
```
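For completeness, here is a sketch of the strict-mode usage described above. The specific exception type raised on a conformance failure is determined by `libera_utils`; the broad `except` below is for illustration only:

```python
# In production, strict mode raises as soon as a validation error is found
try:
    dpd.check_dataset_conformance(dataset, strict=True)
    print("Dataset conforms to the product definition")
except Exception as err:  # the actual exception type comes from libera_utils
    print(f"Dataset failed conformance check: {err}")
    raise
```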
#### Enforcement of Product Definition

The `LiberaDataProductDefinition.enforce_dataset_conformance()` method takes a `Dataset` as an argument and attempts to enforce all data product definition requirements by modifying any necessary attributes, data types, and encodings. Not every problem can be fixed this way, so the method also returns a list of error messages for any problems that could not be fixed.

```python
# Call dpd.enforce_dataset_conformance() to fix problems with a dataset.
# This modifies the dataset in place to fix what it can; some problems,
# such as missing variables, are not fixable.
fixed_dataset, remaining_errors = dpd.enforce_dataset_conformance(dataset)

# Demonstrate how to use the return values
if not remaining_errors:
    print("Successfully fixed all conformance issues!")
    # Save the fixed dataset
    fixed_dataset.to_netcdf("output_fixed.nc")
else:
    print("Some issues could not be automatically fixed")
    print(f"Remaining issues: {len(remaining_errors)}")
    for error in remaining_errors:
        print(f"  - {error}")
```

#### Direct Creation of Conforming Dataset

```python
import numpy as np
from pathlib import Path

from libera_utils.io.product_definition import LiberaDataProductDefinition

# Create a LiberaDataProductDefinition object from a YAML file
definition_path = Path("path/to/product_definition.yml")
dpd = LiberaDataProductDefinition.from_yaml(definition_path)

# Prepare your data as numpy arrays
n_times = 50
data = {
    "time": np.datetime64("2024-03-01") + np.arange(n_times).astype("timedelta64[ns]"),
    "lat": np.linspace(-90, 90, n_times),
    "lon": np.linspace(-180, 180, n_times),
    "fil_rad": np.random.rand(n_times) * 500,
    "q_flag": np.random.randint(0, 3, n_times, dtype=np.uint32),
}

# Create a Dataset with dpd.create_conforming_dataset() with strict=False.
# This allows creation even if some requirements aren't met. The method coerces
# the resulting dataset to conform as closely as possible to the data product
# definition by adding attributes and encodings and coercing data types where possible.
dataset, errors = dpd.create_conforming_dataset(
    data=data,
    strict=False,  # Don't raise exceptions, just report issues
)

# Demonstrate how to use the return values
if not errors:
    print("Dataset is valid and conforms to the product definition")
else:
    print(f"Dataset has {len(errors)} conformance issues:")
    for error in errors[:5]:  # Show first 5 errors
        print(f"  - {error}")
```

### Dynamic Global and Variable Attributes

_Note: L2 developers should not need this. Avoid it unless absolutely necessary!_

When creating datasets, you can specify attributes whose values are not defined in the YAML (i.e. they are null). We call these "dynamic attributes"; they are still required, but their values must be passed directly during dataset creation:

```python
# Set dynamic global attribute values (e.g. algorithm-specific metadata values
# for required attribute keys)
user_global_attrs = {
    "date_created": "2024-03-15T11:22:33",
    "algorithm_version": "2.1.0",
    "processing_level": "L1B",
}

# Set dynamic variable attribute values
# NOTE: these should be avoided unless absolutely necessary
user_var_attrs = {
    "fil_rad": {
        "calibration_date": "2024-01-01",
        "sensor_id": "LIBERA-001",
    },
    "q_flag": {
        "flag_meanings": "good questionable bad",
        "flag_values": [0, 1, 2],
    },
}

# Create dataset with custom attributes
dataset, errors = dpd.create_conforming_dataset(
    data=data,
    user_global_attributes=user_global_attrs,
    user_variable_attributes=user_var_attrs,
    strict=True,
)
```
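For reference, a dynamic attribute appears in the product definition YAML as a key with a null value. A hypothetical excerpt (treating `date_created` as dynamic for illustration):

```yaml
# Product level metadata attributes
attributes:
  ProductID: RAD-4CH
  version: 0.0.1
  date_created: null  # dynamic: value must be supplied via user_global_attributes
```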