Libera Data Models and Database Schema in DynamoDB on AWS#

Under Review (Remove before release)#

This document provides an overview of the data models used in the Libera databases in AWS DynamoDB. The Libera databases are designed to take advantage of the DynamoDB pricing model to minimize costs, and to use the serverless nature of DynamoDB to support an asynchronous, event-driven architecture.

If you are unfamiliar with DynamoDB, see DynamoDB Basics in AWS for an introduction to the service before reading this page.

This documentation provides examples of the data models for the different DynamoDB tables used in the Libera SDC. Each table shown here describes the key-value structure of a DynamoDB table and how it is used in the Libera SDC, using the following layout:

Partition Key	Sort Key	Attribute 1 Key	Attribute 2 Key	…	Attribute N Key
PK Value Example	SK Value Example	Value1 Example	Value2 Example	…	ValueN Ex.
PK Description	SK Description	Attribute 1 Desc	Attribute 2 Desc	…	Attribute N Desc

Note: Many tables in the Libera SDC use a generic key-value entry for the two keys (partition and sort) as follows:

Partition Key - “PK”: “value”
Sort Key - “SK”: “value”

This is in contrast to more specifically named keys, which are easier for a human to read but can restrict the DynamoDB table design with respect to vertical partitioning and logical grouping of data. For example, the key set below is more human-readable. It is not used in the Libera SDC, but it illustrates the logical role that the keys play.

Partition Key - “Filename”: “Example_file.nc”
Sort Key - “Filetype”: “L0”

This generic use of “PK” and “SK” is a common pattern in DynamoDB. It keeps access to the data in the table simple and allows greater flexibility in the logical design of the data model, because the key names carry no semantics that could become confusing.
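The pattern can be illustrated with a minimal sketch in plain Python (no AWS calls). It shows two different kinds of record for one file sharing the same generic “PK” value and being distinguished only by “SK”. The helper name `make_item` is hypothetical and is not taken from the Libera SDC code; the values come from the examples in this document.

```python
from collections import defaultdict


def make_item(pk: str, sk: str, attributes: dict) -> dict:
    """Build one item using the generic "PK" and "SK" key names.

    Hypothetical helper for illustration only.
    """
    return {"PK": pk, "SK": sk, **attributes}


# Two different kinds of record for the same file can live side by side,
# because the key names themselves carry no meaning.
items = [
    make_item("Example_file.nc", "#", {"archive-time": "2024-01-01 00:00:00"}),
    make_item("Example_file.nc", "#L0#SPICE#AZ", {"applicable-date": "2024-01-01"}),
]

# Group sort keys by partition key to see the logical grouping for each file.
sort_keys_by_file = defaultdict(list)
for item in items:
    sort_keys_by_file[item["PK"]].append(item["SK"])

print(dict(sort_keys_by_file))
```

With specifically named keys such as “Filename” and “Filetype”, the first item above would need a “Filetype” value even though it is not describing a file type.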

Libera Metadata and Provenance Database#

Metadata Data Model#

This database is used to store metadata and provenance information about the data files generated by the Libera SDC. The metadata for each file can broadly be broken down into three categories that are all stored together in the same table:

File metadata about the file itself, such as the filename, the time it was archived, and the version of the algorithm used to generate the data file
Sortable metadata about the file, such as the applicable date of the data in the file, calibration versions used, and other metadata that can be used to sort and filter the data in other processing steps
Additional metadata that is specific to a given data level. For example, the construction records for level 0 data files contain a wide array of additional metadata that is not present in higher level data products.

Minimum Database Examples#

Additional attributes may be added as needed for specific data products; the examples below show the expected minimum.

File Metadata Example#

PK	SK	archive-time	algorithm-version
“Example_file.nc”	“#”	“2024-01-01 00:00:00”	“1.0.0”
Unique Filename	Placeholder	Archive Time	Algorithm Version

Sortable Metadata Example#

PK	SK	applicable-date
“Example_file.nc”	“#L0#SPICE#AZ”	“2024-01-01”
Unique Filename	File Type Identifier	Applicable Date of Data
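The sort key in the example above joins several identifiers with “#”. The following is a minimal sketch of building and splitting such keys, assuming only that “#” is the delimiter and that the bare “#” is the placeholder used for file metadata. The function names are hypothetical, and the meaning of each segment is not specified here.

```python
from typing import List


def build_sort_key(*parts: str) -> str:
    """Join identifier parts into a "#"-delimited sort key.

    With no parts, this returns the bare placeholder "#".
    """
    for part in parts:
        if not part or "#" in part:
            raise ValueError(f"invalid sort key part: {part!r}")
    return "#" + "#".join(parts)


def split_sort_key(sort_key: str) -> List[str]:
    """Split a "#"-delimited sort key back into its parts.

    The bare placeholder "#" yields an empty list.
    """
    if not sort_key.startswith("#"):
        raise ValueError(f"sort key must start with '#': {sort_key!r}")
    return [part for part in sort_key[1:].split("#") if part]


print(build_sort_key("L0", "SPICE", "AZ"))
print(split_sort_key("#L0#SPICE#AZ"))
```

Keeping the broadest identifier first means that a prefix match on the sort key (for example, everything starting with “#L0”) selects a whole family of items.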

Additional Metadata Example#

No minimum requirement.

Level 0 (L0) Data Specifics#

These are the raw binary data files received from ASDC. They arrive as two separate files: construction records and PDS files.

Construction Record (CR)#

CR File Metadata#

PK	SK	ingest-time	archive-time	algorithm-version
“L0_CONS_file.PDS”	“#”	“2024-01-01 00:00:00”	“2024-01-01 00:01:00”	“1.0.0”
Unique Filename	Placeholder	Ingest Time	Archive Time	Algorithm Version

CR Sortable Metadata#

PK	SK	applicable-date	first-packet-time	last-packet-time	missing-packet-count	filled-gap-count
“L0_CONS_file.PDS”	“#L0#APID11”	“2024-01-01”	“2024-01-01 00:00:00”	“2024-01-01 00:00:00”	0	0
Unique Filename	File Type Identifier	Applicable Date of Data	First Packet Time	Last Packet Time	Missing Packet Count	Filled Gap Count

CR Additional Metadata (Not Comprehensive of all Entries)#

PK	SK	filename	edos-version	SCID	APID	…
“L0_CONS_file.PDS”	“#PDS”	“L0_PDS_file.PDS”			
Unique Filename	Identifier	PDS Filename			

PK	SK	filename	edos-version	SCID	APID	…
“L0_CONS_file.PDS”	“#APID11”		“1.0.0”	1	11	…
Unique Filename	Identifier		EDOS Version	SCID	APID	…

The example above shows only two items. The construction record file has approximately 10 more items associated with it that are stored in this table. See the io/pds.py file for the full details of the items stored for a construction record.
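Because all items for a construction record share the same PK, they can be retrieved together with a single Query on the partition key. The sketch below only builds the keyword arguments for the low-level DynamoDB Query operation and makes no AWS call. The table name `libera-metadata` and the function name are assumptions for illustration, not names from the Libera SDC.

```python
def build_item_collection_query(table_name: str, filename: str) -> dict:
    """Return keyword arguments for a low-level DynamoDB Query that
    fetches every item stored under one filename (PK).

    Hypothetical helper; the real implementation may differ.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        # Low-level API values are typed; {"S": ...} marks a string.
        "ExpressionAttributeValues": {":pk": {"S": filename}},
    }


# "libera-metadata" is a placeholder table name.
query_kwargs = build_item_collection_query("libera-metadata", "L0_CONS_file.PDS")
print(query_kwargs)

# With boto3 installed and AWS credentials configured, these arguments
# could be passed to the client as:
#   boto3.client("dynamodb").query(**query_kwargs)
```

A single Query response is limited in size, so a complete implementation would also follow `LastEvaluatedKey` to page through the results.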

PDS File#

PDS File Metadata#

This is the same as the construction record file metadata.

PK	SK	ingest-time	archive-time	algorithm-version
“L0_file.nc”	“#”	“2024-01-01 00:00:00”	“2024-01-01 00:01:00”	“1.0.0”
Unique Filename	Placeholder	Ingest Time	Archive Time	Algorithm Version

PDS Sortable Metadata#

This is covered in the construction record sortable metadata.

PDS Additional Metadata#

This is covered by the construction record additional metadata.

Higher Level Data Products#

TBD

Calibration Data Products#

Cal File Metadata#

PK	SK	archive-time	calibration-version
“Example_calibration_file.nc”	“#CAL#L0”	“2024-01-01 00:00:00”	“1.0.0”
Unique Filename	Identifier	Archive Time	Calibration Version

Cal Sortable Metadata#

Not applicable.

Cal Additional Metadata#

TBD.

Using this Data Model with a Global Secondary Index (GSI)#

The Libera databases use a global secondary index (GSI) to query the data in the table efficiently using an alternate key. GSIs define specific “use cases” that allow the data to be queried in ways other than through the standard primary key of PK + SK.

NOTE: Unlike the primary key (PK + SK), the key values of a GSI are not required to be unique.

A GSI is built from attributes that already exist in the table. It provides a more efficient way to access the data than scanning the whole table and sorting the results yourself.
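Conceptually, a GSI regroups the same items under a different key. The sketch below mimics this in memory with plain Python; it is an illustration of the idea, not of how DynamoDB implements indexes. The function name `project_index` is hypothetical, and the items reuse example values from this document.

```python
from collections import defaultdict


def project_index(items: list, index_key: str) -> dict:
    """Group base-table items by an alternate key, mimicking a GSI.

    Items that lack the index key attribute are left out of the index,
    and several items may share the same index key value.
    """
    index = defaultdict(list)
    for item in items:
        if index_key in item:
            index[item[index_key]].append(item)
    return dict(index)


table = [
    {"PK": "L0_CONS_file.PDS", "SK": "#", "archive-time": "2024-01-01 00:01:00"},
    {"PK": "L0_CONS_file.PDS", "SK": "#L0#APID11", "applicable-date": "2024-01-01"},
    {"PK": "Example_file.nc", "SK": "#L0#SPICE#AZ", "applicable-date": "2024-01-01"},
]

by_date = project_index(table, "applicable-date")
filenames = [item["PK"] for item in by_date["2024-01-01"]]
print(filenames)
```

Only the two items that carry an applicable-date attribute appear in the index, and both are found under the same date without scanning the other item.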

Libera Metadata Database GSIs#

Date searching GSI - This GSI is used to search for all files associated with a specific date and to retrieve the filenames that match a particular type identifier.

applicable-date	SK	PK
“2024-01-01”	“#L0”	“L0_CONS_file.PDS”
Applicable Date	Type Identifier	Unique Filename
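A query against the date searching GSI might look like the following sketch. It assumes the index uses applicable-date as its partition key and SK as its sort key, as the table above suggests. The index name `applicable-date-index`, the table name `libera-metadata`, and the function name are hypothetical. Because the attribute name applicable-date contains a hyphen, it is referenced through an expression attribute name placeholder.

```python
def build_date_query(table_name: str, index_name: str,
                     applicable_date: str, sk_prefix: str) -> dict:
    """Return keyword arguments for a low-level DynamoDB Query on a GSI
    keyed by applicable-date (partition) and SK (sort).

    Hypothetical helper; index and table names are placeholders.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # "#d" stands in for the hyphenated attribute name.
        "KeyConditionExpression": "#d = :d AND begins_with(SK, :prefix)",
        "ExpressionAttributeNames": {"#d": "applicable-date"},
        "ExpressionAttributeValues": {
            ":d": {"S": applicable_date},
            ":prefix": {"S": sk_prefix},
        },
    }


date_query = build_date_query(
    "libera-metadata", "applicable-date-index", "2024-01-01", "#L0"
)
print(date_query)

# With boto3 installed and AWS credentials configured:
#   boto3.client("dynamodb").query(**date_query)
```

The `begins_with` condition lets a short prefix such as “#L0” match longer sort keys such as “#L0#APID11”; the filenames are then read from the PK attribute of each returned item.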

Calibration Version searching GSI (TBD) - This GSI is used to search for all files that have a specific calibration identifier and to retrieve the version associated with each of them.

SK	PK	version
“#Cal”	“Example_calibration_file.nc”	“1.0.0”
Type Identifier	Unique Filename	Calibration Version

Additional GSIs are TBD.

Implementation#

Implementation specifics for general data products live in db/dynamodb_utils.py and are under active development.

Specifics for level 0 (L0) raw data files, that is, the construction records and PDS files received from ASDC, are in the io/pds.py file. They are used and refined in io/pds-ingest.py.

The handling of this data model for higher level products is TBD; the basic structure and tools are in place in the db/dynamodb_utils.py file.