The IAEA Marine Radioactivity Information System (MARIS) provides open access to radioactivity measurements in marine environments. Developed by the IAEA Marine Environmental Laboratories in Monaco, MARIS offers data on seawater, biota, sediment, and suspended matter.
This Python package includes command-line tools to convert MARIS
datasets into NetCDF
or .csv formats, enhancing compatibility with various scientific and
data analysis software.
marisco is built around the concept of handlers, specialized modules
designed to convert MARIS datasets into NetCDF format. Each handler is
tailored to a specific data provider and implemented as a dedicated
Jupyter notebook.
We’ve adopted a Literate Programming approach, which means:
- Documentation: Each handler serves as comprehensive documentation.
- Code Reference: The notebooks contain the actual implementation code.
- Communication Tool: They facilitate discussions with data providers about discrepancies or inconsistencies.
To achieve this, we leverage nbdev, a powerful tool that allows us to:
- Write code within Jupyter notebooks
- Automatically export relevant parts as dedicated Python modules
This approach bridges the gap between documentation and implementation, ensuring they remain in sync.
For a concrete example of this approach, check out our GEOTRACES dataset handler implementation.
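As a rough illustration of the notebook-to-module export described above, a handler notebook might contain cells like the ones sketched below. `#| default_exp` and `#| export` are standard nbdev directives; the module name `handlers.example` and the helper `clean_station_name` are hypothetical and are not part of marisco.

```python
# --- notebook cell 1 ---
#| default_exp handlers.example
# The directive above names the Python module this notebook exports to
# (the module name here is a made-up placeholder).

# --- notebook cell 2 ---
#| export
def clean_station_name(name: str) -> str:
    """Collapse repeated whitespace in a provider-supplied station name."""
    # Hypothetical helper for illustration only; not a marisco function.
    return " ".join(name.split())

# --- notebook cell 3 ---
# A cell without `#| export` stays in the notebook, where it serves as
# documentation or as a quick check of the exported code.
assert clean_station_name("  BY15   Gotland ") == "BY15 Gotland"
```

Only the cell marked `#| export` would end up in the generated module; the surrounding prose and checks remain in the notebook, which is what keeps documentation and implementation side by side.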
MARISCO includes a suite of specialized data handlers designed to:
- Convert provider-specific data formats into standardized MARIS NetCDF files
- Ensure data quality and consistency across providers
- Facilitate integration with the MARIS marine radioactivity database
- Support automated data processing workflows
The following handlers are currently implemented:
| Handler | Description | Link to Data Source |
|---|---|---|
| MARIS Legacy | All legacy MARIS datasets from the MARIS Master Database | - |
| HELCOM | HELCOM marine environment protection datasets | HELCOM |
| OSPAR | OSPAR marine environment datasets | ODIMS OSPAR |
| TEPCO | TEPCO Fukushima monitoring data | TEPCO Monitoring |
| GEOTRACES | BODC GEOTRACES oceanographic data | GEOTRACES IDP2021 |
To install marisco, run:
pip install marisco
You need to set up your Zotero API key. marisco automatically
retrieves bibliographic metadata for MARIS datasets from
Zotero.
To do so, define the following environment variable containing the MARIS Zotero API key:
export ZOTERO_API_KEY=your_api_key_here
Important
Please contact MARIS Administrators to get your API key.
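When marisco is driven from a script, it can help to check that the variable is defined before starting a long conversion. The sketch below is one possible pre-flight check; the function name `get_zotero_api_key` is hypothetical and is not part of marisco, and only the `ZOTERO_API_KEY` variable name comes from the instructions above.

```python
import os


def get_zotero_api_key(env=None):
    """Return the value of ZOTERO_API_KEY, or raise if it is missing or empty."""
    # `env` defaults to the real process environment; passing a dict makes
    # the check easy to exercise without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("ZOTERO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ZOTERO_API_KEY is not set; export it before running marisco commands."
        )
    return key
```

Calling `get_zotero_api_key()` at the top of a batch script would fail fast with a readable message instead of part-way through a conversion.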
Every command accepts a -h argument that prints its documentation.
Convert helcom, geotraces, tepco or ospar marine radioactivity
datasets to MARIS NetCDF4 format.
usage: maris_to_nc [-h] [--src SRC] ds dest
positional arguments:
ds Name of the dataset to encode as NetCDF4
dest Output path for NetCDF file
options:
-h, --help show this help message and exit
--src SRC Optional input data path only required for the 'GEOTRACES' dataset
For instance: maris_to_nc ospar 191-OSPAR-2024.nc
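For automated workflows, the same invocation can be assembled from Python. The following is a minimal sketch that only builds the argument list from the usage text above; the helper name `maris_to_nc_cmd` and the GEOTRACES input path are hypothetical placeholders.

```python
import shlex


def maris_to_nc_cmd(ds, dest, src=None):
    """Build the argument list for `maris_to_nc [--src SRC] ds dest`."""
    cmd = ["maris_to_nc"]
    if src is not None:
        # Per the help text, --src is only required for the GEOTRACES dataset.
        cmd += ["--src", str(src)]
    cmd += [ds, str(dest)]
    return cmd


# Same call as the example above.
print(shlex.join(maris_to_nc_cmd("ospar", "191-OSPAR-2024.nc")))
# prints: maris_to_nc ospar 191-OSPAR-2024.nc

# GEOTRACES needs an explicit input path (placeholder file names shown here).
geotraces_cmd = maris_to_nc_cmd("geotraces", "geotraces.nc", src="geotraces-input.csv")

# To actually run a command, with marisco installed:
#   import subprocess
#   subprocess.run(geotraces_cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues when paths contain spaces.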
The MARIS Master Database integrates two types of datasets:
- Historical datasets retrieved from published scientific papers
- Ongoing monitoring data from international programs like
HELCOM, OSPAR, TEPCO, and GEOTRACES
This command-line utility converts MARIS datasets from their legacy format to NetCDF4, making them more accessible for modern data analysis workflows. Users can either convert the entire database or specify particular datasets by their reference IDs for selective conversion.
usage: maris_db_to_nc [-h] [--ref_ids REF_IDS] src dest
Convert MARIS legacy database to NetCDF4 format. If ref_ids is provided as comma-separated values, only encodes those subsets.
positional arguments:
src Path to MARIS database dump as `.txt` file
dest Output path for NetCDF file(s)
options:
-h, --help show this help message and exit
--ref_ids REF_IDS Optional comma-separated reference IDs (e.g., "123,456,789") (default: )
For instance:
maris_db_to_nc "~/pro/data/maris/2024-11-20 MARIS_QA_shapetype_id=1.txt" ~/pro/tmp/output
or
maris_db_to_nc "~/pro/data/maris/2024-11-20 MARIS_QA_shapetype_id=1.txt" ~/pro/tmp/output --ref_ids="16,30"
for a subset of the MARIS Master Database.
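When the reference IDs come from a script rather than being typed by hand, the comma-separated value for --ref_ids can be generated. The sketch below builds the argument list only; `maris_db_to_nc_cmd` is a hypothetical helper name. It also expands `~` in Python, because POSIX shells do not expand a tilde inside double quotes and the text above does not say whether the tool expands it itself.

```python
from pathlib import Path


def maris_db_to_nc_cmd(src, dest, ref_ids=()):
    """Build the argument list for `maris_db_to_nc src dest [--ref_ids=...]`."""
    cmd = [
        "maris_db_to_nc",
        str(Path(src).expanduser()),   # resolve a leading "~" explicitly
        str(Path(dest).expanduser()),
    ]
    if ref_ids:
        # The help text expects comma-separated reference IDs, e.g. "16,30".
        cmd.append("--ref_ids=" + ",".join(str(i) for i in ref_ids))
    return cmd


# Whole database, then a two-dataset subset.
full = maris_db_to_nc_cmd("dump.txt", "output")
subset = maris_db_to_nc_cmd("dump.txt", "output", ref_ids=[16, 30])

# To run either command, with marisco installed:
#   import subprocess
#   subprocess.run(subset, check=True)
```

Omitting `ref_ids` leaves the option off entirely, which per the help text converts the entire database.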
This utility converts NetCDF files to CSV files that conform to the MARIS Standard format, originally designed for OpenRefine workflows.
Although MARISCO has now superseded OpenRefine in the data preparation pipeline, the MARIS master database continues to require CSV inputs in this legacy format. This command-line utility, built with the MARISCO library, handles the conversion process.
usage: maris_nc_to_csv [-h] src dest
Converts NetCDF files into CSV files that follow the MARIS Standard format.
positional arguments:
src Input path and filename for NetCDF file
dest Output path and filename (without extension) for CSV file
options:
-h, --help show this help message and exit
For instance:
maris_nc_to_csv ~/pro/tmp/output/191-OSPAR-2024.nc ~/pro/tmp/output/191-OSPAR-2024
Tip
When specifying the destination path (e.g.,
~/pro/tmp/output/191-OSPAR-2024), the utility automatically appends
the MARIS sample type to the filename. For example:
191-OSPAR-2024_BIOTA.csv for biological samples
While this specific example produces only a BIOTA file, the utility can generate multiple files (one per sample type) depending on the content of the source dataset. This reflects the NetCDF4 file structure, where each MARIS sample type is stored as a separate group within the file.
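The naming rule in the tip can be sketched as a small function, which is handy when a downstream script needs to know which CSV files to look for. The helper name `expected_csv_names` is hypothetical, and apart from `BIOTA` the sample-type labels shown are assumptions about how the other MARIS sample types are spelled in file names.

```python
def expected_csv_names(dest, sample_types):
    """Return the CSV file names implied by the `<dest>_<SAMPLE_TYPE>.csv` rule."""
    return [f"{dest}_{sample_type}.csv" for sample_type in sample_types]


# Matches the example in the tip above.
print(expected_csv_names("191-OSPAR-2024", ["BIOTA"]))
# prints: ['191-OSPAR-2024_BIOTA.csv']

# "SEAWATER" and "SEDIMENT" are assumed labels, used for illustration only.
multi = expected_csv_names("191-OSPAR-2024", ["BIOTA", "SEAWATER", "SEDIMENT"])
```

In practice the list of sample types would come from the groups present in the source file, which the netCDF4 Python library exposes through the `groups` attribute of an open `Dataset`.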
Documentation is organised into two groups:
Practical walkthroughs for common tasks:
- Writing a new handler: step-by-step guide to adding a new data provider to the marisco pipeline
- Nomenclature reconciliation: repeatable procedure for mapping provider names (nuclides, species, units, …) to MARIS standard identifiers
Detailed specifications and reference material:
- MARIS Data Guide: overview of sample types, measurement fields, nomenclature, curation pipeline, and available datasets; aimed at data providers and data users
- Field Definitions: complete field-by-field reference with MARISCO column names, NetCDF variable names, CSV variable names, types, and lookup tables
- Data Curation Rules: rules applied during data curation across all handlers
- Enum Rules: enumeration value handling rules
- Sample ID Coverage: coverage analysis of sample identifiers
- Sample Uniqueness: sample uniqueness constraints and validation
The MARIS NetCDF template is generated from the
nbs/api/files/cdl/maris.cdl Common Data Language (CDL) file, as defined
by Unidata. During development, to
regenerate the MARIS NetCDF template nbs/files/nc/maris-template.nc:
- install the NetCDF-C utilities
- once in the Marisco home directory, run:
ncgen -4 -o nbs/files/nc/maris-template.nc nbs/files/cdl/maris.cdl
In-depth guidance for contributors is captured in the project’s CRAFT
file (at the repository root) and optional CRAFTs under the CRAFTs/
folder. These are auto-loaded by SolveitAI when working with this
codebase and cover:
- Project overview, architecture and setup (root CRAFT)
- Coding style and abbreviations (CRAFTs/coding-style-abbr.ipynb): naming conventions, abbreviations, and fastai/fastcore idioms used in this codebase
- Handler documentation guide (CRAFTs/handler-doc-style.ipynb): documentation style and template for handlers
- Software design principles (CRAFTs/sicp-design-memento.ipynb): high-level architecture, abstraction layers, and system design
Development of this package was supported by the Solveit platform, an interactive development environment for dialog-driven software engineering.