Modular metadata schema components for documenting geochemical analytical Methods and Datasets. Built using the OGC Building Blocks pattern.
The scheme involves three components:
- A Technique-Aligned Protocol Profile (TAPP) that defines an analytical procedure, including the kinds of samples used, target analytes, instruments used, sample preparation, analysis workflow, and data reduction. In the TAPP definition, some of these properties are fixed, some have default values, and some are expected to be specified at the individual session level. The fixed properties are the necessary properties that define the TAPP. There are also properties that apply at the analytical session (or 'analysis event') level, and properties that are specific to the description of individual analytes. The authoritative protocol definition is in an Excel workbook. For discussion purposes, the label 'property' is used for properties in the TAPP that are fixed, and 'parameter' for properties that may be adjusted at the session level. Parameters may have default values specified in the TAPP definition.
- A building block JSON schema specific to the protocol. This protocol definition object is registered in a protocol registry and accessible via its URI. The TAPP definition is referenced as a measurementTechnique in dataset metadata.
- A technique-specific 'detail' building block JSON schema that defines the parameters that may be assigned values at the individual dataset level. There is one detail block per technique under _sources/analysisSpecificDetails/ (e.g. detailEMPA), not a single 'details' file. The content of this schema is included in the schema for dataset instances to create a metadata schema for Datasets conforming to the profile. Session-level and per-analyte parameters are defined once in a registered parameter registry (parameterValues) and referenced from the detail blocks by URI, so a parameter can be reused across detail definitions; the references are resolved inline into the published resolved schema.
_sources/techniqueProtocols/
  analyteColumns/      ← registered BB: schema:PropertyValueSpecification $defs per analyte column
  parameterTemplates/  ← registered BB: PropertyValueSpecification $defs (readOnly:true params)
  parameterValues/     ← registered BB: schema:PropertyValue $defs (readOnly:false params)
  vocab/               ← catalog: schema:DefinedTermSet files (referenced by @id, not $ref)
  tappDefinition/      ← base TAPP definition (JSON-LD class ada:TAPPDefinition)
  empaTAPP/            ← concrete TAPP profile (EMPA)
  laicpmsTAPP/         ← concrete TAPP profile (LA-ICP-MS)
  <future>TAPP/        ← additional TAPPs `$ref` the catalogs above
analyteColumns, parameterTemplates, and parameterValues are each a registered type-library building block (bblock.json with isTypeLibrary: true): every entry lives as a named $def in the catalog's schema.yaml, and TAPP / detail blocks reference them by URI fragment ($ref: …/<catalog>/schema.yaml#/$defs/<name>). Because they are registered, the OGC bblocks annotate step resolves those refs locally via the register and inlines them into resolvedSchema.json. This matters: a loose helper file (a plain <name>.json not inside a registered BB) is instead fetched from the published gh-pages URL, which 404s on moved or unpublished paths (process-bblocks.yml sets skip-pages: true, so gh-pages never auto-updates) — that fragility is why the catalogs were promoted to registered BBs. vocab/ is the exception: it stays a plain catalog of schema:DefinedTermSet files because it is referenced only by JSON-LD @id (schema:inDefinedTermSet), never by $ref, so the annotate step never fetches it.
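The fragment-style reference described above can be illustrated with a small sketch. This is not the OGC bblocks annotate step: the `REGISTER` dict, the `example.org` URL, and the `beamCurrent` entry are all made up for illustration, and the sketch simply replaces each `$ref` with its target.

```python
# Hypothetical in-memory register: catalog URL -> parsed schema document.
REGISTER = {
    "https://example.org/parameterValues/schema.yaml": {
        "$defs": {
            "beamCurrent": {"type": "object", "description": "illustrative entry"},
        }
    }
}


def resolve_ref(ref: str, register: dict) -> dict:
    """Resolve '<url>#/$defs/<name>' against locally registered catalogs."""
    url, _, fragment = ref.partition("#")
    if url not in register:
        # A loose helper file would need a network fetch at this point,
        # which is the fragile path the paragraph above describes.
        raise LookupError(f"{url} is not a registered catalog")
    node = register[url]
    # Walk the JSON Pointer fragment, e.g. '/$defs/beamCurrent'.
    for token in fragment.lstrip("/").split("/"):
        if token:
            # Undo JSON Pointer escaping (~1 -> '/', ~0 -> '~').
            node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def inline_refs(schema, register: dict):
    """Return a copy of `schema` with every resolvable $ref replaced inline."""
    if isinstance(schema, dict):
        if "$ref" in schema:
            return inline_refs(resolve_ref(schema["$ref"], register), register)
        return {key: inline_refs(value, register) for key, value in schema.items()}
    if isinstance(schema, list):
        return [inline_refs(item, register) for item in schema]
    return schema
```

A reference to a URL that is not in the register raises `LookupError` instead of reaching for the network, which mirrors why registering the catalogs removes the dependency on published paths.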
The catalog dirs are shared dictionary resources — multiple TAPP profiles $ref the same files when their definitions match. The tooling's share_or_write_catalog helper lets a TAPP regen overwrite its own entries (matched by $id ownership) but errors out on a collision with an entry originated by a different TAPP, so a new TAPP either reuses identical catalog entries or surfaces a renaming requirement.
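The reuse / overwrite / collide rule can be sketched as follows. This is not the code of share_or_write_catalog: the function name `share_or_write_sketch`, the `owner` bookkeeping, and the sample entries are hypothetical, and the real helper is described above as matching ownership by $id.

```python
class CatalogCollision(Exception):
    """Raised when a TAPP would overwrite an entry originated by another TAPP."""


def share_or_write_sketch(catalog: dict, name: str, entry: dict, owner: str) -> str:
    """Apply the rule to one catalog entry and report what happened.

    `catalog` maps entry name -> {"owner": <TAPP name>, "entry": <schema dict>}.
    """
    existing = catalog.get(name)
    if existing is None:
        catalog[name] = {"owner": owner, "entry": entry}
        return "written"
    if existing["entry"] == entry:
        # Identical definition: any TAPP may reuse it unchanged.
        return "shared"
    if existing["owner"] == owner:
        # A TAPP regeneration may overwrite the entries it originated.
        existing["entry"] = entry
        return "overwritten"
    raise CatalogCollision(
        f"{name!r} was originated by {existing['owner']}; "
        f"{owner} must reuse it unchanged or pick a new name"
    )
```

Raising on a mismatch, rather than silently overwriting, is what turns a conflicting definition into an explicit renaming decision.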
- tappDefinition: base TAPP. Defines the WorkflowHowTo / WorkflowStep / MethodParameter / AnalyteColumn / AnalyteIdentifierColumn $defs that concrete TAPP profiles extend.
- empaTAPP: Electron Microprobe Analysis. Extends tappDefinition via allOf with EPMA top-level properties + ada:methodParameters / ada:analyteTemplate.ada:analyteColumns constraints referencing the shared catalogs. Generated from docs/TAPP_EPMA_filled.xlsx (the canonical TAPP template). 11 examples ship with the BB (10 publication-derived instances + a comprehensive synthetic example).
- laicpmsTAPP: LA-ICP-MS. Built from docs/TAPP_LAICPMS_filled.xlsx (reshaped from the LA-ICPMS_TAPP_v8.xlsx source workbook).
The 16 technique-specific detail blocks live under _sources/analysisSpecificDetails/ (detailEMPA, detailXRD, detailARGT, …). Each pairs with a TAPP definition and carries the per-instance values. (The old details umbrella BB — an anyOf over the detail blocks — was unused and has been removed.)
- detailEMPA: paired with empaTAPP. Carries readOnly:false parameter values as schema:additionalProperty[] PropertyValue entries. The schema:additionalProperty constraint is inline in detailEMPA's allOf (no separate parametersConstraint.yaml), with anyOf branches $ref-ing the parameterValues registry $defs. References the empaTAPP definition via schema:measurementTechnique anyOf (by @id ref or inline). 11 paired examples (exampledetailEMPA-P1.json … -P10.json + -all.json).
The split was made on 2026-04-28: parameters in the TAPP spreadsheet route to empaTAPP/methodParameters[] (readOnly:true) or detailEMPA/schema:additionalProperty[] (readOnly:false). Method-level constants (the ada:xxxDefault top-level properties) stay on the TAPP.
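The routing rule can be shown with a minimal sketch. The row dictionaries, the parameter names, and the choice to treat a missing readOnly flag as session-level are assumptions for illustration; the actual parsing lives in the spreadsheet tooling.

```python
def route_parameters(params: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split parameter rows into (methodParameters, additionalProperty).

    readOnly:true rows stay on the TAPP; readOnly:false rows go to the
    detail block. Rows without a readOnly key are treated as readOnly:false
    here, which is an assumption of this sketch.
    """
    method_parameters = [p for p in params if p.get("readOnly", False)]
    additional_property = [p for p in params if not p.get("readOnly", False)]
    return method_parameters, additional_property


# Hypothetical rows, not taken from the EPMA workbook.
rows = [
    {"name": "acceleratingVoltage", "readOnly": True},
    {"name": "beamDiameter", "readOnly": False},
]
tapp_side, detail_side = route_parameters(rows)
```

With these rows, `tapp_side` holds the acceleratingVoltage row and `detail_side` holds the beamDiameter row.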
profiles/geochemProfiles/ (alongside profiles/adaProfiles/) holds technique profiles that compose a TAPP definition + detail block on top of adaProduct:
- empaProfile: extends adaProduct with schema:measurementTechnique anyOf pointing at empaTAPP and a schema:distribution.schema:hasPart branch that lets detailEMPA appear.
- LA-ICPMS: extends adaProduct with schema:measurementTechnique wired to laicpmsTAPP.
Property building blocks that define ADA-specific metadata elements: file types, instrument details, technique-specific data structures, spatial registration, and more. (The technique-specific detail<XXX> blocks that previously lived here now reside under analysisSpecificDetails/.)
Key building blocks that extend CDIF core BBs:
- instrument: extends core CDIF instrument (schema:Product with nxs:BaseClass/NXinstrument in additionalType)
- laboratory: extends core CDIF spatialExtent (schema:Place with nxs:BaseClass/NXsource in additionalType)
Metadata profiles that compose property building blocks with CDIF base schemas:
- adaProduct: base ADA product profile, composes via allOf:
  - cdifCore: core metadata properties
  - cdifDataDescription: variableMeasured with DDI-CDI extensions, @id requirement
  - cdifArchiveDistribution: archive distribution with hasPart component files
  - cdifProvenance: prov:wasGeneratedBy provenance activities
  - ADA-specific: technique types, instrument/lab/sample overlays, ada:componentType
- 35 technique profiles: technique-specific constraints on ada:componentType values (e.g., adaSEM, adaXRD, adaICPMS, adaTEM)
Each archive hasPart item carries an ada:componentType (a single string like ada:EMPAImageMap) that classifies the file. The architecture enforces a two-level constraint:
- File type ↔ componentType mapping: each file-type building block (image, imageMap, tabularData, collection, dataCube, document, supDocImage, otherFile) declares a sealed enum of valid componentType values. The enum is derived from the Components worksheet of amds-ldeo/metadata/ADA-AnalyticalMethodsAndAttributes.xlsx (the canonical mapping). E.g. ada:EMPAImageMap is valid only on parts whose @type includes ada:imageMap.
- Profile-level constraint: a technique profile's schema:hasPart.items uses a schema-level anyOf with three kinds of branch, namely (a) $ref to adaProduct/$defs/universalComponentTypeBranch (factored once, used everywhere) for universal componentTypes; (b) inline string-enum for technique-specific componentTypes that have no detail block; (c) $ref to a technique-specific detail schema (e.g. detailEMPA) which pins ada:componentType to its technique consts and contributes detail-specific sibling properties (e.g. ada:spectrometersUsed, ada:signalUsed) flat on the hasPart item, not nested inside componentType.
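The two levels can be pictured as two independent membership checks on each hasPart item. The sketch below is plain Python rather than JSON Schema, and apart from the ada:EMPAImageMap / ada:imageMap pairing mentioned above, every value in it (including ada:ExampleTable) is a placeholder; the real enums come from the Components worksheet.

```python
# Level 1: file type -> componentType values allowed on that file type.
FILE_TYPE_ENUMS = {
    "ada:imageMap": {"ada:EMPAImageMap"},
    "ada:tabularData": {"ada:ExampleTable"},  # placeholder value
}

# Level 2: componentType values a (hypothetical) technique profile accepts.
PROFILE_ALLOWED = {"ada:EMPAImageMap"}


def check_part(part: dict, file_type_enums: dict, profile_allowed: set) -> list[str]:
    """Return the constraint violations for one hasPart item."""
    errors = []
    component = part.get("ada:componentType")
    types = part.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Level 1: the componentType must belong to one of the part's file types.
    if not any(component in file_type_enums.get(t, set()) for t in types):
        errors.append(f"{component} is not valid for file type(s) {types}")
    # Level 2: the technique profile must allow the componentType at all.
    if component not in profile_allowed:
        errors.append(f"{component} is not allowed by the technique profile")
    return errors
```

For example, an item typed ada:tabularData that claims ada:EMPAImageMap fails the first check only, while an item typed ada:imageMap with the same componentType passes both.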
After editing the Components worksheet:
python tools/apply_componentType_enums.py --refresh \
--xlsx ../../amds-ldeo/metadata/ADA-AnalyticalMethodsAndAttributes.xlsx
python tools/regenerate_schema_json.py
python tools/resolve_schema.py --all
python tools/validate_examples.py
The cached mapping at tools/componentType_enum_cache.json is committed so the apply step works on a fresh clone without spreadsheet access.
This repository imports shared schema.org and CDIF property building blocks from metadataBuildingBlocks via the OGC Building Blocks import mechanism. All external references use absolute URLs (https://cross-domain-interoperability-framework.github.io/metadataBuildingBlocks/_sources/...).
Browse the building blocks at: https://usgin.github.io/geochemBuildingBlocks/
The commands below form the end-to-end pipeline for adding a new technique profile from a filled-in TAPP template spreadsheet. See docs/TAPP_TEMPLATE_GUIDE.md for what to put in the spreadsheet.
python tools/build_TAPP_from_spreadsheet.py [TAPP_NAME] [XLSX_PATH] [--pub Pn]… # 1. TAPP BB + catalogs
python tools/build_detail_BB.py [TAPP_NAME] [XLSX_PATH] [--pub Pn]… # 2. detail BB + parameterValues
python tools/build_profile_BB.py [TAPP_NAME] # 3. profile BB scaffold
python tools/build_dataset_template.py <tapp-instance.json> [<out.xlsx>]                 # 4. xlsx data-entry template
All four scripts default to empaTAPP / docs/TAPP_EPMA_filled.xlsx for back-compat. The shared library at tools/_tapp_lib.py does the heavy lifting (parser, catalog emit helpers, scaffolders); the four drivers are thin wrappers. The library is parameterized for multiple TAPPs via a TAPP_PROFILES / CFG mechanism, so empaTAPP and laicpmsTAPP share the same generator (it was previously EMPA-hardcoded).
--pub <code> (repeatable) on scripts 1 and 2 limits which publication-derived examples get regenerated — useful when migrating pub columns one at a time.
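For scripted runs, the first three steps can be assembled into command lines as in the sketch below. It uses only the script names and the --pub flag shown above; the `pipeline_commands` and `run_pipeline` helpers are hypothetical wrappers, and step 4 is left out because it takes a TAPP instance file rather than the workbook.

```python
import subprocess
import sys


def pipeline_commands(tapp_name: str, xlsx_path: str, pubs=()) -> list[list[str]]:
    """Build argv lists for steps 1-3, repeating --pub once per publication code."""
    pub_args = [arg for code in pubs for arg in ("--pub", code)]
    return [
        [sys.executable, "tools/build_TAPP_from_spreadsheet.py", tapp_name, xlsx_path, *pub_args],
        [sys.executable, "tools/build_detail_BB.py", tapp_name, xlsx_path, *pub_args],
        # Step 3 takes only the TAPP name; --pub applies to scripts 1 and 2.
        [sys.executable, "tools/build_profile_BB.py", tapp_name],
    ]


def run_pipeline(tapp_name: str, xlsx_path: str, pubs=()) -> None:
    """Run the steps in order from the repository root, stopping on the first failure."""
    for argv in pipeline_commands(tapp_name, xlsx_path, pubs):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("laicpmsTAPP", "docs/TAPP_LAICPMS_filled.xlsx", pubs=["P1"])` would regenerate only the P1-derived examples in steps 1 and 2.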
python tools/interpret_pub_analytes.py # preview only (review files)
python tools/interpret_pub_analytes.py --apply # also rewrite source xlsx
Reads publication columns whose analyte axis isn't explicitly populated and infers it from rows 48 / 59 / 64 (Halogen Correction / Primary Calibration Standard / Typical Detection Limit). Default-mode outputs:
- docs/TAPP_EPMA_filled-interp.xlsx: side workbook with each <pub>-interp column inserted right after its source pub for side-by-side review.
- build/interp-review/example<empaTAPP|detailEMPA>-<pub>-interp.json: paired review JSON instances built from the inferred data.
With --apply, the script additionally rewrites rows 32 / 40 / 59 / 64 of each inferred pub column in docs/TAPP_EPMA_filled.xlsx to the pipe-delimited convention. After migration, the regular pipeline (build_TAPP_from_spreadsheet.py etc.) reproduces the same rich examples directly from the source, with no interp loop needed.
Detection-limit values keep their full text per element (e.g. "SiO2: 0.02 wt%", "<0.03 wt% for TiO2") so context isn't lost in the migration.
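As a rough illustration of keeping the full text per element, the sketch below splits a pipe-delimited cell into its entries without parsing or normalizing them. The exact cell layout is an assumption here (one entry per analyte, separated by `|`), and `split_pipe_cell` is a hypothetical helper, not part of the tooling.

```python
def split_pipe_cell(cell: str) -> list[str]:
    """Split a pipe-delimited cell, keeping each entry's text verbatim."""
    return [part.strip() for part in cell.split("|") if part.strip()]


entries = split_pipe_cell("SiO2: 0.02 wt% | <0.03 wt% for TiO2")
```

Each entry keeps its original wording, so qualifiers such as "<" and free-text context survive alongside the number.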
- tools/generate_profiles.py: generates technique-specific profile building blocks from configuration data
- tools/resolve_schema.py: resolve all $ref into a structured resolvedSchema.json ($defs + internal $ref, recursion-safe and ~88–90% smaller than the old fully-inlined form, which is no longer emitted; --structured is now a no-op)
- tools/regenerate_schema_json.py: generate *Schema.json from schema.yaml sources (YAML→JSON + ref rewrite)
- tools/audit_building_blocks.py: comprehensive audit of file completeness, schema consistency, resolvedSchema freshness (via the structured resolver), and SHACL coverage. isTypeLibrary BBs (reusable $defs libraries with no instantiable root class, e.g. stringArray, parameterValues) are exempt from the standalone-example and SHACL-NodeShape requirements.
- tools/audit_shacl_coverage.py: check SHACL rules cover all schema.yaml properties; reports missing/extra shapes
- tools/validate_examples.py: validate example JSON files against resolved schemas
- tools/validate_instance.py: profile-aware validation of ADA metadata instances
- tools/compare_schemas.py: detect drift between schema.yaml and *Schema.json
- tools/download_ecl_methods.py: download analytical method Excel workbooks from the EarthChem Library. Reads methods list from Google Sheets, downloads available workbooks. Supports --dry-run, --output-dir.
- tools/augment_register.py: add resolvedSchema URLs to build/register.json for the viewer
- tools/generate_custom_report.py: generate HTML validation report with granular SHACL severity breakdown
- tools/cors_server.py: local HTTP server with CORS headers for testing the viewer
resolve_schema.py and regenerate_schema_json.py are synced from the canonical copies in metadataBuildingBlocks/tools/. Do not edit locally — update the canonical copy and run python tools/sync_resolve_schema.py --apply from the metadataBuildingBlocks repo. The audit, validation, and report tools were also sourced from that repository.
The tappDefinition building block at techniqueProtocols/tappDefinition/ defines a registry-backed Technique-Aligned Protocol Profile (TAPP) definition schema (v3). It was previously named methodDefinition. A TAPP definition is modeled as a cdi:Activity + schema:Action + ada:TAPPDefinition + bios:LabProtocol.
- TAPP identity (top level): name, DOI, version, schema:measurementTechnique, schema:object (target materials), instrument, schema:location (laboratory/facility), software (bios:computationalTool), reagents (bios:reagent), agent
- Standard workflow (schema:actionProcess): a schema:HowTo containing ordered cdi:Activity + schema:Action steps (sample preparation, calibration, data acquisition, data processing, quality control)
- Parameters: typed as schema:PropertyValueSpecification with schema:readonlyValue, schema:valueRequired, schema:defaultValue, schema:minValue/maxValue, schema:inDefinedTermSet (SKOS vocabulary link), and ada:fieldScope (method/session/element)
- Analyte template (ada:analyteTemplate): per-element column definitions (also PropertyValueSpecification) and default analyte rows
- Quality metrics (dqv:hasQualityMeasurement): at method level and on workflow steps
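To make the parameter typing concrete, here is a sketch of one parameter specification and a function that derives the value to use for a session. The property keys are the ones listed above; the parameter name, the numbers, and the rule that a read-only parameter's fixed value is carried in schema:defaultValue are assumptions of this example, not statements about the TAPP schema.

```python
# Hypothetical session-scoped parameter; name and numbers are illustrative.
PARAM_SPEC = {
    "@type": "schema:PropertyValueSpecification",
    "schema:name": "beamCurrent",
    "schema:readonlyValue": False,
    "schema:valueRequired": True,
    "schema:defaultValue": 20,
    "schema:minValue": 1,
    "schema:maxValue": 200,
    "ada:fieldScope": "session",
}


def session_value(spec: dict, supplied=None):
    """Return the effective value for a session, or raise ValueError."""
    default = spec.get("schema:defaultValue")
    if spec.get("schema:readonlyValue"):
        # Fixed parameter: a session may not override it.
        if supplied is not None and supplied != default:
            raise ValueError("parameter is read-only in the TAPP definition")
        return default
    value = default if supplied is None else supplied
    if value is None:
        if spec.get("schema:valueRequired"):
            raise ValueError("a value is required but none was supplied")
        return None
    low, high = spec.get("schema:minValue"), spec.get("schema:maxValue")
    if low is not None and value < low:
        raise ValueError(f"{value} is below schema:minValue {low}")
    if high is not None and value > high:
        raise ValueError(f"{value} is above schema:maxValue {high}")
    return value
```

With this specification, omitting the value falls back to the default of 20, while a supplied value outside 1 to 200 is rejected.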
Example files use the sibling example<bbName>-<variant>.json pattern (validated by tools/validate_examples.py):
- exampletappDefinition-concord-glass-v1-0-6.json: EPMA WDS tephra glass (Concord University)
- exampletappDefinition-nmnh-spinel-oxybar-v1.json: EPMA WDS spinel oxybarometry (Smithsonian NMNH)
- exampletappDefinition-uoc-laicpms-glass-v1.json: LA-ICP-MS volcanic glass trace elements (University of Cologne)
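The naming pattern can be checked with a short regular expression. This is a sketch, not the logic of tools/validate_examples.py, and it assumes that building block names contain no hyphen, so the first hyphen separates the name from the variant.

```python
import re

# example<bbName>-<variant>.json, with bbName assumed to be hyphen-free.
EXAMPLE_RE = re.compile(
    r"^example(?P<bb>[A-Za-z][A-Za-z0-9]*)-(?P<variant>[A-Za-z0-9-]+)\.json$"
)


def parse_example_name(filename: str):
    """Return (bbName, variant) for a conforming file name, else None."""
    match = EXAMPLE_RE.match(filename)
    return (match["bb"], match["variant"]) if match else None
```

For instance, `parse_example_name("exampletappDefinition-concord-glass-v1-0-6.json")` yields the building block name tappDefinition and the variant concord-glass-v1-0-6.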
For the empaTAPP profile: 10 publication-derived examples (exampleempaTAPP-P1.json … exampleempaTAPP-P10.json) plus exampleempaTAPP-all.json, a hand-authored comprehensive synthetic instance that exercises every property allowed by the resolved schema. Use the latter as a structural reference when authoring new TAPP profiles or onboarding new authors.
- Bioschemas: bios:LabProtocol, bios:LabProcess, bios:computationalTool, bios:reagent
- DDI-CDI: cdi:Activity for workflow steps
- W3C DQV: dqv:hasQualityMeasurement for quality metrics
- schema.org: PropertyValueSpecification for parameter definitions, Action / HowTo / HowToStep for workflow