
Add Genome Quality Metrics #20

Merged
JeanMainguy merged 12 commits into main from add_genome_columns on Jun 3, 2026

Conversation


@JeanMainguy commented on Apr 9, 2026

Member

Adds support for loading genome quality metrics (CheckM/CheckM2 scores, assembly statistics) into typed Genome table columns, with data integrity protection to prevent accidental overwrites.

Key Changes

Database Schema

  • Added 14 optional columns to GenomeBase: CheckM/CheckM2 metrics, assembly stats, genome metadata

New CLI Command

# Add quality metrics (errors if changing existing values)
pangbank_db add-quality-metrics genome_quality_metrics.tsv

# Force overwrite (with warnings)
pangbank_db add-quality-metrics genome_quality_metrics.tsv --force

Quality metrics can also be loaded with the add-collection-release command.
To do so, reference the file in the input JSON as follows:

"genome_quality_metrics": {
  "file": "/path/to/genome_quality_metrics.tsv"
}
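As a rough illustration of how such an entry could be checked before an import starts, here is a minimal sketch. The function name `quality_metrics_path` is hypothetical and not taken from the PR; only the `genome_quality_metrics` key and its `file` field come from the description above. The sketch assumes the entry is optional and that a listed file must exist on disk.

```python
import json
from pathlib import Path
from typing import Optional


def quality_metrics_path(config_text: str) -> Optional[Path]:
    """Return the quality-metrics file path from the input JSON, if any.

    Hypothetical helper: returns None when the optional
    "genome_quality_metrics" entry is absent, and raises
    FileNotFoundError when the entry points to a missing file.
    """
    config = json.loads(config_text)
    entry = config.get("genome_quality_metrics")
    if entry is None:
        return None  # the entry is optional
    path = Path(entry["file"])
    if not path.is_file():
        raise FileNotFoundError(f"Quality metrics file not found: {path}")
    return path
```

Failing early on a missing file keeps a long import from aborting halfway through.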

Data Integrity

  • Default: Raises error when attempting to change existing values
  • --force flag: Allows overwrites with warning logs
  • Idempotent: Re-applying same values succeeds
  • Initial imports via add-collection-release allow setting values
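The rules above can be sketched in a few lines of Python. This is not the PR's implementation: `apply_metrics`, `MetricConflictError`, and the column names used in the usage example are hypothetical, and the sketch works on plain dictionaries rather than database rows. It assumes that an unset value is represented by `None`.

```python
import logging

logger = logging.getLogger("quality_metrics_sketch")


class MetricConflictError(ValueError):
    """Raised when an update would change an already-stored value."""


def apply_metrics(stored: dict, incoming: dict, force: bool = False) -> dict:
    """Return a copy of `stored` updated with `incoming`.

    - Unset (None or missing) values are filled in freely.
    - Identical values are accepted silently, so re-applying is idempotent.
    - Differing values raise unless `force` is True, in which case the
      overwrite is logged as a warning.
    """
    updated = dict(stored)
    for column, new_value in incoming.items():
        old_value = updated.get(column)
        if old_value is None or old_value == new_value:
            updated[column] = new_value
            continue
        if not force:
            raise MetricConflictError(
                f"{column}: refusing to change {old_value!r} to {new_value!r}"
            )
        logger.warning("Overwriting %s: %r -> %r", column, old_value, new_value)
        updated[column] = new_value
    return updated
```

For example, with a hypothetical stored record `{"contig_count": 12}`, calling `apply_metrics(record, {"contig_count": 12})` succeeds, `apply_metrics(record, {"contig_count": 13})` raises `MetricConflictError`, and the same call with `force=True` logs a warning and returns the updated record.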

Implementation

  • Dynamic Pydantic model introspection (no hardcoded mappings)
  • Automatic type conversion (int, float, str)
  • Required field protection
  • New columns added to GenomeBase are picked up automatically
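To make the introspection idea concrete, here is a minimal sketch of deriving column types from a model's annotations and converting TSV strings accordingly. It is an assumption-laden stand-in: the PR introspects a Pydantic model, whereas this sketch uses a standard-library dataclass so that it runs without dependencies. `GenomeSketch`, its fields, `optional_column_types`, and `convert_row` are all hypothetical names, and treating an empty string as a missing value is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class GenomeSketch:
    """Hypothetical stand-in for the real model; field names are illustrative."""

    name: str  # required field, never touched by the loader
    checkm2_completeness: Optional[float] = None
    contig_count: Optional[int] = None
    assembly_level: Optional[str] = None


def optional_column_types(model) -> dict:
    """Map each Optional[...] field of `model` to its underlying type."""
    hints = get_type_hints(model)
    columns = {}
    for field in fields(model):
        hint = hints[field.name]
        if get_origin(hint) is Union and type(None) in get_args(hint):
            inner = [arg for arg in get_args(hint) if arg is not type(None)]
            columns[field.name] = inner[0]
    return columns


def convert_row(row: dict, model=GenomeSketch) -> dict:
    """Convert raw TSV strings to the types declared on `model`.

    Unknown columns and required fields are skipped; empty strings
    become None (an assumption of this sketch).
    """
    column_types = optional_column_types(model)
    converted = {}
    for key, raw in row.items():
        if key not in column_types:
            continue  # unknown column or protected required field
        converted[key] = None if raw == "" else column_types[key](raw)
    return converted
```

Because the mapping is derived from the annotations, adding another optional field to the model is enough for `convert_row` to handle it, with no separate mapping table to update.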

@JeanMainguy changed the title from "Add Genome Quality Metrics Loading Feature" to "Add Genome Quality Metrics" on Apr 10, 2026
@JeanMainguy requested a review from Copilot on May 27, 2026 10:02

Copilot AI left a comment

Contributor

Pull request overview

Adds support for ingesting genome quality metrics (typed columns on Genome) and per-release genome statuses (new GenomeStatus table), including CLI commands and end-to-end functional tests to validate the import pipeline.

Changes:

  • Extended Genome schema with optional quality/assembly/statistics columns and added Alembic migration.
  • Introduced GenomeStatus model + migration, API exposure on genome endpoints, and a CLI command to add statuses to an existing release.
  • Added functional and unit tests covering quality-metric import, status import, and API response shape.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `pangbank_api/models.py` | Adds GenomeStatus* models/relationships and new optional quality-metric columns on GenomeBase. |
| `alembic/versions/1b2d64350ce4_add_genome_status_table.py` | Creates genomestatus table with uniqueness constraint + indexes. |
| `alembic/versions/765d642f1d8c_add_new_optional_columns_in_genome_table.py` | Adds new nullable quality/assembly columns to genome table. |
| `pangbank_api/manage_db/input_models.py` | Updates JSON input model to accept genome_quality_metrics and genome_statuses. |
| `pangbank_api/manage_db/utils.py` | Validates optional quality-metrics file and status files exist for add-collection-release. |
| `pangbank_api/manage_db/genome_metadata.py` | Adds dynamic column filtering/type conversion and updater for typed Genome quality metrics. |
| `pangbank_api/manage_db/genome_status.py` | New loader for genome status files + insert logic into DB. |
| `pangbank_api/manage_db/pangbank_db.py` | Wires quality-metrics + statuses into add-collection-release; adds add-genome-statuses and add-quality-metrics CLI commands. |
| `pangbank_api/manage_db/collections.py` | Removes now-unused genome_metadata_sources linkage from release creation. |
| `pangbank_api/crud/genomes.py` | Extends genome public representation to include statuses. |
| `pangbank_api/routers/genomes.py` | Updates response models to include statuses (via GenomePublic). |
| `tests/routers/test_genomes.py` | Adds API tests asserting statuses is included and empty when absent. |
| `tests/manage_db/test_genome_status.py` | New tests for parsing status files and adding statuses to releases. |
| `tests/manage_db/test_genome_metadata.py` | Adds tests for type conversion and overwrite protection/idempotency behavior. |
| `tests/manage_db/test_collection.py` | Updates tests for changed create_collection_release signature. |
| `tests/functional/test_functional_pangbank_db.py` | New end-to-end functional test covering import (taxonomy, genomes, pangenomes, statuses, quality metrics). |
| `tests/functional/run_functional_test.py` | Adds a standalone runner to generate data and run the functional workflow. |
| `tests/functional/README.md` | Documents how to run and interpret functional tests and generated data structure. |
| `tests/functional/__init__.py` | Marks functional tests package. |
| `pyproject.toml` | Registers functional pytest marker. |
| `README.md` | Documents new JSON inputs and new CLI commands (quality metrics + genome statuses). |


Comment thread pangbank_api/manage_db/genome_metadata.py Outdated
Comment thread pangbank_api/crud/genomes.py
Comment thread pangbank_api/crud/genomes.py Outdated
Comment thread pangbank_api/manage_db/genome_status.py Outdated
Comment thread pangbank_api/manage_db/genome_status.py Outdated
Comment thread README.md
Comment thread README.md
Comment thread tests/functional/run_functional_test.py
Comment thread tests/functional/README.md
Comment thread tests/manage_db/test_genome_metadata.py Outdated
@JeanMainguy force-pushed the add_genome_columns branch from a8e71c6 to 15b2f6e on June 2, 2026 14:23
@JeanMainguy merged commit 239485a into main on Jun 3, 2026
3 checks passed