Add Genome Quality Metrics #20
Merged
Pull request overview
Adds support for ingesting genome quality metrics (typed columns on Genome) and per-release genome statuses (new GenomeStatus table), including CLI commands and end-to-end functional tests to validate the import pipeline.
Changes:
- Extended `Genome` schema with optional quality/assembly/statistics columns and added Alembic migration.
- Introduced `GenomeStatus` model + migration, API exposure on genome endpoints, and a CLI command to add statuses to an existing release.
- Added functional and unit tests covering quality-metric import, status import, and API response shape.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `pangbank_api/models.py` | Adds GenomeStatus* models/relationships and new optional quality-metric columns on GenomeBase. |
| `alembic/versions/1b2d64350ce4_add_genome_status_table.py` | Creates genomestatus table with uniqueness constraint + indexes. |
| `alembic/versions/765d642f1d8c_add_new_optional_columns_in_genome_table.py` | Adds new nullable quality/assembly columns to genome table. |
| `pangbank_api/manage_db/input_models.py` | Updates JSON input model to accept genome_quality_metrics and genome_statuses. |
| `pangbank_api/manage_db/utils.py` | Validates optional quality-metrics file and status files exist for add-collection-release. |
| `pangbank_api/manage_db/genome_metadata.py` | Adds dynamic column filtering/type conversion and updater for typed Genome quality metrics. |
| `pangbank_api/manage_db/genome_status.py` | New loader for genome status files + insert logic into DB. |
| `pangbank_api/manage_db/pangbank_db.py` | Wires quality-metrics + statuses into add-collection-release; adds add-genome-statuses and add-quality-metrics CLI commands. |
| `pangbank_api/manage_db/collections.py` | Removes now-unused genome_metadata_sources linkage from release creation. |
| `pangbank_api/crud/genomes.py` | Extends genome public representation to include statuses. |
| `pangbank_api/routers/genomes.py` | Updates response models to include statuses (via GenomePublic). |
| `tests/routers/test_genomes.py` | Adds API tests asserting statuses is included and empty when absent. |
| `tests/manage_db/test_genome_status.py` | New tests for parsing status files and adding statuses to releases. |
| `tests/manage_db/test_genome_metadata.py` | Adds tests for type conversion and overwrite protection/idempotency behavior. |
| `tests/manage_db/test_collection.py` | Updates tests for changed create_collection_release signature. |
| `tests/functional/test_functional_pangbank_db.py` | New end-to-end functional test covering import (taxonomy, genomes, pangenomes, statuses, quality metrics). |
| `tests/functional/run_functional_test.py` | Adds a standalone runner to generate data and run the functional workflow. |
| `tests/functional/README.md` | Documents how to run and interpret functional tests and generated data structure. |
| `tests/functional/__init__.py` | Marks functional tests package. |
| `pyproject.toml` | Registers functional pytest marker. |
| `README.md` | Documents new JSON inputs and new CLI commands (quality metrics + genome statuses). |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
a8e71c6 to 15b2f6e
Adds support for loading genome quality metrics (CheckM/CheckM2 scores, assembly statistics) into typed `Genome` table columns, with data integrity protection to prevent accidental overwrites.
Key Changes
Database Schema
- `GenomeBase`: new optional columns for CheckM/CheckM2 metrics, assembly stats, and genome metadata
New CLI Command
Quality metrics can also be loaded with the `add-collection-release` command.
The file should be added to the input JSON as follows:
Data Integrity
- `--force` flag: Allows overwrites with warning logs
- `add-collection-release` allow setting values
Implementation
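
The overwrite protection described under Data Integrity could be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the function `apply_metrics`, the exception `OverwriteError`, and the column names in the usage example are all hypothetical, and a plain dict stands in for a `Genome` row. It assumes that empty (`None`) columns may always be filled, that re-applying an identical value is a no-op, and that replacing a different value requires `force=True` and logs a warning.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


class OverwriteError(ValueError):
    """Raised when an update would replace an existing, different value."""


def apply_metrics(
    current: dict[str, Any],
    incoming: dict[str, Any],
    force: bool = False,
) -> dict[str, Any]:
    """Return a copy of `current` updated with `incoming` metric values.

    Hypothetical helper: `current` stands in for a stored genome row.
    """
    updated = dict(current)
    for column, new_value in incoming.items():
        old_value = updated.get(column)
        # Filling an empty column or re-applying the same value is always safe.
        if old_value is None or old_value == new_value:
            updated[column] = new_value
            continue
        # A differing value is only replaced when the caller asks for it.
        if not force:
            raise OverwriteError(
                f"{column}: refusing to overwrite {old_value!r} with {new_value!r}"
            )
        logger.warning(
            "Overwriting %s: %r -> %r", column, old_value, new_value
        )
        updated[column] = new_value
    return updated
```

For example, with hypothetical columns `checkm_completeness` and `n50`, calling `apply_metrics({"checkm_completeness": None, "n50": 1000}, {"n50": 2000})` raises `OverwriteError`, while the same call with `force=True` returns the row with `n50` set to `2000` and emits a warning. Raising by default keeps repeated imports idempotent and makes conflicting data visible instead of silently replacing it.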