feat: N_NO_COVERAGE field and coverage-evidence filters #31

Merged: dlopez-bioinfo merged 7 commits into master from feat/no-coverage-evidence on May 7, 2026

Conversation

dlopez-bioinfo (Collaborator) commented May 6, 2026

Summary

  • Add N_NO_COVERAGE field tracking samples whose technology does not cover a variant's region, so AN/AF can be interpreted in light of capture differences across WGS/WES kits.
  • Introduce phase 1/2 coverage-evidence filters across query, annotate, and dump paths to drop or downweight variants with insufficient covered samples.
  • Document the new field, filters, and CLI/API surface (new docs/advanced/coverage-evidence.md plus updates to create-db, update-db, query, CLI, and Python API guides).

Test plan

  • pytest --tb=short -q (includes new tests/test_no_coverage.py, 284 lines)
  • pytest tests/test_no_coverage.py -v
  • Spot-check afquery query and afquery annotate on a sample DB to confirm N_NO_COVERAGE appears and filters behave as documented
  • mkdocs serve renders the new coverage-evidence page

Distinguish between true hom-ref and uncertain coverage for non-carrier WES
samples. Two opt-in mechanisms (combinable, fully backward-compatible) move
samples from N_HOM_REF to a new N_NO_COVERAGE field while keeping them in
eligible/AN.

Phase 1 (query-time, no schema change):
  --min-pass K          per-WES-tech gate on PASS carriers (het|hom)
  --min-observed K      per-WES-tech gate on any-VCF entries (het|hom|fail)

Phase 2 (build-time, schema_version 3.0):
  --min-dp / --min-gq / --min-qual   carrier quality thresholds at create-db
  --min-covered K                    minimum quality carriers per WES tech
  --min-quality-evidence K           query-time companion (errors on legacy DBs)

Stores two new Parquet columns (filtered_bitmap, quality_pass_bitmap) and
the chosen thresholds under coverage_filter in manifest.json. update-db
recomputes filtered_bitmap on add-samples; compact preserves both columns.
ingest reads FORMAT/DP, FORMAT/GQ, and QUAL (None when absent).

New invariant: N_HET + N_HOM_ALT + N_HOM_REF + N_FAIL + N_NO_COVERAGE = n_eligible
WGS samples and carriers (het/hom/fail) are never reclassified.
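
The query-time gates and the invariant above can be illustrated with plain Python sets. This is a minimal sketch, not AFQuery's implementation: `TechCalls` and `count_genotypes` are hypothetical names, sample IDs are plain integers rather than bitmaps, and the het/hom_alt/fail sets are assumed to be disjoint subsets of `eligible`. It follows the rule described for `--min-pass` and `--min-observed`: when a WES technology falls below either threshold, its non-carriers are counted as `N_NO_COVERAGE` instead of `N_HOM_REF`, while WGS samples and carriers are left alone.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TechCalls:
    """Sample IDs for one sequencing technology at one variant (hypothetical structure)."""

    is_wes: bool
    eligible: frozenset[int]
    het: frozenset[int] = frozenset()
    hom_alt: frozenset[int] = frozenset()
    fail: frozenset[int] = frozenset()


def count_genotypes(
    techs: list[TechCalls], min_pass: int = 0, min_observed: int = 0
) -> dict[str, int]:
    """Count genotype classes, applying the per-WES-tech evidence gates."""
    counts = dict.fromkeys(
        ("N_HET", "N_HOM_ALT", "N_HOM_REF", "N_FAIL", "N_NO_COVERAGE"), 0
    )
    for tech in techs:
        carriers = tech.het | tech.hom_alt | tech.fail
        non_carriers = tech.eligible - carriers
        n_pass = len(tech.het | tech.hom_alt)  # evidence for --min-pass (het|hom)
        n_observed = len(carriers)             # evidence for --min-observed (het|hom|fail)
        # Only WES techs are gated; both thresholds must hold to keep hom-ref status.
        gated = tech.is_wes and (n_pass < min_pass or n_observed < min_observed)
        counts["N_HET"] += len(tech.het)
        counts["N_HOM_ALT"] += len(tech.hom_alt)
        counts["N_FAIL"] += len(tech.fail)
        counts["N_NO_COVERAGE" if gated else "N_HOM_REF"] += len(non_carriers)
    return counts


wgs = TechCalls(is_wes=False, eligible=frozenset(range(0, 4)), het=frozenset({0}))
wes = TechCalls(
    is_wes=True,
    eligible=frozenset(range(4, 10)),
    het=frozenset({4}),
    fail=frozenset({5}),
)

default = count_genotypes([wgs, wes])            # thresholds of 0: no filtering
gated = count_genotypes([wgs, wes], min_pass=2)  # WES has only one PASS carrier
n_eligible = len(wgs.eligible | wes.eligible)

# The invariant holds with or without the gate: samples move between fields only.
assert sum(default.values()) == sum(gated.values()) == n_eligible
print(default)
print(gated)
```

In this toy cohort the four WES non-carriers move from `N_HOM_REF` to `N_NO_COVERAGE` once `min_pass=2` is applied, while the three WGS non-carriers stay hom-ref and the total remains equal to `n_eligible`.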

Affected commands: query, variant-info (genotype='no_coverage'), annotate
(AFQUERY_N_NO_COVERAGE INFO), dump (N_NO_COVERAGE column).

resources/normalize_vcf.sh now preserves FORMAT/DP and FORMAT/GQ.

- docs/advanced/coverage-evidence.md (new): conceptual guide covering both
  phases, threshold-selection guidance, and the new genotype invariant.
- docs/guides/query.md: new section "Coverage-Evidence Filters" plus
  N_NO_COVERAGE in text/tsv/json examples.
- docs/guides/create-database.md: new section explaining --min-dp/--min-gq/
  --min-qual/--min-covered and the schema_version 3.0 bump.
- docs/guides/update-database.md: explain filtered_bitmap recomputation on
  add-samples for Phase 2 databases.
- docs/getting-started/preprocessing.md: note FORMAT/DP and FORMAT/GQ
  preservation for Phase 2 quality thresholds.
- docs/reference/cli.md: add Phase 1/2 flags to query, variant-info,
  annotate, dump, and create-db.
- docs/reference/python-api.md: add N_NO_COVERAGE to QueryResult,
  'no_coverage' genotype to SampleCarrier, and the three new SampleFilter
  fields.
- mkdocs.yml: link the new advanced page.
codecov-commenter commented May 6, 2026

Codecov Report

❌ Patch coverage is 75.88235% with 82 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/afquery/preprocess/update.py | 56.41% | 28 Missing and 6 partials ⚠️ |
| src/afquery/preprocess/build.py | 59.09% | 13 Missing and 5 partials ⚠️ |
| src/afquery/preprocess/ingest.py | 53.12% | 15 Missing ⚠️ |
| src/afquery/preprocess/__init__.py | 44.44% | 4 Missing and 1 partial ⚠️ |
| src/afquery/preprocess/compact.py | 71.42% | 2 Missing and 2 partials ⚠️ |
| src/afquery/query.py | 95.65% | 2 Missing and 2 partials ⚠️ |
| src/afquery/annotate.py | 91.30% | 2 Missing ⚠️ |


- Replace lexicographic schema_version >= "3.0" with tuple-based
  _parse_schema_version (lex would break at "10.0" / "3.10").
- Rename _select_cols -> _bitmap_cols returning list[str]; drop the
  brittle .split(",") aliasing in _query_batch_inner.
- Unify duplicated het|hom|fail bitmap union in _compute_no_coverage_bm.
- Widen SampleCarrier.filter_pass to bool | None; emit None for
  no_coverage rows (rendered as null / empty / '-' across json/tsv/text)
  so PASS/FAIL no longer misrepresents samples that had no call.
- Drop duplicate `from pyroaring import BitMap` in compact.py.
- Pass PARQUET_SCHEMA to pa.table in _make_phase2_db so test fixtures
  match production large_binary types.
- Add test_no_coverage_filter_pass_is_none assertion.
- Update Python/CLI reference docs for the filter_pass type change.
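
The first item in the list above, replacing a lexicographic version check with a tuple-based one, can be illustrated as follows. This is a sketch only: `parse_schema_version` here is a hypothetical stand-in, and the real `_parse_schema_version` in AFQuery may handle more cases (suffixes, validation) than this version does.

```python
def parse_schema_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of ints for numeric comparison.

    Hypothetical helper: assumes every dot-separated part is a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


# String comparison goes character by character, so "1" < "3" decides the result.
lexicographic_ok = "10.0" >= "3.0"

# Tuple comparison goes element by element on integers, so 10 > 3 decides it.
numeric_ok = parse_schema_version("10.0") >= parse_schema_version("3.0")

print(lexicographic_ok, numeric_ok)
```

The same problem shows up within a major version: as strings, `"3.10"` sorts before `"3.2"`, whereas `(3, 10)` correctly sorts after `(3, 2)`.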
Comment thread docs/advanced/coverage-evidence.md Outdated
exactly right: every covered sample without a variant call is hom-ref. For WES
samples it is a *best-effort* assumption: the BED capture region tells us a
position *could* be sequenced, but not that it *was* sequenced at adequate
depth in this particular sample. Standard variant-only VCFs do not contain
Collaborator Author

Start the paragraph with an introductory sentence, such as: "Standard variant-only VCFs do not contain hom-ref calls, so AFQuery cannot distinguish 'true hom-ref' from 'no coverage' for non-carrier WES samples."

Comment thread docs/advanced/coverage-evidence.md Outdated
# Coverage Evidence

`N_HOM_REF` is computed as a residual:
`len(eligible) − N_HET − N_HOM_ALT − N_FAIL`. For WGS samples that residual is
Collaborator Author

Replace "WGS" with "fully covered genomes".

Comment thread docs/advanced/coverage-evidence.md Outdated
for non-carrier WES samples.

Two opt-in mechanisms let users tighten that assumption. Together they expose
a new field, **`N_NO_COVERAGE`**, that holds samples whose hom-ref status is
dlopez-bioinfo (Collaborator, Author), May 7, 2026

Remove mentions of "new" functionality: the tool is still under development, so there is no need to refer to old features.

Comment thread docs/advanced/coverage-evidence.md Outdated

---

## Phase 1 — Query-time, evidence-counting
Collaborator Author

Remove the mentions of Phase 1 and Phase 2. This documentation must focus on how to use the tool and when to use each parameter.

Comment thread docs/advanced/coverage-evidence.md Outdated

If the tech falls below either threshold, *all of its non-carrier samples* at
that position move from `N_HOM_REF` to `N_NO_COVERAGE`. When both flags are
set, both must hold (AND). Default `0` ⇒ no filtering, identical to legacy
Collaborator Author

Remove the mention of legacy behaviour.

Rewrites docs/advanced/coverage-evidence.md as user-facing documentation:
drops the Phase 1 / Phase 2 / schema_version / "best-effort" / "legacy"
framing, adopts "fully-covered" / "partially-covered" terminology, adds a
worked before/after query example, and structures the page around when to
reach for each flag (--min-pass, --min-observed, --min-dp, --min-gq,
--min-qual, --min-covered, --min-quality-evidence).

Cleans up the same Phase 1/2 and schema_version 3.0 wording in
create-database.md, update-database.md, query.md, cli.md, python-api.md,
and preprocessing.md.

Adds the previously-undocumented N_NO_COVERAGE / no_coverage /
AFQUERY_N_NO_COVERAGE rows to the field tables in glossary.md,
understanding-output.md, annotate-vcf.md, dump-export.md, and
variant-info.md, where the field already showed up in output examples.
Closes #19, #17.

The previous CI used `mkdocs gh-deploy --force`, which ignored the `mike`
provider declared in `mkdocs.yml` and overwrote `gh-pages` flat on every
push, so the version selector had nothing to render. The print-site
plugin had `add_to_navigation: false`, so the generated `/print_page/`
was unreachable from the UI.

- `docs.yml`: replace `mkdocs gh-deploy` with `mike deploy --push
  --update-aliases dev`; configure git identity; fetch `gh-pages`. Add
  a `bootstrap` workflow_dispatch input that runs `mike delete --all`
  first, for the one-time migration from the prior flat deploy.
- `release.yml`: new `docs` job for non-`rc` tags. Runs `mike deploy
  --push --update-aliases <version> latest` and `mike set-default
  --push latest`, so the site root redirects to the most recent tag.
- `mkdocs.yml`: add `site_url` (required by mike for cross-version
  links); flip `print-site.add_to_navigation` to `true`; register
  `autorefs` explicitly before `mkdocstrings` so it does not get
  auto-inserted after `print-site` (silences the false-positive
  "print-site should be last" warning under `--strict`).
- `CONTRIBUTING.md`: document the docs deployment and release workflow,
  local preview, and the one-time bootstrap.
@dlopez-bioinfo dlopez-bioinfo merged commit 5065070 into master May 7, 2026
4 checks passed
@dlopez-bioinfo dlopez-bioinfo deleted the feat/no-coverage-evidence branch May 12, 2026 09:53