
ontology_vote_onclass crashes with NetworkXError when voter predicts unknown_celltype_label sentinel #112

Description

@joschkahey

Summary

popv.annotation.ontology_vote_onclass raises "NetworkXError: The node unassigned is not in the digraph." whenever any voter predicts the unknown_celltype_label sentinel, which defaults to "unassigned" for the Tabula Sapiens pretrained references. The sister function popv.annotation.ontology_parent_onclass crashes in the same way.

Reproduction

Run popV in inference mode on a query where the per-voter QC rejects a non-trivial fraction of cells, so that voters emit the unknown_celltype_label sentinel for those cells:

from popv.hub import HubModel

# query: the AnnData object to annotate (out-of-distribution data triggers the bug)
hub = HubModel.from_pretrained("popv/tabula_sapiens_All_Cells")
hub.annotate_data(
    query,
    save_folder="popv_out",
    prediction_mode="inference",
    # default unknown_celltype_label="unassigned" comes from the TS metadata.json
)

As soon as at least one voter emits "unassigned" for at least one cell, ontology aggregation crashes:

File "popv/annotation.py", line 232, in ontology_vote_onclass
    root_to_node = nx.descendants(G, cell_type)
networkx.exception.NetworkXError: The node unassigned is not in the digraph.

Root cause

popV builds the ontology DAG G from the ct_edges in cl_popv.json via popv._utils.make_ontology_dag (_utils.py:232-257). G contains only real Cell Ontology terms; "unassigned" is not a CL node.

popv.annotation.ontology_vote_onclass (annotation.py:223-232) iterates over each cell's voter predictions and walks the DAG with nx.descendants(G, cell_type). The only guard, at line 228, is if not pd.isna(cell_type). Because pd.isna("unassigned") is False, the sentinel passes the guard and nx.descendants raises.

popv.annotation.ontology_parent_onclass (the sister function used when mode='hierarchical') contains the same pattern and therefore the same bug.
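The failure can be reproduced without popV or a pretrained model. The sketch below assumes networkx and pandas are installed and uses a toy DiGraph as a stand-in for the DAG built by make_ontology_dag (the node names are illustrative, not real CL terms). current_guard mirrors the existing isna-only check, and patched_guard mirrors the check from the suggested patch; both are hypothetical helper names.

```python
import networkx as nx
import pandas as pd


def build_toy_dag() -> nx.DiGraph:
    """Toy stand-in for the ontology DAG (illustrative node names only)."""
    G = nx.DiGraph()
    G.add_edges_from([
        ("cell", "leukocyte"),
        ("leukocyte", "T cell"),
        ("leukocyte", "B cell"),
    ])
    return G


def current_guard(cell_type) -> bool:
    # Mirrors the existing check: only NaN predictions are skipped.
    return not pd.isna(cell_type)


def patched_guard(cell_type, G: nx.DiGraph, sentinel: str) -> bool:
    # Proposed check: also skip the sentinel and any label that is not a DAG node.
    return not (pd.isna(cell_type) or cell_type == sentinel or cell_type not in G)


G = build_toy_dag()
sentinel = "unassigned"

# The sentinel string passes the current guard...
print(current_guard(sentinel))  # True

# ...so the DAG walk is attempted and fails, because the sentinel is not a node.
# (The exact error message wording depends on the networkx version.)
try:
    nx.descendants(G, sentinel)
except nx.NetworkXError as err:
    print(f"NetworkXError: {err}")

# The patched guard rejects the sentinel before nx.descendants is reached.
print(patched_guard(sentinel, G, sentinel))     # False
print(patched_guard("leukocyte", G, sentinel))  # True
```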

Why the official tutorial does not hit this

The Tabula Sapiens tutorial uses prediction_mode="fast" (the HubModel default), which skips per-voter inference. Its benchmark query is also drawn from the same distribution as the reference, so per-voter QC almost never emits the sentinel. We hit the crash in production with prediction_mode="inference" on out-of-distribution query data (≈ 76 % per-voter abstain rate in our PBMC smoke test).
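To check in advance whether a query will reach the crash, or to estimate an abstain rate like the one quoted above, the per-voter prediction columns (prediction_keys in the suggested patch) can be compared against the sentinel. This is a minimal pandas sketch: abstain_summary is a hypothetical helper, voter_a and voter_b are hypothetical column names, and it reports both a per-voter and a pooled rate since either reading of "per-voter abstain rate" is plausible.

```python
import pandas as pd


def abstain_summary(obs: pd.DataFrame, prediction_keys, sentinel: str = "unassigned"):
    """Summarize how often voters emitted the sentinel (hypothetical helper)."""
    # Boolean frame: rows are cells, columns are voters, True where the vote is the sentinel.
    is_sentinel = obs[list(prediction_keys)] == sentinel
    per_voter = is_sentinel.mean(axis=0)           # fraction of cells each voter abstained on
    pooled = float(is_sentinel.to_numpy().mean())  # fraction of all (cell, voter) votes
    # Any cell with at least one sentinel vote reaches the unguarded DAG walk.
    n_trigger = int(is_sentinel.any(axis=1).sum())
    return per_voter, pooled, n_trigger


# Toy stand-in for adata.obs; voter_a / voter_b are hypothetical column names.
obs = pd.DataFrame(
    {
        "voter_a": ["T cell", "unassigned", "unassigned", "B cell"],
        "voter_b": ["T cell", "unassigned", "B cell", "unassigned"],
    }
)
per_voter, pooled, n_trigger = abstain_summary(obs, ["voter_a", "voter_b"])
print(per_voter)
print(pooled)     # 0.5
print(n_trigger)  # 3
```

Any non-zero n_trigger means the unpatched ontology_vote_onclass will raise on that query.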

Suggested patch

Skip unknown_celltype_label (and any other prediction that is not a DAG node) during the graph walk, and fall back to popv_majority_vote_prediction for cells left with no usable vote after filtering:

sentinel = adata.uns.get("unknown_celltype_label", "unassigned")
...
for pred_key in prediction_keys:
    cell_type = adata.obs[pred_key][cell]
    if pd.isna(cell_type) or cell_type == sentinel or cell_type not in G:
        continue
    if cell_type in cell_type_root_to_node:
        root_to_node = cell_type_root_to_node[cell_type]
    else:
        root_to_node = nx.descendants(G, cell_type)
    ...

Apply the same guard in ontology_parent_onclass.
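The snippet above elides the fallback step. The per-cell control flow can be sketched in plain Python under these assumptions: usable_votes, aggregate_cell and plain_majority are hypothetical names; plain_majority is only a placeholder for popV's actual DAG-based scoring, which is not reproduced here; and majority_vote_label plays the role of the cell's popv_majority_vote_prediction value. Membership is tested with in, so dag_nodes can be a set of labels or the nx.DiGraph itself.

```python
import math
from collections import Counter
from typing import Callable, Container, Sequence


def _is_missing(label) -> bool:
    # Stand-in for pd.isna on a scalar: treat None and float NaN as missing.
    return label is None or (isinstance(label, float) and math.isnan(label))


def usable_votes(votes: Sequence, dag_nodes: Container, sentinel: str) -> list:
    """Keep only votes the ontology walk can handle (the proposed guard)."""
    return [
        v for v in votes
        if not _is_missing(v) and v != sentinel and v in dag_nodes
    ]


def aggregate_cell(
    votes: Sequence,
    dag_nodes: Container,
    sentinel: str,
    majority_vote_label: str,
    ontology_vote: Callable[[list], str],
) -> str:
    """Per-cell aggregation with the suggested fallback (hypothetical sketch)."""
    kept = usable_votes(votes, dag_nodes, sentinel)
    if not kept:
        # Every vote was filtered out: reuse the majority-vote prediction.
        return majority_vote_label
    # Placeholder hook for popV's ontology scoring over the surviving votes.
    return ontology_vote(kept)


def plain_majority(kept: list) -> str:
    # Illustrative scorer only: the most common usable label.
    return Counter(kept).most_common(1)[0][0]


nodes = {"cell", "leukocyte", "T cell", "B cell"}  # toy DAG node set
print(aggregate_cell(["T cell", "unassigned", "T cell"], nodes,
                     "unassigned", "B cell", plain_majority))      # T cell
print(aggregate_cell(["unassigned", float("nan"), "not a CL term"], nodes,
                     "unassigned", "leukocyte", plain_majority))   # leukocyte
```

Keeping the filter separate from the scorer means ontology_vote_onclass and ontology_parent_onclass could share the same guard instead of each re-implementing it.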

Affected releases

Verified on 0.6.0 and on main (latest commit at the time of writing). popv/annotation.py is unchanged between the two (checked with git diff 0.6.0..origin/main popv/annotation.py).

Context

We carry an in-template monkey-patch for this in our downstream pipeline (Cytoreason nf-core-scdownstream) at modules/local/popv_ensemble/templates/popv_patches.py while waiting for the upstream fix. Happy to provide additional reproduction artifacts (the query AnnData + the exact popV invocation) if useful.
