Summary
popv.annotation.ontology_vote_onclass raises NetworkXError: The node unassigned is not in the digraph whenever a voter predicts the unknown_celltype_label sentinel (default "unassigned" for the Tabula Sapiens pretrained references). The same crash fires from the sister function popv.annotation.ontology_parent_onclass.
Reproduction
Run popV in inference mode on a query where the per-voter QC rejects a non-trivial fraction of cells (so voters emit the unknown_celltype_label sentinel for those cells):
from popv.hub import HubModel
hub = HubModel.from_pretrained("popv/tabula_sapiens_All_Cells")
hub.annotate_data(
    query,
    save_folder="popv_out",
    prediction_mode="inference",
    # default unknown_celltype_label="unassigned" comes from the TS metadata.json
)
When at least one voter emits "unassigned" for at least one cell, ontology aggregation crashes:
  File "popv/annotation.py", line 232, in ontology_vote_onclass
    root_to_node = nx.descendants(G, cell_type)
networkx.exception.NetworkXError: The node unassigned is not in the digraph.
Root cause
popV builds the ontology DAG G from cl_popv.json ct_edges via popv._utils.make_ontology_dag (_utils.py:232-257). G contains only real Cell Ontology terms — "unassigned" is not a CL node.
popv.annotation.ontology_vote_onclass:223-232 iterates over each cell's voter predictions and walks the DAG using nx.descendants(G, cell_type). The guard at line 228 is if not pd.isna(cell_type) — but pd.isna("unassigned") is False, so the sentinel slips through and nx.descendants raises.
popv.annotation.ontology_parent_onclass (the sister function used when mode='hierarchical') follows the same pattern and has the identical bug.
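The failure mode can be reproduced without popV. This is a minimal sketch, assuming a toy three-node graph as a stand-in for the Cell Ontology DAG (the node names are illustrative and not taken from cl_popv.json). It shows that the string sentinel passes a pd.isna guard and that nx.descendants then raises for a label that is not a node:

```python
import networkx as nx
import pandas as pd

# Toy stand-in for the ontology DAG; node names are illustrative only.
G = nx.DiGraph()
G.add_edges_from([("cell", "lymphocyte"), ("lymphocyte", "T cell")])

sentinel = "unassigned"

# A NaN check does not catch the string sentinel, so it passes the guard.
passes_nan_guard = not pd.isna(sentinel)

# Walking the graph from a label that is not a node raises NetworkXError.
try:
    nx.descendants(G, sentinel)
    error_message = None
except nx.NetworkXError as exc:
    error_message = str(exc)

print(passes_nan_guard)
print(error_message)
```

The exact wording of the error message may differ between networkx versions, but the exception type is NetworkXError in each case.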
Why the official tutorial does not hit this
The Tabula Sapiens tutorial uses prediction_mode="fast" (the HubModel default), which skips per-voter inference. The bench query is also drawn from the same distribution as the reference, so per-voter QC almost never produces the sentinel. We hit the crash in production when prediction_mode="inference" is used on out-of-distribution query data (≈ 76 % per-voter abstain rate in our PBMC smoke test).
Suggested patch
Skip unknown_celltype_label (and any prediction that is not a DAG node) in the graph walk, and fall back to popv_majority_vote_prediction for cells left with no valid votes after filtering:
sentinel = adata.uns.get("unknown_celltype_label", "unassigned")
...
for pred_key in prediction_keys:
    cell_type = adata.obs[pred_key][cell]
    if pd.isna(cell_type) or cell_type == sentinel or cell_type not in G:
        continue
    if cell_type in cell_type_root_to_node:
        root_to_node = cell_type_root_to_node[cell_type]
    else:
        root_to_node = nx.descendants(G, cell_type)
    ...
Apply the same guard in ontology_parent_onclass.
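The guard can be checked in isolation. The sketch below is not popV code: valid_votes is a hypothetical helper introduced only for illustration, and the graph and votes are toy data. It applies the same three conditions as the suggested patch (NaN, sentinel, not a node of G) before any graph walk:

```python
import networkx as nx
import pandas as pd


def valid_votes(votes, G, sentinel="unassigned"):
    """Keep only votes that are nodes of G (hypothetical helper, not popV API)."""
    return [
        v for v in votes
        if not pd.isna(v) and v != sentinel and v in G
    ]


# Toy stand-in for the ontology DAG; node names are illustrative only.
G = nx.DiGraph([("cell", "lymphocyte"), ("lymphocyte", "T cell")])

# One cell's votes: two real terms, the sentinel, a NaN, and an unknown label.
votes = ["T cell", "unassigned", float("nan"), "lymphocyte", "not a CL term"]

kept = valid_votes(votes, G)

# The graph walk is now safe for every surviving vote.
descendant_sets = {v: nx.descendants(G, v) for v in kept}
```

If kept comes back empty for a cell, that is the case where the issue proposes falling back to popv_majority_vote_prediction.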
Affected releases
Verified on 0.6.0 and main (commit at time of writing). No diff between 0.6.0 and main on popv/annotation.py (verified via git diff 0.6.0..origin/main popv/annotation.py).
Context
We carry an in-template monkey-patch for this in our downstream pipeline (Cytoreason nf-core-scdownstream) at modules/local/popv_ensemble/templates/popv_patches.py while waiting for the upstream fix. Happy to provide additional reproduction artifacts (the query AnnData + the exact popV invocation) if useful.