Summary
popv/algorithms/_onclass.py:71 uses the literal string "self.labels_key" as a pandas column key instead of the attribute self.labels_key. This silently creates a column named "self.labels_key" in adata.obs (populated only for query rows) and leaves the column that self.labels_key actually names untouched, which hides the intent of the line. The user-facing crash (a TypeError: Cannot setitem on a Categorical with a new category (unknown), raised by the follow-up write at line 191) is already fixed on main post-0.6.1 by casting self.result_key and self.seen_result_key to str before the relabel write. The literal-string typo at line 71 is a separate code-quality issue and is still present.
Reproduction
Run the OnClass voter in prediction_mode="inference" on a query that contains cells tagged with unknown_celltype_label:
from popv.hub import HubModel

# `query` is the query AnnData object, loaded beforehand.
hub = HubModel.from_pretrained("popv/tabula_sapiens_All_Cells")
hub.annotate_data(
    query, save_folder="popv_out",
    prediction_mode="inference",
    methods_list=["OnClass", "KNN_SCVI", "Support_Vector"],
)
# On 0.6.0:
#   TypeError: Cannot setitem on a Categorical with a new category (unknown), set the categories first
# On main:
#   Runs to completion, but a phantom column "self.labels_key" appears on the query.
Root cause
# popv/algorithms/_onclass.py:71 — both 0.6.0 and main
adata.obs.loc[adata.obs["_dataset"] == "query", "self.labels_key"] = adata.uns["unknown_celltype_label"]
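The second indexer passed to .loc is the literal string "self.labels_key" (in quotes). pandas treats it as a new column label and creates that column, filling the rows outside the mask with NaN. The intended key is the attribute self.labels_key (no quotes), so as written this line never writes the unknown label into the real labels column. The phantom column does not cause a crash, but it is a clear typo and obscures the line's intent for reviewers.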
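The difference can be reproduced with pandas alone. The snippet below is a minimal sketch, not popv code: the toy obs frame, the "cell_type" column, and the variables labels_key and unknown are made-up stand-ins for adata.obs, self.labels_key, and adata.uns["unknown_celltype_label"].

```python
import pandas as pd

# Toy stand-in for adata.obs; column names and values are illustrative only.
obs = pd.DataFrame(
    {
        "_dataset": ["ref", "ref", "query", "query"],
        "cell_type": ["T cell", "B cell", "T cell", "B cell"],
    }
)
labels_key = "cell_type"  # plays the role of self.labels_key (assumed name)
unknown = "unknown"       # plays the role of adata.uns["unknown_celltype_label"]
is_query = obs["_dataset"] == "query"

# Buggy form: the quoted string is treated as a new column label,
# so pandas adds a column and leaves "cell_type" unchanged.
buggy = obs.copy()
buggy.loc[is_query, "self.labels_key"] = unknown

# Intended form: the variable resolves to the existing "cell_type" column.
fixed = obs.copy()
fixed.loc[is_query, labels_key] = unknown

print(list(buggy.columns))
print(buggy["cell_type"].tolist())
print(fixed["cell_type"].tolist())
```

In the buggy copy the query rows keep their original labels and the extra column holds the unknown label only for those rows. In the fixed copy no extra column appears and the query rows are relabelled.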
Suggested patch
- adata.obs.loc[adata.obs["_dataset"] == "query", "self.labels_key"] = adata.uns["unknown_celltype_label"]
+ adata.obs.loc[adata.obs["_dataset"] == "query", self.labels_key] = adata.uns["unknown_celltype_label"]
Related issues
- Closed issue #28 describes the same broader root pattern (Categorical setitem with a new category). The user-facing crash on 0.6.0 is now fixed on main, but the literal-string typo at line 71 is independent and remains.
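For readers unfamiliar with the Categorical pattern mentioned above, here is a minimal pandas-only sketch of it. It is not taken from popv: the Series preds and its values are invented for illustration, the str cast only mirrors the kind of fix described for main, and the add_categories variant is an alternative shown for comparison, not something the issue says popv does.

```python
import pandas as pd

# Illustrative categorical predictions; "unknown" is not among the categories.
preds = pd.Series(["T cell", "B cell", "T cell"], dtype="category")

# Writing a value that is not an existing category is rejected by pandas.
raised = False
try:
    preds.iloc[0] = "unknown"
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Option 1: cast to str first, then write (the approach the issue attributes to main).
as_str = preds.astype(str)
as_str.iloc[0] = "unknown"

# Option 2 (alternative, assumed for comparison): register the category first.
with_cat = preds.cat.add_categories(["unknown"])
with_cat.iloc[0] = "unknown"

print(as_str.tolist())
print(with_cat.tolist())
```

Casting to str drops the categorical dtype, while add_categories keeps it. Which trade-off suits popv is a maintainer decision.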
Affected releases
Verified on 0.6.0 and main (commit at time of writing).
Context
We carry an in-template monkey-patch for the 0.6.0-only user-facing crash in our downstream pipeline (Cytoreason nf-core-scdownstream) at modules/local/popv_ensemble/templates/popv_patches.py. The literal-string fix is a small code-quality cleanup; happy to send a PR if useful.