Summary
popv.algorithms._scanvi.SCANVI rejects query data whenever prediction_mode="inference" runs against a query in which any cell's _labels_annotation carries the unknown_celltype_label sentinel (default "unassigned" for the Tabula Sapiens pretrained references). The error raised is ValueError: Category unknown not found in source registry. Cannot transfer setup without extend_categories = True
Reproduction
from popv.hub import HubModel

# query: an AnnData object holding the cells to annotate
hub = HubModel.from_pretrained("popv/tabula_sapiens_All_Cells")
hub.annotate_data(
    query,
    save_folder="popv_out",
    prediction_mode="inference",
    methods_list=["KNN_SCVI", "Support_Vector", "XGboost", "CELLTYPIST", "KNN_BBKNN", "KNN_HARMONY", "SCANVI_POPV"],
)
When the SCANVI_POPV voter runs, scvi-tools raises:
  File "popv/algorithms/_scanvi.py", line 148, in compute_integration
    self.model = scvi.model.SCANVI.load_query_data(
  File ".../scvi/model/_scanvi.py", ...
ValueError: Category unknown not found in source registry. Cannot transfer setup without extend_categories = True
Root cause
popv.preprocessing.Process_Query._setup_dataset:306-366 writes unknown_celltype_label into every query cell's _labels_annotation column and adds the sentinel string to the Categorical's category list. The source registry of the trained Tabula Sapiens (TS) scANVI model contains only real Cell Ontology (CL) labels, so the sentinel is absent from it.
_scanvi.py:148 calls scvi.model.SCANVI.load_query_data(...) without extend_categories=True in the non-retrain branch:
# popv/algorithms/_scanvi.py:148
self.model = scvi.model.SCANVI.load_query_data(
    query,
    os.path.join(adata.uns["_save_path_trained_models"], "scanvi"),
    freeze_classifier=True,
)
scvi-tools' validation rejects the unseen category and aborts.
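The rejection can be illustrated without scvi-tools installed. The following is a minimal sketch, assuming a simplified membership check: check_against_registry is a hypothetical helper written for this report, not the scvi-tools implementation, and the label values are made up for illustration.

```python
import pandas as pd


def check_against_registry(source_categories, query_labels):
    """Hypothetical check: raise if the query has a category the source lacks."""
    for category in query_labels.cat.categories:
        if category not in source_categories:
            raise ValueError(
                f"Category {category} not found in source registry. "
                "Cannot transfer setup without extend_categories = True"
            )


# Assumed example values: the reference knows two real labels, while the
# query column is filled entirely with the sentinel.
source_categories = ["B cell", "T cell"]
query_labels = pd.Series(["unknown"] * 3, dtype="category")

try:
    check_against_registry(source_categories, query_labels)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Because the sentinel is registered as a category on the query column, a check of this kind fails even though no real label in the query is new to the reference.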
Why the official tutorial does not hit this
The TS HubModel tutorial defaults to prediction_mode="fast", which skips the SCANVI voter's load_query_data call. Out-of-distribution queries that run prediction_mode="inference" with SCANVI_POPV in methods_list reliably hit this.
Suggested patch
Pass extend_categories=True in the non-retrain branch of _scanvi.py:148:
  self.model = scvi.model.SCANVI.load_query_data(
      query,
      os.path.join(adata.uns["_save_path_trained_models"], "scanvi"),
      freeze_classifier=True,
+     extend_categories=True,
  )
This is the same approach scvi-tools recommends for cross-dataset query adaptation; downstream prediction continues to map the extended category onto the trained label space.
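The idea behind extending the category list can be sketched with pandas alone. This is an illustration under stated assumptions, not scvi-tools internals: the label values are invented, and how the library treats the extra category further downstream is not modeled here and should be checked against the installed version.

```python
import pandas as pd

# Assumed example values, not taken from the Tabula Sapiens registry.
source_categories = ["B cell", "T cell"]
query_values = ["unknown", "unknown", "T cell"]

# Append categories unseen in the source, keeping the source order first
# so that the integer codes of the trained labels do not shift.
unseen = [c for c in pd.unique(pd.Series(query_values)) if c not in source_categories]
extended_categories = source_categories + unseen

# Encode the query against the extended mapping.
codes = pd.Categorical(query_values, categories=extended_categories).codes.tolist()
```

In this sketch the trained labels keep their original codes and the sentinel takes the next free code, which is why appending rather than re-sorting matters.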
Affected releases
Verified on 0.6.0 and main. No code diff between 0.6.0 and main on popv/algorithms/_scanvi.py beyond a docstring rename (verified via git diff 0.6.0..origin/main popv/algorithms/_scanvi.py).
Context
We carry an in-template monkey-patch for this in our downstream pipeline (Cytoreason nf-core-scdownstream) at modules/local/popv_ensemble/templates/popv_patches.py while waiting for the upstream fix. Happy to provide additional reproduction artifacts if useful.
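For readers who need a stopgap, the general shape of such a monkey-patch is sketched below. This is not the contents of popv_patches.py: force_kwargs and FakeModel are hypothetical names used so the example runs without scvi-tools, and whether the real scvi.model.SCANVI.load_query_data accepts extend_categories as a keyword should be verified against the installed scvi-tools version before adapting it.

```python
import functools


def force_kwargs(func, **forced):
    """Return a wrapper that always passes the given keyword arguments."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Forced values take precedence over whatever the caller supplied.
        kwargs.update(forced)
        return func(*args, **kwargs)

    return wrapper


class FakeModel:
    """Hypothetical stand-in for the model class; not scvi-tools."""

    @classmethod
    def load_query_data(cls, adata, reference_model, freeze_classifier=False,
                        extend_categories=False):
        # Echo the flags back so the effect of the patch is visible.
        return {
            "freeze_classifier": freeze_classifier,
            "extend_categories": extend_categories,
        }


# Wrap the already-bound classmethod and reinstall it as a staticmethod,
# so it behaves the same when called on the class or on an instance.
FakeModel.load_query_data = staticmethod(
    force_kwargs(FakeModel.load_query_data, extend_categories=True)
)

result = FakeModel.load_query_data("query", "path/to/scanvi", freeze_classifier=True)
```

The call site stays unchanged, which keeps the workaround easy to delete once the keyword is passed upstream.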