Common-gene filtering can misalign spatial and reference expression columns

Hi again! I opened a separate issue about the spot-wise cosine similarity, but I think there may also be an independent issue in the gene alignment.

**R implementation**

In the R implementation, both matrices are indexed using the same ordered vector of common genes:

```
cg <- intersect(colnames(ref$X), colnames(srt$X))

srt$X <- srt$X[, cg]
ref$X <- ref$X[, cg]
```

This ensures that both matrices have the same gene ordering before downstream computations.

**Python implementation**

In the Python implementation, the common genes are identified first:

`common_genes = np.intersect1d(spatial['genes'], ref['genes'])`

but each matrix is then filtered independently:

```
sp_idx = np.where(np.isin(spatial['genes'], common_genes))[0]
rf_idx = np.where(np.isin(ref['genes'], common_genes))[0]
```

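`np.isin()` returns a mask in the order of the array being filtered, so `sp_idx` follows the gene order of `spatial['genes']` and `rf_idx` follows the gene order of `ref['genes']`. The filtered matrices are therefore column-aligned only if the common genes appear in the same relative order in both datasets.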
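A minimal sketch of the failure mode, using made-up gene names rather than anything from the actual datasets. The variable names `spatial_genes` and `ref_genes` are placeholders standing in for `spatial['genes']` and `ref['genes']`:

```python
import numpy as np

# Toy gene lists (hypothetical): overlapping genes, different order.
spatial_genes = np.array(["A", "B", "C", "D"])
ref_genes = np.array(["C", "A", "E", "B"])

# Same filtering pattern as in the issue.
common_genes = np.intersect1d(spatial_genes, ref_genes)
sp_idx = np.where(np.isin(spatial_genes, common_genes))[0]
rf_idx = np.where(np.isin(ref_genes, common_genes))[0]

# Gene order of the filtered columns in each matrix.
sp_order = spatial_genes[sp_idx]
rf_order = ref_genes[rf_idx]

# Both selections have the same length, but the columns do not correspond.
misaligned = not np.array_equal(sp_order, rf_order)
print(sp_order, rf_order, misaligned)
```

Both index vectors select three genes, so the filtered matrices would have matching shapes even though their columns refer to different genes.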

This matters because the least-squares initialization, cosine similarities, reconstructed expression, and gradients all assume column-wise correspondence between the two expression matrices. As long as gene names are unique within each dataset, both filtered matrices end up with the same number of columns, so a mismatch in order would not show up as a shape error.

The ordering can differ when the two datasets have gone through different preprocessing. For example, if differential-expression selection on the reference returns genes in score-ranked order while the spatial preprocessing preserves the original feature order, the resulting matrices may no longer share the same relative gene ordering.

Unless I'm missing something, it would be safer to index both matrices with the same ordered vector of common genes (as in the R implementation) rather than relying on the two datasets already sharing the same gene order.
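One possible way to do this in NumPy, offered as a sketch rather than a drop-in patch: `np.intersect1d(..., return_indices=True)` returns the sorted common values together with their positions in each input, so both matrices can be indexed in the same gene order. The helper name `align_common_genes`, the argument names, and the `(observations, genes)` matrix layout are my assumptions, and the sketch assumes gene names are unique within each dataset:

```python
import numpy as np

def align_common_genes(sp_X, sp_genes, rf_X, rf_genes):
    """Subset both matrices to the shared genes, in one common order.

    Hypothetical helper: assumes genes are on axis 1 and that gene
    names are unique within each dataset.
    """
    sp_genes = np.asarray(sp_genes)
    rf_genes = np.asarray(rf_genes)
    # common is sorted; sp_idx / rf_idx give each common gene's position
    # in the corresponding input array.
    common, sp_idx, rf_idx = np.intersect1d(
        sp_genes, rf_genes, return_indices=True
    )
    return sp_X[:, sp_idx], rf_X[:, rf_idx], common

# Toy data: column values encode the gene so alignment is easy to check.
sp_genes = np.array(["A", "B", "C", "D"])
rf_genes = np.array(["C", "A", "E", "B"])
sp_X = np.array([[10, 20, 30, 40], [11, 21, 31, 41]])
rf_X = np.array([[30, 10, 50, 20], [31, 11, 51, 21]])

sp_aligned, rf_aligned, common = align_common_genes(
    sp_X, sp_genes, rf_X, rf_genes
)
print(common)
print(sp_aligned)
print(rf_aligned)
```

With this toy input the two aligned matrices come out identical, since each column was built to carry the same values for the same gene. An explicit check that the filtered gene names match in both datasets would also catch the problem without changing the indexing.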
