Hi,
we identified a bug in cell-eval caused by a change in pdex that affects pdex versions >= 0.2.0.
Previously, the fold_change column in pdex output contained linear fold changes computed as target_mean / ref_mean. Newer versions of pdex output log2 fold changes instead, but the column name is still fold_change.
cell-eval expects linear fold changes unless a log2_fold_change column is present. If that column is missing, it computes log2 values from fold_change. Because pdex now already provides log2-transformed values under the same column name, this results in applying log2 twice:
cell-eval/src/cell_eval/_types/_de.py, lines 100 to 111 in 2db6b9c:

```python
# Add log2 fold change columns if not present
if self.log2_fold_change_col not in self.data.columns:
    self.data = self.data.with_columns(
        pl.col(self.fold_change_col)
        .log(base=2)
        .alias(self.log2_fold_change_col)
        .fill_nan(0.0)
    ).with_columns(
        pl.col(self.log2_fold_change_col)
        .abs()
        .alias(self.abs_log2_fold_change_col)
    )
```
- First transformation (in pdex): FC → log2(FC)
- Second transformation (in cell-eval): log2(FC) → log2(log2(FC))

This leads to:
- Linear fold changes between 0 and 1 becoming negative after the first log2
- Invalid values (NaN) when the second log2 is applied to those negative values
- NaNs being replaced with 0.0 in cell-eval (the .fill_nan(0.0) call, line 106 above)
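
To make the effect concrete, here is a minimal standard-library sketch of the double transformation. It is not the cell-eval or pdex code: the helper names `log2_or_nan` and `double_log2` are hypothetical, and the helper assumes that a vectorised log of a negative number yields NaN (as described above) rather than raising an error.

```python
import math


def log2_or_nan(x: float) -> float:
    """Assumed vectorised-style log2: NaN for negative input, -inf for zero."""
    if x < 0:
        return math.nan
    if x == 0:
        return -math.inf
    return math.log2(x)


def double_log2(linear_fc: float) -> float:
    """Apply log2 twice, then replace NaN with 0.0 (mirrors fill_nan(0.0))."""
    first = log2_or_nan(linear_fc)   # value reported as fold_change by pdex >= 0.2.0
    second = log2_or_nan(first)      # value computed again downstream
    return 0.0 if math.isnan(second) else second


for fc in (0.25, 0.5, 2.0, 4.0):
    print(f"FC={fc}: correct log2={math.log2(fc)}, after double log2={double_log2(fc)}")
```

In this sketch every down-regulated value (FC < 1) collapses to 0.0, and the up-regulated values are distorted as well (for example, FC = 4 ends up as 1.0 instead of 2.0).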
As a result, roughly 50% of fold change values are set to zero, which significantly affects downstream metrics, including:
- overlap_at_*
- precision_at_*
- de_direction_match
There are two ways to fix this:
- Revert pdex to output linear fold changes in the fold_change column, keeping the current column name
- Alternatively, update pdex to rename the output column from fold_change to log2_fold_change

The second option may introduce compatibility issues with downstream tools that expect the fold_change column.
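
Until one of these fixes lands, a possible user-side workaround is to convert the log2 values back to linear fold changes before handing them to cell-eval. The following is only a sketch under that assumption: `to_linear_fold_change` is a hypothetical helper, and the input list stands in for the fold_change column produced by pdex >= 0.2.0.

```python
import math


def to_linear_fold_change(log2_fc: float) -> float:
    """Invert log2: linear FC = 2 ** log2(FC)."""
    return 2.0 ** log2_fc


# Stand-in for the fold_change column from pdex >= 0.2.0 (already log2-scaled).
pdex_fold_change = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Back to the linear scale that cell-eval expects in fold_change.
linear = [to_linear_fold_change(v) for v in pdex_fold_change]

# A single log2, as applied downstream, now recovers the original log2 values.
recovered = [math.log2(v) for v in linear]
print(linear)
print(recovered)
```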
Thanks