Skip metrics can lead to degenerate performance #219

@okyangyishen

To iterate faster, I tested an eval config that skips almost every metric and keeps only pearson_delta and discrimination_score_l1, via evaluator.compute(profile="full", metric_configs={}, skip_metrics=skip_metrics). Unexpectedly, both metrics came out much worse, even though the model predictions were identical. This looks like a cell_eval pipeline bug: skipping many metrics appears to change internal intermediate state (likely a hidden dependency or ordering effect), which makes pearson_delta and discrimination_score_l1 unreliable in the reduced setup.
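
One way to pin this down is to run the evaluator twice on the same predictions, once with nothing skipped and once with the reduced set, and diff the two kept metrics. The sketch below is only a rough illustration and does not call cell_eval at all: it assumes each run's results can be reduced to a plain dict of metric name to float, and the helper names `build_skip_list` and `diff_kept_metrics` are hypothetical, not part of the library.

```python
import math

# The two metrics kept in the reduced run (names taken from the report above).
KEEP = ("pearson_delta", "discrimination_score_l1")


def build_skip_list(all_metrics, keep=KEEP):
    """Return every metric name not in `keep`, preserving the input order.

    `all_metrics` is assumed to be a list of metric names supplied by the
    caller; how that list is obtained from cell_eval is not shown here.
    """
    keep_set = set(keep)
    return [name for name in all_metrics if name not in keep_set]


def diff_kept_metrics(full, reduced, keep=KEEP, rel_tol=1e-6):
    """Compare the kept metrics between a full run and a reduced run.

    `full` and `reduced` are assumed to be dicts mapping metric name to a
    single float (an assumption about how the results were summarized).
    Returns {metric: (full_value, reduced_value)} for every kept metric
    whose two values differ by more than the tolerance.
    """
    mismatches = {}
    for name in keep:
        a, b = full[name], reduced[name]
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches[name] = (a, b)
    return mismatches
```

The output of `build_skip_list` would be passed as `skip_metrics` in the reduced run. If the skipped metrics were truly independent, `diff_kept_metrics` should return an empty dict; any entry it returns is a metric whose value depends on what else was computed.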
