Fix #356 compatibility by KartikP · Pull Request #361 · brain-score/language

KartikP · 2026-02-11T20:43:07Z

From #356 but with other changes to get it to work with testing infrastructure.

The new cross-validation kwargs (split_coord="story", kfold="group") change the ceiling from 0.2103 to 0.1446.

- blank2014: add story coord to predictions (needed for split_coord="story")
- fedorenko2016: add sentence_id coord to predictions (needed for split_coord="sentence_id")
- pereira2018: replace .coords check with try/except to handle xarray MultiIndex levels not appearing in .coords (xarray 2022.3.0 behavior)
- pereira2018/test.py: update test_dummy_bad expected scores for new GroupKFold CV strategy (243: 0.0186->0.0, 384: 0.0334->0.0168)
- linear_predictivity/test.py: update expected score (0.0283->0.0410)

KartikP · 2026-02-13T10:27:52Z

@BKHMSI Would like your feedback on my changes for your PR when you have the chance.

BKHMSI · 2026-02-18T12:23:52Z

Hi @KartikP, thanks for making it work with the testing infrastructure.

The only comment I have is that the benchmarks with linear regression (e.g., Blank2014-linear) should use the same cross validation kwargs as the ridge regression one (i.e., group-based splitting). If you want to keep the old method, then I propose to give it the identifier Blank2014-linear-legacy or something similar.

KartikP · 2026-02-18T13:46:50Z

> Hi @KartikP, thanks for making it work with the testing infrastructure.
>
> The only comment I have is that the benchmarks with linear regression (e.g., Blank2014-linear) should use the same cross validation kwargs as the ridge regression one (i.e., group-based splitting). If you want to keep the old method, then I propose to give it the identifier Blank2014-linear-legacy or something similar.

I'll verify that your original implementation can reproduce the previous scores. Otherwise, I'll mark the linear benchmarks as linear-legacy and then add the kwargs back in.

Renames Blank2014-linear, Fedorenko2016-linear, and Pereira2018.{243,384}sentences-linear to *-linear-shuffle to disambiguate the shuffle-CV legacy variants from the new group-CV ridge variants. Makes the shuffle cv_kwargs explicit on the renamed variants. Pereira2018 keeps loading its cached ceiling from the legacy S3 identifier so the existing S3 artifacts are reused. Tests, integration tests, and examples updated.

Pereira2018_*_ridge() now loads its cached extrapolation ceiling from brainscore-storage/brainscore-language/ (ceiling, raw, raw_raw) instead of recomputing on every benchmark instantiation.

The linear ceiling files migrated from brainscore-language (direct) to brainscore-storage/brainscore-language/ without versioning enabled, so the old direct-bucket version_ids 404 when load_from_s3 hits storage. Sha1s match the storage objects exactly, so the integrity check still verifies the download.
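The integrity check mentioned above can be sketched as a plain SHA-1 comparison on the downloaded file. This is a generic illustration using only the standard library; `sha1_of_file` and `verify_download` are hypothetical helpers and do not reflect how `load_from_s3` is actually implemented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha1):
    """Raise if the file on disk does not match the recorded digest."""
    actual = sha1_of_file(path)
    if actual != expected_sha1:
        raise ValueError(f"sha1 mismatch for {path}: {actual} != {expected_sha1}")
    return True


# Hypothetical stand-in for a downloaded ceiling file.
with tempfile.TemporaryDirectory() as tmp:
    ceiling_path = Path(tmp) / "ceiling.bin"
    ceiling_path.write_bytes(b"ceiling")
    expected = hashlib.sha1(b"ceiling").hexdigest()
    verified = verify_download(ceiling_path, expected)
    try:
        verify_download(ceiling_path, "0" * 40)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```

Because the digest depends only on the file contents, it stays valid when the object moves between buckets, even though the old version_ids no longer resolve.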

Bootstrap iterations whose sampled scores do not fit the exponential (seen with the ridge metric on both Blank2014 and Fedorenko2016, where curve_fit hits its maxfev limit) now mark params as NaN and continue, matching the Pereira2018 ceiling_packaging.py behavior. Also filters NaN/Inf inputs before curve_fit.
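The NaN-on-failure behavior described above can be sketched with SciPy's `curve_fit`, which raises `RuntimeError` when the fit does not converge within `maxfev` evaluations. The curve shape, the `fit_or_nan` helper, and the toy data below are assumptions for illustration; the actual ceiling code in the repository may differ.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating_exponential(x, v0, tau0):
    # Hypothetical curve shape: rises from 0 and levels off at v0.
    return v0 * (1.0 - np.exp(-x / tau0))


def fit_or_nan(x, y, maxfev=2000):
    """Fit the curve; return NaN params when the data cannot be fitted."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop NaN/Inf inputs before fitting.
    keep = np.isfinite(x) & np.isfinite(y)
    x, y = x[keep], y[keep]
    if x.size < 2:  # not enough points for a two-parameter fit
        return np.full(2, np.nan)
    try:
        params, _ = curve_fit(
            saturating_exponential, x, y, p0=[1.0, 1.0], maxfev=maxfev
        )
    except RuntimeError:  # raised when the fit fails, e.g. maxfev is reached
        return np.full(2, np.nan)
    return params


x = np.arange(1, 11, dtype=float)
clean = 0.5 * (1.0 - np.exp(-x / 2.0))  # noiseless sample, fits cleanly
broken = np.full_like(x, np.nan)        # sample with no usable values

# Each row stands in for one bootstrap iteration.
bootstrap_params = np.array([fit_or_nan(x, clean), fit_or_nan(x, broken)])

# NaN rows are skipped when aggregating across iterations.
summary = np.nanmedian(bootstrap_params, axis=0)
```

Marking a failed iteration as NaN rather than raising keeps the remaining bootstrap iterations usable, as long as the aggregation step is NaN-aware.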

KartikP added 2 commits February 11, 2026 14:58

Update dependency versions in pyproject.toml

7049f93

correct file changes

27f9d2e

KartikP force-pushed the epflneuroailab-changes branch from ec07cd5 to 27f9d2e Compare February 11, 2026 20:48

update sklearn to be greater than 1.6 (inconsistent with vision but a…

b913b91

…las)

KartikP force-pushed the epflneuroailab-changes branch from bb6ba01 to b913b91 Compare February 11, 2026 21:07

KartikP added 10 commits February 11, 2026 16:08

Merge branch 'main' into epflneuroailab-changes

3fa2a36

restore plugin submission orchestrator

4f6bf08

import numpy

deb1899

Add layer to dummy model neuroidassembly

1a544ae

return score object with mean of list of score

6a65f52

Update blank2014 test_ceiling expected value for GroupKFold CV

6738606

The new cross-validation kwargs (split_coord="story", kfold="group") change the ceiling from 0.2103 to 0.1446.

remove cv_kwargs for linear

9adee94

account for difference in pearsonr correlation methods

dbe92a8

Propagate raw and ceiling values for database

16e896a

mike-ferguson added the submission_prepared label Feb 12, 2026

KartikP removed the submission_prepared label Feb 12, 2026

mike-ferguson added the submission_prepared label Feb 12, 2026

make identifier format consistent across ridge variants

cf37b02

KartikP changed the title ~~Group-based Splitting for Benchmarks + Batch-layer Evaluation + GPU Support~~ Fix #356 compatibility Feb 12, 2026

This was referenced Feb 12, 2026

Group-based Splitting for Benchmarks + Batch-layer Evaluation + GPU Support #356

Closed

add OASM model from Hadidi et al. 2025 #355

Merged

Merge branch 'main' into epflneuroailab-changes

54ba7cb

Merge branch 'main' into epflneuroailab-changes

a9def6b

KartikP added 3 commits May 15, 2026 14:44

update ridge ceilings

019823c

handle nan and notebook

8b547d5

KartikP and others added 5 commits May 15, 2026 14:49

Merge remote-tracking branch 'origin/main' into epflneuroailab-changes

87e8cc7

# Conflicts: # brainscore_language/benchmarks/pereira2018/benchmark.py

Auto-generate: metadata generation: models (random_embedding), benchm…

e46791e

…arks (fedorenko2016,pereira2018,tuckute2024,blank2014)

wire ridge ceilings to s3

7b82251

Pereira2018_*_ridge() now loads its cached extrapolation ceiling from brainscore-storage/brainscore-language/ (ceiling, raw, raw_raw) instead of recomputing on every benchmark instantiation.

KartikP merged commit 6650678 into main May 18, 2026
11 checks passed

KartikP merged 24 commits into main from epflneuroailab-changes

3 participants
