Fix #356 compatibility #361

Merged
KartikP merged 24 commits into main from epflneuroailab-changes on May 18, 2026

Conversation

KartikP (Contributor) commented Feb 11, 2026

From #356 but with other changes to get it to work with testing infrastructure.

KartikP force-pushed the epflneuroailab-changes branch from ec07cd5 to 27f9d2e (February 11, 2026 20:48)
KartikP force-pushed the epflneuroailab-changes branch from bb6ba01 to b913b91 (February 11, 2026 21:07)
The new cross-validation kwargs (split_coord="story", kfold="group")
change the ceiling from 0.2103 to 0.1446.
- blank2014: add story coord to predictions (needed for split_coord="story")
- fedorenko2016: add sentence_id coord to predictions (needed for split_coord="sentence_id")
- pereira2018: replace .coords check with try/except to handle xarray
  MultiIndex levels not appearing in .coords (xarray 2022.3.0 behavior)
- pereira2018/test.py: update test_dummy_bad expected scores for new
  GroupKFold CV strategy (243: 0.0186->0.0, 384: 0.0334->0.0168)
- linear_predictivity/test.py: update expected score (0.0283->0.0410)
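
For readers unfamiliar with group-based splitting, here is a minimal standard-library sketch of the idea behind `kfold="group"` with a `split_coord`: every stimulus sharing a group label (for example a story) lands on the same side of each split. It is an illustration only, not the code in this PR and not scikit-learn's `GroupKFold`; the function name `group_kfold` and the toy `stories` labels are assumptions made for the example.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    if n_splits < 2 or n_splits > len(by_group):
        raise ValueError("n_splits must be between 2 and the number of groups")
    # Deal groups into folds round-robin, largest first, for rough balance.
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    folds = [[] for _ in range(n_splits)]
    for position, group in enumerate(ordered):
        folds[position % n_splits].extend(by_group[group])
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, sorted(test)


# Hypothetical "story" coordinate: one label per stimulus.
stories = ["A", "A", "A", "B", "B", "C", "C", "D"]
for train, test in group_kfold(stories, n_splits=2):
    train_stories = {stories[i] for i in train}
    test_stories = {stories[i] for i in test}
    print(sorted(train_stories), sorted(test_stories))
```

Because a shuffled split can place sentences from one story in both train and test, switching to group-based folds generally removes that overlap, which is consistent with the lower scores and ceiling reported above.
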
mike-ferguson added the submission_prepared label Feb 12, 2026
KartikP removed the submission_prepared label Feb 12, 2026
mike-ferguson added the submission_prepared label Feb 12, 2026
KartikP changed the title from "Group-based Splitting for Benchmarks + Batch-layer Evaluation + GPU Support" to "Fix #356 compatibility" Feb 12, 2026
KartikP (Contributor, Author) commented Feb 13, 2026

@BKHMSI Would like your feedback on my changes for your PR when you have the chance.

BKHMSI (Contributor) commented Feb 18, 2026

Hi @KartikP, thanks for making it work with the testing infrastructure.

The only comment I have is that the benchmarks with linear regression (e.g., Blank2014-linear) should use the same cross validation kwargs as the ridge regression one (i.e., group-based splitting). If you want to keep the old method, then I propose to give it the identifier Blank2014-linear-legacy or something similar.

KartikP (Contributor, Author) commented Feb 18, 2026

> Hi @KartikP, thanks for making it work with the testing infrastructure.
>
> The only comment I have is that the benchmarks with linear regression (e.g., Blank2014-linear) should use the same cross validation kwargs as the ridge regression one (i.e., group-based splitting). If you want to keep the old method, then I propose to give it the identifier Blank2014-linear-legacy or something similar.

I'll verify that your original implementation can reproduce the previous scores. Otherwise, I'll mark the linear benchmarks as linear-legacy and then add the kwargs back in.

KartikP added 3 commits May 15, 2026 14:44
Renames Blank2014-linear, Fedorenko2016-linear, and Pereira2018.{243,384}sentences-linear
to *-linear-shuffle to disambiguate the shuffle-CV legacy variants from the new
group-CV ridge variants. Makes the shuffle cv_kwargs explicit on the renamed variants.

Pereira2018 keeps loading its cached ceiling from the legacy S3 identifier so the
existing S3 artifacts are reused. Tests, integration tests, and examples updated.
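
A minimal sketch of how a renamed benchmark could keep resolving its cached ceiling under the legacy identifier. The mapping and the helper `ceiling_identifier` are hypothetical names for illustration; only the identifiers themselves come from the commit message, and the actual S3 key layout is not shown in this thread.

```python
# Hypothetical mapping: renamed benchmark identifier -> identifier under
# which its cached ceiling was originally stored.
LEGACY_CEILING_IDENTIFIERS = {
    "Pereira2018.243sentences-linear-shuffle": "Pereira2018.243sentences-linear",
    "Pereira2018.384sentences-linear-shuffle": "Pereira2018.384sentences-linear",
}


def ceiling_identifier(benchmark_identifier):
    """Return the identifier to use when looking up a cached ceiling."""
    # Identifiers without a legacy alias resolve to themselves.
    return LEGACY_CEILING_IDENTIFIERS.get(benchmark_identifier, benchmark_identifier)


print(ceiling_identifier("Pereira2018.243sentences-linear-shuffle"))
```
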
KartikP and others added 5 commits May 15, 2026 14:49
# Conflicts:
#	brainscore_language/benchmarks/pereira2018/benchmark.py
…arks (fedorenko2016,pereira2018,tuckute2024,blank2014)
Pereira2018_*_ridge() now loads its cached extrapolation ceiling from
brainscore-storage/brainscore-language/ (ceiling, raw, raw_raw) instead of
recomputing on every benchmark instantiation.
The linear ceiling files migrated from brainscore-language (direct) to
brainscore-storage/brainscore-language/ without versioning enabled, so the
old direct-bucket version_ids 404 when load_from_s3 hits storage. Sha1s
match the storage objects exactly, so the integrity check still verifies
the download.
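
The integrity argument above can be sketched as follows: if the downloaded bytes hash to the recorded sha1, the object is the expected one regardless of which bucket or version it came from. This is a standard-library illustration; `verify_sha1` is a hypothetical helper, not the `load_from_s3` implementation.

```python
import hashlib


def verify_sha1(payload: bytes, expected_sha1: str) -> bytes:
    """Return payload unchanged if its sha1 matches, otherwise raise."""
    actual = hashlib.sha1(payload).hexdigest()
    if actual != expected_sha1.lower():
        raise ValueError(f"sha1 mismatch: expected {expected_sha1}, got {actual}")
    return payload


# Toy payload standing in for a downloaded ceiling file.
payload = b"ceiling bytes"
recorded = hashlib.sha1(payload).hexdigest()
verify_sha1(payload, recorded)
```
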
Bootstrap iterations whose sampled scores do not fit the exponential
(seen with the ridge metric on both Blank2014 and Fedorenko2016, where
curve_fit hits its maxfev limit) now mark params as NaN and continue,
matching the Pereira2018 ceiling_packaging.py behavior. Also filters
NaN/Inf inputs before curve_fit.
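
A rough sketch of the bootstrap behavior described above, using only the standard library. `fit_fn` is a stand-in for the real fitting call (for example `scipy.optimize.curve_fit`, which raises `RuntimeError` when the minimization fails, including when `maxfev` is reached); `bootstrap_fit` and its parameters are hypothetical names, not the ceiling code in this PR.

```python
import math
import random


def bootstrap_fit(xs, ys, fit_fn, n_bootstraps=100, n_params=2, seed=0):
    """Fit each bootstrap resample; record NaN params when a fit fails."""
    rng = random.Random(seed)
    # Drop NaN/Inf pairs before any fit is attempted.
    pairs = [(x, y) for x, y in zip(xs, ys)
             if math.isfinite(x) and math.isfinite(y)]
    if not pairs:
        raise ValueError("no finite data points to fit")
    results = []
    for _ in range(n_bootstraps):
        # Resample with replacement, keeping the sample size fixed.
        sample = [rng.choice(pairs) for _ in pairs]
        sample_x, sample_y = zip(*sample)
        try:
            params = tuple(fit_fn(sample_x, sample_y))
        except RuntimeError:
            # Failed fit: mark this iteration as NaN and continue.
            params = (math.nan,) * n_params
        results.append(params)
    return results
```

Downstream aggregation then needs to ignore the NaN rows (for example with a NaN-aware median) so that a few failed iterations do not invalidate the whole ceiling estimate.
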
KartikP merged commit 6650678 into main May 18, 2026
11 checks passed
Labels

submission_prepared Attached to a PR is metadata and layer mapping is successful.

3 participants