fix(benchmark): restore offline AfSP WRB benchmark in the suite (v0.9.111)#63
Open
HugoMachadoRodrigues wants to merge 1 commit into
Open
fix(benchmark): restore offline AfSP WRB benchmark in the suite (v0.9.111)#63HugoMachadoRodrigues wants to merge 1 commit into
HugoMachadoRodrigues wants to merge 1 commit into
Conversation
….111)
The offline AfSP (Africa Soil Profiles) WRB benchmark was silently dropped
from run_all_benchmarks()'s consolidated report. Two bugs, one mechanism:
1. load_afsp_sample() returns a wrapper list(pedons, pulled_on, source,
filter) -- length 4 -- mirroring load_wosis_stratified_sample() /
load_kssl_sample(). The suite fed that wrapper straight into
benchmark_afsp() instead of its $pedons slot.
2. benchmark_afsp() then iterated the 4 wrapper elements and hit $site on
the atomic ones (pulled_on, source) -> "$ operator is invalid for
atomic vectors", which .suite_run_afsp()'s tryCatch swallowed to NULL,
so AfSP never appeared.
The bundled inst/extdata/afsp_sample.rds was NOT a stub: it already holds the
full stratified 5 x 24 = 120-pedon sample.
Fix (engine untouched -- harness only):
- New internal .afsp_as_pedons() unwraps $pedons when handed the wrapper
and passes a bare pedon list through unchanged.
- benchmark_afsp() calls it (accepts either form) and skips any
non-PedonRecord member instead of erroring.
- .suite_run_afsp() unwraps before its length-guard and max_n slice.
- load_afsp_sample() docs corrected from the never-shipped "130 / 26 RSGs"
to the actual 120 / 24, and clarify the wrapper return.
Result: run_all_benchmarks() surfaces a real afsp_sample/wrb2022 row (n=120,
acc 25%, bal-acc 25%, macro-F1 0.22, kappa 0.22, NIR 4.2%, bootstrap CI) -- a
2nd independent OFFLINE WRB benchmark alongside FEBR -- with the full v0.9.110
metric set.
Tests:
- New test-v09111-afsp-benchmark-harness.R: benchmark_afsp(load_afsp_sample())
(raw wrapper) returns non-NULL with a non-empty confusion matrix, junk
members are skipped, .suite_run_afsp() yields a full-metric row, and
run_all_benchmarks() surfaces AfSP.
- test-v09106-benchmark-suite.R: replace the stale assertion that encoded the
old buggy "only canonical survives" behaviour with one that asserts external
datasets are skipped while offline rows (canonical + AfSP) appear.
Full suite: 1702 tests, 0 failures, 0 errors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Restores the offline AfSP (Africa Soil Profiles) WRB benchmark row in run_all_benchmarks() by unwrapping the bundled load_afsp_sample() wrapper correctly and hardening the AfSP benchmark harness so it no longer fails silently.
Changes:
- Added internal unwrapping helper
.afsp_as_pedons()and updatedbenchmark_afsp()to accept either the wrapper or a bare pedon list (skipping non-PedonRecordmembers). - Updated the benchmark suite’s AfSP wiring to unwrap before length checks / slicing so the full sample is benchmarked.
- Added regression tests and refreshed documentation/NEWS + version bump to 0.9.111.
Reviewed changes
Copilot reviewed 6 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-v09111-afsp-benchmark-harness.R | New regression tests covering wrapper unwrapping, junk-member skipping, suite row surfacing, and consolidated suite visibility. |
| tests/testthat/test-v09106-benchmark-suite.R | Updates suite expectations to allow offline AfSP row while still skipping absent external datasets. |
| R/benchmark-suite.R | Unwraps AfSP wrapper before guards/slicing to prevent feeding atomic wrapper members into the benchmark loop. |
| R/afsp.R | Adds .afsp_as_pedons() and makes benchmark_afsp() resilient to wrapper inputs and non-pedon elements; updates load_afsp_sample() docs. |
| NEWS.md | Adds 0.9.111 entry describing the harness fix and restored AfSP offline benchmark row. |
| man/load_afsp_sample.Rd | Updates documented sample size and clarifies wrapper return shape. |
| man/dot-afsp_as_pedons.Rd | New generated Rd for the internal unwrapping helper. |
| man/benchmark_afsp.Rd | Updates parameter docs to reflect wrapper acceptance. |
| DESCRIPTION | Version/date bump to 0.9.111 / 2026-06-12. |
Files not reviewed (3)
- man/benchmark_afsp.Rd: Generated file
- man/dot-afsp_as_pedons.Rd: Generated file
- man/load_afsp_sample.Rd: Generated file
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+296
to
+298
| .afsp_as_pedons <- function(x) { | ||
| if (is.list(x) && !is.null(x$pedons) && is.list(x$pedons)) x$pedons else x | ||
| } |
Comment on lines
+96
to
+98
| expect_true(any(hit)) | ||
| expect_equal(out$summary$n_compared[hit], 120L) | ||
| expect_gt(out$summary$accuracy[hit], 0.15) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The offline AfSP (Africa Soil Profiles) WRB benchmark was silently dropped from
run_all_benchmarks()'s consolidated report —.suite_run_afsp()returnedNULLand AfSP never appeared as a row. This restores it as a 2nd independent offline WRB benchmark alongside FEBR. Surfaced during the v0.9.110 "benchmark methodology" (B1) work; scoped out then as a separate harness issue.The classification engine is untouched — every edit is in the AfSP loader/benchmark and the suite wiring.
Root cause — two bugs, one mechanism
The bundled
inst/extdata/afsp_sample.rdswas not a stub: it already holds the full stratified 5 × 24 = 120-pedon sample inside a wrapperlist(pedons, pulled_on, source, filter)(mirroringload_wosis_stratified_sample()/load_kssl_sample()).load_afsp_sample()returns that 4-element wrapper; the suite fed it straight intobenchmark_afsp()instead of its$pedonsslot. (The "only 4 pedons" symptom was the wrapper's length, not a truncated sample.)benchmark_afsp()then iterated the 4 wrapper elements and hit$siteon the atomic ones (pulled_on,source) →"$ operator is invalid for atomic vectors"→ swallowed by.suite_run_afsp()'stryCatch→ silentNULL.Fix
.afsp_as_pedons()unwraps$pedonswhen handed the wrapper and passes a bare pedon list through unchanged.benchmark_afsp()calls it (so it accepts either form) and skips any non-PedonRecordmember instead of erroring..suite_run_afsp()unwraps before its length-guard andmax_nslice.load_afsp_sample()docs corrected from the never-shipped "130 / 26 RSGs" to the actual 120 / 24, and clarify the wrapper return; 3 Rd files regenerated.Result
run_all_benchmarks()now surfaces a realafsp_sample / wrb2022row with the full v0.9.110 metric set:Tests
test-v09111-afsp-benchmark-harness.R: assertsbenchmark_afsp(load_afsp_sample())(the raw wrapper) returns non-NULL with a non-empty confusion matrix, junk members are skipped,.suite_run_afsp()yields a full-metric row, andrun_all_benchmarks()surfaces AfSP.test-v09106-benchmark-suite.R: replaced the stale assertion that encoded the old buggy "only canonical survives" behaviour with one asserting external datasets are skipped while offline rows (canonical + AfSP) appear.Full suite: 1702 tests, 0 failures, 0 errors.
Version bumped to 0.9.111 with a NEWS entry.
🤖 Generated with Claude Code