fix(benchmark): restore offline AfSP WRB benchmark in the suite (v0.9.111) by HugoMachadoRodrigues · Pull Request #63 · HugoMachadoRodrigues/soilKey

HugoMachadoRodrigues · 2026-06-12T18:25:33Z

Summary

The offline AfSP (Africa Soil Profiles) WRB benchmark was silently dropped from run_all_benchmarks()'s consolidated report — .suite_run_afsp() returned NULL and AfSP never appeared as a row. This restores it as a 2nd independent offline WRB benchmark alongside FEBR. Surfaced during the v0.9.110 "benchmark methodology" (B1) work; scoped out then as a separate harness issue.

The classification engine is untouched — every edit is in the AfSP loader/benchmark and the suite wiring.

Root cause — two bugs, one mechanism

The bundled inst/extdata/afsp_sample.rds was not a stub: it already holds the full stratified 5 × 24 = 120-pedon sample inside a wrapper list(pedons, pulled_on, source, filter) (mirroring load_wosis_stratified_sample() / load_kssl_sample()).

load_afsp_sample() returns that 4-element wrapper; the suite fed it straight into benchmark_afsp() instead of its $pedons slot. (The "only 4 pedons" symptom was the wrapper's length, not a truncated sample.)
benchmark_afsp() then iterated the 4 wrapper elements and hit $site on the atomic ones (pulled_on, source) → "$ operator is invalid for atomic vectors" → swallowed by .suite_run_afsp()'s tryCatch → silent NULL.

Fix

New internal .afsp_as_pedons() unwraps $pedons when handed the wrapper and passes a bare pedon list through unchanged.
benchmark_afsp() calls it (so it accepts either form) and skips any non-PedonRecord member instead of erroring.
.suite_run_afsp() unwraps before its length-guard and max_n slice.
load_afsp_sample() docs corrected from the never-shipped "130 / 26 RSGs" to the actual 120 / 24, and clarify the wrapper return; 3 Rd files regenerated.

Result

run_all_benchmarks() now surfaces a real afsp_sample / wrb2022 row with the full v0.9.110 metric set:

n	Accuracy	Bal. acc	Macro-F1	Kappa	NIR
120	25.0%	25.0%	0.22	0.22	4.2%

Tests

New test-v09111-afsp-benchmark-harness.R: asserts benchmark_afsp(load_afsp_sample()) (the raw wrapper) returns non-NULL with a non-empty confusion matrix, junk members are skipped, .suite_run_afsp() yields a full-metric row, and run_all_benchmarks() surfaces AfSP.
test-v09106-benchmark-suite.R: replaced the stale assertion that encoded the old buggy "only canonical survives" behaviour with one asserting external datasets are skipped while offline rows (canonical + AfSP) appear.

Full suite: 1702 tests, 0 failures, 0 errors.

Version bumped to 0.9.111 with a NEWS entry.

🤖 Generated with Claude Code

….111) The offline AfSP (Africa Soil Profiles) WRB benchmark was silently dropped from run_all_benchmarks()'s consolidated report. Two bugs, one mechanism: 1. load_afsp_sample() returns a wrapper list(pedons, pulled_on, source, filter) -- length 4 -- mirroring load_wosis_stratified_sample() / load_kssl_sample(). The suite fed that wrapper straight into benchmark_afsp() instead of its $pedons slot. 2. benchmark_afsp() then iterated the 4 wrapper elements and hit $site on the atomic ones (pulled_on, source) -> "$ operator is invalid for atomic vectors", which .suite_run_afsp()'s tryCatch swallowed to NULL, so AfSP never appeared. The bundled inst/extdata/afsp_sample.rds was NOT a stub: it already holds the full stratified 5 x 24 = 120-pedon sample. Fix (engine untouched -- harness only): - New internal .afsp_as_pedons() unwraps $pedons when handed the wrapper and passes a bare pedon list through unchanged. - benchmark_afsp() calls it (accepts either form) and skips any non-PedonRecord member instead of erroring. - .suite_run_afsp() unwraps before its length-guard and max_n slice. - load_afsp_sample() docs corrected from the never-shipped "130 / 26 RSGs" to the actual 120 / 24, and clarify the wrapper return. Result: run_all_benchmarks() surfaces a real afsp_sample/wrb2022 row (n=120, acc 25%, bal-acc 25%, macro-F1 0.22, kappa 0.22, NIR 4.2%, bootstrap CI) -- a 2nd independent OFFLINE WRB benchmark alongside FEBR -- with the full v0.9.110 metric set. Tests: - New test-v09111-afsp-benchmark-harness.R: benchmark_afsp(load_afsp_sample()) (raw wrapper) returns non-NULL with a non-empty confusion matrix, junk members are skipped, .suite_run_afsp() yields a full-metric row, and run_all_benchmarks() surfaces AfSP. - test-v09106-benchmark-suite.R: replace the stale assertion that encoded the old buggy "only canonical survives" behaviour with one that asserts external datasets are skipped while offline rows (canonical + AfSP) appear. Full suite: 1702 tests, 0 failures, 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Restores the offline AfSP (Africa Soil Profiles) WRB benchmark row in run_all_benchmarks() by unwrapping the bundled load_afsp_sample() wrapper correctly and hardening the AfSP benchmark harness so it no longer fails silently.

Changes:

Added internal unwrapping helper .afsp_as_pedons() and updated benchmark_afsp() to accept either the wrapper or a bare pedon list (skipping non-PedonRecord members).
Updated the benchmark suite’s AfSP wiring to unwrap before length checks / slicing so the full sample is benchmarked.
Added regression tests and refreshed documentation/NEWS + version bump to 0.9.111.

Reviewed changes

Copilot reviewed 6 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/testthat/test-v09111-afsp-benchmark-harness.R	New regression tests covering wrapper unwrapping, junk-member skipping, suite row surfacing, and consolidated suite visibility.
tests/testthat/test-v09106-benchmark-suite.R	Updates suite expectations to allow offline AfSP row while still skipping absent external datasets.
R/benchmark-suite.R	Unwraps AfSP wrapper before guards/slicing to prevent feeding atomic wrapper members into the benchmark loop.
R/afsp.R	Adds `.afsp_as_pedons()` and makes `benchmark_afsp()` resilient to wrapper inputs and non-pedon elements; updates `load_afsp_sample()` docs.
NEWS.md	Adds 0.9.111 entry describing the harness fix and restored AfSP offline benchmark row.
man/load_afsp_sample.Rd	Updates documented sample size and clarifies wrapper return shape.
man/dot-afsp_as_pedons.Rd	New generated Rd for the internal unwrapping helper.
man/benchmark_afsp.Rd	Updates parameter docs to reflect wrapper acceptance.
DESCRIPTION	Version/date bump to 0.9.111 / 2026-06-12.

Files not reviewed (3)

man/benchmark_afsp.Rd: Generated file
man/dot-afsp_as_pedons.Rd: Generated file
man/load_afsp_sample.Rd: Generated file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+.afsp_as_pedons <- function(x) {
+  if (is.list(x) && !is.null(x$pedons) && is.list(x$pedons)) x$pedons else x
+}


+  expect_true(any(hit))
+  expect_equal(out$summary$n_compared[hit], 120L)
+  expect_gt(out$summary$accuracy[hit], 0.15)


Copilot AI review requested due to automatic review settings June 12, 2026 18:25

Copilot started reviewing on behalf of HugoMachadoRodrigues June 12, 2026 18:25 View session

Copilot AI reviewed Jun 12, 2026

View reviewed changes

HugoMachadoRodrigues mentioned this pull request Jun 12, 2026

An argic horizon is never a Regosol — Luvisol missing-Al-sat default (v0.9.112) #64

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(benchmark): restore offline AfSP WRB benchmark in the suite (v0.9.111)#63

fix(benchmark): restore offline AfSP WRB benchmark in the suite (v0.9.111)#63
HugoMachadoRodrigues wants to merge 1 commit into
mainfrom
claude/agitated-payne-111bb5

HugoMachadoRodrigues commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

HugoMachadoRodrigues commented Jun 12, 2026

Summary

Root cause — two bugs, one mechanism

Fix

Result

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants