diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index ea56fa95c..adbba5a33 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -898,6 +898,7 @@ This benchmark is the empirical core of the paper (candidates: *SoftwareX*, *Geo | **v0.9.2.B** | **Specifier infrastructure** (Ano- / Epi- / Endo- / Bathy- / Panto- via prefix dispatch in the resolver). `.detect_specifier()` recognises the prefix, `.apply_specifier()` calls the base `qual_*` and filters layers by depth band. No need to define one function per (specifier × base) -- the system is generic. CH gains Endogleyic / Endostagnic / Endocalcic in canonical positions. Specifiers Kato- / Poly- / Supra- / Thapto- / Amphi- deferred to v0.9.3 (require buried-horizon flags / chains of designations). Tests: +55 expectations. | **shipped** | | **v0.9.2.C** | **v0.3.x diagnostic corrections** -- false-positive reduction. **cambic** gains a depth gate (`min_top_cm = 5`) and a structural-development gate (`structure_grade ∈ {weak, moderate, strong}` AND `structure_type ∉ {massive, single grain}`); A horizons and massive-C no longer pass. **plaggic** gains an anthropogenic-evidence gate directly in the diagnostic (P >= 50 mg/kg OR artefacts > 0 OR designation Apl/Aplg/Apk); the v0.9.1 gate in `qual_plaggic` was removed (now direct delegation). **sombric** gains a humus-illuviation gate (the candidate layer must have OC ≥ OC_layer_above + 0.1%); the v0.3.3 permissiveness is eliminated. Resulting canonical classification change: DU → "Duric Skeletic Durisol" (loses Cambic from the massive BC1). FR (Latossolo) and the other 30 fixtures unchanged. Tests: +43 expectations. | **shipped** | | **v0.9.3.A** | **Remaining specifiers (Kato/Amphi/Poly/Supra/Thapto) + supplementary engine**. Refactor of `.wrb_specifiers` to support two `kind`s -- `depth` (simple depth band; reuses the v0.9.2.B path) and `filter` (custom function). Helpers: `.kato_filter` (top_cm >= 50), `.amphi_filter` (Epi AND Endo), `.poly_filter` (>= 2 disjoint runs), `.supra_filter` (above a barrier: continuous_rock / petric / technic_hard), `.thapto_filter` (designation ending in `b`). Engine: `resolve_wrb_qualifiers` now also processes the `supplementary:` slot of the YAML, returning `principal` + `supplementary` with families suppressed in both. `classify_wrb2022` renders the full WRB Ch 6 name with parenthesised tags. Tests: +66 expectations. | **shipped** | +| **v0.9.112** | **"An argic horizon is never a Regosol" (accuracy front B2, engine).** Honest B1 benchmark exposed a key correctness bug: a profile with a confirmed argic (clay-illuvial B) horizon dropped to the Regosol catch-all when the eutric/alic split (BS/Al-sat) was unmeasured (`luvisol()` returned NA → key skips → Regosol). Fix in `luvisol()` (R/diagnostics-rsg-argic-derived.R): a graceful Al-sat default mirroring the Acrisol BS-fallback — when `argic()` passes, CEC/clay≥24, and Al-sat is unmeasured on a **B master horizon**, default to Luvisol (the generic high-activity argic; Alisol needs positive Al-sat≥50). Fires only on `is.na()` (a measured Luvisol/Alisol is never overridden); B-horizon guard excludes a Fluvisol's stratified C-layer clay jump; `al_sat_pct` stays in `missing_data` so the assumption is transparent. **FEBR-WRB +9 Luvisols (Regosol→Luvisol), 0 regressions (17.8→21.9%); all 44 canonical fixtures byte-identical.** Scope note: the dominant FEBR-WRB ceiling is missing data (most argic-RSG pedons lack measured clay), not the discriminator the audit imagined — so this is a targeted correctness fix. | **shipped** | | **v0.9.110** | **Benchmark methodology (accuracy front B1).** Harness-only, engine unchanged. (1) **Sampling fix**: `.benchmark_one_dataset_one_system()` now filters each dataset to pedons carrying the requested system's reference label (`.benchmark_has_reference`) BEFORE the `max_n` cap (`.benchmark_filter_then_cap`); FEBR loads with `require_classification="any"`. Fixes the cap-before-filter bug that starved sparse labels (FEBR-USDA n=3 → hundreds). FEBR + BDsolos branches; KSSL/LUCAS/Redape documented as-is. (2) **Metrics**: `.benchmark_metrics_from_confusion()` (NIR majority baseline, balanced accuracy, macro-F1, Cohen's kappa, per-class P/R/F1) + `.benchmark_bootstrap_metrics()` (seed-42, RNG-preserving 95% CIs); attached to `pool_one()` and every report row via a uniform `.suite_row()`. (3) **Honest report**: new columns + `n<30` flag; LUCAS WRB labelled topsoil-only lower-bound (honest WRB rests on offline FEBR + AfSP, the latter now carrying the full metric set); SoilGrids subsoil-fill documented as opt-in. The accuracy-raising B2 (argic/ferralic/nitic discriminator, which moves fixtures) is a separate follow-up. | **shipped** | | **v0.9.109** | **CRAN release hardening.** Documentation-only; engine byte-identical. `R CMD check --as-cran` flagged 545 exported function topics missing `\value` (CI never ran `--as-cran`). ~600 atomic engine predicates (`qual_*`, `*_usda` gates, `carater_*`/`horizonte_*`) marked `@keywords internal` (still exported/callable, out of the public reference) → documented API ~910→~195; the 85 genuinely-public no-value topics gained `@return`. Runnable `@examples` on the entry points; `_pkgdown.yml` gains a `has_keyword("internal")` section; CI runs explicit `--as-cran` + `check_pkgdown()`; dead `SOILKEY_SKIP_*` vars removed; `LazyDataCompression: xz`; lifecycle → maturing. Result: `--as-cran` 0/0/0; suite 5038/0. | **shipped** | | **v0.9.108** | **Pro app polish (front 3 of 3).** A UX pass on `classify_app_pro/`: a soil-science `bs_theme()` palette + slim `www/soilkey.css` (warmer cards, navbar wordmark, CSS-only busy spinner); a global **pedon ribbon** (`page_navbar(header=)`, rendered from `rv$pedon`) and a **"Getting started" modal** with a one-click **Load example & classify** that builds the canonical Ferralsol through the real Pedon flow (`rv$example_request` → `mod_pedon` observer). New visualisations: a Vis-NIR spectrum plot (`pro_spectrum_plot()`, one trace per horizon) in **Spectra** and an uploaded-photo preview + VLM-confidence badge in **Photo**. lat/lon range validation in `mod_pedon`; the **USDA-family** / **WRB-specifier** toggles surfaced in the Classify sidebar, two-way-synced with Settings through a shared-`rv` single source of truth. **Package change (additive):** `report()` / `report_html()` / `report_pdf()` gain `include_family` / `specifiers` (forwarded to `classify_usda()` / `classify_wrb2022()` when a `PedonRecord` is passed); both default `FALSE` → report output byte-identical (regression test). No new dependencies. | **shipped** | diff --git a/DESCRIPTION b/DESCRIPTION index 15227dd94..ead4ed146 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: soilKey Type: Package Title: Automated Soil Profile Classification per WRB 2022, SiBCS 5 and USDA Soil Taxonomy 13 -Version: 0.9.110 +Version: 0.9.112 Date: 2026-06-11 Authors@R: person("Hugo", "Rodrigues", diff --git a/NEWS.md b/NEWS.md index 6f0674461..01b1364d5 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,44 @@ +# soilKey 0.9.112 (2026-06-11) + +The "**an argic horizon is never a Regosol**" release (accuracy front B2, +engine). The honest B1 benchmark exposed a correctness bug in the WRB key: +a profile with a CONFIRMED argic (clay-illuvial B) horizon could drop to the +Regosol catch-all -- the gate for soils with NO diagnostic subsurface horizon +-- purely because the eutric/alic split (base saturation / Al-saturation) was +unmeasured, leaving the Luvisol gate at \code{NA}. + +## The fix (surgical, in the key) + +\itemize{ + \item \code{luvisol()} (R/diagnostics-rsg-argic-derived.R) gains a graceful + Al-saturation default, mirroring the Acrisol BS-fallback: when + \code{argic()} passes, the clay is high-activity (CEC/clay >= 24), and + Al-saturation is \strong{unmeasured} on a \strong{B master horizon}, + the profile defaults to \strong{Luvisol} (the generic high-activity + argic RSG; Alisol is the high-Al special case that requires positive + Al-sat >= 50 evidence). It fires only on \code{is.na()}, so a measured + Luvisol (Al-sat < 50) or Alisol (Al-sat >= 50) is never overridden, and + a B-horizon guard keeps it off a Fluvisol's stratified C-layer clay + jump (a sedimentary, not pedogenic, increase). \code{al_sat_pct} stays + in the result's \code{missing_data}, and Alisol surfaces as an + ambiguity, so the assumption is transparent. +} + +## Impact + +\itemize{ + \item Measured on the FEBR WRB benchmark: \strong{+9 Luvisols recovered + (Regosol -> Luvisol), 0 regressions} (17.8\% -> 21.9\% order accuracy). + All \strong{44 canonical fixtures classify byte-identically} (the + fallback only fires on missing data, which the fixtures never have). + \item Scope note from the B1 measurement: the dominant FEBR-WRB ceiling is + \emph{missing data} (most argic-RSG reference pedons carry no measured + clay at all), which no key change can address -- so this is a targeted + correctness fix, not the broad "discriminator" the earlier audit + imagined. +} + + # soilKey 0.9.110 (2026-06-11) The "**benchmark methodology**" release (front B1 of the accuracy work). A diff --git a/R/diagnostics-rsg-argic-derived.R b/R/diagnostics-rsg-argic-derived.R index acc512cab..67beb1de0 100644 --- a/R/diagnostics-rsg-argic-derived.R +++ b/R/diagnostics-rsg-argic-derived.R @@ -189,6 +189,35 @@ luvisol <- function(pedon, min_cec = 24, max_al_sat = 50) { max_pct = max_al_sat, candidate_layers = layers) + # v0.9.111: graceful Al-saturation fallback. A confirmed argic horizon with + # high-activity clay (CEC >= 24) but NO Al-saturation measurement is a Luvisol + # by default -- the Alisol is the special high-Al case that requires POSITIVE + # al_sat >= 50 evidence. Without this, an undeterminable eutric/alic split + # leaves al_sat_low = NA, the gate returns NA, and the profile drops to the + # Regosol catch-all -- but an argic horizon is never a Regosol. Fires only on + # is.na() (unmeasured): a measured Luvisol (al_sat < 50) already passes and a + # measured Alisol (al_sat >= 50) gives FALSE, so neither is overridden. The + # promoted layers must equal what the aggregate re-intersects with, and + # al_sat_pct stays in $missing so it still surfaces in $missing_data. + # The B-horizon guard rejects a known false-positive: argic's clay-increase + # test can fire on a STRATIFIED (sedimentary) clay jump between a Fluvisol's C + # layers. A genuine argic horizon is an illuvial B (Bt); a clay increase into + # a C layer is depositional, not pedogenic. Only default to Luvisol when the + # promoted, CEC-high argic layer is a B master horizon -- this keeps the + # Fluvisol (its argic sits on a C) keyed to Fluvisols, not Luvisol. + if (isTRUE(tests$cec_high$passed) && is.na(tests$al_sat_low$passed)) { + promoted <- intersect(arg$layers, tests$cec_high$layers) + desig <- as.character(pedon$horizons$designation)[promoted] + promoted <- promoted[grepl("B", desig)] + if (length(promoted) > 0L) { + tests$al_sat_low$passed <- TRUE + tests$al_sat_low$layers <- promoted + tests$al_sat_low$details <- c(tests$al_sat_low$details %||% list(), + list(al_sat_low_default = + "no Al-saturation measured; high-activity argic defaults to Luvisol")) + } + } + agg <- .argic_derived_aggregate(tests, layer_keys = c("cec_high", "al_sat_low")) diff --git a/inst/benchmarks/reports/benchmark_suite_v09112.md b/inst/benchmarks/reports/benchmark_suite_v09112.md new file mode 100644 index 000000000..6633192a9 --- /dev/null +++ b/inst/benchmarks/reports/benchmark_suite_v09112.md @@ -0,0 +1,26 @@ +# soilKey benchmark suite -- v0.9.112 + +Generated by `run_all_benchmarks()` (max_n = 200, level = order). + +## Accuracy by dataset x system + +Headline metric for imbalanced classes is **balanced accuracy / macro-F1**, read against the **NIR** (no-information-rate) majority-class baseline. Point accuracy carries a bootstrap 95% CI. + +| Dataset | System | n | Accuracy [95% CI] | Bal. acc | Macro-F1 | Kappa | NIR | Flag | +|---------|--------|--:|-------------------|---------:|---------:|------:|----:|------| +| canonical | all | 132 | 100.0% | n/a | n/a | n/a | n/a | | +| febr | sibcs | 200 | 38.0% (31.5%-45.0%) | 23.1% | 17.3% | 0.17 | 28.0% | | +| febr | usda | 194 | 45.4% (38.1%-52.6%) | 28.1% | 25.5% | 0.34 | 32.5% | | +| febr | wrb2022 | 199 | 22.6% (17.6%-28.6%) | 15.4% | 10.3% | 0.19 | 28.1% | | +| redape | sibcs | 94 | 59.6% (50.0%-69.1%) | 61.6% | 60.4% | 0.54 | 26.6% | | + +## Zero-recall classes (improvement targets) + +- **redape/sibcs**: nitossolos, unknowns + +## Notes + +- The **canonical** row is an offline fixture sanity check (coverage, not field accuracy); it has no confusion matrix, so its per-class metrics are blank. +- Rows flagged **n<30** are statistically indicative only. External-dataset rows reflect the local data snapshot and `max_n`. +- **lucas_esdb/wrb2022** is a topsoil-only **lower bound** (LUCAS ships 0-20 cm chemistry only); the honest WRB-at-scale number is the morphologically-complete **FEBR** row. For a LUCAS estimate with a synthetic subsoil, run the opt-in (network, ~1 h): `benchmark_lucas_2018(pedons, fill_subsoil_from = "soilgrids")`. +- **kssl/usda** uses a head-N (not random) sample of the gpkg; **bdsolos** accumulates leading (state-clustered) CSVs until the label cap is met. Both are documented samples, not full random draws. diff --git a/tests/testthat/test-b2-argic-never-regosol.R b/tests/testthat/test-b2-argic-never-regosol.R new file mode 100644 index 000000000..06c802b9c --- /dev/null +++ b/tests/testthat/test-b2-argic-never-regosol.R @@ -0,0 +1,88 @@ +# Tests for v0.9.111 "an argic horizon is never a Regosol": the Luvisol +# graceful-default fallback in luvisol(). A confirmed argic horizon with +# high-activity clay (CEC >= 24) but no Al-saturation measurement defaults to +# Luvisol instead of dropping to the Regosol catch-all -- guarded so a measured +# Alisol/Luvisol is never overridden and the argic must sit on a B master +# horizon (not a stratified Fluvisol C layer). + +# A 3-horizon argic profile: clean clay increase into a high-activity Bt. +# al_sat / base cations control whether the eutric/alic split is determinable. +.b2_argic_pedon <- function(al_sat = NA_real_, ca = NA_real_, mg = NA_real_, + k = NA_real_, na = NA_real_, al_cmol = NA_real_, + bt_designation = c("Bt1", "Bt2")) { + h <- data.frame( + designation = c("A", bt_designation[1], bt_designation[2]), + top_cm = c(0, 25, 60), bottom_cm = c(25, 60, 120), + clay_pct = c(15, 38, 40), silt_pct = c(20, 17, 15), + sand_pct = c(65, 45, 45), cec_cmol = c(8, 16, 16), + ph_h2o = c(5.5, 5.6, 5.7), + clay_films_amount = c(NA, "common", "common"), + al_sat_pct = c(NA, al_sat, al_sat), + ca_cmol = c(NA, ca, ca), mg_cmol = c(NA, mg, mg), + k_cmol = c(NA, k, k), na_cmol = c(NA, na, na), + al_cmol = c(NA, al_cmol, al_cmol), + stringsAsFactors = FALSE) + soilKey::PedonRecord$new(site = list(id = "b2"), horizons = h) +} + +test_that("a high-activity argic with no Al-sat defaults to Luvisol, not Regosol", { + p <- .b2_argic_pedon() # al_sat + all bases NA + lv <- soilKey:::luvisol(p) + expect_true(isTRUE(soilKey:::argic(p)$passed)) + expect_true(isTRUE(lv$passed)) # promoted + expect_gt(length(lv$layers), 0L) # non-empty layers (load-bearing) + expect_true(any(grepl("al_sat", lv$missing))) # al_sat still flagged + expect_true(!is.null(lv$evidence$al_sat_low$details$al_sat_low_default)) + res <- classify_wrb2022(p, on_missing = "silent") + expect_equal(res$rsg_or_order, "Luvisols") # was "Regosols" pre-v0.9.111 + expect_true(any(grepl("al_sat", res$missing_data))) # assumption surfaced +}) + +test_that("a measured Alisol (al_sat >= 50) is not overridden by the default", { + p <- .b2_argic_pedon(al_sat = 60, ca = 1, mg = 1, k = 0.2, na = 0.1, + al_cmol = 6) + expect_true(isTRUE(soilKey:::alisol(p)$passed)) + # Luvisol must be FALSE (measured high Al), NOT NA and NOT promoted-TRUE + expect_false(isTRUE(soilKey:::luvisol(p)$passed)) + expect_equal(classify_wrb2022(p, on_missing = "silent")$rsg_or_order, + "Alisols") +}) + +test_that("a measured Luvisol (al_sat < 50) passes the canonical path, not the default", { + p <- .b2_argic_pedon(al_sat = 20, ca = 4, mg = 2, k = 0.3, na = 0.1, + al_cmol = 1) + lv <- soilKey:::luvisol(p) + expect_true(isTRUE(lv$passed)) + # canonical pass -> the default note must NOT be present + expect_null(lv$evidence$al_sat_low$details$al_sat_low_default %||% NULL) + expect_equal(classify_wrb2022(p, on_missing = "silent")$rsg_or_order, + "Luvisols") +}) + +test_that("Alisol abstains (NA) when Al-sat is unmeasured, ceding to the promoted Luvisol", { + # Guards the key-ordering reasoning: Alisol (tested before Luvisol) must + # return NA (skip), not FALSE, so the engine continues to the Luvisol gate. + p <- .b2_argic_pedon() + expect_true(is.na(soilKey:::alisol(p)$passed)) +}) + +test_that("the default does NOT fire on a stratified clay increase in a C layer", { + # Mirrors the make_fluvisol_canonical pattern: argic's clay-increase test + # fires on a sedimentary jump between C layers; that is a Fluvisol, not a + # default Luvisol. The B-horizon guard keeps it out of the Luvisol gate. + p <- .b2_argic_pedon(bt_designation = c("C1", "C2")) # argic layer is a C + expect_false(isTRUE(soilKey:::luvisol(p)$passed)) # NA or FALSE, not TRUE +}) + +test_that("canonical fixtures with measured chemistry are byte-identical", { + # The fallback fires only on is.na(al_sat); every argic-derived fixture + # carries measured or computable al_sat/BS, so none flips. + expect_equal(classify_wrb2022(make_luvisol_canonical())$rsg_or_order, "Luvisols") + expect_equal(classify_wrb2022(make_alisol_canonical())$rsg_or_order, "Alisols") + expect_equal(classify_wrb2022(make_acrisol_canonical())$rsg_or_order, "Acrisols") + expect_equal(classify_wrb2022(make_lixisol_canonical())$rsg_or_order, "Lixisols") + expect_equal(classify_wrb2022(make_fluvisol_canonical())$rsg_or_order, "Fluvisols") + # the SiBCS argic fixtures' WRB landings (previously unasserted) are pinned + expect_equal(classify_wrb2022(make_argissolo_canonical())$rsg_or_order, "Acrisols") + expect_equal(classify_wrb2022(make_luvissolo_canonical())$rsg_or_order, "Luvisols") +})