Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison by Kwan-Jenny · Pull Request #245 · UCD-SERG/serodynamics

Kwan-Jenny · 2026-06-13T04:15:18Z

Summary

Adds Model 2a, an additive extension of model.jags that introduces same-parameter cross-biomarker covariance into the subject-level antibody kinetic parameters while leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the new covariances are zero, so Model 2a strictly nests the current model. It also adds compare_mod_2a(), a head-to-head of Chapter 1 vs Model 2a on the same data.

Nothing in the existing Chapter 1 pipeline is modified — model.jags, prep_data(), prep_priors(), run_mod() are untouched; all code is new.

What Model 2a is

Chapter 1 covariance is biomarker-separated (cross-biomarker = 0):

Sigma = [ Sigma_G   0     ]
        [   0     Sigma_A  ]

Model 2a keeps Sigma_G, Sigma_A and adds a diagonal cross-biomarker
block C = diag(c_1..c_P) (same-parameter only):

Sigma = [ Sigma_G   C      ]      C = diag(c_p)
        [   C^T    Sigma_A  ]

c_p is the IgG~IgA covariance for the same kinetic parameter p; cross-parameter cross-biomarker terms stay 0; C = 0 reproduces Chapter 1 (35 covariance parameters vs 30).

Implementation (factor parameterization)

A Wishart precision cannot pin a sparse covariance's zero-pattern, so the exact pattern is generated with a shared latent factor per kinetic parameter:

par[i,k,p] = mu.par[k,p] + w[i,k,p] + lambda[k,p] * f[i,p]
w[i,k,1:P] ~ dmnorm(0, prec.par[k,,])   # Chapter 1 Wishart block, unchanged
f[i,p]     ~ dnorm(0, 1)                 # shared factor, one per parameter

For two biomarkers this gives c_p = lambda[1,p] * lambda[2,p] (derived from the loadings — this is a model quantity, independent of the sampler; it is the same c_p whether implemented in JAGS or Stan), cross-parameter cross-biomarker
covariance 0, and — with all lambda = 0 — exactly model.jags. The first biomarker's loadings are constrained > 0 for identifiability; the sign of c_p stays free via lambda[2,p]. The model is general in n_antigen_isos: with 2 biomarkers it is exactly Model 2a; with more, the single per-parameter factor couples all of them (rank-1, same-parameter).

inst/extdata/model_2a.jags is identical to model.jags except this par construction and the loading priors.

What's added

inst/extdata/model_2a.jags — the extended model.
Priors: add_factor_priors(), prep_priors_2a() (reuse Chapter 1 priors).
Fitting: jags_data_2a(), make_inits_2a(), run_mod_2a() — a lean wrapper that reuses prep_data() unchanged and returns the MCMC plus a tidy cross-biomarker covariance/correlation summary (no sr_model post-processing).
Post-processing (small pure helpers): cross_cov_from_loadings(), marginal_var_2a(), cross_cor_from_draw_2a(), summarize_cross_2a(), shared-internal jags_node_utils_2a.R.
Simulation + validation: build_sigma_2a(), sim_params_2a(), sim_case_data_2a(), validate_recovery_2a(), validate_nesting_2a().
Comparison: fit_chapter1_lean() (same posterior as run_mod()), summarize_curve_params_2a(), compare_mod_2a().

No new dependencies (all in Imports already); no new data (uses nepal_sees).

Chapter 1 vs Model 2a — what changes

compare_mod_2a() fits both models on the same data and reports:

What stays the same — the shared population means mu.par and the within-biomarker variances. Because Model 2a strictly nests Chapter 1, these should agree within MCMC error; a large difference would signal a problem, not
an improvement.
What Model 2a adds — the cross-biomarker covariance c_p / correlation rho_p per kinetic parameter, which Chapter 1 cannot represent (it is structurally 0 there).
Where improvement would show — the means are unchanged by design; the value of Model 2a lives in the covariance and in covariance-dependent quantities (credible intervals, joint posterior prediction, DIC/WAIC). A rigorous "is it better" evaluation (WAIC/LOO; downstream time-since-infection / seroincidence MAE/RMSE/coverage) is a planned follow-up, not this PR. compare_mod_2a(dic = TRUE) offers a best-effort DIC as a first signal.

Comparison results on nepal_sees (full-length MCMC, run locally):

(to be filled in from the local comparison run: convergence (max R-hat), the shared-parameter table with max |diff|, the cross-biomarker c_p/rho_p table and which CIs exclude 0, and the mu.par CI-width summary)

Verification

CI runs the pure-function unit tests + a tiny-MCMC smoke test that run_mod_2a() / compare_mod_2a() execute and return the expected structure (matching the existing test-run_mod.R style; CI installs JAGS).
Recovery of a known cross-biomarker correlation from simulated data, the no-false-positive check on independent data, and the Chapter 1 vs Model 2a comparison on nepal_sees, are run locally at full MCMC length (validate_recovery_2a(), validate_nesting_2a(), compare_mod_2a()); results pasted above.

How to try it

data(nepal_sees)              # already case_data; HlyE_IgG, HlyE_IgA
fit <- run_mod_2a(nepal_sees)
fit$cross                 # IgG ~ IgA covariance & correlation per parameter
cmp <- compare_mod_2a(nepal_sees)
cmp$shared; cmp$added     # shared params (Ch1 vs 2a) + the addition

Scope

This PR is the "add the model + show it nests Chapter 1 + compare to Chapter 1" increment. Quantifying improvement (WAIC/LOO, joint posterior-predictive comparison, seroincidence coverage/width) is a follow-up, since WAIC needs a
model-file change and the seroincidence step needs the serocalculator integration.

…AGS extension of model.jags Add Model 2a, an additive extension of model.jags that introduces same-parameter cross-biomarker covariance via a shared per-parameter latent factor, leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the loadings are zero (strict nesting). - model: inst/extdata/model_2a.jags - fit: run_mod_2a() with jags_data_2a, make_inits_2a, prep_priors_2a, add_factor_priors; reuses prep_data() unchanged - cross-biomarker summaries: summarize_cross_2a, cross_cov_from_loadings, marginal_var_2a, cross_cor_from_draw_2a (+ internal jags_node_utils) - comparison vs Chapter 1: compare_mod_2a, fit_chapter1_lean, summarize_curve_params_2a - simulation/validation: sim_params_2a, sim_case_data_2a, build_sigma_2a, validate_recovery_2a, validate_nesting_2a - tests: 7 pure-function unit tests + 1 JAGS-gated smoke test - example: run_mod_2a-examples.R (one fit on nepal_sees) No new dependencies; no Chapter 1 code modified.

claude · 2026-06-13T04:15:40Z

Claude finished @Kwan-Jenny's task in 6m 44s —— View job

Code Review: Model 2a PR #245

Tasks:

The model structure, factor parameterization, and nesting argument are mathematically sound. The code is well-decomposed and the test suite is solid. I found one documentation bug, two CLAUDE.md style violations, and several robustness/performance concerns.

1. Bug — `run_mod_2a` return value missing documented `priors` element

R/run_mod_2a.R:37 documents:

#'   - `priors`: the prior list used;

but the structure() call at lines 81–90 returns prec_lambda (a scalar) instead, with no priors element at all:

structure(
  list(
    mcmc = mcmc,
    cross = cross,
    antigens = antigens,
    prec_lambda = prec_lambda,   # ← priors element is missing
    runjags = fit
  ),
  class = c("model_2a_fit", "list")
)

fit$priors will silently return NULL for any caller following the docs. Either store jags_input (or attr(jags_input, "antigens") is already stored, so save the priors sub-list before c()-ing them) and add it, or drop the priors bullet from @returns and replace with prec_lambda.

Fix this →

2. CLAUDE.md violation — `base::merge()` in `compare_mod_2a.R`

R/compare_mod_2a.R:60:

shared <- merge(s1, s2, by = c("biomarker", "param"),
                suffixes = c("_ch1", "_2a"))

The lab style guide (CLAUDE.md) explicitly flags base::merge() in favour of dplyr::left_join() with by = join_by(...). Replace with:

shared <- dplyr::left_join(s1, s2, by = join_by(biomarker, param),
                           suffix = c("_ch1", "_2a"))

Fix this →

3. CLAUDE.md violation — `stop()` instead of `cli::cli_abort()`

The codebase uses cli::cli_abort() for all input validation errors (see prep_priors.R, as_case_data.R). Four new files use stop():

R/add_factor_priors.R:22–23
R/build_sigma_2a.R:26–28, 42–44
R/prep_priors_2a.R:22–24
R/cross_cov_from_loadings.R:16–18

Replace each stop(...) with cli::cli_abort(...) using the same interpolation style ({.arg}, {.val}, etc.) as the rest of the package. For example:

# before
stop("`prec_lambda` must be a single positive number.")

# after
cli::cli_abort("{.arg prec_lambda} must be a single positive number.")

Fix this →

4. Robustness — `summarize_cross_2a()` will crash confusingly on wrong input

R/summarize_cross_2a.R:33–34:

dims <- jags_node_dims(colnames(draws))
k <- dims[["lambda"]][1]   # NULL[1] if lambda not monitored

If called on an mcmc.list that doesn't contain lambda or prec.par (e.g. accidentally on a Chapter 1 fit), dims[["lambda"]] is NULL, NULL[1] returns NULL, and the error message downstream is cryptic. Add an upfront guard:

if (is.null(dims[["lambda"]]) || is.null(dims[["prec.par"]])) {
  cli::cli_abort(c(
    "{.fn summarize_cross_2a} requires monitored {.code lambda} and {.code prec.par} nodes.",
    "i" = "Use the mcmc.list from {.fn run_mod_2a}, not a Chapter 1 fit."
  ))
}

5. Robustness — positional `probs` indexing in `summarize_cross_2a()`

R/summarize_cross_2a.R:63–66:

cov_q <- apply(cov_draws, 2, stats::quantile, probs = probs, na.rm = TRUE)
...
cov_med = cov_q[2, ], cov_lo = cov_q[1, ], cov_hi = cov_q[3, ],

This hard-codes positions 1/2/3, so any user-supplied probs of a different length silently breaks the indexing. Use named access:

cov_med = cov_q["50%", ], cov_lo = cov_q[1, ], cov_hi = cov_q[3, ],

or carry probs through:

data.frame(
  cov_med = cov_q[2L, ], cov_lo = cov_q[1L, ], cov_hi = cov_q[length(probs), ],
  ...
)

Either way, a stopifnot(length(probs) == 3) guard or a note in @details would prevent silent errors.

6. Memory concern — `fit_chapter1_lean()` monitors unused per-subject nodes

R/fit_chapter1_lean.R:42:

monitor = c("mu.par", "prec.par", "y0", "y1", "t1", "alpha", "shape"),

compare_mod_2a() only calls summarize_curve_params_2a(m1[["mcmc"]]), which uses only mu.par and prec.par. The five per-subject arrays (y0, y1, t1, alpha, shape) — each of shape nsubj × n_antigen_isos × nmc × nchain — consume significant memory for large datasets and are never read. Consider restricting to c("mu.par", "prec.par") in the context of compare_mod_2a(), or document the intent if these are needed for other callers.

7. Performance — per-draw string-indexed loops are slow for large MCMC

R/summarize_cross_2a.R:44–50 and R/summarize_curve_params_2a.R:39–48 iterate over every draw and do:

get_node_matrix() — O(K×P) named-vector lookups per draw via draw_vec[["lambda[i,j]"]]
solve(prec_par_k) — a 5×5 matrix inversion per draw per biomarker

For 4 chains × 1000 draws = 4000 iterations this is ~800,000 named lookups and ~8000 solve() calls. You can pre-vectorize by extracting all lambda and prec.par columns at once before the loop, then reshaping via array(). The functional tests will still pass; this is purely a speed concern for production-length runs.

8. Minor — validation criterion in `validate_recovery_2a()`

R/validate_recovery_2a.R:43–49: the "recovered" verdict fires when the 95% CI excludes zero — which only tests that the model isn't finding a null where there should be a signal. A complete recovery check would additionally verify that the CI covers the true value (out$cor_lo <= out$true_rho & out$cor_hi >= out$true_rho). The current criterion is fine as a coarse sanity check but worth documenting as intentionally asymmetric, or upgrading to the coverage criterion.

JAGS model review

The model file is clean. The factor parameterization correctly gives:

Cov(par[i,1,p], par[i,2,p]) = lambda[1,p]*lambda[2,p] (= c_p)
Cross-parameter cross-biomarker covariance = 0 (different factors)
lambda = 0 ↔ Chapter 1 (strict nesting)

The identifiability constraint lambda[1,p] ~ dnorm(0, prec.lambda) T(0,) pins the first biomarker's loading positive, leaving the sign of c_p free via lambda[2,p]. This is the standard fix for the sign-flip non-identifiability in a one-factor model. ✓

The dmnorm(zero_p[], prec.par[cur_antigen_iso, , ]) slice syntax is carried over from model.jags (where it is tested), so it is safe to use here. ✓

Tests

The test suite is well-structured: pure-function unit tests with known analytic results (no JAGS needed), plus JAGS smoke tests behind skip_if_no_jags(). The make_fixed_draws helper in test-summarize_cross_2a.R is a clean pattern for testing MCMC summarization functions without sampling. The CI-only scope (smoke test) versus local-only scope (full recovery, comparison) is clearly documented.

codecov · 2026-06-13T04:19:09Z

❌ 1 Tests Failed:

Tests completed	Failed	Passed	Skipped
66	1	65	0

View the top 1 failed test(s) by shortest run time

sim_params_2a::sim_params_2a_sample_covariance_recovers_the_truth_large_n_

Stack Traces | 0.13s run time

Expected `emp_c` to equal `cv`.
Differences:
  `actual`: 0.054 0.079 0.001 0.066 0.000
`expected`: 0.054 0.080 0.000 0.064 0.000

To view more test analytics, go to the Test Analytics Dashboard
_{📋 Got 3 mins? Take this short survey to help us improve Test Analytics.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245

Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245
Kwan-Jenny wants to merge 1 commit into
mainfrom
chapter2-extend

Kwan-Jenny commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Kwan-Jenny commented Jun 13, 2026

Summary

What Model 2a is

Implementation (factor parameterization)

What's added

Chapter 1 vs Model 2a — what changes

Verification

How to try it

Scope

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Model 2a PR #245

1. Bug — run_mod_2a return value missing documented priors element

2. CLAUDE.md violation — base::merge() in compare_mod_2a.R

3. CLAUDE.md violation — stop() instead of cli::cli_abort()

4. Robustness — summarize_cross_2a() will crash confusingly on wrong input

5. Robustness — positional probs indexing in summarize_cross_2a()

6. Memory concern — fit_chapter1_lean() monitors unused per-subject nodes

7. Performance — per-draw string-indexed loops are slow for large MCMC

8. Minor — validation criterion in validate_recovery_2a()

JAGS model review

Tests

Uh oh!

codecov Bot commented Jun 13, 2026

❌ 1 Tests Failed:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 13, 2026 •

edited

Loading

1. Bug — `run_mod_2a` return value missing documented `priors` element

2. CLAUDE.md violation — `base::merge()` in `compare_mod_2a.R`

3. CLAUDE.md violation — `stop()` instead of `cli::cli_abort()`

4. Robustness — `summarize_cross_2a()` will crash confusingly on wrong input

5. Robustness — positional `probs` indexing in `summarize_cross_2a()`

6. Memory concern — `fit_chapter1_lean()` monitors unused per-subject nodes

8. Minor — validation criterion in `validate_recovery_2a()`