Skip to content

Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245

Open
Kwan-Jenny wants to merge 1 commit into
mainfrom
chapter2-extend
Open

Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245
Kwan-Jenny wants to merge 1 commit into
mainfrom
chapter2-extend

Conversation

@Kwan-Jenny

Copy link
Copy Markdown
Collaborator

Summary

Adds Model 2a, an additive extension of model.jags that introduces same-parameter cross-biomarker covariance into the subject-level antibody kinetic parameters while leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the new covariances are zero, so Model 2a strictly nests the current model. It also adds compare_mod_2a(), a head-to-head of Chapter 1 vs Model 2a on the same data.

Nothing in the existing Chapter 1 pipeline is modified — model.jags, prep_data(), prep_priors(), run_mod() are untouched; all code is new.

What Model 2a is

Chapter 1 covariance is biomarker-separated (cross-biomarker = 0):

Sigma = [ Sigma_G   0     ]
        [   0     Sigma_A  ]

Model 2a keeps Sigma_G, Sigma_A and adds a diagonal cross-biomarker
block C = diag(c_1..c_P) (same-parameter only):

Sigma = [ Sigma_G   C      ]      C = diag(c_p)
        [   C^T    Sigma_A  ]

c_p is the IgG~IgA covariance for the same kinetic parameter p; cross-parameter cross-biomarker terms stay 0; C = 0 reproduces Chapter 1 (35 covariance parameters vs 30).

Implementation (factor parameterization)

A Wishart precision cannot pin a sparse covariance's zero-pattern, so the exact pattern is generated with a shared latent factor per kinetic parameter:

par[i,k,p] = mu.par[k,p] + w[i,k,p] + lambda[k,p] * f[i,p]
w[i,k,1:P] ~ dmnorm(0, prec.par[k,,])   # Chapter 1 Wishart block, unchanged
f[i,p]     ~ dnorm(0, 1)                 # shared factor, one per parameter

For two biomarkers this gives c_p = lambda[1,p] * lambda[2,p] (derived from the loadings — this is a model quantity, independent of the sampler; it is the same c_p whether implemented in JAGS or Stan), cross-parameter cross-biomarker
covariance 0, and — with all lambda = 0 — exactly model.jags. The first biomarker's loadings are constrained > 0 for identifiability; the sign of c_p stays free via lambda[2,p]. The model is general in n_antigen_isos: with 2 biomarkers it is exactly Model 2a; with more, the single per-parameter factor couples all of them (rank-1, same-parameter).

inst/extdata/model_2a.jags is identical to model.jags except this par construction and the loading priors.

What's added

  • inst/extdata/model_2a.jags — the extended model.
  • Priors: add_factor_priors(), prep_priors_2a() (reuse Chapter 1 priors).
  • Fitting: jags_data_2a(), make_inits_2a(), run_mod_2a() — a lean wrapper that reuses prep_data() unchanged and returns the MCMC plus a tidy cross-biomarker covariance/correlation summary (no sr_model post-processing).
  • Post-processing (small pure helpers): cross_cov_from_loadings(), marginal_var_2a(), cross_cor_from_draw_2a(), summarize_cross_2a(), shared-internal jags_node_utils_2a.R.
  • Simulation + validation: build_sigma_2a(), sim_params_2a(), sim_case_data_2a(), validate_recovery_2a(), validate_nesting_2a().
  • Comparison: fit_chapter1_lean() (same posterior as run_mod()), summarize_curve_params_2a(), compare_mod_2a().

No new dependencies (all in Imports already); no new data (uses nepal_sees).

Chapter 1 vs Model 2a — what changes

compare_mod_2a() fits both models on the same data and reports:

  • What stays the same — the shared population means mu.par and the within-biomarker variances. Because Model 2a strictly nests Chapter 1, these should agree within MCMC error; a large difference would signal a problem, not
    an improvement.
  • What Model 2a adds — the cross-biomarker covariance c_p / correlation rho_p per kinetic parameter, which Chapter 1 cannot represent (it is structurally 0 there).
  • Where improvement would show — the means are unchanged by design; the value of Model 2a lives in the covariance and in covariance-dependent quantities (credible intervals, joint posterior prediction, DIC/WAIC). A rigorous "is it better" evaluation (WAIC/LOO; downstream time-since-infection / seroincidence MAE/RMSE/coverage) is a planned follow-up, not this PR. compare_mod_2a(dic = TRUE) offers a best-effort DIC as a first signal.

Comparison results on nepal_sees (full-length MCMC, run locally):

(to be filled in from the local comparison run: convergence (max R-hat), the shared-parameter table with max |diff|, the cross-biomarker c_p/rho_p table and which CIs exclude 0, and the mu.par CI-width summary)

Verification

  • CI runs the pure-function unit tests + a tiny-MCMC smoke test that run_mod_2a() / compare_mod_2a() execute and return the expected structure (matching the existing test-run_mod.R style; CI installs JAGS).
  • Recovery of a known cross-biomarker correlation from simulated data, the no-false-positive check on independent data, and the Chapter 1 vs Model 2a comparison on nepal_sees, are run locally at full MCMC length (validate_recovery_2a(), validate_nesting_2a(), compare_mod_2a()); results pasted above.

How to try it

data(nepal_sees)              # already case_data; HlyE_IgG, HlyE_IgA
fit <- run_mod_2a(nepal_sees)
fit$cross                 # IgG ~ IgA covariance & correlation per parameter
cmp <- compare_mod_2a(nepal_sees)
cmp$shared; cmp$added     # shared params (Ch1 vs 2a) + the addition

Scope

This PR is the "add the model + show it nests Chapter 1 + compare to Chapter 1" increment. Quantifying improvement (WAIC/LOO, joint posterior-predictive comparison, seroincidence coverage/width) is a follow-up, since WAIC needs a
model-file change and the seroincidence step needs the serocalculator integration.

…AGS extension of model.jags

Add Model 2a, an additive extension of model.jags that introduces
same-parameter cross-biomarker covariance via a shared per-parameter latent
factor, leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is
recovered exactly when the loadings are zero (strict nesting).

- model: inst/extdata/model_2a.jags
- fit: run_mod_2a() with jags_data_2a, make_inits_2a, prep_priors_2a,
  add_factor_priors; reuses prep_data() unchanged
- cross-biomarker summaries: summarize_cross_2a, cross_cov_from_loadings,
  marginal_var_2a, cross_cor_from_draw_2a (+ internal jags_node_utils)
- comparison vs Chapter 1: compare_mod_2a, fit_chapter1_lean,
  summarize_curve_params_2a
- simulation/validation: sim_params_2a, sim_case_data_2a, build_sigma_2a,
  validate_recovery_2a, validate_nesting_2a
- tests: 7 pure-function unit tests + 1 JAGS-gated smoke test
- example: run_mod_2a-examples.R (one fit on nepal_sees)

No new dependencies; no Chapter 1 code modified.
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Claude finished @Kwan-Jenny's task in 6m 44s —— View job


Code Review: Model 2a PR #245

Tasks:

  • Gather context and understand the PR
  • Review JAGS model file (model_2a.jags)
  • Review core fitting functions (run_mod_2a.R, jags_data_2a.R, make_inits_2a.R)
  • Review prior functions (prep_priors_2a.R, add_factor_priors.R)
  • Review post-processing helpers (cross_cov_from_loadings.R, marginal_var_2a.R, cross_cor_from_draw_2a.R, summarize_cross_2a.R)
  • Review simulation/validation functions
  • Review comparison function (compare_mod_2a.R, fit_chapter1_lean.R)
  • Review tests
  • Check style, deprecated calls, and tidyverse idioms
  • Compile and post review

The model structure, factor parameterization, and nesting argument are mathematically sound. The code is well-decomposed and the test suite is solid. I found one documentation bug, two CLAUDE.md style violations, and several robustness/performance concerns.


1. Bug — run_mod_2a return value missing documented priors element

R/run_mod_2a.R:37 documents:

#'   - `priors`: the prior list used;

but the structure() call at lines 81–90 returns prec_lambda (a scalar) instead, with no priors element at all:

structure(
  list(
    mcmc = mcmc,
    cross = cross,
    antigens = antigens,
    prec_lambda = prec_lambda,   # ← priors element is missing
    runjags = fit
  ),
  class = c("model_2a_fit", "list")
)

fit$priors will silently return NULL for any caller following the docs. Either store jags_input (or attr(jags_input, "antigens") is already stored, so save the priors sub-list before c()-ing them) and add it, or drop the priors bullet from @returns and replace with prec_lambda.

Fix this →


2. CLAUDE.md violation — base::merge() in compare_mod_2a.R

R/compare_mod_2a.R:60:

shared <- merge(s1, s2, by = c("biomarker", "param"),
                suffixes = c("_ch1", "_2a"))

The lab style guide (CLAUDE.md) explicitly flags base::merge() in favour of dplyr::left_join() with by = join_by(...). Replace with:

shared <- dplyr::left_join(s1, s2, by = join_by(biomarker, param),
                           suffix = c("_ch1", "_2a"))

Fix this →


3. CLAUDE.md violation — stop() instead of cli::cli_abort()

The codebase uses cli::cli_abort() for all input validation errors (see prep_priors.R, as_case_data.R). Four new files use stop():

  • R/add_factor_priors.R:22–23
  • R/build_sigma_2a.R:26–28, 42–44
  • R/prep_priors_2a.R:22–24
  • R/cross_cov_from_loadings.R:16–18

Replace each stop(...) with cli::cli_abort(...) using the same interpolation style ({.arg}, {.val}, etc.) as the rest of the package. For example:

# before
stop("`prec_lambda` must be a single positive number.")

# after
cli::cli_abort("{.arg prec_lambda} must be a single positive number.")

Fix this →


4. Robustness — summarize_cross_2a() will crash confusingly on wrong input

R/summarize_cross_2a.R:33–34:

dims <- jags_node_dims(colnames(draws))
k <- dims[["lambda"]][1]   # NULL[1] if lambda not monitored

If called on an mcmc.list that doesn't contain lambda or prec.par (e.g. accidentally on a Chapter 1 fit), dims[["lambda"]] is NULL, NULL[1] returns NULL, and the error message downstream is cryptic. Add an upfront guard:

if (is.null(dims[["lambda"]]) || is.null(dims[["prec.par"]])) {
  cli::cli_abort(c(
    "{.fn summarize_cross_2a} requires monitored {.code lambda} and {.code prec.par} nodes.",
    "i" = "Use the mcmc.list from {.fn run_mod_2a}, not a Chapter 1 fit."
  ))
}

5. Robustness — positional probs indexing in summarize_cross_2a()

R/summarize_cross_2a.R:63–66:

cov_q <- apply(cov_draws, 2, stats::quantile, probs = probs, na.rm = TRUE)
...
cov_med = cov_q[2, ], cov_lo = cov_q[1, ], cov_hi = cov_q[3, ],

This hard-codes positions 1/2/3, so any user-supplied probs of a different length silently breaks the indexing. Use named access:

cov_med = cov_q["50%", ], cov_lo = cov_q[1, ], cov_hi = cov_q[3, ],

or carry probs through:

data.frame(
  cov_med = cov_q[2L, ], cov_lo = cov_q[1L, ], cov_hi = cov_q[length(probs), ],
  ...
)

Either way, a stopifnot(length(probs) == 3) guard or a note in @details would prevent silent errors.


6. Memory concern — fit_chapter1_lean() monitors unused per-subject nodes

R/fit_chapter1_lean.R:42:

monitor = c("mu.par", "prec.par", "y0", "y1", "t1", "alpha", "shape"),

compare_mod_2a() only calls summarize_curve_params_2a(m1[["mcmc"]]), which uses only mu.par and prec.par. The five per-subject arrays (y0, y1, t1, alpha, shape) — each of shape nsubj × n_antigen_isos × nmc × nchain — consume significant memory for large datasets and are never read. Consider restricting to c("mu.par", "prec.par") in the context of compare_mod_2a(), or document the intent if these are needed for other callers.


7. Performance — per-draw string-indexed loops are slow for large MCMC

R/summarize_cross_2a.R:44–50 and R/summarize_curve_params_2a.R:39–48 iterate over every draw and do:

  1. get_node_matrix() — O(K×P) named-vector lookups per draw via draw_vec[["lambda[i,j]"]]
  2. solve(prec_par_k) — a 5×5 matrix inversion per draw per biomarker

For 4 chains × 1000 draws = 4000 iterations this is ~800,000 named lookups and ~8000 solve() calls. You can pre-vectorize by extracting all lambda and prec.par columns at once before the loop, then reshaping via array(). The functional tests will still pass; this is purely a speed concern for production-length runs.


8. Minor — validation criterion in validate_recovery_2a()

R/validate_recovery_2a.R:43–49: the "recovered" verdict fires when the 95% CI excludes zero — which only tests that the model isn't finding a null where there should be a signal. A complete recovery check would additionally verify that the CI covers the true value (out$cor_lo <= out$true_rho & out$cor_hi >= out$true_rho). The current criterion is fine as a coarse sanity check but worth documenting as intentionally asymmetric, or upgrading to the coverage criterion.


JAGS model review

The model file is clean. The factor parameterization correctly gives:

  • Cov(par[i,1,p], par[i,2,p]) = lambda[1,p]*lambda[2,p] (= c_p)
  • Cross-parameter cross-biomarker covariance = 0 (different factors)
  • lambda = 0 ↔ Chapter 1 (strict nesting)

The identifiability constraint lambda[1,p] ~ dnorm(0, prec.lambda) T(0,) pins the first biomarker's loading positive, leaving the sign of c_p free via lambda[2,p]. This is the standard fix for the sign-flip non-identifiability in a one-factor model. ✓

The dmnorm(zero_p[], prec.par[cur_antigen_iso, , ]) slice syntax is carried over from model.jags (where it is tested), so it is safe to use here. ✓


Tests

The test suite is well-structured: pure-function unit tests with known analytic results (no JAGS needed), plus JAGS smoke tests behind skip_if_no_jags(). The make_fixed_draws helper in test-summarize_cross_2a.R is a clean pattern for testing MCMC summarization functions without sampling. The CI-only scope (smoke test) versus local-only scope (full recovery, comparison) is clearly documented.


@codecov

codecov Bot commented Jun 13, 2026

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
66 1 65 0
View the top 1 failed test(s) by shortest run time
sim_params_2a::sim_params_2a_sample_covariance_recovers_the_truth_large_n_
Stack Traces | 0.13s run time
Expected `emp_c` to equal `cv`.
Differences:
  `actual`: 0.054 0.079 0.001 0.066 0.000
`expected`: 0.054 0.080 0.000 0.064 0.000

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant