perf: C++ backend performance improvements (Phase 2) by sghng · Pull Request #13 · Wenchao-Ma/GDINA

sghng · 2026-04-17T14:23:12Z

Summary

- Hoist the missing-data mask (`mX0`/`mX1`/`mXMissing`) out of the `fast_GDINA_EM` while-loop in `Lik2.cpp`. The data matrix never changes during estimation, so the mask and `has_nan()` check were being redundantly rebuilt every EM iteration.
- Save a 1-based copy of `mloc` before decrement in `fast_GDINA_EM`, eliminating a J x L temporary matrix created via `mloc+1` on every iteration.
- Precompute per-item, per-category column-index vectors from `mloc` before the EM loop in `fast_GDINA_EM`. Previously `arma::find(mloc.row(j)==k)` was called twice per (j,k) pair per iteration; now each lookup is O(1).
- Replace `ones(J,N) * msdPost` with `repmat(sum(msdPost,0), J, 1)` in `LikNR`, `LikNR_LC`, and `fast_GDINA_EM`. In the no-missing-data case every row of `expN` is identical, so the J x N ones-matrix multiply (O(J x N x L)) is wasteful; the replacement costs O(N x L + J x L).
- Precompute per-item `LCprob` row slices into a `std::vector<arma::mat>` in `Mord()`. The Xi21 loop is O(nitem^3) and Xi22 is O(nitem^4); without caching, each iteration re-ran `rows(find(...))` over the full category index vector.
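The mask hoist in the first item can be illustrated with a minimal, library-free sketch. This is not the code in `Lik2.cpp`: plain `std::vector` stands in for the Armadillo matrices, NaN is assumed to mark a missing response, and `MissingMask`, `build_mask`, and `run_iterations` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical container for the loop-invariant mask.
struct MissingMask {
    bool has_missing = false;
    std::vector<std::vector<bool>> observed;  // true where a response is present
};

// Scan the data once; NaN marks a missing response in this sketch.
MissingMask build_mask(const Matrix& X) {
    MissingMask m;
    m.observed.reserve(X.size());
    for (const auto& row : X) {
        std::vector<bool> obs(row.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            obs[j] = !std::isnan(row[j]);
            if (!obs[j]) m.has_missing = true;
        }
        m.observed.push_back(obs);
    }
    return m;
}

// Toy iteration loop: the mask is built before the loop and only read inside it.
// Returns the number of observed responses visited across all iterations.
std::size_t run_iterations(const Matrix& X, int max_iter) {
    assert(max_iter >= 0);
    const MissingMask mask = build_mask(X);  // hoisted: X never changes
    std::size_t visited = 0;
    for (int it = 0; it < max_iter; ++it) {
        for (const auto& row : mask.observed)
            for (bool o : row)
                if (o) ++visited;
    }
    return visited;
}
```

Because the data matrix is read-only during the loop, building the mask once gives the same values as rebuilding it in every iteration; only the repeated scan is saved.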

Benchmark

Measured on the built-in sim30GDINA dataset (N=1000, J=30, K=5, L=32):

| Function | Before | After | Speedup |
| --- | --- | --- | --- |
| `LikNR` | 2.22 ms/call | 1.90 ms/call | 1.17x |
| `fast_GDINA_EM` (30 itr) | 24.03 ms/call | 20.50 ms/call | 1.17x |
| `Mord` | 95.75 ms/call | 67.52 ms/call | 1.42x |

Gains scale with larger J, N, or K. Mord benefits most because the savings compound across its O(nitem^4) Xi22 loop.
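As a rough worked count for the `expN` change at the benchmark dimensions (N=1000, J=30, L=32), plugging into the two asymptotic terms quoted in the summary gives the figures below. These are term counts only, not measured operation counts.

```latex
\begin{align*}
J \cdot N \cdot L &= 30 \cdot 1000 \cdot 32 = 960{,}000, \\
N \cdot L + J \cdot L &= 1000 \cdot 32 + 30 \cdot 32 = 32{,}960.
\end{align*}
```

That is roughly a 29-fold reduction for this one step. The `expN` computation is only part of each call, so this ratio should not be expected to appear directly in the wall-clock speedups above.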

Verification

- All changes preserve exact numerical output. Only allocation and indexing patterns change, not any formula.
- Test suite: 34 passing / 5 pre-existing failures (unexported internal functions called without `:::` in test files, unrelated to this PR).
- Package builds cleanly under `R CMD INSTALL`.

Three targeted changes to the single-group fast EM path:

1. Save a 1-based copy of mloc before the decrement so uP2() is called with the pre-saved matrix instead of creating a J×L temporary via `mloc+1` on every EM iteration.
2. Hoist the missing-data mask (mX0 / mX1 / mXMissing) and the `has_nan()` check out of the while-loop. The data matrix mX never changes during estimation, so the mask only needs to be built once.
3. Precompute per-item, per-category column-index vectors from mloc before the loop. Previously arma::find(mloc.row(j)==k) was called twice per (j,k) pair per EM iteration; now each lookup is O(1).

Also replace the J×N ones-matrix multiply used to compute expN in the no-missing case with arma::repmat(arma::sum(msdPost,0),J,1), reducing cost from O(J·N·L) to O(N·L + J·L).

Numerical results are identical; only wall-clock time changes.
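The index precomputation can be sketched as follows. This is a minimal illustration under stated assumptions rather than the PR's code: `std::vector` replaces the Armadillo types, categories are assumed to be 0-based, and `IndexCache`, `build_index_cache`, `sum_over_category`, and `n_cat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntMatrix = std::vector<std::vector<int>>;  // J rows, L columns
// cache[j][k] holds the column indices l with mloc[j][l] == k.
using IndexCache = std::vector<std::vector<std::vector<std::size_t>>>;

// One pass over mloc before the loop; categories are assumed 0-based and < n_cat[j].
IndexCache build_index_cache(const IntMatrix& mloc, const std::vector<int>& n_cat) {
    assert(mloc.size() == n_cat.size());
    IndexCache cache(mloc.size());
    for (std::size_t j = 0; j < mloc.size(); ++j) {
        cache[j].resize(static_cast<std::size_t>(n_cat[j]));
        for (std::size_t l = 0; l < mloc[j].size(); ++l) {
            const int k = mloc[j][l];
            assert(k >= 0 && k < n_cat[j]);
            cache[j][static_cast<std::size_t>(k)].push_back(l);
        }
    }
    return cache;
}

// Example consumer: sum one row's entries over the columns of a single category,
// using the cached indices instead of re-scanning the row of mloc.
double sum_over_category(const std::vector<double>& row,
                         const std::vector<std::size_t>& cols) {
    double s = 0.0;
    for (std::size_t l : cols) s += row[l];
    return s;
}
```

Inside the iteration loop, `cache[j][k]` is then a constant-time lookup of an already-built index vector, in place of a fresh scan of row j for every (j,k) pair.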

In the no-missing-data branch of both LikNR and LikNR_LC, expN was computed as ones(J,N) * msdPost — an O(J·N·L) matrix multiplication whose result has identical rows equal to sum(msdPost, 0). Replace with arma::repmat(arma::sum(msdPost,0), J, 1) which costs O(N·L + J·L) and produces the same result exactly.
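A small library-free sketch of why the two expressions agree: multiplying a J×N matrix of ones by an N×L matrix yields J copies of the column sums. Here `std::vector` stands in for `arma::mat`, and `ones_times` and `repeat_column_sums` are hypothetical helper names, not functions from the package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Reference: ones(J, N) * P written out as a full matrix product, O(J*N*L).
Matrix ones_times(std::size_t J, const Matrix& P) {
    assert(!P.empty());
    const std::size_t N = P.size(), L = P[0].size();
    Matrix out(J, std::vector<double>(L, 0.0));
    for (std::size_t j = 0; j < J; ++j)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t l = 0; l < L; ++l)
                out[j][l] += 1.0 * P[n][l];
    return out;
}

// Replacement: sum the columns once (O(N*L)), then copy that row J times (O(J*L)).
Matrix repeat_column_sums(std::size_t J, const Matrix& P) {
    assert(!P.empty());
    const std::size_t L = P[0].size();
    std::vector<double> col_sum(L, 0.0);
    for (const auto& row : P)
        for (std::size_t l = 0; l < L; ++l)
            col_sum[l] += row[l];
    return Matrix(J, col_sum);
}
```

In this sketch both functions accumulate over n in the same order, so the outputs compare equal element by element.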

Mord() computes ordinal moment matrices (Xi11, Xi21, Xi22) used for fit statistics. The Xi21 loop is O(nitem^3) and Xi22 is O(nitem^4); both previously called LCprob.rows(arma::find(item_no==(x+1))) inside their innermost iterations, re-scanning the full item_no vector on every visit. Extract a one-time precomputation step that builds lc[i] for each item before any loop runs. All inner-loop variables (lci, lcj, lck, lcl) become const references into this cache, so no matrix data is copied and no find() call is repeated. Numerical output is identical; the number of find() scans avoided grows as O(nitem^4) in the Xi22 loop, so the saving is largest for tests with many items.
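The caching pattern described above might look like the following sketch. It is an illustration only: `std::vector` replaces `arma::mat`, `item_no` is assumed to hold 1-based item numbers per row, and `build_item_cache` and `count_row_pairs` are hypothetical names. The toy loop only counts row pairs; it does not compute the Xi matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Group the rows of `prob` by item in a single pass.
// item_no[r] is the 1-based item that row r belongs to.
std::vector<Matrix> build_item_cache(const Matrix& prob,
                                     const std::vector<int>& item_no,
                                     int n_item) {
    assert(n_item >= 0);
    assert(prob.size() == item_no.size());
    std::vector<Matrix> lc(static_cast<std::size_t>(n_item));
    for (std::size_t r = 0; r < prob.size(); ++r) {
        assert(item_no[r] >= 1 && item_no[r] <= n_item);
        lc[static_cast<std::size_t>(item_no[r] - 1)].push_back(prob[r]);
    }
    return lc;
}

// Toy nested loop: each level binds a const reference into the cache,
// so the inner iterations neither search item_no nor copy any rows.
std::size_t count_row_pairs(const std::vector<Matrix>& lc) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < lc.size(); ++i) {
        const Matrix& lci = lc[i];
        for (std::size_t j = 0; j < lc.size(); ++j) {
            const Matrix& lcj = lc[j];
            total += lci.size() * lcj.size();
        }
    }
    return total;
}
```

The rows are copied once while the cache is built; after that, every loop level reuses the same slices by reference.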

Wenchao-Ma · 2026-05-06T20:43:34Z

Thanks!

sghng added 3 commits April 16, 2026 16:29

sghng force-pushed the phase-2-cpp-performance branch from 6e76a30 to 9baf9e6 · April 17, 2026 14:25

Wenchao-Ma merged commit 8229d7c into Wenchao-Ma:master May 6, 2026
5 checks passed
