
perf: C++ backend performance improvements (Phase 2) #13

Merged: Wenchao-Ma merged 3 commits into Wenchao-Ma:master from sghng:phase-2-cpp-performance on May 6, 2026.

@sghng (Contributor) commented Apr 17, 2026

Summary

  • Hoist the missing-data mask (mX0/mX1/mXMissing) out of the fast_GDINA_EM while-loop in Lik2.cpp. The data matrix never changes during estimation, so the mask and has_nan() check were being redundantly rebuilt every EM iteration.
  • Save a 1-based copy of mloc before it is decremented in fast_GDINA_EM. This eliminates the J x L temporary matrix that mloc+1 created on every iteration.
  • Precompute per-item, per-category column-index vectors from mloc before the EM loop in fast_GDINA_EM. Previously arma::find(mloc.row(j)==k) was called twice per (j,k) pair per iteration; now each lookup is O(1).
  • Replace ones(J,N) * msdPost with repmat(sum(msdPost,0), J, 1) in LikNR, LikNR_LC, and fast_GDINA_EM. In the no-missing-data case every row of expN is identical, so the J x N ones-matrix multiply (O(J x N x L)) is wasteful; the replacement costs O(N x L + J x L).
  • Precompute per-item LCprob row slices into a std::vector<arma::mat> in Mord(). The Xi21 loop is O(nitem^3) and Xi22 is O(nitem^4); without caching, each iteration re-ran rows(find(...)) over the full category index vector.
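
The first bullet is ordinary loop-invariant hoisting. Below is a minimal sketch of the pattern in plain C++, using std::vector instead of Armadillo. MissingMask, build_mask and run_em_sketch are hypothetical names. The sketch assumes NaN marks a missing response and that observed responses are 0/1. The loop body is a placeholder, not the real E/M steps in Lik2.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mask over a flattened response matrix (assumption: NaN = missing, else 0/1).
struct MissingMask {
    std::vector<double> x1;       // 1 where the response is observed and equals 1
    std::vector<double> x0;       // 1 where the response is observed and equals 0
    std::vector<double> missing;  // 1 where the response is missing
    bool has_missing = false;     // plays the role of the hoisted has_nan() result
};

// Built once, because the data never change during estimation.
MissingMask build_mask(const std::vector<double>& mX) {
    MissingMask m;
    m.x1.assign(mX.size(), 0.0);
    m.x0.assign(mX.size(), 0.0);
    m.missing.assign(mX.size(), 0.0);
    for (std::size_t i = 0; i < mX.size(); ++i) {
        if (std::isnan(mX[i])) {
            m.missing[i] = 1.0;
            m.has_missing = true;
        } else if (mX[i] == 1.0) {
            m.x1[i] = 1.0;
        } else {
            m.x0[i] = 1.0;
        }
    }
    return m;
}

// Placeholder EM driver: every iteration reads the same precomputed mask.
// Returns the count of observed-correct cells summed over all iterations.
int run_em_sketch(const std::vector<double>& mX, int max_itr) {
    const MissingMask mask = build_mask(mX);  // hoisted out of the loop
    int total = 0;
    for (int itr = 0; itr < max_itr; ++itr) {
        // The real E-step / M-step would read mask.x1, mask.x0, mask.missing here.
        for (double v : mask.x1) total += static_cast<int>(v);
    }
    return total;
}
```

The mask costs O(J x N) once rather than once per iteration. The loop only reads it, so the const binding documents that it never changes.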

Benchmark

Measured on the built-in sim30GDINA dataset (N=1000, J=30, K=5, L=32):

| Function | Before | After | Speedup |
|---|---|---|---|
| LikNR | 2.22 ms/call | 1.90 ms/call | 1.17x |
| fast_GDINA_EM (30 iterations) | 24.03 ms/call | 20.50 ms/call | 1.17x |
| Mord | 95.75 ms/call | 67.52 ms/call | 1.42x |

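Gains should grow with larger J, N, or K, because the removed work scales with those dimensions (only the sim30GDINA sizes above were timed). Mord benefits most because the savings compound across its O(nitem^4) Xi22 loop.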
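
A rough count of multiply-adds in the expN step alone, at the benchmark sizes (J = 30, N = 1000, L = 32), shows why the repmat change is cheap. This is an operation count that ignores memory traffic and BLAS blocking, not a timing prediction.

```math
\begin{aligned}
\text{ones}(J,N)\cdot\text{msdPost}:&\quad J\,N\,L = 30 \cdot 1000 \cdot 32 = 960{,}000 \\
\text{repmat}(\text{sum}(\text{msdPost},0),\,J,\,1):&\quad N\,L + J\,L = 32{,}000 + 960 = 32{,}960
\end{aligned}
```

That is roughly a 29x reduction for this one sub-step. The per-call speedups in the table are much smaller, presumably because the rest of each function is unchanged.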

Verification

  • All changes preserve exact numerical output. Only allocation and indexing patterns change, not any formula.
  • Test suite: 34 passing / 5 pre-existing failures (unexported internal functions called without ::: in test files, unrelated to this PR).
  • Package builds cleanly under R CMD INSTALL.

sghng added 3 commits on April 16, 2026 (16:29):

Three targeted changes to the single-group fast EM path:

1. Save a 1-based copy of mloc before the decrement so uP2() is called
   with the pre-saved matrix instead of creating a J×L temporary via
   `mloc+1` on every EM iteration.

2. Hoist the missing-data mask (mX0 / mX1 / mXMissing) and the
   `has_nan()` check out of the while-loop.  The data matrix mX never
   changes during estimation, so the mask only needs to be built once.

3. Precompute per-item, per-category column-index vectors from mloc
   before the loop.  Previously arma::find(mloc.row(j)==k) was called
   twice per (j,k) pair per EM iteration; now each lookup is O(1).
   Also replace the J×N ones-matrix multiply used to compute expN in
   the no-missing case with arma::repmat(arma::sum(msdPost,0),J,1),
   reducing cost from O(J·N·L) to O(N·L + J·L).

Numerical results are identical; only wall-clock time changes.
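
Change 3 amounts to building an inverted index once. Below is a sketch of the idea in plain C++, using std::vector in place of Armadillo index vectors. build_category_index, collapse_cached and collapse_rescan are hypothetical names. mloc is assumed to hold 0-based category codes (that is, after the decrement). The per-category sum stands in for whatever the EM step actually accumulates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntMatrix = std::vector<std::vector<int>>;  // J x L, 0-based category codes
using Matrix = std::vector<std::vector<double>>;   // J x L values per latent class
using Index = std::vector<std::vector<std::vector<std::size_t>>>;

// idx[j][k] lists the latent-class columns l with mloc[j][l] == k.
// Built once before the EM loop; later lookups are plain vector accesses.
Index build_category_index(const IntMatrix& mloc) {
    Index idx(mloc.size());
    for (std::size_t j = 0; j < mloc.size(); ++j) {
        int max_k = -1;
        for (int k : mloc[j]) if (k > max_k) max_k = k;
        idx[j].resize(static_cast<std::size_t>(max_k + 1));
        for (std::size_t l = 0; l < mloc[j].size(); ++l)
            idx[j][static_cast<std::size_t>(mloc[j][l])].push_back(l);
    }
    return idx;
}

// Collapses per-class values into per-category totals using the cached index.
Matrix collapse_cached(const Matrix& vals, const Index& idx) {
    Matrix out(idx.size());
    for (std::size_t j = 0; j < idx.size(); ++j) {
        out[j].assign(idx[j].size(), 0.0);
        for (std::size_t k = 0; k < idx[j].size(); ++k)
            for (std::size_t l : idx[j][k]) out[j][k] += vals[j][l];
    }
    return out;
}

// Reference version: rescans row j for every (j, k), like calling find() each time.
Matrix collapse_rescan(const Matrix& vals, const IntMatrix& mloc,
                       const std::vector<std::size_t>& ncat) {
    Matrix out(mloc.size());
    for (std::size_t j = 0; j < mloc.size(); ++j) {
        out[j].assign(ncat[j], 0.0);
        for (std::size_t k = 0; k < ncat[j]; ++k)
            for (std::size_t l = 0; l < mloc[j].size(); ++l)
                if (mloc[j][l] == static_cast<int>(k)) out[j][k] += vals[j][l];
    }
    return out;
}
```

Both versions visit columns in ascending order, so their sums agree exactly. The cached version just skips the O(L) scan on each (j, k) lookup.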

In the no-missing-data branch of both LikNR and LikNR_LC, expN was
computed as ones(J,N) * msdPost, an O(J·N·L) matrix multiplication
whose rows are all equal to sum(msdPost, 0).

Replace it with arma::repmat(arma::sum(msdPost,0), J, 1), which costs
O(N·L + J·L) and produces the same result exactly.
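
The equivalence behind this change can be checked with a small sketch in plain C++. ones_times and repeated_colsum are hypothetical names, and std::vector stands in for arma::mat. In these plain loops both versions add the rows of P in the same order, so they agree bit for bit. A BLAS-backed matrix multiply may reorder the additions, however, so in general the two forms are only guaranteed to agree up to floating-point rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, rows x cols

// Original pattern: ones(J, N) * P as an explicit O(J*N*L) triple loop.
Matrix ones_times(std::size_t J, const Matrix& P) {
    const std::size_t N = P.size(), L = P[0].size();
    Matrix out(J, std::vector<double>(L, 0.0));
    for (std::size_t j = 0; j < J; ++j)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t l = 0; l < L; ++l)
                out[j][l] += 1.0 * P[n][l];
    return out;
}

// Replacement pattern: column sums once (O(N*L)), then J copies of that row (O(J*L)),
// mirroring repmat(sum(P, 0), J, 1).
Matrix repeated_colsum(std::size_t J, const Matrix& P) {
    const std::size_t L = P[0].size();
    std::vector<double> colsum(L, 0.0);
    for (const auto& row : P)
        for (std::size_t l = 0; l < L; ++l) colsum[l] += row[l];
    return Matrix(J, colsum);
}
```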

Mord() computes ordinal moment matrices (Xi11, Xi21, Xi22) used for
fit statistics.  The Xi21 loop is O(nitem^3) and Xi22 is O(nitem^4);
both previously called LCprob.rows(arma::find(item_no==(x+1))) inside
their innermost iterations, re-scanning the full item_no vector on
every visit.

Extract a one-time precomputation step that builds lc[i] for each item
before any loop runs.  All inner-loop variables (lci, lcj, lck, lcl)
become const references into this cache, so no matrix data is copied
and no find() call is repeated.

Numerical output is identical; the saving grows as O(nitem^4) for large
tests with many items.
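
Below is a sketch of the caching pattern, again in plain C++ rather than Armadillo. rows_for_item, build_item_cache, pair_term and the two pairwise_* drivers are hypothetical names. item_no is assumed to hold 1-based item numbers, one per LCprob row, as the find(item_no==(x+1)) call suggests. pair_term is a toy stand-in, not the actual Xi21/Xi22 formula, and the loop is two levels deep rather than three or four.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = category rows, cols = latent classes

// Full scan over item_no, like LCprob.rows(find(item_no == item + 1)).
Matrix rows_for_item(const Matrix& LCprob, const std::vector<int>& item_no, std::size_t item) {
    Matrix out;
    for (std::size_t r = 0; r < item_no.size(); ++r)
        if (item_no[r] == static_cast<int>(item) + 1) out.push_back(LCprob[r]);
    return out;
}

// One-time cache: lc[i] holds item i's rows, in their original order.
std::vector<Matrix> build_item_cache(const Matrix& LCprob, const std::vector<int>& item_no,
                                     std::size_t nitem) {
    std::vector<Matrix> lc(nitem);
    for (std::size_t r = 0; r < item_no.size(); ++r)
        lc[static_cast<std::size_t>(item_no[r] - 1)].push_back(LCprob[r]);
    return lc;
}

// Toy pairwise term: sum of elementwise products over all row pairs of two items.
double pair_term(const Matrix& A, const Matrix& B) {
    double s = 0.0;
    for (const auto& a : A)
        for (const auto& b : B)
            for (std::size_t l = 0; l < a.size(); ++l) s += a[l] * b[l];
    return s;
}

// Cached driver: lci / lcj are const references, so nothing is copied or rescanned.
double pairwise_cached(const std::vector<Matrix>& lc) {
    double total = 0.0;
    for (std::size_t i = 0; i < lc.size(); ++i) {
        const Matrix& lci = lc[i];
        for (std::size_t j = 0; j < lc.size(); ++j) {
            const Matrix& lcj = lc[j];
            total += pair_term(lci, lcj);
        }
    }
    return total;
}

// Uncached driver: rescans item_no and copies rows on every (i, j) visit.
double pairwise_rescan(const Matrix& LCprob, const std::vector<int>& item_no, std::size_t nitem) {
    double total = 0.0;
    for (std::size_t i = 0; i < nitem; ++i)
        for (std::size_t j = 0; j < nitem; ++j)
            total += pair_term(rows_for_item(LCprob, item_no, i),
                               rows_for_item(LCprob, item_no, j));
    return total;
}
```

The cache preserves row order, so both drivers accumulate in the same sequence and return identical totals. Only the repeated scans and copies disappear.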

@sghng force-pushed the phase-2-cpp-performance branch from 6e76a30 to 9baf9e6 on April 17, 2026, 14:25
@Wenchao-Ma merged commit 8229d7c into Wenchao-Ma:master on May 6, 2026 (5 checks passed)
@Wenchao-Ma (Owner) commented:

Thanks!
