Speed up majority/threshold consensus to O(kn) #272

Merged: ms609 merged 3 commits into main from claude/objective-cerf-9054f5 on Jun 1, 2026

ms609 (Owner) commented on Jun 1, 2026

Compute majority-rule and threshold `Consensus()` (and `SplitFrequency()`) by counting every split's frequency in a single pass over the trees and keeping those at or above the threshold. This replaces the O(k^2 n) multi-reference comparison (k trees, n tips). Each majority split occurs in more than half the trees, so any two majority splits co-occur in at least one tree and are therefore compatible. Pairwise compatibility implies global compatibility, so the retained splits form a valid tree directly. Running time is linear in the number of trees: ~25x faster at k = 1600, with an empirical scaling exponent in k of 0.97 (vs 1.91 before).
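
Below is a minimal Python sketch of the single-pass idea. It is not the TreeTools C++ code in src/consensus.cpp. It assumes each tree's splits have already been extracted as int bitmasks in a canonical orientation, namely the side that excludes tip 0. The helpers `mask`, `threshold_splits` and `compatible` are hypothetical. The final check mirrors the compatibility argument: two splits that each occur in more than k/2 of the k trees must appear together in at least one tree, and splits from a single tree are always compatible.

```python
from collections import Counter
from itertools import combinations


def mask(*tips):
    """Encode a set of tip indices as an int bitmask (bit i <-> tip i)."""
    return sum(1 << t for t in tips)


def threshold_splits(tree_splits, p=0.5):
    """Count every split once per tree in a single pass; keep those whose
    frequency is at or above p * k, where k is the number of trees.

    tree_splits: one set of split bitmasks per tree, each split stored in an
    assumed canonical form (the side that does NOT contain tip 0).
    """
    k = len(tree_splits)
    counts = Counter()
    for splits in tree_splits:  # at most n - 3 nontrivial splits per tree
        counts.update(splits)
    return {s: c for s, c in counts.items() if c >= p * k}


def compatible(a, b):
    """Canonical splits a, b (tip 0 excluded from both) are compatible iff
    they are disjoint or one contains the other."""
    return (a & b) == 0 or (a & ~b) == 0 or (b & ~a) == 0


# Three unrooted trees on tips 0..4:
#   ((0,1),(2,(3,4))), ((0,1),(3,(2,4))), ((0,2),(1,(3,4)))
trees = [
    {mask(2, 3, 4), mask(3, 4)},
    {mask(2, 3, 4), mask(2, 4)},
    {mask(1, 3, 4), mask(3, 4)},
]
kept = threshold_splits(trees, p=0.5)
print(sorted(kept.items()))  # [(24, 2), (28, 2)]

# Here every kept split is in more than k/2 trees (2 of 3), so any two share
# at least one tree and are compatible: the kept set defines a tree directly.
assert all(compatible(a, b) for a, b in combinations(kept, 2))
```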

The count defaults to 128-bit hashing (O(kn); exact with overwhelming probability). `exact = TRUE` selects a slower, deterministic bitmask count. `Consensus()` and `SplitFrequency()` share one counting core. Strict consensus (p = 1) keeps its existing single-reference path, which is already linear.
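
The following Python sketch shows one way to get hashed counts in O(kn). It uses a standard random-XOR hashing scheme and is an assumption, not necessarily the one TreeTools uses. Each tip gets a random 128-bit key, a clade is hashed as the XOR of its tips' keys, and the hashes are built bottom-up, so each tree costs O(n). Two different tip sets collide only if the XOR of the keys in their symmetric difference is zero, which happens with probability 2^-128 per pair.

Trees here are hypothetical nested tuples, and `tip_keys`, `split_hashes`, `split_frequency`, `consensus_hashes` and `hash_of` are illustrative names. The strict branch stands in for a single-reference path: it intersects the first tree's splits with those of every other tree. Keying the counter on each split's full n-bit bitmask instead would be deterministic, but it touches O(n) bits per split.

```python
import random
from collections import Counter


def tip_keys(n_tips, seed=1):
    """Assumed scheme: one independent random 128-bit key per tip."""
    rng = random.Random(seed)
    return [rng.getrandbits(128) for _ in range(n_tips)]


def split_hashes(tree, keys):
    """Hash every nontrivial split of `tree` (nested tuples of tip indices).

    A clade's hash is the XOR of its tips' keys, combined bottom-up, so one
    tree costs O(n). Each split is oriented to the side without tip 0: if a
    clade contains tip 0, its complement's hash is `total ^ clade_hash`.
    """
    n = len(keys)
    total = 0
    for key in keys:
        total ^= key
    found = set()  # a set, so the split on both sides of the root counts once

    def visit(node):
        # Returns (XOR hash, number of tips, whether tip 0 is in the clade).
        if isinstance(node, int):
            return keys[node], 1, node == 0
        h, size, has0 = 0, 0, False
        for child in node:
            ch, cs, c0 = visit(child)
            h ^= ch
            size += cs
            has0 = has0 or c0
        if 1 < size < n - 1:  # skip trivial splits (and the root itself)
            found.add(total ^ h if has0 else h)
        return h, size, has0

    visit(tree)
    return found


def split_frequency(trees, keys):
    """Shared counting core: split hash -> number of trees containing it."""
    counts = Counter()
    for tree in trees:
        counts.update(split_hashes(tree, keys))
    return counts


def consensus_hashes(trees, keys, p=0.5):
    """Hashes of splits in at least p * k trees. A real implementation would
    also keep one representative split per hash to rebuild the tree."""
    if p == 1:
        # Strict consensus: start from one reference tree and drop any split
        # that is missing from another tree.
        kept = split_hashes(trees[0], keys)
        for tree in trees[1:]:
            kept &= split_hashes(tree, keys)
        return kept
    k = len(trees)
    return {h for h, c in split_frequency(trees, keys).items() if c >= p * k}


def hash_of(keys, *tips):
    """XOR hash of an explicit tip set, for checking results."""
    out = 0
    for t in tips:
        out ^= keys[t]
    return out


keys = tip_keys(5)
trees = [((0, 1), (2, (3, 4))), ((0, 1), (3, (2, 4))), ((0, 2), (1, (3, 4)))]
majority = consensus_hashes(trees, keys, p=0.5)
print(majority == {hash_of(keys, 2, 3, 4), hash_of(keys, 3, 4)})  # True
print(len(consensus_hashes(trees, keys, p=1)))  # 0
```

Because `split_frequency` is the only place splits are counted, a frequency query and a threshold consensus can both be served from it. This mirrors, under the assumptions above, the shared counting core described in the PR.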

Cites Jansson, Shen & Sung (2016); implementation informed by their FACT package (whose majority-consensus code proved unusable on edge cases, so no FACT source is incorporated).

ms609 and others added 3 commits May 31, 2026 19:37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov bot commented on Jun 1, 2026

Codecov Report

❌ Patch coverage is 96.66667% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.19%. Comparing base (f54ca85) to head (026b25c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/consensus.cpp | 96.62% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
+ Coverage   96.12%   96.19%   +0.07%     
==========================================
  Files          81       81              
  Lines        6032     6039       +7     
==========================================
+ Hits         5798     5809      +11     
+ Misses        234      230       -4     

github-actions bot commented on Jun 1, 2026

Performance benchmark results

⚪ NSD: no significant difference. Change is relative to the base time, so positive values mean the new code is faster.

| Call | Status | Change | Time (ms) |
|---|---|---|---|
| as.Splits(bigTrees) | ⚪ NSD | -7.88% | 25.1 → 26.7, 27.5 |
| as.Splits(someTrees) | ⚪ NSD | -1.53% | 11.7 → 11.7, 11.9 |
| Consensus(forest1k.888, check = FALSE) | ⚪ NSD | -0.79% | 105 → 106, 107 |
| Consensus(forest201.80, check = FALSE) | ⚪ NSD | -1.84% | 4.29 → 4.14, 4.63 |
| Consensus(forest21.260, 0.5, FALSE) | ⚪ NSD | 1.89% | 1.27 → 1.23, 1.26 |
| Consensus(forest21.260) | ⚪ NSD | 1.86% | 1.29 → 1.25, 1.28 |
| Consensus(forestMaj, 0.5, FALSE) | ⚪ NSD | -5.1% | 3.16 → 3.19, 3.4 |
| DropTip(tr2000, 5) | ⚪ NSD | 1.76% | 17.5 → 16.8, 17.6 |
| DropTip(tr80, 5) | ⚪ NSD | -0.24% | 0.106 → 0.105, 0.107 |
| DropTip(unlen2k, 5) | ⚪ NSD | 25.82% | 0.285 → 0.211, 0.212 |
| DropTip(unlen80, 5) | ⚪ NSD | -0.52% | 0.0409 → 0.0405, 0.0415 |
| lapply(bigSplits, as.phylo) | ⚪ NSD | 0.02% | 30 → 30, 30 |
| lapply(someSplits, as.phylo) | ⚪ NSD | -0.44% | 14.2 → 14.2, 14.3 |
| PathLengths(tr2000, full = TRUE) | ⚪ NSD | 2.91% | 17.5 → 16.4, 18.4 |
| PathLengths(tr80, full = TRUE) | ⚪ NSD | 4.86% | 0.112 → 0.105, 0.108 |
| PathLengths(tr80Unif, full = TRUE) | ⚪ NSD | 3.74% | 0.113 → 0.107, 0.111 |
| RootTree(tr2000, 5) | ⚪ NSD | 0.88% | 0.406 → 0.408, 0.396 |
| RootTree(tr80, c("t3", "t36")) | ⚪ NSD | -1.79% | 0.0721 → 0.0731, 0.0737 |
| RootTree(tr80, "t3") | ⚪ NSD | -0.37% | 0.0514 → 0.0515, 0.0516 |
| RootTree(tr80, "t30") | ⚪ NSD | -1.16% | 0.0516 → 0.0516, 0.0527 |
| RootTree(unlen2k, 5) | ⚪ NSD | -2.71% | 0.337 → 0.335, 0.351 |
| RootTree(unlen80, c("t3", "t36")) | ⚪ NSD | -3.11% | 0.0656 → 0.0664, 0.0688 |
| RootTree(unlen80, "t3") | ⚪ NSD | -4.03% | 0.0438 → 0.0445, 0.0465 |
| RootTree(unlen80, "t30") | ⚪ NSD | -3.78% | 0.0443 → 0.0448, 0.0468 |
| TreeDist::RobinsonFoulds(forest201.80) | ⚪ NSD | -2.81% | 16.4 → 17, 16.7 |
| TreeDist::RobinsonFoulds(forest21.888) | ⚪ NSD | -1.48% | 3.48 → 3.46, 3.55 |
| TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE) | ⚪ NSD | 5.39% | 0.103 → 0.0957, 0.0988 |
| TreeTools:::postorder_order(bal40) | ⚪ NSD | 2.32% | 0.00172 → 0.00168, 0.00167 |
| TreeTools:::postorder_order(bal40k) | ⚪ NSD | -0.73% | 0.544 → 0.546, 0.551 |
| TreeTools:::postorder_order(dbal40) | ⚪ NSD | 2.3% | 0.00178 → 0.00175, 0.00173 |
| TreeTools:::postorder_order(dbal40k) | ⚪ NSD | -1.18% | 2.13 → 2.15, 2.15 |
| TreeTools:::postorder_order(dpec40) | ⚪ NSD | 1.16% | 0.00259 → 0.00256, 0.00256 |
| TreeTools:::postorder_order(dpec40k) | ⚪ NSD | 0.12% | 3310 → 3300, 3300 |
| TreeTools:::postorder_order(drnd80) | ⚪ NSD | 1.48% | 0.00412 → 0.00407, 0.00405 |
| TreeTools:::postorder_order(nbal40) | ⚪ NSD | 1.46% | 0.00211 → 0.00208, 0.00207 |
| TreeTools:::postorder_order(nbal40k) | ⚪ NSD | -1.76% | 2.18 → 2.24, 2.22 |
| TreeTools:::postorder_order(npec40) | ⚪ NSD | 0.38% | 0.00289 → 0.00289, 0.00286 |
| TreeTools:::postorder_order(npec40k) | ⚪ NSD | 0.08% | 3320 → 3320, 3320 |
| TreeTools:::postorder_order(nrnd80) | ⚪ NSD | -0.21% | 0.00465 → 0.00464, 0.00468 |
| TreeTools:::postorder_order(pec40) | ⚪ NSD | 0.66% | 0.00168 → 0.00167, 0.00166 |
| TreeTools:::postorder_order(pec40k) | ⚪ NSD | -1.9% | 0.433 → 0.549, 0.434 |
| TreeTools:::postorder_order(rnd80) | ⚪ NSD | 3.17% | 0.0022 → 0.00215, 0.00211 |

ms609 merged commit 3a1c6d0 into main on Jun 1, 2026 (36 checks passed).
ms609 deleted the claude/objective-cerf-9054f5 branch on June 1, 2026 06:31.