Perf/clickbench improvements #220

Merged: singaraiona merged 3 commits into master from perf/clickbench-improvements on Jun 2, 2026

@singaraiona
Collaborator

No description provided.

ser-vasilich and others added 3 commits May 30, 2026 15:12
The fused multi-key path already accepts a NULL predicate; only the
planner gate required where_expr.  Allow no-WHERE when n_keys >= 2 AND
count-only.  Single-key no-WHERE and multi-agg over near-unique
composites stay on exec_group's radix — fusing them regresses at very
high cardinality.
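
The relaxed gate described above can be sketched roughly as follows. This is a minimal illustration, not the planner's actual code: `group_plan_t`, its fields, and `can_use_fused_group` are hypothetical names, and any other conditions the real gate checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical planner inputs; names are illustrative, not the project's API. */
typedef struct {
    const void *where_expr; /* NULL when the query has no WHERE clause */
    size_t      n_keys;     /* number of group-by keys */
    bool        count_only; /* true when count is the only aggregate */
} group_plan_t;

/* Returns true when the fused multi-key path may be tried. */
static bool can_use_fused_group(const group_plan_t *p) {
    if (p->where_expr != NULL)
        return true; /* the gate that existed before this change */
    /* New case: no WHERE is allowed only for multi-key, count-only queries.
     * Single-key and multi-agg no-WHERE queries stay on the radix path. */
    return p->n_keys >= 2 && p->count_only;
}
```

Under this sketch a no-WHERE query with two keys and only a count is fused, while a single-key no-WHERE query is not.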

ClickBench 10M:
  q16  744 → 154 ms
  total 8.0 → 7.3 s

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mk_compile packs the composite by-key into a 16-byte slot.  An I64
column for minute() (values 0..59) blows the budget on q18's
{UserID, minute, SearchPhrase} composite (~20 bytes) and the query
drops to exec_group.
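
The budget arithmetic can be illustrated with a small sketch. The per-key widths are assumptions chosen to match the "~20 bytes" figure (8 for UserID, 8 for an I64 minute, 4 for SearchPhrase); `composite_fits_slot` and `SLOT_BYTES` are hypothetical names, not mk_compile internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Slot size taken from the commit message; the constant name is hypothetical. */
#define SLOT_BYTES 16

/* Returns true when the summed key widths fit in one packed slot. */
static bool composite_fits_slot(const size_t *widths, size_t n_keys) {
    size_t total = 0;
    for (size_t i = 0; i < n_keys; i++)
        total += widths[i];
    return total <= SLOT_BYTES;
}
```

With the assumed widths, {8, 8, 4} sums to 20 and does not fit, while {8, 2, 4} (minute stored in 2 bytes) sums to 14 and does.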

After eval'ing a computed by-val whose AST head is minute / hh / ss /
dd / dow / mm / doy / yyyy, downcast the I64 result to I16 before
adding it to the table.  I16 is the smallest type that holds every
output range (year up to 32767, doy up to 366) and still prints as
decimal (U8 prints hex, unreadable for a minute value).

Skipped when the source column has nulls.
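
A minimal sketch of the narrowing step, assuming plain C arrays rather than the engine's vector type. `narrow_i64_to_i16` is a hypothetical helper; the per-value range check is an extra safety measure added for this sketch, whereas the commit decides by AST head.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy an I64 buffer into an I16 buffer.
 * Returns false (dst contents unspecified) when narrowing is skipped. */
static bool narrow_i64_to_i16(const int64_t *src, size_t n,
                              bool has_nulls, int16_t *dst) {
    if (has_nulls)
        return false; /* mirrors "skipped when the source column has nulls" */
    for (size_t i = 0; i < n; i++) {
        /* Sketch-only guard: refuse values that do not fit in 16 bits. */
        if (src[i] < INT16_MIN || src[i] > INT16_MAX)
            return false;
        dst[i] = (int16_t)src[i];
    }
    return true;
}
```

On a false return, the caller would keep the original I64 column.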

ClickBench 10M:
  q18  1748 → 449 ms
  total 6.6 → 5.2 s

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
perf(query): fuse no-WHERE multi-key count-only group-by
@singaraiona singaraiona merged commit 9505a86 into master Jun 2, 2026
ser-vasilich added a commit that referenced this pull request Jun 3, 2026
…verse

Following the rebase onto PR #218/#219/#220 master (new attribute
system, asof fast-path, RAY_IDX_PART, HLL routing, MG top-K for
TIMESTAMP), targeted the still-large branch-coverage gaps:

  query.c         62.54 → 63.54% (+1.00pp, -107 missed)
  fused_group.c   65.69 → 67.26% (+1.57pp, -55 missed)
  group.c         67.50 → 67.95% (+0.45pp, -39 missed)
  traverse.c      60.16 → 60.68% (+0.52pp, -12 missed)
  eval.c          60.73 → 60.87% (+0.14pp, -4 missed)

Additions:
- query_branch_cov.rfl    +670 lines (§19-§63: 2-stage count-distinct
  rewrite for I64/I32/TIMESTAMP, match_group_desc_count_take per-op,
  wide-key fused, asof wrapper, narrow_known_small_extract, HLL
  inner-type cascade, prefilter computed-by + WHERE + desc:count)
- fused_group_branch_cov.rfl  +190 lines + 1156 C lines (chunk_zone
  fast path EQ/GT/LT/NE/LE/GE, IN/EQ masked dispatch, BOOL/SYM key
  topk, U8/I16 hash-eq kbits, strlen agg input)
- group_branch_cov.rfl    +488 lines §21-§38 (maxmin/pearson rowform
  with null x/y/k, per-partition STDDEV/VAR/FIRST/LAST, multi-key
  heavy-hitter, v2 multi-key TIMESTAMP+I64 / DATE+TIME+I64,
  count_distinct STR/GUID/LIST, accum_from_entry skip path)
- eval_branch_cov.rfl     +300 lines §9-§30 (OP_STOREGLOBAL error,
  lambda dispatch errors, try handler dispatch, try_sum_affine bail
  paths, nested-try depth, raise vec/dict/table payload survival)
- test_traverse.c         +760 lines / 18 C tests (A* relax fail,
  cluster_coeff parallel/asym, SIP dir2 neg/oob src, betweenness/
  closeness sample-clamp)
- traverse_branch_cov.rfl +311 lines (bidirectional cliques, parallel
  edges, K4, disjoint comps, diamond/2-cycle/back-edge fixtures)

Suite: 3231 of 3233 pass under ASan+UBSan. Unreachable branches
documented inline per file (OOM-injection, VM trap stack, restricted-
mode, MAPCOMMON/PARTED I/O-only, CSR invariants).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>