feat(cubejs): smart-gen — YAML default, shorter names, smarter skips, no auto pre-aggs by acmeguy · Pull Request #53 · smartdataHQ/synmetrix

acmeguy · 2026-05-10T12:56:57Z

Summary

Several improvements to smart-gen so that it produces cleaner, more useful Cube models: no auto-generated rollups, shorter field names, and more accurate field-skip rules.

Output format

Default to YAML (.yml). Falls back to JS only when a cube has a FILTER_PARAMS arrow callback we cannot translate. Identity arrows (v) => v are auto-transpiled to Python lambdas (lambda v: v) so YAML works for the nested-lookup-key path too.
Filename resolution: explicit file_name wins → reuse existing model's extension on re-run → otherwise .yml.
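
The three-step filename precedence above could be sketched roughly as follows. The function name `resolveModelFileName` and its argument shape are hypothetical and used only for illustration; they are not taken from the PR.

```javascript
// Hypothetical helper illustrating the precedence described above:
// explicit file_name > existing model's extension on re-run > .yml default.
function resolveModelFileName({ fileName, existingFile, cubeName }) {
  // 1. An explicit file_name always wins.
  if (fileName) return fileName;

  // 2. On re-run, keep the extension the existing model already uses.
  const match = existingFile ? /\.(js|yml|yaml)$/.exec(existingFile) : null;
  if (match) return `${cubeName}${match[0]}`;

  // 3. Otherwise default to YAML.
  return `${cubeName}.yml`;
}

console.log(resolveModelFileName({ cubeName: "orders" }));
console.log(resolveModelFileName({ cubeName: "orders", existingFile: "orders.js" }));
```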

Cube naming from filters

Filtering by, for example, event = 'Stockout Ended' with no explicit name now derives stockout_ended.yml. Re-running the same filter set updates the same file.
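
A minimal sketch of how a name might be derived from filter values, assuming a simple lowercase-and-underscore slug. The helper `deriveCubeName`, the `{ column, value }` filter shape, and the way multiple filter values are joined are all assumptions for illustration; the PR only states the single-filter result.

```javascript
// Hypothetical helper: turns filter values into a snake_case cube name.
function deriveCubeName(filters) {
  return filters
    .map((f) =>
      String(f.value)
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "_") // collapse non-alphanumerics into "_"
        .replace(/^_+|_+$/g, "")     // drop leading/trailing underscores
    )
    .filter(Boolean)
    .join("_"); // joining several filters this way is an assumption
}

const name = deriveCubeName([{ column: "event", value: "Stockout Ended" }]);
console.log(`${name}.yml`); // prints "stockout_ended.yml"
```

Because the name is a pure function of the filter values, re-running with the same filter set resolves to the same file.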

Field naming — shortest-unique resolver

Each field carries an ordered candidate list from leaf → fully-qualified. A new resolver picks the shortest non-clashing candidate; basic columns hold their leaf names and longer-candidate fields advance around them. FILTER_PARAMS refs auto-rebind when a lookup-key dim is renamed.

| Before | After |
| --- | --- |
| `props_color` | `color` |
| `commerce_products_id` | `id` |
| `commerce_products_type` | `type` |
| basic `lat` + `location.lat` | `lat`, `location_lat` |
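
The resolver idea can be sketched as a two-pass assignment: basic columns claim their leaf names first, then every other field takes the first candidate that is still free. This is an illustrative sketch under assumed data shapes (`path`, `basic`, `candidates`), not the PR's actual implementation; in particular, the first-come tie-breaking between two non-basic fields is an assumption.

```javascript
// Hypothetical sketch of a shortest-unique name resolver.
// Each field lists candidates ordered from leaf to fully-qualified.
function resolveFieldNames(fields) {
  const taken = new Set();
  const resolved = {};

  // Pass 1: basic (top-level) columns keep their leaf names.
  for (const field of fields) {
    if (field.basic) {
      const leaf = field.candidates[0];
      taken.add(leaf);
      resolved[field.path] = leaf;
    }
  }

  // Pass 2: other fields take the shortest candidate not yet claimed,
  // falling back to the fully-qualified name if every candidate clashes.
  for (const field of fields) {
    if (field.basic) continue;
    const free = field.candidates.find((name) => !taken.has(name));
    const chosen = free ?? field.candidates[field.candidates.length - 1];
    taken.add(chosen);
    resolved[field.path] = chosen;
  }
  return resolved;
}

console.log(
  resolveFieldNames([
    { path: "location.lat", candidates: ["lat", "location_lat"] },
    { path: "lat", basic: true, candidates: ["lat"] },
    { path: "props.color", candidates: ["color", "props_color"] },
  ])
);
// { lat: 'lat', 'location.lat': 'location_lat', 'props.color': 'color' }
```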

Value counting fix for `Nested(...)` parallel arrays

The profiler's arrayColumnSql switched from uniq() (which counts distinct whole-array values) to uniqArray() + arrayFilter (which counts distinct elements). value_rows now requires at least one meaningful element (a non-empty string, a non-null and non-zero number, or any other non-null value), so an Array(Nullable(Bool)) of all NULL no longer reads as 100% populated just because the parallel array has length 1. Numeric arrays also emit minArray/maxArray so the all-zero skip rule works for them.
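
To make the difference concrete, here is a small in-memory simulation of the two counting strategies. It does not reproduce the profiler's ClickHouse SQL; the function names are hypothetical, and it assumes the "meaningful element" filter applies to both the distinct count and value_rows.

```javascript
// "Meaningful" as described above: non-empty string, non-null/non-zero
// number, or any other non-null value.
function isMeaningful(v) {
  if (v === null || v === undefined) return false;
  if (typeof v === "string") return v !== "";
  if (typeof v === "number") return v !== 0;
  return true;
}

// Hypothetical in-memory stand-in for profiling one array column.
function profileArrayColumn(rows) {
  // Old behaviour: distinct whole arrays.
  const distinctArrays = new Set(rows.map((arr) => JSON.stringify(arr))).size;

  // New behaviour: distinct meaningful elements, and rows that contain
  // at least one meaningful element.
  const elements = new Set();
  let valueRows = 0;
  for (const arr of rows) {
    const kept = arr.filter(isMeaningful);
    if (kept.length > 0) valueRows += 1;
    for (const v of kept) elements.add(v);
  }
  return { distinctArrays, distinctElements: elements.size, valueRows };
}

console.log(profileArrayColumn([[null], [null], [null]]));
// { distinctArrays: 1, distinctElements: 0, valueRows: 0 }
console.log(profileArrayColumn([["a", "b"], ["b", "a"], [""]]));
// { distinctArrays: 3, distinctElements: 2, valueRows: 2 }
```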

Auto-skip cascade

cubeBuilder.processColumns now skips:

STRING/UUID with uniqueValues===1 and lc_values[0] empty / whitespace / '0' / 0
NUMBER with min === max === 0
BOOLEAN/Int8 with min === max OR uniqueValues === 1 (catches the customer_facing Int8 always-zero case)

Same cascade mirrored into the nested-AJ children path.
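
The cascade above could be expressed as a single predicate along these lines. This is a sketch, not the code in cubeBuilder.processColumns: the profile shape (`type`, `rawType`, `uniqueValues`, `min`, `max`, `lcValues`) is assumed, and treating a missing `lcValues[0]` as empty is also an assumption.

```javascript
// Hypothetical predicate mirroring the three skip rules listed above.
function shouldSkipColumn(col) {
  const { type, rawType, uniqueValues, min, max, lcValues = [] } = col;
  const hasRange =
    min !== undefined && min !== null && max !== undefined && max !== null;

  if (type === "STRING" || type === "UUID") {
    // Single distinct value that is empty, whitespace, '0' or 0.
    if (uniqueValues !== 1) return false;
    const only = String(lcValues[0] ?? "").trim();
    return only === "" || only === "0";
  }

  if (type === "NUMBER") {
    // Every value is zero.
    return hasRange && min === 0 && max === 0;
  }

  if (type === "BOOLEAN" || rawType === "Int8") {
    // Constant flag, e.g. an Int8 column that is always 0.
    return (hasRange && min === max) || uniqueValues === 1;
  }

  return false;
}

console.log(shouldSkipColumn({ type: "NUMBER", min: 0, max: 0 })); // true
console.log(shouldSkipColumn({ type: "STRING", uniqueValues: 1, lcValues: ["active"] })); // false
```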

No auto pre-aggregations

buildRawCube and the nested-AJ path no longer emit daily_rollup / monthly_rollup. Heuristic rollups bloat CubeStore and surprise users with hidden refresh schedules. User-added pre-aggs in existing models are preserved through merge.
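
As a rough illustration of "preserved through merge": the freshly generated cube carries no pre-aggregations, and any that the user added to the existing model are copied across. The helper name `mergePreAggregations` and the plain-object cube shape are hypothetical; only the `pre_aggregations` key follows Cube's data model naming.

```javascript
// Hypothetical merge step: keep user-authored pre-aggregations,
// never invent new ones.
function mergePreAggregations(existingCube, generatedCube) {
  const merged = { ...generatedCube };
  const userPreAggs = existingCube?.pre_aggregations;
  if (Array.isArray(userPreAggs) && userPreAggs.length > 0) {
    merged.pre_aggregations = userPreAggs;
  }
  return merged;
}

const existing = { name: "orders", pre_aggregations: [{ name: "main" }] };
const generated = { name: "orders", dimensions: [] };
console.log(mergePreAggregations(existing, generated));
```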

Pre-existing test fixes (previously 4 baseline failures)

package.json test script: --experimental-test-module-mocks so mock.module() works on Node 22.12 (provisionFraiOS suite).
Legacy ARRAY JOIN path: surface user-supplied alias as a dimension on the flattened cube (collision-safe).
buildWhereClause test: assert current "no allowlist ⇒ all tables internal" semantics.
profileTable emitter test: assert current step names (init / initial_profile / profiling).

Test plan

node --experimental-test-module-mocks --test 'src/**/__tests__/*.test.js' → 511 / 511 passing
Smoke test in dev: profile a table with Nested(...) columns, then generate; confirm shorter field names + .yml output
Smoke test: filter by event = 'X'; confirm derived cube/file name
Smoke test: nested-array lookup-key cube; confirm lambda v: v in YAML and queries still resolve
Verify existing .js smart-gen models keep their .js extension on re-run

Companion PR

Frontend mirror of the skip rules in step-2 auto-select: smartdataHQ/client-v2#feat/smart-gen-field-selection

🤖 Generated with Claude Code

… no auto pre-aggs

Cube generation now produces noticeably cleaner models with fewer surprises.

Output format
- Default to YAML (.yml). Falls back to JS only when a cube contains a FILTER_PARAMS arrow callback we cannot translate. Identity arrows ((v) => v) are auto-transpiled to Python lambda (lambda v: v) so YAML works for the nested-lookup-key path too.
- Filename: explicit file_name wins; otherwise reuse the existing model's extension when re-running; otherwise .yml.

Cube naming from filters
- When generating with a flat filter like event = 'Stockout Ended' and no explicit cube/file name, derive the cube + file name from the filter values (stockout_ended.yml). Re-running the same filter set updates the same file.

Field naming (shortest-unique resolver)
- Each field carries an ordered candidate list from leaf to fully-qualified. A new resolver picks the shortest candidate that does not collide. Examples:
    props.color (map key) → color (was props_color)
    commerce.products.id → id (was commerce_products_id)
    commerce.products.entry_type → type (was commerce_products_type)
- Nested-AJ flattened cubes route around already-claimed names too.
- FILTER_PARAMS refs auto-rebind when a lookup-key dim is renamed.

Value counting fixes for nested / parallel-array Nested(...) columns
- profiler.arrayColumnSql now uses uniqArray + arrayFilter so distinct-count reflects element-level cardinality, not the count of distinct whole arrays. value_rows requires at least one meaningful element (string: non-null + non-empty; number: non-null + non-zero; other: non-null) so an Array(Nullable(Bool)) of all-NULL no longer reads as "100% populated" just because it inherits the parallel array length.
- Numeric arrays also emit minArray/maxArray for the all-zero skip rule.

Auto-skip cascade in cubeBuilder.processColumns
- STRING/UUID with uniqueValues===1 and lc_values being '', whitespace, '0' or 0 → skip
- NUMBER with min===max===0 → skip
- BOOLEAN/Int8 with min===max OR uniqueValues===1 → skip (catches the customer_facing Int8 'always 0' case)
- Mirrored into the nested-AJ children path.

No auto pre-aggregations
- buildRawCube and the nested-AJ path no longer emit daily_rollup / monthly_rollup. Heuristic rollups bloat CubeStore with unused materializations and surprise users with hidden refresh schedules. Users add pre-aggs explicitly when they understand query patterns. User-added pre-aggs in existing models are preserved through merge.

Pre-existing test cleanup
- package.json test: add --experimental-test-module-mocks so mock.module() works on Node 22.12 (provisionFraiOS suite).
- Legacy ARRAY JOIN path: surface the user-supplied alias as a dimension on the flattened cube (collision-safe).
- buildWhereClause test: assert current 'no allowlist ⇒ all tables internal' semantics.
- profileTable emitter test: assert current step names ('init', 'initial_profile', 'profiling').

Tests: 511 / 511 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hamdi-ibrahim approved these changes May 10, 2026

acmeguy force-pushed the feat/cubejs-smart-gen-improvements branch from e381466 to 601e32b May 10, 2026 13:17

acmeguy merged commit 3a8784d into main May 10, 2026
3 checks passed

acmeguy deleted the feat/cubejs-smart-gen-improvements branch May 10, 2026 13:18
