feat(cubejs): smart-gen — YAML default, shorter names, smarter skips, no auto pre-aggs#53
Merged
Conversation
hamdi-ibrahim
approved these changes
May 10, 2026
… no auto pre-aggs
Cube generation now produces noticeably cleaner models with fewer surprises.
Output format
- Default to YAML (.yml). Falls back to JS only when a cube contains a
FILTER_PARAMS arrow callback we cannot translate. Identity arrows
((v) => v) are auto-transpiled to Python lambda (lambda v: v) so YAML
works for the nested-lookup-key path too.
- Filename: explicit file_name wins; otherwise reuse the existing
model's extension when re-running; otherwise .yml.
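The precedence above can be sketched roughly as follows. This is an illustration only: resolveExtension, extensionOf and transpileIdentityArrow are hypothetical names, not the functions changed in this PR, and the sketch assumes that a null result from the transpiler is what triggers the JS fallback.

```javascript
// Hypothetical helpers; the real smart-gen code is not shown in this PR.

// Returns the extension of a file name (".yml", ".js"), or null if there is none.
function extensionOf(fileName) {
  const match = /\.[A-Za-z0-9]+$/.exec(fileName ?? "");
  return match ? match[0].toLowerCase() : null;
}

// Precedence from the commit message:
// explicit file_name > extension of the existing model on re-run > ".yml".
function resolveExtension({ fileName, existingModelFile } = {}) {
  return extensionOf(fileName) ?? extensionOf(existingModelFile) ?? ".yml";
}

// Translates an identity arrow such as "(v) => v" into a Python lambda.
// Returns null for any other callback (assumed here to mean "fall back to JS").
function transpileIdentityArrow(source) {
  const match =
    /^\s*(?:\(\s*([A-Za-z_]\w*)\s*\)|([A-Za-z_]\w*))\s*=>\s*([A-Za-z_]\w*)\s*$/.exec(source);
  if (!match) return null;
  const param = match[1] ?? match[2];
  return param === match[3] ? `lambda ${param}: ${param}` : null;
}
```

For example, resolveExtension({ existingModelFile: "legacy.js" }) keeps ".js", while a first run with no arguments yields ".yml".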
Cube naming from filters
- When generating with a flat filter like event = 'Stockout Ended' and
no explicit cube/file name, derive the cube + file name from the
filter values (stockout_ended.yml). Re-running the same filter set
updates the same file.
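A minimal sketch of how a name could be derived from filter values. The filter shape ({ column, operator, value }) and the rule of joining several values with underscores are assumptions for illustration; the PR only shows the single-filter case.

```javascript
// Hypothetical helpers; filter shape and join rule are assumptions.

// Lowercases a value and collapses every run of non-alphanumerics to "_".
function slugify(value) {
  return String(value)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Derives a cube name from the filter values, or null when nothing usable remains.
function cubeNameFromFilters(filters) {
  const parts = filters.map((filter) => slugify(filter.value)).filter(Boolean);
  return parts.length > 0 ? parts.join("_") : null;
}
```

Because the result depends only on the filter values, the same filter set always maps to the same name, which is what lets a re-run update the existing file instead of creating a new one.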
Field naming (shortest-unique resolver)
- Each field carries an ordered candidate list from leaf to fully-
qualified. A new resolver picks the shortest candidate that does not
collide. Examples:
props.color (map key) → color (was props_color)
commerce.products.id → id (was commerce_products_id)
commerce.products.entry_type → type (was commerce_products_type)
- Nested-AJ flattened cubes route around already-claimed names too.
- FILTER_PARAMS refs auto-rebind when a lookup-key dim is renamed.
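The resolver idea can be sketched as a greedy pass over the candidate lists. The field shape ({ path, candidates }), the ordering rule (fields with fewer candidates claim names first) and the numeric-suffix fallback are assumptions made for this sketch, not details confirmed by the PR.

```javascript
// Hypothetical resolver; each field is { path, candidates } with candidates
// ordered from leaf name to fully-qualified name (assumed non-empty).
function resolveFieldNames(fields) {
  const claimed = new Set();
  const resolved = new Map();

  // Fields with fewer candidates go first so plain columns keep their leaf name.
  // Array.prototype.sort is stable, so ties keep their input order.
  const ordered = [...fields].sort((a, b) => a.candidates.length - b.candidates.length);

  for (const field of ordered) {
    // Pick the shortest candidate that nobody has claimed yet.
    let name = field.candidates.find((candidate) => !claimed.has(candidate));

    // Assumed fallback: suffix the fully-qualified name when every candidate is taken.
    if (name === undefined) {
      const base = field.candidates[field.candidates.length - 1];
      let suffix = 2;
      while (claimed.has(`${base}_${suffix}`)) suffix += 1;
      name = `${base}_${suffix}`;
    }

    claimed.add(name);
    resolved.set(field.path, name);
  }
  return resolved;
}
```

With a plain lat column and a nested location.lat, the plain column keeps lat and the nested one moves on to location_lat.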
Value counting fixes for nested / parallel-array Nested(...) columns
- profiler.arrayColumnSql now uses uniqArray + arrayFilter so distinct-
count reflects element-level cardinality, not the count of distinct
whole arrays. value_rows requires at least one meaningful element
(string: non-null + non-empty; number: non-null + non-zero; other:
non-null) so an Array(Nullable(Bool)) of all-NULL no longer reads as
"100% populated" just because it inherits the parallel array length.
- Numeric arrays also emit minArray/maxArray for the all-zero skip rule.
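As a rough illustration of the element-level counting, the sketch below builds ClickHouse expressions with uniqArray, arrayFilter, minArray and maxArray. The function name, the output aliases and the use of length(...) > 0 for value_rows are assumptions; the actual SQL emitted by profiler.arrayColumnSql is not shown in this PR. The column argument is assumed to be an already-quoted identifier.

```javascript
// Hypothetical sketch; not the real profiler.arrayColumnSql.

// Lambda body deciding whether an array element counts as "meaningful".
function meaningfulPredicate(kind) {
  if (kind === "string") return "isNotNull(x) AND x != ''";
  if (kind === "number") return "isNotNull(x) AND x != 0";
  return "isNotNull(x)";
}

// Builds the SELECT expressions for one array column.
function sketchArrayColumnSql(column, kind) {
  const filtered = `arrayFilter(x -> ${meaningfulPredicate(kind)}, ${column})`;
  const parts = [
    // Distinct elements across all rows, not distinct whole arrays.
    `uniqArray(${filtered}) AS unique_values`,
    // Rows that hold at least one meaningful element.
    `countIf(length(${filtered}) > 0) AS value_rows`,
  ];
  if (kind === "number") {
    // Element-level min/max so an all-zero skip rule can be applied.
    parts.push(`minArray(${column}) AS min_value`, `maxArray(${column}) AS max_value`);
  }
  return parts.join(",\n");
}
```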
Auto-skip cascade in cubeBuilder.processColumns
- STRING/UUID with uniqueValues===1 and lc_values being '', whitespace,
'0' or 0 → skip
- NUMBER with min===max===0 → skip
- BOOLEAN/Int8 with min===max OR uniqueValues===1 → skip
(catches the customer_facing Int8 'always 0' case)
- Mirrored into the nested-AJ children path.
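The cascade can be read as a single predicate over a column profile. The profile shape (valueType, rawType, uniqueValues, lc_values, min, max) and the function names are assumptions for this sketch, as is the guard that treats a missing min/max as "not constant".

```javascript
// Hypothetical profile shape: { valueType, rawType, uniqueValues, lc_values, min, max }.

// True for '', whitespace-only strings, '0' and numeric 0.
function isBlankOrZero(value) {
  if (value === 0) return true;
  if (typeof value !== "string") return false;
  return value.trim() === "" || value === "0";
}

function shouldSkipColumn(profile) {
  const { valueType, rawType, uniqueValues, lc_values: values = [], min, max } = profile;
  // Assumption: a missing min/max never counts as a constant column.
  const constant = min != null && min === max;

  // BOOLEAN/Int8: a single value carries no information.
  if (valueType === "BOOLEAN" || rawType === "Int8") {
    return constant || uniqueValues === 1;
  }
  // STRING/UUID: one distinct value that is blank or zero.
  if (valueType === "STRING" || valueType === "UUID") {
    return uniqueValues === 1 && isBlankOrZero(values[0]);
  }
  // NUMBER: always zero.
  if (valueType === "NUMBER") {
    return constant && min === 0;
  }
  return false;
}
```

Checking the Int8 rule before the NUMBER rule matters in this sketch: an Int8 flag that is always 1 is caught by min === max even though it is not zero.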
No auto pre-aggregations
- buildRawCube and the nested-AJ path no longer emit daily_rollup /
monthly_rollup. Heuristic rollups bloat CubeStore with unused
materializations and surprise users with hidden refresh schedules.
Users add pre-aggs explicitly when they understand query patterns.
User-added pre-aggs in existing models are preserved through merge.
Pre-existing test cleanup
- package.json test: add --experimental-test-module-mocks so
mock.module() works on Node 22.12 (provisionFraiOS suite).
- Legacy ARRAY JOIN path: surface the user-supplied alias as a
dimension on the flattened cube (collision-safe).
- buildWhereClause test: assert current 'no allowlist ⇒ all tables
internal' semantics.
- profileTable emitter test: assert current step names ('init',
'initial_profile', 'profiling').
Tests: 511 / 511 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e381466 to
601e32b
Summary
Several improvements to smart-gen so it produces noticeably cleaner, more useful Cube models — fewer rollups, shorter field names, more accurate field-skip rules.
Output format
- Default to YAML (.yml). Falls back to JS only when a cube has a FILTER_PARAMS arrow callback we cannot translate. Identity arrows (v) => v are auto-transpiled to Python lambdas (lambda v: v) so YAML works for the nested-lookup-key path too.
- Filename: explicit file_name wins → reuse existing model's extension on re-run → otherwise .yml.
Cube naming from filters
- A flat filter like event = 'Stockout Ended' with no explicit name now derives stockout_ended.yml. Re-running the same filter set updates the same file.
Field naming (shortest-unique resolver)
Each field carries an ordered candidate list from leaf → fully-qualified. A new resolver picks the shortest non-clashing candidate; basic columns hold their leaf names and longer-candidate fields advance around them. FILTER_PARAMS refs auto-rebind when a lookup-key dim is renamed.
props_color → color
commerce_products_id → id
commerce_products_type → type
lat + location.lat → lat, location_lat
Value counting fix for Nested(...) parallel arrays
Profiler's arrayColumnSql switched from uniq() (counts whole-array values) to uniqArray() + arrayFilter (counts elements). value_rows now requires at least one meaningful element (string non-empty, number non-null/non-zero, other non-null), so an Array(Nullable(Bool)) of all NULL no longer reads as 100% populated just because the parallel array has length 1. Numeric arrays also emit minArray / maxArray so the all-zero skip rule works for them.
Auto-skip cascade
cubeBuilder.processColumns now skips:
- STRING/UUID: uniqueValues === 1 and lc_values[0] empty / whitespace / '0' / 0
- NUMBER: min === max === 0
- BOOLEAN/Int8: min === max OR uniqueValues === 1 (catches the customer_facing Int8 always-zero case)
Same cascade mirrored into the nested-AJ children path.
No auto pre-aggregations
buildRawCube and the nested-AJ path no longer emit daily_rollup / monthly_rollup. Heuristic rollups bloat CubeStore and surprise users with hidden refresh schedules. User-added pre-aggs in existing models are preserved through merge.
Pre-existing test fixes (was 4 baseline failures)
- package.json test script: --experimental-test-module-mocks so mock.module() works on Node 22.12 (provisionFraiOS suite).
- Legacy ARRAY JOIN path: surface the user-supplied alias as a dimension on the flattened cube (collision-safe).
- buildWhereClause test: assert current "no allowlist ⇒ all tables internal" semantics.
- profileTable emitter test: assert current step names (init / initial_profile / profiling).
Test plan
- node --experimental-test-module-mocks --test 'src/**/__tests__/*.test.js' → 511 / 511 passing
- Nested(...) columns, generate; confirm shorter field names + .yml output
- event = 'X'; confirm derived cube/file name
- lambda v: v in YAML and queries still resolve
- .js smart-gen models keep their .js extension on re-run
Companion PR
Frontend mirror of the skip rules in step-2 auto-select: smartdataHQ/client-v2#feat/smart-gen-field-selection
🤖 Generated with Claude Code