Skip to content

[ZC-1342]: Disable variable syntax in derivations#489

Merged
slobodan-ilic merged 7 commits into
masterfrom
ZC-1342-disable-variable-syntax-in-derivations
Jun 12, 2026
Merged

[ZC-1342]: Disable variable syntax in derivations#489
slobodan-ilic merged 7 commits into
masterfrom
ZC-1342-disable-variable-syntax-in-derivations

Conversation

@slobodan-ilic

Copy link
Copy Markdown
Contributor

All derivation expressions should use {"var": <alias>} syntax. The {"variable": <varid>} syntax is deprecated and will not be supported in the fututre.

@slobodan-ilic slobodan-ilic requested a review from jjdelc August 5, 2025 16:39
@slobodan-ilic slobodan-ilic force-pushed the ZC-1342-disable-variable-syntax-in-derivations branch from c4633ae to 25bcb3d Compare August 8, 2025 09:14

@jjdelc jjdelc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing work!! Thanks @slobodan-ilic

Comment thread scrunch/expressions.py
Comment thread scrunch/tests/test_datasets.py Outdated
'expression': {
'function': '!=',
'args': [
{'variable': var.url}, # Crunch needs variable URLs!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere a var variable that isn't used anymore now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed those unused var assignments in the exclusion-filter tests in 50c268c.

"function": "in",
"args": [
{
"variable": var1.url

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a var1 and var2

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, removed the unused var1/var2 setup too in 50c268c.

Comment thread scrunch/tests/test_expressions.py
Comment thread scrunch/datasets.py Outdated
# We are in the case of a default fill, replace the -1 with the new
# variable
fill_map["-1"] = {"variable": vars_by_alias[else_case["variable"]]["id"]}
fill_map["-1"] = {"var": else_case["var"]}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still a vars_by_alias that's being fetched that we may not need anymore?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept vars_by_alias here. We no longer need it to turn fill_map entries into ids, but create_fill_values still uses it to validate source variables and keep the existing scrunch public API where callers pass variable aliases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarifying after the cleanup in 1717fcb: vars_by_alias is no longer used in the fill_map construction. The fill map now uses the scrunch input alias and emits var. The vars_by_alias fetch remains earlier only for the existing categorical type validation.

Comment thread scrunch/expressions.py
Comment thread scrunch/expressions.py

# inspect function, then inspect variable, if multiple_response,
# then change in --> any
if 'function' in obj and 'args' in obj:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not making this replacement anymore?

This is because in scrunch they say subvar in mr_var but in zz9 it is more like any(mr_var, [subvar]) so they need to replace IN for ANY

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Restored this path in 50c268c: MR variables parsed with in are normalized through the any handling again, with unit coverage for the var/axes payload.

Comment thread scrunch/expressions.py
Comment thread scrunch/expressions.py

if 'var' in fragment:
if 'axes' in fragment:
return "{}[{}]".format(fragment['var'], fragment['axes'][0])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhhh are we parsing this back? I can't remember if we have tests that show that scrunch can understand this. It's been so long 🧓🏽

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for this. prettify now covers var directly and var+axes as array syntax, and the parser/process path keeps emitting the Crunch API var/axes shape.

@slobodan-ilic slobodan-ilic Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More concretely, the prettify coverage is:

  • TestPrettify.test_var_alias_no_dataset for {"var": ...}
  • TestPrettify.test_var_square_bracket_subvariables for { "var": ..., "axes": [...] }
  • TestPrettify.test_transitional_variable_url for the temporary old variable URL response shape
  • TestPrettify.test_transitional_variable_url_square_bracket_subvariables for old subvariable URL responses

The parser/process outbound var/axes assertions are in:

  • TestExpressionProcessing.test_process_all_multiple_response
  • TestExpressionProcessing.test_process_in_multiple_response

@slobodan-ilic slobodan-ilic force-pushed the ZC-1342-disable-variable-syntax-in-derivations branch from 72263e3 to 796a0c2 Compare June 5, 2026 15:42

@jjdelc jjdelc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function process_expr is not needed anymore, it exists to convert aliases to URLs, if we're now doing aliases directly. Keeping that function is unevaluated code debt that should also be removed.

With this PR there is no reason to keep all that complex code still around generating confusion.

Comment thread scrunch/expressions.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these docstrings need to change too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 85963c3. Updated the example to use the var term to match what the parser now emits.

@slobodan-ilic slobodan-ilic force-pushed the ZC-1342-disable-variable-syntax-in-derivations branch 4 times, most recently from 3d9d1d9 to 85963c3 Compare June 10, 2026 18:09
@slobodan-ilic

Copy link
Copy Markdown
Contributor Author

@jjdelc — pushed 85963c3 addressing this. Took a closer look at what process_expr does now, and the alias→URL conversion you mentioned is in fact already gone (removed in earlier commits on this branch, e.g. 796a0c2). What still lives in there is metadata-driven enrichment that parse_expr structurally cannot do without a dataset entity:

  1. subvariable → parent + axes rewrite
  2. category name → id resolution
  3. multiple-response adaptation (column form)
  4. any↔in swap based on array vs categorical
  5. array variable expansion into or(...) over subvariables

Rather than remove these, I refactored for honesty:

  • Module + process_expr docstrings now describe the five real tasks, state plainly that aliases are no longer converted to URLs, and the top-of-file example uses var to match.
  • Dropped dead code: base_url, the unused variable_id inner helper, the never-written outer subvariables accumulator.
  • Extracted the 60-line array-expansion block into _expand_array_filter with a docstring, and replaced the loop-leaking var name with an explicitly tracked array_var.
  • Section headers inside _process label each phase: walk / subvariable rewrite / any↔in swap / array expansion.

Behavior is unchanged — 337 unit tests still pass. If you can point to specific cases where any of the five rewrites are actually unnecessary against the current API, happy to peel more.

Clarify process_expr instead of removing it. JJ's note assumed the
function only translates aliases to URLs; the alias->URL part is in
fact already gone (earlier commits on this branch). What remains is
metadata-driven enrichment that parse_expr structurally cannot do:
subvariable -> parent+axes rewrite, category name -> id resolution,
multiple-response adaptation, any<->in swap by variable type, and
array variable expansion into or(...) chains.

Refactor for honesty:
- Module + process_expr docstrings now describe the five real tasks
  and state plainly that aliases are not converted to URLs. The
  top-of-file example payload now uses `var` to match.
- Drop dead code: base_url, the unused variable_id inner helper, the
  outer subvariables accumulator that is never written to.
- Extract the array-expansion block into _expand_array_filter with a
  docstring, and replace the loop-leaking `var` name with an
  explicitly tracked array_var.
- Section headers inside _process label each phase: walk,
  subvariable rewrite, any<->in swap, array expansion.

No behavior change — 337 unit tests still pass.
@slobodan-ilic slobodan-ilic force-pushed the ZC-1342-disable-variable-syntax-in-derivations branch from 85963c3 to d5fc1ea Compare June 10, 2026 18:18

@jjdelc jjdelc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this work! Great reworking, mustn't been easy to understand all of this to make this improvements. 🚀

Comment thread scrunch/datasets.py
folders_by_name[folder_name].entity.delete()
# Always delete the tmp dataset no matter what
tmp_ds.delete() No newline at end of file
tmp_ds.delete()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, closing newline, good!

Comment thread scrunch/expressions.py
Expression objects produced by `parse_expr` are purely syntactic: they
refer to variables by `alias` (in `var` terms) and know nothing about
the dataset. They are not yet ready to be sent to Crunch -- the API still
needs some metadata-driven rewrites (subvariable -> parent+axes, category

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting point, we have something similar happening in automation in which even after having an alias based expression, we still need to do a 2nd pass for the exact same motives.

This makes me think if we should start doing some of this in the API instead of demanding consumers (Scrunch, Crunch Automation) having to deal with this as well.

Comment thread scrunch/expressions.py
return list(range(lower, upper + 1))


def parse_expr(expr, platonic=False):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

Comment thread scrunch/expressions.py

Aliases are NOT converted to URLs -- the API accepts alias-based `var`
terms directly. Alias existence is validated here as a side-effect of
the metadata lookups.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great docstring! Thanks for the clarity!

@slobodan-ilic slobodan-ilic merged commit c7dc749 into master Jun 12, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants