[ZC-1342]: Disable variable syntax in derivations#489
Conversation
c4633ae to
25bcb3d
Compare
jjdelc
left a comment
There was a problem hiding this comment.
This is amazing work!! Thanks @slobodan-ilic
| 'expression': { | ||
| 'function': '!=', | ||
| 'args': [ | ||
| {'variable': var.url}, # Crunch needs variable URLs! |
There was a problem hiding this comment.
Is there somewhere a var variable that isn't used anymore now?
There was a problem hiding this comment.
Removed those unused var assignments in the exclusion-filter tests in 50c268c.
| "function": "in", | ||
| "args": [ | ||
| { | ||
| "variable": var1.url |
There was a problem hiding this comment.
Yep, removed the unused var1/var2 setup too in 50c268c.
| # We are in the case of a default fill, replace the -1 with the new | ||
| # variable | ||
| fill_map["-1"] = {"variable": vars_by_alias[else_case["variable"]]["id"]} | ||
| fill_map["-1"] = {"var": else_case["var"]} |
There was a problem hiding this comment.
Is there still a vars_by_alias that's being fetched that we may not need anymore?
There was a problem hiding this comment.
I kept vars_by_alias here. We no longer need it to turn fill_map entries into ids, but create_fill_values still uses it to validate source variables and keep the existing scrunch public API where callers pass variable aliases.
There was a problem hiding this comment.
Clarifying after the cleanup in 1717fcb: vars_by_alias is no longer used in the fill_map construction. The fill map now uses the scrunch input alias and emits var. The vars_by_alias fetch remains earlier only for the existing categorical type validation.
|
|
||
| # inspect function, then inspect variable, if multiple_response, | ||
| # then change in --> any | ||
| if 'function' in obj and 'args' in obj: |
There was a problem hiding this comment.
We're not making this replacement anymore?
This is because in scrunch they say subvar in mr_var but in zz9 it is more like any(mr_var, [subvar]) so they need to replace IN for ANY
There was a problem hiding this comment.
Good catch. Restored this path in 50c268c: MR variables parsed with in are normalized through the any handling again, with unit coverage for the var/axes payload.
|
|
||
| if 'var' in fragment: | ||
| if 'axes' in fragment: | ||
| return "{}[{}]".format(fragment['var'], fragment['axes'][0]) |
There was a problem hiding this comment.
uhhh are we parsing this back? I can't remember if we have tests that show that scrunch can understand this. It's been so long 🧓🏽
There was a problem hiding this comment.
Added tests for this. prettify now covers var directly and var+axes as array syntax, and the parser/process path keeps emitting the Crunch API var/axes shape.
There was a problem hiding this comment.
More concretely, the prettify coverage is:
TestPrettify.test_var_alias_no_datasetfor{"var": ...}TestPrettify.test_var_square_bracket_subvariablesfor{ "var": ..., "axes": [...] }TestPrettify.test_transitional_variable_urlfor the temporary oldvariableURL response shapeTestPrettify.test_transitional_variable_url_square_bracket_subvariablesfor old subvariable URL responses
The parser/process outbound var/axes assertions are in:
TestExpressionProcessing.test_process_all_multiple_responseTestExpressionProcessing.test_process_in_multiple_response
72263e3 to
796a0c2
Compare
jjdelc
left a comment
There was a problem hiding this comment.
The function process_expr is not needed anymore, it exists to convert aliases to URLs, if we're now doing aliases directly. Keeping that function is unevaluated code debt that should also be removed.
With this PR there is no reason to keep all that complex code still around generating confusion.
There was a problem hiding this comment.
I think these docstrings need to change too
There was a problem hiding this comment.
Good catch — fixed in 85963c3. Updated the example to use the var term to match what the parser now emits.
3d9d1d9 to
85963c3
Compare
|
@jjdelc — pushed 85963c3 addressing this. Took a closer look at what
Rather than remove these, I refactored for honesty:
Behavior is unchanged — 337 unit tests still pass. If you can point to specific cases where any of the five rewrites are actually unnecessary against the current API, happy to peel more. |
Clarify process_expr instead of removing it. JJ's note assumed the function only translates aliases to URLs; the alias->URL part is in fact already gone (earlier commits on this branch). What remains is metadata-driven enrichment that parse_expr structurally cannot do: subvariable -> parent+axes rewrite, category name -> id resolution, multiple-response adaptation, any<->in swap by variable type, and array variable expansion into or(...) chains. Refactor for honesty: - Module + process_expr docstrings now describe the five real tasks and state plainly that aliases are not converted to URLs. The top-of-file example payload now uses `var` to match. - Drop dead code: base_url, the unused variable_id inner helper, the outer subvariables accumulator that is never written to. - Extract the array-expansion block into _expand_array_filter with a docstring, and replace the loop-leaking `var` name with an explicitly tracked array_var. - Section headers inside _process label each phase: walk, subvariable rewrite, any<->in swap, array expansion. No behavior change — 337 unit tests still pass.
85963c3 to
d5fc1ea
Compare
jjdelc
left a comment
There was a problem hiding this comment.
Thanks so much for this work! Great reworking, mustn't been easy to understand all of this to make this improvements. 🚀
| folders_by_name[folder_name].entity.delete() | ||
| # Always delete the tmp dataset no matter what | ||
| tmp_ds.delete() No newline at end of file | ||
| tmp_ds.delete() |
| Expression objects produced by `parse_expr` are purely syntactic: they | ||
| refer to variables by `alias` (in `var` terms) and know nothing about | ||
| the dataset. They are not yet ready to be sent to Crunch -- the API still | ||
| needs some metadata-driven rewrites (subvariable -> parent+axes, category |
There was a problem hiding this comment.
This is an interesting point, we have something similar happening in automation in which even after having an alias based expression, we still need to do a 2nd pass for the exact same motives.
This makes me think if we should start doing some of this in the API instead of demanding consumers (Scrunch, Crunch Automation) having to deal with this as well.
| return list(range(lower, upper + 1)) | ||
|
|
||
|
|
||
| def parse_expr(expr, platonic=False): |
|
|
||
| Aliases are NOT converted to URLs -- the API accepts alias-based `var` | ||
| terms directly. Alias existence is validated here as a side-effect of | ||
| the metadata lookups. |
There was a problem hiding this comment.
Great docstring! Thanks for the clarity!
All derivation expressions should use
{"var": <alias>}syntax. The{"variable": <varid>}syntax is deprecated and will not be supported in the fututre.