Fix type variable leakage in block variables list (issue #1312)#1313
Fix type variable leakage in block variables list (issue #1312)#1313seirl wants to merge 1 commit into
Conversation
Under the hood, when you compile a CEL policy with local variables, the compiler bundles them into a single internal function call called cel.@block. The first argument of this call is a list containing all the variable initializers, and the second is the body of the expression: cel.@block([var1_init, var2_init, ...], body) For example, if you define two variables: empty_list = [] // (inferred as list(dyn)) string_list = ["foo"] // (inferred as list(string)) They get bundled into: cel.@block([[], ["foo"]], ...) When CEL type-checks this composed expression, it looks at the first argument [[], ["foo"]] and sees a standard list literal. CEL lists are designed to be homogeneous. To enforce this, the type checker eagerly unifies the types of all elements in a list literal. However, when it unifies [] (which has an unbound type parameter list(_var0)) with ["foo"] (list(string)), it decides they are compatible by binding the type parameter _var0 to string. Even if the list as a whole later defaults to list(dyn) because of other incompatible elements, it's already too late: the type checker has already recorded that the independent empty_list is of type list(string) and will keep this information for the rest of the type checking step. If you later try to concatenate empty_list + [1], the type checker returns an error: > found no matching overload for '_+_' applied to '(list(string), list(int))' Basically, the type of string_list bled into the completely independent empty_list simply because they were temporarily bundled in the same initializer list. The problem is that the variables "list" passed to cel.@block is semantically not a normal homogeneous list. It is more like a heterogeneous tuple of independent expressions. To fix this, we special-case the internal cel.@block function in the core type checker. * When we encounter cel.@block, we type-check each variable initializer in the list independently. * We do not run the unification logic on them, completely preventing any type bleeding. * We set the type of the variables list directly to list(dyn) (matching the cel.@block signature) and then proceed to type-check the body. This allows local variables to maintain their independent types, resolving the compilation failures while preserving the performance benefits of cel.@block.
|
@seirl The individual expressions placed into the cel.@block are type-checked ahead of time, and then assembled into the block format as an optimization. Can you help me understand what the issue is in practice? Is this mostly that the individual slots need to have their type information preserved during the optimization for the final type-check to provide a sensible output? |
|
@TristonianJones have you checked the minimal reproducible example in #1312 ? The problem is that compiling: fails with: |
|
That's actually a known bug in CEL type checking where the type parameter assignment gets confused by empty list sometimes. The expression As for CEL block, the individual entries are type checked separately, and the type assignment propagated to the @index variables. If you fix the main bug, I believe you'll get more predictable results. |
|
I can't reproduce the bug with the expression Do you have examples for the "main bug" that should be fixed alongside this one, or is there an existing bug for it? |
|
Hi @seirl, I believe the issue that's causing troubles here is the same or similar to the one which fails for type promotion within a list In the past, there have been a few other cases of type resolution challenges due to the erasure of type parameters to There should be a carve-out for -Tristan |
|
OK, I think I understand where you see the connection. The type unification is greedy for lists, so if you have a type unification for However, I think in my case, fixing this behavior still wouldn't be enough. Imagine you had a non-greedy type unification, and you waited for the end of your list literal to assign a type to your list elements. If you had And I think this behavior is correct for regular lists, but not for |
|
Poking a bit, it looks like the mis identified type happens when the composer does the second pass to unnest rules (in the basic compose indexes are manually mapped from the inferred type of the independent expressions, in the unnester it reads the inferred type from checking the init list type inferences). https://github.com/google/cel-go/blob/a82c68b770ac0cb67f7b4f76166827c14b145eb8/policy/composer.go#L275. I think cel-go handles this in conformance tests by just marking all of the index variables as dyn for the purposes of checking a hand-rolled AST with the block construct. To get the conformance tests for cel.block working in C++ I added a similar change to this PR -- cel-expr/cel-cpp#1968. It still has some odd behaviors around free type parameters that we might need to address. Block makes it extra tricky since the variable is intended to be referenced in multiple places.
|
|
@jnthntatum Cool! So just so I understand fully, you had to implement something similar to what I do in this pull request to make the C++ checker pass the conformance tests? Does that mean that the cel-go code doesn't pass the conformance tests? Is the C++ implementation able to compile the example that I gave correctly, with the empty list variable? |
|
Both Java and Go have some issues in type resolution with empty container types. Often the result is dyn and doesn't have a meaningful impact, but it would be good to fix. I just don't know if the solution proposed is more of a bandaid over the core issue or a fix for an issue that makes the underlying cause worse |
|
A potential alternative: we could simply use the declared type on the CompiledRule itself as the source of truth rather than trying to use the inferred type from the checker: - celType := a.GetType(v.ID())
+ celType := opt.rule.Variables()[i].Declaration().Type() // Illustrative example. May have to collect all variables in this scope. This is what we do in CEL-Java's composer implementation to bypass the unification pollution problem. I agree though that supporting this in checker is probably more ideal: |
Under the hood, when you compile a CEL policy with local variables, the compiler bundles them into a single internal function call called
cel.@block.The first argument of this call is a list containing all the variable initializers, and the second is the body of the expression:
cel.@block([var1_init, var2_init, ...], body)For example, if you define two variables:
They get bundled into:
cel.@block([[], ["foo"]], ...)When CEL type-checks this composed expression, it looks at the first argument
[[], ["foo"]]and sees a standard list literal. CEL lists are designed to be homogeneous. To enforce this, the type checker eagerly unifies the types of all elements in a list literal.However, when it unifies
[](which has an unbound type parameterlist(_var0)) with["foo"](list(string)), it decides they are compatible by binding the type parameter_var0tostring.Even if the list as a whole later defaults to
list(dyn)because of other incompatible elements, it's already too late: the type checker has already recorded that the independentempty_listis of typelist(string)and will keep this information for the rest of the type checking step.If you later try to concatenate
empty_list + [1], the type checker returns an error:Basically, the type of
string_listbled into the completely independentempty_listsimply because they were temporarily bundled in the same initializer list.The problem is that the variables "list" passed to
cel.@blockis semantically not a normal homogeneous list. It is more like a heterogeneous tuple of independent expressions.To fix this, we special-case the internal
cel.@blockfunction in the core type checker.cel.@block, we type-check each variable initializer in the list independently.list(dyn)(matching thecel.@blocksignature) and then proceed to type-check the body.This allows local variables to maintain their independent types, resolving the compilation failures while preserving the performance benefits of
cel.@block.Fix #1312