Skip to content

Fix type variable leakage in block variables list (issue #1312)#1313

Open
seirl wants to merge 1 commit into
cel-expr:masterfrom
seirl:fix-cel-issue-1312
Open

Fix type variable leakage in block variables list (issue #1312)#1313
seirl wants to merge 1 commit into
cel-expr:masterfrom
seirl:fix-cel-issue-1312

Conversation

@seirl

@seirl seirl commented May 8, 2026

Copy link
Copy Markdown
Collaborator

Under the hood, when you compile a CEL policy with local variables, the compiler bundles them into a single internal function call called cel.@block.

The first argument of this call is a list containing all the variable initializers, and the second is the body of the expression: cel.@block([var1_init, var2_init, ...], body)

For example, if you define two variables:

empty_list = []  // (inferred as list(dyn))
string_list = ["foo"]  // (inferred as list(string))

They get bundled into: cel.@block([[], ["foo"]], ...)

When CEL type-checks this composed expression, it looks at the first argument [[], ["foo"]] and sees a standard list literal. CEL lists are designed to be homogeneous. To enforce this, the type checker eagerly unifies the types of all elements in a list literal.

However, when it unifies [] (which has an unbound type parameter list(_var0)) with ["foo"] (list(string)), it decides they are compatible by binding the type parameter _var0 to string.

Even if the list as a whole later defaults to list(dyn) because of other incompatible elements, it's already too late: the type checker has already recorded that the independent empty_list is of type list(string) and will keep this information for the rest of the type checking step.

If you later try to concatenate empty_list + [1], the type checker returns an error:

found no matching overload for '_+_' applied to '(list(string), list(int))'

Basically, the type of string_list bled into the completely independent empty_list simply because they were temporarily bundled in the same initializer list.

The problem is that the variables "list" passed to cel.@block is semantically not a normal homogeneous list. It is more like a heterogeneous tuple of independent expressions.

To fix this, we special-case the internal cel.@block function in the core type checker.

  • When we encounter cel.@block, we type-check each variable initializer in the list independently.
  • We do not run the unification logic on them, completely preventing any type bleeding.
  • We set the type of the variables list directly to list(dyn) (matching the cel.@block signature) and then proceed to type-check the body.

This allows local variables to maintain their independent types, resolving the compilation failures while preserving the performance benefits of cel.@block.

Fix #1312

Under the hood, when you compile a CEL policy with local variables, the
compiler bundles them into a single internal function call called
cel.@block.

The first argument of this call is a list containing all the variable
initializers, and the second is the body of the expression:
cel.@block([var1_init, var2_init, ...], body)

For example, if you define two variables:

    empty_list = []  // (inferred as list(dyn))
    string_list = ["foo"]  // (inferred as list(string))

They get bundled into: cel.@block([[], ["foo"]], ...)

When CEL type-checks this composed expression, it looks at the first
argument [[], ["foo"]] and sees a standard list literal.
CEL lists are designed to be homogeneous. To enforce this, the type
checker eagerly unifies the types of all elements in a list literal.

However, when it unifies [] (which has an unbound type parameter
list(_var0)) with ["foo"] (list(string)), it decides they are compatible
by binding the type parameter _var0 to string.

Even if the list as a whole later defaults to list(dyn) because of other
incompatible elements, it's already too late: the type checker has
already recorded that the independent empty_list is of type
list(string) and will keep this information for the rest of the type
checking step.

If you later try to concatenate empty_list + [1], the type checker
returns an error:

> found no matching overload for '_+_' applied to '(list(string), list(int))'

Basically, the type of string_list bled into the completely independent
empty_list simply because they were temporarily bundled in the same
initializer list.

The problem is that the variables "list" passed to cel.@block is
semantically not a normal homogeneous list. It is more like a
heterogeneous tuple of independent expressions.

To fix this, we special-case the internal cel.@block function in the
core type checker.

* When we encounter cel.@block, we type-check each variable initializer
  in the list independently.
* We do not run the unification logic on them, completely preventing any
  type bleeding.
* We set the type of the variables list directly to list(dyn) (matching
  the cel.@block signature) and then proceed to type-check the body.

This allows local variables to maintain their independent types,
resolving the compilation failures while preserving the performance
benefits of cel.@block.
@seirl seirl requested a review from TristonianJones May 8, 2026 23:24
@TristonianJones

TristonianJones commented May 9, 2026

Copy link
Copy Markdown
Collaborator

@seirl The individual expressions placed into the cel.@block are type-checked ahead of time, and then assembled into the block format as an optimization. Can you help me understand what the issue is in practice?

Is this mostly that the individual slots need to have their type information preserved during the optimization for the final type-check to provide a sensible output?

@seirl

seirl commented May 9, 2026

Copy link
Copy Markdown
Collaborator Author

@TristonianJones have you checked the minimal reproducible example in #1312 ? The problem is that compiling:

name: "mre"
rule:
  variables:
    - name: "empty_list"
      expression: "[]"
    - name: "string_list"
      expression: "['foo']"
  match:
    - output: "variables.empty_list + [1]"

fails with:

found no matching overload for '_+_' applied to '(list(string), list(int))'

@TristonianJones

Copy link
Copy Markdown
Collaborator

That's actually a known bug in CEL type checking where the type parameter assignment gets confused by empty list sometimes. The expression [] + [1] (or some variation) should be all you need to repro.

As for CEL block, the individual entries are type checked separately, and the type assignment propagated to the @index variables. If you fix the main bug, I believe you'll get more predictable results.

@seirl

seirl commented May 9, 2026

Copy link
Copy Markdown
Collaborator Author

I can't reproduce the bug with the expression [] + [1] or any other similar variation that I tried. Actually there's a bunch of tests in checker_test.go that check exactly that, and it seems to work fine AFAICT. The only thing where I saw this behavior is for cel.@block.

Do you have examples for the "main bug" that should be fixed alongside this one, or is there an existing bug for it?

@TristonianJones

Copy link
Copy Markdown
Collaborator

Hi @seirl, I believe the issue that's causing troubles here is the same or similar to the one which fails for type promotion within a list wrapper_int and int types:

https://github.com/google/cel-spec/blob/cb51b4176013ad19bd00df94be273c322916a620/tests/simple/testdata/type_deduction.textproto#L520

In the past, there have been a few other cases of type resolution challenges due to the erasure of type parameters to dyn, which may also be causing some trouble here, but I'm not sure.

There should be a carve-out for cel.@block already to be heterogeneously typed and the appropriate @index variables have a strong type; however this would still reduce to a type-promotion failure in the type-checker that looks similar to the test in the cel-spec that cel-go is currently failing (it's expressly disabled in the conformance tests)

-Tristan

@seirl

seirl commented May 11, 2026

Copy link
Copy Markdown
Collaborator Author

OK, I think I understand where you see the connection. The type unification is greedy for lists, so if you have a type unification for [wrapper(int), int], it thinks the list is heterogeneous, even though theoretically you could unify this as [int, int].

However, I think in my case, fixing this behavior still wouldn't be enough. Imagine you had a non-greedy type unification, and you waited for the end of your list literal to assign a type to your list elements. If you had [[], [3]] I think you would still want to unify this as list(list(int)), because an empty list shouldn't make an otherwise homogeneous list a dyn list.

And I think this behavior is correct for regular lists, but not for cel.@block, because you don't care about the list being homogeneous, conceptually you just want to type-check it as a tuple. So I think we still need to special-case it in some way. Does that make sense?

@jnthntatum

jnthntatum commented May 11, 2026

Copy link
Copy Markdown
Collaborator

Poking a bit, it looks like the mis identified type happens when the composer does the second pass to unnest rules (in the basic compose indexes are manually mapped from the inferred type of the independent expressions, in the unnester it reads the inferred type from checking the init list type inferences). https://github.com/google/cel-go/blob/a82c68b770ac0cb67f7b4f76166827c14b145eb8/policy/composer.go#L275. I think cel-go handles this in conformance tests by just marking all of the index variables as dyn for the purposes of checking a hand-rolled AST with the block construct.

To get the conformance tests for cel.block working in C++ I added a similar change to this PR -- cel-expr/cel-cpp#1968. It still has some odd behaviors around free type parameters that we might need to address. Block makes it extra tricky since the variable is intended to be referenced in multiple places.

  • variables.empty_list + ['foo'] -> list(dyn)
  • variables.empty_list + [1] -> list(dyn)
  • variables.empty_optional.orValue('foo') -> dyn

@seirl

seirl commented May 12, 2026

Copy link
Copy Markdown
Collaborator Author

@jnthntatum Cool! So just so I understand fully, you had to implement something similar to what I do in this pull request to make the C++ checker pass the conformance tests? Does that mean that the cel-go code doesn't pass the conformance tests?

Is the C++ implementation able to compile the example that I gave correctly, with the empty list variable?

@TristonianJones

Copy link
Copy Markdown
Collaborator

Both Java and Go have some issues in type resolution with empty container types. Often the result is dyn and doesn't have a meaningful impact, but it would be good to fix. I just don't know if the solution proposed is more of a bandaid over the core issue or a fix for an issue that makes the underlying cause worse

@l46kok

l46kok commented May 13, 2026

Copy link
Copy Markdown
Collaborator

A potential alternative: we could simply use the declared type on the CompiledRule itself as the source of truth rather than trying to use the inferred type from the checker:

-                       celType := a.GetType(v.ID()) 
+                       celType := opt.rule.Variables()[i].Declaration().Type() // Illustrative example. May have to collect all variables in this scope. 

This is what we do in CEL-Java's composer implementation to bypass the unification pollution problem. I agree though that supporting this in checker is probably more ideal:

https://github.com/google/cel-java/blob/14d4c2e39151f2e99e36f9818a9118b01c1d9ed3/policy/src/main/java/dev/cel/policy/RuleComposer.java#L66

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiler type-checking failure with empty list variables

4 participants