[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2) by levnach · Pull Request #9954 · Z3Prover/z3

levnach · 2026-06-25T09:17:14Z

Summary

Fixes an MBQI model-finder timeout regression surfaced by the snapshot-regression corpus.

Originating discussion: https://github.com/Z3Prover/bench/discussions/2664
Benchmark: iss-6174/small.smt2 (in Z3Prover/bench, under inputs/issues/iss-6174/)
z3 under test (nightly): z3-4.17.0-x64-glibc-2.39

Reported divergence

--- small.expected.out (expected)
+++ produced (current z3)
@@ -1,3 +1,3 @@
 sat
 sat
-unknown
+sat

What current `master` actually does (the real regression)

The recorded oracle for the 3rd (check-sat) is unknown and the nightly under test produced sat, but current z3 master (HEAD 6fd303c4b) produces neither — it hangs / times out on the 3rd query, even at -T:60. A timeout is strictly worse than both unknown and sat, so this is a genuine, separate regression. smt.mbqi=false returns unknown instantly, localizing the hang to MBQI.

Root cause

HEAD commit 6fd303c4b ("set the auf flag to false in all cases") made two changes to src/smt/smt_model_finder.cpp. Besides the titular m_is_auf = false change, it also added two Boolean productions to the ho_var instantiation-set enumeration in ho_var::populate_inst_sets:

tn.add_production(m.mk_true());
tn.add_production(m.mk_false());

In this benchmark the universally-quantified variable r has the deeply nested array sort (Array (Array Bool Bool) (Array Bool Bool)), whose term_enumeration target sort unfolds to Bool. The extra true/false Bool leaves combine combinatorially with the select/store/equality operators, so term_enumeration::enum_terms blows up and MBQI hangs. The pre-existing max_count = 20 guard bounds how many enumerated terms are inserted, but not the enumerator's internal work. (The same enumeration previously needed bounding in commit d1170d19b, "Bound ho_var term enumeration to fix MBQI timeout regression".)

Confirmed by empirical bisection — rebuild + re-run each state:

state	3rd `(check-sat)`
both lines reverted (parent)	`sat` (fast)
keep only `m_is_auf = false`, drop the two productions	`sat` (fast)
keep only the two productions, revert `m_is_auf`	timeout
HEAD `6fd303c4b` (both)	timeout

The two add_production(true/false) lines are the sole cause; the m_is_auf change is benign for this benchmark.

Fix

Remove just the two auxiliary Boolean productions; the commit's intended m_is_auf = false change is preserved.

                 ast_mark visited;
-                tn.add_production(m.mk_true());
-                tn.add_production(m.mk_false());
                 for (enode *n : ctx->enodes()) {

Validation (rebuilt z3 + re-ran benchmark)

Rebuilt the checkout: ./configure && make -C build -j$(nproc).
Re-ran inputs/issues/iss-6174/small.smt2 with -T:20 three times → sat / sat / sat every time, no timeout (previously the 3rd query hung).
Isolated the 3rd query and ran with model_validate=true → sat with a concrete model that passes validation. The formula is genuinely satisfiable (constraints reduce to x2 = true, ar[a] = true, x[r5] = ar).

Note on the oracle

After the fix z3 returns sat for the 3rd query — matching the nightly, and sound (model-validated) — rather than the older recorded oracle unknown. The unknown → sat difference is a benign precision improvement that predates this regression; the bug fixed here is the timeout. Per the snapshot-regression guardrails I did not modify the oracle (small.expected.out); whether to refresh it to sat is a separate human decision.

Opened as a draft for human review.

Generated by Fix a Z3 snapshot-regression divergence · 2.8K AIC · ⌖ 82 AIC · ⊞ 41.2K · ◷

…mt2) Commit 6fd303c ("set the auf flag to false in all cases") additionally added two Boolean productions to the ho_var instantiation-set enumeration in smt_model_finder.cpp: tn.add_production(m.mk_true()); tn.add_production(m.mk_false()); For benchmarks whose universally-quantified array variable has a deeply nested array sort that unfolds to a Bool target sort (e.g. the forall over r : (Array (Array Bool Bool) (Array Bool Bool)) in iss-6174/small.smt2), these extra Bool leaves combine combinatorially with the select/store/eq operators in term_enumeration. enum_terms then blows up and MBQI hangs: the third check-sat, which previously returned quickly, now times out at -T:20 (and even -T:60). The pre-existing max_count=20 bound limits how many terms are inserted but not the enumerator's internal work. Remove the two auxiliary Boolean productions. The commit's titular change (setting m_is_auf = false in all cases) is benign for this benchmark and is preserved; only the enumeration-blowup lines are dropped. After the fix the benchmark returns "sat sat sat" with no timeout, and the third query's model passes model_validate=true. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

levnach added automated snapshot-regression labels Jun 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2)#9954

[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2)#9954
levnach wants to merge 1 commit into
masterfrom
fix/iss-6174-mbqi-ho-var-enum-timeout-861d52f0281e5cde

levnach commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

levnach commented Jun 25, 2026

Summary

Reported divergence

What current master actually does (the real regression)

Root cause

Fix

Validation (rebuilt z3 + re-ran benchmark)

Note on the oracle

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

What current `master` actually does (the real regression)