Skip to content

[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2)#9954

Draft
levnach wants to merge 1 commit into
masterfrom
fix/iss-6174-mbqi-ho-var-enum-timeout-861d52f0281e5cde
Draft

[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2)#9954
levnach wants to merge 1 commit into
masterfrom
fix/iss-6174-mbqi-ho-var-enum-timeout-861d52f0281e5cde

Conversation

@levnach

@levnach levnach commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes an MBQI model-finder timeout regression surfaced by the snapshot-regression corpus.

Reported divergence

--- small.expected.out (expected)
+++ produced (current z3)
@@ -1,3 +1,3 @@
 sat
 sat
-unknown
+sat

What current master actually does (the real regression)

The recorded oracle for the 3rd (check-sat) is unknown and the nightly under test produced sat, but current z3 master (HEAD 6fd303c4b) produces neither — it hangs / times out on the 3rd query, even at -T:60. A timeout is strictly worse than both unknown and sat, so this is a genuine, separate regression. smt.mbqi=false returns unknown instantly, localizing the hang to MBQI.

Root cause

HEAD commit 6fd303c4b ("set the auf flag to false in all cases") made two changes to src/smt/smt_model_finder.cpp. Besides the titular m_is_auf = false change, it also added two Boolean productions to the ho_var instantiation-set enumeration in ho_var::populate_inst_sets:

tn.add_production(m.mk_true());
tn.add_production(m.mk_false());

In this benchmark the universally-quantified variable r has the deeply nested array sort (Array (Array Bool Bool) (Array Bool Bool)), whose term_enumeration target sort unfolds to Bool. The extra true/false Bool leaves combine combinatorially with the select/store/equality operators, so term_enumeration::enum_terms blows up and MBQI hangs. The pre-existing max_count = 20 guard bounds how many enumerated terms are inserted, but not the enumerator's internal work. (The same enumeration previously needed bounding in commit d1170d19b, "Bound ho_var term enumeration to fix MBQI timeout regression".)

Confirmed by empirical bisection — rebuild + re-run each state:

state 3rd (check-sat)
both lines reverted (parent) sat (fast)
keep only m_is_auf = false, drop the two productions sat (fast)
keep only the two productions, revert m_is_auf timeout
HEAD 6fd303c4b (both) timeout

The two add_production(true/false) lines are the sole cause; the m_is_auf change is benign for this benchmark.

Fix

Remove just the two auxiliary Boolean productions; the commit's intended m_is_auf = false change is preserved.

                 ast_mark visited;
-                tn.add_production(m.mk_true());
-                tn.add_production(m.mk_false());
                 for (enode *n : ctx->enodes()) {

Validation (rebuilt z3 + re-ran benchmark)

  • Rebuilt the checkout: ./configure && make -C build -j$(nproc).
  • Re-ran inputs/issues/iss-6174/small.smt2 with -T:20 three times → sat / sat / sat every time, no timeout (previously the 3rd query hung).
  • Isolated the 3rd query and ran with model_validate=truesat with a concrete model that passes validation. The formula is genuinely satisfiable (constraints reduce to x2 = true, ar[a] = true, x[r5] = ar).

Note on the oracle

After the fix z3 returns sat for the 3rd query — matching the nightly, and sound (model-validated) — rather than the older recorded oracle unknown. The unknown → sat difference is a benign precision improvement that predates this regression; the bug fixed here is the timeout. Per the snapshot-regression guardrails I did not modify the oracle (small.expected.out); whether to refresh it to sat is a separate human decision.

Opened as a draft for human review.

Generated by Fix a Z3 snapshot-regression divergence · 2.8K AIC · ⌖ 82 AIC · ⊞ 41.2K ·

…mt2)

Commit 6fd303c ("set the auf flag to false in all cases") additionally
added two Boolean productions to the ho_var instantiation-set enumeration
in smt_model_finder.cpp:

    tn.add_production(m.mk_true());
    tn.add_production(m.mk_false());

For benchmarks whose universally-quantified array variable has a deeply
nested array sort that unfolds to a Bool target sort (e.g. the forall over
r : (Array (Array Bool Bool) (Array Bool Bool)) in iss-6174/small.smt2),
these extra Bool leaves combine combinatorially with the select/store/eq
operators in term_enumeration. enum_terms then blows up and MBQI hangs:
the third check-sat, which previously returned quickly, now times out at
-T:20 (and even -T:60). The pre-existing max_count=20 bound limits how many
terms are inserted but not the enumerator's internal work.

Remove the two auxiliary Boolean productions. The commit's titular change
(setting m_is_auf = false in all cases) is benign for this benchmark and is
preserved; only the enumeration-blowup lines are dropped. After the fix the
benchmark returns "sat sat sat" with no timeout, and the third query's model
passes model_validate=true.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant