[snapshot-regression-fix] Fix MBQI ho_var term-enumeration timeout regression (iss-6174/small.smt2) #9954
Draft
levnach wants to merge 1 commit into
…mt2)

Commit 6fd303c ("set the auf flag to false in all cases") additionally added two Boolean productions to the ho_var instantiation-set enumeration in smt_model_finder.cpp:

    tn.add_production(m.mk_true());
    tn.add_production(m.mk_false());

For benchmarks whose universally-quantified array variable has a deeply nested array sort that unfolds to a Bool target sort (e.g. the forall over r : (Array (Array Bool Bool) (Array Bool Bool)) in iss-6174/small.smt2), these extra Bool leaves combine combinatorially with the select/store/eq operators in term_enumeration. enum_terms then blows up and MBQI hangs: the third check-sat, which previously returned quickly, now times out at -T:20 (and even -T:60). The pre-existing max_count=20 bound limits how many terms are inserted but not the enumerator's internal work.

Remove the two auxiliary Boolean productions. The commit's titular change (setting m_is_auf = false in all cases) is benign for this benchmark and is preserved; only the enumeration-blowup lines are dropped.

After the fix the benchmark returns "sat sat sat" with no timeout, and the third query's model passes model_validate=true.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes an MBQI model-finder timeout regression surfaced by the `snapshot-regression` corpus.

- `iss-6174/small.smt2` (in `Z3Prover/bench`, under `inputs/issues/iss-6174/`)
- `z3-4.17.0-x64-glibc-2.39`

Reported divergence

What current `master` actually does (the real regression)

The recorded oracle for the 3rd `(check-sat)` is `unknown` and the nightly under test produced `sat`, but current `z3` `master` (HEAD `6fd303c4b`) produces neither: it hangs or times out on the 3rd query, even at `-T:60`. A timeout is strictly worse than both `unknown` and `sat`, so this is a genuine, separate regression. `smt.mbqi=false` returns `unknown` instantly, localizing the hang to MBQI.

Root cause
HEAD commit `6fd303c4b` ("set the auf flag to false in all cases") made two changes to `src/smt/smt_model_finder.cpp`. Besides the titular `m_is_auf = false` change, it also added two Boolean productions to the `ho_var` instantiation-set enumeration in `ho_var::populate_inst_sets`: `tn.add_production(m.mk_true());` and `tn.add_production(m.mk_false());`.

In this benchmark the universally-quantified variable `r` has the deeply nested array sort `(Array (Array Bool Bool) (Array Bool Bool))`, whose `term_enumeration` target sort unfolds to `Bool`. The extra `true`/`false` Bool leaves combine combinatorially with the `select`/`store`/equality operators, so `term_enumeration::enum_terms` blows up and MBQI hangs. The pre-existing `max_count = 20` guard bounds how many enumerated terms are inserted, but not the enumerator's internal work. (The same enumeration previously needed bounding in commit `d1170d19b`, "Bound ho_var term enumeration to fix MBQI timeout regression".)

Confirmed by empirical bisection (rebuild and re-run each state):

| State | `(check-sat)` |
|---|---|
| … | `sat` (fast) |
| `m_is_auf = false`, drop the two productions | `sat` (fast) |
| `m_is_auf` … | … |
| `6fd303c4b` (both) | … |

The two `add_production(true/false)` lines are the sole cause; the `m_is_auf` change is benign for this benchmark.

Fix
Remove just the two auxiliary Boolean productions; the commit's intended `m_is_auf = false` change is preserved.

Validation (rebuilt z3 + re-ran benchmark)
- `./configure && make -C build -j$(nproc)`.
- `inputs/issues/iss-6174/small.smt2` with `-T:20` three times → `sat`/`sat`/`sat` every time, no timeout (previously the 3rd query hung).
- `model_validate=true` → `sat` with a concrete model that passes validation. The formula is genuinely satisfiable (constraints reduce to `x2 = true`, `ar[a] = true`, `x[r5] = ar`).

Note on the oracle
After the fix `z3` returns `sat` for the 3rd query, matching the nightly and sound (model-validated), rather than the older recorded oracle `unknown`. The `unknown → sat` difference is a benign precision improvement that predates this regression; the bug fixed here is the timeout. Per the snapshot-regression guardrails I did not modify the oracle (`small.expected.out`); whether to refresh it to `sat` is a separate human decision.

Opened as a draft for human review.