Design notes: LP/QP routing and performance engineering#70
Merged
925d1a6 to
fe1a65e
jkitchin added a commit that referenced this pull request Jun 3, 2026
Bumps feral from crates.io 0.9.0 to the latest main. The behavior-relevant change in the window is the inertia-guided MC64 scaling fallback (feral #65, PR #69) plus the issue-63 near-singular-KKT diagnosis work (PR #68); also picks up the #67 thin-large ordering fix (PR #70) and the #72 diagnostics-crate split (build-only).
Effect on the issue-#95 robustness set: the entire scrs8-* family (x6) and ch flip from Solved_To_Acceptable to Optimal, with no pounce-side change (feral#63 resolved from the feral side). 24/40 reach Optimal, 36/40 produce the correct answer.
Temporary git pin; revert to a crates.io version once feral cuts a release.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The NL format has no dedicated quadratic section, so a QP's quadratic terms land in the nonlinear expression tree and register as nonlinear in the header. Header zeros therefore mean LP only; QP detection requires an AST walk, and the convex/nonconvex split needs a numerical factorization rather than just the Hessian pattern. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Soften "out of scope" to "out of scope for now" and record the architectural choices that keep a future global QP solver additive: preserve NonconvexQp as a distinct routing class, reserve option space, make the future B&B shell branching-rule-agnostic, retain the classifier's Hessian factorization, and lean on cross-node factor reuse. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Add a "Presolve integration" section to the LP/QP routing note: the TNLP-wrapper integration seam (inherited), an IPM-aware reduction policy (Gondzio fill-in argument), the LP/QP reduction catalog, the postsolve/restoration stack as the missing piece, Ruiz equilibration, and an explicit build-vs-wrap call on PaPILO. Includes key references (Andersen & Andersen 1995, Gondzio 1997, Meszaros & Suhl 2003, Gould & Toint 2004, Achterberg et al. 2020, PaPILO 2023, Ruiz 2001) and ties presolve to the Phase 3 competitiveness claim. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Resolve the build-vs-wrap call: wrapping PaPILO (C++) would break POUNCE's pure-Rust guarantee, so extend pounce-presolve in-house, porting PaPILO's transaction-based reduction-stack ideas rather than its code, with rayon for the data-parallel routines (probing, dominated columns, sparsification). https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Reconcile the phasing section with the presolve plan: name Ruiz equilibration as a Phase 2 conditioning prerequisite, add Phase 3.5 (reduction catalog + transaction/postsolve stack, benchmark-driven, after Phase 3 so postsolve is debugged against a trusted solver), and add a presolve row to the cost summary with updated cumulatives. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
…lans - Remove simplex from the in-scope architecture everywhere (decision, crate layout, entry points, option values, outlook diagram) so it is consistently out of scope; IPM-LP covers the LP case. - Add ConvexQcqp ProblemClass routed to the SOCP/conic solver (convex QCQP is SOCP-representable), falling through to NLP until Phase 4, and state the conservative classifier fallback as a correctness guard. - Add verification plans for Phase 3 (Mehrotra/HSDE), Phase 3.5 (presolve, with primal+dual round-trip and per-reduction postsolve tests), and Phases 4-6 (conic); extend the Phase 1 classifier tests. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Phase 3 delivers algorithmic competitiveness (iteration count); Phase 3.5 presolve delivers wall-clock competitiveness on the full benchmark sets. Fix the stale "Phase 3 benchmark" cross-references in the presolve section and the simplex escape hatch accordingly. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
The NL format carries no parametric/warm-start signal, so auto always routes convex LP/QP to IPM; the active-set path is reachable only via explicit qp-active-set or the programmatic warm-start API. Note a future solver.options hint as the seam that would let auto route to pounce-qp. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Phase 2 must build the IPM over the Cone abstraction (only nonneg implemented) so Phases 4-6 are cone extensions, not a rewrite - otherwise the Phase 4 "cheap incremental win" claim is false. Aligns the phasing with the Add-section architecture. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Pull pounce-mu out of the "not needed" list. restoration/l1penalty/sensitivity are genuinely NLP-only, but every IPM has a barrier parameter; the convex IPM supplies its own Mehrotra sigma*mu centering (distinct from the NLP mu_strategy). Flag reuse-vs-reimplement as a Phase 2/3 open question. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Textbook HSDE assumes a linear objective; the QP path needs the quadratic-objective embedding (as in Clarabel; Goulart & Chen) that carries the P term inside the embedding. Name it so implementers don't assume LP-HSDE transfers verbatim. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
For an LP/QP, P, A, c, b do not depend on x, so the convex entry points extract them once at setup and cache them rather than re-evaluating eval_h/eval_jac_g per iteration like the NLP TNLP driver. Exploiting the constant-matrix structure is part of what justifies the specialized path. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Classifier unit tests should use small committed .nl fixtures (one per class) so they run in CI and a fresh clone, rather than depending on the gitignored Mittelmann/CUTEst caches that only exist after a local fetch/translate. The full benchmark sets stay for Phase 2-3.5 wall-clock validation. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Document the two orthogonal selection axes (solver_selection for problem class vs algorithm for NLP strategy) and that pounce-qp does double duty as both the qp-active-set dispatch target and the inner QP solver of the active-set SQP NLP algorithm. Cross-link the SQP design note. Drop the "in 2025" from the competitiveness heading. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Cover the methodology the routing note omits: the reproducibility-vs-SIMD fork (tiered determinism; bit-equivalence with Ipopt binds the NLP port, not greenfield pounce-convex), vectorization via pulp (stable, runtime dispatch, faer-proven), factorization-first parallelism with rayon and faer as reference/backend, profiling (samply, iai-callgrind), the solution-tolerance correctness invariant, and a two-tier CI gate (iai-callgrind instruction counts for PRs, wall-clock SGM nightly). Cross-link from the routing note's verification section. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Make tier 2 the decided default for pounce-convex (was a recommendation): 2a same-machine run-to-run identity is the firm, CI-asserted requirement; 2b cross-platform identity is aspirational and not allowed to block performance. Note Rust's lack of FMA auto-contraction makes tier 2 cheaper to hold than in C/Fortran. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Spell out the concrete reduction rules an implementer needs: fixed compile-time reduction order/chunk size, no adaptive parallel splits in reductions, all-or-nothing FMA per kernel, single accumulation scheme across the SIMD/scalar tail, and no fast-math reassociation. Clarifies that 2a depends on these and the reproducibility test catches violations. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
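The reduction rules above can be illustrated with a small sketch. It is not the pounce-convex kernel: `LANES` and `deterministic_sum` are hypothetical names, and the lane count of 4 is an arbitrary stand-in for a SIMD width. The point is only that a fixed chunk size, one accumulation scheme shared by the body and the tail, and a fixed combine order give bit-identical results from run to run.

```rust
/// Fixed, compile-time chunk width (hypothetical value standing in for a SIMD lane count).
const LANES: usize = 4;

/// Sum with a fixed reduction order: LANES independent accumulators filled in
/// index order, the scalar tail folded into the same accumulators, then a
/// fixed left-to-right combine. No data-dependent or thread-dependent splits.
pub fn deterministic_sum(xs: &[f64]) -> f64 {
    let mut acc = [0.0f64; LANES];
    let mut chunks = xs.chunks_exact(LANES);
    for chunk in &mut chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }
    // The tail uses the same per-lane accumulation scheme as the body.
    for (a, x) in acc.iter_mut().zip(chunks.remainder()) {
        *a += *x;
    }
    // Combine the lanes in one fixed order.
    acc.iter().fold(0.0, |s, a| s + *a)
}
```

A reproducibility test in this style would compare `to_bits()` of two runs on the same input rather than comparing with a tolerance.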
- Tier 2 title no longer overpromises cross-platform identity (matches the decision that 2b is aspirational). - Fix dangling cross-note reference in the profiling section. - Correct the SGM claim: the Mittelmann harness produces per-version reports but does not compute SGM yet; that work is to be added. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Implements Phase 1 of the LP/QP routing plan (dev-notes/lp-qp-routing.md):
the routing seam, with no behavior change.
- New pounce-cli/src/dispatch.rs:
- classify_problem: walks the parsed NlProblem's nonlinear Expr trees
to detect LP / ConvexQp / ConvexQcqp / NonconvexQp / Nlp, with the
conservative fallback-to-NLP guard. Convexity split uses a sparse
quadratic-form analysis plus a dependency-free Jacobi PSD test with
tolerance.
- SolverSelection option parsing and resolve_solver, which validates
forced selections against the detected class. auto resolves to NLP
for every class in Phase 1 (documented no-regression path).
- Wire into main.rs: register the solver_selection string option so it
is accepted/validated; classify and validate forced selections after
load; auto/nlp fall through to the existing solve unchanged.
- Tests: 19 unit tests (parsing, resolution, quadratic analysis, PSD,
end-to-end classification) + 4 hermetic CLI integration tests covering
the plan's "forced LP on NLP errors" spec and the no-regression paths.
https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
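A minimal sketch of the expression-tree walk behind LP/QP detection, assuming a toy `Expr` type. The enum and the `degree` function below are hypothetical and far smaller than the real `NlProblem` trees; they only show the conservative rule that anything not provably a polynomial of degree at most 2 is treated as general NLP.

```rust
/// Hypothetical miniature expression tree; the real parsed trees are richer.
pub enum Expr {
    Const(f64),
    Var(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Exp(Box<Expr>),
}

/// Polynomial degree of `e`, or None when the tree is not provably a
/// polynomial of degree <= 2 (the conservative fall-back-to-NLP case).
pub fn degree(e: &Expr) -> Option<u32> {
    match e {
        Expr::Const(_) => Some(0),
        Expr::Var(_) => Some(1),
        Expr::Add(a, b) => Some(degree(a)?.max(degree(b)?)),
        Expr::Mul(a, b) => {
            let d = degree(a)? + degree(b)?;
            if d <= 2 { Some(d) } else { None }
        }
        // Any transcendental node means "general NLP".
        Expr::Exp(_) => None,
    }
}
```

A purely syntactic degree like this would report `x0*x0 + (-1)*x0*x0` as quadratic. A classifier that must map such cancelling terms to LP has to accumulate the quadratic coefficients and inspect the resulting matrix, which this sketch does not attempt.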
…ement) Scaffolds the pounce-convex crate and implements a correct infeasible-start primal-dual interior-point method for convex QP in standard form (min ½xᵀPx+cᵀx s.t. Ax=b, Gx≤h); LP is the P=0 case. - Cone-generic per the plan: the iteration is built over a Cone trait (cones/mod.rs) with only the nonnegative orthant implemented (cones/nonneg.rs), so Phases 4-6 (SOCP/exp/pow/SDP) extend rather than rewrite the driver. - Augmented system solved through pounce_linsol::Factorization — the same factor-once/solve-many handle the NLP path uses (feral default, MA57 optional); no new linear-algebra dependency. - Symmetric quasi-definite KKT assembly with static regularization; convergence tested on unregularized residuals so the fixed point is the true QP solution. Validated against 7 QPs with analytically known optima (unconstrained, equality-, inequality-active/inactive, bound-constrained, coupled Hessian, and an LP) plus cone unit tests — all matching to 1e-6. Bare method only: Mehrotra predictor-corrector + HSDE (Phase 3), constant-pattern symbolic reuse, and CLI dispatch wiring are follow-ups. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Adds examples/iter_compare.rs running the convex-QP IPM on the same QPs the CLI exposes (quadratic, bounded-quadratic, eq-quadratic) so iteration counts line up against `pounce --problem <name>`. Finding: the *bare* Phase 2 path-follower (fixed sigma, no predictor-corrector) takes MORE iterations than the NLP path (2 vs 1, 10 vs 6), because the NLP-IPM already has Mehrotra/adaptive-mu while this bare QP method does not. This is the documented motivation for Phase 3: the 30-50% iteration win is IPM-QP *with Mehrotra*, not the bare method. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
Replaces the bare fixed-sigma path-follower with Mehrotra predictor-corrector: an affine-scaling predictor, adaptive centering sigma = (mu_aff/mu)^3, and a corrector carrying the second-order ds∘dz term. Predictor and corrector reuse one factorization per iteration. Separate primal/dual fraction-to-boundary step lengths. The cone-specific second-order term lives behind a new Cone::comp_residual_corrector so the driver stays cone-agnostic. Drops the now-unused fixed sigma option.
Adds crates/pounce-cli/tests/qp_vs_nlp_iterations.rs: solves the same bound-constrained convex QP through both the NLP filter-IPM and the pounce-convex QP IPM, asserts identical optima and that the QP path uses no more iterations. Result at n=50: QP 10 iters vs NLP 17 (~41% fewer), demonstrating the plan's 30-50% claim. (The win appears on inequality/bound-constrained QPs with a non-trivial central path; pure-equality or n=2 QPs solve in ~1 Newton step either way.)
https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
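For readers who want the centering rule spelled out, here is a sketch for the nonnegative orthant only. The function names and signatures are hypothetical, not the pounce-convex API; the sketch assumes strictly positive `s` and `z` and takes the affine directions `ds`, `dz` as given.

```rust
/// Largest alpha in (0, 1] with v + alpha*dv >= (1 - tau)*v componentwise,
/// i.e. the damped fraction-to-boundary step for a strictly positive v.
pub fn step_to_boundary(v: &[f64], dv: &[f64], tau: f64) -> f64 {
    v.iter().zip(dv).fold(1.0, |alpha: f64, (vi, dvi)| {
        if *dvi < 0.0 { alpha.min(-tau * vi / dvi) } else { alpha }
    })
}

/// Mehrotra centering parameter sigma = (mu_aff / mu)^3, where mu_aff is the
/// average complementarity after the affine-scaling predictor step, taken
/// with separate primal and dual step lengths.
pub fn mehrotra_sigma(s: &[f64], z: &[f64], ds: &[f64], dz: &[f64]) -> f64 {
    let m = s.len() as f64;
    let mu = s.iter().zip(z).map(|(a, b)| a * b).sum::<f64>() / m;
    let ap = step_to_boundary(s, ds, 1.0); // primal step to the boundary
    let ad = step_to_boundary(z, dz, 1.0); // dual step to the boundary
    let mu_aff = s
        .iter()
        .zip(z)
        .zip(ds.iter().zip(dz))
        .map(|((si, zi), (dsi, dzi))| (si + ap * dsi) * (zi + ad * dzi))
        .sum::<f64>()
        / m;
    (mu_aff / mu).powi(3)
}
```

When the predictor makes good progress, `mu_aff` is much smaller than `mu` and sigma is close to zero, so the corrector does little centering. A poor predictor pushes sigma toward 1.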
Scaling sweep (small dense → n=100k sparse) showed the IPM iteration count stays flat (9-10) across five orders of magnitude — the algorithm is healthy — but per-iteration cost grew super-linearly because the solver rebuilt the Factorization (symbolic analysis + AMD ordering) every iteration even though the KKT pattern never changes; only the (z,z) scaling diagonal does. Fix: factor the fixed KKT pattern once via a new KktStructure that records the scaling-diagonal positions, then each iteration updates only those O(m) values and calls refactor() (numeric-only, reusing the symbolic factor). At n=10000 this cut per-iteration time ~2.5x; the breakdown confirms the loop no longer re-pays symbolic analysis. Residual super-linear growth is now inside feral's numeric factor/solve (the shared pounce-linsol backbone), not the QP code. Adds examples/scaling.rs (sweep + per-iteration breakdown) and tests/scaling_iterations.rs (regression guard: iteration count stays flat across 50x size growth). All known-optima tests and the QP-vs-NLP comparison still pass. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
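The pattern-reuse idea can be sketched independently of any factorization backend. The struct below borrows the `KktStructure` name from the commit message, but its fields, the method name, and the sign convention (writing `-w` into the (z,z) block) are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the pattern cache: the KKT nonzero values in a
/// fixed order, plus the positions of the (z,z) scaling diagonal within them.
pub struct KktStructure {
    pub values: Vec<f64>,
    pub scaling_pos: Vec<usize>,
}

impl KktStructure {
    /// Overwrite only the O(m) scaling entries. The sparsity pattern, and so
    /// any symbolic analysis built on it, is left untouched.
    pub fn update_scaling(&mut self, w: &[f64]) {
        assert_eq!(w.len(), self.scaling_pos.len());
        for (&p, &wi) in self.scaling_pos.iter().zip(w) {
            // Assumed convention: the quasi-definite (z,z) block holds -w.
            self.values[p] = -wi;
        }
    }
}
```

After such an update, only a numeric refactorization is needed, which is the role the commit message assigns to `refactor()`.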
Completes the Phase 2 dispatch wiring: classified LP and convex-QP .nl inputs now actually route to the pounce-convex interior-point solver instead of falling through to the NLP path.
- New qp_extract module: NlProblem → pounce_convex::QpProblem standard form. Objective Hessian (via the classifier's analyze_quadratic) → P, linear obj → c (maximize negated), equality rows → A x = b, inequality/range rows and finite variable bounds → G x ≤ h (with the .nl infinity sentinel treated as unbounded). 4 unit tests solve extracted QPs/LP to known optima.
- resolve_solver: auto now routes Lp/ConvexQp → QpIpm (LP is P=0), everything else → Nlp; unit tests updated.
- main.rs: run_convex_qp solves the extracted QP with feral, reports the objective in the user's original sense (sign + dropped constant), and writes a .sol (primal x; constraint duals zero for now, since mapping QP (y,z) incl. the bound-row split back to per-constraint multipliers is a follow-up).
- Fixture convex_qp.nl + qp_dispatch_end_to_end.rs: auto routes it to pounce-convex, forced qp-ipm solves, nlp path still solves (no regression), and the .sol primal matches the (1,1) optimum.
https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
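The row mapping described for qp_extract can be sketched as follows. Everything here is hypothetical: the `Row` enum, `split_range_row`, and in particular the `INF_SENTINEL` value, since the threshold the .nl reader actually uses is not stated in this log.

```rust
/// Assumed sentinel: magnitudes at or above this are treated as infinite.
const INF_SENTINEL: f64 = 1e20;

/// Rows produced for one constraint lo <= a.x <= hi.
#[derive(Debug, PartialEq)]
pub enum Row {
    Eq { a: Vec<f64>, b: f64 },
    Le { g: Vec<f64>, h: f64 },
}

/// Split a range row into standard-form pieces: one equality when lo == hi,
/// otherwise up to two `g.x <= h` rows, skipping infinite sides.
pub fn split_range_row(a: &[f64], lo: f64, hi: f64) -> Vec<Row> {
    let has_lo = lo > -INF_SENTINEL;
    let has_hi = hi < INF_SENTINEL;
    if has_lo && has_hi && lo == hi {
        return vec![Row::Eq { a: a.to_vec(), b: lo }];
    }
    let mut rows = Vec::new();
    if has_hi {
        rows.push(Row::Le { g: a.to_vec(), h: hi });
    }
    if has_lo {
        // a.x >= lo is rewritten as (-a).x <= -lo.
        rows.push(Row::Le { g: a.iter().map(|v| -v).collect(), h: -lo });
    }
    rows
}
```

A two-sided range row yields two inequality rows, which is why mapping the QP multipliers back to one per-constraint dual needs the extra bookkeeping the commit message defers.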
Set up the loop-driven PR #70 hardening workflow: - dev-notes/pr70-hardening.md: the loop's state file — 9-item A–H checklist (routing classification first, as the highest-risk silent-wrong-answer path), per-item template, reusable oracle patterns, and the captured bootstrap baseline. - benchmarks/scripts/compare_pounce_clarabel.py: external validation harness that runs pounce live + Clarabel 0.11.1 on the netlib LP and Maros-Meszaros QP matrices and joins objectives by name (Item B input). Bootstrap baseline captured in the tracker: - cargo test --workspace: green, 1649 passed / 0 failed. - Clarabel comparison: LP 412/419 agree, QP 110/114 agree (both-solved, reldiff < 1e-4). Genuine objective disagreements to triage in Item B: QP YAO (197.70 vs 91.02) and LP capri (2.4%); the rest are near-zero artifacts or borderline tolerance. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e found Hardened classify_problem / hessian_is_psd against silent nonconvex→convex misrouting (the highest-risk path). Added to dispatch::tests (now 29/29): - psd_rejects_small_but_real_negative_curvature: a genuine −1e-3 eigenvalue reads indefinite, not rounded to PSD. - psd_threshold_is_psd_tol: pins the cutoff at ±PSD_TOL (−1e-10 → PSD, −1e-7 → indefinite). - classify_concave_minimize_is_nonconvex: minimize −x0² → NonconvexQp. - classify_qcqp_with_indefinite_constraint_falls_back_to_nlp: convex obj + indefinite quadratic constraint → Nlp (conservative QCQP guard; previously untested). - classify_cancelling_quadratic_objective_is_lp: x0²−x0² → Lp. Finding (informational, not a defect): the ±PSD_TOL band rounds toward convex (min_eig >= −1e-9), so the module doc's "never to the convex path" overstates the actual >= −tol behavior. This is the correct tradeoff — it admits semidefinite Hessians whose smallest eigenvalue computes as a tiny negative under roundoff — and the band is far below the solve error it could cause. Recommend only a doc-wording fix. Recorded in dev-notes/pr70-hardening.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
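A sketch of a dependency-free Jacobi PSD test in the spirit of the one hardened above. This is not the `hessian_is_psd` source: the dense `Vec<Vec<f64>>` layout, the sweep cap, the convergence threshold, and the function names are all assumptions. `PSD_TOL = 1e-9` is inferred from the `min_eig >= −1e-9` remark in the commit message.

```rust
/// Inferred from the commit message's `min_eig >= -1e-9` remark.
pub const PSD_TOL: f64 = 1e-9;

/// Smallest eigenvalue of a dense symmetric matrix by cyclic Jacobi rotations.
pub fn min_eigenvalue(mut a: Vec<Vec<f64>>) -> f64 {
    let n = a.len();
    for _sweep in 0..50 {
        // Sum of squared off-diagonal entries; zero means already diagonal.
        let mut off = 0.0;
        for p in 0..n {
            for q in p + 1..n {
                off += a[p][q] * a[p][q];
            }
        }
        if off < 1e-30 {
            break;
        }
        for p in 0..n {
            for q in p + 1..n {
                let apq = a[p][q];
                if apq == 0.0 {
                    continue;
                }
                // Rotation angle that zeroes a[p][q].
                let theta = (a[q][q] - a[p][p]) / (2.0 * apq);
                let t = theta.signum() / (theta.abs() + (theta * theta + 1.0).sqrt());
                let c = 1.0 / (t * t + 1.0).sqrt();
                let s = t * c;
                a[p][p] -= t * apq;
                a[q][q] += t * apq;
                a[p][q] = 0.0;
                a[q][p] = 0.0;
                for k in 0..n {
                    if k != p && k != q {
                        let (akp, akq) = (a[k][p], a[k][q]);
                        a[k][p] = c * akp - s * akq;
                        a[p][k] = a[k][p];
                        a[k][q] = s * akp + c * akq;
                        a[q][k] = a[k][q];
                    }
                }
            }
        }
    }
    (0..n).map(|i| a[i][i]).fold(f64::INFINITY, f64::min)
}

/// PSD within tolerance: tiny negative eigenvalues from roundoff are admitted.
pub fn is_psd(h: Vec<Vec<f64>>) -> bool {
    min_eigenvalue(h) >= -PSD_TOL
}
```

The Hessian of `x0*x1` is `[[0, 1], [1, 0]]` with eigenvalues ±1, so this test reads it as indefinite.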
Proved end-to-end that a forced solver_selection not matching the detected class errors at routing and never silently mis-solves to a wrong "optimal". New fixture nonconvex_qp.nl (min x0*x1 s.t. x0+x1=2, 0<=xi<=4): indefinite Hessian, classifies "nonconvex QP"; box bounds keep the NLP fallback clean. qp_dispatch_end_to_end.rs: - forced_qp_ipm_on_nonconvex_qp_errors: convex QP IPM forced on a nonconvex QP exits 2, names class+solver, and asserts the output does NOT contain "Optimal Solution Found". - forced_qp_active_set_on_nonconvex_qp_errors: same for active-set QP. - forced_lp_ipm_on_convex_qp_errors: LP IPM forced on a convex QP errors. - auto_routes_nonconvex_qp_to_nlp_safely: auto routes the nonconvex QP to pounce-nlp (not pounce-convex), solves, exit 0. dispatch_routing.rs: - forced_qp_solvers_on_nlp_error: qp-ipm and qp-active-set forced on a general NLP both exit 2 with a naming message. Full pounce-cli suite green. No defect found: the mismatch is raised before any solve, so no wrong objective is ever produced. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
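The routing contract these tests pin down can be sketched as a small pure function. The enum shapes and the exact compatibility table are assumptions (for example, that a forced qp-ipm is accepted on an LP because LP is the P=0 case); only the tested behaviors come from the commit messages: auto sends Lp/ConvexQp to the QP IPM and everything else to NLP, and a forced mismatch errors before any solve.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProblemClass { Lp, ConvexQp, ConvexQcqp, NonconvexQp, Nlp }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SolverSelection { Auto, Nlp, LpIpm, QpIpm, QpActiveSet }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target { Nlp, QpIpm, QpActiveSet }

/// Hypothetical resolver: returns the dispatch target, or an error naming
/// the class and the forced solver when they do not match.
pub fn resolve_solver(sel: SolverSelection, class: ProblemClass) -> Result<Target, String> {
    let is_lp = class == ProblemClass::Lp;
    let is_convex_lpqp = matches!(class, ProblemClass::Lp | ProblemClass::ConvexQp);
    match sel {
        SolverSelection::Auto => Ok(if is_convex_lpqp { Target::QpIpm } else { Target::Nlp }),
        SolverSelection::Nlp => Ok(Target::Nlp),
        SolverSelection::LpIpm if is_lp => Ok(Target::QpIpm),
        SolverSelection::QpIpm if is_convex_lpqp => Ok(Target::QpIpm),
        SolverSelection::QpActiveSet if is_convex_lpqp => Ok(Target::QpActiveSet),
        // Any remaining forced selection does not match the detected class.
        forced => Err(format!("solver {:?} cannot solve problem class {:?}", forced, class)),
    }
}
```

Because the mismatch is an `Err` at resolution time, no solver runs and no objective is produced, which is the property the end-to-end tests assert.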
…swer bug
Add a strict objective-agreement gate to the Clarabel comparison harness and
use it to validate netlib LP + Maros-Meszaros QP objectives:
* --check exit nonzero on a *genuine* disagreement: both solvers
certified-solved (pounce SolveSucceeded AND clarabel Solved;
AlmostSolved/Acceptable excluded) yet objectives differ beyond
numpy-isclose |a-b| > atol + rtol*max(|a|,|b|), rtol=atol=1e-3.
* --from-json re-evaluate the committed clarabel_compare_{lp,qp}.json without
re-running both solvers (regression gate / CI).
Across LP (467) + QP (138) the gate flags exactly ONE hard-fail: capri.
HIGH-SEVERITY, MERGE-BLOCKER — capri silent wrong answer in pounce-convex
LP IPM. On the identical generated .nl: nlp -> 2690.0129 (correct: matches
Clarabel, the netlib optimum, and the prior stored value); lp-ipm -> 2625.0118
(wrong by 2.4%, reported SolveSucceeded). Same .nl on both paths, so it is the
convex IPM, not conversion. Hit by DEFAULT routing: `pounce capri.nl` with no
flags routes LP -> convex IPM -> "Optimal Solution Found obj=2625.01", a
confident wrong optimum with no opt-in. --check now guards against it.
Other disagreements triaged benign (YAO: clarabel only AlmostSolved, pounce
matches the published optimum; near-zero optima agree under abs tol; a few LPs
differ only at ~1e-3 convergence slack).
Also de-staled the local benchmarks/lp/pounce.json (gitignored build artifact)
from live results: adlittle 6812.5 -> 225494.96, stocfor1 -13875 -> -41131.98.
Findings recorded in dev-notes/pr70-hardening.md (item B).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
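The disagreement rule quoted above is small enough to state directly. The harness itself is a Python script; the Rust function below is a hypothetical restatement of the formula only, and it omits the status filter that restricts the check to runs both solvers certified as solved.

```rust
/// True when two objectives differ beyond the mixed tolerance
/// |a - b| > atol + rtol * max(|a|, |b|) quoted in the commit message.
pub fn objectives_disagree(a: f64, b: f64, rtol: f64, atol: f64) -> bool {
    (a - b).abs() > atol + rtol * a.abs().max(b.abs())
}
```

With rtol = atol = 1e-3, the capri pair (2625.0118 vs 2690.0129) differs by about 65 against an allowance of about 2.7, so it is flagged. Two near-zero optima of opposite sign pass under the absolute term.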
…Optimal
Add the missing limit-status and degenerate-input honesty tests; pre-existing
coverage already handled infeasible/unbounded and most edge inputs.
Convex IPM (crates/pounce-convex/tests/infeasibility.rs, 5 -> 8 tests):
* iteration_limit_reported_not_optimal — max_iter=1 on a well-posed box QP
reports IterationLimit, never a premature Optimal (the honest counterpart
of the item-B capri violation).
* fixed_variable_equal_bounds_optimal — a variable pinned by lb==ub solves
Optimal at the fixed value, no spurious infeasible/numerical-failure.
* unconstrained_qp_optimal — a fully unconstrained QP solves to its
stationary point and reports Optimal.
Global B&B (crates/pounce-global/tests/global.rs, 22 -> 24 tests):
* node_limit_reports_status_and_valid_bracket — max_nodes=1 reports NodeLimit
(never Optimal) with a valid lower<=upper bracket and a non-zero gap.
* time_limit_reports_status_and_valid_bracket — max_cpu_time=0 reports
TimeLimit (never Optimal) with a valid bracket.
All green; no new defects. Findings recorded in dev-notes/pr70-hardening.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ar-boundary NumericalFailure logged
Add first program-level coverage of the least-tested cones:
- sdp_cone.rs: 3 end-to-end SDPs via solve_socp_ipm + ConeSpec::Psd(2) (min-diagonal t=1, max-eigenvalue λmax=3, infeasible-honesty).
- exp_cone_vs_nlp.rs: first ConeSpec::Power coverage (geometric mean), n=16 entropy, and a near-boundary GP swept over u.
Finding (medium robustness gap, not a wrong answer): the non-symmetric/PSD drivers return NumericalFailure near the cone boundary on otherwise solvable/infeasible programs (exp GP at u=3; PSD infeasibility cert). Safety property holds everywhere (never a false Optimal), which the tests assert; objectives are checked wherever the driver converges.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Near the cone boundary (s∘z → 0) the NT scaling and KKT factorization in the symmetric HSDE driver (`hsde.rs`, SOC/orthant/PSD) can break down a hair short of `tol`. When that happens the current iterate is often *already* essentially optimal — the unregularized KKT residuals are tiny — yet the driver reported a spurious `NumericalFailure`. The non-symmetric driver (`hsde_nonsym.rs`, exp/power cones) already guarded its factorization-failure sites with an Ipopt-style "acceptable level" tier (`res < 1e3·tol`); the symmetric driver did not, so the two were inconsistent — the symmetric one discarded usable SOC/orthant iterates the non-symmetric one would have kept. Port the same tier into the symmetric driver's four factorization/solve failure sites. On the 132-instance CBLIB conic corpus this recovers 12 of 34 `NumericalFailure` instances (all SOC/orthant, byte-identical objectives) — corpus goes 71→83 pass, 34→22 fail. The remaining 22 are genuine (9 exp-cone gap-laggards, slay06h/06m divergence, expdesign_D 0-iter). Shared-path safety: `hsde.rs` also backs the global B&B relaxation LPs. Re-ran the 104-model GLOBALLib proven-optimum suite — bit-identical to baseline (59 OK / 45 TIMEOUT / 0 WRONG, zero per-instance status, objective, or node-count changes). 185 convex unit tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…erial==parallel
Add the core spatial-B&B soundness checks (global.rs, 24 -> 27 tests):
- certified_lower_bound_never_exceeds_true_global: lb <= f* at a sweep of
node caps over 5 known-optima nonconvex problems (quartic, bilinear,
six-hump camel, xy>=4, trilinear). Stronger than the prior lb<=incumbent
bracket: an invalid relaxation could pass that yet exceed the truth and
fathom the optimal box.
- each_relaxation_yields_valid_global_lower_bound: re-enables one of
{alphabb, rlt, multilinear, obbt, sandwich} at a time and re-checks
lb<=f* under partial search, isolating each generator's validity.
- parallel_matches_serial_constrained: 4-thread node pool vs serial on a
constrained nonconvex program; same optimum, constraint honored.
No defects: every certified lower bound stayed a valid global bound across
all problems, caps, and per-relaxation configs; serial == parallel.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Existing presolve tests assert primal+dual recovery one reduction at a time. Add the missing case: a single heavily-reduced QP firing four distinct reductions at once. heavily_reduced_mixed_reductions_recovers_primal_and_dual (presolve_roundtrip.rs, 6 -> 7 tests): one 6-var/2-eq/1-ineq QP that simultaneously triggers a fixed variable, a free-column singleton (substituted out), a dominated column (fixed to a bound), and a binding inequality, collapsing to a <=3-var core (checked via stats()). Verifies full recovery vs a direct no-presolve solve: all six primal x, the objective, and the complete dual (equality y, inequality z, bound z_lb/z_ub) to 1e-5. New assert_original_kkt helper re-checks the recovered (x,y,z,z_lb,z_ub) against the ORIGINAL KKT system, so a mis-recovered dual on any substituted variable would show as a nonzero stationarity residual (complementarity guarded to finite bounds). No defects: postsolve reconstructs the full primal and dual exactly. Suite green: roundtrip 7, reductions 26, forcing 6, bound_tightening 4, conic 2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Item G has three concerns; verified each, added coverage for the one real gap.
(1) minimize() auto-routing and (2) JAX differentiable-QP gradients vs finite differences were already well-covered: test_minimize_autoroute.py (8 tests: convex QP/LP routing, NLP stays put, forced mismatch raises, finite-diff routing) and test_qp_jax.py / test_qp_sensitivity.py (reverse-mode gradients vs FD for c, b, h, P, G, A). 38 G-relevant pytest cases pass.
(3) --json-output schema uniformity across solver paths was the real gap. The JSON report was tested only on the NLP path (json_report.rs) plus the convex QP-IPM path (qp_dispatch_end_to_end.rs); nothing asserted the schema was identical in shape across paths, and the LP-IPM path had no JSON coverage at all.
Add json_report.rs::json_schema_is_uniform_across_solver_paths (4 -> 5 tests): runs one invariant set over three distinct dispatch paths: NLP (parametric.nl), convex QP-IPM (convex_qp.nl, qp-ipm), convex LP-IPM (lp_afiro.nl, lp-ipm). For each it asserts: schema tag, solver.name == "pounce", non-empty result_id, non-empty + all-finite solution.x, finite objective == statistics.final_objective (rel 1e-9), and n_variables == x.len(). A divergent or placeholder report would now fail here.
New fixture crates/pounce-cli/tests/fixtures/lp_afiro.nl (netlib afiro, 32 vars, f* = -464.753), the LP-IPM path's first end-to-end JSON fixture.
No defects: all three paths emit the identical schema. json_report green (5 tests); 38 G-relevant pytest cases pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, 2 pre-existing defects fixed Item H (final hardening item) — build/clippy/full-suite hygiene. - cargo test --workspace: 1675 passed, 0 failed (with these edits in place; identical to pre-edit run, so the changes are behavior-preserving). - pytest python/tests: 286 passed, 0 failed. - Zero rustc warnings. - Made the PR70-new production libs (pounce-convex, pounce-global) clean under clippy::all: 13 behavior-preserving fixes (needless_borrow, identity_op, manual loops -> iterator zips, neg_cmp NaN-safety guarded with targeted allows + comments, large_enum_variant/collapsible_match documented allows). Two pre-existing defects found and fixed: - MEDIUM (build hygiene): stale _pounce.abi3.so made 7 test_global.py cases fail with a max_cpu_time TypeError; the Rust binding was correct. Fixed by rebuilding; recorded as a CI build-order note. - LOW (over-tight test): test_qp_factorization_build_once_solve_many asserted atol=1e-6 on two independent IPM solves of a bound-active QP whose optimum is a vertex; the IPM only approaches the boundary asymptotically, so the two runs legitimately differ ~1e-5. Loosened to 1e-4 with an explanatory comment. Proven pre-existing by reproducing on clean HEAD. Out of scope (documented in the H Findings): ~600 pre-existing unwrap/expect policy warnings and shared-crate clippy::all warnings are not addressed here; literal workspace-zero-warnings needs a separate cleanup pass. A-H now all complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…answer)
The capri netlib LP returned a confident wrong "optimal" (2625.0118 vs the
correct 2690.0129) through the convex LP IPM and default routing — a HIGH /
merge-blocking silent wrong answer.
Root cause is in presolve postsolve, not the IPM. capri's presolve emits a
FreeColSingleton whose substitution formula
x_col = (b_r - Σ_{j≠col} a_j x_j) / a_col
reads a variable that a *separate* FixedVar (singleton equality row) reduction
sets. The old postsolve restored primal values in a single reverse-LIFO pass,
so the free singleton was computed from its formula before its fixed-var
dependency had been restored — producing a point that violates the consumed
equality row and a wrong objective reported as optimal.
Fix: two-pass primal recovery in postsolve_once. Pass 1 (reverse) restores all
constant-valued reductions (FixedVar, FreeColumnFixed, ForcingRow,
DominatedColumn); pass 2 (forward) restores formula-based FreeColSingleton
values against the now-restored neighbours.
Verified: capri -> 2690.012914 on all paths (nlp, lp-ipm, default routing),
postsolved point fully feasible; adlittle/afiro/blend/sc50a/sc105 unchanged and
correct. Adds permanent regression test
free_singleton_depends_on_fixed_var_postsolve_order. Full pounce-convex suite
green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
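The ordering bug and its fix can be reproduced on a two-variable toy. The `Reduction` enum and `postsolve_primal` below are hypothetical, pared-down stand-ins for the real reduction stack; they keep only the two record kinds needed to show why constants must be restored before formulas are evaluated.

```rust
/// Hypothetical, pared-down reduction records; the real stack has more variants.
pub enum Reduction {
    /// Variable `col` was fixed at `value`.
    FixedVar { col: usize, value: f64 },
    /// x_col = (rhs - sum_j coef_j * x_j) / a_col over the other row entries.
    FreeColSingleton { col: usize, rhs: f64, a_col: f64, others: Vec<(usize, f64)> },
}

/// Two-pass primal recovery: constants first (reverse order), then formulas
/// (forward order) evaluated against the now-restored neighbours.
pub fn postsolve_primal(stack: &[Reduction], x: &mut [f64]) {
    // Pass 1: restore every constant-valued reduction.
    for r in stack.iter().rev() {
        if let Reduction::FixedVar { col, value } = r {
            x[*col] = *value;
        }
    }
    // Pass 2: evaluate substitution formulas; all fixed neighbours are set.
    for r in stack {
        if let Reduction::FreeColSingleton { col, rhs, a_col, others } = r {
            let s: f64 = others.iter().map(|&(j, a)| a * x[j]).sum();
            x[*col] = (rhs - s) / a_col;
        }
    }
}
```

In the toy case `3*x0 + 2*x1 = 10` with `x0` fixed at 2, a single reverse pass over the stack `[FixedVar, FreeColSingleton]` would evaluate the singleton first with a stale `x0 = 0` and produce `x1 = 5`. The two-pass order gives `x1 = 2`, which satisfies the row.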
…bility Two robustness fixes in the convex conic stack, both addressing the MEDIUM "NumericalFailure where a clean status exists" defect from dev-notes/pr70-hardening.md. Neither was a wrong-answer bug — the driver never reported a false Optimal — but both returned NumericalFailure where a correct solve/certificate was available. Exp cone (feasible-but-fails): the non-symmetric HSDE driver stalled near the cone boundary (u=3 in `min e^u+e^-u`), landing res at ~1.16e-5, just over the 1e3*tol acceptance band because the gap term is amplified by a small tau. Track the best (lowest-residual) iterate and, if the driver would otherwise fail, accept it as Optimal when that residual is within reduced accuracy (sqrt(tol)=1e-4) — the ECOS/Clarabel/SCS "solved to reduced accuracy" convention. Infeasible/unbounded runs never reach res<1e-4, and the clean convergence test at tol is unchanged. PSD cone (infeasible -> wrong status): detect_infeasibility validated the Farkas multiplier z componentwise (zi >= -tol), which is the dual-cone test for the orthant only. A PSD block's dual cone is smat(z) >= 0, so a legitimate certificate was rejected and the solve fell through to NumericalFailure. Add a self-dual `in_dual_cone(z, tol)` to the Cone trait (orthant, SOC, PSD, composite) and a cone-aware detect_infeasibility_cone; the symmetric drivers (ipm::run_ipm, hsde) now check z against the actual dual cone. The non-symmetric path keeps the componentwise default. The infeasible SDP now returns PrimalInfeasible (sdp_cone.rs assertion tightened to == PrimalInfeasible); near_boundary_gp_matches_nlp solves at every u including u=3. Full pounce-convex + exp_cone_vs_nlp suites green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
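Why the componentwise check is wrong for a PSD block can be seen on a 2x2 example. The functions below are illustrative only and are not the `in_dual_cone` implementation; in particular the block is passed as plain entries `[z11, z12, z22]`, ignoring whatever vectorization and scaling the real cone uses.

```rust
/// Componentwise test: the dual-cone check for the nonnegative orthant only.
pub fn in_orthant(z: &[f64], tol: f64) -> bool {
    z.iter().all(|zi| *zi >= -tol)
}

/// PSD test for a symmetric 2x2 block [[a, b], [b, c]] given as [a, b, c],
/// using the closed-form smallest eigenvalue.
pub fn in_psd2(z: [f64; 3], tol: f64) -> bool {
    let [a, b, c] = z;
    let mean = 0.5 * (a + c);
    let radius = (0.25 * (a - c) * (a - c) + b * b).sqrt();
    mean - radius >= -tol
}
```

The block `[[2, -1], [-1, 2]]` has eigenvalues 1 and 3, so it is a valid PSD certificate, yet its negative off-diagonal fails the componentwise test. The reverse also happens: `[[1, 2], [2, 1]]` is entrywise nonnegative but has eigenvalue -1.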
The full pytest suite can fail with cryptic `TypeError: ... unexpected keyword argument` errors when an in-place `python/pounce/_pounce*.so` (left by an earlier `maturin develop`) shadows the current Rust binding — the artifact is behind the source. CI is already immune (the python-test job builds a fresh wheel each run), so this is a local-dev hazard. - python/tests/conftest.py: a pytest_configure guard that, for an in-repo editable build, compares the extension's mtime against the newest Rust source under crates/ and fails fast with an actionable "run maturin develop" message instead of letting the suite die confusingly. Skipped for wheel/site-packages installs (no in-repo .so); bypass with POUNCE_SKIP_EXT_STALE_CHECK=1. - Makefile: `make python-test` (+ `python-ext`) rebuilds the extension in place, then runs pytest, so the documented local path rebuilds first. Verified: stale .so aborts collection with rebuild instructions; a fresh artifact collects all 281 tests; the bypass env var skips the guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…start
Introduce the bounded-variable revised simplex LP solver that backs OBBT in spatial branch-and-bound. Two pieces of substance:
Phase 6.2: basis engine behind the `BasisEngine` seam (basis.rs):
- `FaerBasis`: faer sparse LU of the base basis B0 + a product-form (eta) file of per-pivot rank-1 updates (B^-1 = E_t...E_1 B0^-1), refactoring every REFACTOR_INTERVAL pivots. faer owns the numerically-delicate sparse LU; the rank-1 update, which no general LU library provides, stays in-house.
- A probe solve after factorization catches numerically-singular-but-structurally-full bases that faer's sp_lu flags only structurally.
- `DenseBasis` retained under cfg(test) as the lockstep correctness oracle.
Phase 6.3: dual-simplex warm-start across bound changes (simplex.rs):
- `Simplex::solve_bounds(lb, ub)`: a parent->child box tightening leaves the optimal basis dual-feasible, so a bounded-variable dual simplex restores primal feasibility in a few pivots instead of a cold Phase I/II. Complements `solve_objective` (the within-node objective-flip lever).
- Reports Infeasible when the dual is unbounded (never a wrong "optimal") and falls back to a guaranteed-correct cold solve if the dual phase stalls.
Validation: 26 tests (lockstep faer-vs-dense oracle under randomized pivots, warm-vs-cold parity across bound tightenings, infeasibility detection, OBBT sweeps) plus the un-parked HiGHS ill-scaled ex4_1_2 regression. clippy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pipeline trims
Drive down per-node cost in the spatial branch-and-bound global optimizer
(Phases 2-4 of the perf plan), holding the 0-WRONG invariant — every lever is
perf/robustness only and defaults are behavior-preserving.
Phase 2 — schedule + budget OBBT (new opt-in GlobalOptions + CLI knobs):
- obbt_max_depth: gate the 2n-LP sweep to shallow nodes (default usize::MAX).
- obbt_interval: run OBBT every k-th node (default 1; approximate under the
parallel pool, sound either way).
- obbt_max_vars: budgeted partial sweep over the widest-box variables (2k
instead of 2n LPs; default usize::MAX). All three only throttle tightening,
never soundness.
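A minimal sketch of how the three knobs could combine, assuming the depth gate is inclusive and that nodes carry a running index. The `ObbtSchedule` type and its methods are hypothetical names, not the `GlobalOptions` fields themselves; only the knob names and defaults come from the text above.

```rust
/// Hypothetical container for the three opt-in OBBT throttles.
#[derive(Debug, Clone, Copy)]
pub struct ObbtSchedule {
    pub obbt_max_depth: usize,
    pub obbt_interval: usize,
    pub obbt_max_vars: usize,
}

impl Default for ObbtSchedule {
    /// Defaults as stated in the commit: no depth gate, every node, full sweep.
    fn default() -> Self {
        Self {
            obbt_max_depth: usize::MAX,
            obbt_interval: 1,
            obbt_max_vars: usize::MAX,
        }
    }
}

impl ObbtSchedule {
    /// Run OBBT only on shallow nodes and only on every k-th node.
    /// Assumption: `node_index` is a running node counter; an interval of 0
    /// is treated as 1 to avoid a division by zero.
    pub fn should_run(&self, depth: usize, node_index: usize) -> bool {
        depth <= self.obbt_max_depth && node_index % self.obbt_interval.max(1) == 0
    }

    /// Pick at most `obbt_max_vars` variables, widest box first, so a budgeted
    /// sweep solves 2k LPs instead of 2n.
    pub fn select_vars(&self, lb: &[f64], ub: &[f64]) -> Vec<usize> {
        let mut idx: Vec<usize> = (0..lb.len().min(ub.len())).collect();
        // Stable sort by descending width; ties keep their index order.
        idx.sort_by(|&i, &j| (ub[j] - lb[j]).total_cmp(&(ub[i] - lb[i])));
        idx.truncate(self.obbt_max_vars);
        idx
    }
}
```

Skipping a sweep or tightening fewer variables only leaves boxes wider than they could be, which is why none of the knobs affects soundness.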
Phase 3 — warm-start the IPM instead of cold-starting:
- Carry the parent relaxation primal/dual on the frontier node and seed the
child lower-bound solve via solve_qp_ipm_warm.
- Warm-start each sandwich re-solve from the previous round.
- Conservative guards throughout: dimensional compatibility check + cold
fallback on any non-Optimal warm result, so bound tightness is a strict
superset of today's.
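The guard pattern described for Phase 3 can be sketched as follows. The signature of `solve_qp_ipm_warm` is not shown in this commit, so the sketch takes the warm and cold solves as closures; `Status`, `Relaxation`, and `solve_child` are hypothetical names for illustration.

```rust
/// Hypothetical solve status; only `Optimal` is trusted from a warm start.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Optimal,
    Other,
}

/// Hypothetical relaxation result carried on a frontier node.
#[derive(Debug, Clone)]
pub struct Relaxation {
    pub status: Status,
    pub lower_bound: f64,
    pub x: Vec<f64>, // primal point
    pub y: Vec<f64>, // dual point
}

/// Try the warm-started solve when the parent point is dimensionally
/// compatible; fall back to the cold solve on any mismatch or any
/// non-Optimal warm result. The returned bool reports whether the warm
/// result was used.
pub fn solve_child<W, C>(
    n_vars: usize,
    n_cons: usize,
    parent: Option<&Relaxation>,
    warm: W,
    cold: C,
) -> (Relaxation, bool)
where
    W: FnOnce(&Relaxation) -> Relaxation,
    C: FnOnce() -> Relaxation,
{
    if let Some(p) = parent {
        if p.x.len() == n_vars && p.y.len() == n_cons {
            let r = warm(p);
            if r.status == Status::Optimal {
                return (r, true);
            }
        }
    }
    (cold(), false)
}
```

Because every path that does not end in an Optimal warm result reruns the cold solve, the child never ends up with a weaker bound than the cold-start pipeline would have produced.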
Phase 4 — cut the fixed small-n pipeline cost:
- Depth-aware local_solve_iters (halve every 4 levels, floored at 10).
- Adaptive sandwich short-circuit on negligible marginal gain.
- Reuse the final OBBT-pass relaxation as the node lower-bound relaxation when
the box is unchanged, saving a build_relaxation per node (bit-identical:
rebuilt per pass, peeled cutoff cut, multilinear-guarded).
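As a worked instance of the depth-aware rule (halve every 4 levels, floored at 10), here is a small sketch. The function name and the choice to apply the floor even when the base budget is below 10 are assumptions; the commit does not state the base value of local_solve_iters.

```rust
/// Iteration budget for the local solve at a given tree depth:
/// halve the base budget every 4 levels, never going below 10.
/// (Hypothetical helper; assumes the floor also applies when
/// `base_iters` itself is below 10.)
pub fn depth_aware_iters(base_iters: usize, depth: usize) -> usize {
    const LEVELS_PER_HALVING: usize = 4;
    const FLOOR: usize = 10;
    let halvings = u32::try_from(depth / LEVELS_PER_HALVING).unwrap_or(u32::MAX);
    // checked_shr returns None once the shift reaches the word size,
    // which corresponds to a budget of zero before the floor is applied.
    base_iters.checked_shr(halvings).unwrap_or(0).max(FLOOR)
}
```

With an illustrative base of 200, depths 0 to 3 keep 200 iterations, depth 4 drops to 100, depth 8 to 50, and from depth 20 onward the floor of 10 applies.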
Also wires the revised-simplex OBBT engine (ObbtLp::Simplex) behind the
off-by-default `simplex-obbt` feature. It is PARKED as unsound on ill-scaled
relaxation LPs (returns wrong certified optima — see ObbtLp::Simplex docs); with
the feature off the request transparently downgrades to the sound IPM sweep and
pounce-simplex is not linked. IPM remains the default OBBT engine.
Validation (per the loop's small-problem policy; no GLOBALLib timing sweep):
all default-feature Rust suites green across pounce-global / pounce-convex /
pounce-simplex / pounce-cli, with every certified optimum and exact node count
unchanged ⇒ 0 WRONG. Full 104-model OK-count sweep deferred to a manual run.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…wer bug)
A relaxation LP can carry a coefficient that has collapsed to numerical noise (e.g. a McCormick secant slope going to ~1e-44 at a degenerate box edge). Geometric-mean equilibration let such an entry drag the row/column geometric mean toward zero and inflate the scale by 1e10–1e20, which distorted the reduced-cost tolerances enough that the revised simplex declared a *wrong* vertex optimal.
Observed end to end on the quartic `x^4 - 3x^2`: on the OBBT child box [-2, ~0] the simplex returned `min x0 = -0.375` instead of the true `-1.846`, so OBBT tightened the box to [-0.375, 0], cut off the global minimizer x ~= -1.2247, and certified `-0.402` instead of `-2.25`. This made `simplex_obbt_matches_ipm_certified_optimum` fail.
Fix: `EQUILIBRATE_DROP`. Entries negligible relative to their row/column max are excluded from the geometric mean (computed via a max sub-pass then a min sub-pass over significant entries only). col_scale[0] drops from ~3.4e10 to O(1) and the simplex reaches the true optimum.
- Add tests/degenerate_mccormick_scaling.rs: the exact captured LP, cold and warm-sweep, guarding both against the wrong vertex.
- The 0-WRONG gate `simplex_obbt_matches_ipm_certified_optimum` now passes.
- Refresh the stale `ObbtLp::Simplex` "PARKED - not sound" docs: the engine is sound on all known cases; it stays feature-gated pending a wider GLOBALLib cross-check before becoming the default.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
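The effect of the drop threshold can be illustrated on a single row. This is a sketch under stated assumptions, not the pounce-simplex equilibration: the function name is hypothetical, the scale is taken as 1/sqrt(max * min) over the kept entries, and the threshold 1e-12 used in the note below is an illustrative value, since the commit does not state the value of `EQUILIBRATE_DROP`.

```rust
/// Geometric-mean scale factor for one row or column, 1 / sqrt(max * min),
/// where the min is taken only over entries that are significant relative
/// to the max. `drop_rel = 0.0` reproduces the unguarded geometric mean.
/// (Hypothetical helper for illustration.)
pub fn geo_mean_scale(entries: &[f64], drop_rel: f64) -> f64 {
    // Sub-pass 1: largest magnitude.
    let max_abs = entries.iter().fold(0.0_f64, |m, &a| m.max(a.abs()));
    if max_abs == 0.0 {
        return 1.0; // empty or all-zero: leave unscaled
    }
    // Sub-pass 2: smallest magnitude among significant nonzero entries.
    let cutoff = drop_rel * max_abs;
    let min_abs = entries
        .iter()
        .map(|a| a.abs())
        .filter(|&a| a > 0.0 && a >= cutoff)
        .fold(max_abs, f64::min);
    1.0 / (max_abs * min_abs).sqrt()
}
```

For the entries (2.0, 0.5, 1e-44), the unguarded scale is about 7e21, while a relative threshold of 1e-12 ignores the noise entry and gives a scale of exactly 1.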
Wiring the global solver's per-node pieces (OBBT sweep, simplex/IPM warm-starts, branching) needs a seconds-long edit->run loop, not the 25-min full sweep. Add:
- compare_obbt_engines.py: runs both OBBT LP engines (ipm + simplex) over a model set and asserts they certify identical optima, failing (nonzero exit) on any WRONG verdict or engine-vs-engine disagreement. The soundness gate before graduating simplex-obbt to default.
- tiers/micro.txt (12 models, ~2.5s both engines): the inner loop, curated to cover root-only (OBBT/relaxation/local-solve) and branching (tree/incumbent) across a range of n, every entry sub-second.
- tiers/fast.txt (34 models, every IPM<1s solve): broader fast regression.
- run_globallib.py --stems-file: run a tier file.
- make globallib-micro / globallib-fast: build the simplex-obbt binary and run the cross-check.
Micro tier currently: 12/12 correct, 0 wrong, 0 engine disagreements.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the single-pass min-ratio primal ratio test with a Harris (1973)
two-pass test layered on EXPAND (Gill, Murray, Saunders & Wright 1989):
* Pass 1 computes the largest step keeping every basic variable within a
feasibility tolerance of its bound; because each blocking numerator gets
the tolerance as slack, the step is strictly positive even at a
degenerate vertex.
* Pass 2 selects, among rows whose true breakpoint is within that step,
the largest pivot magnitude (numerical stability) rather than merely the
first to bind.
The feasibility tolerance grows by EXPAND_TAU each iteration up to FEAS_TOL
and resets at each refactor/recompute, guaranteeing forward progress and
breaking cycles. Bland's rule is demoted to a finite-termination backstop.
This hardens the LP foundation the spatial B&B OBBT inner loop rests on.
All 24 pounce-simplex oracle/unit tests pass (Klee-Minty, warm-start
sweeps, HiGHS-checked ill-scaled OBBT). Note: the simplex-OBBT path still
stalls on the degenerate GLOBALLib ex9_1_2 root LPs, so further Track-A
work (bound-flipping long step) remains before simplex graduates as the
default OBBT engine.
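The two passes can be sketched as below. This is a textbook-style Harris test under stated assumptions, not the pounce-simplex code: it assumes basic variable i moves as x[i] - t * d[i] for a step t >= 0, takes a fixed feasibility tolerance rather than the EXPAND schedule (no EXPAND_TAU growth, no minimum step), and all names are hypothetical.

```rust
/// Chosen leaving row and the step to its true breakpoint.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pivot {
    pub row: usize,
    pub step: f64,
}

/// Harris two-pass primal ratio test (illustrative sketch).
/// Returns None when no basic variable blocks the step (unbounded ray).
pub fn harris_ratio_test(
    x: &[f64],
    lo: &[f64],
    hi: &[f64],
    d: &[f64],
    feas_tol: f64,
    pivot_tol: f64,
) -> Option<Pivot> {
    // Distance from x[i] to the bound it moves toward, if that bound is finite
    // and the direction entry is large enough to count as a pivot candidate.
    let slack = |i: usize| -> Option<f64> {
        if d[i] > pivot_tol && lo[i].is_finite() {
            Some(x[i] - lo[i]) // decreasing toward the lower bound
        } else if d[i] < -pivot_tol && hi[i].is_finite() {
            Some(hi[i] - x[i]) // increasing toward the upper bound
        } else {
            None
        }
    };

    // Pass 1: largest step that keeps every basic variable within feas_tol
    // of its bound. Each numerator gets the tolerance as extra slack.
    let mut t_max = f64::INFINITY;
    for i in 0..x.len() {
        if let Some(s) = slack(i) {
            t_max = t_max.min((s + feas_tol) / d[i].abs());
        }
    }
    if !t_max.is_finite() {
        return None;
    }

    // Pass 2: among rows whose true breakpoint lies within t_max, take the
    // largest pivot magnitude instead of the first row to bind.
    let mut best: Option<Pivot> = None;
    let mut best_mag = 0.0_f64;
    for i in 0..x.len() {
        if let Some(s) = slack(i) {
            let t = s / d[i].abs();
            if t <= t_max && d[i].abs() > best_mag {
                best_mag = d[i].abs();
                best = Some(Pivot { row: i, step: t.max(0.0) });
            }
        }
    }
    best
}
```

On x = (1e-9, 5e-4) with zero lower bounds, d = (1e-3, 1.0), and a tolerance of 1e-6, a plain min-ratio test would pivot on row 0 with the tiny element 1e-3, whereas this test selects row 1 with pivot 1.0 and lets row 0 dip below its bound by less than the tolerance.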
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The spatial branch-and-bound global optimizer (pounce-global) is not ready to ship and was blocking the LP/QP/convex work from merging. Its full tree is preserved on the `feature/global` branch; here it is removed from `merge/pr70-reconcile` so the convex stack can land cleanly.
Removed:
- crates/pounce-global (entire crate) + its workspace membership
- pounce-cli global wiring: SolverChoice::Global / SolverSelection::Global, the global dispatch + option-registration paths, tree_debug_cli test, and the now-dead nl_constraint_bound sentinel helper (global-only)
- pounce-py global_opt bindings (mod + solve_global registration)
- python: global_opt.py, test_global.py, and the minimize_global / GlobalResult exports from the package surface
- the three global dispatch_routing tests (solver_selection=global now correctly returns OPTION_INVALID)
Workspace builds clean; pounce-cli dispatch suite green (27 tests).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pounce-simplex was the in-house LP-engine foundation, but its only consumer was pounce-global's OBBT bridge (simplex_bridge.rs), which left with the global strip. On this branch nothing depends on it: no crate declares it as a dependency and no source references `pounce_simplex` outside the crate. The convex LP/QP path ships on the pounce-convex HSDE IPM, which is SOTA-competitive for cold-start LP/QP; simplex's payoff (warm-started node LPs for B&B/OBBT, basic solutions, crossover) belongs with the global/warm-start track. The crate (and its OBBT-derived regression tests) is preserved on the `feature/global` branch — it is shared history below the split, so the global track keeps it intact. Also drops a dangling `pounce-global` workspace.dependencies entry left over from the earlier global strip. Workspace builds clean; no remaining references in code or manifests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the feral pin from bb74821 (the v0.4.0 baseline) to main HEAD 11fb4b9, carrying the issue #80 MC64/scaling work (Hungarian-heap reuse across columns, localized dense-column cost, ldlt_compress profiling). Decisive on the badly-scaled AC power-flow Jacobians: GAMS powerflow testset goes 19/28 -> 24/28 solved (+5: pf14,18,25,27,28), 0 regressions, bit-identical objectives on all 19 jointly-solved, and a 3.12x geomean speedup (1.56x-12.37x; the large pf10/20/22/24/26 solves see 8-12x). Full report in gams/nlpbench/BENCHMARK_REPORT_powerflow_feral-head.md. NOTE: this git pin blocks the crates.io publish of the pounce crates until feral cuts a release carrying these commits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Operational guide for the repo: the three release surfaces (PyPI pounce-solver, PyPI pyomo-pounce, the 16 manual crates.io crates), the hand-made GitHub Release step, and the crates.io User-Agent gotcha for checking published versions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Strategy and positioning material: the pounce vision/positioning note, the discopt co-designed-integration writeup, the education & research "introspectable, LLM-explainable solver" angle, a PyTorch-frontend issue sketch, and the LinkedIn v0.4.0 release-announcement draft. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…path
Completes the pounce-global strip: removes the orphaned interactive branch-and-bound tree debugger and its supporting trait surface, which were left behind when the spatial global solver was moved to feature/global.
- crates/pounce-cli/src/tree_debug.rs (~522 lines): the REPL for `pounce --solver global --debug`, a flag that no longer exists. It was pub-mod-exported but never instantiated (no caller in main/dispatch); pure dead code reachable only through the removed global solver.
- crates/pounce-common/src/debug.rs: the tree-only debug API (TreeCheckpoint, PruneReason, TreeDebugState, TreeDebugHook) consumed solely by that orphan. The shared iteration-loop DebugState/DebugHook/DebugAction surface is untouched. The DebugHook::arm() doc no longer cites the removed tree debugger.
All of this remains intact on feature/global. Workspace + all test targets compile clean; pounce-common (146) and pounce-cli suites green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rename the `compute_p_solves_each_a_column_against_K` test to `..._against_k_matrix`, clearing the last `non_snake_case` warning so `cargo build`/`cargo test` are warning-clean. Test still passes (38 in pounce-sensitivity lib). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jkitchin
pushed a commit
that referenced
this pull request
Jun 8, 2026
…crate
Per the decision to (1) defer simplex to the global-optimization work and (2) build the robust sparse LU inside feral rather than a pounce-lu crate:
- Decision 2 rewritten: simplex is NOT a convex-MI dependency. It arrives only with the global LP-relaxation nodes. No pounce-lu / pounce-simplex crates -- sparse LU lands in feral beside its LDLᵀ (one backend behind pounce-linsol for both symmetric IPM/QP and unsymmetric simplex systems); the simplex driver is a later module in pounce-convex. Reconciles the old 'simplex is back' reversal with decision 8 (not chasing MILP) and matches the PR-#70 strip of the built simplex.
- Renumbered phases: convex-MI = 0-2 (plumbing, B&B+MIQP, cuts+presolve, no simplex/LU), global = 3-5 (relax, spatial B&B [simplex/LU land here], MINLP-global), smoothed gradients = 6. Updated cost table, crate layout, both diagrams, crate skeletons, and all cross-references.
- Refreshed stale 'landing on claude/amazing-mayer-Xd0ag' references to 'merged in PR #70' now that the LP/QP branch is on main.
Summary
This PR adds two comprehensive design documents that specify the architecture for routing LP/QP problems to specialized solvers and the performance engineering methodology for achieving competitive wall-clock performance.
Changes
dev-notes/lp-qp-routing.md (extensively revised)
Core routing architecture:
- pounce-convex (IPM-LP/QP + conic), pounce-qp (active-set), and pounce-algorithm (NLP-IPM)
- ProblemClass enum to include ConvexQcqp (quadratically-constrained QP) alongside Lp, ConvexQp, NonconvexQp, and Nlp
Problem classification:
Solver dispatch and options:
- lp-simplex from solver_selection values; kept auto, nlp, lp-ipm, qp-ipm, qp-active-set
- .nl files)
Active-set SQP relationship:
- solver_selection (problem class) vs. algorithm (NLP strategy)
- pounce-qp serves both roles
Constant-matrix exploitation:
- P, A, c, b once at setup and cache them, not re-evaluate per iteration like the NLP path
Presolve integration (major new section):
Phasing updates:
- Cone abstraction from the start (only nonneg implemented initially)
Nonconvex QP / global optimization:
- qp-global target
Verification section:
- .nl fixtures for unit tests (hermetic, no gitignored cache dependency)
dev-notes/performance-engineering.md (https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
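To make the routing vocabulary above concrete, here is a small sketch of the problem classes, the accepted solver_selection strings, and one possible `auto` mapping onto the three backends. The variant and option names come from the summary; the `Backend` enum, `route_auto`, and in particular the choice of backend for each class are assumptions for illustration, since the design note itself is not reproduced here.

```rust
use std::str::FromStr;

/// Problem classes named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProblemClass {
    Lp,
    ConvexQp,
    ConvexQcqp,
    NonconvexQp,
    Nlp,
}

/// Accepted `solver_selection` values; `lp-simplex` is no longer one of them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SolverSelection {
    Auto,
    Nlp,
    LpIpm,
    QpIpm,
    QpActiveSet,
}

impl FromStr for SolverSelection {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(SolverSelection::Auto),
            "nlp" => Ok(SolverSelection::Nlp),
            "lp-ipm" => Ok(SolverSelection::LpIpm),
            "qp-ipm" => Ok(SolverSelection::QpIpm),
            "qp-active-set" => Ok(SolverSelection::QpActiveSet),
            other => Err(format!("invalid solver_selection: {other}")),
        }
    }
}

/// Hypothetical handle for the three crates named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Convex,    // pounce-convex (IPM-LP/QP + conic)
    ActiveSet, // pounce-qp (active-set)
    NlpIpm,    // pounce-algorithm (NLP-IPM)
}

/// One possible `auto` policy (an assumption, not the documented routing):
/// convex classes go to the convex IPM, everything else to the NLP path.
pub fn route_auto(class: ProblemClass) -> Backend {
    match class {
        ProblemClass::Lp | ProblemClass::ConvexQp | ProblemClass::ConvexQcqp => Backend::Convex,
        ProblemClass::NonconvexQp | ProblemClass::Nlp => Backend::NlpIpm,
    }
}
```

Keeping `NonconvexQp` as its own variant, even though this sketch routes it like `Nlp`, means a later global QP target can be added by changing one match arm.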