Design notes: LP/QP routing and performance engineering by jkitchin · Pull Request #70 · jkitchin/pounce

jkitchin · 2026-05-28T23:59:09Z

Summary

This PR adds two comprehensive design documents that specify the architecture for routing LP/QP problems to specialized solvers and the performance engineering methodology for achieving competitive wall-clock performance.

Changes

`dev-notes/lp-qp-routing.md` (extensively revised)

Core routing architecture:

Clarified the three-crate structure: pounce-convex (IPM-LP/QP + conic), pounce-qp (active-set), and pounce-algorithm (NLP-IPM)
Expanded ProblemClass enum to include ConvexQcqp (quadratically-constrained QP) alongside Lp, ConvexQp, NonconvexQp, and Nlp
Removed simplex from the Phase 2 scope (deferred as out-of-scope with escape hatch)

Problem classification:

Detailed the NL format parsing strategy: header-based LP fast-reject, then AST walk for QP/QCQP/NLP distinction
Specified PSD testing via numerical factorization (not pattern-based) with conservative fallback to NLP on inconclusive tests
Clarified that quadratic terms in the NL format appear in the nonlinear AST, not a dedicated section

Solver dispatch and options:

Removed lp-simplex from solver_selection values; kept auto, nlp, lp-ipm, qp-ipm, qp-active-set
Explained why active-set QP is opt-in only (no warm-start signal in .nl files)

Active-set SQP relationship:

Added new section clarifying the orthogonal axes: solver_selection (problem class) vs. algorithm (NLP strategy)
Positioned active-set SQP as an NLP algorithm, not a convex-QP solver; pounce-qp serves both roles

Constant-matrix exploitation:

New section explaining why convex solvers must extract P, A, c, b once at setup and cache them, not re-evaluate per iteration like the NLP path

Presolve integration (major new section):

Detailed the inherited TNLP-wrapper integration seam
Specified IPM-aware reduction policy (fill-in bounds per Gondzio/Mészáros)
Catalogued LP/QP reductions grounded in literature (Andersen & Andersen, Gould & Toint, Achterberg et al., PaPILO)
Committed to pure-Rust implementation (porting PaPILO's transaction-based postsolve ideas, not wrapping the C++ library)
Added rayon parallelism for dominated-column detection and constraint sparsification

Phasing updates:

Inserted Phase 3.5 (presolve) between Phase 3 and Phase 4, with 2–4 month estimate
Clarified Phase 2 must build the Cone abstraction from the start (only nonneg implemented initially)
Specified Phase 3 must use the quadratic-objective HSDE variant (Clarabel/Goulart–Chen style), not textbook LP-HSDE
Emphasized Phase 3.5 is required for benchmark competitiveness (2–10× factor), though not for correctness

Nonconvex QP / global optimization:

Reframed as deliberately reachable (not out-of-scope forever): architectural choices preserve the dispatch seam for a future qp-global target
Noted that B&B lower-bounding subproblems are the conic family already planned

Verification section:

Expanded with per-phase correctness checks, including round-trip primal+dual postsolve validation
Specified small committed .nl fixtures for unit tests (hermetic, no gitignored cache dependency)
Added cross-solver objective-value checks against Clarabel/MOSEK for conic phases

`dev-notes/performance-engineering.md` (

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Bumps feral from crates.io 0.9.0 to the latest main. The behavior-relevant change in the window is the inertia-guided MC64 scaling fallback (feral #65, PR #69) plus the issue-63 near-singular-KKT diagnosis work (PR #68); also picks up the #67 thin-large ordering fix (PR #70) and the #72 diagnostics-crate split (build-only). Effect on the issue-#95 robustness set: the entire scrs8-* family (x6) and ch flip from Solved_To_Acceptable to Optimal, with no pounce-side change (feral#63 resolved from the feral side). 24/40 reach Optimal, 36/40 produce the correct answer. Temporary git pin; revert to a crates.io version once feral cuts a release.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The NL format has no dedicated quadratic section, so a QP's quadratic terms land in the nonlinear expression tree and register as nonlinear in the header. Header zeros therefore mean LP only; QP detection requires an AST walk, and the convex/nonconvex split needs a numerical factorization rather than just the Hessian pattern. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
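A minimal sketch of the AST-walk idea, assuming a simplified, hypothetical `Expr` type (not pounce's actual expression tree): compute an upper bound on the polynomial degree of each nonlinear expression and use it to separate linear, quadratic, and general nonlinear shapes. The convex/nonconvex split would still need the numerical factorization described above.

```rust
/// Hypothetical, simplified expression tree (an assumption for illustration).
#[allow(dead_code)]
#[derive(Debug)]
enum Expr {
    Const(f64),
    Var(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Exp(Box<Expr>), // stands in for any non-polynomial operator
}

#[derive(Debug, PartialEq)]
enum Shape {
    Linear,
    Quadratic,
    General,
}

/// Upper bound on the polynomial degree of `e`, or `None` if `e`
/// contains a non-polynomial operator.
fn degree(e: &Expr) -> Option<u32> {
    match e {
        Expr::Const(_) => Some(0),
        Expr::Var(_) => Some(1),
        Expr::Add(a, b) => Some(degree(a)?.max(degree(b)?)),
        Expr::Mul(a, b) => Some(degree(a)? + degree(b)?),
        Expr::Exp(_) => None,
    }
}

/// Map the degree bound to a coarse shape; anything above degree 2
/// (or non-polynomial) is treated as general nonlinear.
fn shape(e: &Expr) -> Shape {
    match degree(e) {
        Some(0) | Some(1) => Shape::Linear,
        Some(2) => Shape::Quadratic,
        _ => Shape::General,
    }
}
```

Because the degree is only an upper bound, a cancelling expression would still read as quadratic here; detecting that case needs the coefficients to be collected, which this sketch does not do.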

Soften "out of scope" to "out of scope for now" and record the architectural choices that keep a future global QP solver additive: preserve NonconvexQp as a distinct routing class, reserve option space, make the future B&B shell branching-rule-agnostic, retain the classifier's Hessian factorization, and lean on cross-node factor reuse. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Add a "Presolve integration" section to the LP/QP routing note: the TNLP-wrapper integration seam (inherited), an IPM-aware reduction policy (Gondzio fill-in argument), the LP/QP reduction catalog, the postsolve/restoration stack as the missing piece, Ruiz equilibration, and an explicit build-vs-wrap call on PaPILO. Includes key references (Andersen & Andersen 1995, Gondzio 1997, Meszaros & Suhl 2003, Gould & Toint 2004, Achterberg et al. 2020, PaPILO 2023, Ruiz 2001) and ties presolve to the Phase 3 competitiveness claim. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Resolve the build-vs-wrap call: wrapping PaPILO (C++) would break POUNCE's pure-Rust guarantee, so extend pounce-presolve in-house, porting PaPILO's transaction-based reduction-stack ideas rather than its code, with rayon for the data-parallel routines (probing, dominated columns, sparsification). https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Reconcile the phasing section with the presolve plan: name Ruiz equilibration as a Phase 2 conditioning prerequisite, add Phase 3.5 (reduction catalog + transaction/postsolve stack, benchmark-driven, after Phase 3 so postsolve is debugged against a trusted solver), and add a presolve row to the cost summary with updated cumulatives. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

…lans

- Remove simplex from the in-scope architecture everywhere (decision, crate layout, entry points, option values, outlook diagram) so it is consistently out of scope; IPM-LP covers the LP case.
- Add ConvexQcqp ProblemClass routed to the SOCP/conic solver (convex QCQP is SOCP-representable), falling through to NLP until Phase 4, and state the conservative classifier fallback as a correctness guard.
- Add verification plans for Phase 3 (Mehrotra/HSDE), Phase 3.5 (presolve, with primal+dual round-trip and per-reduction postsolve tests), and Phases 4-6 (conic); extend the Phase 1 classifier tests.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Phase 3 delivers algorithmic competitiveness (iteration count); Phase 3.5 presolve delivers wall-clock competitiveness on the full benchmark sets. Fix the stale "Phase 3 benchmark" cross-references in the presolve section and the simplex escape hatch accordingly. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

The NL format carries no parametric/warm-start signal, so auto always routes convex LP/QP to IPM; the active-set path is reachable only via explicit qp-active-set or the programmatic warm-start API. Note a future solver.options hint as the seam that would let auto route to pounce-qp. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Phase 2 must build the IPM over the Cone abstraction (only nonneg implemented) so Phases 4-6 are cone extensions, not a rewrite - otherwise the Phase 4 "cheap incremental win" claim is false. Aligns the phasing with the Add-section architecture. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Pull pounce-mu out of the "not needed" list. restoration/l1penalty/sensitivity are genuinely NLP-only, but every IPM has a barrier parameter; the convex IPM supplies its own Mehrotra sigma*mu centering (distinct from the NLP mu_strategy). Flag reuse-vs-reimplement as a Phase 2/3 open question. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Textbook HSDE assumes a linear objective; the QP path needs the quadratic-objective embedding (as in Clarabel; Goulart & Chen) that carries the P term inside the embedding. Name it so implementers don't assume LP-HSDE transfers verbatim. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

For an LP/QP, P, A, c, b do not depend on x, so the convex entry points extract them once at setup and cache them rather than re-evaluating eval_h/eval_jac_g per iteration like the NLP TNLP driver. Exploiting the constant-matrix structure is part of what justifies the specialized path. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Classifier unit tests should use small committed .nl fixtures (one per class) so they run in CI and a fresh clone, rather than depending on the gitignored Mittelmann/CUTEst caches that only exist after a local fetch/translate. The full benchmark sets stay for Phase 2-3.5 wall-clock validation. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Document the two orthogonal selection axes (solver_selection for problem class vs algorithm for NLP strategy) and that pounce-qp does double duty as both the qp-active-set dispatch target and the inner QP solver of the active-set SQP NLP algorithm. Cross-link the SQP design note. Drop the "in 2025" from the competitiveness heading. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Cover the methodology the routing note omits: the reproducibility-vs-SIMD fork (tiered determinism; bit-equivalence with Ipopt binds the NLP port, not greenfield pounce-convex), vectorization via pulp (stable, runtime dispatch, faer-proven), factorization-first parallelism with rayon and faer as reference/backend, profiling (samply, iai-callgrind), the solution-tolerance correctness invariant, and a two-tier CI gate (iai-callgrind instruction counts for PRs, wall-clock SGM nightly). Cross-link from the routing note's verification section. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Make tier 2 the decided default for pounce-convex (was a recommendation): 2a same-machine run-to-run identity is the firm, CI-asserted requirement; 2b cross-platform identity is aspirational and not allowed to block performance. Note Rust's lack of FMA auto-contraction makes tier 2 cheaper to hold than in C/Fortran. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Spell out the concrete reduction rules an implementer needs: fixed compile-time reduction order/chunk size, no adaptive parallel splits in reductions, all-or-nothing FMA per kernel, single accumulation scheme across the SIMD/scalar tail, and no fast-math reassociation. Clarifies that 2a depends on these and the reproducibility test catches violations. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
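The reduction rules above can be illustrated with a small sketch. This is an assumption-laden example, not pounce code: `CHUNK` and `deterministic_sum` are hypothetical names, and the point is only that a compile-time chunk size, a fixed combine order, and one accumulation scheme for both full chunks and the tail give the same bits on every run on the same machine.

```rust
/// Hypothetical compile-time chunk size: the reduction tree never depends
/// on thread count, input values, or any runtime tuning.
const CHUNK: usize = 4;

/// Sum `xs` with a fixed reduction order.
fn deterministic_sum(xs: &[f64]) -> f64 {
    let mut total = 0.0;
    for chunk in xs.chunks(CHUNK) {
        // The same left-to-right scheme is used for full chunks and for the
        // short tail, so there is no separate "scalar tail" code path.
        let mut partial = 0.0;
        for &x in chunk {
            partial += x;
        }
        // Partials are combined strictly in index order.
        total += partial;
    }
    total
}
```

A parallel version would compute the per-chunk partials concurrently but still combine them in index order; an adaptive split (chunk boundaries chosen at run time) would change the rounding and break run-to-run identity.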

- Tier 2 title no longer overpromises cross-platform identity (matches the decision that 2b is aspirational).
- Fix dangling cross-note reference in the profiling section.
- Correct the SGM claim: the Mittelmann harness produces per-version reports but does not compute SGM yet; that work is to be added.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Implements Phase 1 of the LP/QP routing plan (dev-notes/lp-qp-routing.md): the routing seam, with no behavior change.

- New pounce-cli/src/dispatch.rs:
  - classify_problem: walks the parsed NlProblem's nonlinear Expr trees to detect LP / ConvexQp / ConvexQcqp / NonconvexQp / Nlp, with the conservative fallback-to-NLP guard. Convexity split uses a sparse quadratic-form analysis plus a dependency-free Jacobi PSD test with tolerance.
  - SolverSelection option parsing and resolve_solver, which validates forced selections against the detected class. auto resolves to NLP for every class in Phase 1 (documented no-regression path).
- Wire into main.rs: register the solver_selection string option so it is accepted/validated; classify and validate forced selections after load; auto/nlp fall through to the existing solve unchanged.
- Tests: 19 unit tests (parsing, resolution, quadratic analysis, PSD, end-to-end classification) + 4 hermetic CLI integration tests covering the plan's "forced LP on NLP errors" spec and the no-regression paths.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

…ement)

Scaffolds the pounce-convex crate and implements a correct infeasible-start primal-dual interior-point method for convex QP in standard form (min ½xᵀPx+cᵀx s.t. Ax=b, Gx≤h); LP is the P=0 case.

- Cone-generic per the plan: the iteration is built over a Cone trait (cones/mod.rs) with only the nonnegative orthant implemented (cones/nonneg.rs), so Phases 4-6 (SOCP/exp/pow/SDP) extend rather than rewrite the driver.
- Augmented system solved through pounce_linsol::Factorization, the same factor-once/solve-many handle the NLP path uses (feral default, MA57 optional); no new linear-algebra dependency.
- Symmetric quasi-definite KKT assembly with static regularization; convergence tested on unregularized residuals so the fixed point is the true QP solution.

Validated against 7 QPs with analytically known optima (unconstrained, equality-, inequality-active/inactive, bound-constrained, coupled Hessian, and an LP) plus cone unit tests, all matching to 1e-6.

Bare method only: Mehrotra predictor-corrector + HSDE (Phase 3), constant-pattern symbolic reuse, and CLI dispatch wiring are follow-ups.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Adds examples/iter_compare.rs running the convex-QP IPM on the same QPs the CLI exposes (quadratic, bounded-quadratic, eq-quadratic) so iteration counts line up against `pounce --problem <name>`.

Finding: the *bare* Phase 2 path-follower (fixed sigma, no predictor-corrector) takes MORE iterations than the NLP path (2 vs 1, 10 vs 6), because the NLP-IPM already has Mehrotra/adaptive-mu while this bare QP method does not. This is the documented motivation for Phase 3: the 30-50% iteration win is IPM-QP *with Mehrotra*, not the bare method.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Replaces the bare fixed-sigma path-follower with Mehrotra predictor-corrector: an affine-scaling predictor, adaptive centering sigma = (mu_aff/mu)^3, and a corrector carrying the second-order ds∘dz term. Predictor and corrector reuse one factorization per iteration. Separate primal/dual fraction-to-boundary step lengths. The cone-specific second-order term lives behind a new Cone::comp_residual_corrector so the driver stays cone-agnostic. Drops the now-unused fixed sigma option.

Adds crates/pounce-cli/tests/qp_vs_nlp_iterations.rs: solves the same bound-constrained convex QP through both the NLP filter-IPM and the pounce-convex QP IPM, asserts identical optima and that the QP path uses no more iterations. Result at n=50: QP 10 iters vs NLP 17 (~41% fewer), demonstrating the plan's 30-50% claim. (The win appears on inequality/bound-constrained QPs with a non-trivial central path; pure-equality or n=2 QPs solve in ~1 Newton step either way.)

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
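The centering rule and step-length computation named in this commit can be sketched for the nonnegative orthant as follows. This is a minimal illustration, not the pounce-convex driver: the function names, the damping factor `tau`, and the choice to reuse `tau` for the affine step are all assumptions.

```rust
/// Largest step in [0, 1] that keeps v + alpha*dv strictly positive,
/// damped by `tau` (for example 0.99). Only shrinking components limit it.
fn step_to_boundary(v: &[f64], dv: &[f64], tau: f64) -> f64 {
    let mut alpha = 1.0_f64;
    for (&vi, &dvi) in v.iter().zip(dv) {
        if dvi < 0.0 {
            alpha = alpha.min(-tau * vi / dvi);
        }
    }
    alpha
}

/// Average complementarity mu = s.z / m.
fn complementarity(s: &[f64], z: &[f64]) -> f64 {
    s.iter().zip(z).map(|(si, zi)| si * zi).sum::<f64>() / s.len() as f64
}

/// Mehrotra centering parameter sigma = (mu_aff / mu)^3, where mu_aff is the
/// complementarity reached by the affine-scaling (predictor) direction.
/// Primal (s) and dual (z) step lengths are computed separately.
fn mehrotra_sigma(s: &[f64], z: &[f64], ds_aff: &[f64], dz_aff: &[f64], tau: f64) -> f64 {
    let mu = complementarity(s, z);
    let alpha_p = step_to_boundary(s, ds_aff, tau);
    let alpha_d = step_to_boundary(z, dz_aff, tau);
    let s_aff: Vec<f64> = s.iter().zip(ds_aff).map(|(si, d)| si + alpha_p * d).collect();
    let z_aff: Vec<f64> = z.iter().zip(dz_aff).map(|(zi, d)| zi + alpha_d * d).collect();
    (complementarity(&s_aff, &z_aff) / mu).powi(3)
}
```

When the predictor makes good progress, `mu_aff` is much smaller than `mu` and sigma is close to 0 (little centering); when the predictor is blocked by the boundary, sigma approaches 1.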

Scaling sweep (small dense → n=100k sparse) showed the IPM iteration count stays flat (9-10) across five orders of magnitude — the algorithm is healthy — but per-iteration cost grew super-linearly because the solver rebuilt the Factorization (symbolic analysis + AMD ordering) every iteration even though the KKT pattern never changes; only the (z,z) scaling diagonal does. Fix: factor the fixed KKT pattern once via a new KktStructure that records the scaling-diagonal positions, then each iteration updates only those O(m) values and calls refactor() (numeric-only, reusing the symbolic factor). At n=10000 this cut per-iteration time ~2.5x; the breakdown confirms the loop no longer re-pays symbolic analysis. Residual super-linear growth is now inside feral's numeric factor/solve (the shared pounce-linsol backbone), not the QP code. Adds examples/scaling.rs (sweep + per-iteration breakdown) and tests/scaling_iterations.rs (regression guard: iteration count stays flat across 50x size growth). All known-optima tests and the QP-vs-NLP comparison still pass. https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ

Completes the Phase 2 dispatch wiring: classified LP and convex-QP .nl inputs now actually route to the pounce-convex interior-point solver instead of falling through to the NLP path.

- New qp_extract module: NlProblem → pounce_convex::QpProblem standard form. Objective Hessian (via the classifier's analyze_quadratic) → P, linear obj → c (maximize negated), equality rows → A x = b, inequality/range rows and finite variable bounds → G x ≤ h (with the .nl infinity sentinel treated as unbounded). 4 unit tests solve extracted QPs/LP to known optima.
- resolve_solver: auto now routes Lp/ConvexQp → QpIpm (LP is P=0), everything else → Nlp; unit tests updated.
- main.rs: run_convex_qp solves the extracted QP with feral, reports the objective in the user's original sense (sign + dropped constant), and writes a .sol (primal x; constraint duals zero for now; mapping QP (y,z) incl. the bound-row split back to per-constraint multipliers is a follow-up).
- Fixture convex_qp.nl + qp_dispatch_end_to_end.rs: auto routes it to pounce-convex, forced qp-ipm solves, nlp path still solves (no regression), and the .sol primal matches the (1,1) optimum.

https://claude.ai/code/session_01PZiGeQc8QrerZtBJe6d7rJ
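As an illustration of the row mapping described for qp_extract, the sketch below splits one two-sided row `lo <= a·x <= hi` into standard-form rows. It is a hypothetical helper, not the module's code: the `Row` type, `split_row`, and the sentinel value `NL_INF = 1e20` are assumptions (the actual sentinel used by the .nl reader may differ).

```rust
/// Assumed infinity sentinel; bounds at or beyond it are treated as absent.
const NL_INF: f64 = 1e20;

#[derive(Debug, PartialEq)]
enum Row {
    /// a·x = b
    Eq { a: Vec<f64>, b: f64 },
    /// g·x <= h
    Le { g: Vec<f64>, h: f64 },
}

/// Convert `lo <= a·x <= hi` into equality or one-sided inequality rows.
fn split_row(a: &[f64], lo: f64, hi: f64) -> Vec<Row> {
    let mut out = Vec::new();
    if lo == hi {
        out.push(Row::Eq { a: a.to_vec(), b: lo });
        return out;
    }
    if hi < NL_INF {
        out.push(Row::Le { g: a.to_vec(), h: hi });
    }
    if lo > -NL_INF {
        // lo <= a·x  is the same as  -a·x <= -lo
        out.push(Row::Le { g: a.iter().map(|v| -v).collect(), h: -lo });
    }
    out
}
```

A finite variable bound is the special case where `a` is a unit vector, which is why a range row or a two-sided bound contributes two rows to G. That split is what makes mapping the QP duals back to per-constraint multipliers a separate step.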

Set up the loop-driven PR #70 hardening workflow:

- dev-notes/pr70-hardening.md: the loop's state file, with a 9-item A–H checklist (routing classification first, as the highest-risk silent-wrong-answer path), per-item template, reusable oracle patterns, and the captured bootstrap baseline.
- benchmarks/scripts/compare_pounce_clarabel.py: external validation harness that runs pounce live + Clarabel 0.11.1 on the netlib LP and Maros-Meszaros QP matrices and joins objectives by name (Item B input).

Bootstrap baseline captured in the tracker:

- cargo test --workspace: green, 1649 passed / 0 failed.
- Clarabel comparison: LP 412/419 agree, QP 110/114 agree (both-solved, reldiff < 1e-4). Genuine objective disagreements to triage in Item B: QP YAO (197.70 vs 91.02) and LP capri (2.4%); the rest are near-zero artifacts or borderline tolerance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e found

Hardened classify_problem / hessian_is_psd against silent nonconvex→convex misrouting (the highest-risk path). Added to dispatch::tests (now 29/29):

- psd_rejects_small_but_real_negative_curvature: a genuine −1e-3 eigenvalue reads indefinite, not rounded to PSD.
- psd_threshold_is_psd_tol: pins the cutoff at ±PSD_TOL (−1e-10 → PSD, −1e-7 → indefinite).
- classify_concave_minimize_is_nonconvex: minimize −x0² → NonconvexQp.
- classify_qcqp_with_indefinite_constraint_falls_back_to_nlp: convex obj + indefinite quadratic constraint → Nlp (conservative QCQP guard; previously untested).
- classify_cancelling_quadratic_objective_is_lp: x0²−x0² → Lp.

Finding (informational, not a defect): the ±PSD_TOL band rounds toward convex (min_eig >= −1e-9), so the module doc's "never to the convex path" overstates the actual >= −tol behavior. This is the correct tradeoff (it admits semidefinite Hessians whose smallest eigenvalue computes as a tiny negative under roundoff), and the band is far below the solve error it could cause. Recommend only a doc-wording fix. Recorded in dev-notes/pr70-hardening.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
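The tolerance behavior these tests pin down can be reproduced with a small dependency-free sketch. This is not the project's hessian_is_psd: it is a textbook cyclic Jacobi eigenvalue iteration on a small dense symmetric matrix, with `PSD_TOL = 1e-9` taken from the `min_eig >= −1e-9` finding above and all function names assumed.

```rust
const PSD_TOL: f64 = 1e-9;

/// Smallest eigenvalue of a small dense symmetric matrix, computed by
/// cyclic Jacobi rotations (A <- J^T A J until the off-diagonal vanishes).
fn min_eigenvalue(mut a: Vec<Vec<f64>>) -> f64 {
    let n = a.len();
    for _sweep in 0..50 {
        let mut off = 0.0;
        for p in 0..n {
            for q in (p + 1)..n {
                off += a[p][q] * a[p][q];
            }
        }
        if off < 1e-30 {
            break; // effectively diagonal
        }
        for p in 0..n {
            for q in (p + 1)..n {
                if a[p][q] == 0.0 {
                    continue;
                }
                // Rotation angle that zeroes a[p][q].
                let theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
                let t = theta.signum() / (theta.abs() + (theta * theta + 1.0).sqrt());
                let c = 1.0 / (t * t + 1.0).sqrt();
                let s = t * c;
                for k in 0..n {
                    // Column update: A <- A J
                    let (akp, akq) = (a[k][p], a[k][q]);
                    a[k][p] = c * akp - s * akq;
                    a[k][q] = s * akp + c * akq;
                }
                for k in 0..n {
                    // Row update: A <- J^T A
                    let (apk, aqk) = (a[p][k], a[q][k]);
                    a[p][k] = c * apk - s * aqk;
                    a[q][k] = s * apk + c * aqk;
                }
            }
        }
    }
    (0..n).map(|i| a[i][i]).fold(f64::INFINITY, f64::min)
}

/// PSD up to tolerance: the band rounds toward convex, as the finding notes.
fn is_psd(h: Vec<Vec<f64>>) -> bool {
    min_eigenvalue(h) >= -PSD_TOL
}
```

For example, the Hessian of `x0*x1`, `[[0, 1], [1, 0]]`, has eigenvalues ±1 and is rejected, while `[[1, 1], [1, 1]]` (eigenvalues 0 and 2) is accepted even if its zero eigenvalue computes as a tiny negative number.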

Proved end-to-end that a forced solver_selection not matching the detected class errors at routing and never silently mis-solves to a wrong "optimal".

New fixture nonconvex_qp.nl (min x0*x1 s.t. x0+x1=2, 0<=xi<=4): indefinite Hessian, classifies "nonconvex QP"; box bounds keep the NLP fallback clean.

qp_dispatch_end_to_end.rs:

- forced_qp_ipm_on_nonconvex_qp_errors: convex QP IPM forced on a nonconvex QP exits 2, names class+solver, and asserts the output does NOT contain "Optimal Solution Found".
- forced_qp_active_set_on_nonconvex_qp_errors: same for active-set QP.
- forced_lp_ipm_on_convex_qp_errors: LP IPM forced on a convex QP errors.
- auto_routes_nonconvex_qp_to_nlp_safely: auto routes the nonconvex QP to pounce-nlp (not pounce-convex), solves, exit 0.

dispatch_routing.rs:

- forced_qp_solvers_on_nlp_error: qp-ipm and qp-active-set forced on a general NLP both exit 2 with a naming message.

Full pounce-cli suite green. No defect found: the mismatch is raised before any solve, so no wrong objective is ever produced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
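The routing contract these tests exercise can be summarized in a short sketch. The types and the `resolve` function below are hypothetical stand-ins for the real dispatch code; the auto rule (Lp/ConvexQp to the QP IPM, everything else to NLP) and the tested error cases come from the commit messages, while accepting an LP under a forced qp-ipm or qp-active-set is an assumption.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProblemClass { Lp, ConvexQp, ConvexQcqp, NonconvexQp, Nlp }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum SolverSelection { Auto, Nlp, LpIpm, QpIpm, QpActiveSet }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Target { Nlp, LpIpm, QpIpm, QpActiveSet }

/// Resolve a requested selection against the detected class. A forced
/// selection that cannot handle the class is an error raised before any
/// solve, so no wrong "optimal" can be produced.
fn resolve(sel: SolverSelection, class: ProblemClass) -> Result<Target, String> {
    use ProblemClass::{ConvexQp, Lp};
    match (sel, class) {
        (SolverSelection::Auto, Lp | ConvexQp) => Ok(Target::QpIpm),
        (SolverSelection::Auto, _) => Ok(Target::Nlp),
        (SolverSelection::Nlp, _) => Ok(Target::Nlp),
        (SolverSelection::LpIpm, Lp) => Ok(Target::LpIpm),
        // Assumption: an LP is accepted as the P = 0 case of a convex QP.
        (SolverSelection::QpIpm, Lp | ConvexQp) => Ok(Target::QpIpm),
        (SolverSelection::QpActiveSet, Lp | ConvexQp) => Ok(Target::QpActiveSet),
        (s, c) => Err(format!(
            "solver_selection {:?} cannot solve problem class {:?}",
            s, c
        )),
    }
}
```

Keeping the mismatch check in one pure function makes it cheap to unit-test every (selection, class) pair without running a solver.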

…swer bug

Add a strict objective-agreement gate to the Clarabel comparison harness and use it to validate netlib LP + Maros-Meszaros QP objectives:

* --check: exit nonzero on a *genuine* disagreement: both solvers certified-solved (pounce SolveSucceeded AND clarabel Solved; AlmostSolved/Acceptable excluded) yet objectives differ beyond numpy-isclose |a-b| > atol + rtol*max(|a|,|b|), rtol=atol=1e-3.
* --from-json: re-evaluate the committed clarabel_compare_{lp,qp}.json without re-running both solvers (regression gate / CI).

Across LP (467) + QP (138) the gate flags exactly ONE hard-fail: capri.

HIGH-SEVERITY, MERGE-BLOCKER: capri silent wrong answer in pounce-convex LP IPM. On the identical generated .nl: nlp -> 2690.0129 (correct: matches Clarabel, the netlib optimum, and the prior stored value); lp-ipm -> 2625.0118 (wrong by 2.4%, reported SolveSucceeded). Same .nl on both paths, so it is the convex IPM, not conversion. Hit by DEFAULT routing: `pounce capri.nl` with no flags routes LP -> convex IPM -> "Optimal Solution Found obj=2625.01", a confident wrong optimum with no opt-in. --check now guards against it.

Other disagreements triaged benign (YAO: clarabel only AlmostSolved, pounce matches the published optimum; near-zero optima agree under abs tol; a few LPs differ only at ~1e-3 convergence slack).

Also de-staled the local benchmarks/lp/pounce.json (gitignored build artifact) from live results: adlittle 6812.5 -> 225494.96, stocfor1 -13875 -> -41131.98. Findings recorded in dev-notes/pr70-hardening.md (item B).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
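The agreement test quoted for --check can be restated as a one-line predicate. The harness itself is a Python script; the Rust function below is only an illustrative restatement of the formula in the commit message, with `objectives_disagree` a hypothetical name.

```rust
/// True when two objective values differ beyond the mixed tolerance
/// |a - b| > atol + rtol * max(|a|, |b|).
fn objectives_disagree(a: f64, b: f64, rtol: f64, atol: f64) -> bool {
    (a - b).abs() > atol + rtol * a.abs().max(b.abs())
}
```

With rtol = atol = 1e-3, the capri pair (2690.0129 vs 2625.0118) differs by about 65 against an allowance of about 2.69, so it is flagged, while optima near zero are compared essentially by the absolute term.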

…Optimal

Add the missing limit-status and degenerate-input honesty tests; pre-existing coverage already handled infeasible/unbounded and most edge inputs.

Convex IPM (crates/pounce-convex/tests/infeasibility.rs, 5 -> 8 tests):

* iteration_limit_reported_not_optimal: max_iter=1 on a well-posed box QP reports IterationLimit, never a premature Optimal (the honest counterpart of the item-B capri violation).
* fixed_variable_equal_bounds_optimal: a variable pinned by lb==ub solves Optimal at the fixed value, no spurious infeasible/numerical-failure.
* unconstrained_qp_optimal: a fully unconstrained QP solves to its stationary point and reports Optimal.

Global B&B (crates/pounce-global/tests/global.rs, 22 -> 24 tests):

* node_limit_reports_status_and_valid_bracket: max_nodes=1 reports NodeLimit (never Optimal) with a valid lower<=upper bracket and a non-zero gap.
* time_limit_reports_status_and_valid_bracket: max_cpu_time=0 reports TimeLimit (never Optimal) with a valid bracket.

All green; no new defects. Findings recorded in dev-notes/pr70-hardening.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ar-boundary NumericalFailure logged

Add first program-level coverage of the least-tested cones:

- sdp_cone.rs: 3 end-to-end SDPs via solve_socp_ipm + ConeSpec::Psd(2) (min-diagonal t=1, max-eigenvalue λmax=3, infeasible-honesty).
- exp_cone_vs_nlp.rs: first ConeSpec::Power coverage (geometric mean), n=16 entropy, and a near-boundary GP swept over u.

Finding (medium robustness gap, not a wrong answer): the non-symmetric/PSD drivers return NumericalFailure near the cone boundary on otherwise solvable/infeasible programs (exp GP at u=3; PSD infeasibility cert). Safety property holds everywhere (never a false Optimal), which the tests assert; objectives are checked wherever the driver converges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Near the cone boundary (s∘z → 0) the NT scaling and KKT factorization in the symmetric HSDE driver (`hsde.rs`, SOC/orthant/PSD) can break down a hair short of `tol`. When that happens the current iterate is often *already* essentially optimal — the unregularized KKT residuals are tiny — yet the driver reported a spurious `NumericalFailure`. The non-symmetric driver (`hsde_nonsym.rs`, exp/power cones) already guarded its factorization-failure sites with an Ipopt-style "acceptable level" tier (`res < 1e3·tol`); the symmetric driver did not, so the two were inconsistent — the symmetric one discarded usable SOC/orthant iterates the non-symmetric one would have kept. Port the same tier into the symmetric driver's four factorization/solve failure sites. On the 132-instance CBLIB conic corpus this recovers 12 of 34 `NumericalFailure` instances (all SOC/orthant, byte-identical objectives) — corpus goes 71→83 pass, 34→22 fail. The remaining 22 are genuine (9 exp-cone gap-laggards, slay06h/06m divergence, expdesign_D 0-iter). Shared-path safety: `hsde.rs` also backs the global B&B relaxation LPs. Re-ran the 104-model GLOBALLib proven-optimum suite — bit-identical to baseline (59 OK / 45 TIMEOUT / 0 WRONG, zero per-instance status, objective, or node-count changes). 185 convex unit tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
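The "acceptable level" tier can be pictured as a small decision made at each factorization or solve failure site. The sketch below is a hypothetical illustration: `Outcome`, its variant names, and `on_factorization_failure` are not taken from `hsde.rs`, and only the `res < 1e3·tol` band comes from the text above.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Residual already meets the full tolerance.
    Converged,
    /// Residual is within the relaxed band; keep the current iterate.
    AcceptedAtAcceptableLevel,
    /// Residual is too large to trust the current iterate.
    NumericalFailure,
}

/// Decide what to report when the KKT factorization or solve breaks down,
/// given the unregularized KKT residual `res` of the current iterate.
fn on_factorization_failure(res: f64, tol: f64) -> Outcome {
    if res <= tol {
        Outcome::Converged
    } else if res < 1e3 * tol {
        Outcome::AcceptedAtAcceptableLevel
    } else {
        Outcome::NumericalFailure
    }
}
```

Routing all failure sites through one such function is one way to keep the symmetric and non-symmetric drivers from drifting apart again.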

…erial==parallel

Add the core spatial-B&B soundness checks (global.rs, 24 -> 27 tests):

- certified_lower_bound_never_exceeds_true_global: lb <= f* at a sweep of node caps over 5 known-optima nonconvex problems (quartic, bilinear, six-hump camel, xy>=4, trilinear). Stronger than the prior lb<=incumbent bracket: an invalid relaxation could pass that yet exceed the truth and fathom the optimal box.
- each_relaxation_yields_valid_global_lower_bound: re-enables one of {alphabb, rlt, multilinear, obbt, sandwich} at a time and re-checks lb<=f* under partial search, isolating each generator's validity.
- parallel_matches_serial_constrained: 4-thread node pool vs serial on a constrained nonconvex program; same optimum, constraint honored.

No defects: every certified lower bound stayed a valid global bound across all problems, caps, and per-relaxation configs; serial == parallel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Existing presolve tests assert primal+dual recovery one reduction at a time. Add the missing case: a single heavily-reduced QP firing four distinct reductions at once. heavily_reduced_mixed_reductions_recovers_primal_and_dual (presolve_roundtrip.rs, 6 -> 7 tests): one 6-var/2-eq/1-ineq QP that simultaneously triggers a fixed variable, a free-column singleton (substituted out), a dominated column (fixed to a bound), and a binding inequality, collapsing to a <=3-var core (checked via stats()). Verifies full recovery vs a direct no-presolve solve: all six primal x, the objective, and the complete dual (equality y, inequality z, bound z_lb/z_ub) to 1e-5. New assert_original_kkt helper re-checks the recovered (x,y,z,z_lb,z_ub) against the ORIGINAL KKT system, so a mis-recovered dual on any substituted variable would show as a nonzero stationarity residual (complementarity guarded to finite bounds). No defects: postsolve reconstructs the full primal and dual exactly. Suite green: roundtrip 7, reductions 26, forcing 6, bound_tightening 4, conic 2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Item G has three concerns; verified each, added coverage for the one real gap.

(1) minimize() auto-routing and (2) JAX differentiable-QP gradients vs finite differences were already well-covered: test_minimize_autoroute.py (8 tests: convex QP/LP routing, NLP stays put, forced mismatch raises, finite-diff routing) and test_qp_jax.py / test_qp_sensitivity.py (reverse-mode gradients vs FD for c, b, h, P, G, A). 38 G-relevant pytest cases pass.

(3) --json-output schema uniformity across solver paths was the real gap. The JSON report was tested only on the NLP path (json_report.rs) plus the convex QP-IPM path (qp_dispatch_end_to_end.rs); nothing asserted the schema was identical in shape across paths, and the LP-IPM path had no JSON coverage at all.

Add json_report.rs::json_schema_is_uniform_across_solver_paths (4 -> 5 tests): runs one invariant set over three distinct dispatch paths (NLP (parametric.nl), convex QP-IPM (convex_qp.nl, qp-ipm), convex LP-IPM (lp_afiro.nl, lp-ipm)), asserting for each: schema tag, solver.name == "pounce", non-empty result_id, non-empty + all-finite solution.x, finite objective == statistics.final_objective (rel 1e-9), and n_variables == x.len(). A divergent or placeholder report would now fail here.

New fixture crates/pounce-cli/tests/fixtures/lp_afiro.nl (netlib afiro, 32 vars, f* = -464.753), the LP-IPM path's first end-to-end JSON fixture.

No defects: all three paths emit the identical schema. json_report green (5 tests); 38 G-relevant pytest cases pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, 2 pre-existing defects fixed

Item H (final hardening item): build/clippy/full-suite hygiene.

- cargo test --workspace: 1675 passed, 0 failed (with these edits in place; identical to pre-edit run, so the changes are behavior-preserving).
- pytest python/tests: 286 passed, 0 failed.
- Zero rustc warnings.
- Made the PR70-new production libs (pounce-convex, pounce-global) clean under clippy::all: 13 behavior-preserving fixes (needless_borrow, identity_op, manual loops -> iterator zips, neg_cmp NaN-safety guarded with targeted allows + comments, large_enum_variant/collapsible_match documented allows).

Two pre-existing defects found and fixed:

- MEDIUM (build hygiene): stale _pounce.abi3.so made 7 test_global.py cases fail with a max_cpu_time TypeError; the Rust binding was correct. Fixed by rebuilding; recorded as a CI build-order note.
- LOW (over-tight test): test_qp_factorization_build_once_solve_many asserted atol=1e-6 on two independent IPM solves of a bound-active QP whose optimum is a vertex; the IPM only approaches the boundary asymptotically, so the two runs legitimately differ ~1e-5. Loosened to 1e-4 with an explanatory comment. Proven pre-existing by reproducing on clean HEAD.

Out of scope (documented in the H Findings): ~600 pre-existing unwrap/expect policy warnings and shared-crate clippy::all warnings are not addressed here; literal workspace-zero-warnings needs a separate cleanup pass.

A-H now all complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…answer)

The capri netlib LP returned a confident wrong "optimal" (2625.0118 vs the correct 2690.0129) through the convex LP IPM and default routing, a HIGH / merge-blocking silent wrong answer.

Root cause is in presolve postsolve, not the IPM. capri's presolve emits a FreeColSingleton whose substitution formula x_col = (b_r - Σ_{j≠col} a_j x_j) / a_col reads a variable that a *separate* FixedVar (singleton equality row) reduction sets. The old postsolve restored primal values in a single reverse-LIFO pass, so the free singleton was computed from its formula before its fixed-var dependency had been restored, producing a point that violates the consumed equality row and a wrong objective reported as optimal.

Fix: two-pass primal recovery in postsolve_once. Pass 1 (reverse) restores all constant-valued reductions (FixedVar, FreeColumnFixed, ForcingRow, DominatedColumn); pass 2 (forward) restores formula-based FreeColSingleton values against the now-restored neighbours.

Verified: capri -> 2690.012914 on all paths (nlp, lp-ipm, default routing), postsolved point fully feasible; adlittle/afiro/blend/sc50a/sc105 unchanged and correct. Adds permanent regression test free_singleton_depends_on_fixed_var_postsolve_order. Full pounce-convex suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
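The ordering hazard and the two-pass fix can be reproduced on a toy reduction stack. This is a simplified sketch under stated assumptions, not postsolve_once: the `Reduction` enum carries only two of the reduction kinds, and the field names and the `postsolve` signature are hypothetical.

```rust
#[derive(Debug, Clone)]
enum Reduction {
    /// x[col] was fixed to a constant.
    FixedVar { col: usize, value: f64 },
    /// x[col] = (rhs - sum_j a_j * x[j]) / a_col, substituted out of a row.
    FreeColSingleton { col: usize, rhs: f64, a_col: f64, others: Vec<(usize, f64)> },
}

/// Recover the full primal vector from the reduced solution and the stack.
fn postsolve(n: usize, reduced: &[(usize, f64)], stack: &[Reduction]) -> Vec<f64> {
    let mut x = vec![0.0; n];
    for &(col, v) in reduced {
        x[col] = v;
    }
    // Pass 1 (reverse): constant-valued reductions have no dependencies.
    for r in stack.iter().rev() {
        if let Reduction::FixedVar { col, value } = r {
            x[*col] = *value;
        }
    }
    // Pass 2 (forward): formulas now read fully restored neighbours.
    for r in stack {
        if let Reduction::FreeColSingleton { col, rhs, a_col, others } = r {
            let s: f64 = others.iter().map(|&(j, a)| a * x[j]).sum();
            x[*col] = (rhs - s) / a_col;
        }
    }
    x
}
```

For the row `x0 + 2*x1 = 10` with `x1` fixed to 3 by an earlier reduction, a single reverse pass would evaluate the singleton formula while `x1` is still 0 and return `x0 = 10`; the two-pass version returns `x0 = 4`, which satisfies the row.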

…bility

Two robustness fixes in the convex conic stack, both addressing the MEDIUM "NumericalFailure where a clean status exists" defect from dev-notes/pr70-hardening.md. Neither was a wrong-answer bug (the driver never reported a false Optimal), but both returned NumericalFailure where a correct solve/certificate was available.

Exp cone (feasible-but-fails): the non-symmetric HSDE driver stalled near the cone boundary (u=3 in `min e^u+e^-u`), landing res at ~1.16e-5, just over the 1e3*tol acceptance band because the gap term is amplified by a small tau. Track the best (lowest-residual) iterate and, if the driver would otherwise fail, accept it as Optimal when that residual is within reduced accuracy (sqrt(tol)=1e-4), following the ECOS/Clarabel/SCS "solved to reduced accuracy" convention. Infeasible/unbounded runs never reach res<1e-4, and the clean convergence test at tol is unchanged.

PSD cone (infeasible -> wrong status): detect_infeasibility validated the Farkas multiplier z componentwise (zi >= -tol), which is the dual-cone test for the orthant only. A PSD block's dual cone is smat(z) >= 0, so a legitimate certificate was rejected and the solve fell through to NumericalFailure. Add a self-dual `in_dual_cone(z, tol)` to the Cone trait (orthant, SOC, PSD, composite) and a cone-aware detect_infeasibility_cone; the symmetric drivers (ipm::run_ipm, hsde) now check z against the actual dual cone. The non-symmetric path keeps the componentwise default.

The infeasible SDP now returns PrimalInfeasible (sdp_cone.rs assertion tightened to == PrimalInfeasible); near_boundary_gp_matches_nlp solves at every u including u=3. Full pounce-convex + exp_cone_vs_nlp suites green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
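Why the componentwise test is wrong for a PSD block can be shown on a 2×2 example. The sketch below is illustrative only: the function names are hypothetical, and it assumes the common scaled-vectorization convention `z = (Z11, sqrt(2)*Z12, Z22)`, which may not match the layout used by the Cone trait.

```rust
use std::f64::consts::SQRT_2;

/// Componentwise test: the dual-cone test for the nonnegative orthant only.
fn in_dual_orthant(z: &[f64], tol: f64) -> bool {
    z.iter().all(|&zi| zi >= -tol)
}

/// Dual-cone test for a 2x2 PSD block (the PSD cone is self-dual):
/// rebuild Z = smat(z) and check its smallest eigenvalue.
fn in_dual_psd2(z: &[f64; 3], tol: f64) -> bool {
    let (a, b, c) = (z[0], z[1] / SQRT_2, z[2]);
    // Closed-form eigenvalues of [[a, b], [b, c]]: mean -/+ radius.
    let mean = 0.5 * (a + c);
    let radius = (0.25 * (a - c) * (a - c) + b * b).sqrt();
    mean - radius >= -tol
}
```

The two tests disagree in both directions: `Z = [[1, -0.5], [-0.5, 1]]` is PSD but has a negative entry, so a valid certificate fails the componentwise check, while `Z = [[1, 2], [2, 1]]` has all entries positive yet an eigenvalue of -1.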

The full pytest suite can fail with cryptic `TypeError: ... unexpected keyword argument` errors when an in-place `python/pounce/_pounce*.so` (left by an earlier `maturin develop`) shadows the current Rust binding, i.e. the artifact is behind the source. CI is already immune (the python-test job builds a fresh wheel each run), so this is a local-dev hazard.

- python/tests/conftest.py: a pytest_configure guard that, for an in-repo editable build, compares the extension's mtime against the newest Rust source under crates/ and fails fast with an actionable "run maturin develop" message instead of letting the suite die confusingly. Skipped for wheel/site-packages installs (no in-repo .so); bypass with POUNCE_SKIP_EXT_STALE_CHECK=1.
- Makefile: `make python-test` (+ `python-ext`) rebuilds the extension in place, then runs pytest, so the documented local path rebuilds first.

Verified: stale .so aborts collection with rebuild instructions; a fresh artifact collects all 281 tests; the bypass env var skips the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
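A guard of this kind might look roughly like the sketch below. It is not the repository's conftest.py: the helper names, the `*.rs` glob, the error wording, and the use of `pytest.UsageError` are assumptions, while the paths and the POUNCE_SKIP_EXT_STALE_CHECK variable come from the commit message.

```python
import os
from pathlib import Path


def newest_mtime(paths):
    """Newest modification time among `paths`, or None if there are none."""
    return max((p.stat().st_mtime for p in paths), default=None)


def stale_message(ext_mtime, src_mtime):
    """Return an error message if the extension predates the Rust source."""
    if ext_mtime is None or src_mtime is None:
        return None  # no in-repo .so (wheel install) or no sources found
    if ext_mtime >= src_mtime:
        return None
    return ("python/pounce/_pounce*.so is older than the Rust sources under "
            "crates/; run `maturin develop` (or `make python-test`).")


def check_extension(repo_root, env=os.environ):
    """Compare the in-place extension against the newest Rust source."""
    if env.get("POUNCE_SKIP_EXT_STALE_CHECK") == "1":
        return None
    ext = newest_mtime((repo_root / "python" / "pounce").glob("_pounce*.so"))
    src = newest_mtime((repo_root / "crates").rglob("*.rs"))
    return stale_message(ext, src)


def pytest_configure(config):
    """pytest hook: abort before collection if the build is stale."""
    import pytest

    # Assumes this file lives at <repo>/python/tests/conftest.py.
    msg = check_extension(Path(__file__).resolve().parents[2])
    if msg:
        raise pytest.UsageError(msg)
```

Keeping the mtime comparison in a pure function (`stale_message`) makes the guard itself testable without touching the filesystem.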

…start

Introduce the bounded-variable revised simplex LP solver that backs OBBT in spatial branch-and-bound. Two pieces of substance:

Phase 6.2: basis engine behind the `BasisEngine` seam (basis.rs):
- `FaerBasis`: faer sparse LU of the base basis B0 + a product-form (eta) file of per-pivot rank-1 updates (B^-1 = E_t...E_1 B0^-1), refactoring every REFACTOR_INTERVAL pivots. faer owns the numerically-delicate sparse LU; the rank-1 update, which no general LU library provides, stays in-house.
- A probe solve after factorization catches numerically-singular-but-structurally-full bases that faer's sp_lu flags only structurally.
- `DenseBasis` retained under cfg(test) as the lockstep correctness oracle.

Phase 6.3: dual-simplex warm-start across bound changes (simplex.rs):
- `Simplex::solve_bounds(lb, ub)`: a parent->child box tightening leaves the optimal basis dual-feasible, so a bounded-variable dual simplex restores primal feasibility in a few pivots instead of a cold Phase I/II. Complements `solve_objective` (the within-node objective-flip lever).
- Reports Infeasible when the dual is unbounded (never a wrong "optimal") and falls back to a guaranteed-correct cold solve if the dual phase stalls.

Validation: 26 tests (lockstep faer-vs-dense oracle under randomized pivots, warm-vs-cold parity across bound tightenings, infeasibility detection, OBBT sweeps) plus the un-parked HiGHS ill-scaled ex4_1_2 regression. clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
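The product-form update B^-1 = E_t...E_1 B0^-1 can be illustrated with a small dense sketch. This is not `FaerBasis`: the class and method names are hypothetical, the B0 solve is passed in as a plain callable (standing in for the sparse LU), and the REFACTOR_INTERVAL value is made up.

```python
class EtaFileBasis:
    """Toy product-form basis: B^-1 = E_t ... E_1 B0^-1."""

    REFACTOR_INTERVAL = 50  # hypothetical value; the real constant is not stated

    def __init__(self, solve_b0):
        self.solve_b0 = solve_b0  # callable: rhs -> B0^-1 rhs
        self.etas = []            # list of (pivot_row, w) with w = B_old^-1 a

    def ftran(self, rhs):
        """Solve B x = rhs by applying B0^-1, then each eta in order."""
        v = list(self.solve_b0(rhs))
        for r, w in self.etas:
            pivot = v[r] / w[r]
            for i in range(len(v)):
                if i != r:
                    v[i] -= w[i] * pivot
            v[r] = pivot
        return v

    def replace_column(self, r, entering_col):
        """Record the rank-1 update for swapping basis column r.

        Returns True when the eta file is long enough to warrant a refactor.
        """
        w = self.ftran(entering_col)
        if abs(w[r]) < 1e-12:
            raise ZeroDivisionError("pivot element is numerically zero")
        self.etas.append((r, w))
        return len(self.etas) >= self.REFACTOR_INTERVAL


# Start from B0 = I and pivot twice; the basis becomes [[2, 1], [1, 3]].
basis = EtaFileBasis(solve_b0=lambda rhs: list(rhs))
basis.replace_column(0, [2.0, 1.0])
after_one = basis.ftran([4.0, 3.0])   # solves [[2, 0], [1, 1]] x = [4, 3]
basis.replace_column(1, [1.0, 3.0])
after_two = basis.ftran([4.0, 3.0])   # solves [[2, 1], [1, 3]] x = [4, 3]
```

Each pivot stores only the vector w and the pivot row, so a solve costs one B0 solve plus one cheap pass per stored eta; that growing per-solve cost is the usual motivation for refactoring at a fixed interval.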

…pipeline trims

Drive down per-node cost in the spatial branch-and-bound global optimizer (Phases 2-4 of the perf plan), holding the 0-WRONG invariant: every lever is perf/robustness only and defaults are behavior-preserving.

Phase 2: schedule + budget OBBT (new opt-in GlobalOptions + CLI knobs):
- obbt_max_depth: gate the 2n-LP sweep to shallow nodes (default usize::MAX).
- obbt_interval: run OBBT every k-th node (default 1; approximate under the parallel pool, sound either way).
- obbt_max_vars: budgeted partial sweep over the widest-box variables (2k instead of 2n LPs; default usize::MAX).

All three only throttle tightening, never soundness.

Phase 3: warm-start the IPM instead of cold-starting:
- Carry the parent relaxation primal/dual on the frontier node and seed the child lower-bound solve via solve_qp_ipm_warm.
- Warm-start each sandwich re-solve from the previous round.
- Conservative guards throughout: dimensional compatibility check + cold fallback on any non-Optimal warm result, so bound tightness is a strict superset of today's.

Phase 4: cut the fixed small-n pipeline cost:
- Depth-aware local_solve_iters (halve every 4 levels, floored at 10).
- Adaptive sandwich short-circuit on negligible marginal gain.
- Reuse the final OBBT-pass relaxation as the node lower-bound relaxation when the box is unchanged, saving a build_relaxation per node (bit-identical: rebuilt per pass, peeled cutoff cut, multilinear-guarded).

Also wires the revised-simplex OBBT engine (ObbtLp::Simplex) behind the off-by-default `simplex-obbt` feature. It is PARKED as unsound on ill-scaled relaxation LPs (returns wrong certified optima; see ObbtLp::Simplex docs); with the feature off the request transparently downgrades to the sound IPM sweep and pounce-simplex is not linked. IPM remains the default OBBT engine.

Validation (per the loop's small-problem policy; no GLOBALLib timing sweep): all default-feature Rust suites green across pounce-global / pounce-convex / pounce-simplex / pounce-cli, with every certified optimum and exact node count unchanged ⇒ 0 WRONG. Full 104-model OK-count sweep deferred to a manual run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
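The Phase 2 and Phase 4 throttles reduce to a few lines of arithmetic. The sketch below is a guess at their shape, not the pounce-global code: the option names and defaults follow the commit message (with `sys.maxsize` standing in for usize::MAX), but the class, the inclusive depth comparison, and the exact halving rule are assumptions.

```python
import sys
from dataclasses import dataclass


@dataclass
class ObbtSchedule:
    """Hypothetical container for the three opt-in OBBT knobs."""
    obbt_max_depth: int = sys.maxsize
    obbt_interval: int = 1
    obbt_max_vars: int = sys.maxsize

    def should_run(self, depth, node_index):
        """Gate the sweep to shallow nodes and to every k-th node."""
        return (depth <= self.obbt_max_depth
                and node_index % self.obbt_interval == 0)

    def pick_vars(self, lb, ub):
        """Indices of the widest-box variables, at most obbt_max_vars of them."""
        order = sorted(range(len(lb)), key=lambda j: ub[j] - lb[j], reverse=True)
        return order[: self.obbt_max_vars]


def local_solve_iters(base_iters, depth):
    """Halve the local-solve budget every 4 levels, floored at 10."""
    return max(10, base_iters >> (depth // 4))


defaults = ObbtSchedule()
throttled = ObbtSchedule(obbt_max_depth=2, obbt_interval=3, obbt_max_vars=2)
chosen = throttled.pick_vars(lb=[0.0, 0.0, 0.0], ub=[0.5, 3.0, 1.0])
budgets = [local_solve_iters(100, d) for d in (0, 4, 8, 12, 16)]
```

With the defaults every node runs a full sweep, which matches the commit's claim that the defaults preserve behavior; skipping or shrinking a sweep only leaves bounds looser, so it cannot affect soundness.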

…wer bug)

A relaxation LP can carry a coefficient that has collapsed to numerical noise (e.g. a McCormick secant slope going to ~1e-44 at a degenerate box edge). Geometric-mean equilibration let such an entry drag the row/column geometric mean toward zero and inflate the scale by 1e10–1e20, which distorted the reduced-cost tolerances enough that the revised simplex declared a *wrong* vertex optimal.

Observed end to end on the quartic `x^4 - 3x^2`: on the OBBT child box [-2, ~0] the simplex returned `min x0 = -0.375` instead of the true `-1.846`, so OBBT tightened the box to [-0.375, 0], cut off the global minimizer x ~= -1.2247, and certified `-0.402` instead of `-2.25`. This made `simplex_obbt_matches_ipm_certified_optimum` fail.

Fix: `EQUILIBRATE_DROP`. Entries negligible relative to their row/column max are excluded from the geometric mean (computed via a max sub-pass then a min sub-pass over significant entries only). col_scale[0] drops from ~3.4e10 to O(1) and the simplex reaches the true optimum.

- Add tests/degenerate_mccormick_scaling.rs: the exact captured LP, cold and warm-sweep, guarding both against the wrong vertex.
- The 0-WRONG gate `simplex_obbt_matches_ipm_certified_optimum` now passes.
- Refresh the stale `ObbtLp::Simplex` "PARKED — not sound" docs: the engine is sound on all known cases; it stays feature-gated pending a wider GLOBALLib cross-check before becoming the default.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
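The effect of a noise-level entry on geometric-mean scaling is easy to reproduce on a single row. This is a minimal sketch under assumptions: the function is hypothetical, the scale is taken as 1/sqrt(max*min), and the threshold value assigned to `EQUILIBRATE_DROP` here is invented for illustration (the commit does not state it).

```python
import math

EQUILIBRATE_DROP = 1e-12  # hypothetical relative threshold


def geomean_scale(entries, drop=0.0):
    """Scale factor 1/sqrt(max*min) over the significant nonzero entries.

    Entries smaller than drop * max are ignored, mirroring the
    max sub-pass followed by a min sub-pass described above.
    """
    mags = [abs(v) for v in entries if v != 0.0]
    if not mags:
        return 1.0
    biggest = max(mags)                                       # max sub-pass
    significant = [m for m in mags if m >= drop * biggest]
    smallest = min(significant)                               # min sub-pass
    return 1.0 / math.sqrt(biggest * smallest)


row = [1.0, 1e-44]  # one real coefficient plus a collapsed secant slope
naive = geomean_scale(row)                       # noise drags the mean down
guarded = geomean_scale(row, EQUILIBRATE_DROP)   # noise excluded
```

Here the naive scale is on the order of 1e22 while the guarded scale is exactly 1.0, the same qualitative change as the reported drop of col_scale[0] to O(1).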

Wiring the global solver's per-node pieces (OBBT sweep, simplex/IPM warm-starts, branching) needs a seconds-long edit->run loop, not the 25-min full sweep. Add:

- compare_obbt_engines.py: runs both OBBT LP engines (ipm + simplex) over a model set and asserts they certify identical optima, failing (nonzero exit) on any WRONG verdict or engine-vs-engine disagreement. The soundness gate before graduating simplex-obbt to default.
- tiers/micro.txt (12 models, ~2.5s both engines): the inner loop, curated to cover root-only (OBBT/relaxation/local-solve) and branching (tree/incumbent) across a range of n, every entry sub-second.
- tiers/fast.txt (34 models, every IPM<1s solve): broader fast regression.
- run_globallib.py --stems-file: run a tier file.
- make globallib-micro / globallib-fast: build the simplex-obbt binary and run the cross-check.

Micro tier currently: 12/12 correct, 0 wrong, 0 engine disagreements.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace the single-pass min-ratio primal ratio test with a Harris (1973) two-pass test layered on EXPAND (Gill, Murray, Saunders & Wright 1989):

* Pass 1 computes the largest step keeping every basic variable within a feasibility tolerance of its bound; because each blocking numerator gets the tolerance as slack, the step is strictly positive even at a degenerate vertex.
* Pass 2 selects, among rows whose true breakpoint is within that step, the largest pivot magnitude (numerical stability) rather than merely the first to bind.

The feasibility tolerance grows by EXPAND_TAU each iteration up to FEAS_TOL and resets at each refactor/recompute, guaranteeing forward progress and breaking cycles. Bland's rule is demoted to a finite-termination backstop.

This hardens the LP foundation the spatial B&B OBBT inner loop rests on. All 24 pounce-simplex oracle/unit tests pass (Klee-Minty, warm-start sweeps, HiGHS-checked ill-scaled OBBT).

Note: the simplex-OBBT path still stalls on the degenerate GLOBALLib ex9_1_2 root LPs, so further Track-A work (bound-flipping long step) remains before simplex graduates as the default OBBT engine.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
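The difference between the single-pass and two-pass ratio tests shows up on a three-row example. The sketch below is deliberately simplified and is not the pounce-simplex code: both function names are hypothetical, every basic variable is assumed to have a lower bound of zero and to move as x_i - t*d_i, and the EXPAND tolerance growth and minimum-step logic are not modelled.

```python
def textbook_ratio_test(x_basic, d, pivot_tol=1e-9):
    """Single pass: the first row to hit its bound wins, whatever its pivot."""
    best_row, best_ratio = None, float("inf")
    for i, (xi, di) in enumerate(zip(x_basic, d)):
        if di > pivot_tol and xi / di < best_ratio:
            best_row, best_ratio = i, xi / di
    return best_row, best_ratio


def harris_ratio_test(x_basic, d, feas_tol=1e-6, pivot_tol=1e-9):
    """Two-pass Harris test for basics bounded below by zero."""
    # Pass 1: largest step that keeps every basic within feas_tol of its bound.
    step_bound = float("inf")
    for xi, di in zip(x_basic, d):
        if di > pivot_tol:
            step_bound = min(step_bound, (xi + feas_tol) / di)
    if step_bound == float("inf"):
        return None, step_bound  # no blocking row: unbounded direction

    # Pass 2: among rows whose true breakpoint is within that step,
    # take the largest pivot magnitude.
    best_row, best_pivot = None, 0.0
    for i, (xi, di) in enumerate(zip(x_basic, d)):
        if di > pivot_tol and xi / di <= step_bound and di > best_pivot:
            best_row, best_pivot = i, di
    return best_row, max(x_basic[best_row] / d[best_row], 0.0)


# Row 0 is degenerate with a tiny pivot; row 2 is nearly binding with a large one.
x_basic = [0.0, 1.0, 1e-9]
direction = [1e-7, 1.0, 2.0]
textbook_row, _ = textbook_ratio_test(x_basic, direction)
harris_row, harris_step = harris_ratio_test(x_basic, direction)
```

The single-pass test pivots on row 0 (pivot 1e-7), while the Harris test picks row 2 (pivot 2.0) at the cost of a feasibility violation no larger than the tolerance.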

The spatial branch-and-bound global optimizer (pounce-global) is not ready to ship and was blocking the LP/QP/convex work from merging. Its full tree is preserved on the `feature/global` branch; here it is removed from `merge/pr70-reconcile` so the convex stack can land cleanly.

Removed:
- crates/pounce-global (entire crate) + its workspace membership
- pounce-cli global wiring: SolverChoice::Global / SolverSelection::Global, the global dispatch + option-registration paths, tree_debug_cli test, and the now-dead nl_constraint_bound sentinel helper (global-only)
- pounce-py global_opt bindings (mod + solve_global registration)
- python: global_opt.py, test_global.py, and the minimize_global / GlobalResult exports from the package surface
- the three global dispatch_routing tests (solver_selection=global now correctly returns OPTION_INVALID)

Workspace builds clean; pounce-cli dispatch suite green (27 tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pounce-simplex was the in-house LP-engine foundation, but its only consumer was pounce-global's OBBT bridge (simplex_bridge.rs), which left with the global strip. On this branch nothing depends on it: no crate declares it as a dependency and no source references `pounce_simplex` outside the crate. The convex LP/QP path ships on the pounce-convex HSDE IPM, which is SOTA-competitive for cold-start LP/QP; simplex's payoff (warm-started node LPs for B&B/OBBT, basic solutions, crossover) belongs with the global/warm-start track.

The crate (and its OBBT-derived regression tests) is preserved on the `feature/global` branch. It is shared history below the split, so the global track keeps it intact.

Also drops a dangling `pounce-global` workspace.dependencies entry left over from the earlier global strip. Workspace builds clean; no remaining references in code or manifests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Move the feral pin from bb74821 (the v0.4.0 baseline) to main HEAD 11fb4b9, carrying the issue #80 MC64/scaling work (Hungarian-heap reuse across columns, localized dense-column cost, ldlt_compress profiling).

Decisive on the badly-scaled AC power-flow Jacobians: GAMS powerflow testset goes 19/28 -> 24/28 solved (+5: pf14,18,25,27,28), 0 regressions, bit-identical objectives on all 19 jointly-solved, and a 3.12x geomean speedup (1.56x-12.37x; the large pf10/20/22/24/26 solves see 8-12x). Full report in gams/nlpbench/BENCHMARK_REPORT_powerflow_feral-head.md.

NOTE: this git pin blocks the crates.io publish of the pounce crates until feral cuts a release carrying these commits.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Operational guide for the repo: the three release surfaces (PyPI pounce-solver, PyPI pyomo-pounce, the 16 manual crates.io crates), the hand-made GitHub Release step, and the crates.io User-Agent gotcha for checking published versions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Strategy and positioning material: the pounce vision/positioning note, the discopt co-designed-integration writeup, the education & research "introspectable, LLM-explainable solver" angle, a PyTorch-frontend issue sketch, and the LinkedIn v0.4.0 release-announcement draft. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…path

Completes the pounce-global strip: removes the orphaned interactive branch-and-bound tree debugger and its supporting trait surface, which were left behind when the spatial global solver was moved to feature/global.

- crates/pounce-cli/src/tree_debug.rs (~522 lines): the REPL for `pounce --solver global --debug`, a flag that no longer exists. It was pub-mod-exported but never instantiated (no caller in main/dispatch); pure dead code reachable only through the removed global solver.
- crates/pounce-common/src/debug.rs: the tree-only debug API (TreeCheckpoint, PruneReason, TreeDebugState, TreeDebugHook) consumed solely by that orphan. The shared iteration-loop DebugState/DebugHook/DebugAction surface is untouched. The DebugHook::arm() doc no longer cites the removed tree debugger.

All of this remains intact on feature/global. Workspace + all test targets compile clean; pounce-common (146) and pounce-cli suites green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Rename the `compute_p_solves_each_a_column_against_K` test to `..._against_k_matrix`, clearing the last `non_snake_case` warning so `cargo build`/`cargo test` are warning-clean. Test still passes (38 in pounce-sensitivity lib). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…crate

Per the decision to (1) defer simplex to the global-optimization work and (2) build the robust sparse LU inside feral rather than a pounce-lu crate:

- Decision 2 rewritten: simplex is NOT a convex-MI dependency. It arrives only with the global LP-relaxation nodes. No pounce-lu / pounce-simplex crates -- sparse LU lands in feral beside its LDLᵀ (one backend behind pounce-linsol for both symmetric IPM/QP and unsymmetric simplex systems); the simplex driver is a later module in pounce-convex. Reconciles the old 'simplex is back' reversal with decision 8 (not chasing MILP) and matches the PR-#70 strip of the built simplex.
- Renumbered phases: convex-MI = 0-2 (plumbing, B&B+MIQP, cuts+presolve, no simplex/LU), global = 3-5 (relax, spatial B&B [simplex/LU land here], MINLP-global), smoothed gradients = 6. Updated cost table, crate layout, both diagrams, crate skeletons, and all cross-references.
- Refreshed stale 'landing on claude/amazing-mayer-Xd0ag' references to 'merged in PR #70' now that the LP/QP branch is on main.

jkitchin force-pushed the claude/amazing-mayer-Xd0ag branch 5 times, most recently from 925d1a6 to fe1a65e, June 3, 2026 17:15

claude added 24 commits June 5, 2026 00:04

jkitchin and others added 26 commits June 7, 2026 08:20

jkitchin merged commit 89fd852 into main Jun 8, 2026
8 checks passed

jkitchin deleted the claude/amazing-mayer-Xd0ag branch June 8, 2026 12:22

jkitchin mentioned this pull request Jun 8, 2026

Tech debt: Python routing facade is designed (lp-qp-routing.md) but not built #106

Closed

jkitchin merged 180 commits into main from claude/amazing-mayer-Xd0ag


2 participants
