Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
99178ed
Merge main into develop after 0.2.0 release
wahln Jun 21, 2026
f5f16b1
docs: encode the release workflow + add the /release command
wahln Jun 21, 2026
388ae1f
feat(logging): repeat iteration header, mark acceptable iterates, fix…
wahln Jun 21, 2026
c2b47d2
test: expand Hock-Schittkowski oracle coverage (HS9, HS21, HS28, HS71)
wahln Jun 21, 2026
51ed1df
feat(ipm): support two-sided linear inequalities (l <= A x <= u)
wahln Jun 21, 2026
66cef5e
refactor(testing): declare HS21 inequality via linear_ineq
wahln Jun 21, 2026
b19b4af
Merge feat/two-sided-linear-ineq into develop
wahln Jun 21, 2026
e42e9b0
feat(benchmarks): add S2MPJ corpus adapter and accuracy sweep
wahln Jun 21, 2026
ddead72
fix(restoration): degrade gracefully on a singular Gauss-Newton solve
wahln Jun 21, 2026
fc6db24
feat(benchmarks): run the full CUTEst set via S2MPJ (--all)
wahln Jun 21, 2026
9d476b7
fix(filter): use IEEE overflow semantics in the switching condition
wahln Jun 22, 2026
f1de2d4
fix(ipm): relax fixed variables (x_L == x_U) to admit a barrier interior
wahln Jun 22, 2026
1aa7f72
fix(ipm): salvage near-optimal iterate when the step solve fails
wahln Jun 22, 2026
670c4db
fix(ipm): salvage near-optimal iterate when line search hands off to …
wahln Jun 22, 2026
7459767
feat(benchmarks): add --scaling flag to the S2MPJ sweep runner
wahln Jun 22, 2026
e7ca276
fix(ipm): avoid inf/nan arithmetic warning in relax_fixed_bounds
wahln Jun 22, 2026
7c51a2e
perf(ipm): default to gradient-based scaling
wahln Jun 22, 2026
de75cfc
docs: note matrix-free Krylov saddle preconditioning limitation
wahln Jun 22, 2026
471fbf8
feat(benchmarks): exact-Hessian + sparse-direct S2MPJ sweep, default …
wahln Jun 23, 2026
d7c6c6e
add gating for s2mpj benchmark
wahln Jun 23, 2026
98ff72e
computational optimization of sparse adapters (avoid copies and conve…
wahln Jun 23, 2026
2fa25eb
feat(benchmarks): size-aware S2MPJ sweep — per-route caps, sized buil…
wahln Jun 23, 2026
4f26608
feat(benchmarks): add lbfgs/sparse + exact/krylov configs, --names-file
wahln Jun 23, 2026
4c622ca
feat(benchmarks): incremental persistence + resume/exclude for S2MPJ …
wahln Jun 24, 2026
3ecff05
feat(benchmarks): --config selector + in-flight marker for per-config…
wahln Jun 24, 2026
72f227b
feat(benchmarks): dataset-sourced expected-outcome scoring + feasibil…
wahln Jun 24, 2026
70be199
docs(benchmarks): publish latest S2MPJ/CUTEst full-corpus results
wahln Jun 24, 2026
d5e953b
fix(review): address PR review + stabilize the qc gate
wahln Jun 25, 2026
23eb026
test: match the reworded linear_ineq operator-rejection message
wahln Jun 25, 2026
532b896
release: v0.3.0
wahln Jun 25, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions .claude/commands/release.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
---
description: Drive the ipax release workflow (rc branch -> PR -> tag) for a version.
argument-hint: X.Y.Z
---

Drive a release of version **$ARGUMENTS** following the **Releasing** section in
`AGENTS.md` — that section is the source of truth; read it first and keep this command
in sync with it if it changes.

This workflow has human/async gates (CI, code review, PR merge). **Stop at each gate**
and report status — do not poll indefinitely or fabricate approval. Resume when I tell
you the gate has passed.

Phase A — open the RC (do now):

1. From an up-to-date `develop`, create branch `rc/v$ARGUMENTS`.
2. Push it and open a PR with **base `main`, head `rc/v$ARGUMENTS`**.
**Do not bump the version or finalize the changelog yet.**
3. Report the PR URL and **stop**: waiting for CI + code review.

Phase B — review loop (when I report review feedback):

4. Address the review comments. Stage only the files you changed (**never `git add -A`**),
commit, push. Re-summarize and **stop** until the PR is approved.

Phase C — release bump (only once I confirm the PR is approved):

5. Finalize `CHANGELOG.md`: add `## [$ARGUMENTS] - <today>` (move the `[Unreleased]`
items into it) and update the compare links at the bottom.
6. Bump `ipax.__version__` to `$ARGUMENTS` (the single version source; pyproject derives it).
7. Run `python scripts/check.py` and `pre-commit run kacl-verify --files CHANGELOG.md`.
Commit, push, and **stop**: waiting for CI to go green and the PR to be merged.

Phase D — tag & sync (once I confirm the PR is merged):

8. `git switch main && git pull --ff-only origin main`; confirm `__version__` is `$ARGUMENTS`.
9. Create the annotated tag: `git tag -a v$ARGUMENTS main` with a short summary drawn
from the `[$ARGUMENTS]` changelog section.
10. Merge `main` into `develop` (`--no-ff`).
11. Push both: `git push origin develop && git push origin v$ARGUMENTS`. The tag push
triggers `release.yml` (PyPI + GitHub release) — report that it has started.

If `$ARGUMENTS` is empty or not `X.Y.Z`, ask me for the version before doing anything.
Stop and ask if any step deviates from `AGENTS.md` rather than improvising.
42 changes: 40 additions & 2 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -177,10 +177,14 @@ Claude Code config lives in `.claude/` and is checked in:
- **`/verify`** (`.claude/commands/verify.md`) — run `scripts/check.py` and summarize.
- **`/tdd`** (`.claude/commands/tdd.md`) — drive a change through the mandated
red→green→verify→regression loop, multi-backend and invariant-aware.
- **`/release`** (`.claude/commands/release.md`) — drive the release workflow
(`rc/vX.Y.Z` branch → PR onto `main` → review loop → version/changelog bump → tag →
sync `develop`), stopping at each human/CI gate. Follows the **Releasing** section.

The auditor's rubric is the static half of GPU performance work; the measured half is
a future `benchmarks/runners/device_efficiency.py` (sync/kernel profiling on real
hardware), deferred until GPU CI exists.
`benchmarks/runners/device_efficiency.py` (host-sync counting + per-iteration timing
on real hardware, kernel-launch counts best-effort via `nsys`). It is GPU-gated, so it
is a no-op in CI until GPU CI exists; run it locally on a CUDA backend.

---

Expand Down Expand Up @@ -250,6 +254,40 @@ core must remain backend-agnostic.

---

## Releasing

Releases follow a release-candidate PR onto `main`, then a tag. Conventions to know
before starting:

- **Version is single-sourced** in `ipax/__init__.py` (`__version__`); `pyproject.toml`
derives it via `[tool.hatch.version]`. Bump in **exactly that one place**.
- **`CHANGELOG.md` follows Keep a Changelog** and is enforced by the `kacl-verify`
pre-commit/CI hook. Keep entries under `## [Unreleased]` as work lands.
- **Tags are `vX.Y.Z`**; pushing a tag triggers `release.yml` (PyPI Trusted Publishing
+ a GitHub release whose notes come from the tag annotation and the changelog).

Steps:

1. **Branch.** From an up-to-date `develop`, create `rc/vX.Y.Z`.
2. **Open the PR.** Push the branch and open a PR with **base `main`, head `rc/vX.Y.Z`**.
**Do not bump the version yet** — review may change the contents.
3. **Wait for CI + code review** (CI gates + Copilot/human review).
4. **Address review.** Push fixes to `rc/vX.Y.Z`; repeat 3–4 until the PR is approved.
Avoid `git add -A` (it sweeps unrelated working-tree edits into the commit) — stage
the files you changed.
5. **Release bump (only once approved).** Confirm `CHANGELOG.md` covers the release:
add `## [X.Y.Z] - YYYY-MM-DD` (move the `[Unreleased]` items into it) and update the
compare links at the bottom. Bump `ipax.__version__`. Run `python scripts/check.py`
and `pre-commit run kacl-verify --files CHANGELOG.md`. Push; let CI go green.
6. **Merge** the PR into `main`.
7. **Tag and sync.** Sync `main` (`git switch main && git pull --ff-only`), create the
annotated tag (`git tag -a vX.Y.Z main` with a short changelog-derived summary),
merge `main` back into `develop` (`--no-ff`), then push **both** `develop` and the
tag (`git push origin develop && git push origin vX.Y.Z`). The tag push runs
`release.yml` — confirm that workflow succeeds.

---

## References

- Wächter, A. & Biegler, L. T. (2006). "On the implementation of an interior-point
Expand Down
132 changes: 131 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,135 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased]

## [0.3.0] - 2026-06-26

### Added
- Documentation: a published **S2MPJ / CUTEst benchmark** page
(`docs/benchmarks/s2mpj.md`) recording the latest full-corpus run — system
information, per-configuration metrics for the `{lbfgs, exact} × {dense, krylov,
sparse}` matrix, the optimization-vs-feasibility split, and the dataset-sourced
scoring methodology.
- The per-iteration log table reprints its column header every
`HEADER_REPEAT_INTERVAL` (10) rows so it stays readable on long runs, and marks
any iterate that already satisfies every enabled acceptable-stopping criterion
(before the required consecutive count) with a trailing `*`.
- Expanded the Hock–Schittkowski analytic-oracle set in `ipax.testing.problems`
with `HS9`, `HS21`, `HS28`, and `HS71`, covering active bound multipliers, a
degenerate (zero) equality multiplier, a non-unique periodic optimum, and the
full equality+inequality+bounds constraint mix. Each is exercised across every
backend in the integration suite, wired into the QC benchmark corpus, and
checked by a new finite-difference derivative-consistency test that also
back-fills the previously untested HS oracles.
- Two-sided linear inequalities (`Problem.linear_ineq`, `l ≤ A x ≤ u`) are now
solved: the constant-data block is lowered into the standard one-sided
inequality machinery (finite lower rows → `l − A x ≤ 0`, finite upper rows →
`A x − u ≤ 0`, both-finite rows yield a range pair), so the IPM, gradient
scaling, and every solver route handle it with no special-casing and the block
contributes no Lagrangian-Hessian term. Previously `solve` raised
`NotImplementedError` despite the interface being documented. A matrix-free
(operator) `linear_ineq` matrix still raises with guidance to use
`ineq_constraints` instead.

- S2MPJ scoring now uses the **dataset's own documented outcome** instead of
convergence alone: the loader parses each source file for the CUTEst
classification (`pbclass`), the SIF author's solution objective
(`# LO SOLTN`, present on ~72% of the corpus), and an explicit
`Solution (infeasible)` / `Source: an infeasible problem` marker. A case is
scored *correct* when it reaches the documented objective, or — for a
documented-infeasible problem like BURKEHAN — when it **detects infeasibility**
(previously flagged as a failure). The report shows the gap to the documented
optimum (`Δf*`) and annotates `infeasible (exp)`. The objective-free problems
(CUTEst feasibility / nonlinear-equation systems) can now be run via
`--include-objective-free` as `min 0` subject to the constraints, and one
configuration can be swept per process with `--config` for parallel runs.
- S2MPJ sweep gained a size-aware run strategy for tractable full-corpus runs:
**per-route variable caps** (dense 2000, Krylov 10000, sparse 25000) so each
config runs only on problems that fit its route — small problems are
cross-validated across every route while larger ones fall through to Krylov and
the sparse-direct route — with `--max-vars` kept as a global ceiling; **sized
instantiation** (`--size N`, with `PROBLEM(N)` for the scalable problems and a
SIF-default fallback for the rest) to reach the sparse route's intended large-`n`
regime; an optional subprocess **build-time guard** (`--max-build-seconds`) that
abandons a pathological O(n²) pure-Python construction before it stalls an
unattended sweep; and per-problem instance **caching** so the per-config fan-out
rebuilds each problem once instead of up to five times.
- S2MPJ benchmark sweep now exercises the **exact Lagrangian Hessian** and the
**sparse-direct route**, not only L-BFGS. The adapter (`_S2MPJExactProblem`) wires
S2MPJ's `LgHxy`/`LHxyv` (convention `L = f + yᵀc`) into ipax's exact-Hessian route,
mapping `(σ, y_eq, y_ineq)` onto S2MPJ's single multiplier vector with the correct
signs for lowered inequality sides (lower `−y`, upper `+y`) and honoring `σ` on the
objective term so it stays correct under gradient-based scaling. With `sparse=True`
the Jacobians and Hessian cross as `SparseOperator`s (true COO sparsity) for the
sparse-direct (Feral/cuDSS) factorization. The runner's regular matrix is now
`{lbfgs, exact} × {dense, krylov, sparse}` (`exact/sparse` factors true sparsity —
raise `--max-vars` to reach the large, sparse models), and its `--scaling` now
defaults to `gradient-based` to match the solver default rather than benchmarking a
scaling-off configuration users do not get.
- S2MPJ benchmark corpus: `benchmarks/corpus/s2mpj.py` loads the pure-Python
S2MPJ translations of the CUTEst/Hock–Schittkowski problems (no Fortran/SIF
toolchain) and bridges their NumPy/SciPy evaluation onto any CPU Array-API
backend, mapping S2MPJ's two-sided `clower ≤ c(x) ≤ cupper` constraints onto
ipax's eq/ineq split. A `benchmarks/runners/s2mpj.py` L-BFGS accuracy sweep
consumes it. `list_s2mpj_problems()` enumerates a checkout and the runner's
`--all` flag sweeps the **entire CUTEst set**, with `--max-vars`/`--max-iter`/
`--max-time` caps, per-problem isolation, automatic skipping of objective-free
problems, and a status summary. Download-gated (`IPAX_S2MPJ_DIR`); not vendored
(S2MPJ has no license) and not part of per-PR CI — the loader returns `[]` and
the gated tests skip when no checkout is present.

### Changed
- **Gradient-based scaling is now the default** (`ScalingOptions.method`
`"none"` → `"gradient-based"`), matching IPOPT. Across the full CUTEst/S2MPJ
corpus this solves a net **+67** problems (≈92 recovered, mostly from the
slow-converging `max_iter` bucket; ≈23 regressed, of which only 3–4 genuinely
fail — hard nonconvex/minimax cases that diverge under scaling — and the rest
merely converge slower). The returned `x`, objective, and multipliers are
reported in the original problem's units; pass `scaling="none"` to opt out.
- Promoted the driver's private vertical-stack operator to a public
`ipax.backend.operators.VStack` (now also exposing `row_inf_norms` for
gradient scaling), reused by both the equality assembly and the new
linear-inequality lowering.

### Fixed
- A stall at a near-optimal iterate is now reported `ACCEPTABLE` instead of
being discarded. Near a solution the condensed system is ill-conditioned (μ
driven below the achieved KKT residual), so the Newton step can come out
non-finite and the line search can fail to make progress even though the
iterate is essentially optimal. The solver now salvages such an iterate —
whether the failure is a non-finite **step solve** (previously
`NUMERICAL_ERROR`) or the line search **handing off to restoration**
(previously a false `INFEASIBLE`) — when its scaled KKT components are within a
relaxed multiple (IPOPT `acceptable_tol` ≈ 1e2 × `tol`) of the optimality
tolerances, rather than throwing away a usable solution.
- Fixed variables (`x_L == x_U`) — common in CUTEst-style models — no longer make
the solve fail at the first iteration. Such a variable has no strict barrier
interior, so `z = μ/(x − x_L)` was singular and the first Newton step came out
non-finite (`numerical_error`). The solver now relaxes fixed / near-degenerate
bound pairs symmetrically about their midpoint (IPOPT
`fixed_variable_treatment='relax_bounds'`), leaving well-separated bounds
untouched. Surfaced by the S2MPJ sweep, where it accounted for the bulk of the
first-iteration `numerical_error` failures.
- The filter line-search switching condition no longer raises `OverflowError` on
a badly-scaled iterate. Python's `float ** s_phi` raises instead of returning
`inf` once the result exceeds the double range (an enormous directional
derivative `dphi`), which crashed the whole solve; the power now uses IEEE
overflow semantics (`→ inf`). Surfaced by the S2MPJ INDEF sweep. The S2MPJ
benchmark adapter likewise sanitizes overflow in its NumPy-bridged
objective/gradient (returning `inf`), so a trial point that overflows the
problem's own generated `float**` is rejected rather than crashing
(e.g. LUKVLE4C, which then solves).
- Feasibility restoration no longer crashes on a numerically singular or
extreme-scale Gauss-Newton system. The damped (Levenberg–Marquardt) step now
treats a failed/non-finite linear solve as a rejected step — growing the
damping (up to a ceiling) and retrying — instead of letting the backend's
``solve`` raise (e.g. numpy ``LinAlgError: Singular matrix`` when a constraint
Jacobian blows up far from feasibility). Surfaced by the S2MPJ HS7 sweep; the
solve now degrades to a reported status rather than raising.
- `configure_verbosity` no longer attaches a second console handler when the
application has already configured its own handler on the `"ipax"` logger,
which previously printed every iteration record twice. Propagation to ancestor
handlers (and `caplog`) is unchanged.

## [0.2.0] - 2026-06-21

### Added
Expand Down Expand Up @@ -85,7 +214,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Contract batteries (`tests/contracts/`) plus unit/property/integration/backends/
regression layers; benchmark suite (`benchmarks/`, asv); MkDocs documentation.

[Unreleased]: https://github.com/wahln/ipax/compare/v0.2.0...HEAD
[Unreleased]: https://github.com/wahln/ipax/compare/v0.3.0...HEAD
[0.3.0]: https://github.com/wahln/ipax/compare/v0.2.0...v0.3.0
[0.2.0]: https://github.com/wahln/ipax/compare/v0.1.1...v0.2.0
[0.1.1]: https://github.com/wahln/ipax/compare/v0.1.0...v0.1.1
[0.1.0]: https://github.com/wahln/ipax/releases/tag/v0.1.0
39 changes: 39 additions & 0 deletions benchmarks/corpus/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,12 @@
HS6,
HS7,
HS8,
HS9,
HS21,
HS28,
HS35,
HS43,
HS71,
BoundConstrainedQP,
EqualityConstrainedQP,
UnconstrainedQuadratic,
Expand Down Expand Up @@ -61,6 +65,7 @@ class BenchmarkProblem:
default=lambda _problem: None, repr=False
)
backends: tuple[str, ...] | None = None
exclude_configs: tuple[str, ...] = () # config labels the QC sweep skips here


def _known(problem: Problem) -> Array | None:
Expand Down Expand Up @@ -149,6 +154,40 @@ def default_corpus() -> list[BenchmarkProblem]:
tags=("eq", "nonlinear"),
build=lambda xp: (HS8(xp), _arr(xp, [2.0, 1.0])),
),
BenchmarkProblem(
name="hs9",
kind="NLP",
tags=("eq", "nonlinear"),
build=lambda xp: (HS9(xp), _arr(xp, [0.0, 0.0])),
),
BenchmarkProblem(
name="hs21",
kind="QP",
tags=("bounds", "ineq"),
build=lambda xp: (HS21(xp), _arr(xp, [3.0, 1.0])),
optimum=_known,
),
BenchmarkProblem(
name="hs28",
kind="QP",
tags=("eq",),
build=lambda xp: (HS28(xp), _arr(xp, [-1.0, 0.5, 0.5])),
optimum=_known,
),
BenchmarkProblem(
name="hs71",
kind="NLP",
tags=("eq", "ineq", "bounds", "nonlinear"),
build=lambda xp: (HS71(xp), _arr(xp, [1.0, 5.0, 5.0, 1.0])),
optimum=_known,
# The Mehrotra/Gondzio correctors sit at HS71's convergence edge on this
# nonconvex problem and stall on some backends/platforms (e.g. CI's
# Torch build) while converging on others — a known corrector-robustness
# gap, not a per-PR regression. Exclude those configs here so the gate is
# deterministic; HS71 is still swept on every stable route, and covered
# under the default solve by the integration tests.
exclude_configs=("exact/dense+mehrotra", "exact/dense+gondzio"),
),
_rt_case(),
]

Expand Down
Loading
Loading