Skip to content

Add Chebyshev approximations#18

Open
sigilante wants to merge 8 commits into
mainfrom
claude/add-numeric-types-A3HVY
Open

Add Chebyshev approximations#18
sigilante wants to merge 8 commits into
mainfrom
claude/add-numeric-types-A3HVY

Conversation

@sigilante

Copy link
Copy Markdown
Collaborator

Prototype implementation demonstrating jet-compatible approach for transcendental functions (exp, sin, cos, tan) using:

  • Fixed polynomial coefficients from musl libc
  • Deterministic argument reduction
  • Horner's method evaluation

Key principle: every FP operation happens in same order with same intermediate values in both Hoon and C jet.

Prototype implementation demonstrating jet-compatible approach for
transcendental functions (exp, sin, cos, tan) using:
- Fixed polynomial coefficients from musl libc
- Deterministic argument reduction
- Horner's method evaluation

Key principle: every FP operation happens in same order with same
intermediate values in both Hoon and C jet.
Complete fixed-polynomial math library with:
- Half precision (@rh): degree 5 sin, degree 4 cos, degree 3 exp
- Single precision (@rs): degree 9 sin, degree 10 cos, degree 4 exp
- Double precision (@rd): degree 13 sin, degree 14 cos, degree 5 exp

All coefficients from musl libc expressed as hex literals for
bit-exact jet compatibility. Each precision level includes:
- Polynomial coefficients and reduction constants
- Chebyshev polynomial generator (+cheb)
- Kernel functions (sindf, cosdf)
- Public API (sin, cos, tan, exp)
- Helper functions (scalbn, floor-int, rem-pio2)
Complete binary128 (128-bit) implementation with:
- Degree 17 sin polynomial (S1-S8)
- Degree 16 cos polynomial (C1-C8)
- Degree 7 exp polynomial (P1-P7)
- Chebyshev polynomial generator (+cheb)
- Helper functions (scalbn, floor-int, rem-pio2)

Coefficients derived from FreeBSD ld128/k_sinl.c, k_cosl.c.
Uses .~~~N syntax for quad precision literals.
Extended math-chebyshev.hoon with:
- log: Fixed polynomial (Lg1-Lg4 for @rs, Lg1-Lg7 for @rd)
- sqrt: Newton-Raphson with fixed iterations (4 for @rs, 6 for @rd)
- atan: Fixed polynomial with range reduction
- asin, acos, atan2: Derived from atan
- pow-n: Binary exponentiation (fixed 32/64 iterations)
- pow: exp(n*log(x)) for non-integer exponents
- log-2, log-10: Derived from log

Added test suite with:
- Individual function tests for @rs and @rd
- Identity tests (sin^2+cos^2=1, exp(log(x))=x, etc.)
- Chebyshev polynomial verification
Half precision (@rh):
- log: Simplified polynomial (single Lg coefficient)
- sqrt: Newton-Raphson with 3 iterations
- atan: Reduced polynomial with large-argument reduction
- asin, acos, atan2, pow-n, pow

Quad precision (@rq):
- log: 8-coefficient polynomial (Lg1-Lg8) for 113-bit accuracy
- sqrt: Newton-Raphson with 8 iterations
- atan: 6-coefficient polynomial with range reduction
- asin, acos, atan2, pow-n, pow

All functions use hex-exact coefficients for jet compatibility.
Adds comprehensive tests for half precision (@rh) and quad precision (@rq):

- Helper functions: expect-near-rh, expect-near-rq
- Function tests: sin, cos, exp, log, sqrt, atan, pow-n, cheb
- Identity tests: sin²+cos²=1, exp(log(x))=x, sqrt(x)²=x

Half precision tests use hex literals (0x3c00 = 1.0, etc.)
Quad precision tests use .~~~X literal format
Hyperbolic functions added to all precision levels (@rh, @rs, @rd, @rq):
- sinh: Hyperbolic sine via (exp(x) - exp(-x)) / 2
- cosh: Hyperbolic cosine via (exp(x) + exp(-x)) / 2
- tanh: Hyperbolic tangent via (e^2x - 1) / (e^2x + 1)
- asinh: Inverse hyperbolic sine via ln(x + sqrt(x^2 + 1))
- acosh: Inverse hyperbolic cosine via ln(x + sqrt(x^2 - 1))
- atanh: Inverse hyperbolic tangent via 0.5 * ln((1+x)/(1-x))

Special function:
- erf: Error function using Abramowitz & Stegun approximation 7.1.26

Tests include:
- Individual function tests for sinh, cosh, tanh, asinh, acosh, atanh
- Hyperbolic identity: cosh^2(x) - sinh^2(x) = 1
- Inverse identities: asinh(sinh(x))=x, acosh(cosh(x))=x, atanh(tanh(x))=x
- erf function tests and symmetry: erf(-x) = -erf(x)
Gamma function:
- Uses Lanczos approximation (g=7, n=9 coefficients)
- lgamma: Log-gamma function via log(gamma(x))
- Handles special cases: Γ(1)=1, Γ(2)=1

Bessel functions of first kind:
- j0: J₀(x) using polynomial for |x|<8, asymptotic otherwise
- j1: J₁(x) with proper sign handling

Bessel functions of second kind:
- y0: Y₀(x) using log-based formula for small x
- y1: Y₁(x) with asymptotic expansion for large x

Tests include:
- Gamma at integer values (factorial property)
- Γ(1.5) = √π/2 ~ 1.7724538
- Γ(n+1) = n*Γ(n) identity
- J0, J1 at standard test points
- Numerical derivative identity: J1(x) ≈ -J0'(x)
- Y0, Y1 at standard test points
@sigilante

Copy link
Copy Markdown
Collaborator Author

Follow-up analysis: libmath direction (from the Saloon eig work)

Captured here so it isn't lost — context is the Saloon eigendecomposition effort (PRs #47/#48), which surfaced concrete evidence for why this PR's approach is the right one. Pending for now; this is a roadmap, not a request to merge.

The problem this PR solves (with evidence)

The current math.hoon implements transcendentals as naive Taylor series with rtol-driven convergence loops (exp/sin/cos/log/sqt/… all loop until (lth (abs (sub po p)) rtol)). Three structural problems, worst last:

  1. Unbounded → can hang the ship. While developing Hermitian eig (Saloon: Hermitian (%cplx) eigendecomposition (eig Phase A2) #48), a sub-ULP rtol made Newton sqt oscillate between two adjacent floats forever → loom exhaustion → recover: dig: meme + segfault (had to reboot with a bigger loom). This hazard exists in every rtol-loop transcendental, not just sqt.
  2. Inaccurate. Argument reduction is minimal (sin does a single lossy (mod x tau); exp has none past overflow guards). Taylor converges slowly with cancellation — the doc comments show (cos pi) = -1.0000000000013558 (~1.4e-12 off, not f64-accurate). That's an accuracy ceiling for the whole Lagoon/Saloon stack.
  3. Not bit-exactly jettable — the decisive one. Iteration count depends on value and rtol, and term order is series-specific, so a C jet can't reproduce the exact SoftFloat op sequence to yield an identical noun. Effectively unjettable as written.

Why the fixed-polynomial approach here is correct

This PR's stated principle — "every FP op happens in the same order with the same intermediate values in both Hoon and the C jet" — fixes all three at once:

  • Fixed-degree minimax/Chebyshev polys with hex-exact frozen coefficients (musl is a fine reference) → bounded constant work → no hang.
  • Deterministic argument reduction (Cody-Waite / Payne-Hanek) + Horner (or Clenshaw for a Chebyshev series) → near-correctly-rounded at low degree → accurate.
  • Fixed straight-line op sequence → a C jet calling SoftFloat in the same order is bit-identical. Since vere's jets and Hoon @rd ops share the SoftFloat engine, bit-exactness is attainable — but only once the op order is fixed (which Taylor-with-variable-iterations is not).

Consequences / decisions for the migration

  • rtol largely goes away. Fixed-degree functions take no tolerance; the [rnd rtol] door sample becomes vestigial for most arms. This also removes the exact foot-gun that crashed the ship (mismatched-width rtol).
  • sqt is the natural first target and a clean template: Newton-for-sqrt with a fixed iteration count (5–6 steps after a good seed) is already deterministic and jettable — quadratic convergence makes it bit-stable. Saloon Saloon: Hermitian (%cplx) eigendecomposition (eig Phase A2) #48 ships an interim iteration-capped fsqt; the principled replacement is a jettable libmath sqt that Saloon then defers to.
  • Clenshaw (Chebyshev series) vs Horner (minimax monomials): both are fixed-op-order and jettable. musl-style Horner with frozen coefficients matches the coefficient tables directly and is simplest for the jet path; the +cheb T_n recurrence in this PR is best viewed as derivation tooling, somewhat orthogonal to runtime evaluation. Recommend Horner for the jetted path, keep a Chebyshev generator as tooling.
  • Payoff is automatic for Saloon/Lagoon: their element-wise exp/sin/… and the eig sqt route through libmath via trans-scalar, so converting libmath gives the array stack accurate + fast + jettable transcendentals for free.
  • vs PR Add Chebyshev function support for more efficient calculations. #9 (sigilante/cheb): older and sprawls across lib3d/lagoon/math; likely superseded by this focused prototype. Salvage any useful coefficients, otherwise let it lapse.

Suggested first step (when picked up)

Convert sqt (all widths) to a fixed-iteration, rtol-free, bit-exact-jet-ready form as the migration template, then retire Saloon's interim fsqt. Small, self-contained, closes the durability hole, and establishes the pattern for exp/sin/cos/log.

@sigilante

Copy link
Copy Markdown
Collaborator Author

@sigilante

Copy link
Copy Markdown
Collaborator Author

P2 hazards from the Gnome review — concrete cases the rewrite must fix

A behavioral red-team of the current math.hoon (naive Taylor/Newton with rtol-driven loops) reproduced the following. These are acceptance criteria for the fixed-polynomial rewrite — i.e. the new implementations should handle each correctly, and ideally there should be a regression test per item.

Hangs / unbounded cost (the headline)

  • log(z) is O(z²) for large z — the series ratio ((z−1)/(z+1))²→1, so e.g. z=1e4 needs ~21k iterations, z=1e6 ~200k (each calling an i-step pow-n). pow(x,n) for x>~10 routes through this. → range-reduce log (log(2^k · m)); never iterate to convergence on the raw argument.
  • log(z) for z ≤ 0: no guard. z=0 converges to ~−π/2 (wrong); z<0 diverges to NaN/garbage. → return NaN for z<0, −inf for z=0.

Silent garbage

  • exp(x) for moderately-negative x (between the underflow threshold −88.7/−709.8 and 0): catastrophic Taylor cancellation. exp(−10) is ~1100% off; exp(−20)→NaN (f32); exp(−50) off by 13 orders (f64). → range reduction (exp(x) = 2^k · exp(r)), or compute 1/exp(−x) for x<0.
  • sin/cos large-argument: range reduction is a single (mod x tau) (and mod itself is toi-based), losing bits before the polynomial. → Cody-Waite / Payne-Hanek reduction.

Crashes on reachable inputs

  • asin(x)/acos(x) crash (bare !!) for any |x|>1 that isn't exactly ±1 — e.g. 1.0 + 1 ULP, which arises from rounding. → clamp to ±1 or return NaN.
  • sqt(-0.0) hard-crashes (?> (sgn x) rejects −0). IEEE wants sqrt(−0)=−0. sqt(2^-149) (smallest f32 subnormal) +inf (the g=x/2 initial guess flushes to 0). → handle ±0 and seed the iteration so subnormals converge.

Latent

  • All NaN-detection guards are no-ops. The pattern ?. (^gte (dis 0x7fc0.0000 x) 0) uses dis (bitwise AND) → an unsigned atom → always ≥0, so the NaN branch never fires (currently benign by accident). → use a real NaN test, or the new structure should make NaN/Inf handling explicit.
  • rd::log(-inf) returns 0.0, not NaN (the rs sibling is correct; rd was copy-pasted from exp). → return NaN in the rewrite.

Why fixed-polynomial fixes the class

The Chebyshev/minimax + Horner approach with proper argument reduction is inherently bounded (no convergence loop → no hang) and lets domain/NaN/Inf handling be explicit prologue rather than emergent loop behavior. The interim takeaway even before the full rewrite: every rtol loop needs a domain guard + an iteration cap, since a sub-ULP rtol or an out-of-domain argument currently hangs or crashes the ship (this is the same hazard that OOM-crashed it via sqt during the eig work).

(Context: surfaced during the Saloon eig review; Saloon's element-wise exp/sin/log inherit all of these, though its eig path is now insulated by an iteration-capped fsqt.)

@sigilante

Copy link
Copy Markdown
Collaborator Author

Chebyshev transcendentals — implementation & jetting plan

Governing principles (non-negotiable)

  1. ACCURACY / CORRECTNESS. Faithful rounding (≤ 1 ULP of the true value) over the whole domain — modulo principle 2. (Correctly-rounded where it's cheap, but ≤1 ULP is the target.)
  2. EXACT ALGORITHMIC MATCH. The Hoon reference and the C jet run the identical algorithm — same argument reduction, same minimax polynomial, same coefficients, same evaluation order (Horner). They must agree bit-for-bit, not merely within a ULP.
  3. ALL FLOATING-POINT IN SOFTWARE, NEVER HARDWARE. Hoon uses the software fl/@r engine; C jets use SoftFloat (f16/f32/f64/f128) and SoftBLAS for array ops. No native float instructions anywhere. This is what makes principle 2 deterministic across platforms.

Why this replaces the current transcendentals

Today every transcendental (/lib/math, /lib/unum, /lib/complex) is fixed-term Taylor/AGM with no argument reduction — accurate only near the expansion point. Our edge sweeps documented the failure modes precisely (sin(10) leaves [−1,1], exp(50) saturates, cexp/csin lose digits off-origin). Naive Taylor is also effectively un-jettable bit-exactly: 20–40 iteration loops accumulate rounding differently in Hoon vs C. Minimax + range reduction is both the accuracy fix and the jetting enabler — a short fixed polynomial is trivially bit-exact between Hoon and C.

Architecture

  • Per function: reduce to a small primary interval → evaluate a minimax polynomial (Remez/Chebyshev coefficients, precomputed offline via Sollya/mpmath) → reconstruct. Every step in software FP.
  • Coefficients are baked constants, per function per width, with documented provenance and the verified faithful-rounding bound.
  • Hoon: rewrite math.hoon's exp/log/sin/cos/tan/sqt/cbrt/atan/asin/acos/pow on the fl door. /lib/unum and /lib/complex compose the real component functions (already self-contained, no /+ math), so they inherit the upgrade.
  • C jets: SoftFloat implementing the same reduction + polynomial + coefficients; SoftBLAS for the array-level (Lagoon/Saloon Tier-B elementwise) path.

Bit-exactness strategy (the crux)

Identical algorithm + identical software primitives ⇒ identical bits. Expected test values are generated from a SoftFloat reference running the same algorithm (Berkeley SoftFloat via ctypes / the softfloat package), not from mpmath truth. mpmath/Sollya is used to design the coefficients and prove the ≤1-ULP bound; SoftFloat-of-the-algorithm gives the exact bit pattern we lock in tests and that the jet must reproduce.

32- vs 64-bit Vere (the stress case)

We ship two jet variants (the standard 32-bit and the 64-bit vere64, as with the %mod jet). @rs (f32) is identical across both → one expected value, must match both binaries and Hoon. @rd (f64) is the cross-variant stress case → both variants' f64 jet paths must be bit-exact to the Hoon reference (and to each other); @rd is where divergence would show, so it's the gate that proves the "all software FP ⇒ deterministic" claim. Test @rd on both binaries.

Phasing

  • Phase 0 — fix the bare-la default rnd to %n (the +sa half is saloon: default +sa rounding mode to %n (was %z) #61). Coupled to the C-fmod %mod jet (#1021/#1022) reaching the runtime, since mod reads the door rnd.
  • Phase 1 (vertical slice)math exp + log at @rs and @rd: Hoon Chebyshev + SoftFloat-oracle re-baseline + matching C jet, proven bit-exact Hoon↔jet on both Vere variants. This exercises the entire pipeline on the hardest axis (@rd cross-variant) before fanning out.
  • Phase 2 — remaining math functions at all four widths.
  • Phase 3 — propagate through /lib/unum and /lib/complex (re-derive posit/complex coefficients or reuse via the real component path).
  • Phase 4 — wire the transcendental jets into Lagoon/Saloon Tier B (the array-level elementwise jetting); eventually Saloon eig.

Re-baselining (expected churn)

Every current transcendental test (math, unum-fns/unum-edge, complex-fns/complex-edge, saloon-*) locks the naive outputs and will change. Regenerate expected values from the SoftFloat reference. The large-argument "documented limitation" cases flip to correctness asserts (Chebyshev fixes sin(10), exp(50), etc.); the structural edge cases (origin, branch cuts, clog(0)=−inf, domain→NaR) stay.

Acceptance criteria

  1. Hoon and C jet produce identical bits for every tested input, on both 32- and 64-bit Vere.
  2. Faithful rounding (≤ 1 ULP vs true) across each function's domain, verified vs mpmath/Sollya.
  3. No hardware FP anywhere (Hoon fl / C SoftFloat only).
  4. Previously-divergent large-argument cases now correct.
  5. All re-baselined suites green on ~hex and against the jet binary.

— Prereqs already in place: pure-Hoon references for all three kinds, oracle tooling (posit_check.py/complex_check.py, mpmath/numpy/SoftPosit), comprehensive oracle-backed + edge tests, and a mature on-ship harness. The vertical slice (Phase 1) is the recommended starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants