Portable Linux libccl.so (glibc ≥ 2.17 baseline build) #3

Draft
matiwinnetou wants to merge 6 commits into develop from feature/static-linking-spike
@matiwinnetou

Summary

Makes the shipped Linux libccl.so distro-independent, so consumers no longer need a specific Ubuntu release or glibc version. This is the result of the static-linking spike (full findings in docs/spikes/static-linking.md).

Why not actual static linking?

The original idea was to statically link (no .so). The spike found that's not possible: GraalVM native-image cannot emit a static library (oracle/graal#3053, still open), and musl's run-anywhere property only applies to static executables, not shared libraries. A truly static, no-.so distribution would require re-architecting to an IPC subprocess model — rejected as too invasive.

What we do instead — glibc baseline

Build the Linux .so inside an old-glibc container (manylinux_2_28). Because glibc is backward-compatible, a lib built against an old baseline runs on that baseline and everything newer — while the in-process FFI stays exactly as-is for all four wrappers.

Measured result (CI): although the builder ships glibc 2.28, the .so references no symbol newer than GLIBC_2.17, so the effective floor is glibc 2.17. It loads and runs a real key derivation on centos:7 (glibc 2.17):

ldd (GNU libc) 2.17
libccl version: 0.1.0
account ok (testnet address derived)
SMOKE OK

→ runs on RHEL/CentOS 7+, Amazon Linux 2, Ubuntu 18.04+, Debian 9+ and all newer distros. (Current ubuntu-latest build demands ~GLIBC_2.39 and runs on none of those.) Not Alpine/musl — a musl variant is a possible future follow-up.
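The run-proof above could be reproduced locally along these lines. This is a sketch under assumptions, not the workflow's actual steps: the function names are hypothetical, and it assumes smoke.c, the generated header, and libccl.so all sit in the current directory and that Docker is available.

```shell
# Compile the harness against the freshly built library.
# Assumption: run inside the manylinux_2_28 builder, with smoke.c,
# the generated header and libccl.so in the current directory.
build_smoke() {
  # -Wl,-rpath,'$ORIGIN' lets the binary find libccl.so next to itself at run time.
  gcc -o smoke smoke.c -I. -L. -lccl -Wl,-rpath,'$ORIGIN'
}

# Run the prebuilt binary on the glibc 2.17 floor with no package installs.
run_smoke_on_centos7() {
  docker run --rm -v "$PWD":/work -w /work centos:7 ./smoke
}
```

Calling `build_smoke` in the builder and then `run_smoke_on_centos7` on the host separates "links against an old baseline" from "actually loads and runs on it", which is the distinction the smoke test is meant to cover.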

Changes

  • .github/workflows/portable-linux-lib.yml (new permanent guard, on PRs + develop/main): builds in manylinux_2_28, asserts the glibc floor via objdump, and re-runs the centos:7 smoke as a strict run-on-2.17 regression check.
  • release.yml: Linux artifact split into a manylinux_2_28 container job so every shipped Linux .so is portable to glibc ≥ 2.17; re-asserts the floor before packaging. macOS/Windows unchanged.
  • docs/spikes/static-linking.md: full spike findings, options, decision, and proof.
  • docs/spikes/smoke.c: minimal isolate + version + account harness used for the run-proof.
  • README.md, TODO.md: document the glibc-2.17 floor; mark the backlog item done.

Verification

The Portable Linux lib workflow on this PR builds the lib, asserts that no required GLIBC symbol version exceeds 2.28, and runs it on centos:7 (glibc 2.17). Green = portability proven on this commit.
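A minimal sketch of what such a floor assertion might look like, assuming the check parses `objdump -T` output. The function names are hypothetical and are not taken from the workflow file.

```shell
set -eu

# Print the highest GLIBC_x.y symbol version found on stdin
# (expects `objdump -T` output); prints nothing if none is present.
max_glibc_version() {
  { grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' || true; } \
    | sed 's/^GLIBC_//' \
    | sort -V -u \
    | tail -n 1
}

# Succeed only if version $1 is less than or equal to version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Fail if the library at $1 requires a glibc symbol newer than $2.
assert_glibc_floor() {
  floor_found="$(objdump -T "$1" | max_glibc_version)"
  if [ -z "$floor_found" ]; then
    echo "no GLIBC symbol versions found in $1" >&2
    return 1
  fi
  if ! version_le "$floor_found" "$2"; then
    echo "FAIL: $1 requires GLIBC_$floor_found (limit $2)" >&2
    return 1
  fi
  echo "OK: $1 requires at most GLIBC_$floor_found (limit $2)"
}
```

Usage would be something like `assert_glibc_floor libccl.so 2.28`. The version comparison relies on `sort -V` because a plain string or numeric comparison would rank 2.2.5 above 2.17.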

🤖 Generated with Claude Code

Mateusz Czeladka and others added 4 commits June 25, 2026 15:53
native-image cannot emit a static library (oracle/graal#3053, still open) and
musl portability applies only to static executables, not shared libs — so a
fully-static, no-.so FFI distribution isn't possible without an IPC rewrite.
Pursue the real goal (glibc/distro independence) instead by building the Linux
.so against an old glibc baseline.

- docs/spikes/static-linking.md: findings, options, decision (Option A).
- static-linking-spike.yml: build libccl.so in manylinux_2_28 (glibc 2.28),
  assert max required GLIBC symbol <= 2.28 via objdump.
- TODO: mark the static-linking item spiked with the verdict.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI run 28175185458 green — the .so references no glibc symbol newer than 2.17,
so it runs on glibc >= 2.17 (RHEL/CentOS 7+, Amazon Linux 2, Ubuntu 18.04+,
Debian 9+). Better than the 2.28 target. Records the objdump output and notes
the remaining follow-up: a run-on-old-distro smoke test before rollout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Compile a minimal isolate+version+account smoke harness in the manylinux_2_28
builder, then execute the prebuilt binary inside centos:7 (glibc 2.17) with no
package installs — confirming the lib loads and runs on the measured floor, not
only that it links against an old baseline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Spike proven: building libccl.so in manylinux_2_28 yields a lib requiring only
GLIBC_2.17, which loads and runs on centos:7. Promote it from experiment to
permanent infrastructure:

- portable-linux-lib.yml: permanent guard on PRs + develop/main — builds in
  manylinux_2_28, asserts the glibc floor via objdump, and re-runs the centos:7
  smoke as a strict run-on-2.17 regression check (renamed from the spike file).
- release.yml: split Linux out of the matrix into a manylinux_2_28 container job
  so every shipped Linux artifact is portable to glibc >= 2.17; macOS/Windows
  unchanged. Re-asserts the floor before packaging.
- README: document the glibc 2.17 floor (and the Alpine/musl caveat).
- docs/spikes/static-linking.md, TODO: record the run-proof and rollout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@matiwinnetou matiwinnetou marked this pull request as draft June 25, 2026 14:23
Mateusz Czeladka and others added 2 commits June 25, 2026 17:28
Per Satya's review: glibc-baseline solves OS-library portability, but the binary
also defaults to the build machine's CPU instruction set and can SIGILL on older
/ datacenter CPUs lacking AVX2/AVX-512. -march=compatibility emits only the
baseline instructions common to all CPUs of the target arch, closing the
CPU-portability axis. Placed in native-image.properties (the canonical spot).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add docs/adr/ (Architecture Decision Records) with a template + index, and
record 10 decisions, both retrospective and new:

  0001 native shared library via GraalVM native-image + C FFI
  0002 offline, stateless bridge; caller-supplied chain data; no provider in libccl
  0003 one FFI, four language wrappers; uniform thin contract; explicit inputs
  0004 Bun is the only supported JavaScript runtime
  0005 standardize on Oracle GraalVM 25.0.3
  0006 TxPlan (YAML) transaction format, replacing the bespoke JSON spec
  0007 Plutus exec units are caller-supplied; evaluator-agnostic
  0008 Linux portability: glibc-baseline build + -march=compatibility (not static)
  0009 branch & release process: feature -> develop, one large develop -> main PR
  0010 Go wrapper isolate thread-affinity

Replace the ephemeral docs/spikes/ with ADR-0008 (the durable rationale), move
the CI-used harness docs/spikes/smoke.c -> native-test/src/smoke.c (where C
smoke tests belong), and update all references (README, TODO, both workflows).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
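For the CPU-portability flag recorded in ADR-0008, a CI guard could check that the flag stays in native-image.properties. This is only a sketch: the function name is hypothetical, the properties file path is passed in by the caller because it is not stated here, and nothing in the PR says such a guard exists.

```shell
# Succeed only if the properties file at $1 passes -march=compatibility
# to native-image (for example on an "Args = ..." line or a continuation line).
has_march_compatibility() {
  grep -Eq '(^|[[:space:]=])-march=compatibility([[:space:]]|$)' "$1"
}
```

The pattern requires whitespace or `=` before the flag and whitespace or end of line after it, so a different value such as `-march=native` does not pass.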