build: platform-agnostic host tests and toolchain reproducibility#1497

Merged
gmarull merged 7 commits into
coredevices:mainfrom
Mearman:build/native-platform-agnostic
Jun 12, 2026
@Mearman Mearman commented Jun 12, 2026

Contributor

The host test suite has environmental assumptions that break on a different OS or toolchain. This fixes the ones that cause test failures on Linux CI but not macOS (or the reverse), and adds toolchain flags for reproducible firmware builds.

-fno-common in host test CFLAGS catches tentative symbol definitions that macOS's linker silently merges but GNU ld rejects. This is what caught the enum { ... } Name; pattern in test_app_fetch_endpoint.

Include order fix in floor.c: pblibc_private.h renames log/pow to pblibc_log/pblibc_pow, and if it's included before <math.h> on glibc, the __DECL_SIMD_* macros blow up. Fixed by putting system headers first, matching the pattern already used by pow.c, scalbn.c, and sqrt.c. Un-quarantined test_floor and test_pow.

ARM GCC reproducibility flags (-ffile-prefix-map, -fdebug-prefix-map) and deterministic archives (ARFLAGS 'rcsD') for consistent firmware builds.

-ffp-contract=off in host test CFLAGS so floating-point contraction doesn't vary between compilers.

test_pow and test_log replaced their host-libm oracle with pre-generated reference tables. The host libm disagrees between macOS and Linux in the last ULP on some inputs, so using it as a ground truth was inherently non-portable. The reference tables are correctly-rounded values generated by gen_pow_reference.py and gen_log_reference.py using Python's decimal module at 80 significant digits.

Merge order: 1 (independent)
Depended on by: #1499

Mearman and others added 6 commits June 12, 2026 08:13
A bare tentative definition (e.g. `enum { X } Name;` at file scope in a
header included by multiple TUs) is silently merged by macOS ld64 but
rejected by GNU ld. That asymmetry means a duplicate-symbol bug can
pass on macOS CI and only surface on Linux, or vice versa.

-fno-common makes both toolchains reject the collision at link time, so
the bug is caught on whichever host runs the tests first.

GCC has defaulted to -fno-common since GCC 10 (released 2020), so this
flag is a no-op on any current Linux host. It only changes behaviour on
macOS, where Clang still accepts tentative definitions unless told not to.

The flag is placed in the 'local' env CFLAGS block (wscript lines
237-251) which applies to every host unit test; it has no effect on
firmware builds, which use a separate cross-compilation environment.
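
One way to see whether an object file still carries tentative definitions is to look for symbols that `nm` lists with type `C` (common). The sketch below parses nm-style text and returns those names. The helper name `find_common_symbols` and both sample listings are hypothetical, written for illustration rather than captured from this build.

```python
def find_common_symbols(nm_output: str) -> list[str]:
    """Return the names of symbols that nm reports with type 'C' (common)."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols print as "<value> <type> <name>"; undefined ones
        # omit the value column and are skipped here.
        if len(parts) == 3 and parts[1] == "C":
            names.append(parts[2])
    return names


# Illustrative listings only: the first mimics an object built with common
# symbols allowed, the second the same object built with -fno-common.
WITH_COMMON = """\
0000000000000004 C Name
0000000000000000 T test_fetch
                 U printf
"""
WITHOUT_COMMON = """\
0000000000000000 B Name
0000000000000000 T test_fetch
"""

print(find_common_symbols(WITH_COMMON))
print(find_common_symbols(WITHOUT_COMMON))
```

With -fno-common the variable becomes an ordinary definition (shown here as a BSS symbol), so two translation units defining it collide at link time instead of being merged.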

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On glibc, including <math.h> after <pblibc_private.h> breaks
compilation: pblibc_private.h does `#undef floor; #define floor
pblibc_floor` before glibc's <math.h> is processed, so glibc sees
`floor` already redefined when it tries to emit its __DECL_SIMD_*
declarations.

Move <math.h>, <stdbool.h>, and <stdint.h> above the
<pblibc_private.h> include in floor.c, matching the safe pattern
already used by pow.c, scalbn.c, and sqrt.c (system headers first,
pblibc_private.h last). The rename still takes effect for the
function definition that follows.
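
The "system headers first, pblibc_private.h last" rule could be checked mechanically. The following is only a sketch of such a check, not something this PR adds: `includes_after_rename_header` is a hypothetical helper, and the two source snippets are made up to mirror the before and after ordering described above.

```python
import re

# Matches both <header.h> and "header.h" include forms.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def includes_after_rename_header(source: str,
                                 rename_header: str = "pblibc_private.h") -> list[str]:
    """List headers that are included after the renaming header."""
    seen_rename = False
    late = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if not match:
            continue
        header = match.group(1)
        if header == rename_header:
            seen_rename = True
        elif seen_rename:
            # Any header pulled in after the rename sees the redefined names.
            late.append(header)
    return late


# Hypothetical before/after include blocks.
BEFORE = "#include <pblibc_private.h>\n#include <math.h>\n#include <stdint.h>\n"
AFTER = ("#include <math.h>\n#include <stdbool.h>\n#include <stdint.h>\n"
         "#include <pblibc_private.h>\n")

print(includes_after_rename_header(BEFORE))
print(includes_after_rename_header(AFTER))
```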

Remove test_floor.c and test_pow.c from BROKEN_TESTS: test_floor.c
uses only literal expected values and passes once floor.c compiles
correctly; test_pow.c's newlib implementation is deterministic across
hosts.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add -ffile-prefix-map and -fdebug-prefix-map to the firmware CFLAGS and
ASFLAGS, mapping the absolute source root to '.'.  This removes
machine-specific paths from debug info and __FILE__ strings, making
firmware binaries byte-reproducible across different checkout paths.

-ffile-prefix-map is the preferred flag (covers both debug info and
macros); -fdebug-prefix-map is included as well for compatibility with
older toolchain versions that pre-date -ffile-prefix-map.

Also set ARFLAGS to 'rcsD' (deterministic mode) for the firmware ar
step.  Without the 'D' modifier, gcc-ar embeds timestamps and UIDs in
.a files, producing non-reproducible archives even when object files
are identical.  'D' suppresses those fields.

Both flags apply to the firmware env only; host-test compilation is
unaffected.
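
As a rough illustration of the settings described above, the flag values can be built from the source root like this. This is a sketch, not the actual wscript change: `reproducibility_flags` is a hypothetical helper, and how the values are attached to the firmware env in waf is not shown.

```python
import os


def reproducibility_flags(source_root: str) -> dict:
    """Build prefix-map flags that rewrite the absolute source root to '.'."""
    root = os.path.abspath(source_root)
    prefix_maps = [
        f"-ffile-prefix-map={root}=.",   # debug info and __FILE__
        f"-fdebug-prefix-map={root}=.",  # debug info only, for older toolchains
    ]
    return {
        "CFLAGS": list(prefix_maps),
        "ASFLAGS": list(prefix_maps),
        "ARFLAGS": "rcsD",  # 'D' selects ar's deterministic mode
    }


flags = reproducibility_flags("/work/fw")  # hypothetical checkout path
print(flags["CFLAGS"])
```

Both prefix-map options take the form `old=new`, so two checkouts at different absolute paths produce the same `./...` strings in the output.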

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add -ffp-contract=off and -fexcess-precision=standard to the host
unit-test CFLAGS (alongside -fno-common in the local test env).

-ffp-contract=off stops the compiler contracting multiply-add pairs
into FMA instructions, which macOS clang does by default but Linux
clang typically does not. This contraction changes intermediate
precision and is the classic source of last-ULP float differences
between platforms, so floating-point test results are now reproducible
across hosts without changing any test assertions.
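
The effect of contraction can be reproduced without an FMA instruction. In this sketch the fused result is emulated with exact rational arithmetic and a single final rounding, which is an assumption standing in for what hardware FMA does. The function names and input values are chosen for illustration.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a*b is rounded to double, then the addition is rounded again."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b+c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_add_unfused(a, b, c))  # the product's low bits are lost before the add
print(mul_add_fused(a, b, c))    # the low bits survive the single rounding
```

The two results differ even though the source expression is identical, which is the kind of divergence -ffp-contract=off removes from the host tests.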

-fexcess-precision=standard is a no-op on x86-64 hosts that use SSE for
floating point (where double arithmetic is already carried out at 64-bit
precision), but it future-proofs any 32-bit or x87 host path where the
compiler might otherwise keep 80-bit intermediates in registers between
operations.

Neither flag affects firmware builds; they are applied only to the
host test environment configured in the 'local' waf conf block.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
test_pow compared pblibc_pow against the host libm at runtime: it
defined pow_theirs() before the pblibc_private.h rename so it called the
host pow, then compared pblibc_pow within 1 ulp. glibc and Apple's libm
disagree in the last ulp, so the expected values were host-specific.

Precompute correctly-rounded reference values off-host instead.
gen_pow_reference.py evaluates pow(2, v) as exp(v * ln 2) in the stdlib
decimal module at 80 significant digits and rounds to the nearest double
(round-half-to-even), for v = i * 0.001, i in [0, 10000), matching the
inputs the test exercises. It emits input and expected IEEE-754 bit
patterns into pow_reference.h so no decimal round-trip loss occurs. The
table is identical on every host and the script is checked in so it is
regenerable.
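
The approach described above can be sketched as follows. This is not the checked-in gen_pow_reference.py; `pow2_reference` is a hypothetical name, and the sketch assumes that rounding an 80-digit decimal to double via `float()` matches the correctly-rounded result for these inputs.

```python
from decimal import Decimal, localcontext


def pow2_reference(v: float) -> float:
    """Evaluate 2**v as exp(v * ln 2) at 80 significant digits."""
    with localcontext() as ctx:
        ctx.prec = 80
        exact_v = Decimal(v)  # exact value of the binary double, no rounding
        result = (exact_v * Decimal(2).ln()).exp()
    # float() rounds the decimal result to the nearest double.
    return float(result)


for i in (0, 1000, 10000 - 1):
    v = i * 0.001  # same input construction as the test loop
    print(v, pow2_reference(v))
```

Converting the input with `Decimal(v)` rather than `Decimal("0.001") * i` matters: the reference has to be computed for the double the test actually passes in, not for the decimal value it approximates.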

The test now compares pblibc_pow against the table within 1 ulp.
pow.c (newlib) documents its result as "nearly rounded"; measured
against the correctly-rounded reference it is within 1 ulp, so 1 ulp is
the justified tolerance, not an invented one. The host libm is no longer
called.
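
A 1 ulp comparison is usually done on the bit patterns rather than on the values. The sketch below shows one common way to measure that distance. `ordered_bits` and `ulp_distance` are hypothetical names, NaN handling is left out, and the actual C test may be written differently.

```python
import struct

SIGN_BIT = 1 << 63
MAGNITUDE_MASK = SIGN_BIT - 1


def ordered_bits(x: float) -> int:
    """Map a double to an integer whose ordering matches the float ordering."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & SIGN_BIT:
        # Negative doubles are sign-magnitude; mirror them below zero.
        return -(bits & MAGNITUDE_MASK)
    return bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b (0 if equal)."""
    return abs(ordered_bits(a) - ordered_bits(b))


def within_ulps(actual: float, expected: float, tolerance: int = 1) -> bool:
    return ulp_distance(actual, expected) <= tolerance


print(ulp_distance(1.0, 1.0 + 2.0**-52))
print(within_ulps(1.0, 1.0 + 2.0**-51))
```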

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
test_log compared pblibc_log against the host libm at runtime: it
defined log_theirs() before the pblibc_private.h rename so it called the
host log, then compared pblibc_log within 1 ulp. glibc and Apple's libm
disagree in the last ulp, so the expected values were host-specific.

Precompute correctly-rounded reference values off-host instead.
gen_log_reference.py evaluates log(v) in the stdlib decimal module at 80
significant digits and rounds to the nearest double (round-half-to-even),
for v = i * 0.001, i in [1, 10000), matching the inputs the test
exercises. It emits input and expected IEEE-754 bit patterns into
log_reference.h so no decimal round-trip loss occurs. The table is
identical on every host and the script is checked in so it is
regenerable.
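
Emitting bit patterns instead of decimal literals is what keeps the table exact. A minimal sketch of that step follows. The row layout, `table_row`, and the other helper names are assumptions for illustration and do not reflect the real contents of log_reference.h.

```python
import struct
from decimal import Decimal, localcontext


def bits_of(x: float) -> int:
    """IEEE-754 bit pattern of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def float_of(bits: int) -> float:
    """Inverse of bits_of: rebuild the double from its bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def log_reference(v: float) -> float:
    """Natural log of the exact double v, evaluated at 80 digits."""
    with localcontext() as ctx:
        ctx.prec = 80
        return float(Decimal(v).ln())


def table_row(i: int) -> str:
    """One hypothetical C initializer row: { input bits, expected bits }."""
    v = i * 0.001
    return f"{{ 0x{bits_of(v):016x}ULL, 0x{bits_of(log_reference(v)):016x}ULL }},"


print(table_row(1000))
```

Because each row stores the 64-bit pattern, the C side can reinterpret it directly and never parses a decimal string, so the value the test sees is the value the script computed.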

The test now compares pblibc_log against the table within 1 ulp.
log.c (newlib) documents its error as always less than 1 ulp relative to
the true value, so 1 ulp against the correctly-rounded reference is the
justified tolerance, not an invented one. The host libm is no longer
called.

Also rename the fixture from test_pow__initialize to
test_log__initialize so clar actually runs it for this suite; the old
name meant the rounding-mode setup never fired here.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
@Mearman Mearman force-pushed the build/native-platform-agnostic branch from d3e7335 to c8cfbb0 Compare June 12, 2026 09:29
@Mearman Mearman marked this pull request as ready for review June 12, 2026 09:53
@Mearman Mearman requested review from gmarull and jplexer as code owners June 12, 2026 09:53
@gmarull gmarull merged commit 488aa14 into coredevices:main Jun 12, 2026
39 checks passed
2 participants