build: platform-agnostic host tests and toolchain reproducibility #1497
Merged
gmarull merged 7 commits on Jun 12, 2026
A bare tentative definition (e.g. `enum { X } Name;` at file scope in a
header included by multiple TUs) is silently merged by macOS ld64 but
rejected by GNU ld. That asymmetry means a duplicate-symbol bug can
pass on macOS CI and only surface on Linux, or vice versa.
-fno-common makes both toolchains reject the collision at link time, so
the bug is caught on whichever host runs the tests first.
GCC has defaulted to -fno-common since GCC 10 (released 2020), so this
flag is a no-op on any current Linux host. It only changes behaviour on
macOS, where Clang still accepts tentative definitions unless told not to.
The flag is placed in the 'local' env CFLAGS block (wscript lines
237-251) which applies to every host unit test; it has no effect on
firmware builds, which use a separate cross-compilation environment.
Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On glibc, including <math.h> after <pblibc_private.h> breaks compilation: pblibc_private.h does `#undef floor; #define floor pblibc_floor` before glibc's <math.h> is processed, so glibc sees `floor` already redefined when it tries to emit its __DECL_SIMD_* declarations.

Move <math.h>, <stdbool.h>, and <stdint.h> above the <pblibc_private.h> include in floor.c, matching the safe pattern already used by pow.c, scalbn.c, and sqrt.c (system headers first, pblibc_private.h last). The rename still takes effect for the function definition that follows.

Remove test_floor.c and test_pow.c from BROKEN_TESTS: test_floor.c uses only literal expected values and passes once floor.c compiles correctly; test_pow.c's newlib implementation is deterministic across hosts.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add -ffile-prefix-map and -fdebug-prefix-map to the firmware CFLAGS and ASFLAGS, mapping the absolute source root to '.'. This removes machine-specific paths from debug info and __FILE__ strings, making firmware binaries byte-reproducible across different checkout paths. -ffile-prefix-map is the preferred flag (covers both debug info and macros); -fdebug-prefix-map is included as well for compatibility with older toolchain versions that pre-date -ffile-prefix-map.

Also set ARFLAGS to 'rcsD' (deterministic mode) for the firmware ar step. Without the 'D' modifier, gcc-ar embeds timestamps and UIDs in .a files, producing non-reproducible archives even when object files are identical. 'D' suppresses those fields.

Both flags apply to the firmware env only; host-test compilation is unaffected.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
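As a rough illustration of the flag set described in this commit, the sketch below builds the three flag lists from a source root. The helper name `reproducible_firmware_flags` and the example path are hypothetical; the actual wscript may assemble these flags differently.

```python
import os


def reproducible_firmware_flags(source_root):
    """Return flag lists that keep `source_root` out of build outputs.

    Hypothetical helper for illustration only; not the project's wscript code.
    """
    root = os.path.abspath(source_root)
    prefix_maps = [
        # Rewrites the source root in both debug info and __FILE__ strings.
        "-ffile-prefix-map={}=.".format(root),
        # Debug-info-only variant, kept for toolchains without the flag above.
        "-fdebug-prefix-map={}=.".format(root),
    ]
    return {
        "CFLAGS": list(prefix_maps),
        "ASFLAGS": list(prefix_maps),
        # 'D' asks ar for deterministic archives (no timestamps or UIDs).
        "ARFLAGS": ["rcsD"],
    }


flags = reproducible_firmware_flags("/home/alice/firmware")  # example path
for name in ("CFLAGS", "ASFLAGS", "ARFLAGS"):
    print(name, flags[name])
```

Two checkouts at different absolute paths then yield the same mapped prefix (`.`), which is what makes the embedded paths identical.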
Add -ffp-contract=off and -fexcess-precision=standard to the host unit-test CFLAGS (alongside -fno-common in the local test env).

-ffp-contract=off stops the compiler contracting multiply-add pairs into FMA instructions, which macOS clang does by default but Linux clang typically does not. This contraction changes intermediate precision and is the classic source of last-ULP float differences between platforms, so floating-point test results are now reproducible across hosts without changing any test assertions.

-fexcess-precision=standard is a no-op on SSE-only x86-64 (where the compiler already uses 64-bit SSE registers), but it future-proofs any 32-bit or x87 host path where the compiler might otherwise keep 80-bit intermediates in registers between operations.

Neither flag affects firmware builds; they are applied only to the host test environment configured in the 'local' waf conf block.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
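To make the effect of contraction concrete, here is a small sketch that emulates a fused multiply-add with exact rational arithmetic and compares it with the ordinary two-step computation. The function names and the chosen inputs are illustrative assumptions, not taken from the test suite.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    # Two roundings: a*b is rounded to a double, then the sum is rounded.
    return a * b + c


def mul_add_fused(a, b, c):
    # One rounding: compute a*b + c exactly, then round once to a double,
    # which is the behaviour of a hardware FMA instruction.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
print(mul_add_unfused(a, b, c) == 0.0)         # True
print(mul_add_fused(a, b, c) == -(2.0 ** -60)) # True
```

The same source expression produces two different doubles depending on whether the compiler contracts it, which is the variation `-ffp-contract=off` removes.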
test_pow compared pblibc_pow against the host libm at runtime: it defined pow_theirs() before the pblibc_private.h rename so it called the host pow, then compared pblibc_pow within 1 ulp. glibc and Apple's libm disagree in the last ulp, so the expected values were host-specific.

Precompute correctly-rounded reference values off-host instead. gen_pow_reference.py evaluates pow(2, v) as exp(v * ln 2) in the stdlib decimal module at 80 significant digits and rounds to the nearest double (round-half-to-even), for v = i * 0.001, i in [0, 10000), matching the inputs the test exercises. It emits input and expected IEEE-754 bit patterns into pow_reference.h so no decimal round-trip loss occurs. The table is identical on every host and the script is checked in so it is regenerable.

The test now compares pblibc_pow against the table within 1 ulp. pow.c (newlib) documents its result as "nearly rounded"; measured against the correctly-rounded reference it is within 1 ulp, so 1 ulp is the justified tolerance, not an invented one. The host libm is no longer called.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
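The generation step described above could look roughly like the following. This is a minimal sketch and not the checked-in gen_pow_reference.py: the function names and the row format are assumptions, and it relies on Python's `float()` rounding a `Decimal` to the nearest double.

```python
import struct
from decimal import Decimal, getcontext

getcontext().prec = 80  # 80 significant digits, as in the commit message

LN2 = Decimal(2).ln()


def double_bits(x):
    """Return the IEEE-754 bit pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def pow2_reference(v):
    """Nearest double to 2**v for a finite double v."""
    # Decimal(v) is the exact binary value of v, not its shortest repr.
    exact = (Decimal(v) * LN2).exp()
    # float() converts to the nearest double, ties to even.
    return float(exact)


def emit_rows(count):
    """Yield C initializer rows: {input bits, expected bits}."""
    for i in range(count):
        v = i * 0.001
        yield "{{0x{:016x}ULL, 0x{:016x}ULL}},".format(
            double_bits(v), double_bits(pow2_reference(v))
        )


for row in emit_rows(3):
    print(row)
```

Emitting bit patterns rather than decimal literals means the C compiler never has to parse a floating-point constant, so the table cannot pick up a host-dependent rounding on the way in.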
test_log compared pblibc_log against the host libm at runtime: it defined log_theirs() before the pblibc_private.h rename so it called the host log, then compared pblibc_log within 1 ulp. glibc and Apple's libm disagree in the last ulp, so the expected values were host-specific.

Precompute correctly-rounded reference values off-host instead. gen_log_reference.py evaluates log(v) in the stdlib decimal module at 80 significant digits and rounds to the nearest double (round-half-to-even), for v = i * 0.001, i in [1, 10000), matching the inputs the test exercises. It emits input and expected IEEE-754 bit patterns into log_reference.h so no decimal round-trip loss occurs. The table is identical on every host and the script is checked in so it is regenerable.

The test now compares pblibc_log against the table within 1 ulp. log.c (newlib) documents its error as always less than 1 ulp relative to the true value, so 1 ulp against the correctly-rounded reference is the justified tolerance, not an invented one. The host libm is no longer called.

Also rename the fixture from test_pow__initialize to test_log__initialize so clar actually runs it for this suite; the old name meant the rounding-mode setup never fired here.

Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
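The "within 1 ulp" check used by both tests can be expressed on bit patterns. The sketch below shows one common way to do it; the helper names are hypothetical, the sample row is made up for illustration, and NaN inputs are deliberately not handled.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def ordered_bits(x):
    """Map a double to an integer that increases monotonically with x."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Negative doubles sort in reverse by raw bit pattern, so mirror them.
    if bits & (1 << 63):
        return (1 << 63) - bits
    return bits


def ulp_distance(a, b):
    """Count the representable-double steps from a to b (0 if equal)."""
    return abs(ordered_bits(a) - ordered_bits(b))


# Illustrative row: input bits for 1.0, expected bits for log(1.0) == 0.0.
row_input, row_expected = 0x3FF0000000000000, 0x0000000000000000
actual = 0.0  # stand-in for the value the function under test returns
print(ulp_distance(actual, bits_to_double(row_expected)) <= 1)  # True
```

Working in integer space avoids any floating-point arithmetic in the comparison itself, so the check behaves the same on every host.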
Signed-off-by: Joseph Mearman <joseph@mearman.co.uk>
gmarull approved these changes Jun 12, 2026
The host test suite makes environmental assumptions that break on a different OS or toolchain. This PR fixes the ones that cause test failures on Linux CI but not macOS (or the reverse), and adds toolchain flags for reproducible firmware builds.

- `-fno-common` in host test CFLAGS catches tentative symbol definitions that macOS's linker silently merges but GNU ld rejects. This is what caught the `enum { ... } Name;` pattern in test_app_fetch_endpoint.
- Include order fix in `floor.c`: `pblibc_private.h` renames math functions (for example `floor` to `pblibc_floor`), and if it is included before `<math.h>` on glibc, the `__DECL_SIMD_*` declarations fail to compile. Fixed by putting system headers first, matching the pattern already used by pow.c, scalbn.c, and sqrt.c. Un-quarantined test_floor and test_pow.
- ARM GCC reproducibility flags (`-ffile-prefix-map`, `-fdebug-prefix-map`, and `ARFLAGS` set to `rcsD`) for consistent firmware builds.
- `-ffp-contract=off` in host test CFLAGS so floating-point contraction doesn't vary between compilers.
- `test_pow` and `test_log` replaced their host-libm oracle with pre-generated reference tables. The host libm disagrees between macOS and Linux in the last ULP on some inputs, so using it as ground truth was inherently non-portable. The reference tables hold correctly-rounded values generated by `gen_pow_reference.py` and `gen_log_reference.py` using Python's `decimal` module at 80 significant digits.

Merge order: 1 (independent)
Depended on by: #1499