Skip to content

Review cleanup#1

Open
danpape wants to merge 10 commits into
masterfrom
review-cleanup
Open

Review cleanup#1
danpape wants to merge 10 commits into
masterfrom
review-cleanup

Conversation

@danpape
Copy link
Copy Markdown
Owner

@danpape danpape commented May 4, 2026

trying to test workflows

danpape and others added 10 commits May 3, 2026 11:33
- Top-level cmake_minimum_required bumped 3.14 -> 3.22
- libbech32 library target: cxx_std_11 -> cxx_std_17
- All three C++ example targets: cxx_std_11 -> cxx_std_17
- UnitTests_bech32 and bech32_api_tests: cxx_std_11 -> cxx_std_17
- C99 example/test targets (c_std_99) untouched
- Version macros (LIBBECH32_VERSION_*) untouched; bump on next major release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull the file-level constants, validators, HRP/DP utilities, and Bech32
core math out of the anonymous namespace at the top of bech32.cpp into
a new internal header bech32_detail.h under namespace bech32::detail.
bech32.cpp now includes the header and pulls bech32::detail into the
implementation namespace via 'using namespace bech32::detail;' so existing
unqualified references continue to compile.

The original anon namespace also held a 'using namespace bech32::limits;'
directive whose effect leaked into file scope, satisfying unqualified
references inside the C wrappers. The new header puts the directive
inside namespace bech32::detail (where it belongs), which does not leak
those names into file scope. Re-injecting at file scope preserves the
existing behaviour without editing the C wrappers.

The white-box test TU (test_Bech32.cpp) replaces the legacy
'#include "bech32.cpp"' with '#include "bech32_detail.h"'. Every internal
helper the test file exercises (rejectBString*, rejectHRP*, rejectDP*,
rejectDataValuesOutOfRange, rejectBothPartsTooLong, findSeparatorPosition,
extractHumanReadablePart, extractDataPart, convertToLowercase, mapDP,
expandHrp, polymod, verifyChecksum, createChecksum) is now reached
through namespace bech32::detail. A single 'using namespace bech32::detail;'
near the top of the file keeps every existing unqualified call site
working. Inline definitions in the new header are ODR-safe across the
two including translation units (bech32.cpp and test_Bech32.cpp).

3/3 ctest green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Public C++ API gains a typed exception hierarchy and the C ABI gains
matching error codes; every throwing entry point dispatches.

bech32::Error : std::runtime_error becomes the base class for every
library-thrown exception. Existing std::runtime_error / std::exception
catches keep working via inheritance. Nine typed subclasses, one per
distinct invariant violation:

  HrpTooLongError, HrpTooShortError, MixedCaseError,
  ValuesOutOfRangeError (with optional message), BothPartsTooLongError,
  DataPartTooShortError, NoSeparatorError, InvalidCharacterError,
  InvalidChecksumError.

Throw sites in bech32::detail (rejectHRPTooShort, rejectHRPTooLong,
rejectBothPartsTooLong, rejectDPTooShort, rejectDataValuesOutOfRange,
rejectBStringMixedCase, rejectBStringValuesOutOfRange,
rejectBStringTooShort, rejectBStringTooLong,
rejectBStringWithNoSeparator, mapDP, mapToCharset) and the one stray
throw inside bech32::encodeBasis are migrated to the matching subclass.
The -TooShort / -TooLong bstring cases reuse ValuesOutOfRangeError with
distinctive what() messages rather than getting their own subclasses.

bech32_error gains four new codes inserted before E_BECH32_MAX_ERROR
and after E_BECH32_NO_MEMORY so existing ordinals 0-5 remain stable:
E_BECH32_HRP_TOO_LONG, E_BECH32_HRP_TOO_SHORT, E_BECH32_MIXED_CASE,
E_BECH32_VALUES_OUT_OF_RANGE. E_BECH32_VALUES_OUT_OF_RANGE is a single
bundled catch-all for the six range-style violations (5-bit values
>= 32, ASCII out-of-range bytes, invalid charset char, missing
separator, dp-too-short, both-parts-too-long); deliberately not split
further.

Every extern "C" entry's catch chain is rewritten to dispatch typed
subclasses to specific codes, with a final catch (...) so no C++
exception escapes the ABI boundary:

    catch (HrpTooLongError &)       -> E_BECH32_HRP_TOO_LONG
    catch (HrpTooShortError &)      -> E_BECH32_HRP_TOO_SHORT
    catch (MixedCaseError &)        -> E_BECH32_MIXED_CASE
    catch (ValuesOutOfRangeError &) -> E_BECH32_VALUES_OUT_OF_RANGE
    catch (InvalidChecksumError &)  -> E_BECH32_INVALID_CHECKSUM
    catch (Error &)                 -> E_BECH32_VALUES_OUT_OF_RANGE  (bundle)
    catch (std::bad_alloc &)        -> E_BECH32_NO_MEMORY
    catch (...)                     -> E_BECH32_UNKNOWN_ERROR

The bech32::Error catch is intentionally placed last among typed
catches so InvalidCharacterError, NoSeparatorError, BothPartsTooLongError,
and DataPartTooShortError fall through to it and collapse to
E_BECH32_VALUES_OUT_OF_RANGE.

bech32_errordesc[] is extended in lockstep with messages for the four
new codes (positions 6-9). Position-to-enum-value invariant preserved.

Defensive catch (...) added to the non-throwing _create_* helpers and
to bech32_stripUnknownChars so an allocator failure cannot unwind past
the ABI even on pathological systems. Doxygen on every new enum member
describes when each fires.

3/3 ctest green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entinel doc

Three structural fixes plus a doc clarification on the C ABI surface.

bech32_create_DecodedResult uses find_last_of(separator) instead of
find_first_of, matching the C++ decode path's findSeparatorPosition.
BIP-0173 defines the separator as the LAST '1' in the string — an HRP
containing '1' would otherwise size the output buffers incorrectly.

bech32_compute_encoded_string_length rejects hrplen > MAX_HRP_LENGTH or
dplen > MAX_BECH32_LENGTH by returning 0, making size_t overflow on the
addition impossible. Documented as a hard contract in the header
Doxygen comment. bech32_create_bstring upper-bounds hrplen/dplen before
the sizing call and refuses a 0 return from the helper.

Both _create_* helpers had a manual free-chain unwind: each allocation
that succeeded was undone in reverse via free() in the failure branch
of the next allocation. Adding a future field to either struct meant
editing every preceding unwind path — fragile and easy to get wrong.
Replace with two anon-namespace RAII guards (DecodedResultCleanup,
BstringCleanup) that own the partially-built struct and free it via
the matching public bech32_free_* entry (which already tolerates a
partially-populated struct with NULL fields) unless guard.release() is
called after the struct is fully built. Each helper now zero-initializes
every pointer field BEFORE arming the guard, so the destructor's free()
calls are always safe even on the narrowest failure path.

bech32_create_bstring_from_DecodedResult now validates BOTH hrplen AND
dplen before delegating; prior versions only checked hrplen, so an
attacker-shaped DecodedResult with a huge dplen could reach the sizing
math.

Doc-only on the decode sentinel: the failure-returns-empty pattern
stays (matches the C API behaviour and avoids a behaviour change for
existing callers), but every relevant doc surface now mandates that
callers MUST check 'encoding != Invalid' / 'encoding != ENCODING_INVALID'
before reading hrp/dp. Touched: C++ bech32::decode header doc, C
bech32_decode header doc, C++ DecodedResult struct doc, C
bech32_DecodedResult struct doc.

Migrate encode_whenCppMethodThrowsException_isUnsuccessful to allocate
its bstring manually — bech32_create_bstring(hrplen=84) now correctly
returns NULL, which would have masked the test's actual goal of
verifying the HrpTooLongError dispatch path through bech32_encode.

3/3 ctest green; behaviour unchanged on the doc bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e, plus shared encode helper, polymod warning, examples-not-installed

A grouped batch of small fixes and cleanups across bech32::detail, the
stripUnknownChars implementations, the C encode entry points, the
polymod definition, and the examples build.

ASCII-only case helpers: introduce is_ascii_upper / is_ascii_lower /
ascii_to_lower in bech32::detail. Replace every ::tolower / ::toupper /
::isupper / ::islower call site (in rejectBStringMixedCase,
convertToLowercase, stripUnknownChars). Bech32 strings are ASCII per
spec; locale-sensitive helpers can misclassify under e.g. a Turkish
locale (dotless-i / dotted-I).

Reject-chain reorder: in rejectBStringThatIsntWellFormed, run
rejectBStringValuesOutOfRange BEFORE rejectBStringMixedCase. With the
ASCII-only predicates this is redundant, but the order change matches
spec-level intent: bytes outside ASCII 33..126 are not bech32 characters
at all, so reporting them as 'out of range' is strictly more accurate
than reporting them as 'mixed case'.

npos guards: extractHumanReadablePart and extractDataPart throw
bech32::NoSeparatorError when findSeparatorPosition returns
std::string::npos. Without the check, substr(0, npos) silently returned
the whole input as 'hrp' (or 'dp'), corrupting downstream decode.

stripUnknownChars lowercases its output. Previously it kept the original
case (using ::tolower only for the membership test), so a clean-then-
decode pipeline could trip rejectBStringMixedCase on input that mixed
cases even though every surviving char was in the charset. Lowercasing
the output makes the pipeline self-contained: a caller can
stripUnknownChars(x) then decode() without a separate case-fold.
Migrated test_Bech32.cpp::Bech32Test.strip_unknown_chars to the new
expectations (3 vectors updated to lowercase output; one previously-
TODO case now has a defined answer).

bech32_stripUnknownChars C wrapper has three problems fixed: the
early-return 'dstlen > srclen' guard was backwards — it rejected the
exact case the header comment recommended (a destination buffer LARGER
than the source). The real size check is 'dstlen < result.size()+1'
inside the try block, which correctly compares the caller's buffer to
the stripped output's required size. Drop the backwards guard. The
std::string ctor used the single-arg form, which reads up to the first
embedded NUL and ignores srclen — a potential out-of-bounds read when
srclen claims fewer bytes than what the input actually contains. Switch
to the two-arg (ptr, len) form so exactly srclen bytes are read. The
ctor itself can throw std::bad_alloc and was outside the try block —
move it inside so the catch chain handles it like every other throwing
entry.

Extract the duplicated bodies of bech32_encode and
bech32_encode_using_original_constant into a single
c_encode_5bit_impl(use_original_constant) helper in the anonymous
namespace. The two public entries become 3-line forwarders. The catch
chain, the length check, and the NUL-terminate now live exactly once
instead of twice.

bech32_bstring_s Doxygen explicitly documents the +1 NUL invariant
(callers who construct bech32_bstring manually MUST allocate length + 1
bytes). bech32_create_bstring already does this; this documents the
contract for the manual path.

Prepend an 11-line comment block above the polymod definition explaining
that the 5 constants and the branchless XOR pattern are load-bearing
(BIP-0173 / BIP-0350 generator polynomial). Names sipa's BIP-0173
Python reference as the cross-validation source if the function is ever
touched.

examples/CMakeLists.txt: top-of-file comment makes the build-only status
explicit so a future contributor doesn't accidentally add an
install(TARGETS ...) rule and ship example binaries to the consumer's
install prefix.

3/3 ctest green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Bit / _5bit

BREAKING CHANGE.

Make the 8-bit data API the canonical surface on both C++ and C.

bech32::detail::convertBits — the shared bit-twiddling primitive —
repacks a vector of from_bits-wide values into a vector of to_bits-wide
values, MSB-first:

  encode: convertBits(input_8bit, 8, 5, /*pad=*/true)
  decode: convertBits(dp_5bit,    5, 8, /*pad=*/false)

The pad=false path on decode rejects non-zero leftover bits and >=8
leftover bits, which is the property that makes 8->5->8 round-trips
exact: the only valid 5-bit sequence that would re-decode to the
original byte string is the one our encode produced.

C++ entries:

  bech32::encode(hrp, data)     8-bit input. Performs 8->5 repacking
                                via convertBits(..., 8, 5, pad=true)
                                and delegates to the 5-bit core.
                                Bech32m-only by construction.
  bech32::decode(bstring)       8-bit output. Delegates to decode5Bit,
                                then performs 5->8 repacking via
                                convertBits(..., 5, 8, pad=false).
                                pad=false enforces exact round-trip
                                integrity (non-zero leftover padding
                                rejected with InvalidCharacterError).
                                On checksum failure / structural
                                reject the sentinel encoding == Invalid
                                path is preserved (result.dp empty).

  bech32::encode                       -> bech32::encode5Bit
  bech32::decode                       -> bech32::decode5Bit
  bech32::encodeUsingOriginalConstant  -> bech32::encode5BitUsingOriginalConstant

The 5-bit surface keeps both Bech32m and BIP-0173 original-constant
variants; the 8-bit surface deliberately exposes Bech32m only. Callers
that need BIP-0173 must go through encode5BitUsingOriginalConstant.
DecodedResult is shared between the two decode entries — the dp field's
data width is determined by the entry point that populated it.

C ABI mirrors the C++ shape:

  bech32_encode  / bech32_decode  -> 8-bit (Bech32m only)
  bech32_encode_5bit / bech32_decode_5bit /
    bech32_encode_5bit_using_original_constant
                                  -> renamed from the previous
                                     bech32_encode / bech32_decode /
                                     bech32_encode_using_original_constant

The 8-bit encode/decode entries forward to bech32::encode /
bech32::decode via two anonymous-namespace helpers (c_encode_8bit_impl
and the existing c_encode_5bit_impl); the catch chain is identical, so
all five public encode/decode entries report typed bech32_error codes
through the same dispatch.

bech32_compute_encoded_string_length_8bit — the 8-bit-input sizing
helper. Internally computes the ceil(dlen * 8 / 5) 5-bit expansion and
delegates to bech32_compute_encoded_string_length. Returns 0 on oversize
inputs (same sentinel contract as the 5-bit sizer).

Mechanical call-site migration for the breaking rename: every call in
the six example programs and the three test executables was 5-bit data;
the rename preserves the exact semantics with the new names. Examples
additionally migrate the badEncoding() assertion to the current typed-
exception message / error code (the error surface widened when the
typed exception hierarchy landed; the previous raw std::runtime_error
collapsed to the E_BECH32_UNKNOWN_ERROR fallback). Without this the
example binaries would assert at runtime.

The buffer-out pattern (bech32_bstring / bech32_DecodedResult) is
unchanged for both the new 8-bit entries and the renamed 5-bit entries.

3/3 ctest green; all six example programs run to completion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ectors, convertBits cross-ref

Cross-port-portable reference vectors and broad coverage across both
C++ and C test suites.

Add reference_vectors.{h,inc} — 14 hand-picked (hrp, 5-bit-dp) ->
expected-bech32-string vectors. Vectors are sourced from sipa's BIP-0173
Python reference implementation
(https://github.com/sipa/bech32/blob/master/ref/python/segwit_addr.py)
and cross-verified against the BIP-0173 / BIP-0350 'Test vectors'
sections — NOT round-tripped from libbech32 itself, which would be
circular. Coverage:
  - 5 BIP-0350 canonical (Bech32m), including maxlen
  - 2 BIP-0173 canonical (Bech32 original-constant)
  - 1 BIP-0173 HRP-contains-1 vector (find_last_of separator path)
  - 6 hand-picked Bech32m: short HRP, 83-char HRP, multi-separator HRP,
    short-HRP-long-dp

C++ test additions (bech32_api_tests.cpp):

  - decode_hrpContainsSeparator_isSuccessful — BIP-0173 HRP-contains-1
    vector. The HRP itself ends in '1' and the bech32 string contains
    multiple '1' characters near the end; correct decoding requires
    splitting on the LAST '1' (find_last_of). Verifies the encoded
    result is Bech32m (the canonical vector for this 83-char HRP).
  - encode_decode_8bit_roundTrip_isConsistent — round-trip through the
    8-bit entry points. Goes 8-bit -> bech32m -> 8-bit and asserts the
    payload is byte-identical. Bech32m only by construction; the 5-bit
    surface is unaffected.
  - 9 TypedExceptions gtest blocks (one per reachable bech32::Error
    subclass) plus a rapidcheck round-trip property for arbitrary
    byte payloads.
  - Reference-vector iteration walking REFERENCE_VECTORS from
    reference_vectors.inc and asserting the library produces exactly
    the expected encoded string for each.
  - Test-list registration in tests_main() and the runner.

C test additions (bech32_c_api_tests.c):

  - decode_hrpContainsSeparator_isSuccessful: decodes the BIP-0173
    canonical HRP-contains-1 vector via bech32_decode_5bit; asserts
    encoding=Bech32m, hrplen=83, hrp contains '1'.
  - encode_decode_8bit_roundTrip_isConsistent: round-trip through
    bech32_encode / bech32_decode plus the
    bech32_compute_encoded_string_length_8bit sizing helper. Asserts
    the decoded payload is byte-identical to input.
  - Four error-code dispatch tests, one per new typed bech32_error
    code (HRP_TOO_LONG, HRP_TOO_SHORT, MIXED_CASE, VALUES_OUT_OF_RANGE).
    Each verifies the C call returns the typed code (not the generic
    UNKNOWN_ERROR fallback) and bech32_strerror() returns the matching
    non-empty string.
  - Reference-vector iteration walking REFERENCE_VECTORS and asserting
    bech32_encode_5bit / bech32_encode_5bit_using_original_constant
    produce exactly the expected encoded string for each.

convertBits cross-reference (test_Bech32.cpp):

  - Adds a 70-case fixture (50 8->5 cases + 20 5->8 round-trips)
    generated by running sipa's Python reference offline against a
    deterministic random corpus plus explicit edge-case lengths
    (0, 1, 5, 16, 20, 32, 55, 100). The fixture is committed in-tree
    as convert_bits_cross_ref.inc; the test runs purely on committed
    data.
  - Two new gtest blocks:
      ConvertBitsCrossRef.MatchesReference_8to5_padTrue
      ConvertBitsCrossRef.MatchesReference_5to8_padFalse
    A regression in convertBits cannot silently match the fixture
    because the fixture was produced by an *independent* implementation.
    Empty-input cases use a single-byte placeholder array (C forbids
    zero-size arrays) and the explicit *_len literal from the fixture
    table.
  - rapidcheck round-trip property re-asserted with broader coverage
    to give the property test more random shapes.

3/3 ctest green; new tests visible in their respective suite outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase tarball, workflow_dispatch)

Four workflows under .github/workflows/, all triggered on push and
pull_request unless noted.

ci.yml — Linux gcc + Linux clang matrix:
  - Checks out the repo with submodules (test/googletest, test/rapidcheck).
  - Caches the submodule checkouts keyed on .gitmodules + each
    submodule's HEAD SHA so successive runs skip re-cloning.
  - cmake -S . -B build / cmake --build build -j /
    ctest -V --output-on-failure.
  - Runs each example binary as a smoke check (no install rule, so
    this is the only place they execute on CI).
  - workflow_dispatch trigger so workflows can be fired manually from
    the GitHub UI for debugging.
  - Single-OS matrix for now (Linux). Windows MSVC + macOS AppleClang
    are explicit follow-ups.

sanitizer.yml — Linux clang ASan + UBSan job:
  - Runs the full ctest suite with address and undefined-behaviour
    sanitizers enabled.
  - halt_on_error=1: any leak, out-of-bounds access, or undefined
    behaviour fails the workflow and blocks merge.
  - Auto-coverage for the C-API memory paths exercised by ctest
    (every _create_* helper allocation path runs under both
    sanitizers).

clang-tidy.yml — Linux clang-tidy run with
  -checks=-*,bugprone-*,cert-*,clang-analyzer-* across the source
  tree. Advisory-only — uses continue-on-error: true plus '|| true'
  (defensive double-guard) so any findings are reported without
  failing the workflow. Promotion to gated (fail-on-findings) is
  deferred.

release.yml — triggered on git tag push (not on push or pull_request).
  Builds the static library + public header and uploads a versioned
  tarball as a GitHub release asset via softprops/action-gh-release@v2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive documentation pass.

Public header Doxygen audit: every public declaration in bech32.h now
has an accurate, current Doxygen block with @param / @return / @throws
as appropriate.
  - bech32::Error and the nine typed subclasses get descriptive
    class-level blocks (when each is thrown, what its what() message
    conveys).
  - Encoding enum values document Bech32 vs Bech32m and the Invalid
    sentinel.
  - DecodedResult and bech32_DecodedResult document the per-field
    semantics and the encoding-Invalid sentinel contract.
  - stripUnknownChars documents the lowercase-output guarantee.
  - 8-bit encode / decode entries (C++ and C) document the 8->5 / 5->8
    repacking, the Bech32m-only restriction, and the
    sentinel-on-invalid return contract.
  - 5-bit encode / decode entries document the renamed surface.
  - bech32_compute_encoded_string_length{,_8bit}: document the 0
    sentinel on oversize inputs and the caller's MUST-check obligation.

README rewrite for the post-cleanup library:
  - Tooling floor bumped to C++17 + CMake >= 3.22 (drops the old
    C++11 / CMake 3.14 language).
  - Canonical out-of-tree build / ctest snippet.
  - Single 8-bit-vs-5-bit API table covering both C++ and C surfaces.
  - Worked C++ and C 8-bit encode / decode examples.
  - Typed exception hierarchy listed with their what() messages.
  - bech32_error code table covering the four new codes.
  - Reference test vectors under
    test/testbech32/reference_vectors.{h,inc}.
  - Breaking-changes note for the next major release.

Internal comment sweep in bech32.cpp:
  - Top-of-file comment anchoring the bech32_detail.h split and
    identifying this TU as the implementation file that pulls those
    helpers back in via 'using namespace bech32::detail'.
  - Removed the last literal 'encodeUsingOriginalConstant' reference
    in the 5-bit-surface section-header comment (the named symbol was
    renamed).

bech32_detail.h audit: polymod warning block accurate; reject-chain
comment correctly describes the new order; locale / ASCII-only
commentary accurate; no stale content found.

Comments-only / docs-only change; no behaviour change. 3/3 ctest green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The strerror_withInvalidErrorCode_returnsUnknownErrorMessage test
deliberately passes a numeric code (1234) outside the bech32_error
range to exercise the unknown-error fallback. With the parameter
typed as 'bech32_error', UBSan's -fsanitize=enum check fires before
the body's range check has a chance — loading any value through an
enum-typed lvalue that is not a valid enumerator is undefined
behaviour in C/C++.

Change the parameter type to plain 'int'. Same calling convention,
same body semantics; the existing comparisons against the enum
constants work via the standard int-to-enum implicit conversion.
Existing callers passing a 'bech32_error' value continue to compile
and run correctly via the implicit enum-to-int conversion.

This is a pre-existing latent bug in the library that the new
sanitizer CI surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant