Review cleanup#1
Open
danpape wants to merge 10 commits into
Open
Conversation
- Top-level cmake_minimum_required bumped 3.14 -> 3.22 - libbech32 library target: cxx_std_11 -> cxx_std_17 - All three C++ example targets: cxx_std_11 -> cxx_std_17 - UnitTests_bech32 and bech32_api_tests: cxx_std_11 -> cxx_std_17 - C99 example/test targets (c_std_99) untouched - Version macros (LIBBECH32_VERSION_*) untouched; bump on next major release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull the file-level constants, validators, HRP/DP utilities, and Bech32 core math out of the anonymous namespace at the top of bech32.cpp into a new internal header bech32_detail.h under namespace bech32::detail. bech32.cpp now includes the header and pulls bech32::detail into the implementation namespace via 'using namespace bech32::detail;' so existing unqualified references continue to compile. The original anon namespace also held a 'using namespace bech32::limits;' directive whose effect leaked into file scope, satisfying unqualified references inside the C wrappers. The new header puts the directive inside namespace bech32::detail (where it belongs), which does not leak those names into file scope. Re-injecting at file scope preserves the existing behaviour without editing the C wrappers. The white-box test TU (test_Bech32.cpp) replaces the legacy '#include "bech32.cpp"' with '#include "bech32_detail.h"'. Every internal helper the test file exercises (rejectBString*, rejectHRP*, rejectDP*, rejectDataValuesOutOfRange, rejectBothPartsTooLong, findSeparatorPosition, extractHumanReadablePart, extractDataPart, convertToLowercase, mapDP, expandHrp, polymod, verifyChecksum, createChecksum) is now reached through namespace bech32::detail. A single 'using namespace bech32::detail;' near the top of the file keeps every existing unqualified call site working. Inline definitions in the new header are ODR-safe across the two including translation units (bech32.cpp and test_Bech32.cpp). 3/3 ctest green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Public C++ API gains a typed exception hierarchy and the C ABI gains
matching error codes; every throwing entry point dispatches.
bech32::Error : std::runtime_error becomes the base class for every
library-thrown exception. Existing std::runtime_error / std::exception
catches keep working via inheritance. Nine typed subclasses, one per
distinct invariant violation:
HrpTooLongError, HrpTooShortError, MixedCaseError,
ValuesOutOfRangeError (with optional message), BothPartsTooLongError,
DataPartTooShortError, NoSeparatorError, InvalidCharacterError,
InvalidChecksumError.
Throw sites in bech32::detail (rejectHRPTooShort, rejectHRPTooLong,
rejectBothPartsTooLong, rejectDPTooShort, rejectDataValuesOutOfRange,
rejectBStringMixedCase, rejectBStringValuesOutOfRange,
rejectBStringTooShort, rejectBStringTooLong,
rejectBStringWithNoSeparator, mapDP, mapToCharset) and the one stray
throw inside bech32::encodeBasis are migrated to the matching subclass.
The -TooShort / -TooLong bstring cases reuse ValuesOutOfRangeError with
distinctive what() messages rather than getting their own subclasses.
bech32_error gains four new codes inserted before E_BECH32_MAX_ERROR
and after E_BECH32_NO_MEMORY so existing ordinals 0-5 remain stable:
E_BECH32_HRP_TOO_LONG, E_BECH32_HRP_TOO_SHORT, E_BECH32_MIXED_CASE,
E_BECH32_VALUES_OUT_OF_RANGE. E_BECH32_VALUES_OUT_OF_RANGE is a single
bundled catch-all for the six range-style violations (5-bit values
>= 32, ASCII out-of-range bytes, invalid charset char, missing
separator, dp-too-short, both-parts-too-long); deliberately not split
further.
Every extern "C" entry's catch chain is rewritten to dispatch typed
subclasses to specific codes, with a final catch (...) so no C++
exception escapes the ABI boundary:
catch (HrpTooLongError &) -> E_BECH32_HRP_TOO_LONG
catch (HrpTooShortError &) -> E_BECH32_HRP_TOO_SHORT
catch (MixedCaseError &) -> E_BECH32_MIXED_CASE
catch (ValuesOutOfRangeError &) -> E_BECH32_VALUES_OUT_OF_RANGE
catch (InvalidChecksumError &) -> E_BECH32_INVALID_CHECKSUM
catch (Error &) -> E_BECH32_VALUES_OUT_OF_RANGE (bundle)
catch (std::bad_alloc &) -> E_BECH32_NO_MEMORY
catch (...) -> E_BECH32_UNKNOWN_ERROR
The bech32::Error catch is intentionally placed last among typed
catches so InvalidCharacterError, NoSeparatorError, BothPartsTooLongError,
and DataPartTooShortError fall through to it and collapse to
E_BECH32_VALUES_OUT_OF_RANGE.
bech32_errordesc[] is extended in lockstep with messages for the four
new codes (positions 6-9). Position-to-enum-value invariant preserved.
Defensive catch (...) added to the non-throwing _create_* helpers and
to bech32_stripUnknownChars so an allocator failure cannot unwind past
the ABI even on pathological systems. Doxygen on every new enum member
describes when each fires.
3/3 ctest green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entinel doc Three structural fixes plus a doc clarification on the C ABI surface. bech32_create_DecodedResult uses find_last_of(separator) instead of find_first_of, matching the C++ decode path's findSeparatorPosition. BIP-0173 defines the separator as the LAST '1' in the string — an HRP containing '1' would otherwise size the output buffers incorrectly. bech32_compute_encoded_string_length rejects hrplen > MAX_HRP_LENGTH or dplen > MAX_BECH32_LENGTH by returning 0, making size_t overflow on the addition impossible. Documented as a hard contract in the header Doxygen comment. bech32_create_bstring upper-bounds hrplen/dplen before the sizing call and refuses a 0 return from the helper. Both _create_* helpers had a manual free-chain unwind: each allocation that succeeded was undone in reverse via free() in the failure branch of the next allocation. Adding a future field to either struct meant editing every preceding unwind path — fragile and easy to get wrong. Replace with two anon-namespace RAII guards (DecodedResultCleanup, BstringCleanup) that own the partially-built struct and free it via the matching public bech32_free_* entry (which already tolerates a partially-populated struct with NULL fields) unless guard.release() is called after the struct is fully built. Each helper now zero-initializes every pointer field BEFORE arming the guard, so the destructor's free() calls are always safe even on the narrowest failure path. bech32_create_bstring_from_DecodedResult now validates BOTH hrplen AND dplen before delegating; prior versions only checked hrplen, so an attacker-shaped DecodedResult with a huge dplen could reach the sizing math. Doc-only on the decode sentinel: the failure-returns-empty pattern stays (matches the C API behaviour and avoids a behaviour change for existing callers), but every relevant doc surface now mandates that callers MUST check 'encoding != Invalid' / 'encoding != ENCODING_INVALID' before reading hrp/dp. Touched: C++ bech32::decode header doc, C bech32_decode header doc, C++ DecodedResult struct doc, C bech32_DecodedResult struct doc. Migrate encode_whenCppMethodThrowsException_isUnsuccessful to allocate its bstring manually — bech32_create_bstring(hrplen=84) now correctly returns NULL, which would have masked the test's actual goal of verifying the HrpTooLongError dispatch path through bech32_encode. 3/3 ctest green; behaviour unchanged on the doc bullet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e, plus shared encode helper, polymod warning, examples-not-installed A grouped batch of small fixes and cleanups across bech32::detail, the stripUnknownChars implementations, the C encode entry points, the polymod definition, and the examples build. ASCII-only case helpers: introduce is_ascii_upper / is_ascii_lower / ascii_to_lower in bech32::detail. Replace every ::tolower / ::toupper / ::isupper / ::islower call site (in rejectBStringMixedCase, convertToLowercase, stripUnknownChars). Bech32 strings are ASCII per spec; locale-sensitive helpers can misclassify under e.g. a Turkish locale (dotless-i / dotted-I). Reject-chain reorder: in rejectBStringThatIsntWellFormed, run rejectBStringValuesOutOfRange BEFORE rejectBStringMixedCase. With the ASCII-only predicates this is redundant, but the order change matches spec-level intent: bytes outside ASCII 33..126 are not bech32 characters at all, so reporting them as 'out of range' is strictly more accurate than reporting them as 'mixed case'. npos guards: extractHumanReadablePart and extractDataPart throw bech32::NoSeparatorError when findSeparatorPosition returns std::string::npos. Without the check, substr(0, npos) silently returned the whole input as 'hrp' (or 'dp'), corrupting downstream decode. stripUnknownChars lowercases its output. Previously it kept the original case (using ::tolower only for the membership test), so a clean-then- decode pipeline could trip rejectBStringMixedCase on input that mixed cases even though every surviving char was in the charset. Lowercasing the output makes the pipeline self-contained: a caller can stripUnknownChars(x) then decode() without a separate case-fold. Migrated test_Bech32.cpp::Bech32Test.strip_unknown_chars to the new expectations (3 vectors updated to lowercase output; one previously- TODO case now has a defined answer). bech32_stripUnknownChars C wrapper has three problems fixed: the early-return 'dstlen > srclen' guard was backwards — it rejected the exact case the header comment recommended (a destination buffer LARGER than the source). The real size check is 'dstlen < result.size()+1' inside the try block, which correctly compares the caller's buffer to the stripped output's required size. Drop the backwards guard. The std::string ctor used the single-arg form, which reads up to the first embedded NUL and ignores srclen — a potential out-of-bounds read when srclen claims fewer bytes than what the input actually contains. Switch to the two-arg (ptr, len) form so exactly srclen bytes are read. The ctor itself can throw std::bad_alloc and was outside the try block — move it inside so the catch chain handles it like every other throwing entry. Extract the duplicated bodies of bech32_encode and bech32_encode_using_original_constant into a single c_encode_5bit_impl(use_original_constant) helper in the anonymous namespace. The two public entries become 3-line forwarders. The catch chain, the length check, and the NUL-terminate now live exactly once instead of twice. bech32_bstring_s Doxygen explicitly documents the +1 NUL invariant (callers who construct bech32_bstring manually MUST allocate length + 1 bytes). bech32_create_bstring already does this; this documents the contract for the manual path. Prepend an 11-line comment block above the polymod definition explaining that the 5 constants and the branchless XOR pattern are load-bearing (BIP-0173 / BIP-0350 generator polynomial). Names sipa's BIP-0173 Python reference as the cross-validation source if the function is ever touched. examples/CMakeLists.txt: top-of-file comment makes the build-only status explicit so a future contributor doesn't accidentally add an install(TARGETS ...) rule and ship example binaries to the consumer's install prefix. 3/3 ctest green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Bit / _5bit
BREAKING CHANGE.
Make the 8-bit data API the canonical surface on both C++ and C.
bech32::detail::convertBits — the shared bit-twiddling primitive —
repacks a vector of from_bits-wide values into a vector of to_bits-wide
values, MSB-first:
encode: convertBits(input_8bit, 8, 5, /*pad=*/true)
decode: convertBits(dp_5bit, 5, 8, /*pad=*/false)
The pad=false path on decode rejects non-zero leftover bits and >=8
leftover bits, which is the property that makes 8->5->8 round-trips
exact: the only valid 5-bit sequence that would re-decode to the
original byte string is the one our encode produced.
C++ entries:
bech32::encode(hrp, data) 8-bit input. Performs 8->5 repacking
via convertBits(..., 8, 5, pad=true)
and delegates to the 5-bit core.
Bech32m-only by construction.
bech32::decode(bstring) 8-bit output. Delegates to decode5Bit,
then performs 5->8 repacking via
convertBits(..., 5, 8, pad=false).
pad=false enforces exact round-trip
integrity (non-zero leftover padding
rejected with InvalidCharacterError).
On checksum failure / structural
reject the sentinel encoding == Invalid
path is preserved (result.dp empty).
bech32::encode -> bech32::encode5Bit
bech32::decode -> bech32::decode5Bit
bech32::encodeUsingOriginalConstant -> bech32::encode5BitUsingOriginalConstant
The 5-bit surface keeps both Bech32m and BIP-0173 original-constant
variants; the 8-bit surface deliberately exposes Bech32m only. Callers
that need BIP-0173 must go through encode5BitUsingOriginalConstant.
DecodedResult is shared between the two decode entries — the dp field's
data width is determined by the entry point that populated it.
C ABI mirrors the C++ shape:
bech32_encode / bech32_decode -> 8-bit (Bech32m only)
bech32_encode_5bit / bech32_decode_5bit /
bech32_encode_5bit_using_original_constant
-> renamed from the previous
bech32_encode / bech32_decode /
bech32_encode_using_original_constant
The 8-bit encode/decode entries forward to bech32::encode /
bech32::decode via two anonymous-namespace helpers (c_encode_8bit_impl
and the existing c_encode_5bit_impl); the catch chain is identical, so
all five public encode/decode entries report typed bech32_error codes
through the same dispatch.
bech32_compute_encoded_string_length_8bit — the 8-bit-input sizing
helper. Internally computes the ceil(dlen * 8 / 5) 5-bit expansion and
delegates to bech32_compute_encoded_string_length. Returns 0 on oversize
inputs (same sentinel contract as the 5-bit sizer).
Mechanical call-site migration for the breaking rename: every call in
the six example programs and the three test executables was 5-bit data;
the rename preserves the exact semantics with the new names. Examples
additionally migrate the badEncoding() assertion to the current typed-
exception message / error code (the error surface widened when the
typed exception hierarchy landed; the previous raw std::runtime_error
collapsed to the E_BECH32_UNKNOWN_ERROR fallback). Without this the
example binaries would assert at runtime.
The buffer-out pattern (bech32_bstring / bech32_DecodedResult) is
unchanged for both the new 8-bit entries and the renamed 5-bit entries.
3/3 ctest green; all six example programs run to completion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ectors, convertBits cross-ref
Cross-port-portable reference vectors and broad coverage across both
C++ and C test suites.
Add reference_vectors.{h,inc} — 14 hand-picked (hrp, 5-bit-dp) ->
expected-bech32-string vectors. Vectors are sourced from sipa's BIP-0173
Python reference implementation
(https://github.com/sipa/bech32/blob/master/ref/python/segwit_addr.py)
and cross-verified against the BIP-0173 / BIP-0350 'Test vectors'
sections — NOT round-tripped from libbech32 itself, which would be
circular. Coverage:
- 5 BIP-0350 canonical (Bech32m), including maxlen
- 2 BIP-0173 canonical (Bech32 original-constant)
- 1 BIP-0173 HRP-contains-1 vector (find_last_of separator path)
- 6 hand-picked Bech32m: short HRP, 83-char HRP, multi-separator HRP,
short-HRP-long-dp
C++ test additions (bech32_api_tests.cpp):
- decode_hrpContainsSeparator_isSuccessful — BIP-0173 HRP-contains-1
vector. The HRP itself ends in '1' and the bech32 string contains
multiple '1' characters near the end; correct decoding requires
splitting on the LAST '1' (find_last_of). Verifies the encoded
result is Bech32m (the canonical vector for this 83-char HRP).
- encode_decode_8bit_roundTrip_isConsistent — round-trip through the
8-bit entry points. Goes 8-bit -> bech32m -> 8-bit and asserts the
payload is byte-identical. Bech32m only by construction; the 5-bit
surface is unaffected.
- 9 TypedExceptions gtest blocks (one per reachable bech32::Error
subclass) plus a rapidcheck round-trip property for arbitrary
byte payloads.
- Reference-vector iteration walking REFERENCE_VECTORS from
reference_vectors.inc and asserting the library produces exactly
the expected encoded string for each.
- Test-list registration in tests_main() and the runner.
C test additions (bech32_c_api_tests.c):
- decode_hrpContainsSeparator_isSuccessful: decodes the BIP-0173
canonical HRP-contains-1 vector via bech32_decode_5bit; asserts
encoding=Bech32m, hrplen=83, hrp contains '1'.
- encode_decode_8bit_roundTrip_isConsistent: round-trip through
bech32_encode / bech32_decode plus the
bech32_compute_encoded_string_length_8bit sizing helper. Asserts
the decoded payload is byte-identical to input.
- Four error-code dispatch tests, one per new typed bech32_error
code (HRP_TOO_LONG, HRP_TOO_SHORT, MIXED_CASE, VALUES_OUT_OF_RANGE).
Each verifies the C call returns the typed code (not the generic
UNKNOWN_ERROR fallback) and bech32_strerror() returns the matching
non-empty string.
- Reference-vector iteration walking REFERENCE_VECTORS and asserting
bech32_encode_5bit / bech32_encode_5bit_using_original_constant
produce exactly the expected encoded string for each.
convertBits cross-reference (test_Bech32.cpp):
- Adds a 70-case fixture (50 8->5 cases + 20 5->8 round-trips)
generated by running sipa's Python reference offline against a
deterministic random corpus plus explicit edge-case lengths
(0, 1, 5, 16, 20, 32, 55, 100). The fixture is committed in-tree
as convert_bits_cross_ref.inc; the test runs purely on committed
data.
- Two new gtest blocks:
ConvertBitsCrossRef.MatchesReference_8to5_padTrue
ConvertBitsCrossRef.MatchesReference_5to8_padFalse
A regression in convertBits cannot silently match the fixture
because the fixture was produced by an *independent* implementation.
Empty-input cases use a single-byte placeholder array (C forbids
zero-size arrays) and the explicit *_len literal from the fixture
table.
- rapidcheck round-trip property re-asserted with broader coverage
to give the property test more random shapes.
3/3 ctest green; new tests visible in their respective suite outputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase tarball, workflow_dispatch)
Four workflows under .github/workflows/, all triggered on push and
pull_request unless noted.
ci.yml — Linux gcc + Linux clang matrix:
- Checks out the repo with submodules (test/googletest, test/rapidcheck).
- Caches the submodule checkouts keyed on .gitmodules + each
submodule's HEAD SHA so successive runs skip re-cloning.
- cmake -S . -B build / cmake --build build -j /
ctest -V --output-on-failure.
- Runs each example binary as a smoke check (no install rule, so
this is the only place they execute on CI).
- workflow_dispatch trigger so workflows can be fired manually from
the GitHub UI for debugging.
- Single-OS matrix for now (Linux). Windows MSVC + macOS AppleClang
are explicit follow-ups.
sanitizer.yml — Linux clang ASan + UBSan job:
- Runs the full ctest suite with address and undefined-behaviour
sanitizers enabled.
- halt_on_error=1: any leak, out-of-bounds access, or undefined
behaviour fails the workflow and blocks merge.
- Auto-coverage for the C-API memory paths exercised by ctest
(every _create_* helper allocation path runs under both
sanitizers).
clang-tidy.yml — Linux clang-tidy run with
-checks=-*,bugprone-*,cert-*,clang-analyzer-* across the source
tree. Advisory-only — uses continue-on-error: true plus '|| true'
(defensive double-guard) so any findings are reported without
failing the workflow. Promotion to gated (fail-on-findings) is
deferred.
release.yml — triggered on git tag push (not on push or pull_request).
Builds the static library + public header and uploads a versioned
tarball as a GitHub release asset via softprops/action-gh-release@v2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive documentation pass. Public header Doxygen audit: every public declaration in bech32.h now has an accurate, current Doxygen block with @param / @return / @throws as appropriate. - bech32::Error and the nine typed subclasses get descriptive class-level blocks (when each is thrown, what its what() message conveys). - Encoding enum values document Bech32 vs Bech32m and the Invalid sentinel. - DecodedResult and bech32_DecodedResult document the per-field semantics and the encoding-Invalid sentinel contract. - stripUnknownChars documents the lowercase-output guarantee. - 8-bit encode / decode entries (C++ and C) document the 8->5 / 5->8 repacking, the Bech32m-only restriction, and the sentinel-on-invalid return contract. - 5-bit encode / decode entries document the renamed surface. - bech32_compute_encoded_string_length{,_8bit}: document the 0 sentinel on oversize inputs and the caller's MUST-check obligation. README rewrite for the post-cleanup library: - Tooling floor bumped to C++17 + CMake >= 3.22 (drops the old C++11 / CMake 3.14 language). - Canonical out-of-tree build / ctest snippet. - Single 8-bit-vs-5-bit API table covering both C++ and C surfaces. - Worked C++ and C 8-bit encode / decode examples. - Typed exception hierarchy listed with their what() messages. - bech32_error code table covering the four new codes. - Reference test vectors under test/testbech32/reference_vectors.{h,inc}. - Breaking-changes note for the next major release. Internal comment sweep in bech32.cpp: - Top-of-file comment anchoring the bech32_detail.h split and identifying this TU as the implementation file that pulls those helpers back in via 'using namespace bech32::detail'. - Removed the last literal 'encodeUsingOriginalConstant' reference in the 5-bit-surface section-header comment (the named symbol was renamed). bech32_detail.h audit: polymod warning block accurate; reject-chain comment correctly describes the new order; locale / ASCII-only commentary accurate; no stale content found. Comments-only / docs-only change; no behaviour change. 3/3 ctest green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The strerror_withInvalidErrorCode_returnsUnknownErrorMessage test deliberately passes a numeric code (1234) outside the bech32_error range to exercise the unknown-error fallback. With the parameter typed as 'bech32_error', UBSan's -fsanitize=enum check fires before the body's range check has a chance — loading any value through an enum-typed lvalue that is not a valid enumerator is undefined behaviour in C/C++. Change the parameter type to plain 'int'. Same calling convention, same body semantics; the existing comparisons against the enum constants work via the standard int-to-enum implicit conversion. Existing callers passing a 'bech32_error' value continue to compile and run correctly via the implicit enum-to-int conversion. This is a pre-existing latent bug in the library that the new sanitizer CI surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
trying to test workflows