Skip to content

refactor: merge 4 C types into single XXHASHObject type#155

Open
ifduyue wants to merge 6 commits into
masterfrom
one-type-multiple-algo
Open

refactor: merge 4 C types into single XXHASHObject type#155
ifduyue wants to merge 6 commits into
masterfrom
one-type-multiple-algo

Conversation

@ifduyue
Copy link
Copy Markdown
Owner

@ifduyue ifduyue commented May 30, 2026

Following CPython _hashlib's pattern (one HASH type for all algorithms), merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object, PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type.

Key changes:

  • Single struct with XXH_Algo enum + union of state pointers
  • Single set of tp_* methods with algo-based dispatch (switch)
  • Single methods/getseters/slots/spec (was 4 copies of each)
  • 4 module-level constructor functions (xxh32, xxh64, xxh3_64, xxh3_128) sharing a single _xxhash_new helper
  • One-shot functions consolidated via DEFINE_ONESHOT macro
  • PyCFunction constructors replace 4 separate type objects
  • Updated .pyi stubs: constructors are functions, not @Final classes

Results:

  • C source reduced by 37% (2271 -> 1421 lines, -845 lines)
  • Zero warnings on all supported Python versions (3.9-3.15, 3.14t)
  • All 124 tests pass across all versions

Following CPython _hashlib's pattern (one HASH type for all algorithms),
merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object,
PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type.

Key changes:
- Single struct with XXH_Algo enum + union of state pointers
- Single set of tp_* methods with algo-based dispatch (switch)
- Single methods/getseters/slots/spec (was 4 copies of each)
- 4 module-level constructor functions (xxh32, xxh64, xxh3_64, xxh3_128)
  sharing a single _xxhash_new helper
- One-shot functions consolidated via DEFINE_ONESHOT macro
- PyCFunction constructors replace 4 separate type objects
- Updated .pyi stubs: constructors are functions, not @Final classes

Results:
- C source reduced by 37% (2271 -> 1421 lines, -845 lines)
- Zero warnings on all supported Python versions (3.9-3.15, 3.14t)
- All 124 tests pass across all versions
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the type stubs in xxhash/__init__.pyi to correctly define the xxh* constructors as functions returning a _Hasher instance rather than distinct classes, and updates the subinterpreter isolation tests accordingly. A review comment correctly identifies an incorrect alias assignment where xxh64 is assigned to xxh3_64 despite already being defined as a distinct function.

Comment thread xxhash/__init__.pyi Outdated
Comment on lines +73 to +74
# Aliases
xxh64 = xxh3_64 # type: ignore[assignment]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The assignment xxh64 = xxh3_64 is incorrect. xxh64 is a distinct algorithm and has already been correctly defined as a function on line 69. It is not an alias of xxh3_64 (unlike xxh128 which is an alias of xxh3_128). This assignment should be removed to prevent type-checking errors and confusion.

Suggested change
# Aliases
xxh64 = xxh3_64 # type: ignore[assignment]
# Aliases

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented May 30, 2026

Merging this PR will not alter performance

✅ 148 untouched benchmarks
⏩ 66 skipped benchmarks1


Comparing one-type-multiple-algo (beaf437) with master (945da72)

Open in CodSpeed

Footnotes

  1. 66 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

ifduyue added 2 commits May 30, 2026 16:27
Replace the generic _xxhash_oneshot() dispatcher (switch(algo) x switch(return_type))
with fully specialized per-algorithm/per-return-type functions generated by
DEFINE_ONESHOT_INT and DEFINE_ONESHOT_DH macros. Each function directly calls
the right xxhash C function and formats the result, avoiding the double-dispatch
overhead.

Also extract _xxhash_hexdigest_from_bytes() and _xxhash_xxh128_intdigest_result()
helpers, and remove the unused XXH_OneshotDispatch struct.

Benchmark result (xxh32_hexdigest, 1KB data):
  Before: 421.2 ns/call
  After:  353.3 ns/call  (~16% faster)
…l methods

Define XXH_AlgoDispatch (create_state, free_state, reset, update,
digest, intdigest, copy_state) and a const XXH_DISPATCH[] array
indexed by XXH_Algo, eliminating all 10 switch(self->algo) statements
in the stateful path (_xxhash_init_state, _xxhash_free_state,
_xxhash_reset_state, _xxhash_do_update, _xxhash_new, XXHASH_init,
XXHASH_digest, XXHASH_hexdigest, XXHASH_intdigest, XXHASH_copy).

Also remove the unused _free_module callback (PyType_FromModuleAndSpec
ties heap type lifetime to the module automatically).

Benchmarks show performance at parity or slightly improved
(constructor+hexdigest ~6-20% faster).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant