refactor: merge 4 C types into single XXHASHObject type#155
Conversation
Following CPython _hashlib's pattern (one HASH type for all algorithms), merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object, PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type. Key changes: - Single struct with XXH_Algo enum + union of state pointers - Single set of tp_* methods with algo-based dispatch (switch) - Single methods/getseters/slots/spec (was 4 copies of each) - 4 module-level constructor functions (xxh32, xxh64, xxh3_64, xxh3_128) sharing a single _xxhash_new helper - One-shot functions consolidated via DEFINE_ONESHOT macro - PyCFunction constructors replace 4 separate type objects - Updated .pyi stubs: constructors are functions, not @Final classes Results: - C source reduced by 37% (2271 -> 1421 lines, -845 lines) - Zero warnings on all supported Python versions (3.9-3.15, 3.14t) - All 124 tests pass across all versions
There was a problem hiding this comment.
Code Review
This pull request updates the type stubs in xxhash/__init__.pyi to correctly define the xxh* constructors as functions returning a _Hasher instance rather than distinct classes, and updates the subinterpreter isolation tests accordingly. A review comment correctly identifies an incorrect alias assignment where xxh64 is assigned to xxh3_64 despite already being defined as a distinct function.
| # Aliases | ||
| xxh64 = xxh3_64 # type: ignore[assignment] |
There was a problem hiding this comment.
The assignment xxh64 = xxh3_64 is incorrect. xxh64 is a distinct algorithm and has already been correctly defined as a function on line 69. It is not an alias of xxh3_64 (unlike xxh128 which is an alias of xxh3_128). This assignment should be removed to prevent type-checking errors and confusion.
| # Aliases | |
| xxh64 = xxh3_64 # type: ignore[assignment] | |
| # Aliases |
Merging this PR will not alter performance
Comparing Footnotes
|
Replace the generic _xxhash_oneshot() dispatcher (switch(algo) x switch(return_type)) with fully specialized per-algorithm/per-return-type functions generated by DEFINE_ONESHOT_INT and DEFINE_ONESHOT_DH macros. Each function directly calls the right xxhash C function and formats the result, avoiding the double-dispatch overhead. Also extract _xxhash_hexdigest_from_bytes() and _xxhash_xxh128_intdigest_result() helpers, and remove the unused XXH_OneshotDispatch struct. Benchmark result (xxh32_hexdigest, 1KB data): Before: 421.2 ns/call After: 353.3 ns/call (~16% faster)
…l methods Define XXH_AlgoDispatch (create_state, free_state, reset, update, digest, intdigest, copy_state) and a const XXH_DISPATCH[] array indexed by XXH_Algo, eliminating all 10 switch(self->algo) statements in the stateful path (_xxhash_init_state, _xxhash_free_state, _xxhash_reset_state, _xxhash_do_update, _xxhash_new, XXHASH_init, XXHASH_digest, XXHASH_hexdigest, XXHASH_intdigest, XXHASH_copy). Also remove the unused _free_module callback (PyType_FromModuleAndSpec ties heap type lifetime to the module automatically). Benchmarks show performance at parity or slightly improved (constructor+hexdigest ~6-20% faster).
Following CPython _hashlib's pattern (one HASH type for all algorithms), merge the four per-algorithm C types (PYXXH32Object, PYXXH64Object, PYXXH3_64Object, PYXXH3_128Object) into a single XXHASHObject type.
Key changes:
Results: