Add per-object locking for thread safety, sub-interpreter and free-threaded Python support#154
Conversation
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety enhancements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal state access when the GIL is released. It also adds a comprehensive test suite in tests/test_thread_safety.py to verify concurrent operations. The review feedback highlights critical improvements for the test suite, including correctly handling Windows process crashes and timeouts in subprocesses, failing tests on any non-zero exit code, and making the test repetitions configurable via an environment variable to optimize CI runtimes.
| def _run_many(self, code, scenario_name): | ||
| crashed = 0 | ||
| errors = [] | ||
| for i in range(self.REPETITIONS): | ||
| rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT) | ||
| if rc == -signal.SIGSEGV: | ||
| crashed += 1 | ||
| errors.append(f"run {i}: SEGFAULT") | ||
| elif rc == -signal.SIGABRT: | ||
| crashed += 1 | ||
| errors.append(f"run {i}: SIGABRT") | ||
| elif rc != 0: | ||
| # Non-zero exit – something went wrong | ||
| errors.append(f"run {i}: exit {rc} {err[:200]}") | ||
| # else: OK, no news is good news | ||
|
|
||
| if crashed: | ||
| self.fail( | ||
| f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed " | ||
| f"due to data race: {'; '.join(errors[:5])}" | ||
| ) | ||
| if errors: | ||
| # Non-crash errors (timeouts, exceptions) – these might still | ||
| # indicate data-corruption issues even without a segfault. | ||
| print( | ||
| f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} " | ||
| f"non-crash failures (may indicate corruption): " | ||
| f"{'; '.join(errors[:3])}" | ||
| ) |
There was a problem hiding this comment.
There are two major issues with the current implementation of _run_many:
- On Windows, process crashes (such as Access Violations /
SIGSEGV) return positive exit codes (like3221225477/0xC0000005) rather than negative signal numbers. These are not caught byrc == -signal.SIGSEGV, meaning Windows crashes will only print a warning and the test will still pass! - Any other non-zero exit code (such as timeouts or unexpected exceptions) only prints a warning instead of failing the test. Any non-zero exit code or timeout represents a failure in thread-safety (e.g., deadlock, corruption, or crash) and must fail the test.
We should fail the test on any non-zero exit code and correctly identify Windows crashes.
| def _run_many(self, code, scenario_name): | |
| crashed = 0 | |
| errors = [] | |
| for i in range(self.REPETITIONS): | |
| rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT) | |
| if rc == -signal.SIGSEGV: | |
| crashed += 1 | |
| errors.append(f"run {i}: SEGFAULT") | |
| elif rc == -signal.SIGABRT: | |
| crashed += 1 | |
| errors.append(f"run {i}: SIGABRT") | |
| elif rc != 0: | |
| # Non-zero exit – something went wrong | |
| errors.append(f"run {i}: exit {rc} {err[:200]}") | |
| # else: OK, no news is good news | |
| if crashed: | |
| self.fail( | |
| f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed " | |
| f"due to data race: {'; '.join(errors[:5])}" | |
| ) | |
| if errors: | |
| # Non-crash errors (timeouts, exceptions) – these might still | |
| # indicate data-corruption issues even without a segfault. | |
| print( | |
| f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} " | |
| f"non-crash failures (may indicate corruption): " | |
| f"{'; '.join(errors[:3])}" | |
| ) | |
| def _run_many(self, code, scenario_name): | |
| errors = [] | |
| for i in range(self.REPETITIONS): | |
| rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT) | |
| if rc != 0: | |
| status = f"exit {rc}" | |
| if rc == -signal.SIGSEGV or rc == 3221225477: | |
| status = "SEGFAULT/Access Violation" | |
| elif rc == -signal.SIGABRT: | |
| status = "SIGABRT" | |
| errors.append(f"run {i}: {status}\nSTDERR: {err[:200]}") | |
| if errors: | |
| self.fail( | |
| f"{scenario_name}: {len(errors)}/{self.REPETITIONS} runs failed or crashed: " | |
| f"{'; '.join(errors[:5])}" | |
| ) |
| def _run_in_subprocess(code: str, timeout: float = 60.0): | ||
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | ||
| proc = subprocess.run( | ||
| [sys.executable, "-c", code], | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=timeout, | ||
| ) | ||
| return proc.returncode, proc.stdout, proc.stderr |
There was a problem hiding this comment.
The _run_in_subprocess function does not handle subprocess.TimeoutExpired. If a timeout occurs, it will raise an unhandled exception and crash the entire test suite immediately, rather than reporting it as a test failure. Catching TimeoutExpired and returning a non-zero exit code (e.g., -1) ensures that timeouts are gracefully handled and reported as test failures.
| def _run_in_subprocess(code: str, timeout: float = 60.0): | |
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | |
| proc = subprocess.run( | |
| [sys.executable, "-c", code], | |
| capture_output=True, | |
| text=True, | |
| timeout=timeout, | |
| ) | |
| return proc.returncode, proc.stdout, proc.stderr | |
| def _run_in_subprocess(code: str, timeout: float = 60.0): | |
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | |
| try: | |
| proc = subprocess.run( | |
| [sys.executable, "-c", code], | |
| capture_output=True, | |
| text=True, | |
| timeout=timeout, | |
| ) | |
| return proc.returncode, proc.stdout, proc.stderr | |
| except subprocess.TimeoutExpired as e: | |
| return -1, e.stdout or "", (e.stderr or "") + "\nTIMEOUT EXPIRED" |
| crashes, deadlocks, or unexpected exceptions occur. | ||
| """ | ||
|
|
||
| REPETITIONS = 20 |
There was a problem hiding this comment.
Hardcoding REPETITIONS = 20 can make the test suite very slow, especially in CI environments where resources are constrained. Making it configurable via an environment variable (defaulting to a lower value like 5 or 10 for standard CI runs) allows developers to run aggressive stress tests locally while keeping CI fast and efficient.
| REPETITIONS = 20 | |
| REPETITIONS = int(os.environ.get("XXHASH_TEST_REPETITIONS", 5)) |
Merging this PR will not alter performance
Performance Changes
Comparing Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces robust thread-safety mechanisms to the _xxhash C extension by implementing per-object locking using PyMutex (for Python 3.13+) and PyThread_type_lock (for Python 3.9-3.12). It also releases the GIL for hashing large data blocks, adds sub-interpreter support, and introduces a comprehensive suite of thread-safety tests. The review feedback suggests improving the test suite's robustness by gracefully handling subprocess timeouts inside the test runner to explicitly report potential deadlocks instead of raising raw tracebacks.
| def _run_in_subprocess(code: str, timeout: float = 60.0): | ||
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | ||
| proc = subprocess.run( | ||
| [sys.executable, "-c", code], | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=timeout, | ||
| ) | ||
| return proc.returncode, proc.stdout, proc.stderr |
There was a problem hiding this comment.
If a deadlock or hang occurs during the thread-safety tests, subprocess.run will raise subprocess.TimeoutExpired after the timeout. Handling this exception gracefully inside _run_in_subprocess and returning a specific error code (e.g., -999) allows the test suite to report a clear diagnostic message (like 'TIMEOUT (possible deadlock)') instead of failing with a raw traceback.
| def _run_in_subprocess(code: str, timeout: float = 60.0): | |
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | |
| proc = subprocess.run( | |
| [sys.executable, "-c", code], | |
| capture_output=True, | |
| text=True, | |
| timeout=timeout, | |
| ) | |
| return proc.returncode, proc.stdout, proc.stderr | |
| def _run_in_subprocess(code: str, timeout: float = 60.0): | |
| """Run *code* in a subprocess and return (returncode, stdout, stderr).""" | |
| try: | |
| proc = subprocess.run( | |
| [sys.executable, "-c", code], | |
| capture_output=True, | |
| text=True, | |
| timeout=timeout, | |
| ) | |
| return proc.returncode, proc.stdout, proc.stderr | |
| except subprocess.TimeoutExpired: | |
| return -999, "", "TIMEOUT" |
| for i in range(self.REPETITIONS): | ||
| rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT) | ||
| if rc == -signal.SIGSEGV: | ||
| crashed += 1 | ||
| errors.append(f"run {i}: SEGFAULT") | ||
| elif rc == -signal.SIGABRT: | ||
| crashed += 1 | ||
| errors.append(f"run {i}: SIGABRT") | ||
| elif rc != 0: | ||
| # Non-zero exit – something went wrong | ||
| errors.append(f"run {i}: exit {rc} {err[:200]}") | ||
| # else: OK, no news is good news |
There was a problem hiding this comment.
Handle the custom timeout return code (-999) from _run_in_subprocess to explicitly report a possible deadlock/timeout failure in the test summary.
for i in range(self.REPETITIONS):
rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
if rc == -signal.SIGSEGV:
crashed += 1
errors.append(f"run {i}: SEGFAULT")
elif rc == -signal.SIGABRT:
crashed += 1
errors.append(f"run {i}: SIGABRT")
elif rc == -999:
crashed += 1
errors.append(f"run {i}: TIMEOUT (possible deadlock)")
elif rc != 0:
# Non-zero exit – something went wrong
errors.append(f"run {i}: exit {rc} {err[:200]}")
# else: OK, no news is good news|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) and releasing the GIL for large updates. It also adds comprehensive thread-safety tests. A high-severity issue was identified in the Python 3.13+ implementation where calling PyMutex_Lock while holding the GIL can lead to interpreter-wide GIL starvation. It is recommended to release the GIL while waiting to acquire the mutex.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces per-object locking (using PyThread_type_lock) to the xxhash C extension to ensure thread safety across all hash types, preventing data races during concurrent access. The changes include the addition of a new locking macro, updates to the initialization logic for hash objects, and the inclusion of comprehensive thread-safety tests. Additionally, build configuration files were updated and the version number was incremented. A high-severity review comment was raised regarding the potential for crashes when using static type objects with sub-interpreter support, suggesting a refactor to heap-allocated types.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to prevent data races during concurrent hash operations. It also migrates static types to heap types for sub-interpreter safety and adds comprehensive thread-safety and benchmark tests. The review feedback highlights several potential data races where the per-object lock is not acquired: specifically, within the init macro which can be called multiple times on a shared object, and when copying self->seed to a new object during copy operations without holding self's lock.
| p->seed = self->seed; | ||
| XXHASH_LOCK_ACQUIRE(self); | ||
| XXH32_copyState(p->xxhash_state, self->xxhash_state); | ||
| XXHASH_LOCK_RELEASE(self); |
There was a problem hiding this comment.
To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.
XXHASH_LOCK_ACQUIRE(self);
p->seed = self->seed;
XXH32_copyState(p->xxhash_state, self->xxhash_state);
XXHASH_LOCK_RELEASE(self);| p->seed = self->seed; | ||
| XXHASH_LOCK_ACQUIRE(self); | ||
| XXH64_copyState(p->xxhash_state, self->xxhash_state); | ||
| XXHASH_LOCK_RELEASE(self); |
There was a problem hiding this comment.
To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.
XXHASH_LOCK_ACQUIRE(self);
p->seed = self->seed;
XXH64_copyState(p->xxhash_state, self->xxhash_state);
XXHASH_LOCK_RELEASE(self);| p->seed = self->seed; | ||
| XXHASH_LOCK_ACQUIRE(self); | ||
| XXH3_copyState(p->xxhash_state, self->xxhash_state); |
There was a problem hiding this comment.
To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.
XXHASH_LOCK_ACQUIRE(self);
p->seed = self->seed;
XXH3_copyState(p->xxhash_state, self->xxhash_state);| p->seed = self->seed; | ||
| XXHASH_LOCK_ACQUIRE(self); | ||
| XXH3_copyState(p->xxhash_state, self->xxhash_state); |
There was a problem hiding this comment.
To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.
XXHASH_LOCK_ACQUIRE(self);
p->seed = self->seed;
XXH3_copyState(p->xxhash_state, self->xxhash_state);|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) to protect internal hash states during concurrent operations. It also transitions static types to heap types for sub-interpreter safety, optimizes GIL release thresholds, updates benchmarks, and adds a comprehensive thread-safety test suite. There are no review comments to address, and I have no additional feedback to provide.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety mechanisms to the _xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) and releasing the GIL for large hash updates. It also refactors the hash types to be heap-allocated for sub-interpreter safety and adds comprehensive thread-safety tests and benchmarks. The review feedback highlights critical issues regarding the use of PyObject_New instead of tp_alloc for heap types, which can cause memory corruption when subclassed. Additionally, the reviewer identified a thread-safety vulnerability where lazy lock allocation on Python 3.9-3.12 leaves small updates unprotected, and pointed out a stray, incorrectly indented statement at the end of the benchmark tests that would break execution.
| if ((self = PyObject_New(PYXXH32Object, type)) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh32 in Python, the subclass type object will have a larger tp_basicsize (to accommodate __dict__, slots, etc.). PyObject_New only allocates sizeof(PYXXH32Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.
To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.
if ((self = (PYXXH32Object *)type->tp_alloc(type, 0)) == NULL) {
return NULL;
}| if ((p = PyObject_New(PYXXH32Object, Py_TYPE(self))) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH32Object). PyObject_New will allocate too little memory, leading to heap corruption.
To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.
if ((p = (PYXXH32Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
return NULL;
}| if ((self = PyObject_New(PYXXH64Object, type)) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.
To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.
if ((self = (PYXXH64Object *)type->tp_alloc(type, 0)) == NULL) {
return NULL;
}| if ((p = PyObject_New(PYXXH64Object, Py_TYPE(self))) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH64Object). PyObject_New will allocate too little memory, leading to heap corruption.
To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.
if ((p = (PYXXH64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
return NULL;
}| if ((self = PyObject_New(PYXXH3_64Object, type)) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.
To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.
if ((self = (PYXXH3_64Object *)type->tp_alloc(type, 0)) == NULL) {
return NULL;
}| if ((p = PyObject_New(PYXXH3_64Object, Py_TYPE(self))) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_64Object). PyObject_New will allocate too little memory, leading to heap corruption.
To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.
if ((p = (PYXXH3_64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
return NULL;
}| if ((self = PyObject_New(PYXXH3_128Object, type)) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_128 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_128Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.
To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.
if ((self = (PYXXH3_128Object *)type->tp_alloc(type, 0)) == NULL) {
return NULL;
}| if ((p = PyObject_New(PYXXH3_128Object, Py_TYPE(self))) == NULL) { | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_128Object). PyObject_New will allocate too little memory, leading to heap corruption.
To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.
if ((p = (PYXXH3_128Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
return NULL;
}| h.hexdigest() | ||
|
|
||
|
|
||
| h.hexdigest() |
There was a problem hiding this comment.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety improvements to the xxhash C extension by implementing per-object locking to protect internal hash states from concurrent access, alongside adding sub-interpreter support and comprehensive test suites. The review comments highlight critical issues across multiple object creation and copy functions where memory allocated by PyObject_New is not zero-initialized, potentially causing crashes during deallocation if state initialization fails. Additionally, a stray, indented h.hexdigest() call at the end of tests/test_benchmark.py needs to be removed to prevent an IndentationError or NameError.
| return NULL; | ||
| } | ||
|
|
||
| XXHASH_LOCK_INIT(self); |
There was a problem hiding this comment.
Memory allocated by PyObject_New is not zero-initialized. If XXH32_createState() fails and returns NULL, self->xxhash_state will contain garbage. When Py_DECREF(self) is called, the deallocator PYXXH32_dealloc will attempt to free this garbage pointer, leading to a crash or memory corruption. Initialize self->xxhash_state to NULL immediately after lock initialization to prevent this.
XXHASH_LOCK_INIT(self);
self->xxhash_state = NULL;| h.hexdigest() | ||
|
|
||
|
|
||
| h.hexdigest() |
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal hash states during concurrent operations. It also migrates static type definitions to heap types using PyType_FromModuleAndSpec for sub-interpreter safety, refactors argument parsing and initialization using macros, and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. I have no feedback to provide as the only review comment was incorrect; PyModule_AddType does not steal a reference to the type object, meaning the Py_DECREF calls in the original code are correct and necessary to prevent memory leaks.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking and releasing the GIL for large inputs. It also refactors static types to heap types to support sub-interpreters (PEP 489/684/689) and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. However, the reviewer identified a critical performance regression: the PyType_Spec definitions for all four hash types (xxh32, xxh64, xxh3_64, and xxh3_128) are missing the Py_TPFLAGS_HAVE_VECTORCALL flag. Without this flag, CPython will ignore the assigned tp_vectorcall handler and fall back to the slower standard call path.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety locking mechanisms to protect internal hash states during concurrent GIL-released updates, transitions the extension to use heap types for sub-interpreter support, and adds a fast path for bytes objects. Feedback on the changes highlights that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the newly defined heap type specs (XXH32Type_spec, XXH64Type_spec, XXH3_64Type_spec, and XXH3_128Type_spec), which prevents CPython from utilizing the optimized vectorcall path and causes constructor calls to fall back to slower paths.
|
@gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for older versions) and migrates static types to heap types to support sub-interpreters safely. It also adds comprehensive thread-safety and sub-interpreter tests, alongside new benchmarks. The review feedback correctly identifies that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the PyType_Spec definitions for all hash types (xxh32, xxh64, xxh3_64, and xxh3_128), which prevents the vectorcall optimization from being enabled and falls back to the slower standard call path.
6d7989b to
74b3cbb
Compare
…readed Python support Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL. Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active. Key changes: - All state accessors (update, digest, hexdigest, intdigest, copy, reset, init) are now lock-guarded - update() releases GIL before acquiring lock for large data to avoid ABBA deadlock, matching hashlib pattern - One-shot hash functions (xxh32(), xxh3_64(), etc.) release GIL for data >=64KB - Switch from static PyTypeObject to heap types via PyType_FromModuleAndSpec, required for sub-interpreter safety - Declare Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (3.12+) and Py_MOD_GIL_NOT_USED (3.13+) in module slots - Deduplicate per-type init/update logic via XXHASH_INIT and XXHASH_DO_UPDATE macros - Optimize _get_buffer_or_str with a fast-path for bytes objects - Replace intermediate stack buffers in digest/hexdigest with direct writes into pre-allocated PyBytes/PyUnicode - Add comprehensive thread-safety stress tests (test_thread_safety.py) - Add sub-interpreter compatibility tests (test_interpreters.py)
Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL.
Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active.
Key changes: