Add per-object locking for thread safety, sub-interpreter and free-threaded Python support by ifduyue · Pull Request #154 · ifduyue/python-xxhash

ifduyue · 2026-05-28T15:38:22Z

Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL.

Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active.

Key changes:

All state accessors (update, digest, hexdigest, intdigest, copy, reset, init) are now lock-guarded
update() releases GIL before acquiring lock for large data to avoid ABBA deadlock, matching hashlib pattern
One-shot hash functions (xxh32(), xxh3_64(), etc.) release GIL for data >=64KB
Switch from static PyTypeObject to heap types via PyType_FromModuleAndSpec, required for sub-interpreter safety
Declare Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (3.12+) and Py_MOD_GIL_NOT_USED (3.13+) in module slots
Deduplicate per-type init/update logic via XXHASH_INIT and XXHASH_DO_UPDATE macros
Optimize _get_buffer_or_str with a fast-path for bytes objects
Replace intermediate stack buffers in digest/hexdigest with direct writes into pre-allocated PyBytes/PyUnicode
Add comprehensive thread-safety stress tests (test_thread_safety.py)
Add sub-interpreter compatibility tests (test_interpreters.py)

ifduyue · 2026-05-28T15:39:15Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety enhancements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal state access when the GIL is released. It also adds a comprehensive test suite in tests/test_thread_safety.py to verify concurrent operations. The review feedback highlights critical improvements for the test suite, including correctly handling Windows process crashes and timeouts in subprocesses, failing tests on any non-zero exit code, and making the test repetitions configurable via an environment variable to optimize CI runtimes.

gemini-code-assist · 2026-05-28T15:40:51Z

+    def _run_many(self, code, scenario_name):
+        crashed = 0
+        errors = []
+        for i in range(self.REPETITIONS):
+            rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
+            if rc == -signal.SIGSEGV:
+                crashed += 1
+                errors.append(f"run {i}: SEGFAULT")
+            elif rc == -signal.SIGABRT:
+                crashed += 1
+                errors.append(f"run {i}: SIGABRT")
+            elif rc != 0:
+                # Non-zero exit – something went wrong
+                errors.append(f"run {i}: exit {rc} {err[:200]}")
+            # else: OK, no news is good news
+
+        if crashed:
+            self.fail(
+                f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed "
+                f"due to data race: {'; '.join(errors[:5])}"
+            )
+        if errors:
+            # Non-crash errors (timeouts, exceptions) – these might still
+            # indicate data-corruption issues even without a segfault.
+            print(
+                f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} "
+                f"non-crash failures (may indicate corruption): "
+                f"{'; '.join(errors[:3])}"
+            )


There are two major issues with the current implementation of _run_many:

On Windows, process crashes (such as Access Violations / SIGSEGV) return positive exit codes (like 3221225477 / 0xC0000005) rather than negative signal numbers. These are not caught by rc == -signal.SIGSEGV, meaning Windows crashes will only print a warning and the test will still pass!

Any other non-zero exit code (such as timeouts or unexpected exceptions) only prints a warning instead of failing the test. Any non-zero exit code or timeout represents a failure in thread-safety (e.g., deadlock, corruption, or crash) and must fail the test.

We should fail the test on any non-zero exit code and correctly identify Windows crashes.

Suggested change

def _run_many(self, code, scenario_name):

crashed = 0

errors = []

for i in range(self.REPETITIONS):

rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)

if rc == -signal.SIGSEGV:

crashed += 1

errors.append(f"run {i}: SEGFAULT")

elif rc == -signal.SIGABRT:

crashed += 1

errors.append(f"run {i}: SIGABRT")

elif rc != 0:

# Non-zero exit – something went wrong

errors.append(f"run {i}: exit {rc} {err[:200]}")

# else: OK, no news is good news

if crashed:

self.fail(

f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed "

f"due to data race: {'; '.join(errors[:5])}"

)

if errors:

# Non-crash errors (timeouts, exceptions) – these might still

# indicate data-corruption issues even without a segfault.

print(

f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} "

f"non-crash failures (may indicate corruption): "

f"{'; '.join(errors[:3])}"

)

def _run_many(self, code, scenario_name):

errors = []

for i in range(self.REPETITIONS):

rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)

if rc != 0:

status = f"exit {rc}"

if rc == -signal.SIGSEGV or rc == 3221225477:

status = "SEGFAULT/Access Violation"

elif rc == -signal.SIGABRT:

status = "SIGABRT"

errors.append(f"run {i}: {status}\nSTDERR: {err[:200]}")

if errors:

self.fail(

f"{scenario_name}: {len(errors)}/{self.REPETITIONS} runs failed or crashed: "

f"{'; '.join(errors[:5])}"

)

gemini-code-assist · 2026-05-28T15:40:51Z

+def _run_in_subprocess(code: str, timeout: float = 60.0):
+    """Run *code* in a subprocess and return (returncode, stdout, stderr)."""
+    proc = subprocess.run(
+        [sys.executable, "-c", code],
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+    return proc.returncode, proc.stdout, proc.stderr


The _run_in_subprocess function does not handle subprocess.TimeoutExpired. If a timeout occurs, it will raise an unhandled exception and crash the entire test suite immediately, rather than reporting it as a test failure. Catching TimeoutExpired and returning a non-zero exit code (e.g., -1) ensures that timeouts are gracefully handled and reported as test failures.

Suggested change

def _run_in_subprocess(code: str, timeout: float = 60.0):

"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""

proc = subprocess.run(

[sys.executable, "-c", code],

capture_output=True,

text=True,

timeout=timeout,

)

return proc.returncode, proc.stdout, proc.stderr

def _run_in_subprocess(code: str, timeout: float = 60.0):

"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""

try:

proc = subprocess.run(

[sys.executable, "-c", code],

capture_output=True,

text=True,

timeout=timeout,

)

return proc.returncode, proc.stdout, proc.stderr

except subprocess.TimeoutExpired as e:

return -1, e.stdout or "", (e.stderr or "") + "\nTIMEOUT EXPIRED"

gemini-code-assist · 2026-05-28T15:40:51Z

+    crashes, deadlocks, or unexpected exceptions occur.
+    """
+
+    REPETITIONS = 20


Hardcoding REPETITIONS = 20 can make the test suite very slow, especially in CI environments where resources are constrained. Making it configurable via an environment variable (defaulting to a lower value like 5 or 10 for standard CI runs) allows developers to run aggressive stress tests locally while keeping CI fast and efficient.

Suggested change

REPETITIONS = 20

REPETITIONS = int(os.environ.get("XXHASH_TEST_REPETITIONS", 5))

codspeed-hq · 2026-05-28T15:41:34Z

Merging this PR will not alter performance

✅ 54 untouched benchmarks
🆕 94 new benchmarks
⏩ 66 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
🆕	`test_xxh32_ctor_seed_kw`	N/A	82.8 µs	N/A
🆕	`test_xxh32_stream_update_digest_64kb`	N/A	187.6 µs	N/A
🆕	`test_xxh32_stream_intdigest_5b`	N/A	85.6 µs	N/A
🆕	`test_xxh3_128_digest_64kb`	N/A	129.9 µs	N/A
🆕	`test_xxh32_stream_digest_2mb`	N/A	2.2 ms	N/A
🆕	`test_xxh3_128_ctor_empty`	N/A	81.9 µs	N/A
🆕	`test_xxh32_stream_update_digest_2mb`	N/A	3.3 ms	N/A
🆕	`test_xxh32_digest_5b`	N/A	81.7 µs	N/A
🆕	`test_xxh32_stream_update_hexdigest_5b`	N/A	85.8 µs	N/A
🆕	`test_xxh32_stream_update_hexdigest_64kb`	N/A	187.9 µs	N/A
🆕	`test_xxh3_128_ctor_seed_kw`	N/A	84 µs	N/A
🆕	`test_xxh32_stream_intdigest_64kb`	N/A	149.9 µs	N/A
🆕	`test_xxh32_stream_update_digest_5b`	N/A	85.2 µs	N/A
🆕	`test_xxh3_128_hexdigest_5b`	N/A	82.3 µs	N/A
🆕	`test_xxh32_stream_update_hexdigest_2mb`	N/A	3.3 ms	N/A
🆕	`test_xxh3_128_digest_5b`	N/A	81.7 µs	N/A
🆕	`test_xxh32_digest_64kb`	N/A	138.8 µs	N/A
🆕	`test_xxh32_stream_update_intdigest_2mb`	N/A	3.3 ms	N/A
🆕	`test_xxh32_stream_update_intdigest_64kb`	N/A	187.8 µs	N/A
🆕	`test_xxh3_128_hexdigest_64kb`	N/A	130.2 µs	N/A
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

_{Comparing multi-threads (0c1c69f) with master (e3eb2c0)}

66 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

gemini-code-assist

Code Review

This pull request introduces robust thread-safety mechanisms to the _xxhash C extension by implementing per-object locking using PyMutex (for Python 3.13+) and PyThread_type_lock (for Python 3.9-3.12). It also releases the GIL for hashing large data blocks, adds sub-interpreter support, and introduces a comprehensive suite of thread-safety tests. The review feedback suggests improving the test suite's robustness by gracefully handling subprocess timeouts inside the test runner to explicitly report potential deadlocks instead of raising raw tracebacks.

gemini-code-assist · 2026-05-28T15:42:41Z

+def _run_in_subprocess(code: str, timeout: float = 60.0):
+    """Run *code* in a subprocess and return (returncode, stdout, stderr)."""
+    proc = subprocess.run(
+        [sys.executable, "-c", code],
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+    return proc.returncode, proc.stdout, proc.stderr


If a deadlock or hang occurs during the thread-safety tests, subprocess.run will raise subprocess.TimeoutExpired after the timeout. Handling this exception gracefully inside _run_in_subprocess and returning a specific error code (e.g., -999) allows the test suite to report a clear diagnostic message (like 'TIMEOUT (possible deadlock)') instead of failing with a raw traceback.

Suggested change

def _run_in_subprocess(code: str, timeout: float = 60.0):

"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""

proc = subprocess.run(

[sys.executable, "-c", code],

capture_output=True,

text=True,

timeout=timeout,

)

return proc.returncode, proc.stdout, proc.stderr

def _run_in_subprocess(code: str, timeout: float = 60.0):

"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""

try:

proc = subprocess.run(

[sys.executable, "-c", code],

capture_output=True,

text=True,

timeout=timeout,

)

return proc.returncode, proc.stdout, proc.stderr

except subprocess.TimeoutExpired:

return -999, "", "TIMEOUT"

gemini-code-assist · 2026-05-28T15:42:42Z

+        for i in range(self.REPETITIONS):
+            rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
+            if rc == -signal.SIGSEGV:
+                crashed += 1
+                errors.append(f"run {i}: SEGFAULT")
+            elif rc == -signal.SIGABRT:
+                crashed += 1
+                errors.append(f"run {i}: SIGABRT")
+            elif rc != 0:
+                # Non-zero exit – something went wrong
+                errors.append(f"run {i}: exit {rc} {err[:200]}")
+            # else: OK, no news is good news


Handle the custom timeout return code (-999) from _run_in_subprocess to explicitly report a possible deadlock/timeout failure in the test summary.

for i in range(self.REPETITIONS): rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT) if rc == -signal.SIGSEGV: crashed += 1 errors.append(f"run {i}: SEGFAULT") elif rc == -signal.SIGABRT: crashed += 1 errors.append(f"run {i}: SIGABRT") elif rc == -999: crashed += 1 errors.append(f"run {i}: TIMEOUT (possible deadlock)") elif rc != 0: # Non-zero exit – something went wrong errors.append(f"run {i}: exit {rc} {err[:200]}") # else: OK, no news is good news

ifduyue · 2026-05-28T15:57:06Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) and releasing the GIL for large updates. It also adds comprehensive thread-safety tests. A high-severity issue was identified in the Python 3.13+ implementation where calling PyMutex_Lock while holding the GIL can lead to interpreter-wide GIL starvation. It is recommended to release the GIL while waiting to acquire the mutex.

ifduyue · 2026-05-28T16:47:06Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces per-object locking (using PyThread_type_lock) to the xxhash C extension to ensure thread safety across all hash types, preventing data races during concurrent access. The changes include the addition of a new locking macro, updates to the initialization logic for hash objects, and the inclusion of comprehensive thread-safety tests. Additionally, build configuration files were updated and the version number was incremented. A high-severity review comment was raised regarding the potential for crashes when using static type objects with sub-interpreter support, suggesting a refactor to heap-allocated types.

ifduyue · 2026-05-28T17:16:50Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to prevent data races during concurrent hash operations. It also migrates static types to heap types for sub-interpreter safety and adds comprehensive thread-safety and benchmark tests. The review feedback highlights several potential data races where the per-object lock is not acquired: specifically, within the init macro which can be called multiple times on a shared object, and when copying self->seed to a new object during copy operations without holding self's lock.

gemini-code-assist · 2026-05-28T17:20:04Z

    p->seed = self->seed;
+    XXHASH_LOCK_ACQUIRE(self);
    XXH32_copyState(p->xxhash_state, self->xxhash_state);
+    XXHASH_LOCK_RELEASE(self);


To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

XXHASH_LOCK_ACQUIRE(self); p->seed = self->seed; XXH32_copyState(p->xxhash_state, self->xxhash_state); XXHASH_LOCK_RELEASE(self);

gemini-code-assist · 2026-05-28T17:20:05Z

    p->seed = self->seed;
+    XXHASH_LOCK_ACQUIRE(self);
    XXH64_copyState(p->xxhash_state, self->xxhash_state);
+    XXHASH_LOCK_RELEASE(self);


To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

XXHASH_LOCK_ACQUIRE(self); p->seed = self->seed; XXH64_copyState(p->xxhash_state, self->xxhash_state); XXHASH_LOCK_RELEASE(self);

gemini-code-assist · 2026-05-28T17:20:05Z

    p->seed = self->seed;
+    XXHASH_LOCK_ACQUIRE(self);
    XXH3_copyState(p->xxhash_state, self->xxhash_state);


To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

XXHASH_LOCK_ACQUIRE(self); p->seed = self->seed; XXH3_copyState(p->xxhash_state, self->xxhash_state);

gemini-code-assist · 2026-05-28T17:20:05Z

    p->seed = self->seed;
+    XXHASH_LOCK_ACQUIRE(self);
    XXH3_copyState(p->xxhash_state, self->xxhash_state);


To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

XXHASH_LOCK_ACQUIRE(self); p->seed = self->seed; XXH3_copyState(p->xxhash_state, self->xxhash_state);

ifduyue · 2026-05-28T17:27:09Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) to protect internal hash states during concurrent operations. It also transitions static types to heap types for sub-interpreter safety, optimizes GIL release thresholds, updates benchmarks, and adds a comprehensive thread-safety test suite. There are no review comments to address, and I have no additional feedback to provide.

ifduyue · 2026-05-28T17:32:46Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety mechanisms to the _xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) and releasing the GIL for large hash updates. It also refactors the hash types to be heap-allocated for sub-interpreter safety and adds comprehensive thread-safety tests and benchmarks. The review feedback highlights critical issues regarding the use of PyObject_New instead of tp_alloc for heap types, which can cause memory corruption when subclassed. Additionally, the reviewer identified a thread-safety vulnerability where lazy lock allocation on Python 3.9-3.12 leaves small updates unprotected, and pointed out a stray, incorrectly indented statement at the end of the benchmark tests that would break execution.

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((self = PyObject_New(PYXXH32Object, type)) == NULL) {
        return NULL;
    }


Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh32 in Python, the subclass type object will have a larger tp_basicsize (to accommodate __dict__, slots, etc.). PyObject_New only allocates sizeof(PYXXH32Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

if ((self = (PYXXH32Object *)type->tp_alloc(type, 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((p = PyObject_New(PYXXH32Object, Py_TYPE(self))) == NULL) {
        return NULL;
    }


Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH32Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

if ((p = (PYXXH32Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((self = PyObject_New(PYXXH64Object, type)) == NULL) {
        return NULL;
    }


Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

if ((self = (PYXXH64Object *)type->tp_alloc(type, 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((p = PyObject_New(PYXXH64Object, Py_TYPE(self))) == NULL) {
        return NULL;
    }


Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH64Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

if ((p = (PYXXH64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((self = PyObject_New(PYXXH3_64Object, type)) == NULL) {
        return NULL;
    }


Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

if ((self = (PYXXH3_64Object *)type->tp_alloc(type, 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((p = PyObject_New(PYXXH3_64Object, Py_TYPE(self))) == NULL) {
        return NULL;
    }


Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_64Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

if ((p = (PYXXH3_64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((self = PyObject_New(PYXXH3_128Object, type)) == NULL) {
        return NULL;
    }


Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_128 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_128Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

if ((self = (PYXXH3_128Object *)type->tp_alloc(type, 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    if ((p = PyObject_New(PYXXH3_128Object, Py_TYPE(self))) == NULL) {
        return NULL;
    }


Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_128Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

if ((p = (PYXXH3_128Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) { return NULL; }

gemini-code-assist · 2026-05-28T17:35:49Z

+    h.hexdigest()
+
+
+    h.hexdigest()


There is a stray h.hexdigest() statement at the module level (with incorrect indentation) at the end of the file. This will cause an IndentationError or NameError when the benchmark module is imported, breaking the benchmark execution.

Suggested change

h.hexdigest()

h.hexdigest()

h.hexdigest()

ifduyue · 2026-05-28T17:40:31Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety improvements to the xxhash C extension by implementing per-object locking to protect internal hash states from concurrent access, alongside adding sub-interpreter support and comprehensive test suites. The review comments highlight critical issues across multiple object creation and copy functions where memory allocated by PyObject_New is not zero-initialized, potentially causing crashes during deallocation if state initialization fails. Additionally, a stray, indented h.hexdigest() call at the end of tests/test_benchmark.py needs to be removed to prevent an IndentationError or NameError.

gemini-code-assist · 2026-05-28T17:42:56Z

        return NULL;
    }

+    XXHASH_LOCK_INIT(self);


Memory allocated by PyObject_New is not zero-initialized. If XXH32_createState() fails and returns NULL, self->xxhash_state will contain garbage. When Py_DECREF(self) is called, the deallocator PYXXH32_dealloc will attempt to free this garbage pointer, leading to a crash or memory corruption. Initialize self->xxhash_state to NULL immediately after lock initialization to prevent this.

XXHASH_LOCK_INIT(self); self->xxhash_state = NULL;

gemini-code-assist · 2026-05-28T17:42:57Z

+    h.hexdigest()
+
+
+    h.hexdigest()


There is a stray, indented h.hexdigest() call at the end of the file outside of any function. This will cause an IndentationError or NameError when the test module is imported or executed. This line should be removed.

ifduyue · 2026-05-28T18:03:25Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal hash states during concurrent operations. It also migrates static type definitions to heap types using PyType_FromModuleAndSpec for sub-interpreter safety, refactors argument parsing and initialization using macros, and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. I have no feedback to provide as the only review comment was incorrect; PyModule_AddType does not steal a reference to the type object, meaning the Py_DECREF calls in the original code are correct and necessary to prevent memory leaks.

ifduyue · 2026-05-29T04:25:55Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking and releasing the GIL for large inputs. It also refactors static types to heap types to support sub-interpreters (PEP 489/684/689) and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. However, the reviewer identified a critical performance regression: the PyType_Spec definitions for all four hash types (xxh32, xxh64, xxh3_64, and xxh3_128) are missing the Py_TPFLAGS_HAVE_VECTORCALL flag. Without this flag, CPython will ignore the assigned tp_vectorcall handler and fall back to the slower standard call path.

ifduyue · 2026-05-29T04:47:12Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety locking mechanisms to protect internal hash states during concurrent GIL-released updates, transitions the extension to use heap types for sub-interpreter support, and adds a fast path for bytes objects. Feedback on the changes highlights that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the newly defined heap type specs (XXH32Type_spec, XXH64Type_spec, XXH3_64Type_spec, and XXH3_128Type_spec), which prevents CPython from utilizing the optimized vectorcall path and causes constructor calls to fall back to slower paths.

ifduyue · 2026-05-29T06:48:00Z

@gemini review

gemini-code-assist

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for older versions) and migrates static types to heap types to support sub-interpreters safely. It also adds comprehensive thread-safety and sub-interpreter tests, alongside new benchmarks. The review feedback correctly identifies that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the PyType_Spec definitions for all hash types (xxh32, xxh64, xxh3_64, and xxh3_128), which prevents the vectorcall optimization from being enabled and falls back to the slower standard call path.

…readed Python support Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL. Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active. Key changes: - All state accessors (update, digest, hexdigest, intdigest, copy, reset, init) are now lock-guarded - update() releases GIL before acquiring lock for large data to avoid ABBA deadlock, matching hashlib pattern - One-shot hash functions (xxh32(), xxh3_64(), etc.) release GIL for data >=64KB - Switch from static PyTypeObject to heap types via PyType_FromModuleAndSpec, required for sub-interpreter safety - Declare Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (3.12+) and Py_MOD_GIL_NOT_USED (3.13+) in module slots - Deduplicate per-type init/update logic via XXHASH_INIT and XXHASH_DO_UPDATE macros - Optimize _get_buffer_or_str with a fast-path for bytes objects - Replace intermediate stack buffers in digest/hexdigest with direct writes into pre-allocated PyBytes/PyUnicode - Add comprehensive thread-safety stress tests (test_thread_safety.py) - Add sub-interpreter compatibility tests (test_interpreters.py)