Thread-safety: data race in ImagingDefaultArena block cache (memory_get_block / memory_return_block) #9600

@ctkhanhly

What did you do?

Used asyncio.to_thread() / ThreadPoolExecutor to offload PIL Image operations (create, load, resize, encode) to multiple threads concurrently.

What did you expect to happen?

Thread-safe access to PIL Image allocation/deallocation, or documentation stating that multi-threaded usage requires set_blocks_max(0).

What actually happened?

ImagingDefaultArena in src/libImaging/Storage.c uses shared mutable state (blocks_cached, blocks_pool) without any synchronization. When multiple threads call memory_get_block / memory_return_block concurrently (which happens when PIL's GIL-releasing C operations run in parallel), data races can occur:

// memory_get_block (line 310) — no lock
if (arena->blocks_cached > 0) {
    arena->blocks_cached -= 1;                          // ← read-modify-write, no lock
    block = arena->blocks_pool[arena->blocks_cached];   // ← concurrent read, no lock
}

// memory_return_block (line 347) — no lock
if (arena->blocks_cached < arena->blocks_max) {
    arena->blocks_pool[arena->blocks_cached] = block;   // ← concurrent write, no lock
    arena->blocks_cached += 1;                          // ← read-modify-write, no lock
} else {
    free(block.ptr);
}

Race scenario (two threads returning blocks simultaneously):

Thread A: reads blocks_cached = 19 (< blocks_max 20)
Thread B: reads blocks_cached = 19 (same stale value — no lock)
Thread A: writes blocks_pool[19] = block_A, increments to 20
Thread B: writes blocks_pool[19] = block_B (OVERWRITES block_A), then stores blocks_cached = 20 from its stale read

Result: block_A.ptr is permanently lost — malloc'd, never free'd.
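
The interleaving above can be replayed deterministically in a small Python model. This is only an illustrative sketch: the `Arena` class and `replay_lost_block` function are hypothetical stand-ins for the C struct and are not part of Pillow. The "threads" are simulated as explicit steps so the outcome does not depend on timing.

```python
from dataclasses import dataclass, field


@dataclass
class Arena:
    """Toy stand-in for ImagingDefaultArena (hypothetical, not Pillow's struct)."""
    blocks_max: int = 20
    blocks_cached: int = 19
    # 19 cached blocks plus one free slot at index 19.
    blocks_pool: list = field(
        default_factory=lambda: [f"block_{i}" for i in range(19)] + [None]
    )


def replay_lost_block() -> Arena:
    arena = Arena()

    # Both threads pass the `blocks_cached < blocks_max` check with the same value.
    seen_by_a = arena.blocks_cached  # Thread A reads 19
    seen_by_b = arena.blocks_cached  # Thread B reads 19 as well

    # Thread A stores its block and publishes the new count.
    arena.blocks_pool[seen_by_a] = "block_A"
    arena.blocks_cached = seen_by_a + 1

    # Thread B uses its stale index: same slot, same resulting count.
    arena.blocks_pool[seen_by_b] = "block_B"
    arena.blocks_cached = seen_by_b + 1
    return arena


if __name__ == "__main__":
    arena = replay_lost_block()
    print("blocks_cached:", arena.blocks_cached)
    print("block_A still reachable:", "block_A" in arena.blocks_pool)
```

In the model, two blocks were returned but the count only advanced by one, and block_A is no longer referenced from the pool.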

Similarly, memory_get_block can hand the same cached block to two threads simultaneously (use-after-free / double-free risk).
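
A similar deterministic replay illustrates the double hand-out on the get path. Again this is a sketch under assumptions: `replay_double_handout` is a hypothetical helper that models the two C statements as separate steps, and the pool contents are made-up labels.

```python
def replay_double_handout():
    """Model two threads racing through the cache-hit branch of a get."""
    blocks_pool = ["block_0", "block_1", "block_2"]
    blocks_cached = 3

    # Both threads pass `blocks_cached > 0` and compute the same new index.
    index_a = blocks_cached - 1  # Thread A computes 2
    index_b = blocks_cached - 1  # Thread B computes 2 from the same stale value

    # Thread A publishes the decrement and takes the block.
    blocks_cached = index_a
    block_a = blocks_pool[index_a]

    # Thread B does the same with its stale index.
    blocks_cached = index_b
    block_b = blocks_pool[index_b]

    return block_a, block_b, blocks_cached


if __name__ == "__main__":
    block_a, block_b, remaining = replay_double_handout()
    print("same block handed out twice:", block_a is block_b)
    print("blocks_cached after two gets:", remaining)
```

Both simulated threads end up owning the same block, and the count drops by one instead of two, so a later get could hand the block out again.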

The race window is narrow under the GIL (requires concurrent C-level execution during GIL-released operations like resize/encode), making it hard to reproduce in simple tests. Under sustained high-concurrency production workloads with longer GIL-released operations, the race triggers more frequently.

Workaround: Image.core.set_blocks_max(0) disables the cache entirely, eliminating the shared mutable state.
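
A minimal sketch of applying the workaround at application startup, before any worker threads touch PIL. The helper name `disable_block_cache` is hypothetical; the only Pillow call used is the `Image.core.set_blocks_max` mentioned above. The import guard is there only so the snippet runs on machines without Pillow.

```python
def disable_block_cache() -> bool:
    """Disable Pillow's block cache; return False if Pillow is not installed."""
    try:
        from PIL import Image
    except ImportError:
        return False
    # With blocks_max set to 0, returned blocks are freed instead of cached.
    Image.core.set_blocks_max(0)
    return True


if __name__ == "__main__":
    # Call this once, early, before starting a ThreadPoolExecutor.
    print("block cache disabled:", disable_block_cache())
```

The trade-off is that every image allocation then goes through the system allocator rather than reusing cached blocks, which may cost some throughput.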

What are your OS, Python and Pillow versions?

  • OS: Linux x86_64
  • Python: 3.12
  • Pillow: 10.4.0

Reproduction script:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from PIL import Image

# This triggers concurrent access to ImagingDefaultArena from multiple threads.
# The race is probabilistic and more likely under sustained production load
# with longer GIL-released operations (resize, encode).
async def main():
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=20))

    resolutions = [(512, 512), (1024, 1024), (1536, 1536), (2048, 2048)]

    async def process(res):
        def _work():
            img = Image.new("RGB", res)
            img.load()
        await asyncio.to_thread(_work)

    for _ in range(1000):
        await asyncio.gather(*[process(resolutions[i % 4]) for i in range(20)])

asyncio.run(main())

Additional context:

PIL's C extensions release the GIL during image operations, enabling true parallelism in thread pools. However, ImagingDefaultArena is a process-global struct with no mutex or atomic operations protecting its state. When asyncio.to_thread dispatches PIL work to a ThreadPoolExecutor, multiple threads can call memory_get_block / memory_return_block concurrently during the GIL-released portions of operations like Image.resize() or Image.save().

This is particularly relevant for:

  • Python's increasing use of asyncio.to_thread for GIL-releasing C extensions
  • Free-threaded CPython (PEP 703, available as an experimental build since Python 3.13), which runs without the GIL and would make this race trivially triggerable

Suggested fixes:

  1. Add a pthread_mutex around blocks_cached / blocks_pool access in memory_get_block and memory_return_block
  2. Or use per-thread arenas (eliminates contention)
  3. Or document that the block cache is not thread-safe and recommend set_blocks_max(0) for multi-threaded usage
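
To illustrate the first option, here is a Python model of a lock-protected block cache. It is not a patch for Storage.c: `LockedBlockCache` and `stress` are hypothetical names, `object()` stands in for malloc, and `threading.Lock` plays the role a mutex would play in the C code. The point is only that the check and the update happen inside one critical section.

```python
import threading


class LockedBlockCache:
    """Toy block cache whose get/return paths share a single lock."""

    def __init__(self, blocks_max: int):
        self.blocks_max = blocks_max
        self.freed_count = 0  # blocks dropped because the cache was full
        self._pool = []
        self._lock = threading.Lock()

    def get_block(self):
        with self._lock:
            if self._pool:
                return self._pool.pop()  # check and removal are atomic together
        return object()  # cache miss: stand-in for a fresh allocation

    def return_block(self, block) -> None:
        with self._lock:
            if len(self._pool) < self.blocks_max:
                self._pool.append(block)
            else:
                self.freed_count += 1  # stand-in for free(block.ptr)

    def cached(self) -> int:
        with self._lock:
            return len(self._pool)


def stress(cache: LockedBlockCache, workers: int = 8, rounds: int = 2000) -> int:
    """Return how many times a block was held by two threads at once."""
    in_use = set()
    in_use_lock = threading.Lock()
    duplicates = []

    def worker():
        for _ in range(rounds):
            block = cache.get_block()
            with in_use_lock:
                if block in in_use:
                    duplicates.append(block)
                in_use.add(block)
            with in_use_lock:
                in_use.discard(block)
            cache.return_block(block)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(duplicates)


if __name__ == "__main__":
    cache = LockedBlockCache(blocks_max=20)
    print("double hand-outs:", stress(cache))
    print("cached blocks:", cache.cached())
```

A single lock serializes every allocation and release, which is why the second option (per-thread arenas) may be preferable if contention turns out to matter.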
