What did you do?
Used `asyncio.to_thread()` / `ThreadPoolExecutor` to offload PIL `Image` operations (create, load, resize, encode) to multiple threads concurrently.
What did you expect to happen?
Thread-safe access to PIL `Image` allocation/deallocation, or documentation stating that multi-threaded usage requires `set_blocks_max(0)`.
What actually happened?
`ImagingDefaultArena` in `src/libImaging/Storage.c` uses shared mutable state (`blocks_cached`, `blocks_pool`) without any synchronization. When multiple threads call `memory_get_block` / `memory_return_block` concurrently (which happens when PIL's GIL-releasing C operations run in parallel), data races can occur:
```c
// memory_get_block (line 310) — no lock
if (arena->blocks_cached > 0) {
    arena->blocks_cached -= 1;                        // ← read-modify-write, no lock
    block = arena->blocks_pool[arena->blocks_cached]; // ← concurrent read, no lock
}

// memory_return_block (line 347) — no lock
if (arena->blocks_cached < arena->blocks_max) {
    arena->blocks_pool[arena->blocks_cached] = block; // ← concurrent write, no lock
    arena->blocks_cached += 1;                        // ← read-modify-write, no lock
} else {
    free(block.ptr);
}
```
Race scenario (two threads returning blocks simultaneously):

1. Thread A reads `blocks_cached = 19` (`< blocks_max` of 20)
2. Thread B reads `blocks_cached = 19` (same stale value — no lock)
3. Thread A writes `blocks_pool[19] = block_A`, increments to 20
4. Thread B writes `blocks_pool[19] = block_B` (overwrites `block_A`)

Result: `block_A.ptr` is permanently lost — malloc'd, never free'd.
Similarly, `memory_get_block` can hand the same cached block to two threads simultaneously (a use-after-free / double-free risk).
The race window is narrow under the GIL (requires concurrent C-level execution during GIL-released operations like resize/encode), making it hard to reproduce in simple tests. Under sustained high-concurrency production workloads with longer GIL-released operations, the race triggers more frequently.
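The lost-block interleaving above can be replayed deterministically without real threads by modeling the arena as plain Python state and executing the two threads' steps by hand (`blocks_pool` / `blocks_cached` here are stand-ins for the C fields, not Pillow code):

```python
# Model of ImagingDefaultArena's unsynchronized return path.
blocks_max = 20
blocks_pool = [f"cached_{i}" for i in range(19)] + [None]  # 19 cached entries
blocks_cached = 19

block_a, block_b = "block_A", "block_B"

# Both threads execute memory_return_block's check before either writes:
a_sees = blocks_cached          # Thread A reads 19 (< blocks_max, so cache it)
b_sees = blocks_cached          # Thread B reads the same stale 19

# Thread A stores its block and increments.
blocks_pool[a_sees] = block_a
blocks_cached = a_sees + 1      # now 20

# Thread B, acting on its stale read, overwrites slot 19.
blocks_pool[b_sees] = block_b
blocks_cached = b_sees + 1      # still 20 — A's increment is also lost

print(blocks_pool[19])          # prints block_B: block_A's pointer is leaked
```

The same hand-interleaving applied to `memory_get_block` yields the mirror-image failure: both threads pop the block at index 18 and each believes it owns it.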
Workaround: `Image.core.set_blocks_max(0)` disables the cache entirely, eliminating the shared mutable state.
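A minimal way to apply the workaround is to disable the cache once at startup, before any worker threads touch PIL; this assumes the companion getter `Image.core.get_blocks_max` for the sanity check:

```python
from PIL import Image

# Disable the block cache before spawning threads: every allocation then
# goes straight to malloc/free, so the shared pool state is never touched.
Image.core.set_blocks_max(0)
assert Image.core.get_blocks_max() == 0
```

The trade-off is losing the allocation-reuse benefit of the cache, which may matter for workloads that churn many same-sized images.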
What are your OS, Python and Pillow versions?
- OS: Linux x86_64
- Python: 3.12
- Pillow: 10.4.0
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from PIL import Image

# This triggers concurrent access to ImagingDefaultArena from multiple threads.
# The race is probabilistic and more likely under sustained production load
# with longer GIL-released operations (resize, encode).


async def main():
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=20))
    resolutions = [(512, 512), (1024, 1024), (1536, 1536), (2048, 2048)]

    async def process(res):
        def _work():
            img = Image.new("RGB", res)
            img.load()

        await asyncio.to_thread(_work)

    for _ in range(1000):
        await asyncio.gather(*[process(resolutions[i % 4]) for i in range(20)])


asyncio.run(main())
```
Additional context:
PIL's C extensions release the GIL during image operations, enabling true parallelism in thread pools. However, `ImagingDefaultArena` is a process-global struct with no mutex or atomic operations protecting its state. When `asyncio.to_thread` dispatches PIL work to a `ThreadPoolExecutor`, multiple threads can call `memory_get_block` / `memory_return_block` concurrently during the GIL-released portions of operations like `Image.resize()` or `Image.save()`.
This is particularly relevant for:
- Python's increasing use of `asyncio.to_thread` for GIL-releasing C extensions
- The upcoming free-threaded CPython (PEP 703), which will remove the GIL entirely and make this race trivially triggerable
Suggested fixes:
- Add a `pthread_mutex` around `blocks_cached` / `blocks_pool` access in `memory_get_block` and `memory_return_block`
- Or use per-thread arenas (eliminates contention)
- Or document that the block cache is not thread-safe and recommend `set_blocks_max(0)` for multi-threaded usage
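A sketch of the first suggestion, using Python's `threading.Lock` as a stand-in for the proposed `pthread_mutex` (the `BlockPool` class below is a toy model of the arena for illustration, not Pillow code): with the lock guarding both read-modify-write paths, many threads can get and return blocks concurrently without losing or double-handing a block.

```python
import threading


class BlockPool:
    """Toy model of the arena cache with the proposed mutex applied."""

    def __init__(self, blocks_max=20):
        self.blocks_max = blocks_max
        self.pool = []
        self.lock = threading.Lock()   # stand-in for pthread_mutex

    def get_block(self):
        with self.lock:                # guards the pop (read-modify-write)
            if self.pool:
                return self.pool.pop()
        return object()                # "malloc" a fresh block

    def return_block(self, block):
        with self.lock:                # guards the store + increment
            if len(self.pool) < self.blocks_max:
                self.pool.append(block)
                return
        # pool full: "free" the block (simply dropped here)


pool = BlockPool()


def worker():
    for _ in range(10_000):
        pool.return_block(pool.get_block())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, the cache can never exceed blocks_max and no slot
# is ever overwritten by a concurrent returner.
assert len(pool.pool) <= pool.blocks_max
```

The per-thread-arena alternative avoids the lock entirely at the cost of one pool per thread; either way the invariant checked above holds by construction.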