perf: release the GIL during parsing #91

Merged: kurok merged 1 commit into master from perf/release-gil on Jun 12, 2026

kurok (Contributor) commented Jun 12, 2026

What

parse_email held the GIL for the entire call, but the parse itself (mailparse + base64/charset decoding) is pure Rust and never touches the Python interpreter. Holding the GIL serializes every concurrent parse_email call onto a single core.

This wraps the parse in py.detach() (PyO3 0.29's rename of allow_threads) so the GIL is released for its duration. The byte payload is already an owned copy (payload_to_bytes), so nothing borrows from a Python object while the GIL is released; the ParseError and PyMail are built after re-attaching, where the interpreter is needed.

Benchmark

Single-thread latency — unchanged (min 1.369 ms vs 1.369 ms baseline).

Multi-threaded throughput — large_message.eml, N threads × 300 parses, best-of-3, 12-core machine:

threads   before (GIL held)   after (GIL released)   speedup
      1               645/s                  646/s      1.0×
      2               635/s                 1247/s     1.96×
      4               637/s                 2405/s     3.78×
      8               636/s                 4691/s     7.37×
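
The speedup column is the after/before throughput ratio at each thread count, and dividing it by the thread count gives a rough parallel efficiency. A minimal sketch of that arithmetic follows; the names `RESULTS`, `speedup`, and `efficiency` are illustrative and not part of the PR. Because it starts from the rounded throughputs in the table, the last digit of a ratio can differ slightly from the reported value.

```python
# Throughput in parses/second, copied from the table above:
# threads -> (before, after)
RESULTS = {1: (645, 646), 2: (635, 1247), 4: (637, 2405), 8: (636, 4691)}


def speedup(before: float, after: float) -> float:
    """Ratio of throughput with the GIL released to throughput with it held."""
    return after / before


def efficiency(threads: int, before: float, after: float) -> float:
    """Speedup per thread; 1.0 would be perfectly linear scaling."""
    return speedup(before, after) / threads


if __name__ == "__main__":
    for threads, (before, after) in RESULTS.items():
        print(
            f"{threads} threads: {speedup(before, after):.2f}x speedup, "
            f"{efficiency(threads, before, after):.0%} efficiency"
        )
```

On these figures, efficiency goes from roughly 98% at 2 threads to roughly 92% at 8.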

Baseline throughput is flat regardless of thread count (the GIL serializes parsing); after this change it scales near-linearly with cores. This is the win that matters for any service parsing many emails concurrently.
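
A measurement of this shape (N threads × 300 calls, best of 3) could be sketched as below. This is not the harness used for the numbers above. `stand_in_parse` is a hypothetical placeholder for `parse_email`, using `hashlib`, which also releases the GIL for large inputs. `throughput` and its parameters are assumptions made for illustration.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def stand_in_parse(payload: bytes) -> str:
    # Hypothetical stand-in for parse_email: hashlib releases the GIL
    # while hashing buffers larger than 2047 bytes.
    return hashlib.sha256(payload).hexdigest()


def throughput(func, payload, threads, calls_per_thread=300, repeats=3):
    """Return the best observed calls/second over `repeats` runs."""

    def worker():
        for _ in range(calls_per_thread):
            func(payload)

    best = 0.0
    for _ in range(repeats):
        with ThreadPoolExecutor(max_workers=threads) as pool:
            start = time.perf_counter()
            futures = [pool.submit(worker) for _ in range(threads)]
            for future in futures:
                future.result()  # re-raises any exception from the worker
            elapsed = time.perf_counter() - start
        best = max(best, threads * calls_per_thread / elapsed)
    return best


if __name__ == "__main__":
    payload = b"x" * (64 * 1024)  # placeholder input, not large_message.eml
    for n in (1, 2, 4, 8):
        print(f"{n} threads: {throughput(stand_in_parse, payload, n):.0f}/s")
```

Swapping `stand_in_parse` for the real `parse_email` and `payload` for the contents of large_message.eml would approximate the setup described here. If the function holds the GIL, the reported rate should stay flat as the thread count grows.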

Risk

  • No behavior change; all 91 correctness tests pass, cargo clippy --release clean.
  • detach requires the closure and its return type to be Ungil (i.e. Send): Vec<u8>, mail_parser::Mail, and MailParseError are all Send, and no Py/Python value crosses the boundary, so this is enforced at compile time.
  • Note: this keeps the existing one-time input copy (required so the buffer is owned and Send across the GIL release). A zero-copy alternative would remove that copy but is mutually exclusive with releasing the GIL; for concurrent/server workloads, GIL release is the larger win.

parse_email holds the GIL for the entire call, but the actual parse is pure
Rust and never touches the Python interpreter. Holding the GIL serializes all
concurrent parse_email calls onto a single core.

Wrap the parse in py.detach() (formerly allow_threads) so the GIL is released
for its duration. The byte payload is already an owned copy, so nothing borrows
from a Python object while the GIL is released; errors and the PyMail are built
after re-attaching, where the interpreter is required.

Single-thread latency is unchanged. Multi-threaded throughput on a 12-core box
(large_message.eml, best-of-3) goes from flat to near-linear:

  threads   before     after
        1    645/s      646/s
        2    635/s     1247/s
        4    637/s     2405/s
        8    636/s     4691/s  (7.4x)

No behavior change; all 91 correctness tests pass.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
kurok merged commit d4379d4 into master on Jun 12, 2026
7 checks passed
kurok deleted the perf/release-gil branch on June 12, 2026 22:53