perf: release the GIL during parsing by kurok · Pull Request #91 · namecheap/fast_mail_parser

kurok · 2026-06-12T22:50:00Z

What

parse_email held the GIL for the entire call, but the parse itself (mailparse + base64/charset decoding) is pure Rust and never touches the Python interpreter. Holding the GIL serializes every concurrent parse_email call onto a single core.

This wraps the parse in py.detach() (PyO3 0.29's rename of allow_threads) so the GIL is released for its duration. The byte payload is already an owned copy (payload_to_bytes), so nothing borrows from a Python object while the GIL is released; the ParseError and PyMail are built after re-attaching, where the interpreter is needed.

Benchmark

Single-thread latency — unchanged (min 1.369 ms vs 1.369 ms baseline).

Multi-threaded throughput — large_message.eml, N threads × 300 parses, best-of-3, 12-core machine:

threads	before (GIL held)	after (GIL released)	speedup
1	645/s	646/s	1.0×
2	635/s	1247/s	1.96×
4	637/s	2405/s	3.78×
8	636/s	4691/s	7.37×
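
The methodology above (N threads × 300 parses, best-of-3) can be approximated with a small harness. This is a minimal sketch, not the script used for the numbers in this PR: `measure_throughput` and its parameters are hypothetical names, and the parser is passed in as a callable so that nothing about the library's API is assumed beyond "takes bytes".

```python
import threading
import time
from typing import Callable


def measure_throughput(parse: Callable[[bytes], object], payload: bytes,
                       threads: int, parses_per_thread: int = 300,
                       repeats: int = 3) -> float:
    """Return the best parses/second observed over `repeats` runs."""
    best = 0.0
    for _ in range(repeats):
        # One extra party for the main thread, so timing starts only
        # once every worker is ready to go.
        barrier = threading.Barrier(threads + 1)

        def worker() -> None:
            barrier.wait()
            for _ in range(parses_per_thread):
                parse(payload)

        workers = [threading.Thread(target=worker) for _ in range(threads)]
        for w in workers:
            w.start()
        barrier.wait()  # release all workers at once
        start = time.perf_counter()
        for w in workers:
            w.join()
        elapsed = time.perf_counter() - start
        best = max(best, threads * parses_per_thread / elapsed)
    return best
```

To reproduce a table row, one would pass `parse_email` and the bytes of large_message.eml, once per thread count. If the parser holds the GIL, the result should stay roughly flat as `threads` grows; if it releases the GIL, it should grow with the thread count up to the number of cores.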

Baseline throughput is flat regardless of thread count (the GIL serializes parsing); after this change it scales near-linearly with cores. This is the win that matters for any service parsing many emails concurrently.
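
The "near-linear" claim can be checked directly from the table by dividing each speedup by its thread count. The sketch below does that arithmetic; "efficiency" (speedup per thread) is a label introduced here for illustration, not a term from the benchmark itself.

```python
# (threads, before parses/s, after parses/s), copied from the table above
ROWS = [(1, 645, 646), (2, 635, 1247), (4, 637, 2405), (8, 636, 4691)]


def scaling(rows):
    """Map thread count -> (speedup, efficiency), with efficiency = speedup / threads."""
    result = {}
    for threads, before, after in rows:
        speedup = after / before
        result[threads] = (speedup, speedup / threads)
    return result


for threads, (speedup, efficiency) in scaling(ROWS).items():
    print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} of ideal")
```

On these figures the efficiency stays above 90% through 8 threads (about 98%, 94%, and 92% at 2, 4, and 8 threads), which is what "near-linear" means here.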

Risk

No behavior change; all 91 correctness tests pass, cargo clippy --release clean.
detach requires both the closure and its return type to be Ungil (in practice, Send). Vec<u8>, mail_parser::Mail, and MailParseError are all Send, and no Py/Python value crosses the boundary, so this is enforced at compile time.
Note: this keeps the existing one-time input copy (required so the buffer is owned and Send across the GIL release). A zero-copy alternative would remove that copy but is mutually exclusive with releasing the GIL; for concurrent/server workloads, GIL release is the larger win.

parse_email holds the GIL for the entire call, but the actual parse is pure Rust and never touches the Python interpreter. Holding the GIL serializes all concurrent parse_email calls onto a single core.

Wrap the parse in py.detach() (formerly allow_threads) so the GIL is released for its duration. The byte payload is already an owned copy, so nothing borrows from a Python object while the GIL is released; errors and the PyMail are built after re-attaching, where the interpreter is required.

Single-thread latency is unchanged. Multi-threaded throughput on a 12-core box (large_message.eml, best-of-3) goes from flat to near-linear:

threads	before	after
1	645/s	646/s
2	635/s	1247/s
4	637/s	2405/s
8	636/s	4691/s (7.4x)

No behavior change; all 91 correctness tests pass.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>

kurok merged commit d4379d4 into master Jun 12, 2026
7 checks passed

kurok deleted the perf/release-gil branch June 12, 2026 22:53

kurok merged 1 commit into master from perf/release-gil