Can some linear-time regex engines be considered harmful? A runtime analysis of linear-time regex engines in the context of production software systems.
Linear-time Regex engines are considered the gold standard for reducing the risk of Regular Expression Denial of Service (ReDoS) attacks. However, engines that operate in linear-time can in theory still cause harm to software systems if the coefficient of the linear runtime is large enough. We investigate if any linear-time Regex engines found in either literature or libraries can be considered harmful in the context of production software systems, by causing a large enough stall in runtime.
Important
ReDoS Vulnerability Found! Use the following one-liner to run it if you have uv installed.
This code should timeout, as it tries to compute an exponentially large Regex.
uv run --with pyre2==0.3.10 python -c "import re2 as pyre2; pyre2.match('^(?=(a+)+b)\\w+$', 'a' * 50)"You can also run run_pyre2_timeout_simple.py to see proof of concept.
I will be submitting to HATRA 2026.
Work to be done:
- Create a system to measure the following for a regex and input pair:
- memory usage
- regex match time
- regex compile time (input independent)
- AST depth (input independent)
- number of loops (input independent)
- What is the reality* and max* of the input + regex for top 30 packages in two linear regex engines
- reality is defined as the normal situation that this regex will be used in
- max is defined as the most extreme case of regex operation permitted by the other code (ex. input lenght truncation)
-
Python code to run regex patterns against many different Python libraries (
main.py,run_pyre2_timeout_simple.py,run_pyre2_timeout10_large.py,test_pyre2_on_36.py, etc.) -
C# code to test the default Dotnet Regex library and RE# (with full results)
-
TypeScript code to test regex libraries under the Bun runtime
-
Test cases JSON — the standardized ReDoS test cases used across Python, TypeScript, and C#
-
Graphing tools to interpret and visualize the runtime output (
graph.py,graph_scaling.py,graph_resh_results.py,results_table.py, etc.) -
JSON result data for each language and timeout setting (
py_redos_test_results.json,ts_redos_test_results.json,csharp_redos_test_results.json, scaling tests, and timeout variants) -
Images — graphs, tables, and figures referenced throughout this README
-
A list of datasets for ReDoS
- include Python libraries
- include JavaScript / TypeScript libraries
- include Go libraries
- include Rust libraries
- include Re# and Dotnet library
- Vary input size and not just input pattern
- Make table of Regex libraries
- Collect more regex patterns from literature
- Draft up poster for initial review
- Make ReDoS test cases JSON
- Run scaling on tests in
./python/run_pyre2_timeout10_large.pyto check for exponential behavior
| Name | Language | Claimed to be linear |
|---|---|---|
| Re | Python | No |
| Dotnet Regex | C# | No |
| Regex | Python | Reduces backtracking chance but no guarantee |
| Rure | Python | Yes "guarantees linear time" |
| Pyre2 | Python | Yes "guarantees linear-time behavior" |
| RE# | C# | Yes "the main matching algorithm has input-linear complexity both in theory as well as experimentally" |
| Regolith | JavaScript | Yes "guarantees linear time" |
| RegExp | Go | Yes "guaranteed to run in time linear" |
| Regex | Rust | Yes "worst time O(m*nt)" |
These libraries were picked after I searched for "linear time regex library python". Re2 was removed from the test because it could not be installed. Similarly, Regexy was archived and out of date, so it too was excluded.
I use Python's default "re" library as a control even though it does not claim to be linear time.
Test 1 and 2 were done in just Python
- Test 1 -- Scaling Test
- Test 2 -- Preliminary Results
- Test 3 -- Dotnet & RE# Test
- Test 4 -- Check Python, TypeScript (bun runtime), and C# (.NET)
Each Regex pattern was run with an input size of 0 to 30 on all 4 of the tested Regex libraries. Each line represents a different Regex library, the y axis represents time on a log scale with a hard timeout at 2 seconds. The regex patterns where created by asking Claude Sonnet 4.5 for regex patterns that may lead to catastrophic backtracking.
Here is an example of one of the tests where both Regex and Re can be considered harmful.
Here is a list of each test run that links to its corresponding graph.
- Nested quantifiers (
^(a+)+$) - Nested quantifiers with Kleene star (
^(a*)*$) - Nested quantifiers with mismatch (
^(a+)+b$) - Alternation with overlapping patterns (
^(a|a)*$) - Alternation with prefix overlap (
^(a|ab)*$) - Multiple alternations (
(a|a|a|a|a|b)*c) - Triple nested groups (
^((a+)+)+$) - Nested Kleene star with suffix (
^(a*)*b$) - Nested plus with suffix (
^(a+)*b$) - Email-like pattern (ReDoS)
- Overlapping character classes lowercase (
^([a-z]+)+[A-Z]$) - Overlapping character classes alphanumeric (
^([0-9a-z]+)+[A-Z]$) - Wildcard nested quantifiers (
^(.*)*$) - Wildcard plus nested (
^(.+)+$) - Wildcard with suffix (
^(.*)+b$) - Multiple overlapping quantifiers (
^(a*)+b$) - Optional nested quantifiers (
^(a?)+b$) - Non-greedy nested quantifiers (
^(a*?)*b$) - Word boundary catastrophic (
^(\\w+\\s*)+$) - Word with spaces pattern (
^([\\w]+[\\s]*)*$) - Digit nested plus (
^(\\d+)+$) - Digit nested star (
^([0-9]+)*$) - Complex alternation plus (
^(a+|a+)+$) - Complex alternation star (
^(a*|a*)*$) - Alternation with length variation (
^(aa+|a+)+$) - URL pattern (simplified)
- Whitespace with letters (
^(\\s*a+\\s*)+$) - Whitespace alternation (
^(\\s+|a+)*b$) - Optional group patterns (
^(a+)?b?(a+)?$) - Optional with nested groups (
^(a+b?)+c$) - Character class repetition (
^([a-zA-Z]+)*$) - Alphanumeric with symbol (
^([a-z0-9]+)+[!]$) - Nested alternation simple (
^((a|b)+)+c$) - Nested alternation overlap (
^((a|ab)+)+c$) - Long repeating with suffix (
^(a+b)+c$) - Repeating pattern variation (
^(ab+)+c$)
| Name | Language | Claimed to be linear | Found to be harmful | Quantity of harmful results (out of 36) |
|---|---|---|---|---|
| Re | Python | No | Yes | 25 |
| Rure | Python | Yes "guarantees linear time" | No | 0 |
| Regex | Python | Reduces backtracking chance but no guarantee | Yes | 1 |
| Pyre2 | Python | Yes "guarantees linear-time behavior" | No | 0 |
This was the first test I ran where each pattern was run with a single input size. These results are preliminary and were to test if I was using a reasonable method for running regex patterns.
We run Program.cs with dotnet run. This tests runs 113 tests in both the RE# library and the default Dotnet Regex library. The RE# library has zero cases that can be considered harmful, but 75 cases that can be conspired harmful. Those results are expected, as the Dotnet Regex library does not claim to be linear-time and RE# does claim to be linear.
Included are the full results.
I standardized the tests into a JSON file called test_cases.json and changed how test cases are handled in Python, TS, and C# to use this test case file. I ran each language on these test cases and to get the results py_redos_test_results.json, ts_redos_test_results.json, csharp_redos_test_results.json. I then created results_table.py that produced a few graphs and tables.
A few takeaways:
- C# Regex is very vulnerable to ReDoS compared to the other languages, failing in 40 test cases for each 3 of the runs
- We did not find evidence that any library that claimed to be linear-time can be considered harmful
I had an issue installing https://pypi.org/project/re2.
I found a pull request from one of the authors of resharp where they optimize the dotnet regex library dotnet/runtime#102655
The source code for resharp has been moved or removed https://github.com/ieviev/resharp
You can install it from the library website https://www.nuget.org/packages/Resharp


