GitHub - DavisPL/many-regex: A Regex execution engine that tests a pattern with many different engines

Can some linear-time regex engines be considered harmful? A runtime analysis of linear-time regex engines in the context of production software systems.

Quick Related Links

Introduction

Linear-time Regex engines are considered the gold standard for reducing the risk of Regular Expression Denial of Service (ReDoS) attacks. However, engines that operate in linear-time can in theory still cause harm to software systems if the coefficient of the linear runtime is large enough. We investigate if any linear-time Regex engines found in either literature or libraries can be considered harmful in the context of production software systems, by causing a large enough stall in runtime.

ReDoS Found

Important

ReDoS Vulnerability Found! Use the following one-liner to run it if you have uv installed.

This code should time out, as it tries to compute an exponentially large Regex.

uv run --with pyre2==0.3.10 python -c "import re2 as pyre2; pyre2.match('^(?=(a+)+b)\\w+$', 'a' * 50)"

You can also run run_pyre2_timeout_simple.py to see proof of concept.
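A hard timeout is easier to enforce when the match runs in a separate process that can be killed. The following is a minimal sketch of that idea, not the contents of run_pyre2_timeout_simple.py. The helper name match_times_out and its parameters are hypothetical, and it defaults to the standard-library re module so it runs without extra installs. Passing module="re2" assumes pyre2 is installed for the interpreter that launches the child process.

```python
import subprocess
import sys

# Child program: import the engine named in argv[1] and run a single match.
_CHILD = (
    "import importlib, sys\n"
    "engine = importlib.import_module(sys.argv[1])\n"
    "engine.match(sys.argv[2], sys.argv[3])\n"
)


def match_times_out(pattern, text, timeout_s=2.0, module="re"):
    """Return True if module.match(pattern, text) is still running after timeout_s.

    The match runs in a child interpreter, so a stuck match can be killed
    without hanging the caller. Hypothetical helper for illustration only.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", _CHILD, module, pattern, text],
            timeout=timeout_s,
            check=True,  # surface import or regex syntax errors instead of hiding them
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

For example, match_times_out(r"(a+)+$", "a" * 40 + "b", timeout_s=1.0) exercises a classic catastrophic-backtracking pattern against the control library re. Note that the timeout also covers interpreter startup, so very small timeouts are not meaningful.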

HATRA

I will be submitting to HATRA 2026.

Work to be done:

Create a system to measure the following for a regex and input pair:

memory usage
regex match time
regex compile time (input independent)
AST depth (input independent)
number of loops (input independent)
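
One way these five measurements could be collected for Python's built-in re is sketched below. This is an assumption-laden illustration, not the planned system: the function name measure and the returned keys are hypothetical, "AST depth" is taken to mean the nesting depth of parsed sub-patterns, "number of loops" is taken to mean the count of repeat operators, and the AST comes from the private re parser module, which may change between Python versions. tracemalloc only sees allocations made through Python's allocator, so memory used by native engines may not be fully captured.

```python
import re
import time
import tracemalloc

try:  # Python 3.11+ keeps the regex parser in a private submodule
    from re import _parser as sre_parser
except ImportError:  # older Pythons expose it as sre_parse
    import sre_parse as sre_parser

# Repeat operators known to this Python version (POSSESSIVE_REPEAT is 3.11+).
_REPEAT_OPS = tuple(
    getattr(sre_parser, name)
    for name in ("MAX_REPEAT", "MIN_REPEAT", "POSSESSIVE_REPEAT")
    if hasattr(sre_parser, name)
)


def _walk(node, depth, stats):
    """Recursively visit parsed sub-patterns, tracking depth and repeat count."""
    if isinstance(node, sre_parser.SubPattern):
        stats["depth"] = max(stats["depth"], depth)
        for op, arg in node.data:
            if op in _REPEAT_OPS:
                stats["loops"] += 1
            _walk(arg, depth + 1, stats)
    elif isinstance(node, (tuple, list)):
        for child in node:
            _walk(child, depth, stats)


def measure(pattern, text):
    """Return compile time, match time, peak traced memory, AST depth, loop count."""
    re.purge()  # clear re's cache so compile time is not a cache hit
    start = time.perf_counter()
    compiled = re.compile(pattern)
    compile_s = time.perf_counter() - start

    start = time.perf_counter()
    compiled.match(text)
    match_s = time.perf_counter() - start

    # Memory is measured in a second run so tracing overhead does not skew timing.
    tracemalloc.start()
    compiled.match(text)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    stats = {"depth": 0, "loops": 0}
    _walk(sre_parser.parse(pattern), 1, stats)
    return {
        "compile_s": compile_s,
        "match_s": match_s,
        "peak_bytes": peak_bytes,
        "ast_depth": stats["depth"],
        "loops": stats["loops"],
    }
```

Compile time, AST depth, and loop count depend only on the pattern, so they can be computed once per regex and reused across inputs.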

What are the reality* and max* of the input + regex for the top 30 packages in two linear regex engines?

reality is defined as the normal situation in which this regex will be used
max is defined as the most extreme case of regex operation permitted by the surrounding code (e.g. input length truncation)

Included

Python code to run regex patterns against many different Python libraries (main.py, run_pyre2_timeout_simple.py, run_pyre2_timeout10_large.py, test_pyre2_on_36.py, etc.)
C# code to test the default Dotnet Regex library and RE# (with full results)
TypeScript code to test regex libraries under the Bun runtime
Test cases JSON — the standardized ReDoS test cases used across Python, TypeScript, and C#
Graphing tools to interpret and visualize the runtime output (graph.py, graph_scaling.py, graph_resh_results.py, results_table.py, etc.)
JSON result data for each language and timeout setting (py_redos_test_results.json, ts_redos_test_results.json, csharp_redos_test_results.json, scaling tests, and timeout variants)
Images — graphs, tables, and figures referenced throughout this README
A list of datasets for ReDoS

Roadmap

Harmfulness Scale

Libraries Tested

Name	Language	Claimed to be linear
Re	Python	No
Dotnet Regex	C#	No
Regex	Python	Reduces backtracking chance but no guarantee
Rure	Python	Yes "guarantees linear time"
Pyre2	Python	Yes "guarantees linear-time behavior"
RE#	C#	Yes "the main matching algorithm has input-linear complexity both in theory as well as experimentally"
Regolith	JavaScript	Yes "guarantees linear time"
RegExp	Go	Yes "guaranteed to run in time linear"
Regex	Rust	Yes "worst time O(m*nt)"

These libraries were picked after I searched for "linear time regex library python". Re2 was removed from the test because it could not be installed. Similarly, Regexy was archived and out of date, so it too was excluded.

I use Python's default "re" library as a control even though it does not claim to be linear time.

Experiments ToC

Tests 1 and 2 were run in Python only

Test 1 -- Scaling Test
Test 2 -- Preliminary Results
Test 3 -- Dotnet & RE# Test
Test 4 -- Check Python, TypeScript (bun runtime), and C# (.NET)

Test 1 -- Scaling Test

Methods

Each Regex pattern was run with input sizes from 0 to 30 on all 4 of the tested Regex libraries. In each graph, each line represents a different Regex library, and the y-axis represents time on a log scale with a hard timeout at 2 seconds. The regex patterns were created by asking Claude Sonnet 4.5 for regex patterns that may lead to catastrophic backtracking.
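
The scaling loop described above can be sketched roughly as follows. This is not the repository's main.py. The name scaling_run is hypothetical, only the standard-library re is exercised, and the way an input of size n is built ("a" repeated n times plus a non-matching "!") is an assumption, since the input construction is not described here. Each match runs in a child process so the 2 second limit can be enforced by killing it.

```python
import subprocess
import sys

# Child program: time one re.match call and print the elapsed seconds.
_CHILD = (
    "import re, sys, time\n"
    "start = time.perf_counter()\n"
    "re.match(sys.argv[1], sys.argv[2])\n"
    "print(time.perf_counter() - start)\n"
)


def scaling_run(pattern, sizes=range(0, 31), timeout_s=2.0):
    """Return [(size, seconds or None)], where None marks a hard timeout."""
    results = []
    for n in sizes:
        text = "a" * n + "!"  # assumed input shape, chosen for illustration
        try:
            done = subprocess.run(
                [sys.executable, "-c", _CHILD, pattern, text],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                check=True,
            )
            results.append((n, float(done.stdout)))
        except subprocess.TimeoutExpired:
            results.append((n, None))
    return results
```

The reported seconds come from the child's own timer and exclude interpreter startup, while the timeout applies to the whole child process. Plotting the seconds on a log-scale y-axis against n gives one line per library.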

Here is an example of one of the tests where both Regex and Re can be considered harmful.

Here is a list of each test run that links to its corresponding graph.

Results

Name	Language	Claimed to be linear	Found to be harmful	Quantity of harmful results (out of 36)
Re	Python	No	Yes	25
Rure	Python	Yes "guarantees linear time"	No	0
Regex	Python	Reduces backtracking chance but no guarantee	Yes	1
Pyre2	Python	Yes "guarantees linear-time behavior"	No	0

Test 2 -- Preliminary Results

This was the first test I ran, where each pattern was run with a single input size. These results are preliminary and were meant to check whether I was using a reasonable method for running regex patterns.

Test 3 -- Dotnet & RE# Test

We run Program.cs with dotnet run. This runs 113 tests in both the RE# library and the default Dotnet Regex library. The RE# library has zero cases that can be considered harmful, but the default Dotnet Regex library has 75 cases that can be considered harmful. Those results are expected, as the Dotnet Regex library does not claim to be linear-time and RE# does claim to be linear.

Included are the full results.

Test 4 -- Check Python, TypeScript (bun runtime), and C# (.NET)

I standardized the tests into a JSON file called test_cases.json and changed how test cases are handled in Python, TS, and C# to use this test case file. I ran each language on these test cases to get the results py_redos_test_results.json, ts_redos_test_results.json, and csharp_redos_test_results.json. I then created results_table.py, which produced a few graphs and tables.
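
As a rough illustration of driving a runner from a shared test case file, the sketch below loads a JSON list and times each case. The schema is an assumption: the real layout of test_cases.json is not shown here, so the keys "name", "pattern", and "input" are hypothetical, as is the function name run_cases. The engine argument can be any module exposing a match(pattern, text) function. No timeout is applied in this sketch.

```python
import json
import re
import time


def run_cases(path, engine=re):
    """Run every case in a JSON file and return per-case timing records.

    Assumes the file holds a list of objects with the hypothetical keys
    "name", "pattern", and "input".
    """
    with open(path, encoding="utf-8") as fh:
        cases = json.load(fh)

    results = []
    for case in cases:
        start = time.perf_counter()
        matched = engine.match(case["pattern"], case["input"]) is not None
        elapsed = time.perf_counter() - start
        results.append(
            {"name": case["name"], "matched": matched, "seconds": elapsed}
        )
    return results
```

Keeping the cases in one file means the Python, TypeScript, and C# runners all see identical patterns and inputs, so differences in the results reflect the engines rather than the test data.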

A few takeaways:

C# Regex is very vulnerable to ReDoS compared to the other languages, failing in 40 test cases in each of the 3 runs
We did not find evidence that any library that claimed to be linear-time can be considered harmful

Notes

I had an issue installing https://pypi.org/project/re2.

I found a pull request from one of the authors of resharp in which they optimize the dotnet regex library: dotnet/runtime#102655

The source code for resharp has been moved or removed https://github.com/ieviev/resharp

You can install it from the library website https://www.nuget.org/packages/Resharp
