Open
21 commits
f443989
Added C++ std::regex
HFTrader Jan 4, 2022
18af519
Added compiled results
HFTrader Jan 4, 2022
c9d2f8a
Added boost::regex
HFTrader Jan 5, 2022
f4f1390
Added ctre - compile time regular expressions
HFTrader Jan 5, 2022
9c9b933
Fixed flags and modes
HFTrader Jan 6, 2022
3b15f7f
Included new results in the chart
HFTrader Jan 6, 2022
5c2f756
Merge branch 'rust-leipzig:master' into master
HFTrader Jan 15, 2022
c2369d4
Updated README.md
HFTrader Jan 15, 2022
daa6b26
Merge branch 'master' into master
HFTrader Oct 12, 2022
053c28a
Merge branch 'rust-leipzig:master' into master
HFTrader Oct 12, 2022
75ee3f0
Added spreadsheet generator and updated benchmarks
HFTrader Oct 12, 2022
9eec408
Merge branch 'master' of github.com:HFTrader/regex-performance
HFTrader Oct 13, 2022
ea6fb0a
Building boost minimal from github, added not how to compile with cla…
HFTrader Oct 14, 2022
457501a
Added results IceLake server, added note install Ubuntu
HFTrader Oct 14, 2022
ef2d5d9
Better formatting on spreadsheets
HFTrader Oct 14, 2022
5bb0d2d
Fix project configuration issues
HFTrader Sep 28, 2025
4edbcc8
Add native CPU optimization flags for maximum performance
HFTrader Sep 28, 2025
2b20938
Major enhancements to regex performance benchmarking tool
HFTrader Sep 28, 2025
36b9e0a
Modernize dependency management in vendor/CMakeLists.txt
HFTrader Sep 28, 2025
746d92d
Add timeout functionality and integrate YARA/Hyperscan engines
HFTrader Sep 28, 2025
780009e
Remove vendor Git repositories and update .gitignore
HFTrader Sep 29, 2025
28 changes: 27 additions & 1 deletion .gitignore
@@ -43,7 +43,33 @@ vendor/local/*
src/rust/.cargo/config
src/version.h

# Excel temporary files
.~lock.*#
*.tmp

# Sub projects
vendor/pcre2/*
vendor/pcre2/

# Build dependency artifacts
vendor/abseil-cpp/
vendor/boost/
vendor/ctre/
vendor/jansson/
vendor/yara/
vendor/hyperscan/
vendor/oniguruma/
vendor/re2/
vendor/tre/

# Result files and benchmarks
results*.txt
results*.csv
*.xlsx
test_input.txt
titles.md
build_deps.sh

# Screenshots
*.png


6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -5,7 +5,7 @@

cmake_minimum_required(VERSION 3.0)

project(RegexPeformance C CXX)
project(RegexPerformance C CXX)

if(NOT CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD 20)
@@ -15,13 +15,13 @@ endif()

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

set(GENERAL_C_FLAGS "-march=native -Wall -Wstack-usage=5000 -fdiagnostics-color -pipe -fsigned-char -fno-asynchronous-unwind-tables -fno-stack-protector -Wunused-parameter")
set(GENERAL_C_FLAGS "-march=native -mtune=native -Wall -Wstack-usage=5000 -fdiagnostics-color -pipe -fsigned-char -fno-asynchronous-unwind-tables -fno-stack-protector -Wunused-parameter")

set(CMAKE_C_FLAGS "-std=c11 ${GENERAL_C_FLAGS}" CACHE STRING "additional CFLAGS" FORCE)
set(CMAKE_C_FLAGS_DEBUG "-O0 -g")
set(CMAKE_C_FLAGS_RELEASE "-O3")

set(CMAKE_CXX_FLAGS "-std=c++11 ${GENERAL_C_FLAGS}" CACHE STRING "additional CFLAGS" FORCE)
set(CMAKE_CXX_FLAGS "-std=c++20 ${GENERAL_C_FLAGS}" CACHE STRING "additional CFLAGS" FORCE)
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g")
set(CMAKE_CXX_FLAGS_RELEASE "-O3")
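
One way to see what `-march=native` changes on a given machine is to ask the compiler which instruction-set macros it defined. The sketch below is not part of this change; `enabled_simd_features` is a hypothetical helper name, and it relies only on the `__SSE4_2__`, `__AVX2__` and `__AVX512F__` macros that GCC and Clang predefine when the selected target supports those instruction sets.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the repository): list the SIMD feature
// macros the compiler defined for this translation unit. GCC and Clang
// predefine them when the -march target supports the instruction set.
inline std::vector<std::string> enabled_simd_features() {
    std::vector<std::string> features;
#ifdef __SSE4_2__
    features.push_back("SSE4.2");
#endif
#ifdef __AVX2__
    features.push_back("AVX2");
#endif
#ifdef __AVX512F__
    features.push_back("AVX-512F");
#endif
    return features;
}
```

Compiling this once with and once without `-march=native` and printing the returned list from a small `main` shows which extensions the flag enabled on the build host.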

82 changes: 80 additions & 2 deletions README.md
@@ -7,6 +7,15 @@ This tool is based on the work of John Maddock (See his own regex comparison [he
and the sljit project (See their regex comparison [here](http://sljit.sourceforge.net/regex_perf.html)).

## Requirements

### Modern Clang 19.1.6 Toolchain (Recommended)
When using the modern toolchain build, all dependencies are automatically handled by the build script. Just ensure you have access to the toolchain environment:
- Access to `/ssd/hblib-installer/ubuntu-20.04/tools/sourceme.sh`
- CMake 3.24.2 (included in toolchain)
- Clang 19.1.6 (included in toolchain)
- All regex engine dependencies built automatically

### Legacy System Requirements
| dependency | version |
|------------|----------|
| CMake | >=3.0 |
@@ -51,6 +60,38 @@ regex crate for defined expressions.
The different engines have different requirements which are not described here.
Please see the related project documentations.

On Ubuntu 20.04, the following packages were needed to complete the build on a stock AWS box:
```bash
$ apt install build-essential cmake rustc cargo automake autoconf autopoint autogen \
libtool libprotobuf-dev libprotobuf-c-dev protobuf-compiler ninja-build \
ragel libpcap pcaputils pkg-config libboost-dev flex bison
```

### Modern Clang 19.1.6 Toolchain Build (Recommended)

For optimal performance with the latest toolchain, use the automated build script with the modern Clang 19.1.6 toolchain:

```bash
# Source the modern toolchain environment
source /ssd/hblib-installer/ubuntu-20.04/tools/sourceme.sh

# Clean build from scratch
./build_deps_simple.sh

# Build the main project
mkdir -p build && cd build
CC=clang CXX=clang++ cmake ..
make -j$(nproc)
```

This approach:
- Uses Clang 19.1.6 compiler with LLVM tools
- Builds all dependencies with modern toolchain
- Includes latest RE2 with Abseil dependencies
- Supports all 11 regex engines with optimal performance

### Legacy Build Method

If all dependencies are fulfilled, just configure and build the CMake-based project:

```bash
@@ -71,6 +112,13 @@ make regex_perf
The test tool calls each engine with a defined set of regular expressions on a given file.
The repository contains a ~16 MB text file (3200.txt) which can be used for measuring.

```bash
# When using modern toolchain, source the environment first
source /ssd/hblib-installer/ubuntu-20.04/tools/sourceme.sh
build/src/regex_perf -f 3200.txt
```

For legacy builds:
```bash
./src/regex_perf -f ./3200.txt
```
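
The measurement idea described above (compile a pattern, scan the whole input, record the elapsed time and match count) can be sketched for a single engine as follows. This is a minimal illustration using `std::regex`, not the tool's actual harness, which is not shown here; `TimedResult` and `time_pattern` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical result type: what one pattern produced on one input.
struct TimedResult {
    std::size_t matches;  // number of non-overlapping matches found
    double millis;        // wall-clock time of the scan, in milliseconds
};

// Hypothetical helper: scan `text` once with `pattern` and time the scan.
inline TimedResult time_pattern(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // compiled outside the timed region
    const auto start = std::chrono::steady_clock::now();
    const auto count = std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator());
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {static_cast<std::size_t>(count), elapsed.count()};
}
```

A real comparison would repeat each scan several times and report an aggregate, since a single run is sensitive to cache and scheduling noise.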
@@ -98,8 +146,38 @@ python3 ../genspreadsheet.py results.csv
It will save an Excel spreadsheet with the name `regex-results-YYYYMMDD-HHMMSS.xlsx` in the current
directory.
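
To make the naming pattern concrete, the sketch below builds a name of the form `regex-results-YYYYMMDD-HHMMSS.xlsx` from a timestamp. It only illustrates the format: the real name is produced by the Python script `genspreadsheet.py`, whose internals are not shown here, and `results_filename` is a hypothetical helper that assumes local time is used.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: format a timestamp into the spreadsheet name pattern.
// Assumes local time; the actual script may format the time differently.
inline std::string results_filename(std::time_t when) {
    char stamp[16];  // "YYYYMMDD-HHMMSS" is 15 characters plus the terminator
    const std::tm* local = std::localtime(&when);  // not thread-safe; fine for a sketch
    if (local == nullptr ||
        std::strftime(stamp, sizeof stamp, "%Y%m%d-%H%M%S", local) == 0) {
        return "regex-results-unknown.xlsx";  // fallback if formatting fails
    }
    return std::string("regex-results-") + stamp + ".xlsx";
}
```

Passing `std::time(nullptr)` yields a name for the current moment, so successive runs do not overwrite each other unless they start within the same second.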

## Compiling with clang + libc++

Unfortunately, it is not possible to benchmark GCC/libstdc++ and clang+libc++ in the same build,
because CMake selects a single compiler per build tree.

To run with clang+libc++ use the following recipe:
```bash
mkdir build && cd build
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_EXE_LINKER_FLAGS="-lc++abi -lc++" \
-DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ \
-DCMAKE_C_COMPILER=/usr/local/bin/clang \
-DCMAKE_CXX_FLAGS_INIT="-std=c++20 -stdlib=libc++ -march=native -mtune=native" \
-G Ninja ..
```
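
After configuring with the recipe above, it can be worth confirming that the binary really was built against libc++ and not libstdc++. The sketch below is one way to check, assuming only the library version macros `_LIBCPP_VERSION` (libc++) and `__GLIBCXX__` (libstdc++); `standard_library_name` is a hypothetical helper, not something this repository provides.

```cpp
#include <string>  // any standard header pulls in the library's version macros

// Hypothetical helper: report which C++ standard library this translation
// unit was compiled against, based on the library's own version macros.
inline std::string standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}
```

Printing the result from a small `main` built with the same flags should report `libc++` when `-stdlib=libc++` took effect.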

## Current Build Status

**Latest Update (2025-09-28)**: Successfully rebuilt from scratch with modern Clang 19.1.6 toolchain
- ✅ All 11 regex engines working: CTRE, Boost, C++ std, PCRE (3 variants), RE2, Oniguruma, TRE, Rust regex (2 variants)
- ✅ Latest RE2 with Abseil dependencies properly linked
- ✅ Performance tests running successfully
- ⚠️ CTRE has known issues with case-insensitive patterns and word boundaries
- 🚀 Best performers: Rust regex, PCRE-JIT, RE2

## Results

These results were obtained on an AMD Threadripper 3960X (Zen 2) at 3.8 GHz running Ubuntu 20.04.5 LTS.

![Updated Performance Results](results_threadripper.png "Performance Results")

IceLake Xeon Platinum 8375C @ 2.90GHz (AWS C6i instance) - no mitigations

![Updated Performance Results](results_20221012.png "Performance Results")
![IceLake Server](results_icelake.png "Results Ice Lake")