Fdr get conf optimization #311

ypicchi-arm · 2024-08-05T13:45:05Z

Speeds up FDR for NEON by vectorizing the loads in get_conf_stride.
Also included the domain mask unflipping. I remember you've done something similar in your experimental FDR branch so I left it as a separate commit if you want to snip it out.

This adds three new CMake options, all defaulting to true, making it possible to opt out of building parts of Vectorscan that are not essential for deployment of the matching runtime. The new options are:
- `BUILD_UNIT`: controls whether the `unit` directory is included
- `BUILD_DOC`: controls whether the `doc` directory is included
- `BUILD_TOOLS`: controls whether the `tools` directory is included

The generated documentation continues to refer to Hyperscan despite the project now being Vectorscan. Let's replace many of the Hyperscan references with Vectorscan. At the same time, let's resync the documentation here with the Vectorscan readme. This updates the supported platforms/compilers and build options. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

Various clang-tidy-performance fixes:
* noexcept
* performance-noexcept-swap
* performance
* performance-move-const-arg
* performance-unnecessary-value-param
* performance-inefficient-vector-operation
* performance-no-int-to-ptr
* add performance
* performance-inefficient-string-concatenation
* clang-analyzer-deadcode.DeadStores
* performance-inefficient-vector-operation
* clang-analyzer-core.NullDereference
* clang-analyzer-core.UndefinedBinaryOperatorResult
* clang-analyzer-core.CallAndMessage
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

This allows the use of SIMDE library to emulate SSSE3/SSE4.2 instructions on SSE2-only (x86-64-v2) hardware. --------- Co-authored-by: G.E <gregory.economou@vectorcamp.gr> Co-authored-by: Konstantinos Margaritis <konstantinos@vectorcamp.gr>

By using svmatch on 16-bit lanes with an 8-bit predicate, we end up including an undefined character in the pattern checks. The inactive lane after the load contains an undefined value, usually \0. Patterns using \0 as the last character would then match this spurious character, returning a match beyond the buffer's end. The fix checks for such matches and rejects them. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
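The failure mode described above can be modelled in scalar C++ without any SVE2 intrinsics. The sketch below is not the Vectorscan code: `kLanes`, `find_pair` and the zero-filled padding are assumptions made for illustration (the commit message only says the inactive lane is "usually \0"). It shows a two-byte pattern ending in \0 matching the padding, and the same search rejecting any match whose second byte lies past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 16; // assumed vector width for this sketch

// Returns the start offsets at which the two-byte pattern (c0, c1) matches.
// With reject_past_end == false the padding lane takes part in the match,
// which reproduces the spurious hit; with true, such hits are dropped.
std::vector<std::size_t> find_pair(const unsigned char *buf, std::size_t len,
                                   unsigned char c0, unsigned char c1,
                                   bool reject_past_end) {
    assert(len <= kLanes);
    unsigned char lanes[kLanes + 1] = {}; // zero fill models the inactive lanes
    for (std::size_t i = 0; i < len; ++i) {
        lanes[i] = buf[i];
    }
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < len; ++i) {
        const bool hit = lanes[i] == c0 && lanes[i + 1] == c1;
        // The second byte sits at i + 1; it is real data only if i + 1 < len.
        const bool past_end = i + 1 >= len;
        if (hit && !(reject_past_end && past_end)) {
            starts.push_back(i);
        }
    }
    return starts;
}
```

Searching the two-byte buffer "xa" for the pattern "a\0" reports offset 1 without the check and nothing with it, while a buffer that really contains "a\0" still matches.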

Vectorscan requires SSE4.2 as a minimum on x86_64. For Hyperscan this used to be SSSE3. Applications that use the library call hs_valid_platform() to check if the CPU fulfils this minimum requirement. However, when Vectorscan upgraded to SSE4.2, the check was not updated. This leads to the library trying to execute instructions that are not supported, causing the application to crash. This might not have been noticed as the CPUs that do not support SSE4.2 are rather old and unlikely to run any load where performance is an issue. However, I believe that the library should not let the application crash. Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
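The point of the message above is that the runtime check has to test the same ISA level the library was built for. As a rough sketch only: `sketch_valid_platform` is a hypothetical stand-in, not the body of hs_valid_platform(), and it assumes the GCC/Clang builtin `__builtin_cpu_supports` on x86.

```cpp
#include <cassert>

// Hypothetical stand-in for a minimum-ISA check; the real entry point named
// in the commit message is hs_valid_platform().
bool sketch_valid_platform() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    // Assumption: GCC/Clang builtin. The check must name the level the code
    // was compiled for (SSE4.2), not an older baseline such as SSSE3.
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    // Other architectures and compilers are out of scope for this sketch.
    return true;
#endif
}
```

An application would call such a check once at start-up and refuse to compile or scan patterns if it returns false, instead of crashing on an unsupported instruction later.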

* Revert "Fix noodle SVE2 off by one bug"
  This patch fixed the bug when it happens at the end of the buffer, but not when scanDoubleOnce runs before the main loop. The next patch fixes the bug for both cases instead. This reverts commit 48dd0e5.
* Fix noodle spurious match with \0 chars for SVE2
  When SVE2's noodle processes a non-full vector (before the main loop or at the end of it), a fake \0 was being parsed, triggering a match for patterns that end with \0. This patch fixes this. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
---------
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

* suppress knownConditionTrueFalse
* cppcheck suppress redundantInitialization
* cppcheck solve stlcstrStream
* cppcheck suppress useStlAlgorithm
* cppcheck-suppress derefInvalidIteratorRedundantCheck
* cppcheck solve constParameterReference
* const parameter reference cppcheck
* removed wrong fix
* cppcheck-suppress memsetClassFloat
* cppcheck fix memsetClassFloat
* cppcheck fix unsignedLessThanZero
* suppressing all errors on simde gitmodule
* fix typo (unsignedLessThanZero)
* fix cppcheck suppress simde gitmodule
* cppcheck-suppress unsignedLessThanZero
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

An old commit (24ae167) had the side effect of moving CMake defines after the point where they were used. This patch moves them back so that they are defined before being used. This speeds hsbench back up by ~0.8%. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

VectorCamp#333) Fixed out of bounds read in AVX512VBMI version of fdr_exec_fat_teddy (VectorCamp#322) * Replaced the 32 byte read with a properly truncated mapped read * Added a unit test Co-authored-by: Rafał Dowgird <rafal.dowgird@rtbhouse.com>

Multiple AVX512VBMI-related fixes:
src/nfa/mcsheng_compile.cpp: No need for an assert here, impl_id can be set to 0
src/nfa/nfa_api_queue.h: Make sure this compiles on both C++ and C
src/nfagraph/ng_fuzzy.cpp: Fix compilation error when DEBUG_OUTPUT=on
src/runtime.c: Fix crash when data == NULL
unit/internal/sheng.cpp: Unit test has to enable AVX512VBMI manually as autodetection is not triggered, which causes the test to fail
src/fdr/teddy_fat.cpp: AVX512 loads need to be 64-byte aligned, caused a crash on clang-18

* Add regression test for double shufti
  It tests for false positive at vector edges. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix double shufti reporting false positives
  Double shufti used to offset one vector, resulting in losing one character at the end of every vector. This was replaced by a magic value indicating a match. This meant that if the first char of a pattern fell on the last char of a vector, double shufti would assume the second character is present and report a match. This patch fixes it by keeping the previous vector and feeding its data to the new one when we shift it, preventing any loss of data. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* vshl() will call the correct implementation
* implement missing vshr_512_imm(), simplifies caller x86 code
* Fix x86 case, use alignr instead
* it's the reverse, the avx512 alignr is incorrect, need to fix
* Make shufti's OR reduce size agnostic Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix test's array size Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* Fix AVX2/AVX512 alignr implementations and unit tests
* Fix Power VSX alignr
---------
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com> Co-authored-by: Konstantinos Margaritis <konstantinos@vectorcamp.gr>
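The edge case in the double shufti fix can be illustrated with a scalar model that processes the buffer in fixed-size chunks. This is only a sketch of the idea, not the SIMD implementation: `kChunk`, `find_pair_chunked` and the `carry_previous` flag are hypothetical names, and each chunk stands in for one vector. Without the carry, a first-character hit in the last lane is assumed to be a match (the "magic value" behaviour); with the carry, the hit is remembered and only confirmed by the first lane of the next chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kChunk = 4; // assumed "vector" width for this sketch

// Returns the start offsets of the two-character sequence (c1, c2).
std::vector<std::size_t> find_pair_chunked(const std::string &buf, char c1,
                                           char c2, bool carry_previous) {
    std::vector<std::size_t> starts;
    bool prev_first = false; // did c1 match in the previous chunk's last lane?
    for (std::size_t base = 0; base < buf.size(); base += kChunk) {
        const std::size_t n = std::min(kChunk, buf.size() - base);
        for (std::size_t i = 0; i < n; ++i) {
            const bool second = buf[base + i] == c2;
            bool first_before = false;
            if (i > 0) {
                first_before = buf[base + i - 1] == c1;
            } else if (carry_previous) {
                first_before = prev_first; // data fed in from the old chunk
            }
            if (first_before && second) {
                assert(base + i >= 1);
                starts.push_back(base + i - 1);
            }
        }
        const bool last_first = buf[base + n - 1] == c1;
        if (!carry_previous && last_first) {
            // Old behaviour: the lane after the chunk is unknown, so a
            // match is assumed. This is the false positive.
            starts.push_back(base + n - 1);
        }
        prev_first = last_first;
    }
    return starts;
}
```

With `kChunk` set to 4, searching "xxxaxxxx" for "ab" reports offset 3 without the carry and nothing with it, while "xxxabxxx" reports offset 3 in both modes.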

get_conf_stride_1 loads 16 consecutive bytes and applies a mask and shift. We can do that easily in a vectorized way instead. This speeds up FDR by around 5%. get_conf_stride_2 also benefits from it, but with less data, the overhead of vectorization limits most of the gain. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
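To make the idea concrete, here is a minimal sketch of computing 16 masked windows at consecutive byte offsets, once with a scalar loop and once with NEON. It is not the code from this patch: the function names, the 16-bit window width and the 16-bit mask are assumptions for illustration, and the table lookups that follow in FDR are left out. The NEON path is only compiled on little-endian AArch64; elsewhere the scalar loop is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__AARCH64EL__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Scalar reference: out[i] = (p[i] | p[i + 1] << 8) & mask for i in 0..15.
// Reads 17 bytes starting at p.
void masked_windows_scalar(const std::uint8_t *p, std::uint16_t mask,
                           std::uint16_t out[16]) {
    for (std::size_t i = 0; i < 16; ++i) {
        const std::uint16_t w =
            static_cast<std::uint16_t>(p[i] | (p[i + 1] << 8));
        out[i] = static_cast<std::uint16_t>(w & mask);
    }
}

// Same result, with two vector loads replacing the per-offset scalar loads
// when NEON is available.
void masked_windows(const std::uint8_t *p, std::uint16_t mask,
                    std::uint16_t out[16]) {
    assert(p != nullptr);
#if defined(__AARCH64EL__) && defined(__ARM_NEON)
    const uint8x16_t lo = vld1q_u8(p);     // bytes p[0..15]
    const uint8x16_t hi = vld1q_u8(p + 1); // bytes p[1..16]
    // Interleaving lo and hi puts p[i] in the low byte and p[i + 1] in the
    // high byte of each 16-bit lane.
    const uint16x8_t w0 = vreinterpretq_u16_u8(vzip1q_u8(lo, hi));
    const uint16x8_t w1 = vreinterpretq_u16_u8(vzip2q_u8(lo, hi));
    const uint16x8_t m = vdupq_n_u16(mask);
    vst1q_u16(out, vandq_u16(w0, m));
    vst1q_u16(out + 8, vandq_u16(w1, m));
#else
    masked_windows_scalar(p, mask, out);
#endif
}
```

Both functions need 17 readable bytes at `p`, since the last window spans bytes 15 and 16.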

isildur-g and others added 30 commits March 6, 2024 15:50

formatting in readme

4739c76

more readme format tinkering

f7a4d41

lets not disable warnings

afcbd28

maybe netbsd is more pedantic about this?

523db60

documentation: Add cmake option to build man pages

d9a75dc

Man pages tend to be preferred in some circles, so let's add an option to build the Vectorscan documentation that way. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

documentation: Update project name and copyright

2d23d24

The project name in the documentation should probably be updated to reflect that this is Vectorscan. Update the copyright too. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

pkgconfig: Correct library description

0c57b6c

Correct the description in the pkgconfig file, but leave the name alone as we want to remain compatible with projects utilizing hyperscan. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

hsbench: Update test program output

6bbd482

While fixing the documentation, it was noticed that the hsbench output was still referring to the project as Hyperscan. Let's correct it. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

Enable sheng32/64 for SVE

f9e254a

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

Merge pull request VectorCamp#233 from bradlarsen/develop

9db7b52

Add CMake options for more build granularity

Merge pull request VectorCamp#231 from jlinton/develop-add-man-pages

6b45984

Add man page generation, change man section, update docs to reflect name change, and couple other tweaks

incremental improvement in cleanliness

d6d7a96

another small cleanup in readme

226645e

moved HAVE_BUILTIN_POPCOUNT def to cmake

0045a2b

typo fix

d0498f9

shortened freebsd text

b006d7f

whitespace editing in readme

1ea5376

Merge pull request VectorCamp#234 from ypicchi-arm/feature/add-wider-…

c9b3a86

…sheng-implementation-on-arm RFC Enable sheng32/64 for SVE

Revert "RFC Enable sheng32/64 for SVE"

f5412b3

Merge pull request VectorCamp#235 from VectorCamp/revert-234-feature/…

d7fb5f4

…add-wider-sheng-implementation-on-arm Revert "RFC Enable sheng32/64 for SVE"

more verbose instructions for preparing BSD systems.

17c78ff

also added note for CC/CXX vars in fbsd/ppc which are different.

dc371fb

more system prep info for bsd

e239f48

some more bsd detail

e2ce866

also sqlite info for bsd

42653b8

slightly clearer comments in netbsd section

e20ba37

changed color output to csv output

50a62a1

removed color output code

b5a2915

gtsoul-tech and others added 25 commits June 10, 2024 10:08

Bug fix/clang tidy warnings part3 (VectorCamp#298)

a68845c

* clang-analyzer-deadcode.DeadStores
* clang-analyzer-optin.performance.Padding
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

Script for the clang-tidy CI (VectorCamp#299)

0e0c9f8

script to clang-tidy CI Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

Teddy macros unrolling - initial PR to test in CI (VectorCamp#294)

aa832db

Major refactoring of teddy and teddy_avx2, unrolling macros to C++ templated functions --------- Co-authored-by: G.E <gregory.economou@vectorcamp.gr>

maybe fix the hsbench issue (check_ssse3 again) in sse2/simde env (Ve…

dd43c86

…ctorCamp#306)
* maybe fix the hsbench issue (check_ssse3 again) in sse2/simde env
* fix the last failing unit test with fat
---------
Co-authored-by: G.E. <gregory.economou@vectorcamp.gr>

Rebar based Unit tests (VectorCamp#305)

1dc0600

* rebar based unit tests
* fixing paths
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

Bug fix/rebar tests (VectorCamp#307)

6c8e33e

* fixed paths and utf8-lossy=true
* revert to maskz (it's the bug)
* cppcheck fix
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

Make vectorscan accept \0 starting pattern (VectorCamp#312)

e4c49f2

Vectorscan used to reject such patterns because they were being compared to "" and found to be an empty string. We now check the pattern length instead. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
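The pitfall described in this commit is easy to reproduce with plain C strings. The helpers below are hypothetical and are not taken from Vectorscan; they only contrast a NUL-terminated comparison against "" with an explicit length check for a pattern passed as pointer plus length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Buggy variant: strcmp stops at the first NUL byte, so a pattern that
// starts with '\0' compares equal to "" even if its length is non-zero.
bool is_empty_by_strcmp(const char *pat, std::size_t /*len*/) {
    assert(pat != nullptr);
    return std::strcmp(pat, "") == 0;
}

// Fixed variant: only the explicit length decides whether the pattern is
// empty, so embedded or leading NUL bytes are treated as ordinary data.
bool is_empty_by_length(const char * /*pat*/, std::size_t len) {
    return len == 0;
}
```

For a three-byte pattern whose first byte is \0, the strcmp variant reports "empty" while the length variant correctly reports a non-empty pattern.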

Fix regression error VectorCamp#317 and add unit test (VectorCamp#318)

4f09e78

Revert the code that produced the regression error in VectorCamp#317.
Add the regression error to a unit test regressions.cpp along with the rebar tests.
---------
Co-authored-by: gtsoul-tech <gtsoulkanakis@gmail.com>

Fix typo in build instructions (VectorCamp#315)

4951b61

Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove war…

5e62255

…ning (VectorCamp#332)
* Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove warning
* More clang/boost warnings on MacOS, disable for now

partial_load_u64 will fail if buf == NULL/c_len == 0 (VectorCamp#331)

7e0503c

Various cppcheck fixes (VectorCamp#337)

689556d

Fix/fbsd gcc13 error (VectorCamp#338)

c057c7f

* added static libraries in cmake to fix unit-internal seg fault in freebsd, ppc64le, gcc13 error
* Moved gcc13 flags for freebsd-gcc13 in cmake/cflags-ppc64le.make

cmake - guard against failed GNUCC_ARCH extraction (VectorCamp#339)

5515fbb

Prevents overwriting GNUCC_ARCH with an empty value when parsing output of gcc -Q --help=target. Ensures robustness if detection fails and returns an empty string. Signed-off-by: Ibrahim Kashif <ibrahim.kashif@arm.com>

Feature/prepare 5.4.12 (VectorCamp#340)

22b76d1

* Add entry for Changelog
* Add new contributors
* Bump library version

ypicchi-arm force-pushed the fdr_get_conf_opti branch from b1dea77 to 54ee8a1 on September 3, 2025 11:34

ypicchi-arm added 2 commits September 3, 2025 16:25

FDR unflip the domain mask

51858f5

The domain mask was being flipped, then unflipped, without the flipped state ever being used. This patch removes this unnecessary flipping. Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>

ypicchi-arm force-pushed the fdr_get_conf_opti branch from 54ee8a1 to 7bcda07 on September 4, 2025 12:30

markos force-pushed the develop branch from 3a70ed4 to eaa8f91 on October 29, 2025 22:06
