Implement Thompson NFA-based Regular Expressions #1172
Open
JAi-SATHVIK wants to merge 119 commits into fortran-lang:master from JAi-SATHVIK:regex
Commits
dc4aaac add PCA to public api (JAi-SATHVIK)
27599e1 include pca submodule (JAi-SATHVIK)
d77fb0e Add PCA module with `pca`, `pca_transform`, and `pca_inverse_transfor… (JAi-SATHVIK)
24358d1 add PCA unit test (JAi-SATHVIK)
1dd44ad update end interface statement (JAi-SATHVIK)
7f79ef6 update CmakeLists (JAi-SATHVIK)
0d2738c fixed_conflicts (JAi-SATHVIK)
20b0e98 update interface (JAi-SATHVIK)
654edba allined with the other linalg function (JAi-SATHVIK)
b7c2be1 convert to subroutines,updated test (JAi-SATHVIK)
63a0a1f fix errors (JAi-SATHVIK)
cfbcdee fixed errors (JAi-SATHVIK)
db19731 fix PCA BLAS/LAPACK linking (JAi-SATHVIK)
d9ba548 fix PCA BLAS/LAPACK (JAi-SATHVIK)
11902b6 fix: remove xdp/qp from PCA use statements to fix CI builds (JAi-SATHVIK)
d7f8790 both updated (JAi-SATHVIK)
f8bbd27 test (JAi-SATHVIK)
75db887 modify interfaces for core. (JAi-SATHVIK)
d72f72c add stdlib_sorting.fypp in cmakelists.txt (JAi-SATHVIK)
44ee2e7 Fix CMakeLists.txt for the addition of stdlib_storting_pca (jvdp1)
6d2a4fd Merge pull request #1 from jvdp1/fix_jai (JAi-SATHVIK)
b3ea627 Add center_data Helper Subroutine (JAi-SATHVIK)
0e94be3 Replace Manual Mean with stdlib mean (JAi-SATHVIK)
05d4968 Replace Covariance Loops with BLAS syrk (JAi-SATHVIK)
d3d1c71 Extract pca_svd_driver and pca_eigh_driver & Updated Main pca Subroutine (JAi-SATHVIK)
7b49baa Merge pull request #2 from JAi-SATHVIK/master-cpy (JAi-SATHVIK)
0659b39 optimized for performance and stability (JAi-SATHVIK)
ac3b0e9 Merge pull request #3 from JAi-SATHVIK/master-cpy (JAi-SATHVIK)
4751866 Merge branch 'master-cpy' (JAi-SATHVIK)
cc21db0 Merge branch 'master' of https://github.com/JAi-SATHVIK/stdlib (JAi-SATHVIK)
4ac725c Cache efficency (JAi-SATHVIK)
7348faf fix issues build issues. (JAi-SATHVIK)
c58f515 Revert "fix issues build issues." (JAi-SATHVIK)
c776e8d use nested do loops (JAi-SATHVIK)
c47e2b6 resolve compiler errors (JAi-SATHVIK)
436a526 fix issue (JAi-SATHVIK)
143c211 add PCA to public api (JAi-SATHVIK)
17cf473 include pca submodule (JAi-SATHVIK)
6d0506d Add PCA module with `pca`, `pca_transform`, and `pca_inverse_transfor… (JAi-SATHVIK)
67c7ddf add PCA unit test (JAi-SATHVIK)
720298c update end interface statement (JAi-SATHVIK)
1c2fc75 update CmakeLists (JAi-SATHVIK)
c43704c fixed_conflicts (JAi-SATHVIK)
9509dca update interface (JAi-SATHVIK)
36fc211 allined with the other linalg function (JAi-SATHVIK)
19f55b6 convert to subroutines,updated test (JAi-SATHVIK)
8c4dcd8 fix errors (JAi-SATHVIK)
1c97f51 fixed errors (JAi-SATHVIK)
2e87b76 fix PCA BLAS/LAPACK linking (JAi-SATHVIK)
e665dce fix PCA BLAS/LAPACK (JAi-SATHVIK)
1e6cef7 fix: remove xdp/qp from PCA use statements to fix CI builds (JAi-SATHVIK)
f5f0c60 both updated (JAi-SATHVIK)
57b3cc5 test (JAi-SATHVIK)
f014baf modify interfaces for core. (JAi-SATHVIK)
9dd3212 add stdlib_sorting.fypp in cmakelists.txt (JAi-SATHVIK)
202e656 Fix CMakeLists.txt for the addition of stdlib_storting_pca (jvdp1)
c61eb79 Add center_data Helper Subroutine (JAi-SATHVIK)
6daccc2 Replace Manual Mean with stdlib mean (JAi-SATHVIK)
41a3690 Replace Covariance Loops with BLAS syrk (JAi-SATHVIK)
074d34e Extract pca_svd_driver and pca_eigh_driver & Updated Main pca Subroutine (JAi-SATHVIK)
bcabe8f optimized for performance and stability (JAi-SATHVIK)
a769f25 Cache efficency (JAi-SATHVIK)
587abf7 fix issues build issues. (JAi-SATHVIK)
83fe1d0 Revert "fix issues build issues." (JAi-SATHVIK)
5d0c88e use nested do loops (JAi-SATHVIK)
9979449 resolve compiler errors (JAi-SATHVIK)
b23a670 fix issue (JAi-SATHVIK)
6dd0b39 Merge branch 'master' of https://github.com/JAi-SATHVIK/stdlib (JAi-SATHVIK)
ecbccd1 remove unused BLAS constants to prevent compiler warnings (JAi-SATHVIK)
cc10e95 remove unused import (JAi-SATHVIK)
496d744 remove unused output arrays (JAi-SATHVIK)
6c48366 remove unused output arrays (JAi-SATHVIK)
f1d5182 Merge branch 'master' of https://github.com/JAi-SATHVIK/stdlib (JAi-SATHVIK)
a837b6b fix: replace string concatenation with comma args to fix ifx crash (JAi-SATHVIK)
baf8ff5 Use REAL_KINDS_TYPES (JAi-SATHVIK)
f931908 Change singular_values (JAi-SATHVIK)
53bb939 Remove scale_factor variable (JAi-SATHVIK)
3c98dee fix issues (JAi-SATHVIK)
cb9a5ea refactor (JAi-SATHVIK)
8c83389 Merge https://github.com/fortran-lang/stdlib (JAi-SATHVIK)
3b9b085 remove sort index,lower triangle fill. (JAi-SATHVIK)
25e4eab Fix eigh: use upper_a instead of lower (JAi-SATHVIK)
9604ccb update center data subroutine (JAi-SATHVIK)
a66aee6 remove elsewhere clause (JAi-SATHVIK)
25bf66f Merge branch 'master' of https://github.com/JAi-SATHVIK/stdlib (JAi-SATHVIK)
641243f update specs (JAi-SATHVIK)
ef2f624 fix (JAi-SATHVIK)
33c0270 , (JAi-SATHVIK)
23257e9 update test file (JAi-SATHVIK)
cb908bf loop fix (JAi-SATHVIK)
e2885ae update documentation (JAi-SATHVIK)
775ccc3 add checks (JAi-SATHVIK)
1ac24a0 add test (JAi-SATHVIK)
3b179e8 add keyword-based transform (JAi-SATHVIK)
4d2d7a1 fix interface declarations (JAi-SATHVIK)
f4c8245 reorder arguments (JAi-SATHVIK)
b267c33 update docs and tests (JAi-SATHVIK)
8d4cd08 Resolve merge conflict in test/stats/CMakeLists.txt and sync local ma… (JAi-SATHVIK)
fd611a7 Add explicit shape validation and nc < 1 checks to PCA subroutines (JAi-SATHVIK)
16f151f Merge branch 'fortran-lang:master' into master (JAi-SATHVIK)
87a1783 Initialize regex module with minimal boilerplate (JAi-SATHVIK)
b11779b resolve parsing and bounding checking (JAi-SATHVIK)
48e2526 remove use statement, add_subdir (JAi-SATHVIK)
629e5bb rewrote testdrive (JAi-SATHVIK)
c0df858 Merge branch 'fortran-lang:master' into master (JAi-SATHVIK)
4d991c0 fix off-by-one match_start position (JAi-SATHVIK)
8e2390f update doc (JAi-SATHVIK)
7b00548 Merge branch 'master' of https://github.com/JAi-SATHVIK/stdlib into r… (JAi-SATHVIK)
bab5e5e core engine logic, purity fix (JAi-SATHVIK)
c58b15d standalone example (pattern matching) (JAi-SATHVIK)
8d27abc new build config (JAi-SATHVIK)
8ed7942 add regex examples (JAi-SATHVIK)
29f598b update docs (JAi-SATHVIK)
731b57c Merge branch 'master' of https://github.com/fortran-lang/stdlib into … (JAi-SATHVIK)
dbeedce add strict rules (JAi-SATHVIK)
8319cc3 Merge branch 'fortran-lang:master' into regex (JAi-SATHVIK)
0cc9abd refactor utility function (JAi-SATHVIK)
544f34a address review feedback on unit numbers and constants (JAi-SATHVIK)
d78f89a regex: add CMake dependency for stdlib_core (JAi-SATHVIK)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| --- | ||
| title: regex | ||
| --- | ||
|
|
||
| # Regular Expressions | ||
|
|
||
| [TOC] | ||
|
|
||
| ## Overview | ||
|
|
||
| The `stdlib_regex` module provides a pure Fortran regular expression engine | ||
| based on Thompson's NFA (Nondeterministic Finite Automaton) construction. | ||
| Matching takes `O(n × m)` time for an input of length `n` and a pattern of | ||
| size `m`, i.e. linear in the input for a fixed pattern, with no backtracking, | ||
| so arbitrary input cannot trigger catastrophic performance degradation. | ||
|
|
||
| ### Supported Syntax | ||
|
|
||
| | Pattern | Description | Example | | ||
| |-------------|--------------------------------------|------------------| | ||
| | `.` | Match any single character | `a.c` → `abc` | | ||
| | `*` | Zero or more of preceding element | `ab*c` → `ac` | | ||
| | `+` | One or more of preceding element | `ab+c` → `abbc` | | ||
| | `?` | Zero or one of preceding element | `colou?r` | | ||
| | `\|` | Alternation | `cat\|dog` | | ||
| | `(` `)` | Grouping | `(ab)+` | | ||
| | `[...]` | Character class | `[a-z]` | | ||
| | `[^...]` | Negated character class | `[^0-9]` | | ||
| | `^` | Start of string anchor | `^foo` | | ||
| | `$` | End of string anchor | `bar$` | | ||
| | `\d` | Digit `[0-9]` | `\d+` | | ||
| | `\w` | Word character `[a-zA-Z0-9_]` | `\w+` | | ||
| | `\s` | Whitespace (space, tab, newline, CR) | `\s+` | | ||
| | `\` | Escape next character | `\.` | | ||
|
|
||
| ## `regex_type` - Regular expression type | ||
|
|
||
| ### Status | ||
|
|
||
| Experimental | ||
|
|
||
| ### Description | ||
|
|
||
| A derived type representing a compiled regular expression. It stores the | ||
| internal NFA state graph produced by `regcomp` and is passed to `regmatch` | ||
| for pattern matching. | ||
|
|
||
| ### Syntax | ||
|
|
||
| ```fortran | ||
| type(regex_type) :: re | ||
| ``` | ||
|
|
||
| ## `regcomp` - Compile a regular expression | ||
|
|
||
| ### Status | ||
|
|
||
| Experimental | ||
|
|
||
| ### Description | ||
|
|
||
| Compiles a regular expression pattern string into a `regex_type` object. | ||
| The compiled object can then be reused for multiple calls to `regmatch` | ||
| without recompilation. | ||
|
|
||
| ### Syntax | ||
|
|
||
| ```fortran | ||
| call [[stdlib_regex(module):regcomp(subroutine)]](re, pattern [, status]) | ||
| ``` | ||
|
|
||
| ### Class | ||
|
|
||
| Subroutine | ||
|
|
||
| ### Arguments | ||
|
|
||
| `re`: Shall be of type `regex_type`. It is an `intent(out)` argument. | ||
| The compiled regular expression. | ||
|
|
||
| `pattern`: Shall be of type `character(len=*)`. It is an `intent(in)` argument. | ||
| The regular expression pattern string to compile. | ||
|
|
||
| `status` (optional): Shall be of type `integer`. It is an `intent(out)` argument. | ||
| Returns 0 on success, or a non-zero value if the pattern is invalid | ||
| (e.g., mismatched parentheses or brackets). | ||
|
|
||
| ### Example | ||
|
|
||
| ```fortran | ||
| {!example/regex/example_regex_regcomp.f90!} | ||
| ``` | ||
|
|
||
| ## `regmatch` - Match a compiled regular expression | ||
|
|
||
| ### Status | ||
|
|
||
| Experimental | ||
|
|
||
| ### Description | ||
|
|
||
| Searches for the first occurrence of the compiled regular expression `re` | ||
| within the input `string`. If a match is found, `is_match` is set to `.true.` | ||
| and the optional `match_start` and `match_end` arguments are set to | ||
| the 1-based start and end positions of the matched substring. | ||
|
|
||
| ### Syntax | ||
|
|
||
| ```fortran | ||
| call [[stdlib_regex(module):regmatch(subroutine)]](re, string, is_match [, match_start, match_end]) | ||
| ``` | ||
|
|
||
| ### Class | ||
|
|
||
| Subroutine | ||
|
|
||
| ### Arguments | ||
|
|
||
| `re`: Shall be of type `regex_type`. It is an `intent(in)` argument. | ||
| A compiled regular expression obtained from `regcomp`. | ||
|
|
||
| `string`: Shall be of type `character(len=*)`. It is an `intent(in)` argument. | ||
| The input string to search for a match. | ||
|
|
||
| `is_match`: Shall be of type `logical`. It is an `intent(out)` argument. | ||
| Set to `.true.` if a match is found, `.false.` otherwise. | ||
|
|
||
| `match_start` (optional): Shall be of type `integer`. It is an `intent(out)` argument. | ||
| The 1-based index of the first character of the match. | ||
|
|
||
| `match_end` (optional): Shall be of type `integer`. It is an `intent(out)` argument. | ||
| The 1-based index of the last character of the match. | ||
|
|
||
| ### Example | ||
|
|
||
| ```fortran | ||
|
jalvesz marked this conversation as resolved.
|
||
| {!example/regex/example_regex_regmatch.f90!} | ||
| ``` | ||
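The no-backtracking claim in the Overview can be illustrated with a small, self-contained sketch. This is not the PR's implementation. It assumes a hand-built NFA for the pattern `a*ab`, matched against the whole string, and all names (`accepts`, `from_state`, `to_state`, `label`) are hypothetical. The set of active states is advanced one input character at a time, so the work per character is bounded by the number of transitions, which is where an `O(n × m)` bound comes from.

```fortran
program nfa_lockstep_sketch
    implicit none
    ! Hypothetical hand-built NFA for a*ab, matched against the whole string.
    ! State 1 is the start state; state 3 (= n_states) is the accepting state.
    ! State 1 has two outgoing 'a' transitions, so the automaton is
    ! genuinely nondeterministic.
    integer, parameter :: n_states = 3, n_trans = 3
    integer, parameter :: from_state(n_trans) = [1, 1, 2]
    integer, parameter :: to_state(n_trans)   = [1, 2, 3]
    character(len=1), parameter :: label(n_trans) = ['a', 'a', 'b']

    print '(a,l1)', 'aaab -> ', accepts('aaab')   ! prints T
    print '(a,l1)', 'ab   -> ', accepts('ab')     ! prints T
    print '(a,l1)', 'b    -> ', accepts('b')      ! prints F
    print '(a,l1)', 'aba  -> ', accepts('aba')    ! prints F

contains

    logical function accepts(string)
        character(len=*), intent(in) :: string
        logical :: current(n_states), next(n_states)
        integer :: i, t

        ! Start with only the start state active.
        current = .false.
        current(1) = .true.

        ! Advance every active state in lockstep: one pass over the
        ! transition table per input character, never revisiting input.
        do i = 1, len(string)
            next = .false.
            do t = 1, n_trans
                if (current(from_state(t)) .and. string(i:i) == label(t)) then
                    next(to_state(t)) = .true.
                end if
            end do
            current = next
        end do

        accepts = current(n_states)
    end function accepts

end program nfa_lockstep_sketch
```

A backtracking matcher would instead try one `a` transition, fail, and rewind. Tracking both alternatives at once is what removes the rewinding.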
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| ADD_EXAMPLE(regex_regcomp) | ||
| ADD_EXAMPLE(regex_regmatch) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| program example_regex_regcomp | ||
| use stdlib_regex, only: regex_type, regcomp | ||
| implicit none | ||
| type(regex_type) :: re | ||
| integer :: stat | ||
|
|
||
| call regcomp(re, "(cat|dog)s?", stat) | ||
| if (stat /= 0) error stop "Invalid regex pattern" | ||
| print *, "Pattern compiled successfully." | ||
| end program example_regex_regcomp |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| program example_regex_regmatch | ||
| use stdlib_regex, only: regex_type, regcomp, regmatch | ||
| implicit none | ||
| type(regex_type) :: re | ||
| logical :: found | ||
| integer :: stat, ms, me | ||
|
|
||
| ! Find a sequence of digits | ||
| call regcomp(re, "[0-9]+", stat) | ||
| call regmatch(re, "foo123bar", found, ms, me) | ||
| print "(a,l1,a,i0,a,i0)", "found = ", found, ", ms = ", ms, ", me = ", me | ||
|
|
||
| ! Anchored match | ||
| call regcomp(re, "^hello", stat) | ||
| call regmatch(re, "hello world", found) | ||
| print "(a,l1)", "found = ", found | ||
| call regmatch(re, "say hello", found) | ||
| print "(a,l1)", "found = ", found | ||
|
|
||
| ! Alternation with optional suffix | ||
| call regcomp(re, "(cat|dog)s?", stat) | ||
| call regmatch(re, "I like cats", found, ms, me) | ||
| print "(a,l1,a,i0,a,i0)", "found = ", found, ", ms = ", ms, ", me = ", me | ||
|
|
||
| end program example_regex_regmatch |
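As a variation on the example above, the following sketch compiles a pattern once, checks `status`, reuses the compiled object across several inputs, and reads `match_start`/`match_end` only when a match was reported. It assumes the `stdlib_regex` module proposed in this pull request is built and available, with the interfaces described in its specification. The program name and the `inputs` array are illustrative. No output is shown because it depends on that implementation.

```fortran
program regex_reuse_sketch
    ! Assumption: stdlib_regex from this pull request is on the module path.
    use stdlib_regex, only: regex_type, regcomp, regmatch
    implicit none
    type(regex_type) :: re
    character(len=12), parameter :: inputs(3) = &
        [character(len=12) :: "foo123bar", "no digits", "x9"]
    logical :: found
    integer :: stat, ms, me, i

    ! Compile once and stop early if the pattern is rejected.
    call regcomp(re, "[0-9]+", stat)
    if (stat /= 0) error stop "Invalid regex pattern"

    ! Reuse the compiled object; use ms and me only when a match was found.
    do i = 1, size(inputs)
        call regmatch(re, trim(inputs(i)), found, ms, me)
        if (found) then
            print "(a,a,a,a)", trim(inputs(i)), " -> '", inputs(i)(ms:me), "'"
        else
            print "(a,a)", trim(inputs(i)), " -> no match"
        end if
    end do
end program regex_reuse_sketch
```

The pattern is written as `[0-9]+` rather than `\d+` so the string literal does not depend on how a given compiler treats backslashes.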
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| set(regex_fppFiles | ||
| ) | ||
|
|
||
| set(regex_cppFiles | ||
| ) | ||
|
|
||
| set(regex_f90Files | ||
| stdlib_regex.f90 | ||
| ) | ||
|
|
||
| configure_stdlib_target(${PROJECT_NAME}_regex regex_f90Files regex_fppFiles regex_cppFiles) | ||
| target_link_libraries(${PROJECT_NAME}_regex PUBLIC ${PROJECT_NAME}_core) |