implement stricter regex checks #36
Open
mzuenni wants to merge 7 commits into DOMjudge:main from mzuenni:new-regex
Commits:
cf2de8a slightly reformat code (mzuenni)
f6a9b4c implement stricter regex checks (mzuenni)
60fc5da added a bunch of tests (mzuenni)
695d9ee Add test case that breaks and is required for backward compatibility (eldering)
4ea0a50 Small typos and language formatting fixes (eldering)
84bc37c update tests (mzuenni)
b188d4d update documentation (mzuenni)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| Checktestdata Regex specification | ||
| ================================= | ||
|
|
||
| A **reg**ular **ex**pression (or regex) can be used to match strings. | ||
| Formally, it describes a set of strings and a string is matched if it is contained in the set. | ||
|
|
||
| Regular expressions can contain both literal and special characters. | ||
| Most literal characters, like `A`, `a`, or `0`, are the simplest regular expressions and they simply match themselves. | ||
| Additionally, more complex regular expressions can be expressed by concatenating simpler regular expressions. | ||
| If *A* and *B* are both regular expressions, then *AB* is also a regular expression. | ||
| In general, if a string *x* matches *A* and another string *y* matches *B*, then the string *xy* matches *AB*. | ||
|
|
||
| Besides the literal characters, there are also the following special characters: `'('`, `')'`, `'{'`, `'}'`, `'['`, `']'`, `'*'`, `'+'`, `'?'`, `'|'`, `'\'`, `'^'`, `'.'`, `'-'`. | ||
| Their meaning is as follows: | ||
|
|
||
| * `.`: this matches any character, including newlines. If you need to match anything except the newline character use `[^\n]` instead. | ||
| * `[]`: indicates a set of characters. | ||
| Inside a set definition: | ||
| * Literal characters can be listed and all of them will be matched, i.e., `[abc]` will match `'a'`, `'b'` as well as `'c'` but not `'abc'`. | ||
| * Ranges can be specified with `-`, for example `[a-z]` will match any lowercase ASCII letter and `[0-9]` will match any digit. | ||
| If `-` is escaped (e.g. `[a\-z]`) or if the character preceding it belongs to another range (e.g. `[a-a-z]`), or if it is the first or last character (e.g. `[-a]` or `[a-]`), it will match a literal `'-'`. | ||
| It is an error if the first character of the range has a higher code point than the last (e.g., `[z-a]`). | ||
| * The complement of a character set is formed if the first character of the set is `^`. | ||
| For example `[^a]` will match anything except `'a'`. | ||
| If `^` is escaped (e.g. `[\^]`) or if it is not the first character (e.g. `[a^]`) it will match a literal `'^'`. | ||
| * `\` can be used to escape a special character. | ||
| However, most special characters do not need to be escaped. | ||
| Only `'['` and `']'` must be escaped and `'^'` or `'-'` might need to be escaped depending on the position. | ||
| For example both `[\-]` and `[-]` will match a literal `'-'`. | ||
| If `\` is not followed by a special character it matches a literal `'\'`. | ||
| * It is an error if the character set does not specify any characters (e.g. `[]` or `[^]`). | ||
| * `{m,n}`: causes the resulting regular expression to match from `m` to `n` repetitions of the preceding regular expression. | ||
| Matching is done greedily, i.e., as many repetitions as possible are matched. | ||
| Omitting *m* specifies a lower bound of zero, and omitting *n* specifies an infinite upper bound. | ||
| It is an error if *m* is larger than *n*. | ||
| Both *m* and *n* must be unsigned integers without leading zeros. | ||
| It is an error if the preceding regular expression is empty or ends with another repetition (e.g. `a{1,2}{1,2}`). To repeat a repetition, wrap it in `()` (e.g. `(a{1,2}){1,2}`). | ||
| * `{m}`: is a shorthand for `{m,m}`. | ||
| It is an error to omit `m`. | ||
| * `*`: is a shorthand for `{0,}`. | ||
| * `+`: is a shorthand for `{1,}`. | ||
| * `?`: is a shorthand for `{0,1}`. | ||
| * `|`: can be used to form the union of two regular expressions. | ||
| If *A* and *B* are both regular expressions, then *A|B* is also a regular expression. | ||
| In general, if a string *x* matches *A* or it matches *B*, then it also matches *A|B*. | ||
| Matching is done in *leftmost-first* fashion. | ||
| This means that any match of *A* is preferred over all matches of *B*. | ||
| For example, the checktestdata command `REGEX("p|ps")` will only extract `p` even if the input is `ps`. | ||
| * `(...)`: if *A* is a regular expression then *(A)* is also a regular expression. | ||
| * `\`: escapes the subsequent special character. | ||
| If `\` is not followed by a special character it will match a literal `\` (e.g. `\d` will match `'\d'`). | ||
| Note that checktestdata strings also use `\` to escape characters. | ||
| Therefore, `REGEX("\\*")` becomes the regular expression `\*` and matches a literal `'*'`, not a variable amount of `\`. | ||
|
|
||
| ## Notes | ||
|
|
||
| The regular expression syntax and behaviour are carefully chosen to match a common subset of many modern regular expression definitions and implementations, such as Perl, Python, JavaScript, Ruby, PHP, Java, C++, Rust, Go, ... | ||
| Advanced features like quantifiers, groups, lookahead, lookbehind, etc. are not supported. | ||
| Shorthands like `\d` or `[:digit:]` are also not supported; use `[0-9]` instead. | ||
|
|
||
| > [!WARNING] | ||
| > Earlier versions of checktestdata used POSIX-like regular expressions with *leftmost-longest* matching and support for `[:digit:]`. | ||
| > This is no longer supported and matching is done *leftmost-first* instead. |
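
The matching rules in the specification above can be tried out with Python's `re` module, one of the implementations the notes name as sharing this subset. This is only an illustrative sketch: Python accepts more syntax than the specification allows (for example `\d`), and the helper `first_match` is a hypothetical name introduced here, not part of checktestdata.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text consumed by `pattern` at the start of `text`, or None."""
    # re.DOTALL makes '.' match newlines too, as the specification requires.
    m = re.compile(pattern, re.DOTALL).match(text)
    return m.group() if m else None


# Leftmost-first alternation: the first alternative that matches wins.
alternation = first_match(r"p|ps", "ps")
# Listing the longer alternative first lets it win instead.
reordered = first_match(r"ps|p", "ps")
# Repetition is greedy: as many repetitions as allowed are consumed.
greedy = first_match(r"a{1,3}", "aaaa")
# An escaped '-' inside a set is a literal dash, not a range.
dash_literal = first_match(r"[a\-z]+", "a-z-b")
# A complemented set rejects the listed character.
complement = first_match(r"[^a]", "a")
# '.' also matches a newline character.
dot_newline = first_match(r".", "\n")

print(alternation, reordered, greedy, dash_literal, complement, repr(dot_newline))
```

Under the earlier leftmost-longest behaviour described in the warning, `p|ps` would have consumed `ps`; with leftmost-first matching the order of the alternatives matters.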
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| SET(foo="bar.*") | ||
| STRING(foo) NEWLINE | ||
| - REGEX(foo) # Note that '.' also matches newlines and ERE is greedy. | ||
| + REGEX(foo) # Note that '.' also matches newlines and is greedy. |
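
The comment in this script can be illustrated with a rough Python sketch. The input string `data` is a hypothetical example, not a file from this pull request, and Python's `re` with `re.DOTALL` merely stands in for the checktestdata matcher.

```python
import re

foo = "bar.*"
# Hypothetical input: the literal text of foo, a newline, then more lines.
data = "bar.*\nbarbaz\nqux\n"

pos = 0

# STRING(foo): the variable is matched as a literal string.
if not data.startswith(foo, pos):
    raise ValueError("STRING(foo) failed")
pos += len(foo)

# NEWLINE: exactly one newline character.
if not data.startswith("\n", pos):
    raise ValueError("NEWLINE failed")
pos += 1

# REGEX(foo): the same variable is now interpreted as a regular expression.
# With '.' matching newlines and greedy repetition, '.*' runs to the end.
m = re.compile(foo, re.DOTALL).match(data, pos)
if m is None:
    raise ValueError("REGEX(foo) failed")
rest = m.group()
end = m.end()

print(repr(rest))
```

Because `.*` is greedy and `.` includes newlines, the `REGEX(foo)` step consumes everything that remains after the leading `bar`, across line boundaries.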
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| a afJ7bayb |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # IGNORE GENERATE TESTING | ||
| REGEX("a a") # contains space | ||
| REGEX("[e-h]") # character class | ||
| REGEX("[I-M]") # character class | ||
| REGEX("[5-8]") # character class | ||
| REGEX("[^a]") # character class | ||
| STRING("a") | ||
| REGEX("x?") # optional | ||
| REGEX("y?") # optional | ||
| REGEX("z?") # optional | ||
| STRING("b") | ||
| REGEX(".+") # any including newline |
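
To see how each command in this script consumes its input, here is a minimal Python walk-through. It assumes that the one-line data file shown just before this script (`a afJ7bayb`) is its paired input and that the file ends with a newline; neither is confirmed by the page, since the file names are not shown. The `run` helper is hypothetical and only approximates checktestdata by matching each command at the current offset.

```python
import re

# Assumption: this is the paired input, terminated by a newline.
data = "a afJ7bayb\n"

# (command, argument) pairs mirroring the script above.
steps = [
    ("REGEX", "a a"),
    ("REGEX", "[e-h]"),
    ("REGEX", "[I-M]"),
    ("REGEX", "[5-8]"),
    ("REGEX", "[^a]"),
    ("STRING", "a"),
    ("REGEX", "x?"),
    ("REGEX", "y?"),
    ("REGEX", "z?"),
    ("STRING", "b"),
    ("REGEX", ".+"),
]


def run(steps, data):
    """Match every step at the current offset and collect the consumed text."""
    pos = 0
    consumed = []
    for kind, arg in steps:
        # STRING arguments are literals, so escape them before compiling.
        pattern = arg if kind == "REGEX" else re.escape(arg)
        # re.DOTALL lets '.' match the trailing newline.
        m = re.compile(pattern, re.DOTALL).match(data, pos)
        if m is None:
            raise ValueError(f"{kind}({arg!r}) failed at offset {pos}")
        consumed.append(m.group())
        pos = m.end()
    return consumed, pos


consumed, end = run(steps, data)
print(consumed)
```

In this sketch the optional patterns `x?` and `z?` succeed with an empty match, while `y?` consumes the `y`, and the final `.+` only has the newline left to consume.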
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| 1 | ||
| "[]" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| 1 | ||
| "*" * |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| 1 | ||
| "+" + |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| 1 | ||
| "?" ? |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| 1 | ||
| "(" ( |