-
Notifications
You must be signed in to change notification settings - Fork 125
Optimize matchesFilenamePattern #875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| case error | ||
| } | ||
|
|
||
| private extension StringProtocol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused after my other changes
| /// - returns: `true` if the pattern matches the input, `false` otherwise. | ||
| /// | ||
| /// - note: On Windows and when using the ``FnmatchOptions/pathname`` option, both separators (`/` and `\`) are recognized (see note on pattern parameter). | ||
| public func fnmatch(pattern: some StringProtocol, input: some StringProtocol, options: FnmatchOptions = .default, pathSeparators: Set<Character> = Path.pathSeparators) throws |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some Collection lets us pass in non-Set types efficiently. Having a whole separate allocation to store 1-2 items is wasteful.
Defaulting to nil (sorry about the weird spelling of nil, it couldn't infer the type properly otherwise) lets us use the new default behavior that avoids lookups entirely.
| input = input.dropFirst() | ||
| case "*": | ||
| var p = pattern.first | ||
| while pattern.first == "*" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"*" is always a single grapheme, so using utf8 will get us the same results as characters
| if p == nil { | ||
| if options.contains(.pathname) { | ||
| // make sure input does not have any more path separators | ||
| return input.firstIndex(matching: pathSeparators) == nil ? true : false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wanted to delete the pathSeparators parameter entirely, but it's used for testing Windows on non-Windows hosts
| bt_pattern = pattern | ||
| bt_input = input | ||
| case "[": | ||
| guard let _sc = input.first else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reordering these checks lets us do a quick utf8 check without having to construct a Character (since those are String under the hood)
| } | ||
| #endif | ||
|
|
||
| @inline(__always) public static func isPathSeparator(_ char: Character, separators: (some Collection<Character>)?) -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, replacing set lookups with integer comparisons
| var isInPathComponent = false | ||
| var nextCharacterIsEscaped = false | ||
| for idx in pattern.indices { | ||
| for byte in pattern.utf8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything we're comparing this to is a single grapheme, so we can skip grapheme semantics here
| for byte in pattern.utf8 { | ||
| // Skip over path separators, unless they're escaped. | ||
| if pattern[idx] == Path.pathSeparator { | ||
| //TODO: should this (and other similar uses) be Path.isUTF8PathSeparator(byte) instead for Windows? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like feedback on this please. To my eye it looks like a spot where Windows would want us to check both path separators, but I wasn't sure, and the old code didn't do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to say. Generally you wouldn't ever want to match \ without also looking for /, but I think \ is also used as an escape sequence within the pattern here, so I'm not sure...
| } | ||
|
|
||
| // Create a string from the first index we found to the end of the path. | ||
| let trailingSubpath = String(self.str[first..<self.str.endIndex]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just got converted back to a Substring immediately in the callee, no reason not to leave it as one to begin with.
|
|
||
| @Test(arguments: [true, false]) | ||
| func pathnameMatch(isWindows: Bool) throws { | ||
| let separators = isWindows ? Set("\\/") : Set([Character("/")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set("\/") looks so much like a Set containing one two-character-long String that I got confused. The thing it's passed to is generic now, so let's save the next person the wasted time.
|
@swift-ci please test |
|
Not sure what's up with the Windows failure but the rest of this lgtm generally |
| } | ||
| @inline(__always) public static func firstPathSeparatorIndex(in str: some StringProtocol, separators: (some Collection<Character>)?) -> String.Index? { | ||
| guard let separators else { | ||
| return str.utf8.firstIndex(where: Path.isUTF8PathSeparator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
| return str.utf8.firstIndex(where: Path.isUTF8PathSeparator) | |
| return str.utf8.firstIndex(where: { Path.isUTF8PathSeparator($0, separators) }) |
https://github.com/swiftlang/swift-build/pull/875/files#r2478828056 |
A trace I took today shows matchesFilenamePattern spending a surprising amount of time doing some things that don't seem necessary. I haven't verified the speedup yet (figuring out how to get it to run the specific code path I sampled is proving a bit tricky), but this should be a noticeable improvement. I'll annotate the changes inline with what they're for.