Conversation
| parts = template.split(TEMPLATE_EXPRESSION).map! do |part| | ||
| part.start_with?('{') ? ALLOWED_PARAMETER_CHARACTERS : Regexp.escape(part) | ||
| if part.start_with?('{') | ||
| "(?<#{part[1..-2]}>[^/?#]+)" |
There was a problem hiding this comment.
I need a minute to understand this. ⌛
There was a problem hiding this comment.
If it at all helps: https://docs.ruby-lang.org/en/master/MatchData.html#method-i-named_captures
It's to allow the usage of named captures.
"/hello/123".match(/^\/hello\/(?<id>[^/?#]+)$/).named_captures
# => {"id" => "123"}
# vs
m = "/hello/123".match(/^\/hello\/([^/?#]+)$/)
m.captures # => ["123"]
["id"].zip(m.captures).to_h # => {"id" => "123"}So really just allows us to save on the extra work.
There was a problem hiding this comment.
If there is any concern over this, I can pull this change out (and perhaps file into a separate PR for us to assess more thoroughly). Regexes are never fun 😆.
There was a problem hiding this comment.
I like the change to use named_captures, but I don't understand the change from
%r{([^/?#]+)}
to
"(?<#{part[1..-2]}>[^/?#]+)"
I think I have to re-understand my own code here, but I don't have time right now. Can you explain that part?
There was a problem hiding this comment.
Given a template string like
"/users/{userId}/posts/{postId}/comments/{commentId}"The old way would
# First split the template into...
["/users/", "{userId}", "/posts/", "{postId}", "/comments/", "{commentId}"]
# Iterate and replace dynamic path segments...
["/users/", /([^\/?#]+)/, "/posts/", /([^\/?#]+)/, "/comments/", /([^\/?#]+)/]
# Then finally join...
/^\/users\/(?-mix:([^\/?#]+))\/posts\/(?-mix:([^\/?#]+))\/comments\/(?-mix:([^\/?#]+))$/(No idea what that -mix thing is about... something that Ruby is doing). Each matched segment is anonymous, so when you match it against "/users/123/posts/456/comments/789", you just get
["123", "456", "789"]Which required you to zip it with the names of the dynamic path segments.
The new way pretty much does the same thing, it just names each capture group. So
# First split the template into...
["/users/", "{userId}", "/posts/", "{postId}", "/comments/", "{commentId}"]
# Iterate and replace dynamic path segments with named groups...
["/users/", "(?<userId>[^/?#]+)", "/posts/", "(?<postId>[^/?#]+)", "/comments/", "(?<commentId>[^/?#]+)"]
# Then finally join...
/^\/users\/(?<userId>[^\/?#]+)\/posts\/(?<postId>[^\/?#]+)\/comments\/(?<commentId>[^\/?#]+)$/In this case the match gives you
{"userId" => "123", "postId" => "456", "commentId" => "789"}The resulting regex is the same, just with the capture groups named. The "(?<#{part[1..-2]}>[^/?#]+)" is building the same regex, but part[1..-2] is just grabbing the name between {..}, so {id} becomes id.
That said, I think there is an even better way to do this that doesn't require us to split the template and iterate over each part. We basically just need to find and replace {...} segments in the template with the appropriate capture group. I'm gonna try something out to see if it works.
There was a problem hiding this comment.
Ok, so there is another way to build the same regex by substituting dynamic path segments with gsub. Best I could come up with for now is something like
# Matches dynamic segments like {id} (captured in group 1) or static segments like /hello
PATH_SEGMENT = /\{([^{}]+)\}|[^{}]+/
def build_pattern(template)
pattern = template.gsub(PATH_SEGMENT) do
param_name = Regexp.last_match(1)
if param_name
# Dynamic segment like {id} — build a named capture group
"(?<#{param_name}>[^/?#]+)"
else
# Static segment like /hello — escape for use in regex
Regexp.escape(Regexp.last_match(0))
end
end
/^#{pattern}$/
endI am not sure if that is easier to understand, though. The last_match stuff isn't super intuitive.
There is also something like the following that only attempts to match and replace on the dynamic segments with named captured groups. I didn't like the delete('\\\\') being there, but couldn't figure out another way to do it.
ESCAPED_TEMPLATE_EXPRESSION = /\\\{([^{}]+)\\\}/
def build_pattern(template)
pattern = Regexp.escape(template).gsub(ESCAPED_TEMPLATE_EXPRESSION) do
"(?<#{Regexp.last_match(1).delete('\\\\')}>[^/?#]+)"
end
/^#{pattern}$/
endThese all end up producing the same regular expression in the end.
The performance win would be from the resulting regular expression to enable named_captures. build_pattern is only run when the OpanAPI document loads, so how it's built is less important in terms of performance characteristics. We should pick whatever you think is the most readable and easiest to understand.
There was a problem hiding this comment.
Got it! Thanks a lot for the explanation and the attention to detail. I don’t have the capacity to pick one right now. I understand that all options result in the same performance and I have no security concerns about this change. So I would like you to pick one and ping me when you think this is ready to merge. Thank you! 👍🏻👍🏻
There was a problem hiding this comment.
I'd pick the solution that is currently in the pull request. It's the closest to the original implementation, so perhaps it is the easiest one to understand.
Even though performance isn't critical here, I benchmarked the three approaches anyways. Of the three options, the one proposed in this PR is both the fastest and allocates the least amount of memory. If we're looking for a tie breaker, might as well pick the one that performs the best.
|
Thanks. I will merge this until Tuesday. Greetings from Mecklenburg. |
6a0e3da to
c1ba08b
Compare
|
Thank you from Toronto, Ontario, Canada! 😄 |
Introduces a few optimizations for
OpenapiFirst::Router#match.The first one is for
OpenapiFirst::Router#find_path_item. I replaced thefilter_map(builds intermediate array of all matching routes) + min_by (iterates it again) with a single-passeach_value.reducethat tracks the best match as it goes. This eliminates the intermediate array allocation and runs a bit faster. A simple benchmark comparing the old vs new approach directly.The next one was to
PathTemplate#match. I changedbuild_patternto emit named capture groups (ex:(?<id>[^/?#]+)) instead of anonymous ones. This letsmatchcallmatches.named_capturesto get aHashdirectly from the regex engine, instead of@names.zip(values).to_hwhich allocates an intermediate array.Since
@namesappeared to no longer be used for anything meaningful, I also removed that along with the now unusedTEMPLATE_EXPRESSION_NAMEandALLOWED_PARAMETER_CHARACTERSconstants.A simple benchmark comparing the two approaches
The last one was to
FindContent.call. I replacedcontent_type.split(';')[0]andtype.split('/')[0]withindex+ slice(content_type[0, semi], type[0, slash]). Originally, each split allocated an array just to grab the first element. Nowindex/slicecreates only the substring. I had to use the|| type.lengthas a fallback when there's no/in the content type (ex: the was a test for an'unknown'content type). A quick benchmarkPutting it all together, you see an IPS and memory improvement in
Router#find_path_itemFor
Router#match, IPS gets diluted, but there is still a noticeable improvement in memory allocation