Add cache for significant speedup on apply_schema() #97
Hi,
I have found that for some use cases `DocumentNode.apply_schema()` can be very slow because of its multiple nested loops of comparisons and checks. The real culprit is `iter_substitutes`, which in my sample test run was called 463,201,585 times; with the hotfix it is called a more reasonable 1,490,801 times. As a temporary hotfix, I have implemented a cache in `apply_schema()` that stores the result of getting a matching `XsdElement` from an `XsdGroup` and returns it instead of running the very slow `is_matching()` repeatedly.

Test status of patch:
- `elementpath` `run_all_tests` script (Python 3.14, Windows WSL2) = 24 errors, all related to the locale in some way (`locale.Error: unsupported locale setting`) and unrelated to the hotfix (they happen even without it)
- `elementpath` `run_w3c_tests` script (Python 3.14, Windows WSL2) = all passing
- `xmlschema` `run_all_tests` script with the hotfix added to the installed `elementpath` module (Python 3.14, Windows WSL2) = all passing
- `xmlschema` `run_w3c_tests` script with the hotfix added to the installed `elementpath` module (Python 3.14, Windows WSL2) = all passing

I have tried multiple times to run more thorough tests with tox. Unfortunately, tox kept producing bizarre errors no matter how I reconfigured or changed things, so this is the best I can do for testing for now.
Attached are some images of before and after profile runs (GitHub does not let me upload the raw .prof files). For my use case, the speedup is over 100x. For other cases it might be less, or there might even be a tiny slowdown when the cache is never hit, but I expect that to be very unlikely, and even then the slowdown should be negligible compared with the >100x speedup in particularly demanding cases.
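To make the caching idea and the hit/miss trade-off concrete, here is a minimal, self-contained sketch. It is not the patch itself: `ToyElement`, `ToyGroup`, and `MatchCache` are hypothetical stand-ins invented for illustration, and the real `is_matching()` check is far more expensive than the string comparison used here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyElement:
    """Hypothetical stand-in for an XSD element; not the xmlschema class."""
    name: str

    def is_matching(self, name: str) -> bool:
        # Assumption: a plain comparison replaces the real, costly check.
        return self.name == name


@dataclass
class ToyGroup:
    """Hypothetical stand-in for an XSD group: an ordered list of elements."""
    elements: list
    calls: int = 0  # number of is_matching() invocations, for illustration

    def find_match(self, name: str) -> Optional[ToyElement]:
        # Linear scan; this is the work the cache is meant to avoid repeating.
        for elem in self.elements:
            self.calls += 1
            if elem.is_matching(name):
                return elem
        return None


class MatchCache:
    """Memoize (group identity, child name) -> matching element or None."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, group: ToyGroup, name: str) -> Optional[ToyElement]:
        # id() is only a safe key while the group object stays alive.
        key = (id(group), name)
        try:
            result = self._cache[key]
        except KeyError:
            # Miss: pay for the scan once and remember the answer,
            # including a negative answer (None).
            self.misses += 1
            result = self._cache[key] = group.find_match(name)
        else:
            self.hits += 1
        return result


group = ToyGroup([ToyElement(f"e{i}") for i in range(100)])
cache = MatchCache()
for _ in range(1000):
    cache.lookup(group, "e99")

print(group.calls, cache.hits, cache.misses)  # 100 999 1
```

Without the cache, the same loop would perform 100,000 `is_matching()` calls. When every lookup is a miss, the only extra cost is one dictionary probe and one insertion per lookup, which is why the worst-case slowdown should stay small.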
No hotfix / version 5.0.4:

With hotfix:

I'm sending this as a PR because I think it might benefit other users of `elementpath`. If for whatever reason you don't want to merge it, I will maintain it as a patch on a private copy. I am also interested in investigating other ways to speed up `elementpath` (and reduce its memory footprint) that might involve more comprehensive rewrites rather than a quick caching patch like this, but this is a good start for now.
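For anyone who wants to collect call counts like the ones quoted above, the following is a minimal sketch using the standard library's `cProfile` and `pstats`. It assumes a `cProfile`-style profile and is not necessarily how the attached profiles were produced; `slow_lookup` and `workload` are hypothetical placeholders for the real validation run.

```python
import cProfile
import io
import pstats


def slow_lookup(items, target):
    """Hypothetical hot function standing in for the real workload."""
    return any(item == target for item in items)


def workload():
    # Placeholder for the code under investigation.
    items = list(range(200))
    for _ in range(500):
        slow_lookup(items, 199)


profiler = cProfile.Profile()
profiler.runcall(workload)

# Report the most expensive entries by cumulative time. The "ncalls"
# column of the report holds the per-function call counts.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Calling `profiler.dump_stats()` with a file name writes the raw data to disk, so a before and after pair can be compared later with `pstats.Stats`.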