Add cache for significant speedup on apply_schema() #97

lor1113 · 2025-12-25T00:06:33Z

Hi,

I have found that for some use cases DocumentNode.apply_schema() can be very slow because of its multiple nested loops of comparisons and checks. The real culprit is iter_substitutes, which was called 463,201,585 times in my sample test run; with the hotfix it is called a somewhat more reasonable 1,490,801 times. As a temporary hotfix, I have implemented a cache in apply_schema() that stores the XsdElement matched from an XsdGroup and returns it directly, instead of running the very slow is_matching() repeatedly.
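The effect of this kind of cache can be illustrated with a minimal sketch. This is not the code from the patch: `FakeElement`, `FakeGroup`, `find_match` and `find_match_cached` are hypothetical stand-ins for XsdElement, XsdGroup and the lookup inside apply_schema(), and the counter only exists to show how many times the slow check runs.

```python
from typing import Dict, List, Optional, Tuple


class FakeElement:
    """Hypothetical stand-in for an XSD element declaration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.match_calls = 0  # how many times the slow check was executed

    def is_matching(self, tag: str) -> bool:
        # In the real library this check is expensive; here it is a comparison.
        self.match_calls += 1
        return self.name == tag


class FakeGroup:
    """Hypothetical stand-in for an XSD model group."""

    def __init__(self, elements: List[FakeElement]) -> None:
        self.elements = elements


def find_match(group: FakeGroup, tag: str) -> Optional[FakeElement]:
    """Uncached lookup: scans every element of the group on each call."""
    for elem in group.elements:
        if elem.is_matching(tag):
            return elem
    return None


Cache = Dict[Tuple[int, str], Optional[FakeElement]]


def find_match_cached(group: FakeGroup, tag: str, cache: Cache) -> Optional[FakeElement]:
    """Cached lookup: the scan runs once per (group, tag) pair."""
    key = (id(group), tag)
    if key not in cache:  # membership test, so a cached None is also a hit
        cache[key] = find_match(group, tag)
    return cache[key]
```

With a document that repeats the same tag under the same group many times, the cached version runs the scan once and answers every later lookup from the dictionary, which is where a large drop in call counts would come from.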

Test status of patch:

GitHub Actions build + tests = passing
elementpath run_all_tests script (Python 3.14, Windows WSL2) = 24 errors, all related to the locale in some way (locale.Error: unsupported locale setting) and unrelated to the hotfix (they happen even without it)
elementpath run_w3c_tests script (Python 3.14, Windows WSL2) = all passing
xmlschema run_all_tests script with the hotfix applied to the installed elementpath module (Python 3.14, Windows WSL2) = all passing
xmlschema run_w3c_tests script with the hotfix applied to the installed elementpath module (Python 3.14, Windows WSL2) = all passing

I have tried multiple times to do more thorough tests with tox. Unfortunately tox does not like me and kept spitting out bizarre errors no matter how I tried to reconfigure or change things, so this is the best I can do for now testing-wise.

Attached are some images of before and after profile runs (GitHub does not let me upload the raw .prof files). For my use case, the speedup is over 100x. For other cases it might be less, or even a tiny slowdown where the cache is never hit, but I would imagine that is exceedingly unlikely, and even then the slowdown should be negligible (compared to a >100x speedup in particularly demanding cases).
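For anyone who wants to reproduce call counts like these on their own documents, a plain-text summary can be produced with the standard library profiler. The sketch below is only an illustration: `profile_summary` and `slow_lookup` are hypothetical names, and `slow_lookup` is a placeholder workload rather than a real apply_schema() call.

```python
import cProfile
import io
import pstats


def profile_summary(func, *args, top=10, **kwargs):
    """Run func under cProfile and return the top entries as plain text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the outermost hot paths are listed first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def slow_lookup(n):
    """Hypothetical placeholder workload; replace with the real call."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_summary(slow_lookup, 100_000))
```

The "ncalls" column of the printed table is the figure to compare between the unpatched and patched runs.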

No hotfix / version 5.0.4:

With hotfix:

I'm sending this as a PR because I think it might be beneficial to other users of elementpath. If for whatever reason you don't want to merge it, I will maintain it as a patch on a private copy. I am also interested in investigating other ways to speed up elementpath (as well as reduce its memory footprint) that might involve more comprehensive rewrites rather than a quick caching patch like this, but this is a good start for now at least.

brunato · 2025-12-26T14:14:32Z

Hi @lor1113,
speeding up tag matching for schema objects is one of the key points for improving the overall performance of xmlschema and elementpath in cascade.

The problem is that general caching of iter_substitutes() has to consider many situations where the schema is not completely built or is extended.

I've tested your double-level caching based on the id() of the model group (including ref), and it seems to pass all my tests without further errors, so I think it is safe in the local scope of apply_schema(). I'm going to merge it and include it in the next version, which is arriving soon (maybe this will be a minor version, for Python 3.10+). I will examine whether the same approach could be useful for xmlschema in general.

Thank you for the code and analysis.
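A two-level cache keyed on id() and confined to one function call can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `Decl`, `Group` and `annotate` are hypothetical names, and the real apply_schema() works on document nodes rather than on (group, tag) pairs.

```python
from typing import Dict, List, Optional, Tuple


class Decl:
    """Hypothetical element declaration."""

    def __init__(self, name: str) -> None:
        self.name = name


class Group:
    """Hypothetical model group holding element declarations."""

    def __init__(self, decls: List[Decl]) -> None:
        self.decls = decls

    def match(self, tag: str) -> Optional[Decl]:
        for decl in self.decls:
            if decl.name == tag:
                return decl
        return None


def annotate(pairs: List[Tuple[Group, str]]) -> List[Optional[Decl]]:
    """Resolve each (group, tag) pair using a cache local to this call."""
    # First level: id() of the group. Second level: the tag name.
    cache: Dict[int, Dict[str, Optional[Decl]]] = {}
    result = []
    for group, tag in pairs:
        per_group = cache.setdefault(id(group), {})
        if tag not in per_group:
            per_group[tag] = group.match(tag)
        result.append(per_group[tag])
    return result
```

Keying on id() avoids requiring the groups to be hashable, and it is only reliable while the keyed objects stay alive, since Python may reuse an id after an object is collected. Creating the dictionary inside the function and discarding it on return keeps the cache within that window, and it also means a schema that is built or extended later is never served stale entries.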

Davide

Lorenzo Curcio added 4 commits December 24, 2025 23:57

added cache for apply_schema() element matching

4347e17

fixed flake8 formatting

7847f67

changed to Union[] syntax for python 3.9 support

f5f92bf

final w3c fix for element match cache

bae08d9

brunato merged commit f3c4db3 into sissaschool:master Dec 26, 2025
19 checks passed

brunato added a commit that referenced this pull request Dec 28, 2025

Merge PR #95 and #97 into develop

378f099
