Use @cached_property and functools.cache to avoid repeated work #195

@pygarap

I would like to propose a small refactor to use the standard library caching helpers in places where the same work is done many times on the same objects.

selectolax often exposes:

  • properties that compute values based on the underlying DOM node
  • methods that walk the tree or compute derived data from the same arguments on the same object

These values usually do not change for the lifetime of a given node or parser instance, but they may be accessed repeatedly. The built-in caching tools can avoid recomputing them on every access.

Proposal

  1. Use @functools.cached_property for suitable properties

    For properties that:

    • can be computed once from the object's internal state
    • do not change for the lifetime of the object
    • are reasonably expensive (they do more than a trivial constant-time field access)

    wrap them with @functools.cached_property so that the value is computed once per instance and then reused.

    This is especially useful for properties that walk the DOM, wrap C structures, or build Python-level objects from C data.
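As a rough illustration of item 1, here is a minimal sketch using a hypothetical pure-Python `NodeWrapper` class. The class, its `descendant_count` property, and the `walks` counter are made up for this example and are not part of selectolax's API. The property stands in for an expensive tree walk.

```python
from functools import cached_property


class NodeWrapper:
    """Hypothetical pure-Python wrapper; not part of selectolax's API."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)
        self.walks = 0  # counts how often the expensive walk actually runs

    @cached_property
    def descendant_count(self):
        # Stand-in for an expensive tree walk over the underlying node.
        self.walks += 1
        return sum(1 + child.descendant_count for child in self.children)


leaf_a = NodeWrapper("li")
leaf_b = NodeWrapper("li")
root = NodeWrapper("ul", [leaf_a, leaf_b])

first = root.descendant_count   # computed on first access
second = root.descendant_count  # read back from the instance __dict__
```

Note that `cached_property` stores the result in the instance `__dict__`, so it only works on classes whose instances have one. Whether that holds for a given selectolax class would need to be checked per class.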

  2. Use functools.cache for suitable methods

    For methods that:

    • depend only on self and their explicit arguments
    • have no side effects
    • are likely to be called multiple times with the same arguments on the same object

    decorate them with @functools.cache, or refactor the core logic into a helper that is wrapped with @functools.cache.

    This gives simple memoization without needing a custom caching layer.
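The "refactor the core logic into a helper" variant from item 2 might look roughly like the sketch below. `_normalize_selector` and `ParserWrapper` are hypothetical names chosen for illustration, not existing selectolax functions. The helper depends only on its argument, so memoizing it is safe.

```python
from functools import cache


@cache
def _normalize_selector(selector: str) -> str:
    # Hypothetical pure helper: collapses runs of whitespace in a selector.
    return " ".join(selector.split())


class ParserWrapper:
    """Hypothetical wrapper; not part of selectolax's API."""

    def css_key(self, selector: str) -> str:
        # The method itself is not cached; only the pure helper is memoized.
        return _normalize_selector(selector)


parser = ParserWrapper()
key_1 = parser.css_key("div   >  p")  # cache miss, result is computed
key_2 = parser.css_key("div   >  p")  # cache hit, result is reused
info = _normalize_selector.cache_info()
```

Keeping the cache on a module-level helper rather than on the method has one practical advantage: when `@functools.cache` is applied directly to a method, `self` becomes part of the cache key, so instances must be hashable and the cache holds a strong reference to each instance for as long as the cache lives.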

Notes and constraints

  • Caching should be added only where the underlying node or parser is effectively immutable or where cached results cannot become wrong if the tree changes.
  • If there are places where the tree can be mutated after creation, we should either skip caching for those properties or document clearly that cached results are not updated after mutation.
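Where mutation is possible, one further option (beyond the two listed above) is to drop the cached value whenever the object changes. The sketch below assumes a hypothetical `MutableNode` class with an `append_text` method, neither of which exists in selectolax. It relies on the documented behaviour that a `cached_property` value lives in the instance `__dict__` and can be cleared from there.

```python
from functools import cached_property


class MutableNode:
    """Hypothetical mutable wrapper; not part of selectolax's API."""

    def __init__(self, text_parts):
        self._parts = list(text_parts)

    @cached_property
    def text(self):
        # Stand-in for an expensive computation over the node's content.
        return "".join(self._parts)

    def append_text(self, part):
        self._parts.append(part)
        # Drop the cached value so the next access recomputes it.
        self.__dict__.pop("text", None)


node = MutableNode(["a", "b"])
before = node.text     # computed and cached
node.append_text("c")  # mutation invalidates the cache
after = node.text      # recomputed from the updated state
```

This only stays correct if every mutating code path performs the invalidation, which is why skipping the cache entirely may be the simpler choice for mutable objects.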

This refactor would keep the public API unchanged while reducing repeated work in the high-level Python code that wraps the C layer.
