Smart string usage
A smart string is a private str subclass, documented under "return types" of XPath evaluation results. Quoting directly from the lxml documentation:
XPath string results are 'smart' in that they provide a getparent() method that knows their origin:
- for attribute values, result.getparent() returns the Element that carries them. An example is //foo/@attribute, where the parent would be a foo Element.
- for the text() function (as in //text()), it returns the Element that contains the text or tail that was returned.
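The behaviour quoted above can be observed at runtime with a small sketch. The XML snippet and variable names here are made up for illustration, and the code indexes XPath results without any type narrowing, so it demonstrates runtime behaviour only and is not meant to pass a strict type checker.

```python
from lxml import etree

# Hypothetical document, used only to exercise the quoted behaviour.
root = etree.fromstring('<root><foo attribute="value">text</foo>tail</root>')

# Attribute value: the parent is the element carrying the attribute.
attr = root.xpath('//foo/@attribute')[0]
print(attr, attr.getparent().tag, attr.is_attribute)  # value foo True

# text() results: one string comes from foo's text, one from foo's tail.
texts = root.xpath('//text()')
text, tail = texts
print(text.getparent().tag, text.is_text)  # foo True
print(tail.getparent().tag, tail.is_tail)  # foo True
```

The is_text, is_tail and is_attribute properties make it possible to tell the different origins apart when getparent() returns the same element.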
The actual class is named _ElementUnicodeResult in the source code. On Python 2.x and PyPy this str subclass stands for some other concrete classes, but those can be ignored as far as type checking is concerned.
The following breaking changes were introduced after version 2023.02.11.

Historically the class was named SmartStr in the annotation package, which is more user friendly but needs to be imported manually for typing. Because it was underused, the decision was made to break compatibility and revert to the concrete class name (_ElementUnicodeResult) instead.

Because the getparent() method needs to know the original element type, the smart string is now a generic class that takes the element type as its subscript, as in _ElementUnicodeResult[_Element].
| Version | Usage |
|---|---|
| 2023.02.11 or earlier | SmartStr |
| Afterwards | _ElementUnicodeResult[_Element] |
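To illustrate why a subscript is useful, here is a minimal sketch of a generic str subclass whose getparent() return type follows the subscript. FakeElement and FakeSmartStr are hypothetical stand-ins written for this example. They are not the real lxml classes, and the actual stub definition may differ.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class FakeElement:
    """Hypothetical stand-in for an element class such as _Element."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


class FakeSmartStr(str, Generic[T]):
    """Hypothetical stand-in for _ElementUnicodeResult[T]."""

    _parent: Optional[T]

    def __new__(cls, value: str, parent: Optional[T] = None) -> "FakeSmartStr[T]":
        obj = super().__new__(cls, value)
        obj._parent = parent
        return obj

    def getparent(self) -> Optional[T]:
        # The return type is tied to the subscript of the class.
        return self._parent


span = FakeElement("span")
s: FakeSmartStr[FakeElement] = FakeSmartStr("hello", span)

# A type checker sees Optional[FakeElement] here, so .tag is known
# to exist once None has been ruled out.
parent = s.getparent()
if parent is not None:
    print(parent.tag)  # span
```

With a non-generic class, getparent() could only be annotated with a single fixed element type, which is less precise when the tree is built from a custom element class.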
There are two occasions where this class is primarily useful. See further down for examples of both types of usage.
- XPath selection result
- HtmlElement.text_content() result (which uses XPath internally)
However, this class is almost never used directly in type annotations, since an XPath result is too versatile to be annotated precisely (str, float, bool, lists of them, as well as lists of _Element and namespace tuples). Users are therefore expected to narrow down the XPath selection result themselves. The first example below shows how to handle smart strings in a selection result.
from lxml.etree import parse, _ElementUnicodeResult, _Element
from typing import TypeIs  # (or from typing_extensions)

def is_smart_str(s: str) -> TypeIs[_ElementUnicodeResult[_Element]]:
    return hasattr(s, 'getparent')

tree = parse(<...some html file...>)
for result in tree.xpath('//div/span/text()'):
    if is_smart_str(result):
        # At this point,
        # result -> _ElementUnicodeResult[_Element],
        # parent -> Optional[_Element]
        parent = result.getparent()
        if parent is not None:
            print(parent.tag)  # 'span'
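Since an XPath expression can evaluate to several different kinds of value, the narrowing mentioned earlier usually starts with a runtime check on the result itself. The following is a rough sketch only: describe() is a hypothetical helper and the XML snippet is made up for this example.

```python
from lxml import etree

# Hypothetical document, used only for illustration.
root = etree.fromstring('<root><foo attribute="v">hello</foo></root>')


def describe(result: object) -> str:
    """Classify an XPath evaluation result by its runtime type."""
    if isinstance(result, bool):
        return "bool"
    if isinstance(result, float):
        return "float"
    if isinstance(result, str):
        # Smart strings are str subclasses, so they land here too.
        return "str"
    if isinstance(result, list):
        return "list"
    return "other"


print(describe(root.xpath('boolean(//foo)')))  # bool
print(describe(root.xpath('count(//foo)')))    # float
print(describe(root.xpath('string(//foo)')))   # str
print(describe(root.xpath('//foo')))           # list
```

Each isinstance() branch also narrows the static type, so the code inside a branch can be written against one concrete type instead of the full union.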