Closed
Description
Hello, I have once again come to report a "double free or corruption (!prev)" crash (SIGABRT).
GDB shows the following backtrace:
#3 0x00007ffff7cc8476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7cae7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff7d0f677 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e61b77 "%s\n") at ../sysdeps/posix/libc_fatal.c:156
#6 0x00007ffff7d26cfc in malloc_printerr (str=str@entry=0x7ffff7e647b0 "double free or corruption (!prev)") at ./malloc/malloc.c:5666
#7 0x00007ffff7d28e7c in _int_free (av=0x7ffff7ea0c80 <main_arena>, p=0x2ac3800, have_lock=<optimized out>) at ./malloc/malloc.c:4591
#8 0x00007ffff7d2b453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9 0x00007ffff6bdfa4e in lexbor_free ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#10 0x00007ffff6b96067 in lexbor_mem_chunk_destroy ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#11 0x00007ffff6b960d4 in lexbor_mem_destroy ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#12 0x00007ffff6b96342 in lexbor_mraw_destroy ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#13 0x00007ffff6bb0d81 in lxb_dom_document_destroy ()
cpython-314-x86_64-linux-gnu.so
#14 0x00007ffff6bc20bc in lxb_html_document_interface_destroy ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#15 0x00007ffff6bf1dcf in __pyx_tp_dealloc_10selectolax_6lexbor_LexborHTMLParser ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
#16 0x0000000001800e9f in _Py_Dealloc ()
#17 0x00007ffff6bf1535 in __pyx_tp_dealloc_10selectolax_6lexbor_LexborNode ()
from /workspace/.venv/lib/python3.14/site-packages/selectolax/lexbor.cpython-314-x86_64-linux-gnu.so
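Frames #15 to #17 suggest that the parser is deallocated from inside a node's deallocator, which would mean the document is destroyed only once the last Python node object that references it goes away. The following is a minimal sketch of that ordering under CPython reference counting. `FakeParser` and `FakeNode` are hypothetical stand-ins, not selectolax classes, and the assumption that each node holds a reference to its parser is inferred from the backtrace only.

```python
import weakref

events = []


class FakeParser:
    """Hypothetical stand-in for LexborHTMLParser (owns the document)."""


class FakeNode:
    """Hypothetical stand-in for LexborNode (assumed to keep its parser alive)."""

    def __init__(self, parser):
        self.parser = parser


parser = FakeParser()
# Record the moment the first parser object is actually destroyed.
weakref.finalize(parser, events.append, "parser destroyed")

marked_nodes = [FakeNode(parser) for _ in range(3)]

# Rebinding the name alone does not destroy the first parser:
# the nodes in marked_nodes still reference it.
parser = FakeParser()
events.append("parser rebound")

# Releasing the last node releases the last reference to the first parser,
# so its destruction runs from within the node's deallocation.
marked_nodes.clear()
events.append("nodes released")

print(events)
```

If the real classes behave like this model, any lingering node reference (for example a list of collected nodes or a loop variable) would delay the document teardown, possibly until interpreter exit.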
To reproduce, I've provided the HTML file that causes the crash, as well as the following code:
import pathlib

from selectolax.lexbor import LexborHTMLParser

parser = LexborHTMLParser(pathlib.Path("dump.html").read_text(encoding="utf-8"))

STRIP_TAGS = {
    "script",
    "style",
    "noscript",
    "form",
    "iframe",
    "svg",
    "button",
    "input",
    "textarea",
    "select",
    "option",
}
BOILERPLATE_TAGS = {"nav", "footer", "aside"}

marked_nodes = []
for node in parser.root.traverse(include_text=True):
    if node.tag == "-comment" or node.tag in STRIP_TAGS or node.tag in BOILERPLATE_TAGS:
        marked_nodes.append(node)

for marked in marked_nodes:
    if marked.parent is None:
        continue
    marked.decompose()

# Unwrap inline styles that add depth but no meaning
parser.unwrap_tags(["span", "font", "center", "small", "big"])

for _ in range(2):
    candidates = parser.css("div > div:only-child")
    if not candidates:
        break
    for node in candidates:
        if node.parent is None:
            continue
        node.unwrap()

parser.merge_text_nodes()
parser = LexborHTMLParser(pathlib.Path("dump.html").read_text(encoding="utf-8"))
The crash does not occur during the decompose() or unwrap() calls. It occurs immediately when the parser object is garbage collected (or when the script exits).
It appears that decompose() or unwrap() leaves the underlying C structs in a state that lxb_dom_document_destroy cannot handle. This resembles the memory management changes discussed in Issue #179, where nodes are detached rather than freed immediately. My guess is that when the document is finally destroyed, the allocator hits a double free on these zombie nodes.
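To narrow down which Python statement is executing when the abort fires (the rebinding of parser versus interpreter shutdown), the standard library's faulthandler module can dump the Python traceback on SIGABRT. The sketch below is only an illustration of that mechanism: it runs a child process in which os.abort() stands in for the abort raised by glibc's free(), and it does not involve selectolax at all.

```python
import subprocess
import sys

# Child script: enable faulthandler, then abort.
# os.abort() is a stand-in for the SIGABRT raised by malloc_printerr.
child_code = """
import faulthandler
import os

faulthandler.enable()  # dump the Python traceback on SIGABRT, SIGSEGV, etc.
os.abort()
"""

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

# The child dies from the signal; stderr carries the Python-level traceback.
print("return code:", result.returncode)
print(result.stderr)
```

In the reproducer, the equivalent would be calling faulthandler.enable() at the top of the script, or running it with python -X faulthandler, and reading the reported line number from stderr.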