projected_schema segfaults on a Vortex scanner #5454

@paultiq

Describe the bug

Accessing the projected_schema property on a Vortex scanner segfaults:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fff96ab2d3c in __pyx_getprop_7pyarrow_8_dataset_7Scanner_projected_schema(_object*, void*) [clone .lto_priv.0] () from /home/extra/git/goodenv/.venv/lib/python3.13/site-packages/pyarrow/_dataset.cpython-313-x86_64-linux-gnu.so

Discovered in duckdb/duckdb-python#187

To Reproduce

MRE:

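import vortex as vx

# Write a single-row struct array to a Vortex file.
vx.io.write(vx.array([{"col1": "a string"}]), 'foo.vortex')
# Accessing projected_schema on the resulting scanner triggers the SIGSEGV.
x = vx.open('foo.vortex').to_dataset().scanner().projected_schema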
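Because the crash kills the interpreter, it can help to run the reproduction in a child process and inspect the exit status. The following is a minimal sketch, not part of Vortex or PyArrow: `run_isolated` and `describe_exit` are hypothetical helper names, and the signal interpretation assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> int:
    """Run `snippet` in a fresh Python process and return its exit code.

    Hypothetical helper: a crash in the child cannot take down the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def describe_exit(code: int) -> str:
    """Turn a subprocess return code into a short human-readable summary."""
    if code == 0:
        return "ok"
    if code < 0:
        # POSIX only: a negative code means the child died from a signal.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}"
    return f"exited with status {code}"


# The reproduction from this issue, as a string to pass to run_isolated().
REPRO = """
import vortex as vx
vx.io.write(vx.array([{"col1": "a string"}]), "foo.vortex")
vx.open("foo.vortex").to_dataset().scanner().projected_schema
"""
```

Calling `describe_exit(run_isolated(REPRO))` in an environment with vortex installed should then report how the child ended without crashing the calling process.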

Expected behavior

The property should behave as it does on a PyArrow dataset scanner:

from pyarrow import parquet as pq
from pyarrow import dataset as ds

x = ds.dataset(pq.read_table("foo.parquet")).scanner().projected_schema

This returns the schema.
