bug: read_parquet with Polars engine downgrades to pyarrow.dataset when given multiple paths #11705

@AlmogAl

What happened?

Description
When using the Polars backend, calling read_parquet with multiple file paths (a list of specific Parquet files, not a partitioned directory) unexpectedly falls back to using pyarrow.dataset instead of the native Polars scan_parquet.

This behavior is misleading and unnecessary for non-partitioned use cases.
Modern Polars versions support reading multiple Parquet files directly via scan_parquet, so the current fallback causes degraded performance and confusing behavior.

🔍 Reproduction

```python
import ibis

con = ibis.polars.connect()
t = con.read_parquet([
    "s3://bucket/path/file1.parquet",
    "s3://bucket/path/file2.parquet",
])
```

Expected:
Use Polars’ native scan_parquet (which supports lists and globs).

Actual:
Ibis silently switches to pyarrow.dataset and calls pl.scan_pyarrow_dataset, which performs worse and behaves differently.

🧠 Root Cause & Background

Looking at the Ibis implementation of read_parquet for the Polars engine:

```python
if not isinstance(path, (str, Path)) and len(path) > 1:
    self._import_pyarrow()
    import pyarrow.dataset as ds

    path = [normalize_filename(p) for p in path]
    obj = pl.scan_pyarrow_dataset(
        source=ds.dataset(path, format="parquet"), **kwargs
    )
    self._add_table(table_name, obj)
else:
    path = normalize_filename(path)
    self._add_table(table_name, pl.scan_parquet(path, **kwargs))
```

This fallback seems to have originated from Polars’ own recommendation in their documentation:

> “Partitioned files: If you have a directory-nested (hive-style) partitioned dataset, you should use the scan_pyarrow_dataset() method to read that data instead.”
>
> (Polars docs: scan_parquet)

However, in my case, I’m not using a partitioned directory — just providing a list of specific S3 file paths such as:

["s3://bucket/file1.parquet", "s3://bucket/file2.parquet"]

Polars handles this natively and efficiently using scan_parquet, without requiring PyArrow.
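
For comparison, here is a minimal sketch of the equivalent direct Polars call for the same two files. This assumes a reasonably recent Polars version; the paths are placeholders from the reproduction above, and real S3 reads may additionally need credentials via scan_parquet's storage_options parameter:

```python
import polars as pl

# Recent Polars accepts a list of files (or a glob) directly,
# so no PyArrow dataset is involved for this non-partitioned case.
lf = pl.scan_parquet([
    "s3://bucket/path/file1.parquet",
    "s3://bucket/path/file2.parquet",
])
df = lf.collect()  # lazily scans both files, then materializes the result
```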

Therefore, this fallback penalizes valid, non-partitioned use cases by:
• Introducing an unnecessary PyArrow dependency and performance cost
• Misleading users who expect Ibis + Polars to behave like Polars itself
• Making it harder to apply Polars-native configuration or optimizations

💡 Proposed Solution

Allow users to opt out of the pyarrow.dataset fallback (or remove it entirely), for example by adding a flag like:

```python
t = con.read_parquet([...], use_pyarrow_dataset=False)
```

or by detecting whether the provided paths represent a true partitioned dataset.

If the input is just a list of standalone file paths (not a directory or hive-style layout), scan_parquet should be used.
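
A minimal sketch of how that gate could look inside the backend, assuming the proposed use_pyarrow_dataset flag and reusing self._import_pyarrow, normalize_filename, and self._add_table from the snippet above (this is an illustration of the idea, not a finished patch):

```python
if (
    use_pyarrow_dataset  # proposed opt-in flag, defaulting to False
    and not isinstance(path, (str, Path))
    and len(path) > 1
):
    # Explicit opt-in: keep today's pyarrow.dataset behavior
    self._import_pyarrow()
    import pyarrow.dataset as ds

    paths = [normalize_filename(p) for p in path]
    obj = pl.scan_pyarrow_dataset(
        source=ds.dataset(paths, format="parquet"), **kwargs
    )
else:
    # Default: hand single paths, globs, or lists of plain files to Polars directly
    if isinstance(path, (str, Path)):
        path = normalize_filename(path)
    else:
        path = [normalize_filename(p) for p in path]
    obj = pl.scan_parquet(path, **kwargs)

self._add_table(table_name, obj)
```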

✅ Expected Behavior

read_parquet([...]) should use Polars’ native scan_parquet implementation for any list of paths that isn’t a partitioned directory.

🙋‍♂️ Personal Note

I’d be more than happy to open a Pull Request with a proposed solution — either adding an optional use_pyarrow_dataset parameter or improving the path-handling logic to better align with current Polars capabilities.


📎 Additional Context

• Polars docs explicitly mention scan_pyarrow_dataset() only for partitioned datasets, not for general multi-file reads.
• Modern Polars versions support reading multiple paths natively and efficiently with scan_parquet.

Removing or making the fallback configurable would make the Ibis–Polars integration more intuitive and performant.

What version of ibis are you using?

11.0.0 (or any)

What backend(s) are you using, if any?

Polars

Relevant log output

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels: bug (Incorrect behavior inside of ibis)
Status: backlog