
perf: add SIMD-accelerated UTF-8 validation to core arrow crates #9495

Draft
lyang24 wants to merge 1 commit into apache:main from lyang24:simdutf8

lyang24 (Contributor) commented Mar 1, 2026

Which issue does this PR close?

Rationale for this change

Add simdutf8 for fast UTF-8 validation in arrow-data, arrow-array, arrow-row, and arrow-csv. A shared check_utf8() utility in arrow-data validates with SIMD on the happy path and, when validation fails, falls back to std::str::from_utf8 to produce a detailed Utf8Error. The feature is enabled by default in the arrow umbrella crate.
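A minimal sketch of the fast-path/fallback pattern described above, not the PR's actual code. The signature of check_utf8 and the Cargo feature name "simdutf8" are assumptions for illustration; the only external call used is simdutf8::basic::from_utf8, which reports success or failure without error details. Without the feature enabled, the sketch compiles against std alone.

```rust
use std::str::Utf8Error;

/// Hypothetical shape of the shared helper; the real signature in
/// arrow-data is not shown in this description.
pub fn check_utf8(bytes: &[u8]) -> Result<&str, Utf8Error> {
    // Assumed feature name; only compiled when the feature is enabled.
    #[cfg(feature = "simdutf8")]
    {
        // Happy path: SIMD validation. The `basic` API returns an error
        // with no position information, so a failure is not returned here.
        if let Ok(s) = simdutf8::basic::from_utf8(bytes) {
            return Ok(s);
        }
    }

    // Error path (or feature disabled): validate again with std to get a
    // Utf8Error carrying `valid_up_to()` and `error_len()`.
    std::str::from_utf8(bytes)
}
```

The second pass costs extra time only on invalid input, which is presumably the rare case, so valid data pays for a single SIMD scan while callers still receive the standard library's error type.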

What changes are included in this PR?

Replaces the standard library's UTF-8 validation (std::str::from_utf8) with the SIMD implementation from simdutf8 in the crates listed above.

Are these changes tested?

All tests pass.

Are there any user-facing changes?

No.


Labels

arrow Changes to the arrow crate


4 participants