
arrow-cast numeric parsers fail to parse whitespace-padded strings #9538

@aryan-212

Description


Describe the bug

The Parser::parse implementations for numeric types fail to parse strings that contain leading or trailing whitespace.

In practice, this happens quite often when reading data from CSVs or other text-based sources where values may be padded with spaces, tabs, or newline characters. Instead of parsing successfully, these inputs currently return None.

To Reproduce

use arrow_array::types::*;
use arrow_cast::parse::Parser;

fn main() {
    // Each of these currently returns None instead of the parsed number.
    assert_eq!(Float32Type::parse(" 1.5 "), None); // expected Some(1.5)
    assert_eq!(Int32Type::parse(" 42 "), None);    // expected Some(42)
    assert_eq!(Int64Type::parse("\t100\n"), None); // expected Some(100)
    assert_eq!(UInt64Type::parse(" 7 "), None);    // expected Some(7)
}

Expected behavior

Numeric parsers should strip leading and trailing whitespace before parsing. For example, " 42 " should parse successfully to Some(42) rather than returning None.

This behavior is consistent with how most data ingestion systems handle text-to-number conversion.

Additional context

The issue originates in arrow-cast/src/parse.rs. The float parsers pass string.as_bytes() directly to lexical_core::parse, and the parser_primitive! macro (used for integers and durations) similarly operates on the input without trimming.
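To see why padded input is rejected, the following is a deliberately simplified, hypothetical strict integer parser that works on raw bytes, the way a byte-oriented parser does. It is not the actual code of lexical_core or the parser_primitive! macro; it only models the behavior described above: an optional sign followed by ASCII digits, with no other bytes allowed, so a leading space fails on the first byte and a trailing newline fails on the last.

```rust
/// Hypothetical, simplified strict parser (not lexical_core's real code):
/// accepts an optional sign followed by one or more ASCII digits and
/// nothing else. Returns None on empty input, stray bytes, or overflow.
fn parse_i64_strict(bytes: &[u8]) -> Option<i64> {
    // Split off an optional leading sign; empty input returns None here.
    let (negative, digits) = match bytes.first().copied()? {
        b'-' => (true, &bytes[1..]),
        b'+' => (false, &bytes[1..]),
        _ => (false, bytes),
    };
    if digits.is_empty() {
        return None;
    }
    let mut value: i64 = 0;
    for &b in digits {
        // Any non-digit byte, including b' ', b'\t' and b'\n',
        // rejects the whole input.
        if !b.is_ascii_digit() {
            return None;
        }
        let d = i64::from(b - b'0');
        // Accumulate toward the sign so that i64::MIN is representable;
        // checked arithmetic turns overflow into None.
        value = value.checked_mul(10)?;
        value = if negative {
            value.checked_sub(d)?
        } else {
            value.checked_add(d)?
        };
    }
    Some(value)
}
```

Under this model, trimming the string before taking its bytes (string.trim().as_bytes()) is enough to make the padded cases succeed, while the strict parser itself stays unchanged.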

A simple fix would be to call .trim() on the input string before attempting to parse.
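A minimal sketch of that fix, assuming the trim is applied once at a shared entry point so that every numeric type picks it up. To keep the example self-contained it uses the standard library's FromStr parsing as a stand-in for lexical_core and the parser_primitive! macro; the trait TrimmedParse and its methods are hypothetical and not part of the arrow-cast API. Rust's own str::parse also rejects padded numbers, so the trim is what makes the difference here.

```rust
use std::str::FromStr;

/// Hypothetical trait mirroring the shape of a string-in, Option-out
/// parser. It is NOT the real arrow-cast Parser trait.
trait TrimmedParse: Sized {
    /// Strict parse of an already-trimmed string (stand-in for the
    /// lexical_core / parser_primitive! logic).
    fn parse_strict(s: &str) -> Option<Self>;

    /// Entry point: strip leading and trailing whitespace, then delegate.
    fn parse(s: &str) -> Option<Self> {
        Self::parse_strict(s.trim())
    }
}

// Stand-in strict parser built on std's FromStr. For the numeric types
// used here, FromStr rejects surrounding whitespace, like the parsers
// described in this issue.
impl<T: FromStr> TrimmedParse for T {
    fn parse_strict(s: &str) -> Option<Self> {
        s.parse().ok()
    }
}
```

With this in place, f32::parse(" 1.5 ") yields Some(1.5) and i64::parse("\t100\n") yields Some(100), while interior whitespace such as "4 2" and whitespace-only input are still rejected. One design choice to settle: str::trim removes all Unicode whitespace, including characters such as U+00A0 (no-break space), whereas a byte-oriented parser might prefer to strip only ASCII whitespace, which s.trim_matches(|c: char| c.is_ascii_whitespace()) would do.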

Metadata


Assignees: No one assigned
Labels: enhancement (Any new improvement worthy of an entry in the changelog)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
