Description
Describe the bug
The Parser::parse implementations for numeric types fail to parse strings that contain leading or trailing whitespace.
In practice, this happens quite often when reading data from CSVs or other text-based sources where values may be padded with spaces, tabs, or newline characters. Instead of parsing successfully, these inputs currently return None.
To Reproduce
use arrow_array::types::*;
use arrow_cast::parse::Parser;
// Each call below currently returns None instead of the parsed number.
assert_eq!(Float32Type::parse(" 1.5 "), None); // expected Some(1.5)
assert_eq!(Int32Type::parse(" 42 "), None); // expected Some(42)
assert_eq!(Int64Type::parse("\t100\n"), None); // expected Some(100)
assert_eq!(UInt64Type::parse(" 7 "), None); // expected Some(7)
Expected behavior
Numeric parsers should ignore leading and trailing whitespace before parsing. For example, " 42 " should parse successfully to Some(42) rather than returning None.
This behavior is consistent with how many data ingestion systems handle text-to-number conversion.
Additional context
The issue originates in arrow-cast/src/parse.rs. The float parsers pass string.as_bytes() directly to lexical_core::parse, and the parser_primitive! macro (used for integers and durations) similarly operates on the input without trimming.
A simple fix would be to call .trim() on the input string before attempting to parse.
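As a rough illustration of the trim-then-parse idea, here is a minimal sketch using only the standard library. It is not the arrow-cast code: `parse_trimmed` is a hypothetical helper, and `str::parse` (via `FromStr`) stands in for `lexical_core::parse` and the `parser_primitive!` macro, on the assumption that the real fix would apply the same `.trim()` step before delegating to the existing parser.

```rust
use std::str::FromStr;

/// Hypothetical helper (not part of arrow-cast): strips leading and
/// trailing whitespace, then delegates to the type's `FromStr` impl.
/// Returns `None` when the trimmed text is not a valid number.
fn parse_trimmed<T: FromStr>(input: &str) -> Option<T> {
    // `str::trim` removes spaces, tabs, and newlines at both ends;
    // whitespace inside the value is left alone and still fails to parse.
    input.trim().parse::<T>().ok()
}
```

With this helper, `parse_trimmed::<i32>(" 42 ")` yields `Some(42)` and `parse_trimmed::<i64>("\t100\n")` yields `Some(100)`, while an input such as `"4 2"` still yields `None`. Note that `str::trim` strips Unicode whitespace; whether the actual fix should be limited to ASCII whitespace is a separate design decision.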