fix: Prevent String to TimestampNTZ cast from incorrectly adding UTC timezone metadata #3710
0lai0 wants to merge 6 commits into apache:main
Conversation
Thanks @0lai0. Could you also add a SQL file test under Also, could you add a test where the input string includes an explicit timezone offset (like "2020-01-01T12:34:56+05:00")?
Thanks @andygrove for the feedback. PR updated.
I noticed Spark strips timezone offsets for The WITH TZ path is untouched; it still doesn't handle offsets natively, so the third SQL query excludes the offset string to avoid a mismatch there.
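To illustrate the offset handling mentioned above, here is a minimal sketch, assuming the comment refers to the TimestampNTZ cast and that "stripping" means the offset is dropped while the wall-clock fields are kept. `parse_ntz_micros`, `is_offset`, and `days_from_civil` are hypothetical helpers written for this example, not functions from the PR, and the accepted format is deliberately narrow.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (the well-known "days from civil" algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let adj = if y >= 0 { y } else { y - 399 };
    let era = adj / 400;
    let yoe = y - era * 400;
    let shifted_month = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * shifted_month + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// True for a "+HH:MM" or "-HH:MM" suffix.
fn is_offset(rest: &str) -> bool {
    let b = rest.as_bytes();
    b.len() == 6
        && (b[0] == b'+' || b[0] == b'-')
        && b[3] == b':'
        && [1usize, 2, 4, 5].iter().all(|&i| b[i].is_ascii_digit())
}

/// Hypothetical NTZ parser: accepts "YYYY-MM-DD[T ]HH:MM:SS" with an optional
/// "Z" or "+HH:MM" / "-HH:MM" suffix. The suffix is validated but its value is
/// ignored, so the result is the wall-clock time expressed as naive epoch
/// microseconds. Day-of-month is not checked against the month length.
pub fn parse_ntz_micros(s: &str) -> Option<i64> {
    let s = s.trim();
    if !s.is_ascii() || s.len() < 19 {
        return None;
    }
    let (wall, rest) = s.split_at(19);
    if !(rest.is_empty() || rest == "Z" || is_offset(rest)) {
        return None;
    }
    let b = wall.as_bytes();
    let separators_ok = b[4] == b'-'
        && b[7] == b'-'
        && (b[10] == b'T' || b[10] == b' ')
        && b[13] == b':'
        && b[16] == b':';
    if !separators_ok {
        return None;
    }
    // Parse a digits-only field; rejects signs such as "+1".
    let num = |r: std::ops::Range<usize>| -> Option<i64> {
        let t = &wall[r];
        if t.bytes().all(|c| c.is_ascii_digit()) {
            t.parse::<i64>().ok()
        } else {
            None
        }
    };
    let (y, mo, d) = (num(0..4)?, num(5..7)?, num(8..10)?);
    let (h, mi, sec) = (num(11..13)?, num(14..16)?, num(17..19)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || sec > 59 {
        return None;
    }
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec;
    Some(secs * 1_000_000)
}
```

Under these assumptions, "2020-01-01T12:34:56+05:00" and "2020-01-01T12:34:56" parse to the same value, which is why a timezone-aware path that does not handle offsets would disagree on the offset string.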
Local build test succeeded.
Which issue does this PR close?
Closes #3179
Rationale for this change
Previously, the code used a catch-all pattern DataType::Timestamp(_, _) and unconditionally applied "UTC" to the resulting array. This fundamentally changed the semantics of TimestampNTZ and blocked proper downstream support for the NTZ type.
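The effect of the catch-all arm can be sketched with simplified stand-in types. `DataType`, `TimeUnit`, and `old_output_timezone` below are hypothetical models written for illustration, not the arrow-rs definitions or the actual pre-fix code.

```rust
// Simplified stand-ins for the Arrow types; not the real arrow-rs definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Microsecond,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    // Second field is the timezone: Some(..) for Timestamp, None for TimestampNTZ.
    Timestamp(TimeUnit, Option<String>),
}

/// Hypothetical model of the old behavior: the timezone metadata that ends up
/// on the output array for a given target type.
pub fn old_output_timezone(target: &DataType) -> Option<String> {
    match target {
        // Catch-all arm: Timestamp and TimestampNTZ both land here,
        // so both come out tagged with "UTC".
        DataType::Timestamp(_, _) => Some("UTC".to_string()),
        _ => None,
    }
}
```

In this model, a target of `DataType::Timestamp(TimeUnit::Microsecond, None)` still yields `Some("UTC")`, which is the mismatch described above.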
What changes are included in this PR?
This PR fixes the bug where casting a string to TimestampNTZ (Timestamp without time zone) incorrectly produces an Arrow array with UTC timezone metadata.
Key changes include:
Refactored cast_utf8_to_timestamp! macro: Updated the macro to accept an Option<&str> for the timezone metadata ($with_tz). The array builder now conditionally applies .with_timezone(tz_str) only if a timezone string is explicitly provided.
Separated Timestamp and TimestampNTZ handling: In cast_string_to_timestamp, we now explicitly pattern match DataType::Timestamp(_, Some(_)) and DataType::Timestamp(_, None).
Corrected NTZ semantics: For TimestampNTZ, the macro now uses &Utc as the baseline for timestamp_parser to calculate the correct naive epoch microseconds, while passing None::<&str> to ensure the resulting Arrow array has no timezone metadata attached.
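The shape of the changes listed above can be sketched as follows. This is a simplified model, not the PR's code: `TimestampMicrosArray`, the reduced `DataType`, and the `cast_strings_to_timestamp!` macro are hypothetical stand-ins, the parser is a placeholder that reads integer microseconds, and attaching "UTC" on the timezone-aware arm is an assumption.

```rust
// Simplified stand-in for an Arrow timestamp array; not the arrow-rs type.
#[derive(Debug, PartialEq)]
pub struct TimestampMicrosArray {
    pub values: Vec<Option<i64>>,
    pub timezone: Option<String>,
}

impl TimestampMicrosArray {
    pub fn with_timezone(mut self, tz: &str) -> Self {
        self.timezone = Some(tz.to_string());
        self
    }
}

// Reduced model of the target type; the time unit is omitted for brevity.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Timestamp(Option<String>),
}

// Hypothetical macro in the spirit of cast_utf8_to_timestamp!: timezone
// metadata is attached only when $with_tz is Some(..).
macro_rules! cast_strings_to_timestamp {
    ($inputs:expr, $parse:expr, $with_tz:expr) => {{
        let values: Vec<Option<i64>> = $inputs.iter().map(|s| $parse(s)).collect();
        let array = TimestampMicrosArray { values, timezone: None };
        let tz: Option<&str> = $with_tz;
        match tz {
            Some(tz_str) => array.with_timezone(tz_str),
            None => array,
        }
    }};
}

pub fn cast_string_to_timestamp(
    inputs: &[&str],
    target: &DataType,
) -> Option<TimestampMicrosArray> {
    // Placeholder parser (assumption): treats each string as integer microseconds.
    let parse = |s: &&str| s.parse::<i64>().ok();
    match target {
        // Timezone-aware Timestamp: keep attaching "UTC" (assumed here).
        DataType::Timestamp(Some(_)) => {
            Some(cast_strings_to_timestamp!(inputs, parse, Some("UTC")))
        }
        // TimestampNTZ: same values, but no timezone metadata on the result.
        DataType::Timestamp(None) => {
            Some(cast_strings_to_timestamp!(inputs, parse, None::<&str>))
        }
        _ => None,
    }
}
```

Passing the timezone as an `Option<&str>` keeps a single macro body for both paths; only the match arm decides whether metadata is attached.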
How are these changes tested?
Added tests for the String to TimestampNTZ cast.