## Summary
The `hour`, `minute`, and `second` expressions (and potentially other timestamp extraction functions) incorrectly apply timezone conversion when the input is a TimestampNTZ (timestamp without timezone) value.
## Root Cause
In `native/spark-expr/src/datetime_funcs/extract_date_part.rs`, the implementation uses `array_with_timezone`:

```rust
let array = array_with_timezone(
    array,
    self.timezone.clone(),
    Some(&DataType::Timestamp(
        Microsecond,
        Some(self.timezone.clone().into()),
    )),
)?;
```
For a TimestampNTZ input (`Timestamp(Microsecond, None)`), this calls `timestamp_ntz_to_timestamp` in `utils.rs`, which converts the local time value to UTC by applying the session timezone offset.
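The effect of that shift can be illustrated with plain integer arithmetic. This is a minimal sketch, not the actual implementation: the fixed 8-hour offset stands in for America/Los_Angeles standard time (DST ignored), and the `hour_of` helper is hypothetical.

```rust
// Sketch only: illustrates why shifting a TimestampNTZ value by the
// session timezone offset changes the extracted hour. All names and
// the fixed -8h offset are illustrative assumptions.
const MICROS_PER_HOUR: i64 = 3_600_000_000;
const MICROS_PER_DAY: i64 = 24 * MICROS_PER_HOUR;

/// Extract the hour-of-day component from microseconds since the epoch.
fn hour_of(micros: i64) -> i64 {
    micros.rem_euclid(MICROS_PER_DAY) / MICROS_PER_HOUR
}

fn main() {
    // A TimestampNTZ value encoding "2024-01-15 10:30:00" local time
    // (2024-01-15 is day 19737 since the epoch).
    let local_micros = 19_737 * MICROS_PER_DAY + 10 * MICROS_PER_HOUR + 30 * 60 * 1_000_000;

    // What the local-to-UTC conversion effectively does for a UTC-8
    // session timezone: add the 8-hour offset before extraction.
    let shifted = local_micros + 8 * MICROS_PER_HOUR;

    assert_eq!(hour_of(local_micros), 10); // expected: extract directly
    assert_eq!(hour_of(shifted), 18);      // observed: offset applied first
    println!("expected={} buggy={}", hour_of(local_micros), hour_of(shifted));
}
```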
## Expected Behavior
For TimestampNTZ inputs:
- Values are stored as local time without any timezone information
- `hour(timestamp_ntz_value)` should return the hour component directly from the local time value
- No timezone conversion should be applied

For Timestamp (with timezone) inputs:
- Values are stored in UTC
- `hour(timestamp_value)` should convert from UTC to the session timezone before extracting the hour
- This is the current behavior and is correct
## Example

```sql
-- With session timezone = America/Los_Angeles (UTC-8)
-- TimestampNTZ value representing "2024-01-15 10:30:00" local time
--
-- Expected: hour() = 10 (extract directly from local time)
-- Actual (bug): hour() = 18 (incorrectly converts assuming value is in UTC)
```
## Affected Functions
- `hour` (`SparkHour`)
- `minute` (`SparkMinute`)
- `second` (`SparkSecond`)

All three use the same `extract_date_part!` macro with `array_with_timezone`.
## Impact
This bug can cause incorrect results when using `hour`/`minute`/`second` on TimestampNTZ columns in non-UTC session timezones. Currently the Scala serde layer does not validate input types, so TimestampNTZ inputs are not blocked.
## Suggested Fix
Either:
1. Add input type validation in the Scala serde layer (like `CometUnixTimestamp`) to mark TimestampNTZ as unsupported and fall back to Spark
2. Fix the Rust implementation to handle TimestampNTZ correctly by not applying timezone conversion

Option 1 is safer as a short-term fix.
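Option 2 could take roughly this shape. This is a minimal sketch using a simplified stand-in for Arrow's `DataType`; the real code would match on the actual Arrow type before deciding whether to call `array_with_timezone`, and all names here are illustrative.

```rust
// Sketch only: simplified stand-in for Arrow's DataType, with the time
// unit elided. The real fix would match on arrow's DataType in
// extract_date_part.rs before applying array_with_timezone.
#[derive(Debug, PartialEq)]
enum DataType {
    /// Timestamp(optional timezone); None models TimestampNTZ.
    Timestamp(Option<String>),
}

/// Decide whether timezone conversion should happen before extraction.
fn needs_timezone_conversion(input: &DataType) -> bool {
    match input {
        // TimestampNTZ: values are already local time; extract directly.
        DataType::Timestamp(None) => false,
        // Timestamp with timezone: stored in UTC; convert to the
        // session timezone first (current, correct behavior).
        DataType::Timestamp(Some(_)) => true,
    }
}

fn main() {
    assert!(!needs_timezone_conversion(&DataType::Timestamp(None)));
    assert!(needs_timezone_conversion(&DataType::Timestamp(Some("UTC".into()))));
    println!("ok");
}
```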
## Related
Note: This issue was generated with AI assistance.