Support extracting struct fields as Variant using ExtensionType#9598
Support extracting struct fields as Variant using ExtensionType#9598codephage2020 wants to merge 1 commit intoapache:mainfrom
Conversation
scovich
left a comment
There was a problem hiding this comment.
Thanks for looking into this. A bunch of questions.
| .collect::<Result<Vec<_>>>()?; | ||
| .collect(); | ||
| let children = children?; |
There was a problem hiding this comment.
This bit seems like a gratuitous change?
let children = fields...collect::<Result<Vec_>>>()?;becomes
let children: Result<Vec<_>> = fields...collect();
let children = children?;| field | ||
| .as_ref() | ||
| .clone() | ||
| .with_data_type(child.data_type().clone()) |
There was a problem hiding this comment.
Why needed? Doesn't it already have that type?
| // Field was requested as Variant but data is all-null; | ||
| // preserve the data type from the child without extension metadata. |
There was a problem hiding this comment.
Is this intentional, to strip away e.g. JSON or other extensions that might be present?
Also, why is it correct to change the data type merely because the data are all-null?
Shouldn't it just come back as an all-null StructArray with variant annotations?
| let is_variant_field = field.try_extension_type::<VariantType>().is_ok(); | ||
| let field_as_type: Option<&Field> = if is_variant_field { | ||
| None | ||
| } else { | ||
| Some(field.as_ref()) | ||
| }; |
There was a problem hiding this comment.
This is a bit of a double negative... trying to decide if it could be clearer with inverted logic?
let is_strongly_typed = field.try_extension_type::<VariantType>().is_err();
let field_as_type = is_strongly_typed.then(|| field.as_ref());| // Field exists but is not a StructArray (VariantArray), | ||
| // which means it's not shredded further. | ||
| if !cast_options.safe { | ||
| return Err(ArrowError::CastError(format!( |
There was a problem hiding this comment.
- Probably clashes with variant_get should follow JSONpath semantics #9606
(but maybe we want to fix that everywhere in a separate dedicated PR)
| field | ||
| .as_ref() | ||
| .clone() |
There was a problem hiding this comment.
This part is the same in all cases. Can we simplify by hoisting out a let mut new_field and then:
let updated_field = if is_strongly_typed {
new_field
} else {
// handle a variant field that may or may not exist in the data (see PR comment below)
};
Which issue does this PR close?
Rationale for this change
variant_get cannot extract a struct where some fields remain as VariantArrays. Callers must call variant_get per leaf and assemble results manually.
What changes are included in this PR?
Support extracting struct fields as VariantArray via VariantType extension metadata in variant_get, and fix panics on absent fields and silent data loss on non-struct typed values.
Are these changes tested?
Yes — test_struct_extraction_with_variant_fields and test_struct_extraction_missing_variant_field_no_panic.
Are there any user-facing changes?
No breaking changes. Fields marked with VariantType extension metadata are extracted as VariantArrays in struct extraction.