Skip to content

Support extracting struct fields as Variant using ExtensionType#9598

Open
codephage2020 wants to merge 1 commit intoapache:mainfrom
codephage2020:issue-9519
Open

Support extracting struct fields as Variant using ExtensionType#9598
codephage2020 wants to merge 1 commit intoapache:mainfrom
codephage2020:issue-9519

Conversation

@codephage2020
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

variant_get cannot extract a struct where some fields remain as VariantArrays. Callers must call variant_get per leaf and assemble results manually.

What changes are included in this PR?

Support extracting struct fields as VariantArray via VariantType extension metadata in variant_get, and fix panics on absent fields and silent data loss on non-struct typed values.

Are these changes tested?

Yes — test_struct_extraction_with_variant_fields and test_struct_extraction_missing_variant_field_no_panic.

Are there any user-facing changes?

No breaking changes. Fields marked with VariantType extension metadata are extracted as VariantArrays in struct extraction.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Mar 21, 2026
@codephage2020
Copy link
Contributor Author

@scovich @sdf-jkl @klion26 Feel free to give me some advice when you have time. Thanks.

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. A bunch of questions.

Comment on lines -229 to +277
.collect::<Result<Vec<_>>>()?;
.collect();
let children = children?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit seems like a gratuitous change?

let children = fields...collect::<Result<Vec_>>>()?;

becomes

let children: Result<Vec<_>> = fields...collect();
let children = children?;

field
.as_ref()
.clone()
.with_data_type(child.data_type().clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why needed? Doesn't it already have that type?

Comment on lines +263 to +264
// Field was requested as Variant but data is all-null;
// preserve the data type from the child without extension metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional, to strip away e.g. JSON or other extensions that might be present?

Also, why is it correct to change the data type merely because the data are all-null?
Shouldn't it just come back as an all-null StructArray with variant annotations?

Comment on lines +234 to +239
let is_variant_field = field.try_extension_type::<VariantType>().is_ok();
let field_as_type: Option<&Field> = if is_variant_field {
None
} else {
Some(field.as_ref())
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a double negative... trying to decide if it could be clearer with inverted logic?

let is_strongly_typed = field.try_extension_type::<VariantType>().is_err();
let field_as_type = is_strongly_typed.then(|| field.as_ref());

Comment on lines +93 to +96
// Field exists but is not a StructArray (VariantArray),
// which means it's not shredded further.
if !cast_options.safe {
return Err(ArrowError::CastError(format!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but maybe we want to fix that everywhere in a separate dedicated PR)

Comment on lines +257 to +259
field
.as_ref()
.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is the same in all cases. Can we simplify by hoisting out a let mut new_field and then:

let updated_field = if is_strongly_typed {
    new_field
} else {
    // handle a variant field that may or may not exist in the data (see PR comment below)
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Support extracting struct fields as Variant using ExtensionType

2 participants