Contributor

@wackywendell wackywendell commented Sep 15, 2025

Summary

Begins addressing #367 and #342 by adding support for parsing the types in YAML Simple Extension files into idiomatic Rust types, with validity enforced. This includes a type-string parser that handles built-in types, compound types, named structs, custom types, and validated parameter constraints in the Simple Extension YAML files.

Scope

  • Types-only: no functions or call validation yet.
  • Public API exposes parsed types (ExtensionFile, Registry, CustomType, ConcreteType) and enforces validation of those on creation / read.

Key Changes

  • Type system
    • New BuiltinType, CompoundType, ConcreteType, CustomType with Display/round‑trip support for alias and named‑struct structures.
    • Parameter constraints: data type, integer (with min/max), enum (validated/deduped), boolean, string.
    • Parsing to and from the YAML structures (TryFrom<TypeParamDefsItem>, Parse<RawType>)
  • File/Registry type: abstraction for handling YAML files
  • Context and proto glue
    • Separates out ProtoContext from Context, to distinguish what is needed only for Protobuf parsing (ProtoContext) from what general parsing needs (Context).
  • Type expression parser
    • Parses simple types, user‑defined types (u!Name), and type variables; visits extension references for linkage bookkeeping.
  • Build/CI
    • parse feature includes serde_yaml; include!(extensions.in) is gated behind extensions feature.
    • Aligns actions/checkout to v4, updates Cargo dependency set, and bumps the substrait submodule.

Compatibility Notes

  • New trait bound ProtoContext on proto parsing that previously required only Context.
  • extensions.in now compiled only with features=["extensions"].
  • Minimal, types-only round‑trip implemented; other sections remain empty when converting back to text.

Testing

  • New unit tests cover:
    • Type parsing and round‑trip for alias and named‑struct.
    • Parameter constraint handling including enum validation and integer bounds (with current truncation behavior).
    • Registry creation and type lookup; core registry smoke test behind features=["extensions"].

@wackywendell wackywendell changed the title Add Initial Extension Support feat: Add Initial Extension Support Oct 10, 2025
@wackywendell wackywendell changed the title feat: Add Initial Extension Support feat: add initial extension support Oct 10, 2025
@wackywendell wackywendell marked this pull request as ready for review October 20, 2025 15:32
Member

@benbellick benbellick left a comment

Left a few comments but still haven't gone through most of the PR. Just wanted to flush what I have so far, but I will review more later. Thanks!

prost = "0.14.1"
prost-types = "0.14.1"
# Required by generated text schemas: the typify-generated code emits
# ::regress::Regex for `pattern` validations.
Member

Just to clarify, it looks like regress was already present before. Is this comment just clarifying why it was there in the first place?

Contributor Author

Yes, just clarifying its use!

pub trait ProtoContext: Context {
/// Add a [SimpleExtensionUrn] to this context. Must return an error for duplicate
/// anchors or when the urn is not supported.
/// anchors or when the URI is not supported.
Member

Can we just consistently use URN here? We have support for both in all of the other libraries and we will be dropping URI eventually.

}
}

pub trait ProtoContext: Context {
Member

As mentioned, I don't have a ton of Rust experience, so it may be that this is totally standard.

But I wonder if there's a simpler way to handle parsing that isn't so trait-heavy. AFAICT, there is only one implementor of the ProtoContext trait, which is the test fixture (lines 81-107).

Could we drop the ProtoContext trait and use a concrete type instead, then update tests to use explicit instantiations of that type? If not, what is the benefit of using this trait here?
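
For illustration only, the concrete-type alternative asked about here might look roughly like the sketch below. `ProtoState`, `ContextError`, and the `u32` anchor type are hypothetical names invented for this example and are not the PR's API; the only behavior taken from the diff is that adding a URN under a duplicate anchor must return an error.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ContextError {
    /// The anchor was already registered.
    DuplicateAnchor(u32),
}

/// Hypothetical concrete struct standing in for a `ProtoContext` trait.
#[derive(Debug, Default)]
pub struct ProtoState {
    // Maps an extension anchor to its URN (assumed representation).
    urns: HashMap<u32, String>,
}

impl ProtoState {
    /// Register a URN under `anchor`, rejecting duplicate anchors.
    pub fn add_urn(&mut self, anchor: u32, urn: &str) -> Result<(), ContextError> {
        match self.urns.entry(anchor) {
            Entry::Occupied(_) => Err(ContextError::DuplicateAnchor(anchor)),
            Entry::Vacant(slot) => {
                slot.insert(urn.to_owned());
                Ok(())
            }
        }
    }

    /// Look up the URN registered under `anchor`, if any.
    pub fn urn(&self, anchor: u32) -> Option<&str> {
        self.urns.get(&anchor).map(|s| s.as_str())
    }
}
```

Tests would then construct a `ProtoState` directly instead of implementing a trait on a fixture. The trade-off is that callers can no longer substitute their own context implementation, which may or may not matter for this crate.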

Member

@benbellick benbellick left a comment

Left a few more comments, but excited to get this in! Sorry that this PR has been sitting so long 😅

impl Registry {
/// Create a new Global Registry from validated extension files
pub fn new(extensions: Vec<ExtensionFile>) -> Self {
Self { extensions }
Member

What happens if two extension files contain conflicting definitions? E.g. two different files have the same urn but introduce types with the same name but different parameters.

Member

I wonder if it makes sense to instead maintain a hashmap from URN to ExtensionFile? The spec says:

Each YAML file must include a required urn field that uniquely identifies the extension.

To me, this suggests that (at least for now) all URNs must uniquely identify a single extension file.
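
A minimal sketch of that idea, assuming a simplified `ExtensionFile` that carries only a URN and a list of type names (both the struct layout and the `try_new` / `RegistryError` names are hypothetical, not the PR's API). Construction becomes fallible so that a second file with an already-seen URN is rejected rather than silently stored:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for the PR's `ExtensionFile`.
#[derive(Debug, Clone, PartialEq)]
pub struct ExtensionFile {
    pub urn: String,
    pub type_names: Vec<String>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    /// Two files declared the same URN.
    DuplicateUrn(String),
}

#[derive(Debug, Default)]
pub struct Registry {
    // Keyed by URN, so each URN identifies at most one file.
    extensions: HashMap<String, ExtensionFile>,
}

impl Registry {
    /// Build a registry, failing on the first repeated URN.
    pub fn try_new(
        files: impl IntoIterator<Item = ExtensionFile>,
    ) -> Result<Self, RegistryError> {
        let mut extensions = HashMap::new();
        for file in files {
            match extensions.entry(file.urn.clone()) {
                Entry::Occupied(_) => return Err(RegistryError::DuplicateUrn(file.urn)),
                Entry::Vacant(slot) => {
                    slot.insert(file);
                }
            }
        }
        Ok(Self { extensions })
    }

    /// Look up an extension file by its URN.
    pub fn get(&self, urn: &str) -> Option<&ExtensionFile> {
        self.extensions.get(urn)
    }
}
```

With this shape, the conflicting-definitions case from the previous comment cannot arise inside one registry, because the second file is refused at construction time.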

Self { extensions }
}

// Private helper methods
Member

This comment suggests that the below methods are private but get_type below is public.

Suggested change
// Private helper methods

}

let back = ext.to_raw();
assert_eq!(back.urn, "extension:example.com:param_test");
Member

Can we simplify this test by just generating the PartialEq implementation for text::simple_extensions::SimpleExtensions and then directly checking round-trip equality here (rather than digging into individual fields)? We will probably want to introduce a round-trip test framework at some point to be consistent with the other implementations, so might as well start here 🙂
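
A rough sketch of what a whole-value round-trip check could look like once the raw type implements `PartialEq`. `RawExtensions` and `ParsedExtensions` are simplified, hypothetical stand-ins for `text::simple_extensions::SimpleExtensions` and the validated `ExtensionFile`; the real types have more fields.

```rust
/// Hypothetical stand-in for the generated raw YAML type.
#[derive(Debug, Clone, PartialEq)]
pub struct RawExtensions {
    pub urn: String,
    pub types: Vec<String>,
}

/// Hypothetical stand-in for the validated, parsed form.
#[derive(Debug)]
pub struct ParsedExtensions {
    urn: String,
    types: Vec<String>,
}

impl ParsedExtensions {
    /// Parse the raw form (validation is omitted in this sketch).
    pub fn from_raw(raw: &RawExtensions) -> Self {
        Self {
            urn: raw.urn.clone(),
            types: raw.types.clone(),
        }
    }

    /// Convert back to the raw form.
    pub fn to_raw(&self) -> RawExtensions {
        RawExtensions {
            urn: self.urn.clone(),
            types: self.types.clone(),
        }
    }
}

/// True when parsing and converting back reproduces the input exactly.
pub fn roundtrips(raw: &RawExtensions) -> bool {
    ParsedExtensions::from_raw(raw).to_raw() == *raw
}
```

Since the PR description says non-type sections remain empty when converting back to text, a whole-value comparison like this would presumably only hold for types-only inputs for now.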


impl<'a> TypeExpr<'a> {
/// Parse a type string into a [`TypeExpr`].
pub fn parse(type_str: &'a str) -> Result<Self, TypeParseError> {
Member

I don't have a particular suggestion here, but just wanted to call out that it would be awesome if we could use the ANTLR grammar here. There aren't great Rust bindings for ANTLR (there is this but it was last updated 3 years ago). That being said, maybe an evil solution in the future is to generate C++ bindings and then call those from Rust 🤷


#[test]
fn test_user_defined_and_parameters() {
let expr = "u!geo?<i32?, point<i32, i32>>";
Member

Is it possible to follow the structure above of just having a list of strings and expected types that we iterate over? I can imagine wanting to add a few more cases in the future.

Member

To be clear, I mean something like:

let test_cases = [
  (case1, expect1),
  (case2, expect2),
  etc.. 
];
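
Fleshed out a little, the table-driven shape could look like the sketch below. `parse_head`, `Parsed`, and `Kind` are hypothetical stand-ins written for this example: they only recognize a bare name with an optional `u!` prefix and `?` suffix, and do not handle parameter lists such as `<i32?, point<i32, i32>>`. The real test would call `TypeExpr::parse` and compare against expected `TypeExpr` values instead.

```rust
/// Hypothetical classification used only in this sketch.
#[derive(Debug, PartialEq)]
pub enum Kind {
    Builtin,
    UserDefined,
}

/// Hypothetical parse result: kind, bare name, and nullability.
#[derive(Debug, PartialEq)]
pub struct Parsed<'a> {
    pub kind: Kind,
    pub name: &'a str,
    pub nullable: bool,
}

/// Toy stand-in parser: handles `name`, `name?`, `u!name`, `u!name?`.
pub fn parse_head<'a>(s: &'a str) -> Option<Parsed<'a>> {
    let (kind, rest) = match s.strip_prefix("u!") {
        Some(rest) => (Kind::UserDefined, rest),
        None => (Kind::Builtin, s),
    };
    let (name, nullable) = match rest.strip_suffix('?') {
        Some(name) => (name, true),
        None => (rest, false),
    };
    let valid = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if valid {
        Some(Parsed { kind, name, nullable })
    } else {
        None
    }
}

/// One table of (input, expected) pairs, checked in a single loop.
pub fn run_cases() {
    let cases = [
        ("i32", Some(Parsed { kind: Kind::Builtin, name: "i32", nullable: false })),
        ("i32?", Some(Parsed { kind: Kind::Builtin, name: "i32", nullable: true })),
        ("u!geo", Some(Parsed { kind: Kind::UserDefined, name: "geo", nullable: false })),
        ("u!geo?", Some(Parsed { kind: Kind::UserDefined, name: "geo", nullable: true })),
        ("u!", None),
        ("bad name", None),
    ];
    for (input, expected) in cases.iter() {
        // Include the input in the failure message so a broken case is easy to find.
        assert_eq!(&parse_head(input), expected, "input: {}", input);
    }
}
```

Adding a case is then a one-line change to the table, and the failure message names the offending input.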
