feat: Add metadata models package with dynamic schema download #807
base: main
Conversation
- Add `airbyte_cdk.metadata_models` package with auto-generated Pydantic models
- Models are generated from JSON schemas in the airbytehq/airbyte repository
- Schemas are downloaded on-demand during the build process (no submodules)
- Uses the `pydantic.v1` compatibility layer for consistency with declarative models
- Includes comprehensive README with usage examples
- Adds `py.typed` marker for type hint compliance

The metadata models enable validation of connector metadata.yaml files using the same schemas maintained in the main Airbyte repository.

Co-Authored-By: AJ Steers <aj@airbyte.io>
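The validation workflow this enables can be sketched without the generated models; a minimal stdlib-only stand-in, where the required field names (`name`, `dockerRepository`, `dockerImageTag`) are assumptions for illustration rather than the full schema:

```python
# Minimal sketch of the metadata validation the package enables.
# The real check uses generated Pydantic models; this stand-in only
# illustrates the shape of the workflow on a parsed metadata.yaml dict.
def validate_metadata(metadata: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    data = metadata.get("data")
    if not isinstance(data, dict):
        return ["missing required 'data' section"]
    errors = []
    for required in ("name", "dockerRepository", "dockerImageTag"):
        if required not in data:
            errors.append(f"data.{required} is required")
    return errors

valid = {
    "data": {
        "name": "Example Source",
        "dockerRepository": "airbyte/source-example",
        "dockerImageTag": "0.1.0",
    }
}
print(validate_metadata(valid))  # []
print(validate_metadata({"data": {"name": "Example Source"}}))
```

With the real package, the same call site would instead construct the generated model (e.g. `ConnectorMetadataDefinitionV0(**metadata_dict)`) and let Pydantic raise on invalid input.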
Pull Request Overview
This PR introduces a new airbyte_cdk.metadata_models package containing auto-generated Pydantic models for validating connector metadata.yaml files. The models are dynamically generated during the build by downloading JSON schemas from the airbytehq/airbyte repository.
Key changes:
- New metadata_models package with auto-generated Pydantic validation models
- Build script extension to download schemas from GitHub during build process
- Comprehensive README with usage examples and documentation
Reviewed Changes
Copilot reviewed 3 out of 35 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `airbyte_cdk/metadata_models/py.typed` | Marker file for PEP 561 type hint compliance |
| `airbyte_cdk/metadata_models/__init__.py` | Package initialization re-exporting generated models |
| `airbyte_cdk/metadata_models/README.md` | Documentation covering usage, available models, and regeneration process |
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

```shell
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1760999875-add-metadata-models#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1760999875-add-metadata-models
```

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:
📝 Walkthrough

Adds tooling and packaging to generate and expose Airbyte connector metadata Pydantic models: new generation script, consolidated JSON schema output, package re-exports for generated models, docs for regeneration, build-task updates, and lint/type-ignore rules for generated artifacts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Script as generate_connector_metadata_files.py
    participant GitHub as Remote Schema Repo
    participant Generator as datamodel-codegen (container)
    participant FS as Output Files
    User->>Script: run main()
    Script->>GitHub: clone_schemas_from_github (sparse clone)
    GitHub-->>Script: YAML schema files
    Script->>Script: consolidate_yaml_schemas_to_json
    Script->>Generator: generate_models_from_json_schema (datamodel-codegen)
    Generator-->>Script: generated Python content
    Script->>Script: post-process imports (pydantic.v1) & merge
    Script->>FS: write generated/models.py and metadata_schema.json
    FS-->>User: artifacts ready
```
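The post-process step in the diagram (rewriting imports to the `pydantic.v1` compatibility layer) boils down to a text substitution over each generated file; a minimal sketch of that transform applied to a string (the helper name is illustrative):

```python
def rewrite_pydantic_imports(source: str) -> str:
    """Rewrite generated `from pydantic ...` imports to the v1 compatibility layer.

    Assumes the transform runs once on freshly generated code; re-running it
    on already-rewritten code would produce `pydantic.v1.v1`.
    """
    return source.replace("from pydantic", "from pydantic.v1")

generated = "from pydantic import BaseModel, Field\n"
print(rewrite_pydantic_imports(generated))
# from pydantic.v1 import BaseModel, Field
```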
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–50 minutes

Would you like me to call out any of these files in more detail for review? wdyt?

Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (4)
airbyte_cdk/metadata_models/__init__.py (1)
`1-1`: Consider adding an explicit `__all__` for API clarity, wdyt?

While wildcard imports are acceptable for package `__init__.py` files, defining an explicit `__all__` list would make the public API surface more obvious and give you finer control over what gets exported. This is especially useful for generated code where you might want to filter out internal types or helpers. Example approach:

```diff
+__all__ = [
+    "ConnectorMetadataDefinitionV0",
+    # Add other public models as needed
+]
+
 from .generated import *
```

Alternatively, you could generate the `__all__` list in `generated.py` itself during the build process.

airbyte_cdk/metadata_models/README.md (1)
`37-47`: Second example assumes undefined `metadata_dict`, wdyt?

The second usage example references `metadata_dict` on line 40, but this variable isn't defined in that code block. While readers might understand it comes from the first example, making this example standalone (or explicitly noting it continues from the first) would improve clarity. Consider either:

- Making it self-contained:

```python
from pathlib import Path

import yaml

from airbyte_cdk.metadata_models import ConnectorMetadataDefinitionV0

metadata_dict = yaml.safe_load(Path("path/to/metadata.yaml").read_text())
metadata = ConnectorMetadataDefinitionV0(**metadata_dict)

# Access fields with full type safety
print(f"Connector: {metadata.data.name}")
# ...
```

- Or adding a comment linking to the first example:

```python
# Continuing from the validation example above...
from airbyte_cdk.metadata_models import ConnectorMetadataDefinitionV0

metadata = ConnectorMetadataDefinitionV0(**metadata_dict)
# ...
```

bin/generate_component_manifest_files.py (2)
`26-27`: Consider consolidating the YAML file discovery functions, wdyt?

Both `get_all_yaml_files_without_ext()` and `get_all_yaml_files_from_dir()` perform the same operation, with the new one being more parameterized. The old function is still used on line 237 for declarative models. You could simplify by using the new parameterized version everywhere:

```diff
-def get_all_yaml_files_without_ext() -> list[str]:
-    return [Path(f).stem for f in glob(f"{LOCAL_YAML_DIR_PATH}/*.yaml")]
-
-
 def get_all_yaml_files_from_dir(directory: str) -> list[str]:
     return [Path(f).stem for f in glob(f"{directory}/*.yaml")]
```

Then update line 237:

```diff
-    declarative_yaml_files = get_all_yaml_files_without_ext()
+    declarative_yaml_files = get_all_yaml_files_from_dir(LOCAL_YAML_DIR_PATH)
```

Also applies to: 30-32
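The parameterized discovery helper in the suggestion above is easy to exercise standalone; a self-contained sketch (the temporary directory is only for demonstration):

```python
from glob import glob
from pathlib import Path
import tempfile

def get_all_yaml_files_from_dir(directory: str) -> list:
    """Return the stem (filename without extension) of each .yaml file in a directory."""
    return [Path(f).stem for f in glob(f"{directory}/*.yaml")]

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "ConnectorMetadataDefinitionV0.yaml").touch()
    (Path(d) / "notes.txt").touch()  # non-YAML files are ignored
    print(get_all_yaml_files_from_dir(d))
# ['ConnectorMetadataDefinitionV0']
```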
`160-179`: Consider extracting common post-processing logic, wdyt?

There's code duplication between `post_process_metadata_models` and `post_process_codegen` (lines 136-157). Both functions:

- Create a `/generated_post_processed` directory
- Iterate through `.py` files
- Replace pydantic imports
- Write to new files

You could extract the common pattern:

```python
from typing import Callable


async def apply_post_processing(
    codegen_container: dagger.Container,
    transform_fn: Callable[[str], str],
) -> dagger.Container:
    """Apply a transformation function to all generated Python files."""
    codegen_container = codegen_container.with_exec(
        ["mkdir", "/generated_post_processed"], use_entrypoint=True
    )
    for generated_file in await codegen_container.directory("/generated").entries():
        if generated_file.endswith(".py"):
            original_content = await codegen_container.file(
                f"/generated/{generated_file}"
            ).contents()
            post_processed_content = transform_fn(original_content)
            codegen_container = codegen_container.with_new_file(
                f"/generated_post_processed/{generated_file}",
                contents=post_processed_content,
            )
    return codegen_container


async def post_process_metadata_models(codegen_container: dagger.Container):
    """Post-process metadata models to use pydantic.v1 compatibility layer."""

    def transform(content: str) -> str:
        return content.replace("from pydantic", "from pydantic.v1")

    return await apply_post_processing(codegen_container, transform)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)

All 31 files under `airbyte_cdk/metadata_models/generated/` are excluded by `!**/generated/**`: `ActorDefinitionResourceRequirements.py`, `AirbyteInternal.py`, `AllowedHosts.py`, `ConnectorBreakingChanges.py`, `ConnectorBuildOptions.py`, `ConnectorIPCOptions.py`, `ConnectorMetadataDefinitionV0.py`, `ConnectorMetrics.py`, `ConnectorPackageInfo.py`, `ConnectorRegistryDestinationDefinition.py`, `ConnectorRegistryReleases.py`, `ConnectorRegistrySourceDefinition.py`, `ConnectorRegistryV0.py`, `ConnectorReleases.py`, `ConnectorTestSuiteOptions.py`, `GeneratedFields.py`, `GitInfo.py`, `JobType.py`, `NormalizationDestinationDefinitionConfig.py`, `RegistryOverrides.py`, `ReleaseStage.py`, `RemoteRegistries.py`, `ResourceRequirements.py`, `RolloutConfiguration.py`, `Secret.py`, `SecretStore.py`, `SourceFileInfo.py`, `SuggestedStreams.py`, `SupportLevel.py`, `TestConnections.py`, and `__init__.py`.
📒 Files selected for processing (4)
- `airbyte_cdk/metadata_models/README.md` (1 hunk)
- `airbyte_cdk/metadata_models/__init__.py` (1 hunk)
- `airbyte_cdk/metadata_models/py.typed` (1 hunk)
- `bin/generate_component_manifest_files.py` (3 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/metadata_models/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
airbyte_cdk/metadata_models/py.typed (1)
`1-1`: Looks good! Quick packaging question though.

The py.typed marker is correctly placed and will signal to type checkers that the package is fully typed. Just to confirm—have you verified that `py.typed` will be included in the package distribution? This is sometimes missed if the build configuration (pyproject.toml, setup.py, or MANIFEST.in) doesn't explicitly include package marker files. Wdyt?

bin/generate_component_manifest_files.py (2)
`5-5`: LGTM on the new imports!

The addition of `tempfile` for temporary schema storage and `httpx` for async HTTP requests are appropriate choices for this functionality.

Also applies to: 11-11
`224-231`: Clarify mutual exclusivity of `post_process` and `metadata_models` flags, wdyt?

The conditional logic here suggests `post_process` and `metadata_models` should be mutually exclusive, but the function signature allows both to be `True`. If both are `True`, the `elif` on line 227 means `metadata_models` takes precedence and `post_process` is silently ignored. Consider either:

- Adding validation to ensure only one is True:

```diff
 ) -> None:
     """Generate Pydantic models from YAML schemas using datamodel-codegen."""
+    if post_process and metadata_models:
+        raise ValueError("post_process and metadata_models cannot both be True")
+
     init_module_content = generate_init_module_content(yaml_files)
```

- Or simplifying to a single enum parameter:

```python
async def generate_models_from_schemas(
    dagger_client: dagger.Client,
    yaml_dir_path: str,
    output_dir_path: str,
    yaml_files: list[str],
    processing_mode: Literal["none", "declarative", "metadata"] = "none",
) -> None:
```

This would make the intent clearer and prevent misuse.
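The single-mode alternative can be sketched framework-free; in this stand-in, `resolve_processing_mode` is a hypothetical helper showing how the two booleans would collapse into one explicit mode:

```python
from typing import Literal

ProcessingMode = Literal["none", "declarative", "metadata"]

def resolve_processing_mode(post_process: bool, metadata_models: bool) -> str:
    """Collapse the two flags into one explicit mode, rejecting ambiguous input."""
    if post_process and metadata_models:
        raise ValueError("post_process and metadata_models cannot both be True")
    if metadata_models:
        return "metadata"
    if post_process:
        return "declarative"
    return "none"

print(resolve_processing_mode(post_process=True, metadata_models=False))
# declarative
```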
> ```
> airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/
> ```
Add language specifier to fenced code block, wdyt?
The fenced code block showing the repository path should have a language specifier for proper syntax highlighting and markdown lint compliance.
Apply this diff:
````diff
-```
+```text
 airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/
````
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In airbyte_cdk/metadata_models/README.md around lines 8 to 10, the fenced code
block lacks a language specifier; update the opening fence from "```" to "```text"
so the block becomes a plain-text fenced code block (i.e., replace the existing
fence with "```text" and keep the content and closing fence unchanged).
</details>
<!-- This is an auto-generated comment by CodeRabbit -->
Devin, let's move the generated models module into this module instead: airbyte_cdk.test.models.connector_metadata
…etadata

- Moved from airbyte_cdk.metadata_models.generated to airbyte_cdk.test.models.connector_metadata
- Updated build script to output to new location
- Updated README with new import paths

Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 15
♻️ Duplicate comments (13)
airbyte_cdk/test/models/connector_metadata/README.md (1)
`8-10`: Address the markdown lint issue.

A previous review flagged this fenced code block for missing a language specifier. While this is a minor formatting issue, it's good practice to fix for markdown compliance.
Apply this diff:
-``` +```text airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src/</blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ReleaseStage.py (1)</summary><blockquote> `9-13`: **Same duplication issue as SupportLevel, wdyt?** The `ReleaseStage` enum is also defined identically in at least 5 other generated files (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py), causing the same type identity and import conflicts described in the SupportLevel comment. This is part of the same systemic code generation issue where shared types are being duplicated rather than imported. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/SourceFileInfo.py (1)</summary><blockquote> `11-16`: **Same duplication issue affects SourceFileInfo, wdyt?** The `SourceFileInfo` model is defined identically in at least 6 other files (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py, GeneratedFields.py). This is part of the same systemic code generation issue. Since this is a Pydantic model rather than just an enum, the type conflicts are even more severe—instances won't validate against models from different modules. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py (1)</summary><blockquote> `11-18`: **ResourceRequirements also duplicated across 5+ files, wdyt?** The `ResourceRequirements` model is defined identically in at least 5 other files (ActorDefinitionResourceRequirements.py, ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, RegistryOverrides.py), part of the same code generation issue. 
</blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/TestConnections.py (1)</summary><blockquote> `9-14`: **TestConnections duplicated in 2 files, wdyt?** The `TestConnections` model is defined identically in ConnectorMetadataDefinitionV0.py (lines 56-61) and ConnectorTestSuiteOptions.py (lines 31-36), part of the same code generation issue. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/RemoteRegistries.py (1)</summary><blockquote> `11-23`: **Both PyPi and RemoteRegistries duplicated, wdyt?** Both `PyPi` (lines 11-16) and `RemoteRegistries` (lines 19-23) are defined identically in ConnectorMetadataDefinitionV0.py (PyPi at lines 203-208, RemoteRegistries at lines 322-326), part of the same code generation issue. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)</summary><blockquote> `13-32`: **GitInfo duplication already flagged.** This `GitInfo` definition is identical to the one in `GitInfo.py`. See the earlier comment on that file regarding the broader duplication pattern. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/GitInfo.py (1)</summary><blockquote> `12-31`: **Duplicate model definition - part of broader pattern.** The `GitInfo` model is duplicated across at least 6 files in this package (ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py, and GeneratedFields.py). This is the most widely duplicated model in the codebase. Consider this as part of the broader duplication issue flagged in Secret.py. The generation script should be configured to create these shared models once and use imports. Wdyt? 
</blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py (1)</summary><blockquote> `11-29`: **Model structure looks good; duplication noted.** The `RolloutConfiguration` model has well-chosen defaults and appropriate validation constraints (percentage ranges, minimum delay). However, it's duplicated in ConnectorMetadataDefinitionV0.py, ConnectorRegistryReleases.py, and ConnectorRegistrySourceDefinition.py. This is part of the broader duplication pattern. Wdyt about consolidating these shared models? </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py (1)</summary><blockquote> `29-39`: **Model structure is appropriate; duplication noted.** The `AirbyteInternal` model correctly uses `Extra.allow` (unlike most other models in this package that use `Extra.forbid`), which is appropriate for internal/extensible metadata. The defaults are sensible (isEnterprise=False, requireVersionIncrementsInPullRequests=True). However, this model along with the `Sl` and `Ql` enums is duplicated in ConnectorMetadataDefinitionV0.py, ConnectorRegistryDestinationDefinition.py, and ConnectorRegistrySourceDefinition.py—part of the broader duplication pattern. </blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1)</summary><blockquote> `13-70`: **Breaking change models are well-structured; duplication noted.** The breaking change models use appropriate Pydantic v1 patterns: - `const=True` for type discrimination in `StreamBreakingChangeScope` - `__root__` wrappers for dict-based schemas - `date` and `AnyUrl` types for proper validation However, all five models are duplicated in ConnectorMetadataDefinitionV0.py—part of the broader duplication pattern discussed in earlier files. 
</blockquote></details> <details> <summary>airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1)</summary><blockquote> `19-49`: **Duplicate definitions: Should import from Secret.py instead.** `SecretStore` (lines 19-29) and `Secret` (lines 40-49) are identical to the models defined in `Secret.py` within the same PR. Since both files are in the same package (`airbyte_cdk/test/models/connector_metadata/`), `ConnectorTestSuiteOptions.py` should import these models rather than redefining them: ```python from airbyte_cdk.test.models.connector_metadata.Secret import Secret, SecretStoreThis would eliminate the duplication and ensure consistency. Wdyt?
airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (1)
`22-48`: Resource requirements models are well-structured; duplication noted.

The models provide good flexibility with global defaults and per-job overrides. The `JobType` enum comprehensively covers connector operations. However, all four models are duplicated in ConnectorMetadataDefinitionV0.py—part of the broader duplication pattern.
🧹 Nitpick comments (22)
airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (1)
`12-15`: Consider stronger typing for metrics fields, wdyt?

The `ConnectorMetrics` class uses `Optional[Any]` for all three fields (all, cloud, oss), which sacrifices type safety. This might be intentional for flexible metric schemas, but if the metrics structure is known, consider using more specific types (e.g., `Optional[ConnectorMetric]` or a typed dictionary). That said, the flexibility might be needed if metrics schemas vary. What's the rationale here?
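If a concrete structure were settled on, the `Optional[Any]` fields could be narrowed; a framework-free sketch with dataclasses, where the metric field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConnectorMetric:
    # Hypothetical shape for illustration; the real schema may differ.
    usage: Optional[str] = None
    sync_success_rate: Optional[str] = None

@dataclass
class TypedConnectorMetrics:
    all: Optional[ConnectorMetric] = None
    cloud: Optional[ConnectorMetric] = None
    oss: Optional[ConnectorMetric] = None

metrics = TypedConnectorMetrics(cloud=ConnectorMetric(usage="high"))
print(metrics.cloud.usage)  # high
```

With typed fields, a type checker can flag typos such as `metrics.cloud.usge`, which `Optional[Any]` would let through.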
bin/generate_component_manifest_files.py (1)
`160-179`: Consider extracting the shared post-processing logic, wdyt?

The `post_process_metadata_models` function has nearly identical logic to `post_process_codegen` (lines 136-157), both iterating through generated files and applying string replacements. The main difference is that `post_process_metadata_models` only replaces pydantic imports while `post_process_codegen` does additional replacements. Consider extracting the common pattern into a shared helper:

```diff
+async def post_process_generated_files(
+    codegen_container: dagger.Container,
+    replacements: list[tuple[str, str]],
+    apply_deprecation_fix: bool = False,
+) -> dagger.Container:
+    """Apply string replacements to generated Python files."""
+    codegen_container = codegen_container.with_exec(
+        ["mkdir", "/generated_post_processed"], use_entrypoint=True
+    )
+    for generated_file in await codegen_container.directory("/generated").entries():
+        if generated_file.endswith(".py"):
+            original_content = await codegen_container.file(
+                f"/generated/{generated_file}"
+            ).contents()
+
+            post_processed_content = original_content
+            for old, new in replacements:
+                post_processed_content = post_processed_content.replace(old, new)
+
+            if apply_deprecation_fix:
+                post_processed_content = replace_base_model_for_classes_with_deprecated_fields(
+                    post_processed_content
+                )
+
+            codegen_container = codegen_container.with_new_file(
+                f"/generated_post_processed/{generated_file}",
+                contents=post_processed_content,
+            )
+    return codegen_container
+
+
 async def post_process_codegen(codegen_container: dagger.Container):
+    return await post_process_generated_files(
+        codegen_container,
+        replacements=[(" _parameters:", " parameters:"), ("from pydantic", "from pydantic.v1")],
+        apply_deprecation_fix=True,
+    )
+
+
 async def post_process_metadata_models(codegen_container: dagger.Container):
     """Post-process metadata models to use pydantic.v1 compatibility layer."""
+    return await post_process_generated_files(
+        codegen_container,
+        replacements=[("from pydantic", "from pydantic.v1")],
+        apply_deprecation_fix=False,
+    )
```
12-19: Consider adding validation for resource requirement formats.The
ResourceRequirementsfields areOptional[str]with no format validation. While this flexibility allows Kubernetes-style values like "100m" or "1Gi", it also accepts invalid values like "abc" or "100xyz" that would fail at runtime.Since these are auto-generated models, would it make sense to add format validation in the upstream YAML schema (e.g., regex patterns or custom validators)? Wdyt?
airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1)
49-58: Consider consolidating identical enums.
UsageandSyncSuccessRatehave identical members (low, medium, high). If these represent the same conceptual scale, they could be consolidated into a single enum likeMetricLevelto reduce duplication. However, keeping them separate provides better semantic clarity if they represent distinct dimensions.Wdyt—worth consolidating or better to keep separate for clarity?
airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (1)
`12-18`: Should Extra be forbid on simple leaf models to catch typos?

AllowedHosts/SuggestedStreams/NormalizationDestinationDefinitionConfig use Extra.allow, which can mask misspelled fields on these small shapes. Would switching to Extra.forbid for these specific leaves help catch config mistakes, or is “additionalProperties” intentionally allowed in the upstream schema, wdyt?
Also applies to: 22-37
airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py (2)
`43-44`: Prefer Literal for const fields for clearer runtime checks

Using Any with const works but is looser than necessary. Shall we make this a Literal for better type clarity, wdyt?
```diff
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Literal
 @@
 class StreamBreakingChangeScope(BaseModel):
-    scopeType: Any = Field("stream", const=True)
+    scopeType: Literal["stream"] = "stream"
```
`17-32`: Add cross-field check: initialPercentage ≤ maxPercentage

A simple validator will prevent inconsistent rollout configs. Should we add it, wdyt?
```diff
 class RolloutConfiguration(BaseModel):
 @@
     advanceDelayMinutes: Optional[conint(ge=10)] = Field(
         10,
         description="The number of minutes to wait before advancing the rollout percentage.",
     )
+
+    from pydantic.v1 import root_validator
+
+    @root_validator
+    def _validate_rollout_bounds(cls, values):
+        init = values.get("initialPercentage")
+        maxp = values.get("maxPercentage")
+        if init is not None and maxp is not None and init > maxp:
+            raise ValueError("initialPercentage must be ≤ maxPercentage")
+        return values
```

airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1)
`298-301`: Tighten URL/time/number types and add a small integrity check

Would you be open to these stricter types and a light data check, wdyt?
- Use AnyUrl where fields are URLs:
```diff
-    documentationUrl: str
+    documentationUrl: AnyUrl
 @@
-    sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file")
+    sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
```
- Use datetime for timestamps in SourceFileInfo:
```diff
-    metadata_last_modified: Optional[str] = None
-    registry_entry_generated_at: Optional[str] = None
+    metadata_last_modified: Optional[datetime] = None
+    registry_entry_generated_at: Optional[datetime] = None
```
- Enforce positive values for maxSecondsBetweenMessages:
```diff
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Union
 @@
-    maxSecondsBetweenMessages: Optional[int] = Field(
+    maxSecondsBetweenMessages: Optional[conint(ge=1)] = Field(
         None,
         description="Number of seconds allowed between 2 airbyte protocol messages. The source will timeout if this delay is reach",
     )
```
- Ensure unique jobType in jobSpecific (same pattern as suggested elsewhere):
```diff
 class ActorDefinitionResourceRequirements(BaseModel):
 @@
     jobSpecific: Optional[List[JobTypeResourceLimit]] = None
+
+    from pydantic.v1 import validator
+
+    @validator("jobSpecific")
+    def _unique_job_types(cls, v):
+        if not v:
+            return v
+        seen = set()
+        for item in v:
+            if item.jobType in seen:
+                raise ValueError(f"duplicate jobType '{item.jobType.value}' in jobSpecific")
+            seen.add(item.jobType)
+        return v
```

Also applies to: 331-334, 240-241, 165-170, 186-188, 326-329, 247-252
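The uniqueness rule in the suggested validator works the same outside Pydantic; a stdlib sketch over plain job-type strings (validator wiring omitted):

```python
def check_unique_job_types(job_types: list) -> list:
    """Raise ValueError on the first duplicated job type, mirroring the proposed validator."""
    seen = set()
    for job_type in job_types:
        if job_type in seen:
            raise ValueError(f"duplicate jobType '{job_type}' in jobSpecific")
        seen.add(job_type)
    return job_types

print(check_unique_job_types(["sync", "check_connection"]))
# ['sync', 'check_connection']
```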
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1)
`303-304`: Optional tightenings: URLs, datetimes, and rollout bounds

Would you consider these small improvements, wdyt?
- Use AnyUrl for documentationUrl and sbomUrl:
```diff
-    documentationUrl: str
+    documentationUrl: AnyUrl
 @@
-    sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file")
+    sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
```
- Use datetime for SourceFileInfo timestamps:
```diff
-    metadata_last_modified: Optional[str] = None
-    registry_entry_generated_at: Optional[str] = None
+    metadata_last_modified: Optional[datetime] = None
+    registry_entry_generated_at: Optional[datetime] = None
```
- Add rollout bound check:
```diff
 class RolloutConfiguration(BaseModel):
 @@
     advanceDelayMinutes: Optional[conint(ge=10)] = Field(
         10,
         description="The number of minutes to wait before advancing the rollout percentage.",
     )
+    from pydantic.v1 import root_validator
+    @root_validator
+    def _validate_rollout_bounds(cls, values):
+        init, maxp = values.get("initialPercentage"), values.get("maxPercentage")
+        if init is not None and maxp is not None and init > maxp:
+            raise ValueError("initialPercentage must be ≤ maxPercentage")
+        return values
```

Also applies to: 395-399, 240-241, 183-189, 79-94
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1)
236-241: Minor hardening: URLs, datetimes, Literal scope type, rollout bound, uniqueness

Would you consider these small improvements, wdyt?
- Use AnyUrl for sbomUrl:
```diff
-    sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file")
+    sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
```
- Use datetime for SourceFileInfo timestamps:
```diff
-    metadata_last_modified: Optional[str] = None
-    registry_entry_generated_at: Optional[str] = None
+    metadata_last_modified: Optional[datetime] = None
+    registry_entry_generated_at: Optional[datetime] = None
```
- Prefer Literal for scopeType:
```diff
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Union, Literal
@@
 class StreamBreakingChangeScope(BaseModel):
-    scopeType: Any = Field("stream", const=True)
+    scopeType: Literal["stream"] = "stream"
```
- Add rollout bounds validator:
```diff
 class RolloutConfiguration(BaseModel):
 @@
     advanceDelayMinutes: Optional[conint(ge=10)] = Field(
         10,
         description="The number of minutes to wait before advancing the rollout percentage.",
     )
+    from pydantic.v1 import root_validator
+
+    @root_validator
+    def _validate_rollout_bounds(cls, values):
+        init, maxp = values.get("initialPercentage"), values.get("maxPercentage")
+        if init is not None and maxp is not None and init > maxp:
+            raise ValueError("initialPercentage must be ≤ maxPercentage")
+        return values
```
- Ensure unique jobType in jobSpecific:
```diff
 class ActorDefinitionResourceRequirements(BaseModel):
 @@
     jobSpecific: Optional[List[JobTypeResourceLimit]] = None
+    from pydantic.v1 import validator
+
+    @validator("jobSpecific")
+    def _unique_job_types(cls, v):
+        if not v:
+            return v
+        seen = set()
+        for item in v:
+            if item.jobType in seen:
+                raise ValueError(f"duplicate jobType '{item.jobType.value}' in jobSpecific")
+            seen.add(item.jobType)
+        return v
```

Also applies to: 165-170, 146-161, 221-225, 35-37, 18-32
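Outside pydantic, the invariant these validators enforce is just a first-duplicate scan. A minimal stdlib sketch (plain dicts stand in for the generated models, and the function name is ours, not the CDK's):

```python
# Illustrative only: mirrors the duplicate-jobType check proposed above,
# with plain dicts in place of the pydantic JobTypeResourceLimit models.
def check_unique_job_types(job_specific):
    """Return the list unchanged, or raise if a jobType repeats."""
    seen = set()
    for item in job_specific or []:
        job_type = item["jobType"]
        if job_type in seen:
            raise ValueError(f"duplicate jobType '{job_type}' in jobSpecific")
        seen.add(job_type)
    return job_specific
```

The pydantic.v1 version differs only in reading `item.jobType` from model instances instead of dict keys.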
airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (6)
1-3: Reproducibility: embed generator + schema provenance?

Since builds fetch schemas at build time, could we stamp the generator version, source schema URL, and commit SHA in this header to make artifacts reproducible and debuggable, wdyt?
101-111: Use Literal for const field to tighten validation and typing

scopeType is effectively a constant. Shall we use Literal["stream"] instead of Any+const for stronger type-checking and clearer errors, wdyt?
Apply this diff:
```diff
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Union, Literal
@@
-class StreamBreakingChangeScope(BaseModel):
+class StreamBreakingChangeScope(BaseModel):
@@
-    scopeType: Any = Field("stream", const=True)
+    scopeType: Literal["stream"] = Field("stream", const=True)
```
236-241: URL typing: prefer AnyUrl for sbomUrl

sbomUrl looks like a URL. Would you switch to AnyUrl for stricter validation, wdyt?

```diff
-    sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file")
+    sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
```
323-327: Semver: add a regex constraint on protocolVersion

If protocolVersion is semver, shall we add a constraint to avoid accidental non-semver strings, wdyt?

```diff
-    protocolVersion: Optional[str] = Field(
+    protocolVersion: Optional[constr(regex=r'^\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?$')] = Field(
         None, description="the Airbyte Protocol version supported by the connector"
     )
```
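The behavioral difference between the strict generated pattern and the pre-release-tolerant constraint suggested here can be checked with plain `re` (no pydantic required):

```python
import re

# Strict MAJOR.MINOR.PATCH, as the generated models use today.
STRICT = re.compile(r"^\d+\.\d+\.\d+$")
# Suggested variant: also tolerates a pre-release suffix like "-rc.1".
WITH_PRERELEASE = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?$")

assert STRICT.match("1.2.3")
assert not STRICT.match("1.2.3-rc.1")          # strict rejects pre-releases
assert WITH_PRERELEASE.match("1.2.3-rc.1")     # tolerant accepts them
assert not WITH_PRERELEASE.match("1.2")        # both still require three parts
```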
294-301: documentationUrl/iconUrl: widen refactor scope—apply across all registry models

The suggestion to use AnyUrl for validation is sound. However, the verification found this pattern repeated in 8 locations across 4 files:
- ConnectorRegistryV0.py (lines 303, 395)
- ConnectorRegistrySourceDefinition.py (lines 298, 389)
- ConnectorRegistryReleases.py (lines 335, 389)
- ConnectorRegistryDestinationDefinition.py (lines 298, 390)
Rather than fixing just the target file, consider coordinating this into a single refactor pass across all registry models to maintain consistency. With CDK 2.0, Pydantic V2 is available, so AnyUrl is a valid choice. Would updating all four files at once work for your workflow, or should we proceed incrementally?
436-439: Consider adding `locals()` and including `SourceDefinition` for robustness and consistency

Your suggestion is spot on. The file defines `ConnectorRegistrySourceDefinition` at line 382, yet it's not receiving an `update_forward_refs()` call while the other three registry classes are. Passing `locals()` also aligns with Pydantic best practices for ensuring forward references resolve against all types in the current module scope.

The refactored version would look like:

```python
_ns = locals()
ConnectorRegistryDestinationDefinition.update_forward_refs(**_ns)
ConnectorRegistryReleases.update_forward_refs(**_ns)
ConnectorReleaseCandidates.update_forward_refs(**_ns)
VersionReleaseCandidate.update_forward_refs(**_ns)
ConnectorRegistrySourceDefinition.update_forward_refs(**_ns)
```

This makes the handling symmetric and more resilient. Worth adopting?
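The reason passing a namespace helps is not pydantic-specific; stdlib forward-reference resolution behaves the same way. A small sketch with `typing.get_type_hints` (the class names here are invented for illustration):

```python
from typing import get_type_hints

class Parent:
    # "Child" is a forward reference: it only resolves once Child is defined
    # and a namespace containing it is handed to the resolver.
    child: "Child"

class Child:
    value: int

# Analogous to update_forward_refs(**locals()): supply the current namespace
# so every locally defined type is visible during resolution.
hints = get_type_hints(Parent, localns=locals())
```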
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (6)
1-3: Reproducibility: capture generator + schema commit in header

To help future debugging (and rate-limit workarounds), could we include the datamodel-codegen version, source schema URL(s), and pinned commit SHA here, wdyt?
166-172: Use Literal for const field

Same as the other module: would you switch scopeType to Literal["stream"] for stricter typing, wdyt?

```diff
-    scopeType: Any = Field("stream", const=True)
+    scopeType: Literal["stream"] = Field("stream", const=True)
```
330-335: URL typing: sbomUrl could be AnyUrl

Align with other URL fields to validate early. Shall we switch, wdyt?

```diff
-    sbomUrl: Optional[str] = Field(None, description="URL to the SBOM file")
+    sbomUrl: Optional[AnyUrl] = Field(None, description="URL to the SBOM file")
```
397-404: connectorSubtype override: align type with enum while keeping flexibility

For RegistryOverrides.connectorSubtype, would you type it as Optional[Union[str, ConnectorSubtype]] so we validate known values but still accept future strings, wdyt?

```diff
-    connectorSubtype: Optional[str] = None
+    connectorSubtype: Optional[Union[str, ConnectorSubtype]] = None
```
441-488: Standardize `documentationUrl` type across metadata and registry definitions

The inconsistency is confirmed: `ConnectorMetadataDefinitionV0.py` uses `AnyUrl` for `documentationUrl` (line 452), while all registry definition files (ConnectorRegistry*) use `str` (6 instances across multiple files). This divergence could introduce parsing surprises when the same field is handled in different contexts. Would standardizing on `AnyUrl` across both codegen flows make sense for your use cases?
471-474: Consider refactoring tags field to use `default_factory=list` with non-optional type

The current implementation uses a mutable default argument with an Optional type, which is a Python anti-pattern. The suggested change to `List[str] = Field(default_factory=list, ...)` aligns with Pydantic best practices and eliminates the mutable default concern. A broader search of the codebase didn't surface other similar patterns, so this appears to be an isolated instance. Wdyt about making this change?

```diff
-    tags: Optional[List[str]] = Field(
-        [],
-        description="An array of tags that describe the connector. E.g: language:python, keyword:rds, etc.",
-    )
+    tags: List[str] = Field(
+        default_factory=list,
+        description="An array of tags that describe the connector. E.g: language:python, keyword:rds, etc.",
+    )
```
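The hazard this fixes is ordinary Python behavior rather than anything pydantic-specific; the classic mutable-default demonstration:

```python
# A default like `tags=[]` is evaluated once, so every call without an
# explicit argument mutates the same shared list object.
def add_tag_shared(tag, tags=[]):
    tags.append(tag)
    return tags

# The default_factory analog: build a fresh list per call instead.
def add_tag_fresh(tag, tags=None):
    if tags is None:
        tags = []
    tags.append(tag)
    return tags

first = add_tag_shared("a")
second = add_tag_shared("b")
assert first is second and first == ["a", "b"]   # state leaked across calls
assert add_tag_fresh("a") == ["a"] and add_tag_fresh("b") == ["b"]
```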
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
- airbyte_cdk/test/models/connector_metadata/ActorDefinitionResourceRequirements.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/AirbyteInternal.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/AllowedHosts.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorBuildOptions.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorIPCOptions.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorPackageInfo.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorRegistryDestinationDefinition.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorRegistryReleases.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorRegistrySourceDefinition.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorRegistryV0.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/GitInfo.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/JobType.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/NormalizationDestinationDefinitionConfig.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/README.md (1 hunks)
- airbyte_cdk/test/models/connector_metadata/RegistryOverrides.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ReleaseStage.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/RemoteRegistries.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/ResourceRequirements.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/RolloutConfiguration.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/Secret.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/SecretStore.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/SourceFileInfo.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/SuggestedStreams.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/SupportLevel.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/TestConnections.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/__init__.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/py.typed (1 hunks)
- bin/generate_component_manifest_files.py (3 hunks)
✅ Files skipped from review due to trivial changes (3)
- airbyte_cdk/test/models/connector_metadata/py.typed
- airbyte_cdk/test/models/connector_metadata/__init__.py
- airbyte_cdk/test/models/connector_metadata/JobType.py
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/test/models/connector_metadata/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (15)
airbyte_cdk/test/models/connector_metadata/README.md (2)
1-79: The documentation structure and examples are excellent!

The README provides clear usage examples, describes the model generation workflow, and properly documents the schema source. The validation examples and field access patterns will be helpful for users.
21-21: The import path in the README is correct—no changes needed.

The script confirms the package exists at `airbyte_cdk/test/models/connector_metadata/`, which matches exactly what the README imports on line 21. The filesystem structure validates the documentation.

The original concern about a path mismatch appears to have been based on PR objectives that differ from the actual implementation. The README accurately reflects the real package structure.
Likely an incorrect or invalid review comment.
airbyte_cdk/test/models/connector_metadata/SecretStore.py (1)
11-21: The model structure looks good!

The use of `Extra.forbid` is appropriate for security-sensitive secret store configuration, preventing accidental inclusion of unvalidated fields. The `Literal["GSM"]` type for the secret store ensures type safety.

Quick check: Is the restriction to only GSM (Google Secret Manager) intentional for the current scope? Just want to confirm this isn't limiting support for other secret stores (AWS Secrets Manager, Azure Key Vault, etc.) that might be needed.
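For intuition, the constraint a `Literal["GSM"]` field imposes can be sketched with the stdlib alone (`validate_store_type` is a hypothetical helper, not part of the CDK):

```python
from typing import Literal, get_args

# Mirrors the generated model's type for the secret store field.
SecretStoreType = Literal["GSM"]

def validate_store_type(value: str) -> str:
    """Reject values outside the Literal's allowed set, as pydantic would."""
    allowed = get_args(SecretStoreType)
    if value not in allowed:
        raise ValueError(f"unsupported secret store {value!r}; allowed: {allowed}")
    return value
```

Extending support to AWS Secrets Manager or Azure Key Vault would then be a one-line change to the Literal in the source schema.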
airbyte_cdk/test/models/connector_metadata/ConnectorMetrics.py (1)
30-36: Nice design: Union types provide flexibility!

The use of `Union[str, Usage]` and `Union[str, SyncSuccessRate]` is a good pattern—it allows both string literals and typed enums, providing backwards compatibility while enabling type-safe code when using the enums directly.

airbyte_cdk/test/models/connector_metadata/ConnectorIPCOptions.py (2)
12-20: Good IPC serialization and transport options!

The enum choices make sense:
- Serialization: JSONL (human-readable), PROTOBUF (efficient), FLATBUFFERS (zero-copy)
- Transport: STDIO (standard streams), SOCKET (network/IPC sockets)
This provides a solid foundation for connector IPC configuration.
23-36: Strict validation is appropriate for IPC config.

Using `Extra.forbid` in both `DataChannel` and `ConnectorIPCOptions` is the right choice for IPC configuration—it prevents misconfiguration from typos or unsupported fields. Making all DataChannel fields required ensures complete IPC specifications.

bin/generate_component_manifest_files.py (1)
181-232: Nice refactoring to extract the common generation flow!

The `generate_models_from_schemas` function nicely consolidates the common logic for generating models from YAML schemas, making it reusable for both declarative components and metadata models. The `post_process` and `metadata_models` flags provide clear control over the different post-processing paths.

airbyte_cdk/test/models/connector_metadata/SuggestedStreams.py (1)
11-18: Model structure looks good!

The `SuggestedStreams` model is well-defined with a clear description explaining the semantics of missing vs empty arrays. The use of `Extra.allow` provides forward compatibility for additional fields.

airbyte_cdk/test/models/connector_metadata/ConnectorTestSuiteOptions.py (1)
52-63: ConnectorTestSuiteOptions model structure looks good.

The model appropriately aggregates test suite configuration with a required `suite` field and optional lists of secrets and test connections. The use of `Extra.forbid` is appropriate for a configuration model.

airbyte_cdk/test/models/connector_metadata/GeneratedFields.py (2)
43-46: Flexible typing with `Any` fields.

The `ConnectorMetrics` model uses `Any` for the `all`, `cloud`, and `oss` fields, which sacrifices type safety for schema flexibility. This is likely intentional to accommodate varying metric structures across environments, but it means invalid data won't be caught at validation time.

If metric schemas are known and stable, consider using more specific types. Otherwise, this trade-off seems reasonable for extensibility.
70-74: GeneratedFields aggregation looks good.

The model appropriately aggregates various generated metadata fields (git info, source file info, metrics, SBOM URL). The structure makes sense for generated/computed connector metadata.

airbyte_cdk/test/models/connector_metadata/Secret.py (1)
airbyte_cdk/test/models/connector_metadata/Secret.py (1)
11-33: Confirmed: Systematic duplication of models across auto-generated files. Consider centralizing generation configuration.

The duplication you identified is real and extends far beyond `SecretStore` and `Secret`—`GitInfo` appears in 8 files, `RolloutConfiguration` in 7, and the same pattern repeats across the generated models. This suggests the code generator is inlining these shared schemas rather than generating them once and importing them.

Your suggestion about configuring `datamodel-codegen` to generate shared models is the right direction. Since these are auto-generated files, the fix should be in the generation configuration or source schema (not by modifying the generated output directly).

Could you clarify: Is the generation configuration set up to inline these common models intentionally, or would this be worth investigating as part of the code generation pipeline improvement?
airbyte_cdk/test/models/connector_metadata/ConnectorBreakingChanges.py (1)
66-66: Regex pattern is intentional, not a limitation.

The regex `^\d+\.\d+\.\d+$` reflects a deliberate versioning strategy rather than an oversight. The project's publish workflow checks version format with the same pattern: versions matching the strict MAJOR.MINOR.PATCH format are marked as stable releases (IS_PRERELEASE=false), while versions with additional identifiers are handled separately as pre-releases. Breaking changes in this schema are scoped to production releases only, not pre-releases. Since the file is auto-generated from a YAML schema (datamodel-codegen), the pattern is maintained by the build process rather than manual editing anyway.

No further action needed here—this is working as designed.
Likely an incorrect or invalid review comment.
airbyte_cdk/test/models/connector_metadata/ConnectorReleases.py (1)
87-91: Confirm semver rule: stable versions only for breakingChanges?

Regex `^\d+\.\d+\.\d+$` excludes pre-releases/build metadata. Is this intentional (i.e., breaking changes only tracked on stable releases), or should RCs be allowed here too, wdyt?
airbyte_cdk/test/models/connector_metadata/ConnectorMetadataDefinitionV0.py (1)
65-76: Verified: Extensive duplication of ReleaseStage/SupportLevel/JobType enums across the connector_metadata module
Your concern is spot-on. The script confirms these enums are redefined across 8+ files instead of being reused:
- ReleaseStage: 6 definitions (standalone module + ConnectorMetadataDefinitionV0, ConnectorRegistryV0, ConnectorRegistryDestinationDefinition, ConnectorRegistryReleases, ConnectorRegistrySourceDefinition)
- SupportLevel: 6 definitions (same pattern)
- JobType: 8 definitions (including ActorDefinitionResourceRequirements and RegistryOverrides)
The original suggestion to enable datamodel-codegen's `--reuse-model` flag (which reuses models when a module already has a model with the same content), or to consolidate into a single shared module, remains a solid approach to eliminate drift and type-checking friction.

Consider whether the code generation pipeline should enforce the `--reuse-model` option, or whether refactoring toward explicit imports from the dedicated enum modules (ReleaseStage.py, SupportLevel.py, JobType.py) fits your architecture better—which path appeals to you?
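To see the scale of the drift, a small ad-hoc scan can list which generated modules redefine the same class. This helper is hypothetical (not part of the build script) and only illustrates how the duplication above could be detected automatically:

```python
import ast
from collections import defaultdict
from pathlib import Path

def find_duplicate_classes(root: Path, names: set[str]) -> dict[str, list[str]]:
    """Map each class name in `names` to the files that define it more than once."""
    found: dict[str, list[str]] = defaultdict(list)
    for path in sorted(root.rglob("*.py")):
        # Walk the AST of each module and record top-level-or-nested class defs.
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.ClassDef) and node.name in names:
                found[node.name].append(path.name)
    return {name: files for name, files in found.items() if len(files) > 1}
```

Running it over the generated/ directory with `{"ReleaseStage", "SupportLevel", "JobType"}` would surface the redefinitions listed above.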
Resolved review threads on generated files:

- airbyte_cdk/test/models/connector_metadata/generated/AirbyteInternal.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/AllowedHosts.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorBuildOptions.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorIPCOptions.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetrics.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorRegistryV0.py
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorReleases.py (outdated)
- ...yte_cdk/test/models/connector_metadata/generated/NormalizationDestinationDefinitionConfig.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/RegistryOverrides.py (outdated)
- airbyte_cdk/test/models/connector_metadata/generated/SupportLevel.py (outdated)
Devin, great, that's much better, I think. Now, let's once again put the module under a 'generated' submodule breadcrumb. And import into airbyte_cdk/test/models/__init__.py:
- the top level manifest file model (I don't know the name)
- the test suites definition model
This gives us most of what we would need to import for our use cases. Imports through the "generated" breadcrumb will still be possible but we can reduce surface area to maintain by only importing the models we need to call or instantiate directly.
…mports

- Moved all generated models to connector_metadata/generated/
- Added __init__.py in connector_metadata to export key models
- Updated airbyte_cdk/test/models/__init__.py to import:
  - ConnectorMetadataDefinitionV0 (top-level manifest model)
  - ConnectorTestSuiteOptions (test suites definition model)
- Updated README with new import paths and examples
- Updated build script to output to new location

This reduces the API surface area while keeping all models accessible through the generated submodule.

Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 0
♻️ Duplicate comments (4)
bin/generate_component_manifest_files.py (3)
17-18: Master branch dependency affects build reproducibility.

These URLs point to the `master` branch, meaning every build fetches the latest schemas. If upstream introduces breaking changes, CDK builds will fail unpredictably.

This concern was previously raised. The recommendation was to pin to a specific commit SHA or tag for reproducible builds.
41-77: Multiple reliability concerns remain unaddressed.

Previous reviews identified several critical issues:
- Misleading error message (lines 55-59): Claims "Using cached schemas if available" but immediately re-raises, causing build failure with no actual fallback
- No GitHub authentication: Unauthenticated requests hit 60/hour rate limit vs. 5000/hour with auth
- No retry logic: Transient network errors cause immediate build failure
- No fallback mechanism: If GitHub is unavailable, builds always fail
These concerns were raised in earlier reviews and would significantly impact build reliability.
246-257: Missing error handling for schema download.

If `download_metadata_schemas` fails at line 249 (due to rate limits, network issues, etc.), users will see a cryptic stack trace without actionable guidance.

A previous review suggested wrapping this call in try/except to catch specific exceptions and provide helpful error messages (e.g., "Set GITHUB_TOKEN environment variable to avoid rate limits" for 403 errors, "Check network connectivity" for network errors).
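As a sketch of the retry logic the reviews keep asking for (a generic helper with assumed names; the build script does not currently have this):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(OSError,)):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

Usage would look like `with_retries(lambda: download_metadata_schemas(...))`, assuming the transient network failures raise OSError subclasses.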
airbyte_cdk/test/models/connector_metadata/README.md (1)
8-10: Add language specifier to fenced code block.

The code block showing the repository path should specify a language (e.g., `text`) for proper syntax highlighting and markdown lint compliance.

This was flagged in a previous review.
🧹 Nitpick comments (1)
bin/generate_component_manifest_files.py (1)
224-231: Clarify mutually exclusive flags, wdyt?

The `if/elif/else` structure means `post_process` and `metadata_models` are mutually exclusive—if both are `True`, only post-processing runs and metadata post-processing is skipped.

Is this intentional? If so, consider adding a docstring note or an assertion at the function start to make the constraint explicit, or using an enum parameter instead of two booleans.
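One way to encode the constraint is a single enum parameter, sketched here with hypothetical names (`/generated_post_processed` matches the script; the metadata path is invented for illustration):

```python
from enum import Enum, auto

class GenerationMode(Enum):
    """Single parameter: the two conflicting booleans cannot both be set."""
    RAW = auto()
    DECLARATIVE_POST_PROCESS = auto()
    METADATA_MODELS = auto()

def export_dir_for(mode: GenerationMode) -> str:
    # With an enum, "both flags true" is unrepresentable by construction.
    if mode is GenerationMode.DECLARATIVE_POST_PROCESS:
        return "/generated_post_processed"
    if mode is GenerationMode.METADATA_MODELS:
        return "/generated_metadata"  # hypothetical path
    return "/generated"
```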
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)

All files under airbyte_cdk/test/models/connector_metadata/generated/ are excluded by the !**/generated/** path filter:

- ActorDefinitionResourceRequirements.py, AirbyteInternal.py, AllowedHosts.py
- ConnectorBreakingChanges.py, ConnectorBuildOptions.py, ConnectorIPCOptions.py
- ConnectorMetadataDefinitionV0.py, ConnectorMetrics.py, ConnectorPackageInfo.py
- ConnectorRegistryDestinationDefinition.py, ConnectorRegistryReleases.py, ConnectorRegistrySourceDefinition.py, ConnectorRegistryV0.py
- ConnectorReleases.py, ConnectorTestSuiteOptions.py, GeneratedFields.py, GitInfo.py, JobType.py
- NormalizationDestinationDefinitionConfig.py, RegistryOverrides.py, ReleaseStage.py, RemoteRegistries.py
- ResourceRequirements.py, RolloutConfiguration.py, Secret.py, SecretStore.py, SourceFileInfo.py
- SuggestedStreams.py, SupportLevel.py, TestConnections.py, __init__.py
📒 Files selected for processing (4)
- airbyte_cdk/test/models/__init__.py (1 hunks)
- airbyte_cdk/test/models/connector_metadata/README.md (1 hunks)
- airbyte_cdk/test/models/connector_metadata/__init__.py (1 hunks)
- bin/generate_component_manifest_files.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/test/models/__init__.py (2)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetadataDefinitionV0.py (2)
  - ConnectorMetadataDefinitionV0 (490-495)
  - ConnectorTestSuiteOptions (337-348)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorTestSuiteOptions.py (1)
  - ConnectorTestSuiteOptions (52-63)
airbyte_cdk/test/models/connector_metadata/__init__.py (2)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorMetadataDefinitionV0.py (2)
  - ConnectorMetadataDefinitionV0 (490-495)
  - ConnectorTestSuiteOptions (337-348)
- airbyte_cdk/test/models/connector_metadata/generated/ConnectorTestSuiteOptions.py (1)
  - ConnectorTestSuiteOptions (52-63)
🪛 markdownlint-cli2 (0.18.1)
airbyte_cdk/test/models/connector_metadata/README.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (7)
bin/generate_component_manifest_files.py (3)
30-32: Good refactoring for reusability! Parameterizing the directory path makes this function reusable for both local and downloaded schemas.

34-38: LGTM! Accepting `yaml_files` as a parameter improves testability and makes the function more flexible for different use cases.

160-179: LGTM! The post-processing logic correctly applies the pydantic.v1 compatibility layer to generated metadata models, consistent with the approach used for declarative component models.
airbyte_cdk/test/models/__init__.py (2)
4-7: LGTM! The imports correctly expose the new metadata models from the `connector_metadata` package.

11-16: LGTM! The `__all__` list is correctly updated to include the new metadata models, maintaining alphabetical ordering.

airbyte_cdk/test/models/connector_metadata/README.md (1)
1-92: Excellent documentation! The README is comprehensive and well-structured, with clear usage examples covering validation, field access, and model imports. The examples accurately reflect the package structure.
airbyte_cdk/test/models/connector_metadata/__init__.py (1)
1-7: LGTM! The package initialization correctly re-exports the primary metadata models with a proper `__all__` definition, making them accessible via the `connector_metadata` namespace.
…tput

- Modified build script to generate all models into a single models.py file
- Added consolidated JSON schema generation (metadata_schema.json)
- Updated imports to reference generated.models module
- Removed 30+ individual model files in favor of single-file approach
- Updated README to document new structure and outputs
- Added GitHub token support to avoid rate limiting

Co-Authored-By: AJ Steers <aj@airbyte.io>
Co-Authored-By: AJ Steers <aj@airbyte.io>
- Moved detailed documentation from README to CONTRIBUTING.md
- Simplified README to just show usage and link to CONTRIBUTING.md
- Added docstring to __init__.py pointing to CONTRIBUTING.md
- Removed py.typed from connector_metadata (already exists at root)

Co-Authored-By: AJ Steers <aj@airbyte.io>
Done! I've:
The documentation is now consolidated in CONTRIBUTING.md alongside the existing "Auto-Generating the Declarative Schema File" section.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- airbyte_cdk/test/models/connector_metadata/README.md (1 hunks)
- airbyte_cdk/test/models/connector_metadata/__init__.py (1 hunks)
- docs/CONTRIBUTING.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/CONTRIBUTING.md
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/test/models/connector_metadata/README.md
🧰 Additional context used
🪛 GitHub Actions: Generate Docs
airbyte_cdk/test/models/connector_metadata/__init__.py
[error] 8-8: ModuleNotFoundError: No module named 'airbyte_cdk.test.models.connector_metadata.generated.models' during docs generation (caused by importing ConnectorMetadataDefinitionV0, ConnectorTestSuiteOptions in __init__.py).
🪛 GitHub Actions: Pytest (Fast)
airbyte_cdk/test/models/connector_metadata/__init__.py
[error] 1-1: ModuleNotFoundError: No module named 'airbyte_cdk.test.models.connector_metadata.generated.models' during test collection. Ensure the generated models module exists or is generated before tests.
[error] 1-1: Import cascading failure from test module import caused pytest collection to fail.
🪛 GitHub Actions: PyTest Matrix
airbyte_cdk/test/models/connector_metadata/__init__.py
[error] 1-1: ModuleNotFoundError: No module named 'airbyte_cdk.test.models.connector_metadata.generated.models'. This prevents tests from importing ConnectorMetadata definitions during collection.
🪛 GitHub Actions: Test Connectors
airbyte_cdk/test/models/connector_metadata/__init__.py
[error] 8-8: ModuleNotFoundError: No module named 'airbyte_cdk.test.models.connector_metadata.generated.models'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (2)
airbyte_cdk/test/models/connector_metadata/__init__.py (2)
1-6: LGTM! The docstring clearly documents that models are auto-generated and provides a helpful link to regeneration instructions. Nice documentation!

10-13: LGTM! The `__all__` definition correctly exposes the public API. Once the import timing issue on line 8 is resolved, this will work as intended.
- Switch from downloading individual YAML files via GitHub API to using sparse git clone
- Eliminates rate limiting issues (60 req/hour -> no API calls)
- Fix single-file model generation to properly filter out relative imports
- Add multi-line import block detection and filtering
- Generate and commit metadata models and consolidated JSON schema artifacts

Co-Authored-By: AJ Steers <aj@airbyte.io>
…schema.py Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 0
🧹 Nitpick comments (3)
bin/generate_component_manifest_files.py (3)
40-73: Consider pinning to a stable ref for reproducible builds, wdyt?

The git clone approach nicely avoids GitHub API rate limits from the previous HTTP-based implementation. However, cloning without a ref pin will fetch the default branch (likely `main` or `master`), which means builds could break if upstream introduces schema changes or becomes temporarily unstable.

Consider adding a ref parameter (defaulting to a known-good commit SHA or tag):

-def clone_metadata_schemas(temp_dir: Path) -> Path:
+def clone_metadata_schemas(temp_dir: Path, ref: str = "a1b2c3d4e5f6") -> Path:
     """Clone metadata schema YAML files from GitHub using sparse checkout."""
     repo_url = "https://github.com/airbytehq/airbyte.git"
     schema_path = "airbyte-ci/connectors/metadata_service/lib/metadata_service/models/src"
     clone_dir = temp_dir / "airbyte"
     print("Cloning metadata schemas from airbyte repo...", file=sys.stderr)
     subprocess.run(
         [
             "git",
             "clone",
             "--depth",
             "1",
             "--filter=blob:none",
             "--sparse",
+            "--branch",
+            ref,
             repo_url,
             str(clone_dir),
         ],
         check=True,
         capture_output=True,
     )

Also note: this requires `git` to be available in the build environment.
176-226: Should `post_process` and `metadata_models` be mutually exclusive, wdyt?

The branching logic at lines 219-226 gives `post_process` precedence over `metadata_models` if both are True, but it's not immediately clear whether this is intentional or just defensive coding.

Consider making the intent explicit:

-    if post_process:
+    if post_process and metadata_models:
+        raise ValueError("post_process and metadata_models are mutually exclusive")
+    elif post_process:
         codegen_container = await post_process_codegen(codegen_container)
         await codegen_container.directory("/generated_post_processed").export(output_dir_path)
     elif metadata_models:

Or if the precedence is intentional, a docstring comment would help clarify the behavior.
399-418: Consider adding error handling for common clone failures, wdyt?

The `clone_metadata_schemas` call can fail in several ways (missing git, network issues, repository access problems), and users would benefit from actionable error messages.

Consider wrapping the metadata generation phase:

 print("\nGenerating metadata models...", file=sys.stderr)
 with tempfile.TemporaryDirectory() as temp_dir:
     temp_path = Path(temp_dir)
-    schemas_dir = clone_metadata_schemas(temp_path)
+    try:
+        schemas_dir = clone_metadata_schemas(temp_path)
+    except subprocess.CalledProcessError as e:
+        print("\nError: Failed to clone metadata schemas from GitHub.", file=sys.stderr)
+        print(f"Details: {e}", file=sys.stderr)
+        print("Tip: Ensure git is installed and you have network connectivity.", file=sys.stderr)
+        raise
+    except Exception as e:
+        print(f"\nError: Failed to clone metadata schemas: {e}", file=sys.stderr)
+        raise

This provides clearer guidance when builds fail.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- airbyte_cdk/test/models/connector_metadata/generated/metadata_schema.json is excluded by !**/generated/**
- airbyte_cdk/test/models/connector_metadata/generated/models.py is excluded by !**/generated/**
📒 Files selected for processing (2)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (13 hunks)
- bin/generate_component_manifest_files.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (6)
bin/generate_component_manifest_files.py (6)
3-18: LGTM on the new imports and constant! The additions support the new metadata models generation flow—subprocess for git operations, json/yaml for schema processing, and tempfile for temporary directories. All are well-chosen for the task.

29-30: Nice abstraction! Parameterizing the directory lookup is a clean way to support both declarative and metadata schemas.

33-37: Good refactor for flexibility! Passing in `yaml_files` rather than hardcoding the discovery makes this function more reusable across different schema sets.

157-173: Clean post-processing implementation! The metadata models don't need the deprecation handling or parameter name fixes, so this simplified version is appropriate.

229-264: Schema consolidation looks good! The approach of treating `ConnectorMetadataDefinitionV0` as the root and other schemas as definitions is sensible, and the fallback for missing main schema is a nice safety net.

267-384: The generated file passes syntax validation, though full import verification is limited by the sandbox environment. Good news: the syntax check passed successfully, confirming the consolidated Python file is well-formed. However, the full import test couldn't complete in this environment due to missing dependencies (`dunamai`), which is expected in the sandbox and not a reflection on your generated code.

The original concerns still hold merit though: the import consolidation logic is complex and depends on datamodel-codegen's output format staying stable. The multi-line import detection (checking for commas/parens on lines 334-335) should handle most cases, but if the tool's output format changes, the parsing logic may need adjustments.

Would it make sense to add a brief comment in the code noting this dependency on datamodel-codegen's format conventions, wdyt? And consider running a manual import test in your local environment to verify the full pipeline works end-to-end with your actual schemas?
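The fragility being discussed comes down to line-oriented parsing of generated imports. A simplified sketch of the idea (the real logic in the build script differs; this only illustrates how single- and multi-line relative-import blocks can be skipped):

```python
def strip_relative_imports(source: str) -> str:
    """Remove `from .x import ...` lines, including multi-line parenthesized blocks."""
    out = []
    skipping = False
    for line in source.splitlines():
        stripped = line.strip()
        if skipping:
            # Inside a parenthesized import block: skip until the closing paren.
            if stripped.endswith(")"):
                skipping = False
            continue
        if stripped.startswith("from ."):
            if "(" in stripped and ")" not in stripped:
                skipping = True  # the import block continues on following lines
            continue
        out.append(line)
    return "\n".join(out)
```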
- Exclude declarative_component_schema.py from formatting/linting
- Exclude connector_metadata generated models from formatting/linting
- Prevents accidental reformatting of auto-generated code

Co-Authored-By: AJ Steers <aj@airbyte.io>

- Remove $id fields from definitions to ensure refs resolve to root
- Fix .yaml refs to use #/definitions/ format
- Replace custom type references with $ref
- Skip invalid 'type: const' entries
- Merge nested definitions into top-level definitions

Co-Authored-By: AJ Steers <aj@airbyte.io>

- IDE validators expect definitions to not have $schema or $id fields
- These fields should only exist at the root level
- Fixes validation errors in IDEs when using the consolidated schema

Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mypy.ini (1)
24-26: Plugins configured under [tool.mypy] won't load from mypy.ini; move them under [mypy], wdyt?

In mypy.ini, the plugins line should live in the [mypy] section. Otherwise pydantic and pytest-mypy-plugins won't be active.

-[tool.mypy]
-plugins = ["pydantic.mypy", "pytest-mypy-plugins"]
+; Load plugins from the main mypy section
+[mypy]
+plugins = pydantic.mypy, pytest-mypy-plugins

Also, since you're excluding generated code already, this won't impact those paths. Want me to open a follow-up to tighten ignores after verifying plugin behavior? Wdyt?
♻️ Duplicate comments (1)
bin/generate_component_manifest_files.py (1)
40-74: Pin the schema ref for reproducible builds and allow override via env, wdyt?

Cloning the default branch makes builds non-deterministic. Could we pin to a commit/tag and support an env override (AIRBYTE_METADATA_SCHEMAS_REF) while keeping sparse checkout? Example:

+import os
@@
 def clone_metadata_schemas(temp_dir: Path) -> Path:
     """Clone metadata schema YAML files from GitHub using sparse checkout."""
     repo_url = "https://github.com/airbytehq/airbyte.git"
+    # Pin to a specific ref (commit SHA or tag) for reproducibility; allow override
+    ref = os.environ.get("AIRBYTE_METADATA_SCHEMAS_REF", "master")
@@
     subprocess.run(
         [
             "git",
             "clone",
             "--depth",
             "1",
             "--filter=blob:none",
             "--sparse",
+            "--branch",
+            ref,
             repo_url,
             str(clone_dir),
         ],
         check=True,
         capture_output=True,
+        text=True,
     )
@@
     subprocess.run(
         ["git", "-C", str(clone_dir), "sparse-checkout", "set", schema_path],
         check=True,
         capture_output=True,
+        text=True,
     )

Optionally, would you want to support authenticated clones if GITHUB_TOKEN is set (to avoid anonymous fetch limits), e.g., by rewriting repo_url at runtime? Wdyt?
🧹 Nitpick comments (5)
bin/generate_component_manifest_files.py (4)
192-196: Include nested YAML files to avoid missing schemas, wdyt?

If schemas are nested in subfolders, the current include pattern will skip them. Shall we switch to a recursive glob?

-    .with_mounted_directory(
-        "/yaml", dagger_client.host().directory(yaml_dir_path, include=["*.yaml"])
-    )
+    .with_mounted_directory(
+        "/yaml", dagger_client.host().directory(yaml_dir_path, include=["**/*.yaml"])
+    )

Could you confirm whether upstream metadata schemas ever appear in subdirectories?

317-319: Match the recursive YAML include here too for consistency, wdyt?

-    .with_mounted_directory(
-        "/yaml", dagger_client.host().directory(yaml_dir_path, include=["*.yaml"])
-    )
+    .with_mounted_directory(
+        "/yaml", dagger_client.host().directory(yaml_dir_path, include=["**/*.yaml"])
+    )
18-18: Is the metadata output path under test/ intentional, or should this live under the new runtime package, wdyt?

Currently set to airbyte_cdk/test/models/connector_metadata/generated. If the intent is to ship models in airbyte_cdk.metadata_models, should the generator write there instead (and keep tests importing from that package)? Or is this path just for generation/testing? Can we clarify in README and possibly make it configurable?

49-62: Minor: avoid swallowing useful git stderr on failure, wdyt?

You capture output but never surface it unless an exception is caught upstream. If we keep capture_output=True, shall we print e.stderr in the except path (as in the earlier suggestion) so users see the git error?
mypy.ini (1)
15-16: Exclude regex looks fine; consider anchoring and normalizing separators, wdyt?

Optional nit: to reduce accidental matches, you could anchor and allow either slash:

-exclude = (unit_tests/|airbyte_cdk/sources/declarative/models/declarative_component_schema\.py|airbyte_cdk/test/models/connector_metadata/generated/)
+exclude = (^unit_tests/|^airbyte_cdk[\\/]sources[\\/]declarative[\\/]models[\\/]declarative_component_schema\.py|^airbyte_cdk[\\/]test[\\/]models[\\/]connector_metadata[\\/]generated/)
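A quick check of the current, unanchored pattern (mirroring how mypy applies exclude as a search against relative paths) illustrates the accidental-match concern:

```python
import re

# The exclude pattern currently in mypy.ini (unanchored alternation).
EXCLUDE = re.compile(
    r"(unit_tests/"
    r"|airbyte_cdk/sources/declarative/models/declarative_component_schema\.py"
    r"|airbyte_cdk/test/models/connector_metadata/generated/)"
)

def is_excluded(path: str) -> bool:
    return EXCLUDE.search(path) is not None

print(is_excluded("airbyte_cdk/test/models/connector_metadata/generated/models.py"))  # True
print(is_excluded("airbyte_cdk/test/models/__init__.py"))                             # False
# Unanchored: any path merely *containing* "unit_tests/" matches, even nested ones.
print(is_excluded("plugins/unit_tests/helper.py"))                                    # True
```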
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- airbyte_cdk/test/models/connector_metadata/generated/metadata_schema.json is excluded by !**/generated/**
- airbyte_cdk/test/models/connector_metadata/generated/models.py is excluded by !**/generated/**
📒 Files selected for processing (3)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py (13 hunks)
- bin/generate_component_manifest_files.py (3 hunks)
- mypy.ini (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
mypy.ini
🪛 GitHub Actions: Linters
bin/generate_component_manifest_files.py
[error] 237-285: Ruff formatting check produced diffs. 1 file would be reformatted. Exit code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (1)
bin/generate_component_manifest_files.py (1)
426-461: Ruff formatting verified and applied successfully.

The verification confirms that ruff found and fixed formatting issues in the file. The output shows "1 file reformatted" and "All checks passed!" — the formatting concern you raised has been addressed and the code now meets all style requirements.
The file is ready to commit (the git commit command failed only due to a sandbox environment limitation, not a code issue). Are you good to push this update, wdyt?
all_schema_names = set(schemas.keys())

for schema_content in schemas.values():
    if isinstance(schema_content, dict) and "definitions" in schema_content:
        all_schema_names.update(schema_content["definitions"].keys())

def fix_refs(obj, in_definition=False):
    """Recursively fix $ref and type references in schema objects."""
    if isinstance(obj, dict):
        new_obj = {}
        for key, value in obj.items():
            if (key == "$id" or key == "$schema") and in_definition:
                continue
            elif key == "$ref" and isinstance(value, str):
                if value.endswith(".yaml"):
                    schema_name = value.replace(".yaml", "")
                    new_obj[key] = f"#/definitions/{schema_name}"
                else:
                    new_obj[key] = value
            elif key == "type" and isinstance(value, str) and value in all_schema_names:
                new_obj["$ref"] = f"#/definitions/{value}"
            elif key == "type" and value == "const":
                pass
            else:
                new_obj[key] = fix_refs(value, in_definition=in_definition)
        return new_obj
    elif isinstance(obj, list):
        return [fix_refs(item, in_definition=in_definition) for item in obj]
    else:
        return obj

# Find the main schema (ConnectorMetadataDefinitionV0)
main_schema = schemas.get("ConnectorMetadataDefinitionV0")

if main_schema:
    # Create a consolidated schema with definitions
    consolidated = {
        "$schema": main_schema.get("$schema", "http://json-schema.org/draft-07/schema#"),
        "title": "Connector Metadata Schema",
        "description": "Consolidated JSON schema for Airbyte connector metadata validation",
        **main_schema,
        "definitions": {},
    }

    # Add all schemas (including their internal definitions) as top-level definitions
    for schema_name, schema_content in schemas.items():
        if schema_name != "ConnectorMetadataDefinitionV0":
            if isinstance(schema_content, dict) and "definitions" in schema_content:
                for def_name, def_content in schema_content["definitions"].items():
                    consolidated["definitions"][def_name] = fix_refs(def_content, in_definition=True)
                schema_without_defs = {k: v for k, v in schema_content.items() if k != "definitions"}
                consolidated["definitions"][schema_name] = fix_refs(schema_without_defs, in_definition=True)
            else:
                consolidated["definitions"][schema_name] = fix_refs(schema_content, in_definition=True)

    consolidated = fix_refs(consolidated, in_definition=False)

    Path(output_json_path).write_text(json.dumps(consolidated, indent=2))
    print(f"Generated consolidated JSON schema: {output_json_path}", file=sys.stderr)
else:
    print(
        "Warning: ConnectorMetadataDefinitionV0 not found, generating simple consolidation",
        file=sys.stderr,
    )
    Path(output_json_path).write_text(json.dumps(schemas, indent=2))
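To make the ref-rewriting behavior concrete, here is a self-contained exercise of the same rules (the function is reproduced, slightly condensed, from the snippet above; the sample schema and schema names are invented for illustration):

```python
all_schema_names = {"ReleaseStage", "ConnectorBuildOptions"}

def fix_refs(obj, in_definition=False):
    """Recursively fix $ref and type references in schema objects."""
    if isinstance(obj, dict):
        new_obj = {}
        for key, value in obj.items():
            if (key == "$id" or key == "$schema") and in_definition:
                continue  # strip schema-level keys inside definitions
            elif key == "$ref" and isinstance(value, str):
                if value.endswith(".yaml"):
                    new_obj[key] = f"#/definitions/{value.replace('.yaml', '')}"
                else:
                    new_obj[key] = value
            elif key == "type" and isinstance(value, str) and value in all_schema_names:
                new_obj["$ref"] = f"#/definitions/{value}"  # custom type -> $ref
            else:
                new_obj[key] = fix_refs(value, in_definition=in_definition)
        return new_obj
    if isinstance(obj, list):
        return [fix_refs(item, in_definition=in_definition) for item in obj]
    return obj

sample = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "properties": {
        "releaseStage": {"$ref": "ReleaseStage.yaml"},
        "buildOptions": {"type": "ConnectorBuildOptions"},
        "name": {"type": "string"},  # JSON primitive: left untouched here
    },
}
fixed = fix_refs(sample, in_definition=True)
print(fixed["properties"]["releaseStage"])  # {'$ref': '#/definitions/ReleaseStage'}
```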
Fix consolidation correctness: preserve main definitions, avoid clobbering JSON primitives, and support .yaml refs with fragments, wdyt?
Two issues:
- Main schema’s internal definitions are overwritten, then never re-added.
- Converting any type that matches a schema name into a $ref can break primitives like object/string.
Patch proposal:
```diff
@@
-    all_schema_names = set(schemas.keys())
-
+    all_schema_names = set(schemas.keys())
+    json_primitives = {"string", "number", "integer", "boolean", "object", "array", "null"}
@@
-    def fix_refs(obj, in_definition=False):
+    def fix_refs(obj, in_definition=False):
         """Recursively fix $ref and type references in schema objects."""
         if isinstance(obj, dict):
             new_obj = {}
             for key, value in obj.items():
                 if (key == "$id" or key == "$schema") and in_definition:
                     continue
                 elif key == "$ref" and isinstance(value, str):
-                    if value.endswith(".yaml"):
-                        schema_name = value.replace(".yaml", "")
-                        new_obj[key] = f"#/definitions/{schema_name}"
+                    # Map file refs like Foo.yaml or Foo.yaml#/Bar to #/definitions/Foo[#/Bar]
+                    m = re.match(r"(?:.*/)?(?P<name>[^/#]+)\.yaml(?P<frag>#.*)?$", value)
+                    if m:
+                        schema_name = m.group("name")
+                        frag = m.group("frag") or ""
+                        new_obj[key] = f"#/definitions/{schema_name}{frag}"
                     else:
                         new_obj[key] = value
-                elif key == "type" and isinstance(value, str) and value in all_schema_names:
-                    new_obj["$ref"] = f"#/definitions/{value}"
+                elif key == "type" and isinstance(value, str):
+                    # Only rewrite to $ref if it's not a JSON primitive
+                    if value in all_schema_names and value not in json_primitives:
+                        new_obj["$ref"] = f"#/definitions/{value}"
+                    else:
+                        new_obj[key] = value
                 elif key == "type" and value == "const":
                     pass
                 else:
                     new_obj[key] = fix_refs(value, in_definition=in_definition)
             return new_obj
@@
-    if main_schema:
+    if main_schema:
         # Create a consolidated schema with definitions
-        consolidated = {
-            "$schema": main_schema.get("$schema", "http://json-schema.org/draft-07/schema#"),
-            "title": "Connector Metadata Schema",
-            "description": "Consolidated JSON schema for Airbyte connector metadata validation",
-            **main_schema,
-            "definitions": {},
-        }
+        consolidated = dict(main_schema)  # shallow copy
+        consolidated.setdefault("$schema", "http://json-schema.org/draft-07/schema#")
+        consolidated.setdefault("title", "Connector Metadata Schema")
+        consolidated.setdefault("description", "Consolidated JSON schema for Airbyte connector metadata validation")
+        # Preserve existing main-schema definitions
+        consolidated_definitions = dict(consolidated.get("definitions", {}))
@@
-        for schema_name, schema_content in schemas.items():
-            if schema_name != "ConnectorMetadataDefinitionV0":
+        for schema_name, schema_content in schemas.items():
+            if schema_name != "ConnectorMetadataDefinitionV0":
                 if isinstance(schema_content, dict) and "definitions" in schema_content:
                     for def_name, def_content in schema_content["definitions"].items():
-                        consolidated["definitions"][def_name] = fix_refs(def_content, in_definition=True)
+                        consolidated_definitions[def_name] = fix_refs(def_content, in_definition=True)
                     schema_without_defs = {k: v for k, v in schema_content.items() if k != "definitions"}
-                    consolidated["definitions"][schema_name] = fix_refs(schema_without_defs, in_definition=True)
+                    consolidated_definitions[schema_name] = fix_refs(schema_without_defs, in_definition=True)
                 else:
-                    consolidated["definitions"][schema_name] = fix_refs(schema_content, in_definition=True)
+                    consolidated_definitions[schema_name] = fix_refs(schema_content, in_definition=True)
+
+        consolidated["definitions"] = consolidated_definitions
@@
-        consolidated = fix_refs(consolidated, in_definition=False)
+        consolidated = fix_refs(consolidated, in_definition=False)
```

Would you like me to add a small sanity check that ensures `$ref` targets exist in definitions and warn if not, to aid debugging? Wdyt?
🤖 Prompt for AI Agents
In bin/generate_component_manifest_files.py around lines 239-304, the
consolidation currently overwrites the main schema's internal definitions and
naively converts any "type" string matching a schema name into a $ref (breaking
primitives), and it only handles .yaml refs without fragments. Preserve the main
schema's original "definitions" by merging them into consolidated["definitions"]
before adding other schemas; when converting "type" to "$ref", only do so if the
type is not a JSON primitive (object, array, string, number, integer, boolean,
null) and the target definition actually exists; support .yaml refs with
fragments by parsing values like "other.yaml#/definitions/X" into
"#/definitions/X" (and strip just ".yaml" when there is no fragment); and add an
optional sanity check that warns if a $ref target is missing from
consolidated["definitions"] to aid debugging.
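As a standalone illustration of the proposed rewriting rules (not part of the script itself, and using the schema name `ConnectorMetric` mentioned elsewhere in this review), the fragment-aware `$ref` mapping behaves like this:

```python
import re


def rewrite_ref(value: str) -> str:
    """Map file refs like Foo.yaml or Foo.yaml#/Bar to #/definitions/Foo[#/Bar]."""
    m = re.match(r"(?:.*/)?(?P<name>[^/#]+)\.yaml(?P<frag>#.*)?$", value)
    if m:
        return f"#/definitions/{m.group('name')}{m.group('frag') or ''}"
    return value  # already-local refs pass through unchanged


print(rewrite_ref("ConnectorMetric.yaml"))       # #/definitions/ConnectorMetric
print(rewrite_ref("Foo.yaml#/definitions/Bar"))  # #/definitions/Foo#/definitions/Bar
print(rewrite_ref("#/definitions/Existing"))     # #/definitions/Existing
```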
```python
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)
        schemas_dir = clone_metadata_schemas(temp_path)

        output_dir = Path(LOCAL_METADATA_OUTPUT_DIR_PATH)
        output_dir.mkdir(parents=True, exist_ok=True)

        print("Generating single Python file with all models...", file=sys.stderr)
        output_file = str(output_dir / "models.py")
        await generate_metadata_models_single_file(
            dagger_client=dagger_client,
            yaml_dir_path=str(schemas_dir),
            output_file_path=output_file,
        )

        print("Generating consolidated JSON schema...", file=sys.stderr)
        json_schema_file = str(output_dir / "metadata_schema.json")
        consolidate_yaml_schemas_to_json(schemas_dir, json_schema_file)
```
Wrap clone and codegen in helpful error handling with clear remediation tips, wdyt?
A failed git clone/sparse-checkout currently raises without guidance. Could we catch errors and print actionable hints (set AIRBYTE_METADATA_SCHEMAS_REF, check network, optional GITHUB_TOKEN), then re-raise?
print("\nGenerating metadata models...", file=sys.stderr)
with tempfile.TemporaryDirectory() as temp_dir:
temp_path = Path(temp_dir)
- schemas_dir = clone_metadata_schemas(temp_path)
+ try:
+ schemas_dir = clone_metadata_schemas(temp_path)
+ except subprocess.CalledProcessError as e:
+ print("\nError: Failed to clone metadata schemas.", file=sys.stderr)
+ if e.stderr:
+ print(e.stderr, file=sys.stderr)
+ print("Tips:", file=sys.stderr)
+ print(" - Ensure network access from CI.", file=sys.stderr)
+ print(" - Pin a stable ref via AIRBYTE_METADATA_SCHEMAS_REF=<commit|tag>.", file=sys.stderr)
+ print(" - Optionally use GITHUB_TOKEN for higher reliability.", file=sys.stderr)
+ raise🤖 Prompt for AI Agents
In bin/generate_component_manifest_files.py around lines 439 to 457, wrap the
clone_metadata_schemas call and the subsequent codegen steps
(generate_metadata_models_single_file and consolidate_yaml_schemas_to_json) in a
try/except block that catches Exception, logs a clear, actionable error message
that suggests checking/setting AIRBYTE_METADATA_SCHEMAS_REF, verifying network
access, and optionally providing GITHUB_TOKEN for authenticated checkout, then
re-raises the exception; ensure the TemporaryDirectory context is preserved so
cleanup still occurs and keep the original exception chained when re-raising for
debugging.
- Created new bin/generate_connector_metadata_files.py for metadata models - Reverted bin/generate_component_manifest_files.py to main (no changes) - Updated build script to call both generators - Keeps concerns separated for easier review and maintenance Co-Authored-By: AJ Steers <aj@airbyte.io>
```shell
set -e

python bin/generate_component_manifest_files.py
python bin/generate_connector_metadata_files.py
```
Devin, let's revert this and move it upstream - maybe into the poe task?
- Added assemble-declarative and assemble-metadata poe tasks - Reverted shell script to main (no changes) - Updated assemble task to call both generators - Addresses PR feedback to move script call upstream Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 4
🧹 Nitpick comments (5)
bin/generate_connector_metadata_files.py (4)
120-132: Import bucketing misses typing_extensions.

Generated code often uses typing_extensions (e.g., Literal, TypedDict). It'll fall through into class content. Should we treat it as third-party imports to keep imports together? wdyt?

Apply this diff:

```diff
-        elif line.startswith("from pydantic") or line.startswith("import "):
+        elif (
+            line.startswith("from pydantic")
+            or line.startswith("from typing_extensions")
+            or line.startswith("import ")
+        ):
             third_party_imports.add(line)
```
156-159: Also rewrite "import pydantic" to v1 for completeness.

If any generated file uses "import pydantic", it won't be redirected to pydantic.v1. Shall we handle both forms with a regex? wdyt?

Apply this diff (and add the import):

```diff
@@
-import json
+import json
+import re
@@
-    post_processed_content = final_content.replace("from pydantic", "from pydantic.v1")
+    post_processed_content = re.sub(r"\bfrom pydantic\b", "from pydantic.v1", final_content)
+    post_processed_content = re.sub(
+        r"(?m)^\s*import pydantic\b", "import pydantic.v1 as pydantic", post_processed_content
+    )
```
85-99: Optionally pin target Python for codegen.

Since we run on python:3.10, would adding "--target-python-version 3.10" reduce surprises from default inference? wdyt?

Apply this diff:

```diff
     "datamodel-codegen",
     "--input",
     "/yaml",
     "--output",
     "/generated_temp",
+    "--target-python-version",
+    "3.10",
```
33-65: Support local schema overrides and ref pinning to improve CI determinism and resilience.

Currently, the schema clone always fetches from the airbyte monorepo at depth 1 without any control over the git ref, which can be unpredictable in CI and vulnerable to GitHub rate limits. Adding environment variable support would help:

- Local override (`AIRBYTE_SCHEMAS_DIR`): Useful for testing and offline builds
- Ref pinning (`AIRBYTE_SCHEMAS_REF`): Enables reproducible builds against specific commits/branches
- Future token support: Would mitigate 403 rate-limit errors from the GitHub API

Here's a minimal path forward—would adding these env var checks help your CI workflow, wdyt?

```diff
+import os
+
 def clone_schemas_from_github(temp_dir: Path) -> Path:
     """Clone metadata schema YAML files from GitHub using sparse checkout."""
     clone_dir = temp_dir / "airbyte"
+    local_override = Path(os.getenv("AIRBYTE_SCHEMAS_DIR", "")) if os.getenv("AIRBYTE_SCHEMAS_DIR") else None
+    ref = os.getenv("AIRBYTE_SCHEMAS_REF")  # e.g., a commit SHA or branch
+    if local_override and local_override.exists():
+        print(f"Using local schemas from {local_override}", file=sys.stderr)
+        return local_override
     print("Cloning metadata schemas from airbyte repo...", file=sys.stderr)
     subprocess.run(
         [
             "git",
             "clone",
             "--depth",
             "1",
             "--filter=blob:none",
             "--sparse",
+            *(["--branch", ref] if ref else []),
             AIRBYTE_REPO_URL,
             str(clone_dir),
         ],
         check=True,
         capture_output=True,
     )
```

Then document `AIRBYTE_SCHEMAS_DIR` and `AIRBYTE_SCHEMAS_REF` in the README—and optionally test with `AIRBYTE_SCHEMAS_REF=main python bin/generate_connector_metadata_files.py` to confirm the behavior is what you expect.

bin/generate-component-manifest-dagger.sh (1)
Apply this diff:
-set -e +set -Eeuo pipefail
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- bin/generate-component-manifest-dagger.sh (1 hunks)
- bin/generate_connector_metadata_files.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
bin/generate_connector_metadata_files.py (1)
bin/generate_component_manifest_files.py (1)
main(113-155)
🪛 GitHub Actions: Linters
bin/generate_connector_metadata_files.py
[error] 243-251: Ruff format check failed: 1 file would be reformatted (bin/generate_connector_metadata_files.py). Ruff exited with code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyproject.toml (1)
147-149: Consider adding GitHub rate-limit mitigation to build configuration.

Per the PR description, `assemble-metadata` downloads JSON Schema YAML files from GitHub on-demand without authentication, which can hit rate limits (~60 requests/hour for unauthenticated requests). This could cause CI builds to fail intermittently, especially in fork workflows or frequent rebuilds.

Would it be worth documenting or implementing one of these mitigations in the build config?

- Environment variable for GitHub token: `GITHUB_TOKEN` → pass to the generation script
- Local caching/fallback mechanism for schema files
- Retry logic with exponential backoff (already available via the `backoff` package, which is a dependency)

Consider documenting the optional `GITHUB_TOKEN` environment variable in a build or generation script, and potentially storing schema files in a cache directory that persists across builds (e.g., `.cache/schemas/`). This is especially important for CI/CD workflows.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pyproject.toml (1 hunks)
🔇 Additional comments (1)
pyproject.toml (1)
147-149: Task sequencing looks correct; confirm the implicit dependency is intended.

The `assemble` task now sequences `assemble-declarative` → `assemble-metadata`. This order makes sense logically, but is there an actual data dependency (i.e., does the metadata generation require outputs from declarative generation)? If not, consider documenting the rationale or allowing them to run in parallel for faster builds.

Could you clarify: does `assemble-metadata` depend on any outputs from `assemble-declarative`, or are they independent? Wdyt?
```python
    codegen_container = (
        dagger_client.container()
        .from_(PYTHON_IMAGE)
        .with_exec(["mkdir", "-p", "/generated_temp"], use_entrypoint=True)
        .with_exec(["pip", "install", " ".join(PIP_DEPENDENCIES)], use_entrypoint=True)
        .with_mounted_directory(
            "/yaml", dagger_client.host().directory(yaml_dir_path, include=["*.yaml"])
        )
    )
```
Devin, we should not use Dagger here (read: never use Dagger in new code). You can use uvx if you need to run with something that isn't part of our project's dev dependencies.
Or, you can just use uv script syntax with PEP 723 for inline dependency declarations for any libraries you need this script to have direct access to call. (I don't know if that solves for CLI-based invocation from the script though.)
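For reference, a PEP 723 inline-metadata header looks like the sketch below. The dependency list is hypothetical, standing in for whatever the generator script actually needs (e.g., pyyaml):

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "pyyaml",
# ]
# ///
# When invoked as `uv run <script>.py`, uv parses the block above (PEP 723),
# creates an ephemeral environment with the listed dependencies, and then
# executes the script body.


def main() -> str:
    # Placeholder standing in for the real schema-download/codegen logic.
    return "generation would run here"


print(main())
```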
```python
async def main():
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as dagger_client:
```
Ditto. Dagger disallowed in new code.
- Removed Dagger dependency from new script - Use uvx to run datamodel-codegen directly - Simplified script significantly (no async, no container orchestration) - Script is now ~280 lines vs ~290 lines with Dagger - Addresses PR feedback to avoid Dagger in new code Co-Authored-By: AJ Steers <aj@airbyte.io>
- Changed order of operations: YAML → JSON → Python (instead of YAML → Python → merge) - Removed all manual Python file merging logic (~100 lines of code removed) - datamodel-codegen now generates single file directly from consolidated JSON - Much simpler and more maintainable - Addresses AJ's feedback to avoid manual Python code manipulation Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 1
♻️ Duplicate comments (4)
bin/generate_connector_metadata_files.py (4)
1-281: Run ruff format to fix formatting issues.

The pipeline is blocked because ruff format detected formatting changes. This was previously flagged and needs to be addressed.

Run the following locally to fix:

```shell
ruff format bin/generate_connector_metadata_files.py
```
183-183: Use recursive glob to catch nested YAML schemas.

Only top-level `*.yaml` files are matched here, which risks missing nested schemas. This was previously flagged—would switching to `.rglob("*.yaml")` work? wdyt?

Apply this diff:

```diff
-    for yaml_file in yaml_dir_path.glob("*.yaml"):
+    for yaml_file in sorted(yaml_dir_path.rglob("*.yaml")):
```
201-206: Handle $ref fragments like "Foo.yaml#/definitions/Bar".

References with fragments (e.g., "SomeSchema.yaml#/definitions/SomeType") won't be rewritten correctly. This was previously flagged—should we handle the fragment portion explicitly? wdyt?

Consider applying this diff to handle fragments:

```diff
         elif key == "$ref" and isinstance(value, str):
-            if value.endswith(".yaml"):
-                schema_name = value.replace(".yaml", "")
-                new_obj[key] = f"#/definitions/{schema_name}"
+            if ".yaml" in value:
+                # Handle both "Foo.yaml" and "Foo.yaml#/definitions/Bar"
+                base, sep, fragment = value.partition("#")
+                schema_name = base.replace(".yaml", "")
+                if fragment:
+                    # Preserve fragment after rewriting base
+                    new_obj[key] = f"#/definitions/{schema_name}{sep}{fragment}"
+                else:
+                    new_obj[key] = f"#/definitions/{schema_name}"
             else:
                 new_obj[key] = value
```
224-242: Merge main schema's internal definitions into the consolidated output.

The spread operator on line 228 will overwrite the custom `title` and `description`, and the main schema's internal `definitions` aren't being merged into `consolidated["definitions"]`. This was previously flagged—should we reorder the spread and explicitly merge the main schema's definitions? wdyt?

Apply this diff:

```diff
     # Create a consolidated schema with definitions
     consolidated = {
+        **main_schema,
         "$schema": main_schema.get("$schema", "http://json-schema.org/draft-07/schema#"),
         "title": "Connector Metadata Schema",
         "description": "Consolidated JSON schema for Airbyte connector metadata validation",
-        **main_schema,
         "definitions": {},
     }
+
+    # Preserve any internal definitions from the main schema
+    if "definitions" in main_schema:
+        for def_name, def_content in main_schema["definitions"].items():
+            consolidated["definitions"][def_name] = fix_refs(def_content, in_definition=True)

     # Add all schemas (including their internal definitions) as top-level definitions
     for schema_name, schema_content in schemas.items():
```
🧹 Nitpick comments (2)
bin/generate_connector_metadata_files.py (2)
255-278: Consider adding error handling for network failures.

The script will crash immediately on any subprocess failure. While `check=True` ensures failures are caught, there's no retry logic for transient network issues or validation that the cloned directory contains schemas. Would it be worth adding basic error handling and validation, or is fail-fast acceptable for this build script? wdyt?

Example validation you could add:

```diff
     with tempfile.TemporaryDirectory() as temp_dir:
         temp_path = Path(temp_dir)
         schemas_dir = clone_schemas_from_github(temp_path)
+
+        # Validate that we actually got some schemas
+        yaml_files = list(schemas_dir.rglob("*.yaml"))
+        if not yaml_files:
+            print(f"Error: No YAML schemas found in {schemas_dir}", file=sys.stderr)
+            sys.exit(1)
+        print(f"Found {len(yaml_files)} YAML schema file(s)", file=sys.stderr)
```
99-176: The import filtering logic works but has some duplication.

The logic for detecting and filtering relative import blocks appears twice (lines 114-138 and 155-173). Both sections handle multi-line imports ending with `)` or `,`. Would consolidating this into a helper function make the code more maintainable? wdyt?

Example helper:

```python
def is_relative_import_line(line: str) -> bool:
    """Check if a line is part of a relative import statement."""
    stripped = line.strip()
    return stripped.startswith("from .") or stripped.startswith("import .")


def skip_relative_import_block(lines: list[str], start_idx: int) -> int:
    """Skip a relative import block and return the index after it."""
    if not lines[start_idx].rstrip().endswith(",") and not lines[start_idx].rstrip().endswith("("):
        return start_idx + 1
    idx = start_idx + 1
    while idx < len(lines):
        if lines[idx].strip().endswith(")"):
            return idx + 1
        idx += 1
    return idx
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bin/generate_connector_metadata_files.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
bin/generate_connector_metadata_files.py (1)
bin/generate_component_manifest_files.py (1)
main(113-155)
🪛 GitHub Actions: Linters
bin/generate_connector_metadata_files.py
[error] 1-1: ruff format --diff detected formatting changes required. 1 file would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
🔇 Additional comments (3)
bin/generate_connector_metadata_files.py (3)
76-97: Good use of uvx instead of Dagger!

The implementation correctly uses `uvx` to run datamodel-codegen with a pinned version, which aligns with the feedback to avoid Dagger in new code. The command-line arguments look appropriate for generating the models.
20-24: Good practice checking for pyyaml availability.

The try/except block with a clear error message is helpful for users who might be missing the dependency.
207-210: Verification complete—the type→$ref conversion is correct.

The script found actual examples in Airbyte's schema YAML files: `ConnectorMetrics.yaml` uses `type: ConnectorMetric` (a schema name, not a JSON Schema primitive). This confirms the conversion logic is intentional and necessary for handling Airbyte's custom schema structure. The code correctly transforms these schema name references into `$ref` pointers.
- Preserve main schema's internal definitions (don't overwrite them) - Only convert custom types to $ref, not JSON primitives (string, object, etc.) - Support YAML refs with fragments (e.g., Foo.yaml#/Bar) - Use regex to properly parse .yaml references with optional fragments Addresses CodeRabbit's critical feedback on bin/generate_connector_metadata_files.py Co-Authored-By: AJ Steers <aj@airbyte.io>
- Added type hints to fix_refs function and main function - Fixed ruff formatting issues - All mypy checks now pass for the new script Co-Authored-By: AJ Steers <aj@airbyte.io>
Actionable comments posted: 0
🧹 Nitpick comments (3)
bin/generate_connector_metadata_files.py (3)
40-53: Consider adding timeout to subprocess calls.

The git clone operation has no timeout, so it could hang indefinitely if the GitHub server is unresponsive. Would it make sense to add a reasonable timeout (e.g., 300 seconds) to avoid hanging the build? wdyt?

Apply this diff:

```diff
     subprocess.run(
         [
             "git",
             "clone",
             "--depth",
             "1",
             "--filter=blob:none",
             "--sparse",
             AIRBYTE_REPO_URL,
             str(clone_dir),
         ],
         check=True,
         capture_output=True,
+        timeout=300,
     )
```

Similarly for the sparse-checkout command at lines 55-59.
163-186: Consider adding timeout to subprocess call.

Similar to the git operations, the datamodel-codegen invocation has no timeout. Would it make sense to add one to prevent hanging builds? wdyt?

Apply this diff:

```diff
     subprocess.run(
         [
             "uvx",
             "--from",
             f"datamodel-code-generator=={DATAMODEL_CODEGEN_VERSION}",
             "datamodel-codegen",
             "--input",
             str(json_schema_path),
             "--output",
             str(output_file_path),
             "--input-file-type",
             "jsonschema",
             "--disable-timestamp",
             "--enum-field-as-literal",
             "one",
             "--set-default-enum-member",
             "--use-double-quotes",
             "--remove-special-field-name-prefix",
             "--field-extra-keys",
             "deprecated",
             "deprecation_message",
         ],
         check=True,
+        timeout=600,
     )
```
188-190: Fragile pydantic import rewriting could break in edge cases.

The simple string replacement `content.replace("from pydantic", "from pydantic.v1")` will incorrectly modify comments, docstrings, or string literals containing that phrase. While unlikely with generated code, would a regex-based approach targeting actual import statements be more robust? wdyt?

Apply this diff to use a more targeted regex replacement:

```diff
     content = output_file_path.read_text()
-    content = content.replace("from pydantic", "from pydantic.v1")
+    # Rewrite pydantic imports to use v1 compatibility layer
+    content = re.sub(
+        r'^(\s*)from pydantic import',
+        r'\1from pydantic.v1 import',
+        content,
+        flags=re.MULTILINE
+    )
+    content = re.sub(
+        r'^(\s*)import pydantic\b',
+        r'\1import pydantic.v1 as pydantic',
+        content,
+        flags=re.MULTILINE
+    )
     output_file_path.write_text(content)
```
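A quick standalone check of the difference between the two approaches (illustrative only, on a made-up generated-file snippet):

```python
import re

generated = '"""Models derived from pydantic schemas."""\nfrom pydantic import BaseModel\n'

# Naive replace also rewrites the docstring text:
naive = generated.replace("from pydantic", "from pydantic.v1")

# The anchored, multiline regex only touches real import statements:
safe = re.sub(
    r"^(\s*)from pydantic import",
    r"\1from pydantic.v1 import",
    generated,
    flags=re.MULTILINE,
)

print("pydantic.v1 schemas" in naive)               # True (docstring corrupted)
print("pydantic schemas" in safe)                   # True (docstring untouched)
print("from pydantic.v1 import BaseModel" in safe)  # True (import rewritten)
```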
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `airbyte_cdk/test/models/connector_metadata/generated/metadata_schema.json` is excluded by `!**/generated/**`
- `airbyte_cdk/test/models/connector_metadata/generated/models.py` is excluded by `!**/generated/**`
📒 Files selected for processing (1)
bin/generate_connector_metadata_files.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
bin/generate_connector_metadata_files.py (1)
bin/generate_component_manifest_files.py (1)
main(113-155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
bin/generate_connector_metadata_files.py (2)
195-214: LGTM! Clean orchestration logic.

The main function properly uses a temporary directory context manager and orchestrates the steps logically with helpful progress messages to stderr.
103-104: Remove or document the unused `type == "const"` branch.

The code silently drops entries where `type` equals `"const"` (lines 103-104), but our search found zero instances of this pattern in any source YAML schemas. Either this is dead code that can be removed, or it's defensive handling for a malformed edge case that should be documented with a comment explaining why it exists and how it should be handled. Which is it? If this is intentional, would you mind adding a comment clarifying the edge case and perhaps logging a warning when it occurs? That way future maintainers understand the rationale. wdyt?

Likely an incorrect or invalid review comment.
feat: Add metadata models package with dynamic schema download
Summary
This PR adds a new `airbyte_cdk.metadata_models` package containing auto-generated Pydantic models for validating connector `metadata.yaml` files. The models are generated from JSON Schema YAML files maintained in the airbytehq/airbyte repository.

Key changes:

- New `airbyte_cdk/metadata_models/` package with generated Pydantic models (~3400 lines)
- `bin/generate_component_manifest_files.py` extended to download schemas from GitHub on-demand during build
- Uses the `pydantic.v1` compatibility layer for consistency with existing declarative component models
- Adds a `py.typed` marker for type hint compliance

Usage example:
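A minimal sketch of the intended validation flow, with stand-in models in place of the generated ones (the real root model is the generated `ConnectorMetadataDefinitionV0`; the field names and import path here are illustrative assumptions):

```python
try:  # pydantic v2 ships the v1 API under pydantic.v1, matching the generated models
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # plain pydantic v1
    from pydantic import BaseModel, ValidationError


class Data(BaseModel):  # stand-in for the generated nested model
    name: str
    dockerRepository: str


class ConnectorMetadataDefinitionV0(BaseModel):  # stand-in for the generated root model
    metadataSpecVersion: str
    data: Data


raw = {  # in practice: yaml.safe_load(Path("metadata.yaml").read_text())
    "metadataSpecVersion": "1.0",
    "data": {"name": "Example Source", "dockerRepository": "airbyte/source-example"},
}
metadata = ConnectorMetadataDefinitionV0.parse_obj(raw)
print(metadata.data.dockerRepository)  # airbyte/source-example

try:
    ConnectorMetadataDefinitionV0.parse_obj({"metadataSpecVersion": "1.0", "data": {}})
except ValidationError:
    print("invalid metadata rejected")
```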
Review & Testing Checklist for Human
This is a yellow risk PR - the implementation is straightforward but has network dependencies and generated code that need verification:
- Test the generated models against a real `metadata.yaml` file to ensure they work correctly
- Run `poetry run poe build` in CI to ensure it works in that environment (may hit GitHub rate limits)
Requested by: AJ Steers (aj@airbyte.io) / @aaronsteers
Devin session: https://app.devin.ai/sessions/9b487ed33f5842c087a1de30d33db888
Known limitations:
Why this approach:
- Reuses the `datamodel-code-generator` infrastructure used for declarative component models

Summary by CodeRabbit
New Features
Documentation
Chores
Style