Merged
45 changes: 37 additions & 8 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -65,32 +65,61 @@ pydantic2linkml [OPTIONS] MODULE_NAMES...
pydantic2linkml -o output.yml -l INFO dandischema.models
```

Options: `--output-file`/`-o` (path), `--log-level`/`-l` (default: WARNING).
Options:

- `--output-file`/`-o` (path) — write output to a file instead of stdout
- `--merge-file`/`-M` (path) — deep-merge a YAML file into the generated
schema; values from the file win on conflict; no field filtering applied
- `--overlay-file`/`-O` (path) — shallow-merge a YAML file into the
generated schema; only `SchemaDefinition` fields are applied; unknown
keys are skipped with a warning
- `--log-level`/`-l` (default: WARNING)
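
The difference between `-M` and `-O` can be sketched in plain Python. This is an illustrative stand-in, not the actual implementation (the real code uses the `deepmerge` package and restricts overlay keys to `SchemaDefinition` fields):

```python
# Illustrative sketch of the two merge strategies; not the real implementation.
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflict (-M)."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = deep_merge(out[key], value)  # descend into nested mappings
        else:
            out[key] = value  # scalars and lists: the file's value wins
    return out


def shallow_overlay(base: dict, overlay: dict, allowed: set) -> dict:
    """Replace top-level values wholesale, skipping unknown keys (-O)."""
    out = dict(base)
    for key, value in overlay.items():
        if key in allowed:
            out[key] = value  # whole top-level value is replaced, no recursion
    return out


schema = {"name": "s", "classes": {"Foo": {"description": "old"}}}
patch = {"classes": {"Foo": {"description": "new"}}}
print(deep_merge(schema, patch))
# → {'name': 's', 'classes': {'Foo': {'description': 'new'}}}
```

A deep merge preserves sibling keys inside nested mappings such as `classes`, while a shallow overlay would replace the entire `classes` mapping with the one from the file.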

## Architecture

### Core Translation Pipeline

1. **`tools.py`** — Low-level utilities for introspecting Pydantic internals:
1. **`tools.py`** — Low-level utilities for introspecting Pydantic internals
and post-processing the generated schema YAML:
- `get_all_modules()` — imports modules and collects them with submodules
- `fetch_defs()` — extracts `BaseModel` subclasses and `Enum` subclasses from modules
- `get_field_schema()` / `get_locally_defined_fields()` — extracts resolved `pydantic_core.CoreSchema` objects for fields, distinguishing newly defined vs. overriding fields
- `FieldSchema` (NamedTuple) — bundles a field's core schema, its resolution context, field name, `FieldInfo`, and owning model
- `resolve_ref_schema()` — resolves `definition-ref` and `definitions` schema types to concrete schemas
- `fetch_defs()` — extracts `BaseModel` subclasses and `Enum` subclasses
from modules
- `get_field_schema()` / `get_locally_defined_fields()` — extracts
resolved `pydantic_core.CoreSchema` objects for fields, distinguishing
newly defined vs. overriding fields
- `FieldSchema` (NamedTuple) — bundles a field's core schema, its
resolution context, field name, `FieldInfo`, and owning model
- `resolve_ref_schema()` — resolves `definition-ref` and `definitions`
schema types to concrete schemas
- `apply_schema_overlay(schema_yml, overlay_file)` — shallow-merges a
YAML file into a schema YAML string; restricts keys to
`SchemaDefinition` fields
- `apply_yaml_deep_merge(schema_yml, merge_file)` — deep-merges a YAML
file into a schema YAML string using `deepmerge`; no field filtering
- `remove_schema_key_duplication(yml)` — strips redundant `name`/`text`
fields from serialized LinkML YAML
- `add_section_breaks(yml)` — inserts blank lines before top-level
sections

2. **`gen_linkml.py`** — Main translation logic:
- `translate_defs(module_names)` — top-level entry point; loads modules, fetches defs, runs `LinkmlGenerator`
- `LinkmlGenerator` — single-use class; converts a collection of Pydantic models and enums into a `SchemaDefinition`. Call `generate()` once per instance.
- `SlotGenerator` — single-use class; translates a single Pydantic `CoreSchema` into a `SlotDefinition`. Dispatches on schema `type` strings via handler methods. Handles nesting, optionality, lists, unions, literals, UUIDs, dates, etc.
- `any_class_def` — module-level `ClassDefinition` constant for the LinkML `Any` type

3. **`cli/`** — Typer-based CLI wrapping `translate_defs`; `cli/__init__.py` defines the `app` and `main` command.
3. **`cli/`** — Typer-based CLI wrapping `translate_defs`; `cli/__init__.py`
defines the `app` and `main` command. After translation the pipeline is:
dump YAML → `remove_schema_key_duplication` → optional `-M` deep merge
→ optional `-O` overlay → `add_section_breaks` → output.
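
The sequencing can be made concrete with trivial stand-ins for the `tools.py` helpers; these stubs are hypothetical and reduced to string concatenation so only the ordering (the `-M` deep merge before the `-O` overlay) is demonstrated:

```python
# Hypothetical stand-ins for the tools.py helpers, reduced to string
# concatenation so the order of application is visible in the result.
def remove_schema_key_duplication(yml: str) -> str:
    return yml  # real version strips redundant name/text fields


def apply_yaml_deep_merge(yml: str, merge: str) -> str:
    return yml + merge  # real version deep-merges a YAML mapping (-M)


def apply_schema_overlay(yml: str, overlay: str) -> str:
    return yml + overlay  # real version shallow-merges SchemaDefinition fields (-O)


def add_section_breaks(yml: str) -> str:
    return yml  # real version inserts blank lines before top-level sections


def postprocess(yml, merge=None, overlay=None):
    yml = remove_schema_key_duplication(yml)
    if merge is not None:
        yml = apply_yaml_deep_merge(yml, merge)  # -M runs first
    if overlay is not None:
        yml = apply_schema_overlay(yml, overlay)  # -O runs second, after -M
    return add_section_breaks(yml)


print(postprocess("base ", merge="merged ", overlay="overlaid"))  # → base merged overlaid
```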

4. **`exceptions.py`** — Custom exceptions:
- `NameCollisionError` — duplicate class/enum names across modules
- `GeneratorReuseError` — attempting to reuse a single-use generator
- `TranslationNotImplementedError` — schema type not yet handled
- `SlotExtensionError` — cannot extend a base slot to match a target via slot_usage
- `SlotExtensionError` — cannot extend a base slot to match a target
via slot_usage
- `YAMLContentError` — YAML file content is not what is expected (e.g.,
not a mapping)

### Key Design Patterns

9 changes: 9 additions & 0 deletions README.md
@@ -10,3 +10,12 @@ A tool for translating models expressed in Pydantic to LinkML
```console
pydantic2linkml -o o.yml -l INFO dandischema.models
```

### Options

| Flag | Description |
|------|-------------|
| `-o` / `--output-file` | Write output to a file (default: stdout) |
| `-M` / `--merge-file` | Deep-merge a YAML file into the generated schema. Values from the file win on conflict; no field filtering is applied. |
| `-O` / `--overlay-file` | Shallow-merge a YAML file into the generated schema. Only `SchemaDefinition` fields are applied; unknown keys are skipped with a warning. |
| `-l` / `--log-level` | Log level (default: `WARNING`) |
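
For example, a hypothetical merge file passed via `-M` might layer curated metadata onto the generated schema (the class name `MyModel` is illustrative):

```yaml
# merge.yaml: deep-merged into the generated schema; file values win on conflict
description: Curated description for the generated schema
classes:
  MyModel:
    description: Hand-written text merged onto the generated class
```

Because the merge is unrestricted, the result is always valid YAML but not necessarily a valid LinkML schema; validating the merged output is the user's responsibility.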
1 change: 1 addition & 0 deletions pyproject.toml
@@ -27,6 +27,7 @@ classifiers = [
"Programming Language :: Python :: Implementation :: CPython",
]
dependencies = [
"deepmerge",
"linkml",
"pydantic~=2.7,<2.11",
"PyYAML",
39 changes: 36 additions & 3 deletions src/pydantic2linkml/cli/__init__.py
@@ -3,15 +3,17 @@
from typing import Annotated, Optional

import typer
import yaml
from linkml_runtime.dumpers import yaml_dumper
from pydantic import ValidationError

from pydantic2linkml.cli.tools import LogLevel
from pydantic2linkml.exceptions import OverlayContentError
from pydantic2linkml.exceptions import YAMLContentError
from pydantic2linkml.gen_linkml import translate_defs
from pydantic2linkml.tools import (
add_section_breaks,
apply_schema_overlay,
apply_yaml_deep_merge,
remove_schema_key_duplication,
)

@@ -22,6 +24,18 @@
@app.command()
def main(
module_names: list[str],
merge_file: Annotated[
Optional[Path],
typer.Option(
"--merge-file",
"-M",
help="A YAML file whose contents are deep-merged into the generated "
"schema. Values from this file win on conflict. The result is "
"always a valid YAML file but may not be a valid LinkML schema — "
"it is the user's responsibility to supply a merge file that "
"produces a valid schema.",
),
] = None,
overlay_file: Annotated[
Optional[Path],
typer.Option(
@@ -46,6 +60,25 @@ def main(
schema = translate_defs(module_names)
logger.info("Dumping schema")
yml = remove_schema_key_duplication(yaml_dumper.dumps(schema))
if merge_file is not None:
logger.info("Applying deep merge from %s", merge_file)
try:
yml = apply_yaml_deep_merge(schema_yml=yml, merge_file=merge_file)
except ValidationError as e:
raise typer.BadParameter(
f"The merge file path is invalid: {e}",
param_hint="'--merge-file'",
) from e
except yaml.YAMLError as e:
raise typer.BadParameter(
f"The merge file does not contain valid YAML: {e}",
param_hint="'--merge-file'",
) from e
except YAMLContentError as e:
raise typer.BadParameter(
f"The merge file does not contain a valid YAML mapping: {e}",
param_hint="'--merge-file'",
) from e
if overlay_file is not None:
logger.info("Applying overlay from %s", overlay_file)
try:
@@ -55,14 +88,14 @@ def main(
f"The overlay file path is invalid: {e}",
param_hint="'--overlay-file'",
) from e
except OverlayContentError as e:
except YAMLContentError as e:
raise typer.BadParameter(
f"The overlay file does not contain a valid YAML mapping: {e}",
param_hint="'--overlay-file'",
) from e
yml = add_section_breaks(yml)
if not output_file:
print(yml, end='') # noqa: T201
print(yml, end="") # noqa: T201
else:
with output_file.open("w") as f:
f.write(yml)
4 changes: 2 additions & 2 deletions src/pydantic2linkml/exceptions.py
@@ -106,7 +106,7 @@ def __repr__(self):
)


class OverlayContentError(ValueError):
class YAMLContentError(ValueError):
"""
Raise when the content of an overlay file is not a valid YAML mapping
Raise when the content of a YAML file is not what is expected
"""
46 changes: 43 additions & 3 deletions src/pydantic2linkml/tools.py
@@ -23,7 +23,7 @@

from pydantic2linkml.exceptions import (
NameCollisionError,
OverlayContentError,
YAMLContentError,
SlotExtensionError,
)

@@ -543,7 +543,7 @@ def apply_schema_overlay(schema_yml: str, overlay_file: FilePath) -> str:
:return: YAML string with the overlay applied, keys ordered to match
SchemaDefinition field order
:raises ValueError: If ``schema_yml`` does not deserialize to a dict
:raises OverlayContentError: If the overlay file does not contain a YAML
:raises YAMLContentError: If the overlay file does not contain a YAML
mapping
"""
schema_dict = yaml.safe_load(schema_yml)
@@ -556,7 +556,7 @@ def apply_schema_overlay(schema_yml: str, overlay_file: FilePath) -> str:
overlay = yaml.safe_load(f)

if not isinstance(overlay, dict):
raise OverlayContentError(
raise YAMLContentError(
f"Overlay file {overlay_file} must contain a YAML mapping"
)

@@ -580,6 +580,46 @@ def apply_schema_overlay(schema_yml: str, overlay_file: FilePath) -> str:
return yaml.dump(ordered, allow_unicode=True, sort_keys=False)


@validate_call
def apply_yaml_deep_merge(schema_yml: str, merge_file: FilePath) -> str:
"""Deep-merge a YAML file into a serialized schema YAML string.

Values from the merge file win on conflict. The merge is unrestricted —
no field filtering is applied.

:param schema_yml: YAML string of a valid LinkML schema
:param merge_file: Path to an existing YAML file containing a mapping
:return: YAML string with the deep merge applied
:raises ValueError: If ``schema_yml`` does not contain valid YAML or does
not deserialize to a dict
:raises yaml.YAMLError: If the merge file does not contain valid YAML
:raises YAMLContentError: If the merge file does not contain a YAML mapping
"""
from deepmerge import always_merger

try:
schema_dict = yaml.safe_load(schema_yml)
except yaml.YAMLError as e:
raise ValueError(f"schema_yml does not contain valid YAML: {e}") from e

if not isinstance(schema_dict, dict):
raise ValueError(
f"schema_yml did not deserialize to a dict: {type(schema_dict)}"
)

with merge_file.open() as f:
merge_dict = yaml.safe_load(f) # raises yaml.YAMLError on invalid YAML

if not isinstance(merge_dict, dict):
raise YAMLContentError(f"Merge file {merge_file} must contain a YAML mapping")

return yaml.dump(
always_merger.merge(schema_dict, merge_dict),
allow_unicode=True,
sort_keys=False,
)


def remove_schema_key_duplication(yml: str) -> str:
"""Remove redundant name/text fields from a valid serialized LinkML schema.

50 changes: 49 additions & 1 deletion tests/test_cli.py
@@ -6,7 +6,11 @@

from pydantic2linkml.cli import app, main

runner = CliRunner()
# Use a wide terminal so Typer's Rich error boxes are never wrapped across lines.
# terminal_width kwarg is not sufficient because Typer's Rich-based error
# formatting reads terminal width from shutil.get_terminal_size(), which
# respects the COLUMNS environment variable.
runner = CliRunner(env={"COLUMNS": "200"})

_MOCK_SCHEMA = SchemaDefinition(id="https://example.com/test", name="test-schema")

@@ -58,3 +62,47 @@ def test_unknown_key(self, tmp_path: Path):
result = runner.invoke(app, ["dandischema.models", "-O", str(overlay_file)])
assert result.exit_code == 0
assert "not_a_field" not in result.output


class TestCliDeepMerge:
@pytest.fixture(autouse=True)
def mock_translate_defs(self, mocker):
mocker.patch("pydantic2linkml.cli.translate_defs", return_value=_MOCK_SCHEMA)

def test_valid_field(self, tmp_path: Path):
merge_file = tmp_path / "merge.yaml"
merge_file.write_text("name: my-name\n")
result = runner.invoke(app, ["dandischema.models", "-M", str(merge_file)])
assert result.exit_code == 0
assert "name: my-name" in result.output

def test_nested_merge(self, tmp_path: Path):
merge_file = tmp_path / "merge.yaml"
merge_file.write_text("classes:\n Foo:\n description: test-desc\n")
result = runner.invoke(app, ["dandischema.models", "-M", str(merge_file)])
assert result.exit_code == 0
assert "description: test-desc" in result.output
# Original top-level fields are preserved
assert "id: https://example.com/test" in result.output

def test_nonexistent_file(self, tmp_path: Path):
result = runner.invoke(
app,
["dandischema.models", "-M", str(tmp_path / "no-such-file.yaml")],
)
assert result.exit_code == 2
assert "merge file path is invalid" in result.output

def test_non_mapping(self, tmp_path: Path):
merge_file = tmp_path / "merge.yaml"
merge_file.write_text("- item1\n")
result = runner.invoke(app, ["dandischema.models", "-M", str(merge_file)])
assert result.exit_code == 2
assert "does not contain a valid YAML mapping" in result.output

def test_invalid_yaml(self, tmp_path: Path):
merge_file = tmp_path / "merge.yaml"
merge_file.write_text("key: [unclosed\n")
result = runner.invoke(app, ["dandischema.models", "-M", str(merge_file)])
assert result.exit_code == 2
assert "does not contain valid YAML" in result.output