
Ensure canonical key order and schema validity after merge/overlay #44

Merged
candleindark merged 2 commits into dandi:main from candleindark:enh-canonical-order-and-schema-validation
Mar 26, 2026

@candleindark (Member) commented Mar 26, 2026

Closes #43.

Summary

  • Canonical key ordering: after -M (deep merge) or -O (overlay),
    the output YAML is round-tripped through SchemaDefinition via
    canonicalize_schema_yml, so keys always appear in the same order as a
    freshly serialized SchemaDefinition.
  • Schema validation: the canonical output is validated against the
    LinkML meta schema using linkml.validator.Validator with
    JsonschemaValidationPlugin(closed=True). Unknown field names and
    wrong-type values raise InvalidLinkMLSchemaError, which the CLI
    surfaces as a BadParameter error.
  • prefix_prefix deduplication: remove_schema_key_duplication now
    also strips the redundant prefix_prefix key from each prefix entry
    (the dict key already identifies the prefix).
  • Distinct error messages: the two detection paths produce different
    prefixes ("Unknown field in schema:" from yaml_loader TypeError vs.
    "Schema validation failed:" from the meta-schema validator), making
    the error origin unambiguous in tests and in the field.
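
The canonical-ordering idea in the first bullet can be pictured with a stdlib-only sketch. `ToySchema` and `canonicalize` are hypothetical stand-ins, not `SchemaDefinition` or the PR's `canonicalize_schema_yml`; the only assumption carried over is that round-tripping a mapping through a typed object yields keys in the object's declared field order and rejects unknown keys.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ToySchema:
    """Hypothetical stand-in for SchemaDefinition.

    The declaration order of the fields plays the role of the canonical order.
    """

    id: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    classes: Optional[dict] = None


def canonicalize(raw: dict[str, Any]) -> dict[str, Any]:
    """Round-trip a mapping through ToySchema so keys come out in field order."""
    # Unknown keys make the dataclass constructor raise TypeError; the PR
    # reports the analogous failure as InvalidLinkMLSchemaError.
    obj = ToySchema(**raw)
    # Emit only the fields that were set, in declaration order.
    return {
        f.name: getattr(obj, f.name)
        for f in fields(obj)
        if getattr(obj, f.name) is not None
    }
```

Calling `canonicalize({"classes": {}, "name": "demo", "id": "x"})` returns the keys in the order `id`, `name`, `classes`, regardless of the order in which they were supplied.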

Test plan

  • hatch run test.py3.10:pytest tests/ — all 146 tests pass
  • ruff check . && ruff format --check . — clean
  • TestCanonicalizeSchemaYml.test_wrong_type_raises_invalid_schema_error — mocks yaml_dumper.dumps to inject a wrong-type canonical YAML and asserts "Schema validation failed:" in the error
  • TestApplySchemaOverlay.test_unknown_field_raises_invalid_schema_error — asserts "Unknown field in schema:" prefix
  • TestApplyYamlDeepMerge.test_unknown_field_raises_invalid_schema_error — asserts "Unknown field in schema:" prefix
  • Manual smoke test: pydantic2linkml dandischema.models | head -20
  • Manual error test: echo "not_a_real_field: foo" > /tmp/bad.yaml && pydantic2linkml -M /tmp/bad.yaml dandischema.models

🤖 Generated with Claude Code

candleindark and others added 2 commits March 25, 2026 17:33
After `-M` deep merge, output keys could appear in non-canonical order
because deepmerge preserves dict insertion order. After `-O` overlay, key
ordering relied on a manual SchemaDefinition field list that also silently
dropped unknown keys.
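
To illustrate the ordering problem described above, here is a minimal stdlib sketch of a recursive merge. It is not the `deepmerge` library and `deep_merge` is a hypothetical helper; it only shows how insertion-order-preserving dicts place keys contributed by the merged-in document after the existing ones.

```python
def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively merge patch into a copy of base; new keys are appended last."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the patch value wins (or the key is newly appended).
            merged[key] = value
    return merged


base = {"name": "demo", "classes": {"A": {}}}
patch = {"id": "https://example.org/demo", "classes": {"B": {}}}
merged = deep_merge(base, patch)
print(list(merged))  # ['name', 'classes', 'id']
```

Here `id` lands last only because it arrived via the patch, which is the kind of non-canonical order a round-trip through a typed schema object removes.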

Both functions now call a new `canonicalize_schema_yml` helper that
round-trips the YAML through `SchemaDefinition` via linkml-runtime's
yaml_loader/yaml_dumper. This produces canonical key ordering and raises
`InvalidLinkMLSchemaError` (new exception) for any field name unknown to
`SchemaDefinition` or its nested objects, which the CLI converts to a
`BadParameter` error.

`remove_schema_key_duplication` is moved to after both merge/overlay
steps in the CLI pipeline (so it strips the `name`/`text`/`prefix_prefix`
fields re-introduced by the round-trip), and is extended to also strip the
redundant `prefix_prefix` key from prefix entries.
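
The `prefix_prefix` stripping can be sketched as follows. `strip_prefix_prefix` is a hypothetical helper written for illustration, not the PR's `remove_schema_key_duplication`, which also handles the `name` and `text` fields; the sketch assumes the schema has already been loaded into plain dicts.

```python
def strip_prefix_prefix(schema: dict) -> dict:
    """Return a copy of schema whose prefix entries lack the prefix_prefix key."""
    prefixes = schema.get("prefixes")
    if not isinstance(prefixes, dict):
        # Nothing to clean; return a shallow copy for a consistent contract.
        return dict(schema)
    cleaned = {}
    for key, entry in prefixes.items():
        if isinstance(entry, dict):
            # The dict key already names the prefix, so the inner copy is redundant.
            cleaned[key] = {k: v for k, v in entry.items() if k != "prefix_prefix"}
        else:
            cleaned[key] = entry
    return {**schema, "prefixes": cleaned}
```

For an entry such as `{"linkml": {"prefix_prefix": "linkml", "prefix_reference": "https://w3id.org/linkml/"}}`, only `prefix_reference` remains under the `linkml` key.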

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add `_get_meta_schema_validator()` (lazily initialized, cached via
`functools.cache`) and extend `canonicalize_schema_yml` to validate the
canonical output against the LinkML meta schema using
`linkml.validator.Validator` with `JsonschemaValidationPlugin(closed=True)`.
This catches unknown field names and wrong-type values that the
`yaml_loader` round-trip alone does not detect.
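
The lazy, cached initialization pattern mentioned above might look roughly like this. `ToyValidator` and `get_validator` are hypothetical stand-ins for the real validator and `_get_meta_schema_validator()`; the sketch shows only how `functools.cache` defers construction to the first call and then reuses the same object.

```python
import functools


class ToyValidator:
    """Hypothetical stand-in for a validator that is expensive to construct."""

    instances = 0  # counts how many times the constructor has run

    def __init__(self) -> None:
        ToyValidator.instances += 1


@functools.cache
def get_validator() -> ToyValidator:
    """Build the validator on the first call; later calls return the cached one."""
    return ToyValidator()
```

Nothing is constructed at import time, and repeated calls to `get_validator()` return the identical instance, so the construction cost is paid at most once per process.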

The two detection paths now produce distinct `InvalidLinkMLSchemaError`
messages: "Unknown field in schema:" for `TypeError` from `yaml_loader`,
and "Schema validation failed:" for violations found by the meta-schema
validator. CLI `BadParameter` messages and all documentation are updated
accordingly.
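
A rough sketch of how two detection paths can feed one exception type with distinct prefixes is given below. The `load` and `validate` callables are hypothetical injection points standing in for the `yaml_loader` round-trip and the meta-schema validator, and the local `InvalidLinkMLSchemaError` class is a stand-in for the one added in this PR.

```python
from typing import Any, Callable, Iterable


class InvalidLinkMLSchemaError(ValueError):
    """Stand-in for the exception introduced in this PR."""


def check_schema(
    raw: dict,
    load: Callable[[dict], Any],
    validate: Callable[[Any], Iterable[str]],
) -> Any:
    """Load raw, then validate it, tagging each failure with its origin."""
    try:
        loaded = load(raw)
    except TypeError as e:
        # Path 1: the loader rejected a field name it does not know.
        raise InvalidLinkMLSchemaError(f"Unknown field in schema: {e}") from e
    problems = list(validate(loaded))
    if problems:
        # Path 2: the loader accepted the data but validation found violations.
        raise InvalidLinkMLSchemaError(
            "Schema validation failed: " + "; ".join(problems)
        )
    return loaded
```

Because the prefix is chosen at the point of failure, a test can assert on the prefix alone to confirm which path fired.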

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@candleindark candleindark merged commit db476a1 into dandi:main Mar 26, 2026
13 checks passed
@candleindark candleindark deleted the enh-canonical-order-and-schema-validation branch March 26, 2026 06:43

Development

Successfully merging this pull request may close these issues.

Ensure order of key-value pairs in LinkML schema and validity of LinkML schema