Auto-generate JSON Schema from individual YAML files by titusz · Pull Request #37 · iscc/iscc-schema

titusz · 2026-03-16T22:36:19Z

Summary

- Eliminate iscc-collection.yaml (534 lines of manually maintained duplication) by auto-flattening individual schema files in build_json_schema.py
- Fix 16 fields of accumulated drift (missing x-iscc-status, x-iscc-context, stale keywords description)
- Individual YAML schema files are now the single source of truth for all derived artifacts

Test plan

- `poe all` pipeline passes (24/24 tests)
- Generated JSON Schema property order matches previous output
- All content differences are drift fixes (verified via semantic diff)
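The semantic diff mentioned in the test plan could be approximated as follows. This is a minimal sketch, not the check actually used in this PR: `semantic_diff`, `MISSING`, and the sample schemas are hypothetical. It compares two parsed schemas as data, so key order and formatting do not count as differences.

```python
MISSING = "<missing>"  # hypothetical marker for a key absent on one side


def semantic_diff(old, new, path=""):
    """Yield (path, old_value, new_value) for every value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys so additions and removals are both reported.
        for key in sorted(old.keys() | new.keys()):
            sub = f"{path}/{key}"
            if key not in old:
                yield (sub, MISSING, new[key])
            elif key not in new:
                yield (sub, old[key], MISSING)
            else:
                yield from semantic_diff(old[key], new[key], sub)
    elif old != new:
        yield (path, old, new)


# Illustrative data only: a stale description and a missing status annotation.
old = {"properties": {"keywords": {"type": "string", "description": "stale"}}}
new = {
    "properties": {
        "keywords": {
            "description": "fresh",
            "type": "string",
            "x-iscc-status": "stable",
        }
    }
}
changes = list(semantic_diff(old, new))
```

Each reported path can then be checked by hand against the list of expected drift fixes.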

- Convert pyproject.toml from [tool.poetry] to PEP 621 [project] format
- Move dev dependencies to [dependency-groups]
- Switch build backend from poetry.core to hatchling
- Replace poetry.lock with uv.lock
- Update CI workflow to use astral-sh/setup-uv
- Update poe update task to use uv sync

- Update requires-python, classifiers, and black target-version
- Update datamodel-code-generator target to PY_310
- Update CI test matrix to 3.10, 3.11, 3.12

- Rewrite base.py: model_validator, ConfigDict, model_dump/model_dump_json
- Rewrite fields.py: AnyUrl with `__get_pydantic_core_schema__` and `__get_pydantic_json_schema__` for v2 custom type protocol
- Update build_code.py: robust regex-based import patching, remove pydantic.v1 compat shim from generated code
- Regenerate schema.py and generator.py with datamodel-code-generator 0.55
- Update build_json_ld_context.py for v2 JSON schema key format
- Update tests: remove pydantic.v1 imports, adapt to compact JSON output, use model_construct, ValidationError from pydantic
- Pin pydantic>=2 in dependencies

- Set explicit input_file_type=JsonSchema for schema generation
- Set formatters=[] to suppress FutureWarning about black/isort deprecation
- Run black on generated files in build_code.py so poe formatcode is a no-op

Bump package version from 0.4.1 to 0.5.0 and align schema/context URLs from 0.3.7 to 0.5.0. Regenerate all derived artifacts.

checkout v4→v6, setup-uv v6→v7, setup-python v5→v6

Move --md-typeset-a-color into a [data-md-color-scheme="slate"] selector so it only applies in dark mode, preventing it from overriding the default link color in light mode.

Add optional `signature` object to IsccMeta for compatibility with iscc-crypto's EdDSA/JCS signing protocol. The nested object supports version, controller, keyid, pubkey, and proof sub-fields.

- Add `units` field for individual ISCC-UNITs in composite ISCC-CODEs
- Add `text` field for extracted plaintext of digital content
- Widen `parts` items to accept both strings and objects
- Add `minLength: 1` to `name` and `description` fields
- Sync iscc-collection.yaml with individual schema files
- Fix `keywords` description typo ("sting" → "string")
- Align `credentials` description across schema files
- Remove `x-iscc-status: stable` from chain, wallet, and NFT fields

- declerations → declarations, Field → Fields (iscc-declaration)
- one ore more → one or more (iscc-declaration, iscc-collection)
- automaticaly → automatically (iscc-technical)
- secondes → seconds (iscc-technical, iscc-collection, iscc-generator)
- URI an → URI of an (iscc-technical, iscc-collection)

Eliminate iscc-collection.yaml (534 lines of manually maintained duplication) by auto-flattening individual schema files in build_json_schema.py. This fixes 16 fields of accumulated drift (missing x-iscc-status, x-iscc-context, stale keywords description) and prevents future sync issues.
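The flattening step might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in build_json_schema.py: `flatten_schemas`, the default title, and the sample properties are hypothetical, and the individual schemas are assumed to be already parsed from YAML into dicts with a top-level `properties` mapping.

```python
def flatten_schemas(schemas, title="IsccMeta"):
    """Merge the `properties` of several object schemas into one flat schema.

    Python dicts keep insertion order, so property order follows the order
    of the input schemas and of the properties inside each of them.
    """
    merged = {"type": "object", "title": title, "properties": {}}
    for schema in schemas:
        for name, definition in schema.get("properties", {}).items():
            existing = merged["properties"].get(name)
            # Identical re-declarations are tolerated; divergent ones are drift.
            if existing is not None and existing != definition:
                raise ValueError(f"conflicting definitions for {name!r}")
            merged["properties"][name] = definition
    return merged


# Illustrative stand-ins for two parsed schema files.
basic = {
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "description": {"type": "string", "minLength": 1},
    }
}
technical = {"properties": {"mediatype": {"type": "string"}}}

flat = flatten_schemas([basic, technical])
```

Raising on conflicting definitions is one possible design choice: it turns the kind of drift described above into a build failure instead of a silent divergence.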

Add iscc_code as an explicit alternative to the compact iscc field and nonce for cryptographic replay protection. Update descriptions on iscc, iscc_id, and iscc_code to recommend including at least one identifier.

- Give iscc_id and media_id distinct ISCC term IRIs to resolve triple collision on schema.org/identifier
- Emit @type: @id for URI-typed fields so JSON-LD processors recognize values as IRIs
- Add missing mode field to JSON-LD context
- Update generator reference schema from stale 0.3.2 to unversioned URLs
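To illustrate the `@type: @id` change, here is a minimal sketch of how a context builder could treat URI-typed fields. It is not the code in build_json_ld_context.py: `build_context` and the sample properties are hypothetical, and it assumes that each property stores its term IRI under `x-iscc-context` and marks URI values with `format: uri`.

```python
def build_context(properties):
    """Build JSON-LD term definitions from JSON Schema properties."""
    context = {}
    for name, prop in properties.items():
        iri = prop.get("x-iscc-context")
        if iri is None:
            continue  # no semantic mapping declared for this field
        if prop.get("format") == "uri":
            # Expanded term definition: values are coerced to IRIs.
            context[name] = {"@id": iri, "@type": "@id"}
        else:
            # Simple term definition: values stay plain literals.
            context[name] = iri
    return context


# Illustrative properties only; the IRIs are assumptions.
properties = {
    "name": {"type": "string", "x-iscc-context": "http://schema.org/name"},
    "license": {
        "type": "string",
        "format": "uri",
        "x-iscc-context": "http://schema.org/license",
    },
    "internal": {"type": "string"},
}
context = build_context(properties)
```

Without the `"@type": "@id"` coercion, a JSON-LD processor treats the `license` value as a string literal and not as a link to another resource.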

Add x-iscc-standard extension field to 22 YAML schema properties that are part of ISO 24138:2024. Surface the annotation in generated schema docs, vocabulary page, and terms includes.

Introduce three-category schema framework: ISCC Metadata (core vocabulary), Seed Metadata (industry-specific Meta-Code generation input), and Service Metadata (use-case-specific schemas for ISCC registries/gateways).

- Add ISBN/ISRC seed metadata schemas with full build pipeline integration
- Add TDM service metadata schema for AI content usage rights declarations (train, inference, derive, search, analyze; each reserved or open)
- Generate per-schema docs pages, JSON Schema, JSON-LD context terms
- Refactor build scripts to share standalone schema generation logic
- Fix misleading ISCC Metadata description (content vocabulary, not declaration)

Add optional tdm object field to iscc-embeddable schema alongside other rights fields (license, acquire, credit, rights). When present, all five reservation properties are required (train, inference, derive, search, analyze), each accepting "reserved" or "open".

…nal fields Align TDM schema with EU DSM Directive Art. 4 opt-out semantics. Fields are now optional (omitted = undetermined), additionalProperties is enforced in both Pydantic model and JSON Schema, and legally loaded terminology is replaced with neutral descriptions (e.g., "derivative works" → "content transformation").
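The resulting TDM rules (optional fields, two allowed values, no extra properties) can be sketched in plain Python. This is an illustration only, not the generated Pydantic model or the JSON Schema: `validate_tdm` and `effective_status` are hypothetical helpers.

```python
TDM_FIELDS = ("train", "inference", "derive", "search", "analyze")
TDM_VALUES = ("reserved", "open")


def validate_tdm(tdm):
    """Return a list of error messages; an empty list means the object is valid."""
    errors = []
    for key, value in tdm.items():
        if key not in TDM_FIELDS:
            # Mirrors additionalProperties: false.
            errors.append(f"unknown property: {key}")
        elif value not in TDM_VALUES:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors


def effective_status(tdm, field):
    """An omitted field is neither reserved nor open; it is undetermined."""
    return tdm.get(field, "undetermined")


partial = {"train": "reserved"}  # valid: the other four fields are omitted
broken = {"train": "maybe", "mine": "open"}
```

The key point is that an omitted field must not be read as "open": `effective_status(partial, "search")` yields `"undetermined"`.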

…ext() Embed @context directly into JSON Schema output so each schema is a self-contained artifact for validation and semantic mapping. Add recover_context() for runtime JSON-LD context recovery from plain JSON data.

Serialized ISCC data now carries versioned URLs so consumers can identify which schema version produced it. Unversioned URLs in YAML sources are patched to versioned form during code generation.
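Patching unversioned URLs to versioned form could be done with a small regex substitution like the one below. This is a sketch under assumptions: `add_version` is a hypothetical helper, the `example.org` base URL is a placeholder and not the project's real schema URL, and the versioned form is assumed to insert the version as a path segment directly after the base.

```python
import re


def add_version(text, base, version):
    """Insert `/<version>` after `base` unless a version segment already follows."""
    # Negative lookahead: skip URLs that already carry an x.y.z path segment.
    pattern = re.compile(re.escape(base) + r"/(?!\d+\.\d+\.\d+/)")
    return pattern.sub(lambda match: f"{base}/{version}/", text)


source = '"$schema": "https://example.org/schema/iscc.json"'
patched = add_version(source, "https://example.org/schema", "0.5.0")
```

Because of the lookahead, running the patch twice leaves the text unchanged. The same lookahead also means that URLs pinned to an older version are left as they are.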

Limit push triggers to main/develop branches so feature branch pushes only fire via pull_request, preventing two identical workflow runs from racing to save the same setup-uv cache key.

… types Introduces an optional `form` field on IsccMeta that classifies what the content *is* (book, article, movie, photograph, etc.) using a curated set of 22 Schema.org CreativeWork subtypes. Maps to `schema.org/additionalType` in JSON-LD context with IRI mappings for all enum values.

Add GenAI Service Metadata schema for generative AI disclosure signals with graduated involvement levels (human, ai_assisted, human_supervised, ai_generated), AI system identification, and IPTC Digital Source Type interoperability. Move tdm and genai fields from iscc-embeddable to iscc-extended since structured objects belong in the gateway/registry metadata layer, not in media-embeddable metadata.

Add datasize field to iscc-technical for cases where the ISCC is computed over data that is not a standalone file, such as individual planes within bioimages or scenes within multi-scene containers.

Add stable/draft status to all fields across YAML schemas and surface the status in generated schema docs, vocabulary page, and terms includes.

Replace mkdocs-material with zensical for documentation generation, matching the ISCC theme and features established in iscc-usearch.

- Add zensical.toml config with ISCC brand theming and sidebar navigation
- Add copy-page split-button for LLM-friendly markdown export
- Add ISCC-AI copilot widget integration
- Add llms.txt, robots.txt, and per-page markdown generation (gen_llms_full.py)
- Add "For Coding Agents" dense reference page
- Add GitHub Pages deployment workflow (docs.yml)
- Add Open Graph/Twitter Card meta tags and Plausible analytics
- Add navigation icons and short titles to all doc pages
- Add ISCC wordmark logo and favicon assets
- Update README badges to match iscc-usearch convention
- Update CHANGELOG with documentation migration entries
- Delete mkdocs.yml and analytics partial override

Rewrite README.md and docs/index.md from verbose prose to a concise overview with install, quick start, schema categories, and artifacts table. Replace NFT-only examples with comprehensive samples covering all schema categories. Add guide.md to navigation.

…ctions Group orphan top-level pages under section headings for clearer navigation. Guide and Examples under Getting Started, all schema pages plus Vocabulary and Versioning under Reference, Contributing/Agents/Changelog under Project.

- Remove unused Formatter import from build_code.py
- Add genai.json/genai.yaml to standalone schema test assertions

titusz added 16 commits March 16, 2026 20:15

Update .gitignore to comprehensive Python template

47a2842

Require Python >=3.10,<3.15

6af119f

Suppress datamodel-code-generator warnings and clean up build output

b5a7155

Add Python 3.13 and 3.14 to CI test matrix

fa37bd8

Update changelog with unreleased changes summary

d91469f

Bump version to 0.5.0 and sync schema version

f9187db

Remove upper version pin on mkdocs-material

14e5328

Update CI actions to latest versions

c95cebc

style: scope link color override to dark theme selector

35eae50

feat: Add signature field for iscc-crypto compatibility

abf42a2

fix: Remove image field from iscc-basic example

128d191

This was referenced Mar 16, 2026

Add Python 3.14 support (migrate from Pydantic v1 to v2) #36

Closed

Add signature field #10

Closed

Add credentials field #33

Closed

Create a single schema model with tags for autogenerated subsets #26

Closed

titusz added 10 commits March 17, 2026 00:04

feat: Add iscc_code and nonce fields, clarify ISCC identifier guidance

de4d2bb

feat: Mark ISO 24138:2024 fields and surface in generated docs

87b474f

feat: Emit versioned $schema and @context URLs from Pydantic models

685843b

fix: Avoid duplicate CI runs causing uv cache conflicts

e838430

titusz added 13 commits March 17, 2026 16:54

feat: Add datasize field for sub-file ISCC generation

e28ccff

feat: Add x-iscc-status annotations and render in generated docs

bceff15

chore: Move docs/images to docs/assets and update references

36e9884

chore: Move gen_llms_full.py from scripts/ to tools/ for consistency

90997a0

docs: Add contributing guide and fix formatyaml task name typo

9284a1b

docs: Clarify JSON Schema and JSON-LD roles in guide

436f3b9

docs: Simplify context recovery intro in guide

c8079a3

fix: Remove unused import and add genai.json to test coverage

8c94832

ci: Add tag-triggered release workflow for PyPI publishing

a5f1bae

titusz merged commit d483023 into main Mar 17, 2026
29 of 30 checks passed

titusz merged 39 commits into main from develop