
Auto-generate JSON Schema from individual YAML files #37

Merged
titusz merged 39 commits into main from develop on Mar 17, 2026

@titusz
Member

titusz commented on Mar 16, 2026

Summary

  • Eliminate iscc-collection.yaml (534 lines of manually maintained duplication) by auto-flattening individual schema files in build_json_schema.py
  • Fix 16 fields of accumulated drift (missing x-iscc-status, x-iscc-context, stale keywords description)
  • Individual YAML schema files are now the single source of truth for all derived artifacts

Test plan

  • `poe all` pipeline passes (24/24 tests)
  • Generated JSON Schema property order matches previous output
  • All content differences are drift fixes (verified via semantic diff)
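The auto-flattening idea can be illustrated with a minimal sketch, assuming each YAML file has already been parsed into a dict with a `properties` mapping. `flatten_schemas` is a hypothetical name, not the actual function in build_json_schema.py, and raising on conflicting definitions is an assumption about how drift between files could be caught. Property order follows file order and definition order, which is what makes an "order matches previous output" check possible.

```python
from typing import Any


def flatten_schemas(fragments: list[dict[str, Any]], title: str) -> dict[str, Any]:
    """Merge the "properties" of several schema fragments into one JSON Schema.

    Field order follows the order of `fragments` and, within each fragment,
    the order in which properties were defined (dicts preserve insertion order).
    """
    properties: dict[str, Any] = {}
    for fragment in fragments:
        for name, spec in fragment.get("properties", {}).items():
            if name in properties:
                if properties[name] != spec:
                    # Same field defined differently in two files: this is
                    # the kind of drift a flattening step should surface.
                    raise ValueError(f"conflicting definitions for {name!r}")
                continue  # identical duplicate: keep the first occurrence
            properties[name] = spec
    return {"title": title, "type": "object", "properties": properties}


# Toy fragments standing in for parsed YAML files (contents are illustrative).
basic = {"properties": {"name": {"type": "string", "minLength": 1}}}
technical = {"properties": {"datasize": {"type": "integer"}}}

schema = flatten_schemas([basic, technical], title="ISCC Metadata")
print(list(schema["properties"]))  # ['name', 'datasize']
```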

titusz added 16 commits March 16, 2026 20:15
- Convert pyproject.toml from [tool.poetry] to PEP 621 [project] format
- Move dev dependencies to [dependency-groups]
- Switch build backend from poetry.core to hatchling
- Replace poetry.lock with uv.lock
- Update CI workflow to use astral-sh/setup-uv
- Update poe update task to use uv sync
- Update requires-python, classifiers, and black target-version
- Update datamodel-code-generator target to PY_310
- Update CI test matrix to 3.10, 3.11, 3.12
- Rewrite base.py: model_validator, ConfigDict, model_dump/model_dump_json
- Rewrite fields.py: AnyUrl with __get_pydantic_core_schema__ and
  __get_pydantic_json_schema__ for v2 custom type protocol
- Update build_code.py: robust regex-based import patching, remove
  pydantic.v1 compat shim from generated code
- Regenerate schema.py and generator.py with datamodel-code-generator 0.55
- Update build_json_ld_context.py for v2 JSON schema key format
- Update tests: remove pydantic.v1 imports, adapt to compact JSON output,
  use model_construct, ValidationError from pydantic
- Pin pydantic>=2 in dependencies
- Set explicit input_file_type=JsonSchema for schema generation
- Set formatters=[] to suppress FutureWarning about black/isort deprecation
- Run black on generated files in build_code.py so poe formatcode is a no-op

Bump package version from 0.4.1 to 0.5.0 and align schema/context
URLs from 0.3.7 to 0.5.0. Regenerate all derived artifacts.

Bump GitHub Actions: checkout v4→v6, setup-uv v6→v7, setup-python v5→v6

Move --md-typeset-a-color into a [data-md-color-scheme="slate"]
selector so it only applies in dark mode, preventing it from
overriding the default link color in light mode.
Add optional `signature` object to IsccMeta for compatibility with
iscc-crypto's EdDSA/JCS signing protocol. The nested object supports
version, controller, keyid, pubkey, and proof sub-fields.
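For the new `signature` object, a minimal structural check might look like the sketch below. The five sub-field names come from the commit message; typing each as an optional string is an assumption, and the sketch does not model the EdDSA/JCS signing protocol itself. `Signature` and `signature_problems` are hypothetical names.

```python
from typing import Any, TypedDict


class Signature(TypedDict, total=False):
    """Hypothetical typed view of the optional `signature` object.

    Sub-field names come from the schema change; typing every one of
    them as an optional string is an assumption made for this sketch.
    """

    version: str
    controller: str
    keyid: str
    pubkey: str
    proof: str


SIGNATURE_KEYS = frozenset(Signature.__annotations__)


def signature_problems(obj: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a `signature` object (empty if none)."""
    problems = [f"unknown sub-field: {key}" for key in sorted(set(obj) - SIGNATURE_KEYS)]
    problems += [
        f"{key} must be a string"
        for key in sorted(SIGNATURE_KEYS & set(obj))
        if not isinstance(obj[key], str)
    ]
    return problems


print(signature_problems({"keyid": "key-1", "proof": 42, "extra": True}))
# ['unknown sub-field: extra', 'proof must be a string']
```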
- Add `units` field for individual ISCC-UNITs in composite ISCC-CODEs
- Add `text` field for extracted plaintext of digital content
- Widen `parts` items to accept both strings and objects
- Add `minLength: 1` to `name` and `description` fields
- Sync iscc-collection.yaml with individual schema files
- Fix `keywords` description typo ("sting" → "string")
- Align `credentials` description across schema files
- Remove `x-iscc-status: stable` from chain, wallet, and NFT fields
- declerations → declarations, Field → Fields (iscc-declaration)
- one ore more → one or more (iscc-declaration, iscc-collection)
- automaticaly → automatically (iscc-technical)
- secondes → seconds (iscc-technical, iscc-collection, iscc-generator)
- URI an → URI of an (iscc-technical, iscc-collection)
Eliminate iscc-collection.yaml (534 lines of manually maintained duplication)
by auto-flattening individual schema files in build_json_schema.py. This fixes
16 fields of accumulated drift (missing x-iscc-status, x-iscc-context, stale
keywords description) and prevents future sync issues.
titusz added 10 commits March 17, 2026 00:04
Add iscc_code as an explicit alternative to the compact iscc field, and add
nonce for cryptographic replay protection. Update descriptions on iscc,
iscc_id, and iscc_code to recommend including at least one identifier.
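Because the descriptions only *recommend* including an identifier, a consumer might check for one and warn rather than reject. The sketch below assumes that reading; the hex nonce from `secrets` is also an assumption, since this PR does not specify the nonce format or how replay checks are performed. Function names are hypothetical and the ISCC value is a placeholder.

```python
import secrets
from typing import Any

IDENTIFIER_FIELDS = ("iscc", "iscc_id", "iscc_code")


def has_identifier(meta: dict[str, Any]) -> bool:
    """True if `meta` carries at least one of the recommended identifier fields."""
    return any(meta.get(field) for field in IDENTIFIER_FIELDS)


def with_nonce(meta: dict[str, Any], nbytes: int = 16) -> dict[str, Any]:
    """Return a copy of `meta` with a fresh random `nonce` (hex encoding is assumed)."""
    return {**meta, "nonce": secrets.token_hex(nbytes)}


record = with_nonce({"iscc_code": "ISCC:EXAMPLE"})  # placeholder, not a real code
print(has_identifier(record), len(record["nonce"]))  # True 32
```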
- Give iscc_id and media_id distinct ISCC term IRIs to resolve triple
  collision on schema.org/identifier
- Emit @type: @id for URI-typed fields so JSON-LD processors recognize
  values as IRIs
- Add missing mode field to JSON-LD context
- Update generator reference schema from stale 0.3.2 to unversioned URLs
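The `@type: @id` change can be pictured with a minimal sketch that derives JSON-LD term definitions from JSON Schema properties, assuming URI-typed fields are marked with `"format": "uri"` and that a separate name-to-IRI table exists. `context_terms` and the example IRIs are hypothetical, not taken from build_json_ld_context.py.

```python
from typing import Any


def context_terms(properties: dict[str, Any], iris: dict[str, str]) -> dict[str, Any]:
    """Build JSON-LD term definitions from JSON Schema properties.

    `iris` maps each property name to its vocabulary IRI (hypothetical input).
    """
    terms: dict[str, Any] = {}
    for name, spec in properties.items():
        if name not in iris:
            continue  # no IRI known: leave the term out rather than guess one
        if spec.get("format") == "uri":
            # Expanded definition so processors treat the value as an IRI
            # node reference instead of a plain string literal.
            terms[name] = {"@id": iris[name], "@type": "@id"}
        else:
            terms[name] = iris[name]
    return terms


props = {
    "name": {"type": "string"},
    "license": {"type": "string", "format": "uri"},
}
iris = {"name": "http://schema.org/name", "license": "http://schema.org/license"}
print(context_terms(props, iris))
# {'name': 'http://schema.org/name', 'license': {'@id': 'http://schema.org/license', '@type': '@id'}}
```

Giving each field its own IRI in such a table is also what keeps two fields (like iscc_id and media_id) from collapsing onto the same predicate.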
Add x-iscc-standard extension field to 22 YAML schema properties that
are part of ISO 24138:2024. Surface the annotation in generated schema
docs, vocabulary page, and terms includes.
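Surfacing these annotations in docs could work roughly like the sketch below, which renders a Markdown vocabulary table from schema properties. It assumes a truthy `x-iscc-standard` value marks ISO 24138 membership; `vocabulary_table` and the toy input are hypothetical and do not reflect which fields are actually standardized.

```python
from typing import Any


def vocabulary_table(properties: dict[str, Any]) -> str:
    """Render a Markdown table listing each field, its status, and ISO 24138 membership."""
    rows = ["| Field | Status | ISO 24138 |", "| --- | --- | --- |"]
    for name, spec in properties.items():
        status = spec.get("x-iscc-status", "")
        # Assumption: a truthy x-iscc-standard value marks a standardized field.
        standard = "yes" if spec.get("x-iscc-standard") else ""
        rows.append(f"| `{name}` | {status} | {standard} |")
    return "\n".join(rows)


# Toy input; field annotations here are illustrative only.
print(vocabulary_table({
    "name": {"x-iscc-status": "stable", "x-iscc-standard": True},
    "tdm": {"x-iscc-status": "draft"},
}))
```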
Introduce three-category schema framework: ISCC Metadata (core vocabulary),
Seed Metadata (industry-specific Meta-Code generation input), and Service
Metadata (use-case-specific schemas for ISCC registries/gateways).

- Add ISBN/ISRC seed metadata schemas with full build pipeline integration
- Add TDM service metadata schema for AI content usage rights declarations
  (train, inference, derive, search, analyze — each reserved or open)
- Generate per-schema docs pages, JSON Schema, JSON-LD context terms
- Refactor build scripts to share standalone schema generation logic
- Fix misleading ISCC Metadata description (content vocabulary, not declaration)
Add optional tdm object field to iscc-embeddable schema alongside other
rights fields (license, acquire, credit, rights). When present, all five
reservation properties are required (train, inference, derive, search,
analyze), each accepting "reserved" or "open".
…nal fields

Align TDM schema with EU DSM Directive Art. 4 opt-out semantics. Fields are
now optional (omitted = undetermined), additionalProperties is enforced in
both Pydantic model and JSON Schema, and legally loaded terminology is replaced
with neutral descriptions (e.g., "derivative works" → "content transformation").
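The revised TDM semantics (optional properties, omitted meaning undetermined, no additional properties, values limited to "reserved" or "open") can be sketched in plain Python. `tdm_states` is a hypothetical helper, and reporting omitted properties with the label "undetermined" is this sketch's choice, not part of the schema.

```python
from typing import Any

TDM_PROPERTIES = ("train", "inference", "derive", "search", "analyze")
TDM_VALUES = ("reserved", "open")


def tdm_states(tdm: dict[str, Any]) -> dict[str, str]:
    """Validate a `tdm` object and return the effective state of every property.

    Omitted properties are reported as "undetermined" (no signal given).
    """
    extra = sorted(set(tdm) - set(TDM_PROPERTIES))
    if extra:
        # Equivalent of additionalProperties: false in the JSON Schema.
        raise ValueError(f"unexpected tdm properties: {extra}")
    for key, value in tdm.items():
        if value not in TDM_VALUES:
            raise ValueError(f"{key} must be 'reserved' or 'open', got {value!r}")
    return {key: tdm.get(key, "undetermined") for key in TDM_PROPERTIES}


print(tdm_states({"train": "reserved", "search": "open"}))
# {'train': 'reserved', 'inference': 'undetermined', 'derive': 'undetermined', 'search': 'open', 'analyze': 'undetermined'}
```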
…ext()

Embed @context directly into JSON Schema output so each schema is a
self-contained artifact for validation and semantic mapping. Add
recover_context() for runtime JSON-LD context recovery from plain JSON data.
Serialized ISCC data now carries versioned URLs so consumers can identify
which schema version produced it. Unversioned URLs in YAML sources are
patched to versioned form during code generation.
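One way versioned URLs enable context recovery is sketched below: plain JSON data carrying a versioned `$schema` URL is mapped to the JSON-LD context of that schema version. This is not the library's `recover_context()`, whose signature and lookup strategy are not shown in this PR; `attach_context`, the registry, and the example URL are hypothetical placeholders.

```python
from typing import Any

# Hypothetical registry mapping a versioned schema URL to the JSON-LD context
# of that schema version. URL and context content are placeholders.
CONTEXTS: dict[str, dict[str, Any]] = {
    "https://example.org/iscc/0.5.0/schema.json": {"@vocab": "http://schema.org/"},
}


def attach_context(data: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of plain JSON `data` with the @context matching its $schema URL."""
    if "@context" in data:
        return dict(data)  # already JSON-LD, nothing to recover
    schema_url = data.get("$schema")
    if schema_url not in CONTEXTS:
        raise LookupError(f"no known context for schema {schema_url!r}")
    return {"@context": CONTEXTS[schema_url], **data}


doc = attach_context({"$schema": "https://example.org/iscc/0.5.0/schema.json", "name": "Demo"})
print(doc["@context"])  # {'@vocab': 'http://schema.org/'}
```

Keying the lookup on a versioned URL means data produced by an older schema version resolves to the context it was written against.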
Limit push triggers to main/develop branches so feature branch pushes
only fire via pull_request, preventing two identical workflow runs from
racing to save the same setup-uv cache key.
… types

Introduces an optional `form` field on IsccMeta that classifies what the
content *is* (book, article, movie, photograph, etc.) using a curated set
of 22 Schema.org CreativeWork subtypes. Maps to `schema.org/additionalType`
in JSON-LD context with IRI mappings for all enum values.
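An enum-to-IRI mapping of this kind can be sketched as follows, using four of the 22 values. Declaring `form` with `"@type": "@vocab"` is one standard JSON-LD way to make string values like "book" expand through term definitions; whether the generated context uses exactly this form is an assumption, as is the `http` scheme for Schema.org. `Form` and `form_context` are hypothetical names.

```python
from enum import Enum
from typing import Any


class Form(str, Enum):
    """Subset of the `form` values; the schema curates 22 in total."""

    BOOK = "book"
    ARTICLE = "article"
    MOVIE = "movie"
    PHOTOGRAPH = "photograph"


# Schema.org CreativeWork subtypes for each value (http scheme is an assumption).
SCHEMA_ORG_TYPES = {
    Form.BOOK: "http://schema.org/Book",
    Form.ARTICLE: "http://schema.org/Article",
    Form.MOVIE: "http://schema.org/Movie",
    Form.PHOTOGRAPH: "http://schema.org/Photograph",
}


def form_context() -> dict[str, Any]:
    """Build JSON-LD context entries that expand `form` values to IRIs."""
    # "@type": "@vocab" makes processors resolve a string value like "book"
    # through the term definitions below instead of keeping it as a literal.
    context: dict[str, Any] = {
        "form": {"@id": "http://schema.org/additionalType", "@type": "@vocab"}
    }
    for form, iri in SCHEMA_ORG_TYPES.items():
        context[form.value] = iri
    return context


print(form_context()["movie"])  # http://schema.org/Movie
```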
titusz added 13 commits March 17, 2026 16:54
Add GenAI Service Metadata schema for generative AI disclosure signals
with graduated involvement levels (human, ai_assisted, human_supervised,
ai_generated), AI system identification, and IPTC Digital Source Type
interoperability. Move tdm and genai fields from iscc-embeddable to
iscc-extended since structured objects belong in the gateway/registry
metadata layer, not in media-embeddable metadata.
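"Graduated" involvement levels suggest an ordering that consumers can compare against a threshold. The sketch below assumes the levels are ordered by increasing AI involvement in the order listed in the commit message; that ordering, `Involvement`, and `at_least` are assumptions for illustration, and the IPTC Digital Source Type mapping is not modeled.

```python
from enum import Enum


class Involvement(Enum):
    """GenAI involvement levels, assumed to be ordered by increasing AI involvement."""

    HUMAN = "human"
    AI_ASSISTED = "ai_assisted"
    HUMAN_SUPERVISED = "human_supervised"
    AI_GENERATED = "ai_generated"

    @property
    def rank(self) -> int:
        # Position in definition order; Enum iteration preserves it.
        return list(type(self)).index(self)


def at_least(level: Involvement, threshold: Involvement) -> bool:
    """True if `level` signals at least as much AI involvement as `threshold`."""
    return level.rank >= threshold.rank


declared = Involvement("human_supervised")
print(declared.rank, at_least(declared, Involvement.AI_ASSISTED))  # 2 True
```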
Add datasize field to iscc-technical for cases where the ISCC is
computed over data that is not a standalone file, such as individual
planes within bioimages or scenes within multi-scene containers.
Add stable/draft status to all fields across YAML schemas and surface
the status in generated schema docs, vocabulary page, and terms includes.
Replace mkdocs-material with zensical for documentation generation,
matching the ISCC theme and features established in iscc-usearch.

- Add zensical.toml config with ISCC brand theming and sidebar navigation
- Add copy-page split-button for LLM-friendly markdown export
- Add ISCC-AI copilot widget integration
- Add llms.txt, robots.txt, and per-page markdown generation (gen_llms_full.py)
- Add "For Coding Agents" dense reference page
- Add GitHub Pages deployment workflow (docs.yml)
- Add Open Graph/Twitter Card meta tags and Plausible analytics
- Add navigation icons and short titles to all doc pages
- Add ISCC wordmark logo and favicon assets
- Update README badges to match iscc-usearch convention
- Update CHANGELOG with documentation migration entries
- Delete mkdocs.yml and analytics partial override
Rewrite README.md and docs/index.md from verbose prose to a concise
overview with install, quick start, schema categories, and artifacts
table. Replace NFT-only examples with comprehensive samples covering
all schema categories. Add guide.md to navigation.
…ctions

Group orphan top-level pages under section headings for clearer navigation.
Guide and Examples under Getting Started, all schema pages plus Vocabulary
and Versioning under Reference, Contributing/Agents/Changelog under Project.
- Remove unused Formatter import from build_code.py
- Add genai.json/genai.yaml to standalone schema test assertions
titusz merged commit d483023 into main on Mar 17, 2026
29 of 30 checks passed