Skip to content

Redesign of repo tooling for mixed URI prefixes in 7.1#729

Open
tychonievich wants to merge 7 commits intov7.1from
explicit-uri
Open

Redesign of repo tooling for mixed URI prefixes in 7.1#729
tychonievich wants to merge 7 commits intov7.1from
explicit-uri

Conversation

@tychonievich
Copy link
Collaborator

@tychonievich tychonievich commented Jan 30, 2026

Closes #727
Unblocks #722 and #723

There's a lot going on here, and I spent several weeks trying to minimize this PR, split it into pieces, etc, but couldn't find a way that worked. So here's an outline of what's changed.

  1. In the spec, many tables (notably enumsets) now have an explicit URI entry.

    I made the URIs their own column, not the two-line cells used for URIs of events and attributes earlier, mostly because I thought we have space. We can refactor that into using <br> if we prefer.

  2. I re-wrote build/uri-def.py from the ground up as build/extract-yaml-tsv.py.

    uri-def.py was a 50-line script I wrote that I slowly changed into a 500-line monstristy that I find almost impossible to read or maintain. And it had many assumptions about the spec's format not only in its code (that's inevitable when parsing human text into machine-readable files) but also in its design. Trying to add the more versatile URIs was such a headache I decided to use it as an excuse for a long-overdue re-write.

    extract-yaml-tsv.py uses classes to have the code mirror the files it is designed to produce. It uses type annotations (which pass mypy) to avoid various typing bugs. It tries to do one thing in one place instead of doing all things possible everywhere. It's 100 lines longer than uri-def.py but faster to run and easier to maintain.

    I spent some effort to match idiosyncrasies of uri-def.py, like when to use strings, 'strings', or "strings" in the emitted YAML; but sometimes dong that was not possible without replicating uri-def.py's worse parts, so there are some trivial changes (notably in the order of lists) in the files it creates.

  3. I (accidentally) fixed various bugs of omission. uri-def.py was sloppy in how it handled the textual explanations in sections that had either gedstruct rules with alternatives or tables. Fixing that with a more principled approach means adding extra specification: entries for around half of all YAML files.

  4. I tried but failed to match the sorting of the various .tsv files from uri-def.py, which was based on some kind of sorting of structures before they were used to generate multiple lines each. I was tempted to scrap that and simply sort the final lines, which is easy and well-defined, but this commit does not do that, instead doing a half-hearted effort at mimicking the old sorting.

I did all this on 7.1 because that was the original motivation, but if we like it I can port it back to the 7.0 branch (either by adding explicit URIs to the 7.0 spec or by adding special-case handling of tag-to-URI) with relative ease.

Closes #727
Unblocks #722 and #723

There's a lot going on here, and I spent several weeks trying to minimize this PR, split it into pieces, etc, but couldn't find a way that worked. So here's an outline of what's changed.

1. In the spec, many tables (notably enumsets) now have an explicit URI entry.

    I made the URIs their own column, not the two-line cells used for URIs of events and attributes earlier, mostly because I thought we have space. We can refactor that into using <br> if we prefer.

2. I re-wrote `build/uri-def.py` from the ground up as `build/extract-yaml-tsv.py`.

    `uri-def.py` was a 50-line script I wrote that I slowly changed into a 500-line monstristy that I find almost impossible to read or maintain. And it had many assumptions about the spec's format not only in its code (that's inevitable when parsing human text into machine-readable files) ubt also in its design. Trying to add the more versatile URIs was such a headache I decided to use it as an excuse for a long-verdue re-write.

    `extract-yaml-tsv.py` uses classes to have the code mirror the files it is designed to produce. It uses type annotations (which pass `mypy`) to avoid various typing bugs. It tries to do one thing in one place instead of doing all things possible everywhere. It's 100 lines longer than `uri-def.py` but faster to run and easier to maintain.

    I spent some effort to match ideosynracies of `uri-def.py`, like when to use `strings`, `'strings'`, or `"strings"` in the emitted YAML; but sometimes dong that was not possible without replicating `uri-def.py`'s worse parts, so there are some trivial changes (notably in the order of lists) in the files it creates.

3. I (accidentally) fixed various bugs of omission. `uri-def.py` was sloppy in how it handled the textual explantions in sections that had either gedstruct rules with alternatives or tables. Fixing that with a more principled approach means adding extra `specification:` entries for around half of all YAML files.

4. I tried but failed to match the sorting of the various .tsv files from `uri-def.py`, which was based on some kind of sorting of structures before they were used to generate multiple lines each. I was tempted to scrap that and simply sort the final lines, which is easy and well-defined, but this commit does not do that, instead doing a half-hearted effort at mimickiing the old sorting.

I did all this on 7.1 because that was the original motivation, but if we like it I can port it back to the 7.0 branch (either by adding explicit URIs to the 7.0 spec or by adding special-case handling of tag-to-URI) with relative ease.
@tychonievich tychonievich changed the title Redesign for mixed URI prefixes in 7.1 Redesign of repo tooling for mixed URI prefixes in 7.1 Jan 30, 2026

type: structure

uri: https://gedcom.io/terms/v7/REPO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the payload URI (line 23) is a v7.1 URI, shouldn't the structure URI also be v7.1?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion in GEDCOM Steering Committee meeting:

  • The script should detect this and report it as an error
  • Luther will update this PR to do so

@dthaler dthaler requested review from Copilot and removed request for clarkegj February 20, 2026 17:39
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR redesigns the repository tooling to support mixed URI prefixes (g7: and g7.1:) in version 7.1 of the GEDCOM specification. The changes address issue #727 by making URI handling more explicit and flexible.

Changes:

  • Adds explicit URI columns to enumeration and month tables in specification markdown files
  • Completely rewrites uri-def.py as extract-yaml-tsv.py with better structure using classes and type annotations
  • Fixes bugs where textual explanations were omitted from YAML files

Reviewed changes

Copilot reviewed 140 out of 140 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
specification/gedcom-6-appendix-calendars.md Added URI column to month tables for all calendars
specification/gedcom-3-structures-4-enumerations.md Added URI column to all enumeration value tables and removed concatenation text
specification/gedcom-3-structures-3-meaning.md Added deprecation notice for HEAD.SOUR.DATA
build/extract-yaml-tsv.py New Python script with class-based design to extract YAML/TSV files from spec
build/uri-def.py Deleted (replaced by extract-yaml-tsv.py)
build/Makefile Updated to use new extract-yaml-tsv.py script
extracted-files/* Updated YAML and TSV files with new URIs, labels, and specification text

Comment on lines +41 to +42
out.count('[')

Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 41 has a statement out.count('[') that has no effect. This appears to be leftover debug code that should be removed.

Suggested change
out.count('[')

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +30
for k,v in pfx.items():
md = re.sub(rf'\b{k}', v, md)
Copy link

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable pfx is used on line 29 but is not defined in the function signature or within the function. This appears to be a global variable reference that should be passed as a parameter to maintain proper function design.

Copilot uses AI. Check for mistakes.
tychonievich and others added 6 commits February 24, 2026 11:06
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
Co-authored-by: Dave Thaler <dthaler1968@gmail.com>
dthaler pushed a commit to dthaler/GEDCOM-registries that referenced this pull request Mar 10, 2026
Per committee discussion summarized at
FamilySearch/GEDCOM#729 (comment)

Signed-off-by: Dave Thaler <dthaler@armidalesoftware.com>
dthaler added a commit to FamilySearch/GEDCOM-registries that referenced this pull request Mar 10, 2026
* substructures.tsv can have empty string in first column

Per committee discussion summarized at
FamilySearch/GEDCOM#729 (comment)

Signed-off-by: Dave Thaler <dthaler@armidalesoftware.com>

* Apply suggestion from @dthaler

---------

Signed-off-by: Dave Thaler <dthaler@armidalesoftware.com>
Co-authored-by: Dave Thaler <dthaler@armidalesoftware.com>
@tychonievich
Copy link
Collaborator Author

Based on FamilySearch/GEDCOM.io#187, this should be refactored to split out the YAML and TSV processing, extracting the YAML first and then creating the TSV from the YAML, using the same YAML-to-TSV script for this repo and the registries repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants