fix(copyright): clean up Boost markup regressions #565

Merged
mstykow merged 1 commit into main from fix/boost-markup-copyright-regressions
Apr 4, 2026

mstykow (Owner) commented Apr 4, 2026

Summary

  • clean up shared copyright/author detection regressions exposed by the boostorg/boost compare-outputs --profile common run
  • add durable copyright golden regressions for XML author attributes, DocBook authorgroup extraction, and Boost CSS selector noise, and update stale golden expectations that were treating file names/scripts as authors
  • record the verified boostorg/boost result in the C++ parser verification scorecard

Scope and exclusions

  • Included:
    • markup author extraction and entity decoding improvements in the shared copyright detector
    • junk filtering for CSS/prose/path-like author and holder noise
    • focused detector/refiner tests, narrow copyright golden suite coverage, docs scorecard update, and repeated Boost compare validation
  • Explicit exclusions:
    • no parser-family feature expansion beyond the shared detection fixes surfaced by the compare run
    • no changes to the kept Provenant-better normalization/improvement deltas unless they were clear regressions
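As a rough illustration of the markup author extraction and entity decoding listed under "Included", the following is a minimal Python sketch using only the standard library. It is not the Provenant detector code and not the Python reference implementation. The function names, the regular expressions, and the assumed input shapes (an `author="…"` attribute and a DocBook-style `<authorgroup>` with nested `<author>` elements) are all hypothetical and chosen for illustration only.

```python
import html
import re

# Hypothetical patterns; the real detector's rules are not shown in this PR.
AUTHOR_ATTR = re.compile(r"""\bauthor\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)
AUTHORGROUP = re.compile(r"<authorgroup\b[^>]*>(.*?)</authorgroup>", re.IGNORECASE | re.DOTALL)
AUTHOR_ELEM = re.compile(r"<author\b[^>]*>(.*?)</author>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def _clean(raw: str) -> str:
    """Decode character entities and collapse runs of whitespace."""
    return " ".join(html.unescape(raw).split())


def extract_author_attrs(markup: str) -> list[str]:
    """Collect decoded, de-duplicated values of author="..." attributes."""
    authors: list[str] = []
    for match in AUTHOR_ATTR.finditer(markup):
        name = _clean(match.group(2))
        if name and name not in authors:
            authors.append(name)
    return authors


def extract_docbook_authors(markup: str) -> list[str]:
    """Collect one name per <author> element inside each <authorgroup>."""
    authors: list[str] = []
    for group in AUTHORGROUP.finditer(markup):
        for author in AUTHOR_ELEM.finditer(group.group(1)):
            # Replace nested tags (firstname, surname, ...) with spaces,
            # then decode entities and normalize whitespace.
            name = _clean(TAG.sub(" ", author.group(1)))
            if name and name not in authors:
                authors.append(name)
    return authors
```

Reading the attribute value or element text directly, rather than running prose-oriented detection over the raw markup, is one way to keep tag names and entity escapes out of the reported author strings.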

Intentional differences from Python

  • keep cleaner Provenant-better results from the Boost compare run, including the extra real Boost copyright/author detections and cleaner normalization/deduplication in markup-heavy files
  • drop bogus golden “authors” that were actually file names or generator scripts (DynamicClockGatingTable.ctb, EnableASIC_StaticPwrMgtTable.ctb, EnableDispPowerGatingTable.ctb, createinit.py)
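The kind of filter that would reject the file-name "authors" named above can be sketched as follows. This is an assumed heuristic written in Python for illustration, not the shared author filter from this PR: the `FILE_LIKE` pattern and both function names are hypothetical.

```python
import re

# Hypothetical heuristic: a single token made of path-safe characters that
# ends in a short extension is treated as a file name, not a person.
FILE_LIKE = re.compile(r"^[\w./\\-]+\.[A-Za-z0-9]{1,5}$")


def looks_like_file_name(candidate: str) -> bool:
    """Return True for single-token candidates shaped like a file or path."""
    token = candidate.strip()
    return " " not in token and bool(FILE_LIKE.match(token))


def filter_authors(candidates: list[str]) -> list[str]:
    """Drop candidates that look like file names; keep the rest in order."""
    return [c for c in candidates if not looks_like_file_name(c)]
```

With this sketch, `filter_authors(["DynamicClockGatingTable.ctb", "createinit.py", "Jane Doe"])` keeps only `"Jane Doe"`. A heuristic this simple would also reject a single-token name written with a dot and no space, so a production filter would need additional context.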

Expected-output fixture changes

  • Files changed:
    • testdata/copyright-golden/authors/boost_xml_author_attr_entities.xml.yml
    • testdata/copyright-golden/authors/boost_docbook_authorgroup.html.yml
    • testdata/copyright-golden/copyrights/boostbook_css_noise.css.yml
    • testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/gpu/drm/amd/include/atombios.h.yml
    • testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/gpu/drm/radeon/atombios.h.yml
    • testdata/copyright-golden/copyrights/misco4/linux-copyrights/drivers/media/usb/dvb-usb/af9005-script.h.yml
  • Why the new expected output is correct:
    • the new Boost fixtures lock in the compare-run regressions this branch fixed
    • the updated Linux fixture expectations now match the improved shared author filter, which correctly treats those prior “authors” as code/path noise rather than people

Improve shared copyright and author detection exposed by the Boost compare run by extracting markup authors directly, suppressing CSS and prose noise, and adding durable golden regressions. Record the verified C++ scorecard result so the compare evidence and intentional differences stay documented.
mstykow merged commit 08cb04a into main on Apr 4, 2026
13 checks passed
mstykow deleted the fix/boost-markup-copyright-regressions branch on April 4, 2026 14:07