fix: faulty TOC import/export (SD-2183) by luccas-harbour · Pull Request #2371 · superdoc-dev/superdoc

luccas-harbour · 2026-03-11T19:42:12Z

Summary

Fixes the DOCX export/import issues behind SD-2183 for table of contents content controls.

This PR addresses two related problems:

- exported TOC content controls could emit invalid or empty w:id values
- TOC field instructions stored inside a content control were not reliably round-tripped, which caused the exported document to lose the w:fldChar / w:instrText structure Word expects

What Changed

1. Prevent empty SDT IDs during export

- stop emitting w:id for document part SDTs when the value is empty
- sanitize passthrough sdtPr data so empty w:id nodes are not re-exported
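
The two bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the node shape (`name`, `attributes`, `elements`) and the helper names `isEmptyIdNode` and `sanitizeSdtPr` are assumptions made for the example.

```javascript
// Assumption: XML nodes are plain objects shaped like
// { name, attributes, elements }. Helper names are hypothetical.

// Returns true when a w:id node carries no usable value.
function isEmptyIdNode(node) {
  if (!node || node.name !== 'w:id') return false;
  const val = node.attributes ? node.attributes['w:val'] : undefined;
  return val === undefined || val === null || String(val).trim() === '';
}

// Returns a new sdtPr node without empty w:id children.
// The input node is left untouched.
function sanitizeSdtPr(sdtPr) {
  const elements = (sdtPr.elements || []).filter((el) => !isEmptyIdNode(el));
  return { ...sdtPr, elements };
}

const passthrough = {
  name: 'w:sdtPr',
  elements: [
    { name: 'w:id', attributes: { 'w:val': '' } },
    { name: 'w:docPartObj', elements: [] },
  ],
};
const cleaned = sanitizeSdtPr(passthrough);
console.log(cleaned.elements.map((el) => el.name));
```

Filtering into a new object, rather than splicing the original array, keeps the passthrough data reusable for later exports.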

2. Handle complex fields stored in a single run

- update preProcessNodesForFldChar to correctly process fields when begin, instrText, separate, and end are all stored in the same w:r
- preserve unknown-field fallback behavior for those compressed single-run cases
- add regression coverage for:
  - TOC fields in a single run
  - unknown fields in a single run
  - drawing/pict content inside active field collection
  - nested/recursive field boundaries
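
One way to picture the single-run case is to expand the compressed run into one run per field marker before the normal field processing sees it. The sketch below is illustrative only: `splitRunByFieldMarkers` and `makeRun` are hypothetical names, the node shape is assumed, and the real preProcessNodesForFldChar logic may differ.

```javascript
// Assumption: runs are plain objects shaped like { name, attributes, elements }.
const FIELD_MARKERS = new Set(['w:fldChar', 'w:instrText']);

// Builds a run, copying the run properties (w:rPr) when present.
function makeRun(rPr, elements) {
  return { name: 'w:r', elements: rPr ? [{ ...rPr }, ...elements] : [...elements] };
}

// Splits one w:r into several runs so each field marker sits in its own run.
// Non-marker content between markers is grouped into its own run.
function splitRunByFieldMarkers(run) {
  const children = run.elements || [];
  const rPr = children.find((el) => el.name === 'w:rPr');
  const content = children.filter((el) => el.name !== 'w:rPr');
  const markerCount = content.filter((el) => FIELD_MARKERS.has(el.name)).length;
  if (markerCount <= 1) return [run]; // nothing to split

  const runs = [];
  let pending = []; // non-marker content waiting to be grouped
  const flush = () => {
    if (pending.length) {
      runs.push(makeRun(rPr, pending));
      pending = [];
    }
  };
  for (const el of content) {
    if (FIELD_MARKERS.has(el.name)) {
      flush();
      runs.push(makeRun(rPr, [el]));
    } else {
      pending.push(el);
    }
  }
  flush();
  return runs;
}

const compressed = {
  name: 'w:r',
  elements: [
    { name: 'w:fldChar', attributes: { 'w:fldCharType': 'begin' } },
    { name: 'w:instrText', elements: [{ type: 'text', text: ' TOC \\o "1-3" ' }] },
    { name: 'w:fldChar', attributes: { 'w:fldCharType': 'separate' } },
    { name: 'w:t', elements: [{ type: 'text', text: 'Contents' }] },
    { name: 'w:fldChar', attributes: { 'w:fldCharType': 'end' } },
  ],
};
const expanded = splitRunByFieldMarkers(compressed);
console.log(expanded.map((r) => r.elements[0].name));
```

After expansion, each marker can be handled as if the document had used the one-marker-per-run layout shown in the spec's examples.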

3. Preserve TOC structure inside document part objects

- update TOC docPartObject import to hoist sd:tableOfContents out of wrapper paragraphs
- keep the imported PM structure aligned with what the layout/export pipeline expects:
  - heading paragraph
  - nested tableOfContents block
- avoid creating empty wrapper paragraphs when a paragraph only contains pPr plus the TOC block
- relax the tableOfContents node content model from paragraph+ to paragraph* so empty imported TOCs remain valid
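
The hoisting step can be sketched as below. This is a simplified illustration on ProseMirror-style JSON (`type`, `attrs`, `content`), which is an assumed shape; `hoistTableOfContents` is a hypothetical name and the actual importer may work on different node structures.

```javascript
// Assumption: nodes are ProseMirror-style JSON objects { type, attrs, content }.
// Moves tableOfContents blocks out of wrapper paragraphs so they become
// siblings of the paragraph content that surrounded them.
function hoistTableOfContents(nodes) {
  const result = [];
  for (const node of nodes) {
    const content = node.content || [];
    const wrapsToc =
      node.type === 'paragraph' && content.some((child) => child.type === 'tableOfContents');
    if (!wrapsToc) {
      result.push(node);
      continue;
    }

    let inline = []; // regular paragraph content collected so far
    const flush = () => {
      // Only emit a paragraph when it has real content,
      // so a wrapper holding just the TOC leaves no empty paragraph behind.
      if (inline.length) {
        result.push({ ...node, content: inline });
        inline = [];
      }
    };
    for (const child of content) {
      if (child.type === 'tableOfContents') {
        flush();
        result.push(child);
      } else {
        inline.push(child);
      }
    }
    flush();
  }
  return result;
}

const imported = [
  {
    type: 'paragraph',
    attrs: { styleId: 'TOCHeading' },
    content: [
      { type: 'text', text: 'Contents' },
      { type: 'tableOfContents', content: [] },
    ],
  },
  { type: 'paragraph', content: [{ type: 'tableOfContents', content: [] }] },
];
console.log(hoistTableOfContents(imported).map((n) => n.type));
```

The second input paragraph contains only the TOC block, so it produces a bare tableOfContents node and no wrapper. The empty `content: []` on the TOC nodes is the case the paragraph* content model is meant to allow.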

Why

Word expects the TOC field to round-trip as a real complex field sequence. Without that, exported files can lose the TOC field markers and fail to behave correctly when reopened in Word.
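
To make "a real complex field sequence" concrete, the sketch below checks that a flat list of field markers forms a well-formed begin / instrText / separate / end sequence, including nested fields. It is an illustrative checker under simplifying assumptions (for instance, it requires instruction content before separate or end), and `isWellFormedFieldSequence` is a hypothetical helper, not part of the PR.

```javascript
// Markers are the strings 'begin', 'instrText', 'separate', 'end',
// in document order. Nested fields are tracked with a stack.
function isWellFormedFieldSequence(markers) {
  const stack = []; // one frame per currently open field
  for (const marker of markers) {
    const top = stack[stack.length - 1];
    switch (marker) {
      case 'begin':
        stack.push({ hasInstr: false, separated: false });
        break;
      case 'instrText':
        // Instructions must sit inside a field, before its separator.
        if (!top || top.separated) return false;
        top.hasInstr = true;
        break;
      case 'separate':
        if (!top || top.separated || !top.hasInstr) return false;
        top.separated = true;
        break;
      case 'end': {
        if (!top || !top.hasInstr) return false;
        stack.pop();
        // A nested field inside the parent's instruction area counts
        // as instruction content for the parent (simplifying assumption).
        const parent = stack[stack.length - 1];
        if (parent && !parent.separated) parent.hasInstr = true;
        break;
      }
      default:
        return false;
    }
  }
  return stack.length === 0; // every begin needs a matching end
}

console.log(isWellFormedFieldSequence(['begin', 'instrText', 'separate', 'end']));
console.log(isWellFormedFieldSequence(['begin', 'separate', 'end']));
```

The first call describes the shape a TOC field is expected to export with; the second models the failure described above, where the instruction text has been lost.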

linear · 2026-03-11T19:42:16Z

SD-2183 BUG: File becomes corrupted on export

github-actions · 2026-03-11T19:44:05Z

Status: PASS

The OOXML elements and attributes in this PR are all spec-compliant. Here's what I checked:

w:id (§17.5.2.18) — The val attribute is typed as ST_DecimalNumber. The old code was emitting <w:id w:val=""/> when the id was empty, which is invalid. The new sanitizeId helper correctly omits the element rather than writing an empty value. Per spec, omission of w:id is explicitly allowed — the processor will assign a new unique ID on open. Good fix.

One minor note: sanitizeId accepts any non-empty string, not just decimals. If upstream data ever contains a non-numeric id (e.g. "abc"), the emitted <w:id w:val="abc"/> would still be schema-invalid. That said, this is a pre-existing concern: the PR only tightens the empty-string guard and does not introduce a new hole.
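
A stricter guard along the lines of this note might look like the sketch below. It is not what the PR implements: `toValidSdtId` and `buildIdNode` are hypothetical names, and the pattern assumes ST_DecimalNumber accepts an optionally signed integer.

```javascript
// Returns the id as a string when it looks like a decimal integer,
// otherwise null. Assumption: ST_DecimalNumber is an optionally signed integer.
function toValidSdtId(value) {
  if (value === undefined || value === null) return null;
  const text = String(value).trim();
  return /^[+-]?\d+$/.test(text) ? text : null;
}

// Builds a w:id node, or returns null so the caller can omit the element.
function buildIdNode(value) {
  const id = toValidSdtId(value);
  return id === null ? null : { name: 'w:id', attributes: { 'w:val': id } };
}

console.log(buildIdNode(''));
console.log(buildIdNode('abc'));
console.log(buildIdNode('123'));
```

With this variant, both the empty string and "abc" lead to the element being omitted, while "123" is emitted unchanged.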

w:docPartObj / w:docPartGallery / w:docPartUnique (§17.5.2.13, §17.5.2.11, §17.5.2.14) — All used correctly. The boolean presence-equals-true pattern for w:docPartUnique matches spec, and the w:docPartGallery val is ST_String in the SDT context (not the restricted ST_DocPartGallery enum, which applies to glossary entries).

w:fldChar / w:fldCharType (§17.16.18, §17.18.29) — The three type values begin, separate, end are exactly right. The spec's canonical examples show each w:fldChar in its own w:r, but nothing prohibits co-location in a single run. expandNodeForFieldProcessing handling this case is a reasonable robustness measure for real-world documents.

w:instrText — Correctly named and used within w:r.

caio-pizzol

@luccas-harbour nice fix — splitting runs with multiple field markers and pulling TOC blocks out of wrapper paragraphs handles the corruption well.

one edge case to be aware of in the field splitter, and one spot where we should copy instead of modify in place — left inline comments on both.

on tests: the existing visual/layout test data covers general TOC and fldChar rendering, but nothing exercises the specific case this PR fixes (all field markers in one run). two unit tests worth adding: one where a single paragraph has both regular content and a TOC element, and one that checks the schema accepts a TOC with no children. a behavior test importing a doc with single-run fldChar fields would also be a nice regression guard but not blocking.

packages/super-editor/src/core/super-converter/field-references/preProcessNodesForFldChar.js

...per-editor/src/core/super-converter/v3/handlers/w/sdt/helpers/translate-document-part-obj.js

harbournick

LGTM

superdoc-bot · 2026-03-13T17:49:24Z

🎉 This PR is included in superdoc-cli v0.2.0-next.131

The release is available on GitHub release

superdoc-bot · 2026-03-13T17:49:26Z

🎉 This PR is included in superdoc v1.18.0-next.56

The release is available on GitHub release

luccas-harbour added 3 commits March 11, 2026 16:32

fix: prevent empty id for sdtPr during export

4cdebcd

fix(super-editor): handle fldChar fields stored in a single run

df15f2d

fix(super-editor): hoist TOC blocks out of doc part paragraphs

352c6ca

superdoc-bot bot added the risk: critical label Mar 11, 2026

luccas-harbour self-assigned this Mar 11, 2026

superdoc-bot bot added risk: sensitive and removed risk: critical labels Mar 11, 2026

luccas-harbour marked this pull request as ready for review March 11, 2026 19:47

luccas-harbour requested review from VladaHarbour, caio-pizzol and harbournick March 11, 2026 19:57

caio-pizzol reviewed Mar 11, 2026

packages/super-editor/src/core/super-converter/field-references/preProcessNodesForFldChar.js (resolved)

...per-editor/src/core/super-converter/v3/handlers/w/sdt/helpers/translate-document-part-obj.js (outdated, resolved)

missysuperdoc added the priority: high label Mar 12, 2026

luccas-harbour added 3 commits March 12, 2026 11:44

fix: scope unknown fldChar fallback to the active run slice

62ee19a

fix: avoid mutating passthrough sdtPr in doc part export

0fe0df0

test: add TOC regression coverage for mixed wrappers and empty content

3762bda

luccas-harbour requested a review from caio-pizzol March 12, 2026 14:52

harbournick reviewed Mar 13, 2026

harbournick merged commit 45b4452 into main Mar 13, 2026
7 checks passed

harbournick deleted the luccas/sd-2183-bug-file-becomes-corrupted-on-export branch March 13, 2026 17:46

superdoc-bot bot added the released on @next label Mar 13, 2026

harbournick merged 6 commits into main from luccas/sd-2183-bug-file-becomes-corrupted-on-export
