fix: import of w:sdt nested in paragraph (SD-2190) by luccas-harbour · Pull Request #2380 · superdoc-dev/superdoc

luccas-harbour · 2026-03-12T19:49:09Z

Summary

Fix DOCX import failures caused by w:sdt w:docPartObj nodes being parsed as block content inside paragraph nodes.

This change makes paragraph import resilient when a block-level docPart SDT appears inside w:p, especially for cases like Table of Figures / Table of Contents placeholders at the start of a document.

Problem

Some documents contain XML like:

<w:p>
  <w:sdt>
    <w:sdtPr>
      <w:docPartObj>
        <w:docPartGallery w:val="Table of Figures"/>
      </w:docPartObj>
    </w:sdtPr>
    <w:sdtContent>...</w:sdtContent>
  </w:sdt>
</w:p>

Our importer translated that w:sdt into a documentPartObject block node, but still attempted to keep it inside paragraph.content.

Since paragraph only allows inline content, import could fail with:

Invalid content for node type paragraph

What changed

Added a shared isInlineNode() helper for schema-aware inline/block classification.
Strengthened the fallback inline detection used when schema metadata is unavailable.
Updated paragraph import normalization so block-producing children are hoisted out of w:p instead of being inserted into paragraph.content.
Split mixed paragraph content into sibling nodes when needed:
- paragraph fragments for inline runs
- block nodes for hoisted content like documentPartObject
Adjusted paragraph attribute distribution for split results:
- paraId / textId stay on only one paragraph fragment
- rsid* attrs are still copied to paragraph fragments
Preserved wrapper paragraph metadata on block-only hoisted documentPartObject nodes so paragraph formatting can round-trip through export.
Re-export wrapped w:p > w:sdt when wrapper paragraph metadata is present.
Preserved sectPr placement for hoisted trailing/block-only docPart cases.

Tests

Added and updated regressions for:

block-only docPartObj inside w:p
mixed inline text around docPartObj
no-schema import paths
split paragraph identity handling
block-only wrapper paragraph formatting
docPart export with preserved paragraph wrapper metadata

Limitations

The following limitations are known and accepted for now:

Wrapper paragraph metadata preserved on documentPartObject is primarily used for round-trip/export and is not fully reflected in rendering behavior.
Some paragraph-level attrs may still be duplicated across split paragraph fragments in edge cases.
The no-schema inline fallback is still heuristic-based and may not cover every inline media type.

Allow paragraph translators to return fragment arrays during import and normalize them in the v2 paragraph importer. When the legacy paragraph handler returns mixed fragment output, apply encoded paragraph attributes only to paragraph nodes so embedded documentPartObject fragments remain unchanged. Add coverage for the array-return path in the paragraph translator tests.

…etadata

…cParts

linear · 2026-03-12T19:49:11Z

SD-2190 BUG: Import fails when a list of figures/tables is the first node in the document

github-actions · 2026-03-12T19:52:40Z

Status: FAIL

One spec violation in the export path, plus a minor default value issue.

1. translate-document-part-obj.js — inline SDT containing block content

The new wrapDocumentPartInParagraph function produces:

<w:p w:rsidRDefault="...">
  <w:pPr>...</w:pPr>
  <w:sdt>
    <w:sdtContent>
      <w:p>...</w:p>   <!-- paragraph child! -->
    </w:sdtContent>
  </w:sdt>
</w:p>

Per §17.5.2.31, a w:sdt inside a w:p is an inline-level SDT (CT_SdtRun) whose sdtContent must hold run-level content. A w:docPartObj SDT is explicitly defined (§17.5.2.13) as a block-level construct for tagging "a set of block-level objects." Nesting it inside w:p violates both CT_SdtRun's content model and docPartObj's semantics. See https://ooxml.dev/spec?q=sdtContent+inline+block and https://ooxml.dev/spec?q=docPartObj.

This is clearly intentional — the PR is round-tripping documents that already had this unusual structure — but the export output is technically non-conformant OOXML. If Word generated the original file, it probably placed the docPartObj SDT at body level. The safer export would be to emit the block SDT at body level (with the paragraph attributes forwarded as a preceding w:p if needed) rather than re-creating the malformed wrapping.

2. document-part-object.js — docPartUnique default

docPartUnique: {
  default: true,   // pre-existing, not introduced here
},

This is pre-existing and not changed by this PR, but it interacts directly with the new wrapperParagraph machinery: any documentPartObject node created without an explicit docPartUnique: false will export <w:docPartUnique/>. Per §17.5.2.14, the element being absent is meaningful (the SDT is not a built-in document part container). Defaulting to true causes the exporter to assert built-in status on every SDT, including ones where the original <w:docPartUnique/> was absent. The default should be false or null. This PR doesn't introduce the problem but it does widen the surface where it matters since more nodes now go through this export path.

Everything else checks out — w:id/@w:val as ST_DecimalNumber (§17.5.2.18), w:docPartGallery/@w:val as ST_String (§17.5.2.11), w:docPartUnique as a toggle (§17.17.4 — present-with-no-val implies true), and w:keepNext/@w:val="true" are all handled correctly.

luccas-harbour · 2026-03-12T19:58:06Z

About the OOXML Spec Review failure above, the docx file that originated this PR has exactly the structure that the report is saying isn't valid but was created with Word:

<w:p w14:paraId="41964671" w14:textId="04598795" w:rsidR="00233D7B" w:rsidRPr="003104CE" w:rsidRDefault="00233D7B" w:rsidP="003104CE">
  <w:sdt>
    <w:sdtPr>
      <w:id w:val="123456789"/>
      <w:docPartObj>
        <w:docPartGallery w:val="Table of Figures"/>
        <w:docPartUnique/>
      </w:docPartObj>
    </w:sdtPr>
    <w:sdtContent>
      <w:p w14:paraId="11111111" w14:textId="11111111">
        <w:r>
          <w:t>Table of Figures</w:t>
        </w:r>
      </w:p>
      <w:p w14:paraId="22222222" w14:textId="22222222">
        <w:r>
          <w:t>Figure 1</w:t>
        </w:r>
        <w:r>
          <w:tab/>
        </w:r>
        <w:r>
          <w:t>1</w:t>
        </w:r>
      </w:p>
    </w:sdtContent>
  </w:sdt>
</w:p>

Therefore I don't think the analysis result is valid.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2772dfcb84

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

...per-editor/src/core/super-converter/v3/handlers/w/sdt/helpers/translate-document-part-obj.js

github-actions · 2026-03-12T20:13:01Z

Status: PASS

The OOXML handling in this PR is spec-compliant. Here's what I checked:

w:docPartObj / w:docPartGallery / w:docPartUnique (§17.5.2.13–14): All three are used correctly. w:docPartGallery takes a string w:val, w:docPartUnique is treated as a presence-based boolean (element present → true), and both must be children of w:docPartObj inside w:sdtPr. The test fixtures match this structure exactly.

w:id on w:sdtPr (§17.5.2.18): Used with w:val="123456789" — a valid ST_DecimalNumber integer. ✓

w:keepNext serialization: The export test expects { name: 'w:keepNext', attributes: {} } (i.e., <w:keepNext/>). Per the common boolean property definition (§17.17.4), omitting w:val on a present element implies true. ✓

w:spacing attributes: w:after, w:line as integers and w:lineRule="auto" are all valid. ✓

Paragraph rsid attributes (w:rsidRDefault, w:rsidR, etc.) and w14:paraId / w14:textId: All valid w:p attributes. The identity-attribute partitioning (IDENTITY_ATTR_NAMES) correctly avoids duplicating paraId/textId across synthetic paragraph fragments, since each w:p must have a unique paraId per the Word 2010 namespace spec.

The wrapping behavior (wrapperParagraph): The round-trip produces <w:p><w:sdt w:docPartObj>...</w:sdt></w:p> — this is technically an inline-level SDT (§17.5.2.31) containing block content, which is non-standard. But this mirrors the structure of real-world documents being imported, not a new violation introduced by this PR. The code is faithfully preserving what was there.

No non-existent attributes, no incorrect defaults, and no new spec violations introduced. The wrapperParagraph: { default: null } addition is purely an internal PM attribute with no OOXML representation of its own.

…port

luccas-harbour added 8 commits March 12, 2026 16:43

refactor: extract helper for identifying inline nodes

12e13fc

fix(super-editor): hoist docPart SDTs out of paragraph inline content

f76dc1a

fix(super-editor): preserve inline paragraph content without schema m…

b049345

…etadata

fix(super-editor): keep sectPr on the last split paragraph fragment

29bf838

fix(super-editor): keep paraId/textId on only one split paragraph

77064a2

fix(super-editor): preserve sectPr after hoisted docPart blocks

415e412

fix(super-editor): preserve wrapper paragraph formatting for block do…

2772dfc

…cParts

superdoc-bot bot added the risk: critical label Mar 12, 2026

luccas-harbour marked this pull request as ready for review March 12, 2026 19:59

chatgpt-codex-connector bot reviewed Mar 12, 2026

View reviewed changes

...per-editor/src/core/super-converter/v3/handlers/w/sdt/helpers/translate-document-part-obj.js Outdated Show resolved Hide resolved

fix(super-editor): preserve paragraph XML attrs on wrapped docPart ex…

d01e38d

…port

luccas-harbour requested review from VladaHarbour, caio-pizzol, harbournick and tupizz March 12, 2026 20:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: import of w:sdt nested in paragraph (SD-2190)#2380

fix: import of w:sdt nested in paragraph (SD-2190)#2380
luccas-harbour wants to merge 9 commits intomainfrom
luccas/sd-2190-bug-import-fails-when-a-list-of-figurestables-is-the-first

luccas-harbour commented Mar 12, 2026

Uh oh!

linear bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

luccas-harbour commented Mar 12, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

luccas-harbour commented Mar 12, 2026

Summary

Problem

What changed

Tests

Limitations

Uh oh!

linear bot commented Mar 12, 2026

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

luccas-harbour commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

luccas-harbour commented Mar 12, 2026 •

edited

Loading