@BS-yzl-KubilaydelRioFernandez

Summary

This PR implements semantic tree normalization for PDF/A-1a documents by removing non-standard structural nodes and promoting their children to the correct hierarchy level. The normalization is applied immediately before the semantic tree is finalized and assigned to the document canvas, ensuring PDF/A-1a compliance without affecting other conformance modes.

Key Features

  • Removes structure nodes that are non-standard under PDF/A-1a (TBody, THead, TFoot) from the semantic tree
  • Preserves document order by promoting child nodes in place
  • Applies normalization only for PDF/A-1a conformance
  • Mutates the tree in place instead of building a normalized copy
  • Safely handles deeply nested semantic structures
  • Keeps semantic tree lifecycle and normalization logic encapsulated within SemanticTreeManager

Technical Implementation

The semantic tree normalization logic is implemented directly inside SemanticTreeManager via the RemoveNonStandardStructureTypes method. This method performs a depth-first traversal of the tree and removes structural-only nodes that are not allowed under PDF/A-1a. When such a node is encountered, it is removed and its children are reinserted into the parent at the same position, preserving the original ordering.
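The traversal described above can be sketched roughly as follows. This is an illustration in Python, not the code from this PR: the `Node` class, its `type` and `children` fields, and the function name are hypothetical stand-ins, and only the three structure types named in this description are assumed to be removable.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: the removable types are exactly those listed in this PR description.
NON_STANDARD_TYPES = frozenset({"TBody", "THead", "TFoot"})


@dataclass
class Node:
    """Hypothetical stand-in for a semantic tree node."""
    type: str
    children: list[Node] = field(default_factory=list)


def remove_non_standard_structure_types(node: Node) -> None:
    """Depth-first pass that splices the children of disallowed nodes into the parent."""
    i = 0
    while i < len(node.children):
        child = node.children[i]
        if child.type in NON_STANDARD_TYPES:
            # Replace the child with its own children at the same index, so
            # document order is preserved. The index is not advanced: the
            # promoted nodes are examined next, which also covers a
            # disallowed node nested directly inside another one.
            node.children[i:i + 1] = child.children
        else:
            # Allowed node: keep it and normalize its subtree.
            remove_non_standard_structure_types(child)
            i += 1


# Usage: a table whose rows are wrapped in THead and TBody.
table = Node("Table", [
    Node("THead", [Node("TR", [Node("TH")])]),
    Node("TBody", [Node("TR", [Node("TD")])]),
])
remove_non_standard_structure_types(table)
print([child.type for child in table.children])
```

The sketch recurses once per level of allowed nodes, so a very deep tree could reach Python's recursion limit; an explicit stack would avoid that. Whether the actual method uses recursion or iteration is not stated here.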

The normalization is invoked from ConfigureWithSemanticTree only when PDF/A-1a conformance is enabled. This ensures that stricter structural requirements are applied conditionally and do not affect other PDF standards.

After normalization, the semantic tree is retrieved, the manager state is reset, and the finalized tree is passed to the document canvas. This guarantees that:

  • The tree is fully normalized before rendering
  • The manager does not retain state beyond document configuration
  • The canvas receives a stable, compliant semantic structure
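The ordering of these steps can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Conformance` enum, the `Canvas` class, the dictionary-based tree, and the `set_root`, `get_semantic_tree`, and `reset` method names are hypothetical, and only the sequence (normalize conditionally, retrieve, reset, assign) is taken from the description above.

```python
from enum import Enum, auto

# Assumption: removable types as listed in this PR description.
NON_STANDARD_TYPES = frozenset({"TBody", "THead", "TFoot"})


class Conformance(Enum):
    """Hypothetical conformance modes; only PDF/A-1a triggers normalization."""
    NONE = auto()
    PDFA_1A = auto()


class SemanticTreeManager:
    """Hypothetical manager that owns the tree until configuration finishes."""

    def __init__(self):
        self._root = None

    def set_root(self, root):
        self._root = root

    def remove_non_standard_structure_types(self):
        if self._root is not None:
            self._normalize(self._root)

    def _normalize(self, node):
        children = node["children"]
        i = 0
        while i < len(children):
            child = children[i]
            if child["type"] in NON_STANDARD_TYPES:
                # Promote the grandchildren into the removed node's position.
                children[i:i + 1] = child["children"]
            else:
                self._normalize(child)
                i += 1

    def get_semantic_tree(self):
        return self._root

    def reset(self):
        self._root = None


class Canvas:
    """Hypothetical document canvas that receives the finalized tree."""

    def __init__(self):
        self.semantic_tree = None


def configure_with_semantic_tree(manager, canvas, conformance):
    # 1. Normalize only when PDF/A-1a conformance is requested.
    if conformance is Conformance.PDFA_1A:
        manager.remove_non_standard_structure_types()
    # 2. Retrieve the tree, 3. clear the manager, 4. hand the tree to the canvas.
    tree = manager.get_semantic_tree()
    manager.reset()
    canvas.semantic_tree = tree
```

Resetting before the assignment means the manager holds no reference to the tree once the canvas owns it, which matches the second guarantee in the list above.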

Structural node types subject to removal are centralized in a static lookup to ensure fast evaluation and easy extensibility.
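As a rough illustration of such a lookup, the sketch below uses an immutable set with a small predicate. The constant and function names are hypothetical, and the contents are limited to the types named in this description.

```python
# Assumption: the lookup holds the three types named in this PR description.
# A frozenset gives average constant-time membership tests and cannot be
# mutated accidentally at runtime.
NON_STANDARD_STRUCTURE_TYPES = frozenset({"TBody", "THead", "TFoot"})


def is_non_standard_structure_type(structure_type: str) -> bool:
    """Return True when a node of this type should be removed for PDF/A-1a."""
    return structure_type in NON_STANDARD_STRUCTURE_TYPES
```

Extending the normalization to another type would then be a one-line change to the set rather than a change to the traversal.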
