EnaiaInc/redlines
Redlines

Extract and normalize tracked changes ("redlines") from DOC, DOCX, and PDF documents into a single unified shape.

Redlines handles each format differently: legacy .doc files go through doc_redlines, DOCX files are parsed for <w:ins> and <w:del> elements, and PDFs are handled by pdf_redlines (a precompiled Rust/MuPDF NIF). Whatever the source format, every change is normalized into a Redlines.Change struct.

Installation

Add :redlines to your dependencies:

def deps do
  [
    {:redlines, "~> 0.9.2"}
  ]
end

PDF support is included out of the box via the precompiled pdf_redlines NIF, so no Rust toolchain is required. DOC support comes from the bundled doc_redlines dependency.

Usage

Extracting Changes

# DOCX - extracts <w:ins> and <w:del> from word/document.xml
{:ok, %Redlines.Result{changes: changes, source: :docx}} =
  Redlines.extract("contract_v2.docx")

# DOC - extracts tracked insertions/deletions from legacy Word binary format
{:ok, %Redlines.Result{changes: changes, source: :doc}} =
  Redlines.extract("contract_v2.doc")

# DOCX - accept tracked changes and get cleaned DOCX bytes
{:ok, cleaned_docx} = Redlines.clean_docx("contract_v2.docx")
File.write!("contract_v2_clean.docx", cleaned_docx)

# DOCX - accept tracked changes and also get informational warnings about any other revision markup seen
{:ok, cleaned_docx, warnings} = Redlines.clean_docx_with_warnings("contract_v2.docx")

# PDF
{:ok, %Redlines.Result{changes: changes, source: :pdf}} =
  Redlines.extract("contract_v2.pdf")

# Override type inference
{:ok, result} = Redlines.extract("document.bin", type: :docx)
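
When batching files whose names may not carry a reliable extension, it can help to decide the type: value yourself before calling Redlines.extract/2. The sketch below maps a path to one of the source types shown above. TypeSketch and from_path/1 are hypothetical names for illustration only; this is not how Redlines itself infers the type, which this README does not describe.

```elixir
defmodule TypeSketch do
  # Hypothetical helper, not part of Redlines.
  # Map a file path to :docx, :doc, or :pdf based on its extension
  # (case-insensitive), or :unknown when the extension is not recognized.
  def from_path(path) do
    case path |> Path.extname() |> String.downcase() do
      ".docx" -> :docx
      ".doc" -> :doc
      ".pdf" -> :pdf
      _ -> :unknown
    end
  end
end
```

A caller could pass the result as the type: option when it is not :unknown, and otherwise skip the file or ask the user which format it is.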

The Change Struct

Every tracked change is normalized into a Redlines.Change:

%Redlines.Change{
  type: :deletion | :insertion | :paired,
  deletion: "removed text" | nil,
  insertion: "added text" | nil,
  location: "page 3, paragraph 2" | nil,
  meta: %{"source" => "docx", "author" => "Alice", "date" => "2026-01-15T10:00:00Z"}
}
  • :deletion - Text was removed
  • :insertion - Text was added
  • :paired - A deletion and insertion that represent a replacement
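
Because every change carries a :type, downstream code can branch on it with ordinary pattern matching. The sketch below shows one way to do that. So that it runs without the library installed, it uses ChangeDemo.Change, a hypothetical stand-in with the same fields as the struct documented above; with Redlines available you would presumably match on %Redlines.Change{} instead.

```elixir
defmodule ChangeDemo do
  # Hypothetical stand-in mirroring the documented fields of Redlines.Change.
  defmodule Change do
    defstruct [:type, :deletion, :insertion, :location, meta: %{}]
  end

  # Render one change as a single human-readable line, chosen by its type.
  def describe(%Change{type: :deletion, deletion: text}), do: "removed: " <> text
  def describe(%Change{type: :insertion, insertion: text}), do: "added: " <> text

  def describe(%Change{type: :paired, deletion: old, insertion: new}),
    do: "replaced: " <> old <> " -> " <> new

  # Count how many changes of each type a list contains.
  def count_by_type(changes), do: Enum.frequencies_by(changes, & &1.type)
end
```

The three describe/1 clauses correspond to the three documented types, so a change with any other type raises a FunctionClauseError rather than being silently ignored.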

Formatting for LLM Prompts

format_for_llm/2 produces a structured text summary suitable for including in LLM prompts:

Redlines.format_for_llm(changes)
# DELETIONS (removed content):
#   - "the old clause"
#
#
# INSERTIONS (new content):
#   + "the new clause"
#
#
# DELETED → INSERTED:
#   "old term" → "new term"

Options:

  • :pair_separator - Separator between deleted/inserted pairs (default "→")
  • :max_len - Truncation length for long text (default 150)

Accepts a Redlines.Result, a list of Redlines.Change structs, a raw DOCX track-changes map, or a list of PDF redline entries.
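
To make the two options concrete, here is a minimal sketch of how a separator and a truncation length could shape one "DELETED → INSERTED" line. It is an illustration under assumptions, not the library's implementation: FormatSketch, truncate/2, and pair_line/3 are hypothetical names, and the "..." truncation marker is a guess at a reasonable behavior.

```elixir
defmodule FormatSketch do
  # Hypothetical helpers, not part of Redlines.

  # Keep the first max_len characters and append "..." when text is longer;
  # return the text unchanged otherwise.
  def truncate(text, max_len) do
    if String.length(text) > max_len do
      String.slice(text, 0, max_len) <> "..."
    else
      text
    end
  end

  # Render one replacement as: "old" <separator> "new".
  # Defaults mirror the documented ones: "→" and 150.
  def pair_line(old, new, opts \\ []) do
    sep = Keyword.get(opts, :pair_separator, "→")
    max_len = Keyword.get(opts, :max_len, 150)
    "\"#{truncate(old, max_len)}\" #{sep} \"#{truncate(new, max_len)}\""
  end
end
```

With the real library, the same options would be passed as the second argument, for example Redlines.format_for_llm(changes, pair_separator: "=>", max_len: 80).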

Performance

PDF extraction uses a precompiled Rust NIF and finishes in under 700 ms even on large scanned documents (35 MB+). DOCX parsing is pure-Elixir XML processing and is effectively instant.
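
Timings depend on hardware and on the documents involved, so it is worth measuring on your own files. A minimal sketch using Erlang's :timer.tc/1 follows; TimingSketch and time_ms/1 are hypothetical names, not part of Redlines.

```elixir
defmodule TimingSketch do
  # Hypothetical helper, not part of Redlines.
  # Run a zero-arity function once and return {elapsed_milliseconds, result}.
  def time_ms(fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    {micros / 1000, result}
  end
end
```

For example, TimingSketch.time_ms(fn -> Redlines.extract("contract_v2.pdf") end) returns the elapsed wall-clock time in milliseconds together with the extraction result. A single run is noisy, so repeat the call a few times before drawing conclusions.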

License

MIT - see LICENSE.
