embedding files via attach() should not compress by default or at least mark it as such #1760

@tcurdt


What were you trying to do?

I was trying to embed files in a PDF and then read them back, expecting the embedded file content to match the original data exactly.

How did you attempt to do it?

I used the standard pdfDoc.attach() method to embed a file, then read it back:

  import { PDFDocument } from 'pdf-lib';

  const pdfDoc = await PDFDocument.create();
  const originalContent = 'Hello World - this is test content';
  const fileBytes = new TextEncoder().encode(originalContent);

  await pdfDoc.attach(fileBytes, 'test.txt', {
    mimeType: 'text/plain',
  });

  const pdfBytes = await pdfDoc.save();
  const readPdf = await PDFDocument.load(pdfBytes);
  const embeddedFiles = await readPdf.getEmbeddedFiles();
  const retrievedBytes = embeddedFiles['test.txt'];
  const retrievedContent = new TextDecoder().decode(retrievedBytes);

  console.log('Original:', originalContent);
  console.log('Retrieved:', retrievedContent);

What actually happened?

The retrieved content is compressed binary data rather than the original text. The raw bytes of the embedded file start with 78 9c (a zlib header), yet the stream dictionary in the PDF does not declare any compression filter. The result is a malformed PDF object: the stream is compressed, but no Filter entry says so.
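For reference, the 78 9c check can be made slightly more general. The sketch below tests the two-byte zlib header as laid out in RFC 1950 (compression method 8, window size field at most 7, and the 16-bit header value divisible by 31). The helper name `looksLikeZlib` is hypothetical and not part of pdf-lib, and the check is only a heuristic: a short plain-text file could pass it by coincidence.

```typescript
// Hypothetical helper (not a pdf-lib API): heuristic zlib header detection.
function looksLikeZlib(bytes: Uint8Array): boolean {
  if (bytes.length < 2) return false;
  const cmf = bytes[0]; // compression method and flags
  const flg = bytes[1]; // additional flags, including the check bits
  const isDeflate = (cmf & 0x0f) === 8;             // CM = 8 means DEFLATE
  const windowOk = (cmf >> 4) <= 7;                 // CINFO above 7 is not allowed
  const checkOk = ((cmf << 8) | flg) % 31 === 0;    // FCHECK makes the pair a multiple of 31
  return isDeflate && windowOk && checkOk;
}
```

With this, `looksLikeZlib(retrievedBytes)` could replace a hard-coded comparison against 0x78 and 0x9c, which only covers one compression level.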

What did you expect to happen?

I expected one of two correct behaviors:

  1. The embedded file should be stored uncompressed, so that the retrieved content matches the original exactly.
  2. OR, if compression is used, the PDF stream should declare it with a Filter entry of FlateDecode (/Filter /FlateDecode in the stream dictionary) to indicate zlib compression.

Currently pdf-lib creates malformed PDF streams: they are compressed but not marked as such in the PDF structure.
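Until this is resolved, a reader could inflate the retrieved bytes manually. The sketch below assumes a Node.js environment and uses the built-in `node:zlib` module; in a browser a library such as pako would be needed instead. The function name `inflateIfCompressed` is hypothetical. It simply attempts zlib decompression and falls back to the input when the data is not a valid zlib stream.

```typescript
import { deflateSync, inflateSync } from 'node:zlib';

// Hypothetical workaround helper (not a pdf-lib API).
// Returns the inflated bytes if the input is a valid zlib stream,
// otherwise returns the input unchanged.
function inflateIfCompressed(retrieved: Uint8Array): Uint8Array {
  try {
    return new Uint8Array(inflateSync(retrieved));
  } catch {
    // inflateSync throws on data that is not zlib-compressed.
    return retrieved;
  }
}

// Simulate what the report describes: the attachment comes back deflated.
const originalBytes = new Uint8Array([72, 101, 108, 108, 111]); // "Hello"
const compressedBytes = new Uint8Array(deflateSync(originalBytes));
const restoredBytes = inflateIfCompressed(compressedBytes);
```

This only works around the symptom on the reading side; it does not change what is written into the PDF.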

How can we reproduce the issue?

  import { PDFDocument } from 'pdf-lib';
  import fs from 'fs';

  async function testEmbedding() {
    const pdfDoc = await PDFDocument.create();
    const testContent = 'This is plain text content that should be readable';
    const fileBytes = new TextEncoder().encode(testContent);

    await pdfDoc.attach(fileBytes, 'test.txt');
    const pdfBytes = await pdfDoc.save();
    fs.writeFileSync('test.pdf', pdfBytes);

    const readPdf = await PDFDocument.load(pdfBytes);
    const embeddedFiles = await readPdf.getEmbeddedFiles();
    const retrieved = embeddedFiles['test.txt'];

    console.log('Original bytes:', Array.from(fileBytes.slice(0, 10)));
    console.log('Retrieved bytes:', Array.from(retrieved.slice(0, 10)));
    console.log('Retrieved as text:', new TextDecoder().decode(retrieved));

    if (retrieved[0] === 0x78 && retrieved[1] === 0x9c) {
      console.log('ERROR: File is zlib compressed but the PDF did not declare this');
    }
  }

  testEmbedding();

Steps:

  1. Run the code above
  2. Observe that the retrieved bytes don't match the original
  3. Note the 78 9c header, which indicates zlib compression
  4. Check that the PDF stream lacks a Filter declaration for the compression
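Step 4 can be checked without a PDF viewer by looking at the stream dictionaries in the saved bytes. The following is a rough sketch, not a PDF parser: it assumes stream dictionaries are written as plain text directly before the `stream` keyword, and it could be fooled by binary stream data that happens to contain the same byte sequence. The names `StreamReport` and `listStreamDictionaries` are hypothetical.

```typescript
// Hypothetical diagnostic helper (not a pdf-lib API).
interface StreamReport {
  dictionary: string; // raw text of the stream dictionary
  hasFilter: boolean; // true if a /Filter key appears in it
}

function listStreamDictionaries(pdfBytes: Uint8Array): StreamReport[] {
  // Map each byte to one character so string offsets equal byte offsets.
  const chars: string[] = [];
  for (let i = 0; i < pdfBytes.length; i++) chars.push(String.fromCharCode(pdfBytes[i]));
  const text = chars.join('');

  const reports: StreamReport[] = [];
  const streamStart = />>\s*stream\r?\n/g;
  let match: RegExpExecArray | null;
  while ((match = streamStart.exec(text)) !== null) {
    // Walk backwards from the closing ">>" to its matching "<<",
    // counting nested dictionaries such as /Params << ... >>.
    let depth = 0;
    let start = -1;
    let i = match.index;
    while (i >= 0) {
      const pair = text.slice(i, i + 2);
      if (pair === '>>') {
        depth++;
        i -= 2;
      } else if (pair === '<<') {
        depth--;
        if (depth === 0) { start = i; break; }
        i -= 2;
      } else {
        i -= 1;
      }
    }
    if (start < 0) continue; // unbalanced delimiters, skip this match
    const dictionary = text.slice(start, match.index + 2);
    reports.push({ dictionary, hasFilter: /\/Filter\b/.test(dictionary) });
  }
  return reports;
}
```

Calling `listStreamDictionaries(pdfBytes)` on the output of `pdfDoc.save()` and printing the entries whose dictionary contains `/EmbeddedFile` would show whether the attachment stream carries a Filter entry.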

Version

1.17.1

What environment are you running pdf-lib in?

Browser

Checklist

  • My report includes a Short, Self Contained, Correct (Compilable) Example.
  • I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

The root cause is in FileEmbedder.ts line 62, which uses context.flateStream() to compress the data but doesn't set the stream's Filter dictionary entry. This produces malformed PDFs in which streams are compressed but not declared as such. pdf-lib should either disable compression by default or mark compressed streams properly in the PDF structure.
