Description
What were you trying to do?
I was trying to embed files in a PDF and then read them back, expecting the embedded file content to match the original data exactly.
How did you attempt to do it?
I used the standard pdfDoc.attach() method to embed a file, then read it back:
import { PDFDocument } from 'pdf-lib';
// Create a new document and attach a small UTF-8 text file to it.
const pdfDoc = await PDFDocument.create();
const originalContent = 'Hello World - this is test content';
const fileBytes = new TextEncoder().encode(originalContent);
await pdfDoc.attach(fileBytes, 'test.txt', {
  mimeType: 'text/plain',
});
const pdfBytes = await pdfDoc.save();
// Reload the saved bytes and read the attachment back by name.
const readPdf = await PDFDocument.load(pdfBytes);
const embeddedFiles = await readPdf.getEmbeddedFiles();
const retrievedBytes = embeddedFiles['test.txt'];
const retrievedContent = new TextDecoder().decode(retrievedBytes);
console.log('Original:', originalContent);
console.log('Retrieved:', retrievedContent);
What actually happened?
The retrieved content is compressed binary data instead of the original text. The raw bytes of the embedded file start with 78 9c (a zlib header), but the stream dictionary does not declare any compression filter. The result is a malformed PDF object: the stream is compressed, yet no Filter entry says so.
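The 78 9c signature can be checked more strictly than by comparing two fixed bytes. RFC 1950 requires the low nibble of the first byte (CMF) to be 8 (deflate), its high nibble to be at most 7 (a window of at most 32 KiB), and the 16-bit value CMF * 256 + FLG to be a multiple of 31. A minimal sketch, assuming the bytes are available as a Uint8Array; `looksLikeZlib` is a hypothetical helper, not part of pdf-lib:

```typescript
// Hypothetical helper: heuristically detects a zlib (RFC 1950) stream header.
// A match is strong evidence, not proof, that the bytes are zlib-compressed.
function looksLikeZlib(bytes: Uint8Array): boolean {
  if (bytes.length < 2) return false;
  const cmf = bytes[0];
  const flg = bytes[1];
  const method = cmf & 0x0f; // CM: must be 8 (deflate)
  const windowInfo = cmf >> 4; // CINFO: log2(window size) - 8, at most 7
  const checkOk = ((cmf << 8) | flg) % 31 === 0; // FCHECK: CMF*256 + FLG divisible by 31
  return method === 8 && windowInfo <= 7 && checkOk;
}

const text = new TextEncoder().encode('Hello World');
console.log(looksLikeZlib(text)); // false
console.log(looksLikeZlib(new Uint8Array([0x78, 0x9c, 0x01]))); // true
```

Besides 78 9c (default compression level), headers such as 78 01 and 78 da also pass this check, so it catches other deflate levels too.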
What did you expect to happen?
I expected one of two correct behaviors:
- The embedded file should be stored uncompressed, so retrieved content matches original exactly
- OR if compression is used, the PDF stream should properly declare Filter: ['FlateDecode'] to indicate zlib compression
Currently, pdf-lib creates malformed PDF streams: they are compressed but not marked as such in the PDF structure.
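In a PDF stream dictionary, Filter may be either a single name or an array of names, so a reader deciding whether a stream needs inflating has to accept both forms (the `Filter: ['FlateDecode']` above is the array form). A minimal sketch, modelling the entry as plain strings rather than pdf-lib's PDFName/PDFArray objects; the type and function names are hypothetical:

```typescript
// Hypothetical plain-object model of a stream dictionary's Filter entry.
// In a PDF, /Filter may be absent, a single name, or an array of names.
type FilterEntry = string | string[] | undefined;

// Normalize /Filter into the ordered list of filters a reader must undo.
function declaredFilters(filter: FilterEntry): string[] {
  if (filter === undefined) return [];
  return Array.isArray(filter) ? filter : [filter];
}

// True when the dictionary declares Flate (zlib) encoding in either form.
// Zlib bytes paired with a result of false is the inconsistency reported here.
function declaresFlate(filter: FilterEntry): boolean {
  return declaredFilters(filter).includes('FlateDecode');
}

console.log(declaresFlate('FlateDecode')); // true
console.log(declaresFlate(['FlateDecode'])); // true
console.log(declaresFlate(undefined)); // false
```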
How can we reproduce the issue?
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
async function testEmbedding() {
  const pdfDoc = await PDFDocument.create();
  const testContent = 'This is plain text content that should be readable';
  const fileBytes = new TextEncoder().encode(testContent);
  await pdfDoc.attach(fileBytes, 'test.txt');
  const pdfBytes = await pdfDoc.save();
  // Write the PDF to disk so it can be inspected with other tools.
  fs.writeFileSync('test.pdf', pdfBytes);
  // Reload the saved bytes and read the attachment back.
  const readPdf = await PDFDocument.load(pdfBytes);
  const embeddedFiles = await readPdf.getEmbeddedFiles();
  const retrieved = embeddedFiles['test.txt'];
  // Compare the first ten bytes of the input and of the retrieved data.
  console.log('Original bytes:', Array.from(fileBytes.slice(0, 10)));
  console.log('Retrieved bytes:', Array.from(retrieved.slice(0, 10)));
  console.log('Retrieved as text:', new TextDecoder().decode(retrieved));
  // 0x78 0x9c is the zlib header for deflate at the default compression level.
  if (retrieved[0] === 0x78 && retrieved[1] === 0x9c) {
    console.log('ERROR: File is zlib compressed but the PDF did not declare this');
  }
}
testEmbedding().catch(console.error);
Steps:
- Run the code above
- Observe that the retrieved bytes don't match the original
- Note the 78 9c header, which indicates zlib compression
- Note that the PDF stream lacks a Filter entry declaring the compression
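The second step can be made precise by comparing the full byte arrays instead of only the first ten bytes. A minimal sketch; `firstMismatch` is a hypothetical helper, not part of pdf-lib:

```typescript
// Hypothetical helper: returns the index of the first differing byte,
// or -1 if both arrays have identical length and contents.
function firstMismatch(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  // A length difference counts as a mismatch at the end of the shorter array.
  return a.length === b.length ? -1 : n;
}

const original = new TextEncoder().encode('This is plain text');
console.log(firstMismatch(original, original.slice())); // -1
console.log(firstMismatch(original, new Uint8Array([0x78, 0x9c]))); // 0
```

In the reproduction script, calling it as `firstMismatch(fileBytes, retrieved)` would report 0 when the retrieved data starts with a zlib header, since plain text never begins with byte 0x78 followed by 0x9c.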
Version
1.17.1
What environment are you running pdf-lib in?
Browser
Checklist
- My report includes a Short, Self Contained, Correct (Compilable) Example.
- I have attached all PDFs, images, and other files needed to run my SSCCE.
Additional Notes
The root cause is in FileEmbedder.ts line 62, which uses context.flateStream() to compress the data but doesn't properly set the stream's Filter dictionary entry. This creates malformed PDFs where streams are compressed but not declared as such. Either pdf-lib should disable compression by default or properly mark compressed streams in the PDF structure.
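For reference, the PDF specification describes an embedded file stream whose dictionary carries Type EmbeddedFile, an optional Subtype holding the MIME type, and an optional Params dictionary whose Size entry is the uncompressed length. The invariant this report asks for is that Filter appears exactly when the stored bytes are encoded. A minimal sketch of that invariant, modelling the dictionary as a plain object with plain strings instead of PDF names; `EmbeddedFileDict` and `embeddedFileDict` are hypothetical and this is not pdf-lib's FileEmbedder code:

```typescript
// Hypothetical plain-object model of an embedded file stream dictionary.
interface EmbeddedFileDict {
  Type: 'EmbeddedFile';
  Subtype?: string; // MIME type, e.g. 'text/plain'
  Length: number; // number of bytes actually stored in the stream
  Filter?: 'FlateDecode'; // present only when the stored bytes are deflated
  Params: { Size: number }; // size of the uncompressed file
}

function embeddedFileDict(
  originalSize: number,
  stored: Uint8Array,
  compressed: boolean,
  mimeType?: string,
): EmbeddedFileDict {
  const dict: EmbeddedFileDict = {
    Type: 'EmbeddedFile',
    Length: stored.length,
    Params: { Size: originalSize },
  };
  if (mimeType !== undefined) dict.Subtype = mimeType;
  // The invariant from this report: compressed bytes must come with a Filter.
  if (compressed) dict.Filter = 'FlateDecode';
  return dict;
}

const raw = new TextEncoder().encode('Hello World');
console.log(embeddedFileDict(raw.length, raw, false, 'text/plain').Filter); // undefined
```

Either option in this report satisfies the same invariant: store the bytes uncompressed and omit Filter, or deflate them and declare FlateDecode.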