
@skypher commented Dec 21, 2025

Summary

This PR adds comprehensive fuzzing infrastructure for OSS-Fuzz integration, covering all major attack surfaces in pdf.js.

What's included

  • 16 fuzz targets covering:

    • Core: PDF parser, crypto, colorspace
    • Fonts: CFF parser, Type1 parser, CMap parser
    • Images: JPEG, JBIG2, JPX decoders
    • Streams: Flate, CCITT, LZW decoders
    • XFA/XML: XFA parser, XML parser, FormCalc parser, PostScript parser
  • Format-specific dictionaries for guided fuzzing

  • Seed corpus with 202 test samples

  • OSS-Fuzz configuration (Dockerfile, build.sh, project.yaml)

  • Proper async handling for Jazzer.js compatibility
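As a rough illustration of what a format-specific dictionary contains, the sketch below renders a list of tokens in the libFuzzer/AFL dictionary syntax: one `name="value"` entry per line, with `\\`, `\"` and `\xNN` escapes. The helpers `toDictEntry` and `buildDictionary` are hypothetical and are not taken from the PR's dictionary files.

```javascript
// Hypothetical helpers: render byte-string tokens as libFuzzer/AFL-style
// dictionary entries. Printable ASCII is kept as-is; backslash, double
// quote and non-printable bytes are escaped as \\, \" and \xNN.
function toDictEntry(token) {
  // Treat the token as a byte string (one byte per character).
  const bytes = Buffer.from(token, "latin1");
  let out = "";
  for (const b of bytes) {
    if (b === 0x5c) {
      out += "\\\\"; // backslash
    } else if (b === 0x22) {
      out += '\\"'; // double quote
    } else if (b >= 0x20 && b <= 0x7e) {
      out += String.fromCharCode(b); // printable ASCII
    } else {
      out += "\\x" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return `"${out}"`;
}

// One named entry per line: kw1="...", kw2="...", ...
function buildDictionary(tokens) {
  return tokens.map((t, i) => `kw${i + 1}=${toDictEntry(t)}`).join("\n") + "\n";
}
```

For example, `buildDictionary(["%PDF-", "obj", "endobj", "stream", "/Filter"])` yields text that could be saved as a `.dict` file and passed to a libFuzzer-based engine with its `-dict=` option; the token list here is only an illustrative guess at PDF syntax worth seeding.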

Coverage

Initial corpus testing shows:

  • 38.87% statement coverage
  • 74.41% branch coverage
  • Core parsers (CFF, Type1, stream) at 85%+ coverage

Testing

All 16 fuzzers tested and passing:

  • 202/202 corpus samples pass
  • Proper error handling and resource cleanup
  • Memory/timeout limits configured via .options files
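A check like "202/202 corpus samples pass" can be reproduced with a small replay loop that feeds every seed file to a fuzz target exactly once. This is a minimal sketch under assumptions: `replayCorpus` is a hypothetical helper (not the PR's coverage runner script), and the target is any function taking a `Buffer`, sync or async, which is the shape of a Jazzer.js fuzz target.

```javascript
// Hypothetical corpus replay helper: runs each file in a seed-corpus
// directory through a fuzz target once and collects the failures.
const fs = require("node:fs");
const path = require("node:path");

async function replayCorpus(corpusDir, fuzzTarget) {
  const results = { passed: 0, failed: [] };
  // Sort for a deterministic order across runs.
  const names = fs.readdirSync(corpusDir).sort();
  for (const name of names) {
    const file = path.join(corpusDir, name);
    if (!fs.statSync(file).isFile()) continue; // skip subdirectories
    const data = fs.readFileSync(file);
    try {
      // `await` works for both sync and Promise-returning targets.
      await fuzzTarget(data);
      results.passed++;
    } catch (err) {
      results.failed.push({ name, message: String(err && err.message) });
    }
  }
  return results;
}

module.exports = { replayCorpus };
```

Usage might look like `replayCorpus("corpus/cff_parser", require("./cff_parser.js").fuzz)`, where both paths are placeholders. Catching per file rather than stopping at the first error reports every failing sample in a single run.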

OSS-Fuzz Configuration

  • Fuzzing engines: libfuzzer, AFL++, honggfuzz
  • Sanitizers: AddressSanitizer, UndefinedBehaviorSanitizer
  • Builds use lib-legacy for Node.js/CommonJS compatibility

Test plan

  • All 16 fuzzers execute successfully
  • All 202 corpus samples pass
  • Dictionaries contain valid format-specific tokens
  • OSS-Fuzz build configuration is correct

This commit adds 16 fuzz targets covering all major attack surfaces:

**Image Decoders:**
- jpeg_image: JPEG decoder (jpg.js)
- jbig2_image: JBIG2 binary image decoder
- jpx_image: JPEG2000 decoder

**Stream Decoders:**
- flate_stream: Zlib/Deflate decompression
- ccitt_stream: CCITT fax decoder (Group 3/4)
- lzw_stream: LZW decompression

**Font Parsers:**
- cff_parser: Compact Font Format parser
- type1_parser: Type1 PostScript font parser
- cmap_parser: Character map parser

**Core Parsing:**
- pdf_parser: Full PDF document parsing pipeline
- crypto: RC4, AES-128, AES-256 ciphers and hashes
- colorspace: Colorspace and ICC profile parsing

**XFA/XML:**
- xfa_parser: XFA form parsing
- xml_parser: XML/XMP metadata parsing
- formcalc_parser: FormCalc script language parser
- ps_parser: PostScript calculator functions
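Targets such as `crypto` need several inputs (a cipher choice, a key, and data) derived from the single byte buffer the fuzzer provides. Jazzer.js offers a `FuzzedDataProvider` class for this. The sketch below splits the buffer by hand instead, so it runs without the library. The byte layout, the `CIPHERS` table and `splitCryptoInput` are assumptions for illustration, not the layout used by the PR's target.

```javascript
// Hypothetical input layout: byte 0 selects a cipher, the next
// `keyLength` bytes are the key, and the remainder is the payload.
const CIPHERS = [
  { name: "rc4", keyLength: 16 }, // RC4 allows variable keys; 16 is arbitrary
  { name: "aes128", keyLength: 16 },
  { name: "aes256", keyLength: 32 },
];

function splitCryptoInput(data) {
  if (data.length < 1) return null; // nothing to choose a cipher from
  const cipher = CIPHERS[data[0] % CIPHERS.length];
  const key = data.subarray(1, 1 + cipher.keyLength);
  if (key.length < cipher.keyLength) return null; // too short: skip input
  const payload = data.subarray(1 + cipher.keyLength);
  return { cipher: cipher.name, key, payload };
}
```

Returning `null` for short inputs lets a target skip them cheaply rather than padding keys with zeros, which would make many distinct inputs behave identically.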

Each fuzzer includes:
- Format-specific dictionary for guided fuzzing
- Seed corpus with minimal valid samples
- Input size limits to prevent resource exhaustion
- Proper error handling (only re-throws OOM/stack overflow)

Also includes OSS-Fuzz configuration files (Dockerfile, build.sh,
project.yaml) with support for multiple sanitizers (ASan, UBSan)
and fuzzing engines (libfuzzer, AFL++, honggfuzz).

Add 183 additional corpus samples including:
- PDFs with annotations, fonts, XFA forms, and patterns
- CCITT, JBIG2, JPX, and LZW stream samples
- Colorspace test samples
- CFF and Type1 font samples
- FormCalc and PostScript function samples
- XML/XMP metadata samples
- Coverage runner script for testing

This expands coverage from ~36% to ~39% statement coverage
with 74% branch coverage across 202 test samples.

1. Remove --sync flag from all fuzzer compilations
   - All fuzzers use async functions that return Promises
   - Jazzer.js handles async fuzzers correctly without --sync
   - Prevents shallow fuzzing and uncaught promise rejections

2. Update repository URLs to upstream mozilla/pdf.js
   - Dockerfile: Clone from mozilla/pdf.js instead of fork
   - project.yaml: Point homepage and main_repo to upstream

```shell
npm install

# Install Jazzer.js for fuzzing
npm install --save-dev @jazzer.js/core
```

@timvandermeij (Contributor) commented Dec 21, 2025

We used to have OSS-Fuzz integration, but removed it in #19307 because, among other reasons, this library (Jazzer.js) is deprecated, so we really don't want to rely on it, as it brings all kinds of maintenance problems (see https://www.npmjs.com/package/@jazzer.js/core).

Moreover, we had limited to no visibility into the output of the fuzzers because they were run in a different repository (the OSS-Fuzz one), and keeping cross-repository builds working was a bit difficult because we could not easily verify that changes made here kept the builds at OSS-Fuzz working.

In short, how will this PR address the original concerns that led to the removal of the fuzzers?

@timvandermeij removed the request for review from calixteman December 21, 2025 13:22
@calixteman (Contributor) commented

I have never been convinced that fuzzing JS code is all that useful.

> OSS-Fuzz Configuration
> Fuzzing engines: libfuzzer, AFL++, honggfuzz
> Sanitizers: AddressSanitizer, UndefinedBehaviorSanitizer

This is especially true when I read something like the above.
I don't claim that our various parsers are bug-free: they certainly have some bugs, but I'd prefer to have real-life buggy PDFs that work in other viewers.
As far as I can tell, there is no risk that a buggy parser will lead to a serious browser crash. And if you did manage to trigger such a severe bug, the problem would be on the JS engine side rather than in pdf.js, which would merely be the vehicle for it in such a case.

So we'd be happy to accept such a PR, but we really need strong evidence that it's useful. As @timvandermeij mentioned, we already tried this in the past and never got any feedback.
If you don't have any evidence, please explain your reasoning, or give us some real-life examples where fuzzing has been really useful for catching serious bugs.
