Add comprehensive OSS-Fuzz integration #20511

skypher · 2025-12-21T08:42:18Z

Summary

This PR adds comprehensive fuzzing infrastructure for OSS-Fuzz integration, covering all major attack surfaces in pdf.js.

What's included

- 16 fuzz targets covering:
  - Core: PDF parser, crypto, colorspace
  - Fonts: CFF parser, Type1 parser, CMap parser
  - Images: JPEG, JBIG2, JPX decoders
  - Streams: Flate, CCITT, LZW decoders
  - XFA/XML: XFA parser, XML parser, FormCalc parser, PostScript parser
- Format-specific dictionaries for guided fuzzing
- Seed corpus with 202 test samples
- OSS-Fuzz configuration (Dockerfile, build.sh, project.yaml)
- Proper async handling for Jazzer.js compatibility

Coverage

Initial corpus testing shows:

- 38.87% statement coverage
- 74.41% branch coverage
- Core parsers (CFF, Type1, stream) at 85%+ coverage

Testing

All 16 fuzzers tested and passing:

- 202/202 corpus samples pass
- Proper error handling and resource cleanup
- Memory/timeout limits configured via .options files
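
The corpus check reported above can be approximated with a small replay script. What follows is only a sketch of that idea, not the coverage runner shipped in this PR: the `replayCorpus` name, its return shape, and the flat directory layout are assumptions made for illustration.

```javascript
// Hypothetical corpus replay helper (not the PR's actual runner script).
const fs = require("node:fs");
const path = require("node:path");

// Feeds every regular file in `corpusDir` to `fuzz` and tallies the outcome.
async function replayCorpus(fuzz, corpusDir) {
  const results = { passed: 0, failed: [] };
  for (const name of fs.readdirSync(corpusDir).sort()) {
    const file = path.join(corpusDir, name);
    if (!fs.statSync(file).isFile()) continue;
    try {
      // `await` works for both synchronous and Promise-returning targets.
      await fuzz(fs.readFileSync(file));
      results.passed += 1;
    } catch (err) {
      results.failed.push({ name, message: String(err && err.message) });
    }
  }
  return results;
}
```

A sample counts as passing when the target returns or resolves without throwing; anything else is recorded together with its file name so it can be replayed on its own.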

OSS-Fuzz Configuration

- Fuzzing engines: libfuzzer, AFL++, honggfuzz
- Sanitizers: AddressSanitizer, UndefinedBehaviorSanitizer
- Builds use lib-legacy for Node.js/CommonJS compatibility

Test plan

- All 16 fuzzers execute successfully
- All 202 corpus samples pass
- Dictionaries contain valid format-specific tokens
- OSS-Fuzz build configuration is correct
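
One way to check the dictionary item mechanically is to lint each dictionary file for well-formed entries. The sketch below assumes the usual libFuzzer/AFL dictionary syntax (optional `name=` prefix, a double-quoted value, `\\`, `\"` and `\xNN` escapes, `#` comments); the function name is hypothetical and nothing here is taken from the PR itself.

```javascript
// Optional `name=`, then a quoted value using only \\, \" or \xNN escapes.
const DICT_ENTRY =
  /^(?:[A-Za-z0-9_]+\s*=\s*)?"(?:[^"\\]|\\\\|\\"|\\x[0-9A-Fa-f]{2})*"$/;

// Returns the 1-based line numbers of entries that do not match the syntax.
function findInvalidDictLines(text) {
  const bad = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    // Blank lines and comments are allowed anywhere.
    if (trimmed === "" || trimmed.startsWith("#")) return;
    if (!DICT_ENTRY.test(trimmed)) bad.push(index + 1);
  });
  return bad;
}
```

This only validates syntax. Whether a token is actually meaningful for a given format (for example `endobj` for PDF) still needs a human look.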

This commit adds 16 fuzz targets covering all major attack surfaces:

**Image Decoders:**
- jpeg_image: JPEG decoder (jpg.js)
- jbig2_image: JBIG2 binary image decoder
- jpx_image: JPEG2000 decoder

**Stream Decoders:**
- flate_stream: Zlib/Deflate decompression
- ccitt_stream: CCITT fax decoder (Group 3/4)
- lzw_stream: LZW decompression

**Font Parsers:**
- cff_parser: Compact Font Format parser
- type1_parser: Type1 PostScript font parser
- cmap_parser: Character map parser

**Core Parsing:**
- pdf_parser: Full PDF document parsing pipeline
- crypto: RC4, AES-128, AES-256 ciphers and hashes
- colorspace: Colorspace and ICC profile parsing

**XFA/XML:**
- xfa_parser: XFA form parsing
- xml_parser: XML/XMP metadata parsing
- formcalc_parser: FormCalc script language parser
- ps_parser: PostScript calculator functions

Each fuzzer includes:
- Format-specific dictionary for guided fuzzing
- Seed corpus with minimal valid samples
- Input size limits to prevent resource exhaustion
- Proper error handling (only re-throws OOM/stack overflow)

Also includes OSS-Fuzz configuration files (Dockerfile, build.sh, project.yaml) with support for multiple sanitizers (ASan, UBSan) and fuzzing engines (libfuzzer, AFL++, honggfuzz).
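
The input size limit and the "only re-throw OOM/stack overflow" policy could look roughly like the wrapper below. This is a sketch under stated assumptions: `MAX_INPUT_BYTES`, `isResourceExhaustion` and `makeFuzzTarget` are hypothetical names, the 64 KiB cap is invented, and matching on V8 error messages is just one possible way to classify errors. The only part that follows Jazzer.js convention is that a target is a function receiving a `Buffer`, exported under the name `fuzz`.

```javascript
// Hypothetical cap; the actual per-target limits are not stated here.
const MAX_INPUT_BYTES = 64 * 1024;

// True for stack exhaustion and failed allocations. Both surface as
// RangeError in V8; the message texts matched here are V8-specific.
function isResourceExhaustion(err) {
  return (
    err instanceof RangeError &&
    /Maximum call stack size exceeded|allocation failed/i.test(err.message)
  );
}

// Wraps a parser entry point into an async fuzz function.
function makeFuzzTarget(parse) {
  return async function fuzz(data) {
    if (data.length > MAX_INPUT_BYTES) return; // skip oversized inputs
    try {
      await parse(data);
    } catch (err) {
      // Malformed input is expected to throw; only surface exhaustion.
      if (isResourceExhaustion(err)) throw err;
    }
  };
}

// Stand-in parser for illustration; a real target would call into pdf.js.
const fuzz = makeFuzzTarget(async (data) => {
  JSON.parse(data.toString("utf8"));
});
// In a real target file: module.exports.fuzz = fuzz;
```

Swallowing every ordinary exception keeps the fuzzer from flooding reports with expected parse errors, at the cost of also hiding unexpected `TypeError`s that may point at real bugs. That trade-off is worth stating explicitly in each target.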

Add 183 additional corpus samples including:
- PDFs with annotations, fonts, XFA forms, and patterns
- CCITT, JBIG2, JPX, and LZW stream samples
- Colorspace test samples
- CFF and Type1 font samples
- FormCalc and PostScript function samples
- XML/XMP metadata samples
- Coverage runner script for testing

This expands coverage from ~36% to ~39% statement coverage with 74% branch coverage across 202 test samples.

1. Remove --sync flag from all fuzzer compilations
   - All fuzzers use async functions that return Promises
   - Jazzer.js handles async fuzzers correctly without --sync
   - Prevents shallow fuzzing and uncaught promise rejections
2. Update repository URLs to upstream mozilla/pdf.js
   - Dockerfile: Clone from mozilla/pdf.js instead of fork
   - project.yaml: Point homepage and main_repo to upstream
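
Why a synchronous driver and an async target do not mix can be shown without Jazzer.js at all. The two drivers below are simplified models written for this illustration, not Jazzer.js internals; `asyncTarget`, `runSync` and `runAsync` are hypothetical names.

```javascript
// An async target: a failure surfaces as a rejected Promise, not a throw.
async function asyncTarget(data) {
  if (data[0] === 0x25) throw new Error("parser bug"); // 0x25 is "%"
}

// Model of a synchronous driver: it calls the target and moves on.
function runSync(target, data) {
  try {
    const pending = target(data);
    // Silence the rejection so this demo does not crash the process;
    // without this line Node.js would report an unhandled rejection.
    if (pending && typeof pending.catch === "function") pending.catch(() => {});
    return "no finding";
  } catch (err) {
    return "finding: " + err.message;
  }
}

// Model of an asynchronous driver: it awaits the target before judging.
async function runAsync(target, data) {
  try {
    await target(data);
    return "no finding";
  } catch (err) {
    return "finding: " + err.message;
  }
}
```

With the input `Buffer.from("%PDF")`, `runSync` returns `"no finding"` because the `try` block has already exited by the time the Promise rejects, while `await runAsync(...)` yields `"finding: parser bug"`. The same input reaches the same bug, but only the awaiting driver observes it.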

timvandermeij · 2025-12-21T13:22:37Z

test/fuzz/ossfuzz/build.sh

+npm install
+
+# Install Jazzer.js for fuzzing
+npm install --save-dev @jazzer.js/core


We used to have OSS-Fuzz integration, but removed it in #19307 because, among other reasons, this library is deprecated (see https://www.npmjs.com/package/@jazzer.js/core). We really don't want to rely on a deprecated dependency, as it brings all kinds of maintenance problems.

Moreover, we had limited to no visibility into the output of the fuzzers because they were run in a different repository (OSS-Fuzz), and keeping cross-repository builds working was difficult because we cannot easily verify that changes made here keep the builds at OSS-Fuzz working.

In short, how will this PR address the original concerns that led to the removal of the fuzzers?

calixteman · 2025-12-21T18:43:27Z

I have never been convinced that fuzzing JS code is that useful.

> OSS-Fuzz Configuration
> Fuzzing engines: libfuzzer, AFL++, honggfuzz
> Sanitizers: AddressSanitizer, UndefinedBehaviorSanitizer

Especially when I read something like that ^^.
I don't pretend that our various parsers are bug-free: they surely have some bugs, but I'd prefer to have some real-life buggy PDFs which are valid in other viewers.
As far as I can tell, there is no risk that a buggy parser will lead to a bad crash of the browser. And if you manage to trigger such a bad bug, the problem is on the JS side and not on the pdf.js one, which would just be a helper in such a case.

So we'd be happy to accept such a PR, but we really need strong evidence that it's useful. As mentioned by @timvandermeij, we already tried this in the past and we never got any feedback.
If you don't have any evidence, please explain to us, or give us some real-life examples, where fuzzing has been really useful for catching bad bugs.

skypher added 3 commits December 21, 2025 08:16

timvandermeij added the test label Dec 21, 2025

timvandermeij requested a review from calixteman December 21, 2025 13:18

timvandermeij reviewed Dec 21, 2025


timvandermeij removed the request for review from calixteman December 21, 2025 13:22

Remove ASan/UBSan sanitizers (not applicable to JavaScript)

182549e
