| 1 | +## Overview |
| 2 | + |
| 3 | +PDF.js is a Portable Document Format (PDF) viewer built with JavaScript, HTML5 Canvas, and CSS. It's a Mozilla project that provides a general-purpose, web standards-based platform for parsing and rendering PDFs without requiring native code or plugins. |
| 4 | + |
| 5 | +## Common Commands |
| 6 | + |
| 7 | +### Development Server |
| 8 | +```bash |
| 9 | +npx gulp server |
| 10 | +``` |
| 11 | +Then open http://localhost:8888/web/viewer.html to view the PDF viewer. Test PDFs are available at http://localhost:8888/test/pdfs/?frame |
| 12 | + |
| 13 | +### Building |
| 14 | + |
| 15 | +Build for modern browsers: |
| 16 | +```bash |
| 17 | +npx gulp generic |
| 18 | +``` |
| 19 | + |
| 20 | +This generates `pdf.mjs` and `pdf.worker.mjs` in `build/generic/build/`. |
| 21 | + |
| 22 | +Build for distribution (creates pdfjs-dist package): |
| 23 | +```bash |
| 24 | +npx gulp dist |
| 25 | +npx gulp dist-install # Build and install locally |
| 26 | +``` |
| 27 | + |
| 28 | +### Testing |
| 29 | + |
| 30 | +Run all tests: |
| 31 | +```bash |
| 32 | +npx gulp test |
| 33 | +``` |
| 34 | + |
| 35 | +Run unit tests only: |
| 36 | +```bash |
| 37 | +npx gulp unittest |
| 38 | +``` |
| 39 | + |
| 40 | +Run integration tests (browser-based tests using Puppeteer): |
| 41 | +```bash |
| 42 | +npx gulp integrationtest |
| 43 | +``` |
| 44 | + |
| 45 | +Run font tests: |
| 46 | +```bash |
| 47 | +npx gulp fonttest |
| 48 | +``` |
| 49 | + |
| 50 | +To run a single test, edit `test/test_manifest.json` or use the test runner's options. |
| 51 | + |
| 52 | +### Linting and Formatting |
| 53 | + |
| 54 | +Lint JavaScript: |
| 55 | +```bash |
| 56 | +npx gulp lint |
| 57 | +``` |
| 58 | + |
| 59 | +Format code (uses Prettier and ESLint): |
| 60 | +```bash |
| 61 | +npx eslint --fix <file> |
| 62 | +``` |
| 63 | + |
| 64 | +### Type Checking |
| 65 | + |
| 66 | +Run TypeScript type checking: |
| 67 | +```bash |
| 68 | +npx gulp typestest |
| 69 | +``` |
| 70 | + |
| 71 | +## Architecture |
| 72 | + |
| 73 | +### High-Level Structure |
| 74 | + |
| 75 | +PDF.js has a multi-layer architecture that separates concerns between PDF parsing, rendering, and UI: |
| 76 | + |
| 77 | +#### 1. Core Layer (`src/core/`) |
| 78 | +The core layer handles PDF parsing and interpretation. Key responsibilities: |
| 79 | +- **PDF parsing**: Parsing PDF structure, cross-reference tables, streams |
| 80 | +- **Font handling**: CFF, TrueType, Type1 font parsing and conversion (`font.js`, `fonts.js`, `cff_*.js`, `type1_*.js`) |
| 81 | +- **Image decoding**: JPEG, JBIG2, JPX/JPEG2000 decoders |
| 82 | +- **Operators**: Processing PDF drawing operators (`operator_list.js`, `evaluator.js`) |
| 83 | +- **XFA Forms**: XML Forms Architecture support (`src/core/xfa/`) |
| 84 | +- **Color spaces**: ICC profiles, device color spaces (`colorspace.js`, `icc_colorspace.js`) |
| 85 | +- Runs in a Web Worker for performance isolation |
| 86 | + |
| 87 | +Entry point: `src/pdf.worker.js` |
| 88 | + |
| 89 | +#### 2. Display Layer (`src/display/`) |
| 90 | +The display layer provides the API for rendering PDFs to canvas and managing documents. Key components: |
| 91 | +- **API**: Main public API (`api.js`) - `PDFDocumentProxy`, `PDFPageProxy`, `getDocument()` |
| 92 | +- **Canvas rendering**: Renders PDF operations to HTML5 canvas (`canvas.js`) |
| 93 | +- **Text layer**: Extracts and positions text for selection/search (`text_layer.js`) |
| 94 | +- **Annotation layer**: Renders and handles PDF annotations (`annotation_layer.js`) |
| 95 | +- **Editor layer**: Supports PDF editing (annotations, highlights, stamps) (`editor/`) |
| 96 | +- **Metadata**: Parses XMP metadata (`metadata.js`) |
| 97 | +- **Streams**: Handles PDF data fetching (fetch, network, node) (`fetch_stream.js`, `network.js`, `node_stream.js`) |
| 98 | + |
| 99 | +Entry point: `src/pdf.js` |
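As a rough illustration of how the display-layer API named above fits together, the sketch below loads a document and renders its first page to a canvas. The helper name `renderFirstPage`, its parameters, and its return value are hypothetical. `pdfjsLib` is passed in rather than imported so the snippet does not assume a particular bundle path, and the exact `render()` parameter names may differ between PDF.js releases.

```javascript
// Hypothetical helper: `pdfjsLib` is the loaded PDF.js library object and
// `canvas` is an HTMLCanvasElement (or any object with the same shape).
async function renderFirstPage(pdfjsLib, url, canvas, scale = 1.5) {
  // getDocument() returns a loading task; its `promise` resolves to a
  // PDFDocumentProxy.
  const pdf = await pdfjsLib.getDocument(url).promise;

  // Pages are 1-indexed.
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale });

  // Size the canvas to the scaled page before drawing.
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);

  // render() returns a task whose `promise` settles when drawing is done.
  await page.render({
    canvasContext: canvas.getContext("2d"),
    viewport,
  }).promise;

  return { numPages: pdf.numPages, width: canvas.width, height: canvas.height };
}
```

Taking the library as an argument also makes the helper easy to exercise with a stub in unit tests, without a browser or a real PDF.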
| 100 | + |
| 101 | +#### 3. Scripting Layer (`src/scripting_api/`) |
| 102 | +Implements JavaScript execution for interactive PDFs (form calculations, validations, button actions). |
| 103 | +- Sandboxed execution environment |
| 104 | +- Implements Acrobat JavaScript API objects (App, Doc, Field, etc.) |
| 105 | + |
| 106 | +Entry points: `src/pdf.scripting.js`, `src/pdf.sandbox.js` |
| 107 | + |
| 108 | +#### 4. Web Viewer (`web/`) |
| 109 | +The complete PDF viewer application with UI. Key components: |
| 110 | +- **Main app**: Application orchestration (`app.js`) |
| 111 | +- **Viewer**: Page rendering and layout (`pdf_viewer.js`, `pdf_page_view.js`) |
| 112 | +- **Toolbar**: Zoom, page navigation, print, download controls |
| 113 | +- **Sidebar**: Thumbnails, outlines, attachments (`pdf_sidebar.js`, `pdf_thumbnail_view.js`, `pdf_outline_viewer.js`) |
| 114 | +- **Find controller**: Text search functionality (`pdf_find_controller.js`) |
| 115 | +- **Annotation editors**: UI for creating/editing annotations (`annotation_editor_layer_builder.js`) |
| 116 | +- **Presentation mode**: Full-screen presentation (`pdf_presentation_mode.js`) |
| 117 | + |
| 118 | +Entry point: `web/viewer.html` + `web/viewer.mjs` |
| 119 | + |
| 120 | +#### 5. Shared Utilities (`src/shared/`) |
| 121 | +Common utilities used across layers: |
| 122 | +- **Message handling**: Worker communication (`message_handler.js`) |
| 123 | +- **Utilities**: Common functions and constants (`util.js`) |
| 124 | +- **Image utilities**: Image processing helpers (`image_utils.js`) |
| 125 | + |
| 126 | +### Worker Communication |
| 127 | + |
| 128 | +PDF.js uses a Web Worker architecture: |
| 129 | +- Main thread (`display` layer) communicates with worker thread (`core` layer) via `MessageHandler` |
| 130 | +- Keeps PDF parsing off the main thread for better performance |
| 131 | +- Messages include: page rendering requests, text content extraction, metadata queries |
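The request/reply pattern described above can be sketched without a real worker. The `TinyHandler` class, the `connect()` helper, and the `GetPageCount` action below are hypothetical and are not the actual `MessageHandler` API. `queueMicrotask` stands in for `postMessage`, so nothing is actually copied between threads here.

```javascript
// Hypothetical, simplified stand-in for a worker message handler.
// It only illustrates the request/reply pattern between two endpoints.
class TinyHandler {
  constructor() {
    this.peer = null; // the handler on the "other thread"
    this.actions = new Map(); // action name -> handler function
    this.pending = new Map(); // request id -> { resolve, reject }
    this.nextId = 1;
  }

  // Register a function that answers requests for `name`.
  on(name, fn) {
    this.actions.set(name, fn);
  }

  // Send a request and return a promise for the reply.
  send(name, data) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      // Asynchronous delivery, standing in for postMessage.
      queueMicrotask(() => this.peer._receive({ id, name, data }));
    });
  }

  // Run the requested action and send the outcome back to the caller.
  async _receive({ id, name, data }) {
    let ok = true;
    let value;
    try {
      const fn = this.actions.get(name);
      if (!fn) throw new Error(`Unknown action: ${name}`);
      value = await fn(data);
    } catch (error) {
      ok = false;
      value = error;
    }
    queueMicrotask(() => this.peer._settle(id, ok, value));
  }

  // Resolve or reject the promise that belongs to request `id`.
  _settle(id, ok, value) {
    const { resolve, reject } = this.pending.get(id);
    this.pending.delete(id);
    (ok ? resolve : reject)(value);
  }
}

// Wire a "main thread" handler to a "worker" handler.
function connect() {
  const main = new TinyHandler();
  const worker = new TinyHandler();
  main.peer = worker;
  worker.peer = main;
  return { main, worker };
}

// Usage: the worker side answers, the main side awaits the reply.
const { main, worker } = connect();
worker.on("GetPageCount", ({ pages }) => pages.length);
main.send("GetPageCount", { pages: ["a", "b"] }).then((count) => {
  console.log("page count:", count);
});
```

Keeping a map of pending requests keyed by id is what lets several requests be in flight at once while replies arrive in any order.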
| 132 | + |
| 133 | +### Build System |
| 134 | + |
| 135 | +- Uses **Gulp** for build orchestration (`gulpfile.mjs`) |
| 136 | +- **Webpack** bundles modules into browser-compatible formats |
| 137 | +- **Babel** transpiles for browser compatibility (configurable targets in gulpfile) |
| 138 | +- Preprocessor replaces build-time constants (e.g., `typeof PDFJSDev !== "undefined"` checks) |
| 139 | +- Multiple build targets: generic, components, minified, legacy (older browser support) |
| 140 | + |
| 141 | +### External Dependencies |
| 142 | + |
| 143 | +Located in `external/`: |
| 144 | +- **bcmaps**: Binary CMaps for CJK fonts |
| 145 | +- **standard_fonts**: Font files used as substitutes for the 14 standard PDF fonts |
| 146 | +- **cmapscompress**: Tools for compressing CMaps |
| 147 | +- **openjpeg**: JPEG2000 decoder (WASM) |
| 148 | +- **quickjs**: JavaScript engine for sandboxed execution |
| 149 | + |
| 150 | +### Translations |
| 151 | + |
| 152 | +Translations in `l10n/` are imported from Mozilla Firefox Nightly. Only `l10n/en-US/viewer.ftl` should be edited directly in this repository. |
| 153 | + |
| 154 | +## Development Notes |
| 155 | + |
| 156 | +### Adding New Features |
| 157 | + |
| 158 | +When adding features that span multiple layers: |
| 159 | +1. Start with the `core` layer if parsing/interpretation changes are needed |
| 160 | +2. Update the `display` layer API if new capabilities need exposure |
| 161 | +3. Modify the `web` viewer if UI changes are required |
| 162 | +4. Ensure worker communication handles new message types |
| 163 | + |
| 164 | +### Preprocessor Directives |
| 165 | + |
| 166 | +Code uses preprocessor checks for build-time conditionals: |
| 167 | +```javascript |
| 168 | +if (typeof PDFJSDev !== "undefined" && PDFJSDev.test("GENERIC")) { |
| 169 | + // Generic build-specific code |
| 170 | +} |
| 171 | +``` |
| 172 | + |
| 173 | +Common flags: `GENERIC`, `MOZCENTRAL`, `CHROME`, `MINIFIED`, `TESTING`, `LIB`, `SKIP_BABEL`, `IMAGE_DECODERS` |
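To show why the guard starts with a `typeof` check, the sketch below models the two situations the code can run in: `PDFJSDev` absent altogether, and `PDFJSDev` answering flag queries. `createPDFJSDev` and `isGenericBuild` are hypothetical names, and the stub only understands single flag names, which is a simplification of what a real preprocessor would do.

```javascript
// Hypothetical stub that mimics how a build might answer PDFJSDev.test().
// Only single flag names are handled here; this is a simplification.
function createPDFJSDev(defines) {
  return {
    test(flag) {
      return Boolean(defines[flag]);
    },
  };
}

// Same guard shape as in the source: the typeof check keeps the code safe
// when PDFJSDev is not defined at all.
function isGenericBuild(PDFJSDev) {
  return typeof PDFJSDev !== "undefined" && PDFJSDev.test("GENERIC");
}

console.log(isGenericBuild(undefined));
console.log(isGenericBuild(createPDFJSDev({ GENERIC: true })));
console.log(isGenericBuild(createPDFJSDev({ MOZCENTRAL: true })));
```

Because `&&` short-circuits, `PDFJSDev.test()` is never reached when the object is missing, so the same source line works in both situations.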
| 174 | + |
| 175 | +### Testing |
| 176 | + |
| 177 | +- Unit tests use Jasmine framework (`test/unit/`) |
| 178 | +- Integration tests use Puppeteer for browser automation (`test/integration/`) |
| 179 | +- Test PDFs are downloaded as listed in the manifest (`test/test_manifest.json`) |
| 180 | +- Reference images for visual regression testing (`test/ref/`) |
| 181 | + |
| 182 | +### Code Style |
| 183 | + |
| 184 | +- Uses ESLint with custom configuration (`eslint.config.mjs`) |
| 185 | +- Prettier for formatting |
| 186 | +- Stylelint for CSS |
| 187 | +- Semicolons are required |
| 188 | +- Double quotes for strings |
| 189 | + |
| 190 | +### Pull Request Process |
| 191 | + |
| 192 | +- Keep PRs focused on a single issue |
| 193 | +- Provide a test PDF if the issue is PDF-specific |
| 194 | +- Ensure tests pass (`npx gulp test`) |
| 195 | +- Run linting (`npx gulp lint`) |
| 196 | +- Follow existing code patterns |
| 197 | +- Don't modify translations directly (they come from Firefox) |
| 198 | + |
| 199 | +### Performance Considerations |
| 200 | + |
| 201 | +- Core parsing runs in a Web Worker - keep main thread work minimal |
| 202 | +- Canvas rendering can be expensive - use appropriate scale factors |
| 203 | +- Text layer generation is separate from rendering - can be deferred |
| 204 | +- Annotation layer is optional - only enable when needed |
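One possible reading of "appropriate scale factors" is to honour the display's pixel ratio while capping the total number of canvas pixels. The helper `pickRenderScale` and its default pixel budget are assumptions made for illustration and are not PDF.js settings.

```javascript
// Hypothetical helper: choose a render scale that follows the display's
// pixel ratio but keeps the canvas under a pixel budget.
function pickRenderScale(
  pageWidth, // page width in PDF units at scale 1
  pageHeight, // page height in PDF units at scale 1
  cssScale, // zoom level requested by the UI
  pixelRatio = 1, // e.g. window.devicePixelRatio in a browser
  maxPixels = 16777216 // assumed budget (4096 x 4096), not a PDF.js default
) {
  // Desired scale in device pixels per PDF unit.
  const desired = cssScale * pixelRatio;
  const pixels = pageWidth * desired * pageHeight * desired;
  if (pixels <= maxPixels) {
    return desired;
  }
  // Canvas area grows with the square of the scale, so shrink the scale
  // by the square root of the overshoot.
  return desired * Math.sqrt(maxPixels / pixels);
}
```

The result could then be passed as `scale` to `page.getViewport({ scale })`, with CSS sizing the canvas back down to the requested zoom level.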