Command-line tool that splits a PDF file by its table of contents. Written in Python.
First, make sure you have the required libraries installed.
```
pip install -r requirements.txt
```

Then run the script:

```
python pdf_splitter.py <OPTIONS> file.pdf
```

There are five options:
- `--dry-run`: Simulates the split. Prints the output filenames but does not create any files. Useful for checking whether the options have been set correctly.
- `--depth INTEGER`: The outline depth at which the splits occur. See figure 1 for a visual explanation. Defaults to 1.
- `--regex TEXT`: Selects only the outline items that match a regular expression. For example, `--regex "^Chapter"` selects only outline items that start with the string `Chapter`.
- `--overlap`: Overlaps split points. By default, if Chapter 1 starts at page 1 and Chapter 2 starts at page 10, `Chapter 1.pdf` will contain pages 1–9 and `Chapter 2.pdf` will contain pages 10–.... With `--overlap`, `Chapter 1.pdf` will instead contain pages 1–10 and `Chapter 2.pdf` pages 10–..., and so on. This is useful when a section can start in the middle of a page.
- `--prefix TEXT`: Adds a prefix to the output filenames. For example, `--prefix "CLRS "` results in output files named `CLRS <outline element>.pdf`.
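The way `--depth`, `--regex`, `--overlap` and `--prefix` combine comes down to simple page arithmetic. The following is a minimal, hypothetical sketch of that arithmetic, not the code in `pdf_splitter.py`. It assumes the PDF outline has already been read into a tree of hypothetical `OutlineItem` records with 1-based start pages in document order, and `flatten` and `plan_split` are illustrative names. It also assumes that an item ends just before the next outline item at the same or a shallower depth (or at the last page), and that `--regex` only filters which items are written without moving the split points; the real tool may resolve these cases differently.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class OutlineItem:
    """Hypothetical outline (bookmark) node: a title and its 1-based start page."""
    title: str
    page: int
    children: list["OutlineItem"] = field(default_factory=list)


def flatten(items: list[OutlineItem], depth: int = 1) -> Iterator[tuple[int, OutlineItem]]:
    """Yield (depth, item) pairs in document (pre-order) order; top level is depth 1."""
    for item in items:
        yield depth, item
        yield from flatten(item.children, depth + 1)


def plan_split(
    outline: list[OutlineItem],
    num_pages: int,
    depth: int = 1,
    regex: Optional[str] = None,
    overlap: bool = False,
    prefix: str = "",
) -> list[tuple[str, int, int]]:
    """Return (filename, first_page, last_page) tuples, 1-based and inclusive."""
    # Assumption: items at or above the chosen depth act as split points.
    boundaries = [(d, item) for d, item in flatten(outline) if d <= depth]
    pattern = re.compile(regex) if regex is not None else None
    plan = []
    for idx, (d, item) in enumerate(boundaries):
        if d != depth:
            continue  # shallower items only mark where deeper sections end
        if pattern is not None and not pattern.search(item.title):
            continue  # --regex: skip non-matching titles, keep split points
        if idx + 1 < len(boundaries):
            next_start = boundaries[idx + 1][1].page
            # --overlap: also include the page on which the next item starts.
            end = next_start if overlap else next_start - 1
        else:
            end = num_pages  # the last item runs to the end of the document
        end = max(end, item.page)  # two items on the same page: keep one page
        plan.append((f"{prefix}{item.title}.pdf", item.page, end))
    return plan


if __name__ == "__main__":
    example = [
        OutlineItem("Preface", 1),
        OutlineItem("Chapter 1", 3, [OutlineItem("1.1 Intro", 3), OutlineItem("1.2 Basics", 6)]),
        OutlineItem("Chapter 2", 10),
    ]
    # A dry run could simply print the plan instead of writing any files.
    for name, first, last in plan_split(example, 20, regex="^Chapter", overlap=True):
        print(f"{name}: pages {first}-{last}")
```

Running the sketch prints `Chapter 1.pdf: pages 3-10` and `Chapter 2.pdf: pages 10-20`, matching the `--overlap` behaviour described above. With `depth=2` the same outline yields `1.1 Intro.pdf` (pages 3–5) and `1.2 Basics.pdf` (pages 6–9), because the start of `Chapter 2` still closes the last section of Chapter 1 under this sketch's assumptions.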
Licensed under AGPL.

