Split ebook PDFs by chapter using PDF bookmarks, then upload chapters to Google NotebookLM for audio/video overview generation.
- How It Works
- Installation
- Prerequisites
- Usage
- Syllabus Workflow — Automated Chunked Generation
- Typical Workflow
- Options Reference
- How Chapter Detection Works
- Troubleshooting
- Acknowledgements
- License
Most ebook PDFs contain a Table of Contents (TOC) stored as PDF bookmarks — structured markers that map chapter titles to page numbers. This tool:
- Reads those bookmark entries via PyMuPDF's get_toc() API
- Splits the PDF at chapter boundaries into individual files
- Preserves the internal TOC structure within each chapter file
- Uploads the chapter files to Google NotebookLM (one notebook per book)
- Lets you generate deep-dive audio overviews and whiteboard video explainers for any chapter range on demand
Output files are named {book}_chapter_{nn}_{title}.pdf and written to the output directory.
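As a rough illustration of the naming pattern above, here is a minimal sketch. The helper name `chapter_filename` and the sanitising rule (runs of non-alphanumeric characters collapsed to one underscore) are assumptions made for this example; the tool's actual rules may differ.

```python
import re


def chapter_filename(book: str, number: int, title: str) -> str:
    """Build a name following the {book}_chapter_{nn}_{title}.pdf pattern."""

    def slug(text: str) -> str:
        # Assumption: anything that is not a letter or digit becomes "_".
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

    # {nn} is rendered as a zero-padded two-digit chapter number.
    return f"{slug(book)}_chapter_{number:02d}_{slug(title)}.pdf"


print(chapter_filename("My Ebook", 3, "Data Modelling: Basics"))
# My_Ebook_chapter_03_Data_Modelling_Basics.pdf
```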
Requires Python 3.11+.
# From local checkout
uv tool install .
# From git
uv tool install git+https://github.com/NetDevAutomate/notebooklm-pdf-by-chapters.git
The split command works out of the box, with no extra setup needed.
For NotebookLM features (process, list, generate, download), authenticate first:
pip install notebooklm-py[browser]
notebooklm login
This opens a browser for Google cookie-based auth. Credentials are stored locally.
Split a single PDF into per-chapter files:
pdf-by-chapters split "my_ebook.pdf"
Specify an output directory:
pdf-by-chapters split "my_ebook.pdf" -o ./chapters
Split all PDFs in a directory:
pdf-by-chapters split ./ebooks/ -o ./chapters
Split at a different TOC level (e.g., level 2 for sub-chapters):
pdf-by-chapters split "my_ebook.pdf" -l 2
Split a PDF and upload all chapters to a new NotebookLM notebook:
pdf-by-chapters process "my_ebook.pdf"
If a notebook with the same book title already exists, the tool reuses it instead of creating a duplicate.
Process a directory of PDFs — each book gets its own subdirectory and notebook:
pdf-by-chapters process ./ebooks/ -o ./chapters
This creates chapters/{book_name}/ for each PDF and a separate notebook per book.
Upload to an existing notebook by ID:
pdf-by-chapters process "my_ebook.pdf" -n NOTEBOOK_ID
On completion, a summary table is displayed:
┌──────────────────────────────────────┬──────────────────────────────────────┬──────────┐
│ Notebook Name                        │ ID                                   │ Chapters │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────┤
│ Fundamentals of Data Engineering     │ ba6fa92e-f174-4a77-8fc6-fc4fc12a625d │ 19       │
│ Designing Data-Intensive Apps        │ c7d8e9f0-a1b2-4c3d-9e8f-7a6b5c4d3e2f │ 12       │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────┘
List all your NotebookLM notebooks:
pdf-by-chapters list
List the sources (chapters) within a specific notebook:
pdf-by-chapters list -n NOTEBOOK_ID
This shows numbered chapters so you know which range to pass to generate.
Generate audio and video overviews for a specific chapter range:
pdf-by-chapters generate -n NOTEBOOK_ID -c 1-3
Audio only:
pdf-by-chapters generate -n NOTEBOOK_ID -c 1-3 --no-video
Video only:
pdf-by-chapters generate -n NOTEBOOK_ID -c 4-6 --no-audio
The chapter range is 1-indexed and inclusive on both ends: -c 1-3 covers chapters 1, 2, and 3.
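To make the 1-indexed, inclusive range semantics concrete, here is a small sketch of how a value such as 1-3 could be turned into zero-based source indices. The function name `parse_chapter_range` and the handling of a bare number are assumptions for illustration, not the tool's actual parser.

```python
def parse_chapter_range(spec: str, total: int) -> list[int]:
    """Turn a 1-indexed inclusive range like '1-3' into zero-based indices."""
    start_text, sep, end_text = spec.partition("-")
    start = int(start_text)
    # Assumption: a bare number such as "2" selects that single chapter.
    end = int(end_text) if sep else start
    if not 1 <= start <= end <= total:
        raise ValueError(f"Chapter range {spec!r} is outside 1-{total}")
    # Both ends are inclusive, so "1-3" maps to indices 0, 1 and 2.
    return list(range(start - 1, end))


print(parse_chapter_range("1-3", total=19))  # [0, 1, 2]
print(parse_chapter_range("4-6", total=19))  # [3, 4, 5]
```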
Download all audio and video artifacts from a notebook:
pdf-by-chapters download -n NOTEBOOK_ID -o ./overviews
Files are saved as audio_01.mp3, audio_02.mp3, video_01.mp4, etc.
Delete a notebook and all its contents:
pdf-by-chapters delete -n NOTEBOOK_ID
You will be prompted for confirmation before deletion.
Instead of manually choosing chapter ranges, let NotebookLM's AI create a podcast syllabus that groups chapters into logical episodes, then step through generation one episode at a time.
Ask NotebookLM to analyse all chapters and create an episode plan:
pdf-by-chapters syllabus -n NOTEBOOK_ID -o ./chapters --no-video
This sends a structured prompt to NotebookLM's chat API, parses the response into a numbered syllabus, and saves it as a state file (syllabus_state.json). If parsing fails, it falls back to fixed-size chunks.
Customise the maximum chapters per episode:
pdf-by-chapters syllabus -n NOTEBOOK_ID --max-chapters 3
Note: The syllabus command uses NotebookLM's chat API, which may trigger Google's backend to auto-generate artifacts (audio overview, slide deck) as a side effect. These are created by NotebookLM's platform behaviour, not by this tool, and are separate from the artifacts created by generate-next.
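The fixed-size fallback used when syllabus parsing fails can be pictured as grouping consecutive chapters into ranges of at most --max-chapters each. The sketch below is an assumption about what such a fallback might look like; the function name `fixed_size_episodes` is hypothetical.

```python
def fixed_size_episodes(chapter_count: int, max_chapters: int = 2) -> list[tuple[int, int]]:
    """Group chapters 1..chapter_count into consecutive (first, last) ranges."""
    if max_chapters < 1:
        raise ValueError("max_chapters must be at least 1")
    return [
        # The last episode may be shorter when the count does not divide evenly.
        (first, min(first + max_chapters - 1, chapter_count))
        for first in range(1, chapter_count + 1, max_chapters)
    ]


print(fixed_size_episodes(5, max_chapters=2))  # [(1, 2), (3, 4), (5, 5)]
```

Each tuple uses the same 1-indexed, inclusive convention as the -c option.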
Generate the next pending episode from the syllabus:
pdf-by-chapters generate-next -o ./chapters
This reads the state file, picks the next pending episode, fires the generation request, and polls until complete. The notebook ID comes from the state file, so there is no need to pass -n.
Generate all episodes in one command (recommended):
pdf-by-chapters generate-next -n NOTEBOOK_ID -o ./chapters --all --download --no-video
The --all flag auto-creates the syllabus if missing, then generates every episode sequentially with a 30-second gap between episodes. On failure, it retries with increasing backoff delays (60s, 180s, 300s) and deletes failed artifacts before each retry. The --download flag downloads each completed audio to <output_dir>/downloads/01-episode_title.mp3.
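The retry behaviour described above can be sketched as a loop over the listed delays. This is an illustration only: the names `run_with_retries`, `generate` and `cleanup` are hypothetical, and doing the cleanup after the wait rather than before it is an assumption.

```python
import time
from collections.abc import Callable

RETRY_DELAYS = (60, 180, 300)  # seconds, taken from the description above


def run_with_retries(
    generate: Callable[[], str],
    cleanup: Callable[[], None],
    delays: tuple[int, ...] = RETRY_DELAYS,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call generate(); on failure wait, clean up, and try again."""
    try:
        return generate()
    except Exception as exc:
        last_error = exc
    for delay in delays:
        sleep(delay)
        cleanup()  # remove the failed artifact before the next attempt
        try:
            return generate()
        except Exception as exc:
            last_error = exc
    raise last_error


# Demo with a fake sleep so nothing actually waits.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("generation failed")
    return "episode ready"


print(run_with_retries(flaky, cleanup=lambda: None, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.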
For non-blocking mode (returns immediately, ideal for scripting or agent workflows):
pdf-by-chapters generate-next -o ./chapters --no-wait
Target a specific episode:
pdf-by-chapters generate-next -o ./chapters --episode 3
If interrupted with Ctrl+C, task IDs are already saved to the state file. Resume with status --poll.
Reset and start over:
rm ./chapters/syllabus_state.json
pdf-by-chapters generate-next -n NOTEBOOK_ID -o ./chapters --all --download --no-video
View the syllabus and generation status:
pdf-by-chapters status -o ./chapters
Poll the NotebookLM API to update in-progress artifacts:
pdf-by-chapters status -o ./chapters --poll
Live-updating display that polls until all generating chunks complete:
pdf-by-chapters status -o ./chapters --tail
# 1. Split and upload a book
pdf-by-chapters process "Fundamentals of Data Engineering.pdf"
# 2. Find the notebook ID
pdf-by-chapters list
# 3. Generate audio/video for chapters 1-3
pdf-by-chapters generate -n NOTEBOOK_ID -c 1-3
# 4. Generate for the next batch
pdf-by-chapters generate -n NOTEBOOK_ID -c 4-6
# 5. Download everything
pdf-by-chapters download -n NOTEBOOK_ID -o ./overviews
# 1. Split and upload
pdf-by-chapters process "Fundamentals of Data Engineering.pdf"
export NOTEBOOK_ID=<id from output>
# 2. Generate ALL episodes with auto-download (one command does everything)
pdf-by-chapters generate-next -n $NOTEBOOK_ID -o ./chapters --all --download --no-video
This single command creates the syllabus, generates each episode sequentially (with retry on failure), and downloads the audio files to ./chapters/downloads/.
For more control, run step-by-step:
# Generate syllabus separately
pdf-by-chapters syllabus -n $NOTEBOOK_ID -o ./chapters --no-video
# Generate episodes one at a time
pdf-by-chapters generate-next -o ./chapters --no-wait
pdf-by-chapters status -o ./chapters --poll # check when ready
pdf-by-chapters generate-next -o ./chapters --no-wait
# ... repeat for each episode
| Option | Command | Description | Default |
|---|---|---|---|
| source | split, process | PDF file or directory of PDFs (positional arg) | - |
| -o, --output-dir | split, process, download, syllabus, generate-next, status | Output directory | ./chapters / ./overviews |
| -l, --level | split, process | TOC level to split on (1 = top-level chapters) | 1 |
| -n, --notebook-id | process, list, generate, download, delete, syllabus | NotebookLM notebook ID | - |
| -c, --chapters | generate, download | Chapter range, e.g. 1-3 (1-indexed, inclusive) | - |
| --no-audio | generate, syllabus, generate-next | Skip audio overview generation | - |
| --no-video | generate, syllabus, generate-next | Skip video overview generation | - |
| -t, --timeout | generate, generate-next | Timeout in seconds for generation polling | 900 (15 min) |
| -m, --max-chapters | syllabus | Maximum chapters per episode | 2 |
| -b, --book-name | syllabus | Book name for state file | output dir name |
| --force | syllabus | Overwrite existing syllabus with in-progress chunks | - |
| -e, --episode | generate-next | Target a specific episode by number | - |
| --no-wait | generate-next | Start generation and return immediately | - |
| -a, --all | generate-next | Generate all episodes sequentially with retry | - |
| -d, --download | generate-next | Download audio after each completed episode | - |
| --poll | status | Check API for status of generating chunks | - |
| --tail | status | Live-updating display until generation completes | - |
PDF files can embed a Table of Contents as a tree of bookmarks. Each bookmark entry has three fields:
[level, title, page_number]
- level: depth in the TOC hierarchy (1 = top-level chapter, 2 = sub-chapter, etc.)
- title: the chapter/section name
- page_number: the 1-indexed page where the chapter starts
This tool calls PyMuPDF's doc.get_toc() to retrieve these entries, filters to the requested --level, and uses the page numbers to determine chapter boundaries. Each chapter runs from its start page to the page before the next chapter begins (or end of document for the last chapter).
The split chapter files also get a rebuilt TOC containing only the entries that fall within their page range.
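The boundary rule described above can be sketched in plain Python on a list shaped like the output of get_toc(). The function names are hypothetical, and rebasing page numbers in the per-chapter TOC so each file starts at page 1 is an assumption rather than confirmed behaviour.

```python
def chapter_ranges(toc, page_count, level=1):
    """Return (title, first_page, last_page) per chapter, 1-indexed inclusive."""
    entries = [(title, page) for lvl, title, page in toc if lvl == level]
    ranges = []
    for i, (title, start) in enumerate(entries):
        # A chapter ends on the page before the next one starts,
        # or on the last page of the document for the final chapter.
        end = entries[i + 1][1] - 1 if i + 1 < len(entries) else page_count
        ranges.append((title, start, end))
    return ranges


def chapter_toc(toc, first_page, last_page):
    """Keep only TOC entries inside the chapter's page range."""
    # Assumption: pages are rebased so the chapter file starts at page 1.
    return [
        [lvl, title, page - first_page + 1]
        for lvl, title, page in toc
        if first_page <= page <= last_page
    ]


toc = [[1, "Intro", 1], [2, "Scope", 3], [1, "Storage", 10], [1, "Pipelines", 25]]
for row in chapter_ranges(toc, page_count=40):
    print(row)
# ('Intro', 1, 9)
# ('Storage', 10, 24)
# ('Pipelines', 25, 40)
print(chapter_toc(toc, 1, 9))  # [[1, 'Intro', 1], [2, 'Scope', 3]]
```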
ValueError: 'my_ebook.pdf' has no bookmarks/TOC. Cannot split without chapter markers.
The PDF doesn't contain embedded bookmarks. This is common with scanned PDFs or older ebooks. Options:
- Open the PDF in a reader that shows bookmarks (e.g., Adobe Acrobat, PDF Expert) to verify
- Some PDF editors can add bookmarks manually
- Consider using a different source file — most publisher ebooks include TOC bookmarks
ValueError: No TOC entries at level 2. Available levels: {1, 3}
The TOC doesn't have entries at the level you specified. The error message shows which levels are available — try one of those with -l.
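To see which levels a TOC actually contains before choosing -l, a quick count per level can help. This is a minimal sketch operating on a list shaped like the output of get_toc(); the helper name `toc_level_counts` is hypothetical.

```python
from collections import Counter


def toc_level_counts(toc):
    """Map each TOC level to the number of entries at that level."""
    counts = Counter(level for level, _title, _page in toc)
    return dict(sorted(counts.items()))


toc = [[1, "Part I", 1], [3, "Intro", 2], [3, "Scope", 5], [1, "Part II", 20]]
print(toc_level_counts(toc))  # {1: 2, 3: 2}
```

In this sample there are no level 2 entries, which is the situation the error above reports.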
When running process, the tool checks if a notebook with the same book title already exists. If found, it uploads chapters to the existing notebook instead of creating a duplicate. To upload to a different notebook, pass -n with its ID; to get a new notebook, rename the PDF.
If any NotebookLM command fails with auth errors:
# Re-authenticate
notebooklm login
Cookie-based auth expires periodically. Re-running notebooklm login refreshes the session.
Generation times out after 900 seconds (15 minutes) by default. Both audio and video share the same timeout. If you hit timeouts:
- Increase the timeout: generate -c 1-3 --timeout 1800 (30 minutes)
- Use smaller chapter ranges with generate -c 1-1 (single chapter at a time)
- Skip video (which takes longer) with --no-video and generate audio first
- Check your network connection; uploads and polling require a stable connection
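For intuition about how a polling timeout behaves, here is a generic poll-until-done loop. It is a sketch under stated assumptions, not the tool's code: the name `wait_until_done`, the 10-second interval, and the injectable clock are all hypothetical; only the 900-second default comes from the text above.

```python
import time


def wait_until_done(check, timeout=900.0, interval=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # caller can report a timeout and suggest --timeout
        sleep(interval)


# An already-finished job returns immediately without sleeping.
print(wait_until_done(lambda: True))  # True
```

A larger timeout only extends the deadline; it does not change how often the status is checked.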
If a chapter has 0 pages, the TOC bookmarks may be inaccurate or the PDF has unusual page numbering. Try:
- Inspect the TOC: python -c "import pymupdf; print(pymupdf.open('file.pdf').get_toc())"
- Try a different --level to see if sub-chapters split more cleanly
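One way a zero-page chapter can arise is when two bookmarks at the same level point to the same page, since each chapter ends on the page before the next one starts. The sketch below flags such entries in a list shaped like the output of get_toc(); the helper name `empty_chapters` is hypothetical.

```python
def empty_chapters(toc, level=1):
    """Titles whose next sibling starts on the same (or an earlier) page."""
    entries = [(title, page) for lvl, title, page in toc if lvl == level]
    return [
        title
        for (title, page), (_next_title, next_page) in zip(entries, entries[1:])
        if next_page <= page
    ]


toc = [[1, "Part I", 5], [1, "Chapter 1", 5], [1, "Chapter 2", 30]]
print(empty_chapters(toc))  # ['Part I']
```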
Special thanks to Teng Lin for creating the excellent notebooklm-py library, which powers all NotebookLM integration in this tool. His work in reverse-engineering and wrapping the NotebookLM API made this project possible.
MIT