feat: syllabus-driven chunked audio/video generation #1
Conversation
Add a syllabus workflow that automates generating NotebookLM audio/video overviews for an entire eBook, broken into logical chapter episodes.

New modules:
- models.py: shared dataclasses (extracted from notebooklm.py)
- syllabus.py: pure-logic syllabus parsing, state management, chunk selection

New CLI commands (rich_help_panel="Syllabus"):
- syllabus: send structured prompt to NotebookLM chat API, parse response into numbered episode plan, save as syllabus_state.json
- generate-next: fire generation for next pending episode, poll to completion, persist task IDs for Ctrl+C recovery, --no-wait for fire-and-forget mode
- status: display progress table, --poll to check API, --tail for live-updating display

Key design decisions:
- Auto-syllabus via NotebookLM chat with fixed-size fallback on parse failure
- Stateful next-chunk stepping (accommodates rate limits and session breaks)
- Atomic state writes (mkstemp + fsync + os.replace) for crash safety
- Priority-based chunk selection: GENERATING > FAILED > PENDING
- Episode titles sanitized before filesystem/API use (LLM output is adversarial input)

Includes brainstorm and plan documents, comprehensive tests (118 passing), and updated README with full workflow documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
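The atomic-write decision called out above (mkstemp + fsync + os.replace) can be sketched as follows; `write_state_atomic` is a hypothetical stand-in, not the PR's actual `write_state()` signature:

```python
import json
import os
import tempfile


def write_state_atomic(state: dict, path: str) -> None:
    """Write JSON state so a crash never leaves a half-written file."""
    # Create the temp file in the destination directory so os.replace
    # stays on one filesystem (rename is only atomic within a filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap into place
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Readers of `syllabus_state.json` therefore always see either the old or the new state, never a truncated file.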
Pull request overview
Introduces a syllabus-driven workflow for generating NotebookLM audio/video in chapter-based “episodes”, with JSON state persistence and new CLI commands to create a plan, generate the next chunk, and monitor progress.
Changes:
- Added `syllabus.py` for prompt building, LLM response parsing, chunk selection, and atomic state read/write.
- Added new CLI commands: `syllabus`, `generate-next` (with `--no-wait`), and `status` (`--poll`, `--tail`).
- Refactored shared dataclasses into `models.py` and extended NotebookLM integration with syllabus/poll helpers; expanded unit tests and docs.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/unit/test_syllabus.py | New unit tests for syllabus parsing, chunk building, state I/O, and selection logic |
| tests/unit/test_notebooklm.py | Adds coverage for create_syllabus() behavior |
| tests/unit/test_cli.py | Adds coverage for new CLI commands and state flows |
| tests/conftest.py | Extends NotebookLM client mock with chat.ask and artifacts.rename |
| src/pdf_by_chapters/syllabus.py | New pure-logic module for syllabus parsing and state management |
| src/pdf_by_chapters/notebooklm.py | Moves dataclasses to models.py and adds syllabus + chunk polling helpers |
| src/pdf_by_chapters/models.py | New shared dataclasses module (UploadResult/NotebookInfo/SourceInfo) |
| src/pdf_by_chapters/cli.py | Adds syllabus workflow commands and stateful generation/polling behavior |
| docs/use-cases.md | Documents syllabus-driven workflow and resume behavior |
| docs/troubleshooting.md | Adds troubleshooting for syllabus parsing, duplicate content, stuck state |
| docs/plans/2026-03-10-feat-chunked-audio-video-generation-plan.md | Adds implementation plan document for the feature |
| docs/guide-study-workflow.md | Adds “Syllabus Mode” usage guidance |
| docs/guide-generate-overviews.md | Adds end-to-end syllabus workflow steps |
| docs/codemap.md | Updates architecture/codemap with new modules and workflow |
| docs/brainstorms/2026-03-10-chunked-audio-video-generation-brainstorm.md | Adds brainstorming notes for the feature |
| README.md | Adds README section documenting the syllabus workflow and new CLI flags |
```python
tasks = asyncio.run(_start())
if not tasks:
    console.print("[red]Failed to start any generation requests.[/red]")
    chunk.status = ChunkStatus.FAILED
    write_state(state, state_path)
    raise typer.Exit(1)

for label, task_id in tasks.items():
    chunk.artifacts[label] = ChunkArtifact(task_id=task_id, status="in_progress")
chunk.status = ChunkStatus.GENERATING
write_state(state, state_path)
```
If start_chunk_generation() fails to start one requested artifact (e.g. video) but starts the other, tasks will be non-empty and the chunk will proceed with only a subset of artifacts. Later, all_done = all(...) can mark the chunk COMPLETED even though a requested artifact never started. Consider validating that tasks contains every label implied by gen_audio/gen_video (and mark the chunk failed if any requested artifact didn’t get a task_id), or have start_chunk_generation() surface partial-start failures explicitly.
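One way to surface partial-start failures is a small guard after the start call; `missing_labels` is a hypothetical helper, and the `gen_audio`/`gen_video` flags mirror the names used in the surrounding code:

```python
def missing_labels(tasks: dict[str, str], gen_audio: bool, gen_video: bool) -> set[str]:
    """Return the requested artifact labels that did not receive a task_id."""
    expected = {
        label
        for label, wanted in [("audio", gen_audio), ("video", gen_video)]
        if wanted
    }
    return expected - set(tasks)


# Usage sketch: fail the chunk if any requested artifact never started.
# missing = missing_labels(tasks, gen_audio, gen_video)
# if missing:
#     chunk.status = ChunkStatus.FAILED
```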
```python
# Update chunk-level status
all_done = all(a.status == "completed" for a in chunk.artifacts.values())
any_failed = any(a.status == "failed" for a in chunk.artifacts.values())
if all_done:
    chunk.status = ChunkStatus.COMPLETED
    # Best-effort rename
    safe_title = sanitize_filename(chunk.title)[:100]
    if safe_title:
        for _label, art in chunk.artifacts.items():
            if art.task_id and art.status == "completed":
                with contextlib.suppress(Exception):
                    await client.artifacts.rename(
                        state.notebook_id,
                        art.task_id,
                        safe_title,
                    )
elif any_failed:
    chunk.status = ChunkStatus.FAILED
```
When polling, this code sets chunk.status = FAILED as soon as any artifact is failed. That can prematurely remove the chunk from the GENERATING set on subsequent status --poll runs (and cause --tail to exit) even if another artifact is still in_progress and will later complete. Consider keeping the chunk in GENERATING until all artifact statuses are terminal, and only set FAILED once everything is either completed or failed (or introduce a derived/partial state for display).
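A sketch of that derivation, where a chunk only reaches a terminal status once every artifact is terminal (`derive_chunk_status` is a hypothetical helper operating on plain status strings):

```python
TERMINAL = {"completed", "failed"}


def derive_chunk_status(artifact_statuses: list[str]) -> str:
    """Chunk stays 'generating' until every artifact reaches a terminal state."""
    if any(s not in TERMINAL for s in artifact_statuses):
        return "generating"
    if all(s == "completed" for s in artifact_statuses):
        return "completed"
    return "failed"  # everything terminal, at least one artifact failed
```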
```python
tasks = {
    label: art.task_id
    for label, art in chunk.artifacts.items()
    if art.task_id and art.status != "completed"
}
if not tasks:
    continue
statuses = await poll_chunk_status(client, state.notebook_id, tasks)
for label, new_status in statuses.items():
    chunk.artifacts[label].status = new_status
```
The polling task filter includes artifacts with status failed (art.status != "completed"), so failed artifacts will be re-polled every time. Also, combined with the chunk-level status update, this can cause inconsistent progress tracking. Consider filtering to only non-terminal statuses (e.g. not in {"completed","failed"}) and treating "unknown"/poll errors explicitly (either keep generating but show an error, or mark failed after N errors).
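A sketch of the suggested filter with an explicit terminal-status set; plain dicts stand in for the PR's `ChunkArtifact` objects:

```python
TERMINAL = {"completed", "failed"}


def pollable_tasks(artifacts: dict[str, dict]) -> dict[str, str]:
    """Only poll artifacts that are neither completed nor failed."""
    return {
        label: art["task_id"]
        for label, art in artifacts.items()
        if art["task_id"] and art["status"] not in TERMINAL
    }
```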
```python
def test_no_sources_error(self, patch_notebooklm, tmp_path):
    client, _ = patch_notebooklm
    client.sources.list.return_value = []
    result = runner.invoke(app, ["syllabus", "-n", "nb-123", "-o", str(tmp_path)])
    assert result.exit_code != 0
    assert "No sources" in result.stdout


def test_creates_state_file(self, patch_notebooklm, tmp_path):
    _client, _ = patch_notebooklm
    result = runner.invoke(app, ["syllabus", "-n", "nb-123", "-o", str(tmp_path)])
    assert result.exit_code == 0
```
These CLI tests rely on patch_notebooklm, which patches pdf_by_chapters.notebooklm.NotebookLMClient.from_storage. However, the syllabus command imports NotebookLMClient from the external notebooklm package and calls NotebookLMClient.from_storage() directly, so this fixture won’t intercept those calls. Consider patching notebooklm.NotebookLMClient.from_storage for these tests (or refactoring the CLI to avoid direct client usage) so unit tests don’t require real credentials/network.
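A minimal, self-contained illustration of why the patch target matters: `patch()` replaces a name where it is looked up, so rebinding one module's reference leaves another module's reference untouched. The `fake_extpkg`/`fake_wrapper` modules below are stand-ins for the external `notebooklm` package and `pdf_by_chapters.notebooklm`:

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the external package that defines the client.
extpkg = types.ModuleType("fake_extpkg")


class NotebookLMClient:
    @classmethod
    def from_storage(cls):
        return "real"


extpkg.NotebookLMClient = NotebookLMClient
sys.modules["fake_extpkg"] = extpkg

# Stand-in for the wrapper module that re-exports the client,
# like `from notebooklm import NotebookLMClient`.
wrapper = types.ModuleType("fake_wrapper")
wrapper.NotebookLMClient = NotebookLMClient
sys.modules["fake_wrapper"] = wrapper

# Patching the wrapper's name rebinds only fake_wrapper.NotebookLMClient;
# code that imported the class from the external package is unaffected.
with patch("fake_wrapper.NotebookLMClient") as mock_cls:
    mock_cls.from_storage.return_value = "mock"
    assert wrapper.NotebookLMClient.from_storage() == "mock"
    assert extpkg.NotebookLMClient.from_storage() == "real"  # not intercepted
```

This is exactly the situation the fixture hits: it patches the wrapper module's binding, while the new CLI commands call through the external package's binding.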
```python
source_tuples = [(s.id, s.title) for s in sources_list]
source_map, title_map = map_sources_to_chapters(source_tuples)

# Build and send prompt
prompt = build_prompt(source_tuples, max_chapters)
```
source_tuples is built in the raw order returned by list_sources(), but parse_syllabus_response() interprets chapter numbers against source_map derived from filenames (e.g. chapter_10 -> 10). If the API returns sources out of chapter order (or uploads were out of order), the prompt numbering and source_map will disagree, causing episodes to map to the wrong source_ids. Consider sorting sources by parsed chapter number (when parseable) before building the prompt and maps, or change the prompt to explicitly label each line with the actual chapter number (e.g. "Chapter 10: …") so the response numbers match source_map keys.
Suggested change:

```diff
-source_tuples = [(s.id, s.title) for s in sources_list]
-source_map, title_map = map_sources_to_chapters(source_tuples)
-# Build and send prompt
-prompt = build_prompt(source_tuples, max_chapters)
+# Build initial (id, title) tuples from sources
+source_tuples = [(s.id, s.title) for s in sources_list]
+source_map, title_map = map_sources_to_chapters(source_tuples)
+# Order sources by chapter number so prompt numbering matches source_map
+ordered_source_tuples: list[tuple[str, str]] = []
+for chapter_num in sorted(source_map.keys()):
+    source_id = source_map[chapter_num]
+    title = title_map[source_id]
+    ordered_source_tuples.append((source_id, title))
+# Build and send prompt
+prompt = build_prompt(ordered_source_tuples, max_chapters)
```
```python
instructions = _build_instructions(episode_title, chapter_titles)
tasks: dict[str, str] = {}
for label, should_gen in [("audio", generate_audio), ("video", generate_video)]:
    if not should_gen:
        continue
    try:
        logger.info("Requesting %s for '%s'...", label, episode_title)
        tasks[label] = await _request_chapter_artifact(
            client, notebook_id, label, source_ids, instructions[label]
        )
    except Exception as e:
        logger.error("Failed to request %s: %s", label, e)
return tasks
```
start_chunk_generation() logs and suppresses exceptions per artifact type, returning a partial tasks dict. Callers can’t distinguish “video was requested but failed to start” from “video was intentionally disabled”, which can lead to chunks being marked completed with missing artifacts. Consider either (a) raising if any requested artifact fails to start, (b) returning an explicit per-label result that includes failures, or (c) accepting an expected_labels set and ensuring the return includes all expected labels (with failures surfaced).
```python
@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_all_completed(self, _mock_sleep, patch_notebooklm, tmp_path):
    _make_state(
        tmp_path,
        chunks={
            1: SyllabusChunk(
                episode=1,
                title="Done",
                chapters=[1],
                source_ids=["s1"],
                status=ChunkStatus.COMPLETED,
            ),
        },
    )
    result = runner.invoke(app, ["generate-next", "-o", str(tmp_path)])
    assert result.exit_code == 0
    assert "completed" in result.stdout.lower()


@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_generates_pending_chunk(self, _mock_sleep, patch_notebooklm, tmp_path):
    _make_state(tmp_path)
    result = runner.invoke(app, ["generate-next", "-o", str(tmp_path)])
    assert result.exit_code == 0
    # Verify state file updated
    data = json.loads((tmp_path / "syllabus_state.json").read_text())
    assert data["chunks"][0]["status"] == "completed"


@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_episode_targeting(self, _mock_sleep, patch_notebooklm, tmp_path):
    _make_state(
```
These tests patch pdf_by_chapters.notebooklm.asyncio.sleep, but the generate-next CLI polling loop sleeps via pdf_by_chapters.cli.asyncio.sleep. As written, tests that execute the blocking path can incur real 30s sleeps. Patch pdf_by_chapters.cli.asyncio.sleep (or run generate-next with --no-wait in unit tests) to keep the test suite fast and deterministic.
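To illustrate the fix, here is a toy polling loop showing that patching the `asyncio.sleep` the code actually awaits makes the test instant; `patch()` substitutes an `AsyncMock` automatically for async targets. For the real tests the target string would be the module the CLI calls through, e.g. `pdf_by_chapters.cli.asyncio.sleep` as the comment suggests:

```python
import asyncio
from unittest.mock import patch


async def wait_for_completion(checks: list[bool]) -> int:
    """Tiny hypothetical stand-in for a generate-next polling loop."""
    polls = 0
    for done in checks:
        polls += 1
        if done:
            return polls
        await asyncio.sleep(30)  # real code would block here for 30s
    return polls


# patch() swaps asyncio.sleep for an AsyncMock, so the loop runs instantly.
with patch("asyncio.sleep") as mock_sleep:
    polls = asyncio.run(wait_for_completion([False, False, True]))

assert polls == 3
assert mock_sleep.await_count == 2  # slept twice before completion
```

Alternatively, invoking `generate-next --no-wait` in unit tests avoids the blocking path entirely.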
### models.py — Shared Dataclasses

Holds dataclasses shared between modules, preventing circular imports and keeping `syllabus.py` testable without `notebooklm-py` installed.

- `UploadResult`, `NotebookInfo`, `SourceInfo` — API result types
- `ChunkResult` — per-artifact-type generation result
### syllabus.py — Syllabus State & Parsing

Pure logic module (no Rich, no Typer, no async). Follows the `splitter.py` pattern.

- `ChunkStatus(StrEnum)` — 4-state machine: `PENDING → GENERATING → COMPLETED | FAILED`
- `SyllabusState`, `SyllabusChunk`, `ChunkArtifact` — state file dataclasses
- `build_prompt()` — constructs the syllabus generation prompt with numbered source titles
- `parse_syllabus_response()` — regex parsing of LLM response into episodes (binary success/fallback)
- `build_fixed_size_chunks()` — deterministic fallback when LLM parsing fails
- `map_sources_to_chapters()` — maps chapter numbers to source IDs via title regex
- `read_state()` / `write_state()` — atomic JSON state persistence with `fsync` + `os.replace()`
- `get_next_chunk()` — priority selection: GENERATING (resume) > FAILED (retry) > PENDING (new)

```mermaid
stateDiagram-v2
    [*] --> PENDING
    PENDING --> GENERATING: generate-next
    GENERATING --> COMPLETED: poll detects completion
    GENERATING --> FAILED: poll detects failure / timeout
    FAILED --> GENERATING: generate-next (retry)
    COMPLETED --> PENDING: --episode N (regenerate)
```
## Interfaces

| Module | Exports | Used By |
|--------|---------|---------|
| `splitter` | `split_pdf_by_chapters()`, `sanitize_filename()` | `cli.split`, `cli.process`, `notebooklm` |
| `notebooklm` | `upload_chapters()`, `generate_for_chapters()`, `download_artifacts()`, `list_notebooks()`, `list_sources()`, `delete_notebook()`, `create_syllabus()`, `start_chunk_generation()`, `poll_chunk_status()`, `generate_chunk()` | `cli.*` |
| `models` | `UploadResult`, `NotebookInfo`, `SourceInfo`, `ChunkResult` | `notebooklm`, `cli` |
| `syllabus` | `SyllabusState`, `SyllabusChunk`, `ChunkStatus`, `build_prompt()`, `parse_syllabus_response()`, `read_state()`, `write_state()`, `get_next_chunk()` | `cli.syllabus`, `cli.generate_next`, `cli.status` |
docs/codemap.md references ChunkResult and generate_chunk(), and shows syllabus.py depending on models.py. In the current code, models.py does not define ChunkResult, notebooklm.py does not export generate_chunk(), and syllabus.py doesn’t import models.py. Please update the codemap/interface tables and diagrams to match the actual module exports/dependencies so the documentation stays accurate.
```python
if all_done:
    chunk.status = ChunkStatus.COMPLETED
elif any_failed:
```
--tail has the same chunk-level status issue as --poll: it sets chunk.status = FAILED as soon as any artifact fails, even if other artifacts are still in progress. That can make the live loop exit early and stop updating the remaining artifact statuses. Consider keeping the chunk in GENERATING until all artifacts are terminal, and only then marking FAILED/COMPLETED.
Suggested change:

```diff
-if all_done:
-    chunk.status = ChunkStatus.COMPLETED
-elif any_failed:
+has_non_terminal = any(
+    a.status not in ("completed", "failed")
+    for a in chunk.artifacts.values()
+)
+if all_done:
+    chunk.status = ChunkStatus.COMPLETED
+elif any_failed and not has_non_terminal:
```
```python
from notebooklm import NotebookLMClient

from pdf_by_chapters.notebooklm import create_syllabus as _create_syllabus
from pdf_by_chapters.notebooklm import list_sources
```
These new commands import and use notebooklm.NotebookLMClient directly. Existing tests patch pdf_by_chapters.notebooklm.NotebookLMClient.from_storage, so the new CLI paths won’t be mocked and may hit real auth/network during unit tests. To keep testability consistent, consider moving the client context management back into pdf_by_chapters.notebooklm (provide sync wrappers or async functions that open the client internally) or update the tests/fixtures to patch notebooklm.NotebookLMClient.from_storage instead.
Suggested change:

```diff
-from notebooklm import NotebookLMClient
-from pdf_by_chapters.notebooklm import create_syllabus as _create_syllabus
-from pdf_by_chapters.notebooklm import list_sources
+from pdf_by_chapters.notebooklm import (
+    NotebookLMClient,
+    create_syllabus as _create_syllabus,
+    list_sources,
+)
```
Summary
- New CLI commands: `syllabus`, `generate-next`, `status` with `--no-wait`, `--poll`, `--tail` modes
- New modules: `models.py` (shared dataclasses), `syllabus.py` (pure-logic syllabus parsing and state management)

Test plan
- `generate_chunk()` and `ChunkResult`
- `generate-next` blocking/non-blocking paths de-duplicated

🤖 Generated with Claude Code