
feat: syllabus-driven chunked audio/video generation #1

Merged
NetDevAutomate merged 1 commit into main from feat/chunked-audio-video-generation
Mar 10, 2026

Conversation

@NetDevAutomate (Owner)

Summary

  • Add syllabus workflow for generating NotebookLM audio/video overviews of entire eBooks, broken into logical chapter episodes
  • Three new CLI commands (syllabus, generate-next, status) with --no-wait, --poll, and --tail modes
  • Two new modules: models.py (shared dataclasses), syllabus.py (pure-logic syllabus parsing and state management)
  • Auto-syllabus via NotebookLM chat API with fixed-size fallback, stateful stepping with atomic JSON state persistence, priority-based chunk selection (GENERATING > FAILED > PENDING)

Test plan

  • 118 unit tests passing (syllabus parsing, state round-trip, CLI commands, NotebookLM integration)
  • Ruff lint clean
  • Pre-commit hooks pass
  • Dead code removed (generate_chunk() and ChunkResult)
  • generate-next blocking/non-blocking paths de-duplicated

🤖 Generated with Claude Code

Add a syllabus workflow that automates generating NotebookLM audio/video
overviews for an entire eBook, broken into logical chapter episodes.

New modules:
- models.py: shared dataclasses (extracted from notebooklm.py)
- syllabus.py: pure-logic syllabus parsing, state management, chunk selection

New CLI commands (rich_help_panel="Syllabus"):
- syllabus: send structured prompt to NotebookLM chat API, parse response
  into numbered episode plan, save as syllabus_state.json
- generate-next: fire generation for next pending episode, poll to
  completion, persist task IDs for Ctrl+C recovery, --no-wait for
  fire-and-forget mode
- status: display progress table, --poll to check API, --tail for
  live-updating display

Key design decisions:
- Auto-syllabus via NotebookLM chat with fixed-size fallback on parse failure
- Stateful next-chunk stepping (accommodates rate limits and session breaks)
- Atomic state writes (mkstemp + fsync + os.replace) for crash safety
- Priority-based chunk selection: GENERATING > FAILED > PENDING
- Episode titles sanitized before filesystem/API use (LLM output is
  adversarial input)
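The atomic-write decision above can be sketched roughly as follows; the function name `write_state` matches the PR summary, but the exact signature and file layout are assumptions:

```python
import json
import os
import tempfile

def write_state(state: dict, state_path: str) -> None:
    """Atomic JSON state write: temp file in the same directory, fsync,
    then os.replace() so a crash can never leave a half-written
    syllabus_state.json on disk."""
    dir_name = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, state_path)  # atomic rename over the old state
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing the temp file into the same directory matters: `os.replace()` is only atomic within a single filesystem.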

Includes brainstorm and plan documents, comprehensive tests (118 passing),
and updated README with full workflow documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 10, 2026 19:58
@NetDevAutomate NetDevAutomate merged commit de6f888 into main Mar 10, 2026
4 checks passed
@NetDevAutomate NetDevAutomate deleted the feat/chunked-audio-video-generation branch March 10, 2026 19:58

Copilot AI left a comment


Pull request overview

Introduces a syllabus-driven workflow for generating NotebookLM audio/video in chapter-based “episodes”, with JSON state persistence and new CLI commands to create a plan, generate the next chunk, and monitor progress.

Changes:

  • Added syllabus.py for prompt building, LLM response parsing, chunk selection, and atomic state read/write.
  • Added new CLI commands: syllabus, generate-next (with --no-wait), and status (--poll, --tail).
  • Refactored shared dataclasses into models.py and extended NotebookLM integration with syllabus/poll helpers; expanded unit tests and docs.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

| File | Description |
|------|-------------|
| tests/unit/test_syllabus.py | New unit tests for syllabus parsing, chunk building, state I/O, and selection logic |
| tests/unit/test_notebooklm.py | Adds coverage for create_syllabus() behavior |
| tests/unit/test_cli.py | Adds coverage for new CLI commands and state flows |
| tests/conftest.py | Extends NotebookLM client mock with chat.ask and artifacts.rename |
| src/pdf_by_chapters/syllabus.py | New pure-logic module for syllabus parsing and state management |
| src/pdf_by_chapters/notebooklm.py | Moves dataclasses to models.py and adds syllabus + chunk polling helpers |
| src/pdf_by_chapters/models.py | New shared dataclasses module (UploadResult/NotebookInfo/SourceInfo) |
| src/pdf_by_chapters/cli.py | Adds syllabus workflow commands and stateful generation/polling behavior |
| docs/use-cases.md | Documents syllabus-driven workflow and resume behavior |
| docs/troubleshooting.md | Adds troubleshooting for syllabus parsing, duplicate content, stuck state |
| docs/plans/2026-03-10-feat-chunked-audio-video-generation-plan.md | Adds implementation plan document for the feature |
| docs/guide-study-workflow.md | Adds "Syllabus Mode" usage guidance |
| docs/guide-generate-overviews.md | Adds end-to-end syllabus workflow steps |
| docs/codemap.md | Updates architecture/codemap with new modules and workflow |
| docs/brainstorms/2026-03-10-chunked-audio-video-generation-brainstorm.md | Adds brainstorming notes for the feature |
| README.md | Adds README section documenting the syllabus workflow and new CLI flags |


Comment on lines +470 to +480

```python
tasks = asyncio.run(_start())
if not tasks:
    console.print("[red]Failed to start any generation requests.[/red]")
    chunk.status = ChunkStatus.FAILED
    write_state(state, state_path)
    raise typer.Exit(1)

for label, task_id in tasks.items():
    chunk.artifacts[label] = ChunkArtifact(task_id=task_id, status="in_progress")
chunk.status = ChunkStatus.GENERATING
write_state(state, state_path)
```

Copilot AI Mar 10, 2026


If start_chunk_generation() fails to start one requested artifact (e.g. video) but starts the other, tasks will be non-empty and the chunk will proceed with only a subset of artifacts. Later, all_done = all(...) can mark the chunk COMPLETED even though a requested artifact never started. Consider validating that tasks contains every label implied by gen_audio/gen_video (and mark the chunk failed if any requested artifact didn’t get a task_id), or have start_chunk_generation() surface partial-start failures explicitly.
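One way to implement the suggested guard is a small helper like the following sketch; `missing_labels` and the flag names are illustrative, not the PR's API:

```python
def missing_labels(tasks: dict[str, str], gen_audio: bool, gen_video: bool) -> list[str]:
    """Return every requested artifact label that did not receive a
    task_id, so the caller can mark the chunk FAILED instead of
    proceeding with a partial set of artifacts."""
    expected = [label for label, wanted in (("audio", gen_audio), ("video", gen_video)) if wanted]
    return [label for label in expected if label not in tasks]
```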

Comment on lines +608 to +625

```python
# Update chunk-level status
all_done = all(a.status == "completed" for a in chunk.artifacts.values())
any_failed = any(a.status == "failed" for a in chunk.artifacts.values())
if all_done:
    chunk.status = ChunkStatus.COMPLETED
    # Best-effort rename
    safe_title = sanitize_filename(chunk.title)[:100]
    if safe_title:
        for _label, art in chunk.artifacts.items():
            if art.task_id and art.status == "completed":
                with contextlib.suppress(Exception):
                    await client.artifacts.rename(
                        state.notebook_id,
                        art.task_id,
                        safe_title,
                    )
elif any_failed:
    chunk.status = ChunkStatus.FAILED
```

Copilot AI Mar 10, 2026


When polling, this code sets chunk.status = FAILED as soon as any artifact is failed. That can prematurely remove the chunk from the GENERATING set on subsequent status --poll runs (and cause --tail to exit) even if another artifact is still in_progress and will later complete. Consider keeping the chunk in GENERATING until all artifact statuses are terminal, and only set FAILED once everything is either completed or failed (or introduce a derived/partial state for display).
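The reviewer's proposed rule, stay in GENERATING until every artifact is terminal, can be sketched as a pure function over the artifact statuses (the function and status strings are illustrative):

```python
def derive_chunk_status(artifact_statuses: list[str]) -> str:
    """Resolve a chunk-level status only once all artifacts are terminal:
    any non-terminal artifact keeps the chunk in 'generating', so later
    poll/tail runs keep tracking it."""
    terminal = {"completed", "failed"}
    if any(s not in terminal for s in artifact_statuses):
        return "generating"
    if all(s == "completed" for s in artifact_statuses):
        return "completed"
    return "failed"
```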

Comment on lines +597 to +607

```python
tasks = {
    label: art.task_id
    for label, art in chunk.artifacts.items()
    if art.task_id and art.status != "completed"
}
if not tasks:
    continue
statuses = await poll_chunk_status(client, state.notebook_id, tasks)
for label, new_status in statuses.items():
    chunk.artifacts[label].status = new_status
```

Copilot AI Mar 10, 2026


The polling task filter includes artifacts with status failed (art.status != "completed"), so failed artifacts will be re-polled every time. Also, combined with the chunk-level status update, this can cause inconsistent progress tracking. Consider filtering to only non-terminal statuses (e.g. not in {"completed","failed"}) and treating "unknown"/poll errors explicitly (either keep generating but show an error, or mark failed after N errors).

Comment on lines +197 to +207

```python
def test_no_sources_error(self, patch_notebooklm, tmp_path):
    client, _ = patch_notebooklm
    client.sources.list.return_value = []
    result = runner.invoke(app, ["syllabus", "-n", "nb-123", "-o", str(tmp_path)])
    assert result.exit_code != 0
    assert "No sources" in result.stdout

def test_creates_state_file(self, patch_notebooklm, tmp_path):
    _client, _ = patch_notebooklm
    result = runner.invoke(app, ["syllabus", "-n", "nb-123", "-o", str(tmp_path)])
    assert result.exit_code == 0
```

Copilot AI Mar 10, 2026


These CLI tests rely on patch_notebooklm, which patches pdf_by_chapters.notebooklm.NotebookLMClient.from_storage. However, the syllabus command imports NotebookLMClient from the external notebooklm package and calls NotebookLMClient.from_storage() directly, so this fixture won’t intercept those calls. Consider patching notebooklm.NotebookLMClient.from_storage for these tests (or refactoring the CLI to avoid direct client usage) so unit tests don’t require real credentials/network.
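The underlying rule, patch the object where the code under test looks it up, can be demonstrated with a self-contained toy module (`fake_notebooklm` is a stand-in; it is not the real package):

```python
import sys
import types
from unittest import mock

# Register a toy stand-in for the external package.
pkg = types.ModuleType("fake_notebooklm")

class NotebookLMClient:
    @classmethod
    def from_storage(cls):
        raise RuntimeError("would hit real credentials/network")

pkg.NotebookLMClient = NotebookLMClient
sys.modules["fake_notebooklm"] = pkg

def syllabus_command():
    # mirrors the CLI doing `from notebooklm import NotebookLMClient`
    from fake_notebooklm import NotebookLMClient
    return NotebookLMClient.from_storage()

# Patching the class on the module the command imports from intercepts the call;
# patching some other module's re-export would not.
with mock.patch("fake_notebooklm.NotebookLMClient.from_storage", return_value="mocked client"):
    result = syllabus_command()
```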

Comment on lines +331 to +335

```python
source_tuples = [(s.id, s.title) for s in sources_list]
source_map, title_map = map_sources_to_chapters(source_tuples)

# Build and send prompt
prompt = build_prompt(source_tuples, max_chapters)
```

Copilot AI Mar 10, 2026


source_tuples is built in the raw order returned by list_sources(), but parse_syllabus_response() interprets chapter numbers against source_map derived from filenames (e.g. chapter_10 -> 10). If the API returns sources out of chapter order (or uploads were out of order), the prompt numbering and source_map will disagree, causing episodes to map to the wrong source_ids. Consider sorting sources by parsed chapter number (when parseable) before building the prompt and maps, or change the prompt to explicitly label each line with the actual chapter number (e.g. "Chapter 10: …") so the response numbers match source_map keys.

Suggested change

```python
# Before
source_tuples = [(s.id, s.title) for s in sources_list]
source_map, title_map = map_sources_to_chapters(source_tuples)

# Build and send prompt
prompt = build_prompt(source_tuples, max_chapters)
```

```python
# After
# Build initial (id, title) tuples from sources
source_tuples = [(s.id, s.title) for s in sources_list]
source_map, title_map = map_sources_to_chapters(source_tuples)

# Order sources by chapter number so prompt numbering matches source_map
ordered_source_tuples: list[tuple[str, str]] = []
for chapter_num in sorted(source_map.keys()):
    source_id = source_map[chapter_num]
    title = title_map[source_id]
    ordered_source_tuples.append((source_id, title))

# Build and send prompt
prompt = build_prompt(ordered_source_tuples, max_chapters)
```
Comment on lines +320 to +332

```python
instructions = _build_instructions(episode_title, chapter_titles)
tasks: dict[str, str] = {}
for label, should_gen in [("audio", generate_audio), ("video", generate_video)]:
    if not should_gen:
        continue
    try:
        logger.info("Requesting %s for '%s'...", label, episode_title)
        tasks[label] = await _request_chapter_artifact(
            client, notebook_id, label, source_ids, instructions[label]
        )
    except Exception as e:
        logger.error("Failed to request %s: %s", label, e)
return tasks
```

Copilot AI Mar 10, 2026


start_chunk_generation() logs and suppresses exceptions per artifact type, returning a partial tasks dict. Callers can’t distinguish “video was requested but failed to start” from “video was intentionally disabled”, which can lead to chunks being marked completed with missing artifacts. Consider either (a) raising if any requested artifact fails to start, (b) returning an explicit per-label result that includes failures, or (c) accepting an expected_labels set and ensuring the return includes all expected labels (with failures surfaced).

Comment on lines +257 to +286
@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_all_completed(self, _mock_sleep, patch_notebooklm, tmp_path):
_make_state(
tmp_path,
chunks={
1: SyllabusChunk(
episode=1,
title="Done",
chapters=[1],
source_ids=["s1"],
status=ChunkStatus.COMPLETED,
),
},
)
result = runner.invoke(app, ["generate-next", "-o", str(tmp_path)])
assert result.exit_code == 0
assert "completed" in result.stdout.lower()

@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_generates_pending_chunk(self, _mock_sleep, patch_notebooklm, tmp_path):
_make_state(tmp_path)
result = runner.invoke(app, ["generate-next", "-o", str(tmp_path)])
assert result.exit_code == 0
# Verify state file updated
data = json.loads((tmp_path / "syllabus_state.json").read_text())
assert data["chunks"][0]["status"] == "completed"

@patch("pdf_by_chapters.notebooklm.asyncio.sleep")
def test_episode_targeting(self, _mock_sleep, patch_notebooklm, tmp_path):
_make_state(
Copy link

Copilot AI Mar 10, 2026


These tests patch pdf_by_chapters.notebooklm.asyncio.sleep, but the generate-next CLI polling loop sleeps via pdf_by_chapters.cli.asyncio.sleep. As written, tests that execute the blocking path can incur real 30s sleeps. Patch pdf_by_chapters.cli.asyncio.sleep (or run generate-next with --no-wait in unit tests) to keep the test suite fast and deterministic.
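The point, patch the sleep the loop actually awaits, can be shown with a minimal self-contained example; `polling_loop` is a stand-in for the blocking generate-next path, not the real CLI code:

```python
import asyncio
from unittest import mock

async def polling_loop(checks: int) -> int:
    """Toy polling loop: check status, then wait between polls."""
    polls = 0
    for _ in range(checks):
        polls += 1
        await asyncio.sleep(30)  # the real loop waits ~30s between polls
    return polls

# Patching the sleep in the namespace the loop resolves makes the test instant;
# patching a sleep in an unrelated module would leave the real 30s waits in place.
with mock.patch("asyncio.sleep", new=mock.AsyncMock()) as fake_sleep:
    polls = asyncio.run(polling_loop(3))
```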

Comment on lines +126 to +163
### models.py — Shared Dataclasses

Holds dataclasses shared between modules, preventing circular imports and keeping `syllabus.py` testable without `notebooklm-py` installed.

- `UploadResult`, `NotebookInfo`, `SourceInfo` — API result types
- `ChunkResult` — per-artifact-type generation result

### syllabus.py — Syllabus State & Parsing

Pure logic module (no Rich, no Typer, no async). Follows the `splitter.py` pattern.

- `ChunkStatus(StrEnum)` — 4-state machine: `PENDING → GENERATING → COMPLETED | FAILED`
- `SyllabusState`, `SyllabusChunk`, `ChunkArtifact` — state file dataclasses
- `build_prompt()` — constructs the syllabus generation prompt with numbered source titles
- `parse_syllabus_response()` — regex parsing of LLM response into episodes (binary success/fallback)
- `build_fixed_size_chunks()` — deterministic fallback when LLM parsing fails
- `map_sources_to_chapters()` — maps chapter numbers to source IDs via title regex
- `read_state()` / `write_state()` — atomic JSON state persistence with `fsync` + `os.replace()`
- `get_next_chunk()` — priority selection: GENERATING (resume) > FAILED (retry) > PENDING (new)

```mermaid
stateDiagram-v2
[*] --> PENDING
PENDING --> GENERATING: generate-next
GENERATING --> COMPLETED: poll detects completion
GENERATING --> FAILED: poll detects failure / timeout
FAILED --> GENERATING: generate-next (retry)
COMPLETED --> PENDING: --episode N (regenerate)
```

## Interfaces

| Module | Exports | Used By |
|--------|---------|---------|
| `splitter` | `split_pdf_by_chapters()`, `sanitize_filename()` | `cli.split`, `cli.process` |
| `notebooklm` | `upload_chapters()`, `generate_for_chapters()`, `download_artifacts()`, `list_notebooks()`, `list_sources()`, `delete_notebook()` | `cli.*` |
| `models` | `UploadResult`, `NotebookInfo`, `SourceInfo`, `ChunkResult` | `notebooklm`, `cli` |
| `splitter` | `split_pdf_by_chapters()`, `sanitize_filename()` | `cli.split`, `cli.process`, `notebooklm` |
| `notebooklm` | `upload_chapters()`, `generate_for_chapters()`, `download_artifacts()`, `list_notebooks()`, `list_sources()`, `delete_notebook()`, `create_syllabus()`, `start_chunk_generation()`, `poll_chunk_status()`, `generate_chunk()` | `cli.*` |
| `syllabus` | `SyllabusState`, `SyllabusChunk`, `ChunkStatus`, `build_prompt()`, `parse_syllabus_response()`, `read_state()`, `write_state()`, `get_next_chunk()` | `cli.syllabus`, `cli.generate_next`, `cli.status` |

Copilot AI Mar 10, 2026


docs/codemap.md references ChunkResult and generate_chunk(), and shows syllabus.py depending on models.py. In the current code, models.py does not define ChunkResult, notebooklm.py does not export generate_chunk(), and syllabus.py doesn’t import models.py. Please update the codemap/interface tables and diagrams to match the actual module exports/dependencies so the documentation stays accurate.

Comment on lines +721 to +723

```python
if all_done:
    chunk.status = ChunkStatus.COMPLETED
elif any_failed:
```

Copilot AI Mar 10, 2026


--tail has the same chunk-level status issue as --poll: it sets chunk.status = FAILED as soon as any artifact fails, even if other artifacts are still in progress. That can make the live loop exit early and stop updating the remaining artifact statuses. Consider keeping the chunk in GENERATING until all artifacts are terminal, and only then marking FAILED/COMPLETED.

Suggested change

```python
# Before
if all_done:
    chunk.status = ChunkStatus.COMPLETED
elif any_failed:
```

```python
# After
has_non_terminal = any(
    a.status not in ("completed", "failed")
    for a in chunk.artifacts.values()
)
if all_done:
    chunk.status = ChunkStatus.COMPLETED
elif any_failed and not has_non_terminal:
```
Comment on lines +290 to +293

```python
from notebooklm import NotebookLMClient

from pdf_by_chapters.notebooklm import create_syllabus as _create_syllabus
from pdf_by_chapters.notebooklm import list_sources
```

Copilot AI Mar 10, 2026


These new commands import and use notebooklm.NotebookLMClient directly. Existing tests patch pdf_by_chapters.notebooklm.NotebookLMClient.from_storage, so the new CLI paths won’t be mocked and may hit real auth/network during unit tests. To keep testability consistent, consider moving the client context management back into pdf_by_chapters.notebooklm (provide sync wrappers or async functions that open the client internally) or update the tests/fixtures to patch notebooklm.NotebookLMClient.from_storage instead.

Suggested change

```python
# Before
from notebooklm import NotebookLMClient

from pdf_by_chapters.notebooklm import create_syllabus as _create_syllabus
from pdf_by_chapters.notebooklm import list_sources
```

```python
# After
from pdf_by_chapters.notebooklm import (
    NotebookLMClient,
    create_syllabus as _create_syllabus,
    list_sources,
)
```