docs: add FileSystemSeedReader authoring guide and Markdown recipe #425

Open
eric-tramel wants to merge 5 commits into main from docs/filesystem-seed-reader-markdown-recipe

Conversation

@eric-tramel
Contributor

Summary

This PR documents how to author FileSystemSeedReader plugins on top of the 1:N hydration contract introduced in #424.

  • add a dedicated FileSystemSeedReader plugin authoring guide
  • add a new Plugin Development recipe with a runnable Markdown section seed reader scaffold
  • add a focused tests_e2e smoke test for manifest-based selection with fanout hydration
  • wire the new docs into plugin and recipe navigation

Why

#424 makes hydrate_row() capable of returning either one record or many records per manifest row. This follow-up PR explains that contract for plugin authors and gives them a concrete example that splits Markdown files into section rows.
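As a rough illustration of the one-or-many contract described above, the sketch below normalizes either return shape into a flat list of rows. It does not import the real library: `normalize_hydrated`, `hydrate_one`, and `hydrate_many` are hypothetical names, and how the framework actually coerces results is an assumption.

```python
from typing import Any, Union

Row = dict[str, Any]


def normalize_hydrated(result: Union[Row, list[Row]]) -> list[Row]:
    """Coerce a one-or-many hydration result into a list of rows (assumed behavior)."""
    if isinstance(result, dict):
        return [result]
    return list(result)


def hydrate_one(manifest_row: Row) -> Row:
    # 1:1 case: one manifest row becomes exactly one record.
    return {**manifest_row, "content": "whole file"}


def hydrate_many(manifest_row: Row) -> list[Row]:
    # 1:N case: one manifest row fans out into several records.
    return [{**manifest_row, "section_index": i} for i in range(2)]


manifest = [{"relative_path": "faq.md"}, {"relative_path": "guide.md"}]

# Flatten the per-manifest-row results into a single list of records.
flat = [row for m in manifest for row in normalize_hydrated(hydrate_many(m))]
single = normalize_hydrated(hydrate_one(manifest[0]))
```

Here two manifest rows fan out into four records, while the 1:1 reader still yields a one-element list, so downstream code can treat both cases the same way.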

Blocked By

This PR is intentionally stacked on top of feat/filesystem-seed-reader-fanout so the review only contains the docs/example changes. After #424 merges, retarget this PR to main.

Testing

  • uv run ruff check docs/assets/recipes/plugin_development/markdown_seed_reader tests_e2e/src/data_designer_e2e_tests/plugins/markdown_seed_reader tests_e2e/tests/test_e2e.py
  • uv run pytest tests/test_e2e.py -k markdown_section_seed_reader_plugin_fanout_respects_manifest_selection (from tests_e2e/)
  • UV_CACHE_DIR=/tmp/uv-cache uv run --group docs mkdocs build --strict (currently still fails on pre-existing repo-wide docs warnings unrelated to this branch)

@eric-tramel eric-tramel requested a review from a team as a code owner March 17, 2026 01:50
@eric-tramel eric-tramel self-assigned this Mar 17, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR adds documentation for authoring FileSystemSeedReader plugins on top of the 1:N hydration contract introduced in #424. It introduces a dedicated guide (docs/plugins/filesystem_seed_reader.md), a runnable single-file recipe (docs/assets/recipes/plugin_development/markdown_seed_reader.py) that splits Markdown files into per-section seed rows, and wires both into the MkDocs navigation.

Key changes:

  • New FileSystemSeedReader authoring guide covering build_manifest, hydrate_row, manifest-based selection semantics, and packaging steps
  • Self-contained Markdown section seed reader recipe demonstrating the 1:N fanout pattern with DirectorySeedSource
  • docs/plugins/example.md and docs/plugins/overview.md updated to cross-link the new guide and reflect the three-plugin-type model
  • New Plugin Development nav group added to mkdocs.yml

Issues found:

  • The guide's output_columns code snippet omits the ClassVar[list[str]] type annotation that the recipe file uses correctly — authors copying the snippet may get unexpected Pydantic behavior or type-checker warnings
  • An empty or whitespace-only .md file causes hydrate_row to return []; whether the framework silently drops those manifest rows or raises an error is undocumented and untested, which could lead to silent data loss
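The `ClassVar` concern in the first issue can be illustrated with the standard library alone. This PR page does not say whether the reader base class is a Pydantic model, so the sketch below uses a plain `dataclass` as a stand-in: it shows only that a `ClassVar`-annotated attribute is excluded from the instance fields. `ReaderConfig` and `pattern` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class ReaderConfig:
    # Annotated as ClassVar: shared on the class, not treated as a field.
    output_columns: ClassVar[list[str]] = ["relative_path", "file_name"]
    # A regular annotated attribute: becomes an instance field.
    pattern: str = "*.md"


field_names = [f.name for f in fields(ReaderConfig)]
class_level_columns = ReaderConfig.output_columns
```

Only `pattern` shows up in `fields(ReaderConfig)`, while `output_columns` stays readable on the class itself, which is the distinction the review comment asks the guide snippet to keep.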

Confidence Score: 4/5

  • Safe to merge after addressing the ClassVar annotation inconsistency and clarifying 0-row hydration behavior.
  • All seven changed files are documentation and a self-contained recipe script. The recipe logic is correct for the documented sample inputs, navigation wiring is accurate, and mkdocs snippet paths resolve correctly. Two minor issues — a missing ClassVar annotation in a guide snippet and an undocumented/untested empty-file edge case — prevent a perfect score but do not block the PR.
  • docs/plugins/filesystem_seed_reader.md (ClassVar annotation in code snippet) and docs/assets/recipes/plugin_development/markdown_seed_reader.py (empty-file hydration edge case)

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/assets/recipes/plugin_development/markdown_seed_reader.py | Self-contained recipe implementing MarkdownSectionDirectorySeedReader with 1:N hydration. Logic is sound for the documented sample files; minor concern about how the framework handles an empty-file manifest row that hydrates to zero records. |
| docs/plugins/filesystem_seed_reader.md | New authoring guide covering build_manifest/hydrate_row contract, manifest-based selection semantics, and packaging steps. The guide's output_columns code snippet omits the ClassVar annotation that the recipe file uses correctly, creating a subtle inconsistency for copy-pasting authors. |
| docs/recipes/plugin_development/markdown_seed_reader.md | Short recipe landing page that embeds the Python file via mkdocs snippets. Path reference and download link look correct. |
| docs/plugins/overview.md | Adds FileSystemSeedReader mention to the seed reader implementation list and updates the closing navigation links. Changes are accurate and consistent with the rest of the PR. |
| docs/plugins/example.md | Updates the supported plugin-type count from two to three and adds a cross-link to the new FileSystemSeedReader guide. Small, correct change. |
| docs/recipes/cards.md | Appends the Markdown Section Seed Reader card to the recipes gallery with correct relative links and download path. |
| mkdocs.yml | Adds a new Plugin Development nav section for the recipe and wires in the FileSystemSeedReader Plugins page under the Plugins nav group. Navigation is correctly ordered. |

Sequence Diagram

sequenceDiagram
    participant U as User / Recipe
    participant DD as DataDesigner
    participant R as MarkdownSectionDirectorySeedReader
    participant FS as FileSystem

    U->>DD: preview(config_builder, num_records=N)
    DD->>R: build_manifest(context)
    R->>FS: get_matching_relative_paths("*.md")
    FS-->>R: ["faq.md", "guide.md"]
    R-->>DD: manifest [{relative_path, file_name}, ...]

    note over DD: Apply IndexRange / shuffle on manifest rows

    loop for each selected manifest row
        DD->>R: hydrate_row(manifest_row, context)
        R->>FS: open(relative_path)
        FS-->>R: markdown text
        R->>R: extract_markdown_sections(markdown_text)
        R-->>DD: [section_row_1, section_row_2, ...]
    end

    DD->>DD: flatten hydrated rows, validate output_columns
    DD-->>U: preview.dataset (one row per section)
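
The ordering in the diagram (selection on manifest rows first, fanout afterwards) can be sketched in plain Python. This is a simplified model, not the framework's code: `select_then_hydrate` is a hypothetical helper, and the half-open `start`/`stop` slice is an assumption standing in for whatever bounds IndexRange actually uses.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical section counts per file, standing in for real Markdown content.
SECTION_COUNTS = {"faq.md": 2, "guide.md": 3}


def hydrate_row(manifest_row: Row) -> list[Row]:
    # Fan one manifest row out into one record per section.
    count = SECTION_COUNTS[manifest_row["relative_path"]]
    return [{**manifest_row, "section_index": i} for i in range(count)]


def select_then_hydrate(
    manifest: list[Row], start: int, stop: int, hydrate: Callable[[Row], list[Row]]
) -> list[Row]:
    # Selection happens on manifest rows (files), before any fanout.
    selected = manifest[start:stop]  # half-open range: an assumption
    # Hydrated rows are flattened afterwards.
    return [row for m in selected for row in hydrate(m)]


manifest = [{"relative_path": "faq.md"}, {"relative_path": "guide.md"}]
first_file_only = select_then_hydrate(manifest, 0, 1, hydrate_row)
all_files = select_then_hydrate(manifest, 0, 2, hydrate_row)
```

Selecting a single manifest row still yields two records here, because the selection counts files rather than sections.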
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/plugins/filesystem_seed_reader.md
Line: 37-44

Comment:
**`output_columns` missing `ClassVar` annotation in guide snippet**

The docs guide declares `output_columns` without a type annotation, while the actual recipe file (`docs/assets/recipes/plugin_development/markdown_seed_reader.py`, line 40) correctly uses `ClassVar[list[str]]`. Without `ClassVar`, type-checkers (mypy, pyright) will treat this as a regular instance variable rather than a class variable, and Pydantic may inadvertently include it as a model field. Keeping the guide snippet consistent with the recipe avoids confusion for authors who copy it.

```suggestion
    output_columns: ClassVar[list[str]] = [
        "relative_path",
        "file_name",
        "section_index",
        "section_header",
        "section_content",
    ]
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: docs/assets/recipes/plugin_development/markdown_seed_reader.py
Line: 108-110

Comment:
**Empty/whitespace-only file produces zero hydrated rows**

When a matched `.md` file is empty or contains only whitespace, `extract_markdown_sections` returns `[]`, which causes `hydrate_row` to return an empty list. Whether the framework silently drops a manifest row that yields zero hydrated records — or raises an error — is not described in this PR, and no test covers the empty-file path.

If the framework does silently skip 0-row hydrations, that could result in unexpected data loss (e.g. a Markdown file quietly disappearing from the seed dataset). Consider either:
- documenting this behavior explicitly in the guide, or
- adding an early guard that returns a single "empty document" row when no sections are found (consistent with the `fallback_header` contract already used for headerless files).
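
A minimal sketch of the second option follows. It is not the recipe's code: this `extract_markdown_sections` is a simplified stand-in that splits only on ATX headings (and does not skip `#` lines inside fenced code), while `hydrate_sections` and the `"Document"` fallback value are hypothetical.

```python
import re
from typing import Optional

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def _make_section(header: Optional[str], body: list[str], fallback: str) -> Optional[dict]:
    content = "\n".join(body).strip()
    if header is None and not content:
        return None  # nothing before the first heading
    return {"section_header": header if header is not None else fallback,
            "section_content": content}


def extract_markdown_sections(text: str, fallback_header: str = "Document") -> list[dict]:
    sections, header, body = [], None, []
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            section = _make_section(header, body, fallback_header)
            if section is not None:
                sections.append(section)
            header, body = match.group(2).strip(), []
        else:
            body.append(line)
    section = _make_section(header, body, fallback_header)
    if section is not None:
        sections.append(section)
    return sections


def hydrate_sections(text: str, fallback_header: str = "Document") -> list[dict]:
    sections = extract_markdown_sections(text, fallback_header)
    if not sections:
        # Early guard: emit one "empty document" row instead of zero rows.
        sections = [{"section_header": fallback_header, "section_content": ""}]
    return [{"section_index": i, **s} for i, s in enumerate(sections)]


empty_rows = hydrate_sections("   \n")
normal_rows = hydrate_sections("# A\nx\n## B\ny")
```

With this guard a whitespace-only file still contributes one row, so it cannot silently vanish from the seed dataset; whether an empty row is preferable to documenting the drop is a design choice for the PR author.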

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Merge branch 'main' ..."

@eric-tramel eric-tramel force-pushed the docs/filesystem-seed-reader-markdown-recipe branch from 09cf5d5 to 3580fba Compare March 18, 2026 01:14
@eric-tramel eric-tramel changed the base branch from feat/filesystem-seed-reader-fanout to main March 18, 2026 01:42