feat(sync): pre-compile dbt docs for repo sources to improve centralized docs context#213
feat(sync): pre-compile dbt docs for repo sources to improve centralized docs context#213Flamki wants to merge 5 commits intogetnao:mainfrom
Conversation
|
Mentioned this on #209 because both are related and we should find the best combination of both to make it understandable for end users. |
6dc041f to
8994347
Compare
|
Thanks for the feedback and for linking #209. I cleaned up #213 and force-pushed a focused branch:
Scope of #213 remains intentionally minimal:
I see #213 as complementary to #209, not competing:
Happy to align on whether to keep both paths, pick one as default, or gate both behind config flags. |
There was a problem hiding this comment.
1 issue found across 13 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cli/nao_core/commands/sync/providers/repositories/provider.py">
<violation number="1" location="cli/nao_core/commands/sync/providers/repositories/provider.py:114">
P1: Missing exception handling contradicts the "best-effort" contract. Unlike `clone_or_pull_repo` which wraps its work in `try/except`, this function lets exceptions from `subprocess.run` (e.g., `OSError`, `PermissionError`) propagate up, which would crash the sync loop and skip all remaining repositories. Wrap the body in a try/except to match the sibling function's pattern.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
|
Be careful you added the Trino code to this PR. |
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cli/nao_core/commands/sync/providers/repositories/provider.py">
<violation number="1" location="cli/nao_core/commands/sync/providers/repositories/provider.py:115">
P1: `subprocess.run` for `dbt docs generate` has no `timeout`, which can hang the sync process indefinitely if the database is unreachable or slow. Consider adding a reasonable timeout (e.g., 300 seconds) to keep this best-effort and non-blocking.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
|
Hey @Flamki you added the Trino code from your other PR to this one, could you remove it? |
Bl3f
left a comment
There was a problem hiding this comment.
Regarding the dbt, its LGTM but remove the Trino code if you can pls.
| from .postgres import PostgresConfig | ||
| from .redshift import RedshiftConfig | ||
| from .snowflake import SnowflakeConfig | ||
| from .trino import TrinoConfig |
| dbt_project_file = repo_path / "dbt_project.yml" | ||
|
|
||
| if not dbt_project_file.exists(): | ||
| console.print( |
There was a problem hiding this comment.
Use the ui.py methods to display stuff in the console.
| cmd.extend(["--profiles-dir", repo.dbt_profiles_dir]) | ||
|
|
||
| console.print(f" [dim]Pre-compiling dbt docs for[/dim] {repo.name}") | ||
| result = subprocess.run(cmd, capture_output=True, text=True, check=False, timeout=300) |
There was a problem hiding this comment.
The thing I'm the most worried about is that a user will have to forward all the necessary dbt env variables to compile correctly the dbt docs (esp. if we want it in the "production" context). It adds a small complexity, but the feature is interesting, so we can add it, but might be complex for users to make it work.
| ## Recent Updates | ||
|
|
||
| - `nao sync` now supports optional dbt docs pre-compilation for repository sources (`compile_dbt_docs: true`), which | ||
| generates compiled artifacts such as `target/catalog.json` and `target/manifest.json` for richer agent context. |
There was a problem hiding this comment.
One last thing I'm afraid about is the size of the manifest.json for large project that might make it unusable and pollute the context window. Nonetheless, as it generate also the compile version of each query in the folder structure that's interesting.
611c65b to
2e894a6
Compare
|
Thanks for the review — addressed. Updates made in this PR:
Current status:
Validation:
Happy to adjust any wording in docs around env var complexity / manifest size if you want before merge. |
Summary
Adds optional dbt docs pre-compilation during nao sync for repository sources, so the agent can use compiled dbt artifacts (catalog.json, manifest.json) when projects use centralized docs references in YAML.
Closes #190.
Changes
• Added repo config options:
• compile_dbt_docs: bool (default false)
• dbt_profiles_dir: Optional[str]
• Updated repository sync flow:
• after clone/pull, when enabled, run:
dbt docs generate --project-dir [--profiles-dir ...]
• Added best-effort behavior:
• if repo is not dbt, dbt is missing, or generation fails, sync continues with warning logs
• Added unit tests for:
• skip when disabled
• skip when non-dbt repo
• skip when dbt binary missing
• execute command path with profiles dir
• integration into repo sync success flow
• Updated docs:
• README.md (usage + config example + behavior notes)
• README.md (“Recent Updates” note)
• nao_config.yaml enabling this for example dbt repo
Why
Raw dbt YAML with centralized doc() references can be hard for the agent to interpret directly. Compiled docs artifacts provide richer, resolved metadata.
Backward compatibility
• No behavior change unless compile_dbt_docs: true is explicitly set.
• Existing configs continue to work unchanged.
Validation
• test_repository_provider.py
• Result: 17 passed