Merged
13 changes: 13 additions & 0 deletions .flow/epics/fn-49.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-49-ghost-state-hydrator",
"created_at": "2026-01-28T22:11:05.146952Z",
"depends_on_epics": [],
"id": "fn-49",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-49.md",
"status": "open",
"title": "Ghost State Hydrator - Remote Investigation State Hydration",
"updated_at": "2026-01-28T22:13:14.583012Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-50.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-50-fix-it-button",
"created_at": "2026-01-28T22:11:12.693638Z",
"depends_on_epics": [],
"id": "fn-50",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-50.md",
"status": "open",
"title": "Fix-It Button - Agent-Proposed Code/SQL Fixes",
"updated_at": "2026-01-28T22:13:14.787518Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-51.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-51-trace-to-test",
"created_at": "2026-01-28T22:11:12.881243Z",
"depends_on_epics": [],
"id": "fn-51",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-51.md",
"status": "open",
"title": "Trace-to-Test Codify - Generate Regression Tests from Investigations",
"updated_at": "2026-01-28T22:13:15.008304Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-52.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-52-dataing-blame",
"created_at": "2026-01-28T22:11:13.104974Z",
"depends_on_epics": [],
"id": "fn-52",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-52.md",
"status": "open",
"title": "dataing blame - Git Blame for Data",
"updated_at": "2026-01-28T22:13:15.284674Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-53.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-53-dataing-sniff",
"created_at": "2026-01-28T22:11:13.316451Z",
"depends_on_epics": [],
"id": "fn-53",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-53.md",
"status": "open",
"title": "dataing sniff - Pre-Merge Impact Analysis",
"updated_at": "2026-01-28T22:13:15.482151Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-54.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-54-dataing-mock",
"created_at": "2026-01-28T22:11:13.504079Z",
"depends_on_epics": [],
"id": "fn-54",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-54.md",
"status": "open",
"title": "dataing mock - Local Mock Data Environments",
"updated_at": "2026-01-28T22:13:15.681845Z"
}
58 changes: 58 additions & 0 deletions .flow/specs/fn-49.md
@@ -0,0 +1,58 @@
# Ghost State Hydrator

## Overview

Enable engineers to pull a remote production investigation state into their local Jupyter notebook, fully hydrated with DataFrames, schema snapshots, and lineage context.

## User Value

"It works on my machine" becomes "I can reproduce the exact production state on my machine." The notebook becomes a forensic time machine.

## Competitor Gap

Nobody does this. Monte Carlo investigations live in their UI. GX has no investigation concept. Soda has no state capture.

## Approach

1. Bond agent serializes investigation state (DataFrames, schema, context) to cloud storage during workflow execution
2. JupyterLab widget fetches and deserializes state into notebook kernel
3. User gets populated variables matching the production failure moment
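
The serialized payload from step 1 could look roughly like this. A minimal sketch using stdlib dataclasses (the spec calls for a Pydantic model; field names like `parquet_uri` and `lineage_context` are illustrative assumptions, not the shipped schema):

```python
# Hypothetical shape of the InvestigationSnapshot payload. Stdlib dataclasses
# stand in for Pydantic here; all field names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TableArtifact:
    name: str         # variable name the DataFrame will hydrate into
    parquet_uri: str  # pointer into cloud storage, not inline bytes
    row_count: int

@dataclass
class InvestigationSnapshot:
    schema_version: int     # the spec's "version field": gates deserializer compat
    investigation_id: str
    checkpoint: str         # e.g. "start", "post_hypothesis", "complete"
    captured_at: str        # ISO-8601 UTC timestamp
    tables: list[TableArtifact] = field(default_factory=list)
    lineage_context: dict = field(default_factory=dict)

snap = InvestigationSnapshot(
    schema_version=1,
    investigation_id="inv_abc123",
    checkpoint="post_hypothesis",
    captured_at=datetime.now(timezone.utc).isoformat(),
    tables=[TableArtifact("orders_df", "s3://bucket/inv_abc123/orders.parquet", 10432)],
)
print(snap.checkpoint, len(snap.tables))
```

Storing URIs rather than inline DataFrames keeps snapshots small and lets the download API stream artifacts lazily.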

## Scope

- InvestigationSnapshot Pydantic model with version field
- Snapshot capture in Temporal workflow at key checkpoints
- Snapshot download API with streaming and auth
- SDK deserializer for Python objects
- JupyterLab "Hydrate" widget button and magic command
- Snapshot diff comparison

## Quick Commands

```bash
# Run unit tests for snapshot module
uv run pytest python-packages/dataing/tests/unit/core/test_snapshot.py -v

# Run SDK tests
uv run pytest python-packages/dataing/tests/unit/sdk/ -v

# Type check
uv run mypy python-packages/dataing/src/dataing/core/snapshot.py
```

## Acceptance

- [ ] InvestigationSnapshot model defined with all required fields
- [ ] Snapshots captured at investigation start, after hypothesis testing, on completion
- [ ] Snapshots stored to configurable backend (local, S3, GCS)
- [ ] Download API with streaming, gzip, auth, tenant isolation
- [ ] SDK `load_snapshot()` returns HydratedState with DataFrames
- [ ] JupyterLab widget shows investigations and "Hydrate" button
- [ ] Magic command `%dataing hydrate inv_abc123` works
- [ ] Snapshot diff comparison shows changes between checkpoints

## References

- Existing investigation workflow: `python-packages/dataing/src/dataing/temporal/workflows/investigation.py`
- Domain types: `python-packages/dataing/src/dataing/core/domain_types.py`
- JupyterLab extension: `frontend/` (React + TypeScript)
58 changes: 58 additions & 0 deletions .flow/specs/fn-50.md
@@ -0,0 +1,58 @@
# Fix-It Button

## Overview

After investigation identifies root cause, agent proposes a specific code/SQL fix that the user can apply with one click.

## User Value

Closes the loop from "here's the problem" to "here's the fix." No more copy-pasting SQL from ChatGPT. The agent that investigated also remediates.

## Competitor Gap

Every observability tool stops at diagnosis. None propose fixes. This makes Dataing an agent, not a dashboard.

## Approach

1. Agent generates fix proposal during synthesis phase
2. Fix proposals are typed (SQL DDL, SQL DML, dbt patch, Python patch)
3. Widget renders fix with "Apply" button
4. Apply executes in transaction with rollback on failure
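
Step 4 can be sketched as follows. This toy uses an in-memory SQLite database and assumed `FixProposal` fields; the real engine would run against the user's warehouse with the rollback window described in Acceptance:

```python
# Illustrative sketch of apply-in-transaction (step 4). FixProposal fields
# and the SQLite target are demo assumptions, not the shipped implementation.
import sqlite3
from dataclasses import dataclass

@dataclass
class FixProposal:
    fix_type: str       # "sql_ddl" | "sql_dml" | "dbt_patch" | "python_patch"
    code: str
    risks: list
    rollback_sql: str

def apply_fix(conn, fix):
    """Execute the fix inside a transaction; roll back on any failure."""
    try:
        with conn:  # sqlite3 connection ctx manager: commit on success, rollback on error
            conn.execute(fix.code)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, -5.0)")

fix = FixProposal(
    fix_type="sql_dml",
    code="UPDATE orders SET amount = 0 WHERE amount < 0",
    risks=["overwrites negative amounts"],
    rollback_sql="-- restore from pre-fix snapshot",
)
print(apply_fix(conn, fix))  # True
```

A failed statement never partially commits, which is the property the "transaction safety" scope item is after.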

## Scope

- FixProposal Pydantic model with fix_type, code, risks, rollback
- Synthesis agent extended to generate fixes when confidence > 0.7
- SQL fix validator using sqlglot
- Fix preview widget with syntax highlighting
- Fix execution engine with transaction safety
- Feedback loop for fix effectiveness tracking

## Quick Commands

```bash
# Run fix proposal tests
uv run pytest python-packages/dataing/tests/unit/agents/test_fix_proposal.py -v

# Run validator tests
uv run pytest python-packages/dataing/tests/unit/safety/test_fix_validator.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/agents/models.py
```

## Acceptance

- [ ] FixProposal model supports sql_ddl, sql_dml, dbt_patch, python_patch, manual_instruction
- [ ] Synthesis prompt generates fixes when confidence > 0.7
- [ ] Fixes validated for syntax and safety (no DROP TABLE, etc.)
- [ ] Widget shows fix with syntax highlighting and risk warnings
- [ ] "Apply" button executes fix in transaction
- [ ] Rollback available for 5 minutes after apply
- [ ] Feedback collected on fix effectiveness

## References

- Synthesis agent: `python-packages/dataing/src/dataing/agents/prompts/synthesis.py`
- SQL validation: `python-packages/dataing/src/dataing/safety/`
- Agent models: `python-packages/dataing/src/dataing/agents/models.py`
60 changes: 60 additions & 0 deletions .flow/specs/fn-51.md
@@ -0,0 +1,60 @@
# Trace-to-Test Codify

## Overview

Generate a regression test from a solved investigation that can be added to CI/CD to prevent recurrence.

## User Value

Every bug fixed becomes a permanent test. Institutional knowledge compounds. The platform gets stickier and the data gets safer over time.

## Competitor Gap

GX has tests but you write them manually. Soda has checks but no auto-generation. Nobody turns investigations into tests.

## Approach

1. Investigation synthesis includes test-relevant assertions
2. "Codify" action generates test in user's preferred format (GX, dbt, Soda, SQL)
3. Test is scoped to the specific failure mode discovered
4. User can customize and export to their test framework
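
The extract-then-render flow above might look like this. A hedged sketch: the mapping table, `DataQualityTest` fields, and renderer output are assumptions illustrating the shape, not the shipped API:

```python
# Sketch of codify: map a root-cause type to a framework-agnostic test,
# then render for dbt. Mapping and field names are illustrative assumptions.
from dataclasses import dataclass

ROOT_CAUSE_TO_TEST = {
    "null_spike": "not_null",     # the spec's example: null -> NOT_NULL
    "duplicate_keys": "unique",
}

@dataclass
class DataQualityTest:
    table: str
    column: str
    assertion: str
    description: str

def render_dbt(test):
    """Emit a dbt schema.yml fragment declaring the test on its column."""
    return (
        f"models:\n"
        f"  - name: {test.table}\n"
        f"    columns:\n"
        f"      - name: {test.column}\n"
        f"        tests:\n"
        f"          - {test.assertion}\n"
    )

t = DataQualityTest(
    table="orders",
    column="customer_id",
    assertion=ROOT_CAUSE_TO_TEST["null_spike"],
    description="Regression test from a solved null-spike investigation",
)
print(render_dbt(t))
```

The GX, Soda, and SQL renderers would consume the same `DataQualityTest` and differ only in output format.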

## Scope

- DataQualityTest abstract model (framework-agnostic)
- Test extraction from synthesis results
- Renderers for GX, dbt, Soda, SQL formats
- JupyterLab "Codify Test" widget button
- CLI `dataing codify` command
- Test adoption and effectiveness tracking

## Quick Commands

```bash
# Run test codify tests
uv run pytest python-packages/dataing/tests/unit/core/test_codify.py -v

# Run renderer tests
uv run pytest python-packages/dataing/tests/unit/renderers/ -v

# Type check
uv run mypy python-packages/dataing/src/dataing/core/codify.py
```

## Acceptance

- [ ] DataQualityTest model captures assertion, scope, threshold, description
- [ ] Extraction maps root cause types to test types (null → NOT_NULL, etc.)
- [ ] GX renderer outputs expectation suite JSON
- [ ] dbt renderer outputs schema.yml test definition
- [ ] Soda renderer outputs SodaCL check YAML
- [ ] SQL renderer outputs assertion query
- [ ] Widget shows format selector and copy/download buttons
- [ ] CLI `dataing codify <id> --format=dbt` works
- [ ] Test adoption tracked in analytics

## References

- Synthesis response: `python-packages/dataing/src/dataing/agents/models.py`
- CLI: `python-packages/dataing/src/dataing/entrypoints/cli/`
- Domain types: `python-packages/dataing/src/dataing/core/domain_types.py`
59 changes: 59 additions & 0 deletions .flow/specs/fn-52.md
@@ -0,0 +1,59 @@
# dataing blame - Git Blame for Data

## Overview

Provide "git blame for data"—show the history of changes affecting a table/column, linking to PRs, authors, and correlated anomalies.

## User Value

When paged at 3am, run one command to see what changed, who changed it, and when. Instant context for debugging.

## Competitor Gap

Nobody connects data state to code history in a single CLI command. This is the purest expression of "causal debugging."

## Approach

1. Scan git history to find commits that affected a specific table or column
2. Use dataset-to-repo mapping to know which files to scan
3. Correlate commits with anomaly history within ±24h window
4. Output rich CLI table with PR links and anomaly correlations
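
Step 3's correlation can be sketched with stdlib datetimes. The strength thresholds (2h/8h) are illustrative assumptions, not the shipped heuristics:

```python
# Sketch of commit-to-anomaly correlation within a ±24h window (step 3).
# Strength cutoffs are illustrative assumptions.
from datetime import datetime, timedelta

def correlate(commit_time, anomaly_times, window=timedelta(hours=24)):
    """Return (anomaly_time, strength) pairs within ±window of the commit."""
    matches = []
    for anomaly in anomaly_times:
        gap = abs(anomaly - commit_time)
        if gap > window:
            continue  # outside the ±24h correlation window
        if gap <= timedelta(hours=2):
            strength = "strong"
        elif gap <= timedelta(hours=8):
            strength = "medium"
        else:
            strength = "weak"
        matches.append((anomaly, strength))
    return matches

commit = datetime(2026, 1, 28, 3, 0)
anomalies = [
    datetime(2026, 1, 28, 4, 30),  # 1.5h after the commit
    datetime(2026, 1, 29, 6, 0),   # 27h after: outside the window
]
print(correlate(commit, anomalies))
```

Time proximity alone produces false positives, so the real scorer would likely also weight lineage distance between the changed file and the anomalous table.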

## Scope

- BlameResult and BlameEntry data models
- Git history scanner with file-to-table mapping
- Anomaly correlation with strength indicators
- Rich CLI output with colors and emoji
- Deep-dive mode for full commit details
- JSON output for scripting

## Quick Commands

```bash
# Run blame command tests
uv run pytest python-packages/dataing/tests/unit/cli/test_blame.py -v

# Run integration tests
uv run pytest python-packages/dataing/tests/integration/cli/test_blame_integration.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/entrypoints/cli/blame.py
```

## Acceptance

- [ ] `dataing blame <table>` shows table-level blame
- [ ] `dataing blame <table>.<column>` shows column-level blame
- [ ] `--since=<date>` filters to changes after date
- [ ] `--limit=N` limits to N most recent changes
- [ ] `--json` outputs structured JSON
- [ ] PR links are clickable in supported terminals
- [ ] Anomaly correlations shown with strength (strong/medium/weak)
- [ ] `--expand=<sha>` shows full commit details

## References

- Git integration: `python-packages/dataing/src/dataing/adapters/git/`
- CLI framework: `python-packages/dataing/src/dataing/entrypoints/cli/`
- Anomaly history: `python-packages/dataing/src/dataing/adapters/db/`
60 changes: 60 additions & 0 deletions .flow/specs/fn-53.md
@@ -0,0 +1,60 @@
# dataing sniff - Pre-Merge Impact Analysis

## Overview

Before merging a dbt/SQL change, predict which downstream tables will break.

## User Value

Shift-left data quality. Catch problems before they reach production. CI/CD integration makes Dataing un-churnable.

## Competitor Gap

Datafold does "data diff" but not anomaly prediction. Monte Carlo's "Circuit Breaker" is post-merge. Nobody predicts anomalies pre-merge.

## Approach

1. Parse code changes to understand what's being modified (tables, columns, transformations)
2. Trace lineage to find all downstream tables that could be affected
3. Predict which anomalies are likely based on change type and historical patterns
4. Output CI/CD-friendly format with exit codes
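
Step 1's change detection can be sketched as a SELECT-list diff. A toy, stdlib-only stand-in: the real parser would compare sqlglot ASTs rather than regex-split projections:

```python
# Toy sketch of change parsing (step 1): detect columns added/removed between
# two versions of a query. Naive regex stands in for the sqlglot AST diff.
import re

def select_columns(sql):
    """Extract projected column names from a simple 'SELECT a, b FROM t'."""
    m = re.search(r"select\s+(.*?)\s+from", sql, re.IGNORECASE | re.DOTALL)
    return {c.strip() for c in m.group(1).split(",")} if m else set()

def detect_changes(old_sql, new_sql):
    old, new = select_columns(old_sql), select_columns(new_sql)
    return {
        "columns_added": sorted(new - old),
        "columns_removed": sorted(old - new),
    }

changes = detect_changes(
    "SELECT id, amount FROM orders",
    "SELECT id, amount, discount FROM orders",
)
print(changes)  # {'columns_added': ['discount'], 'columns_removed': []}
```

Each detected change then seeds step 2: walk the lineage graph from the modified table to enumerate downstream candidates.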

## Scope

- ImpactPrediction model with affected tables, predicted anomalies
- Change parser for dbt/SQL using sqlglot
- Downstream impact analyzer using lineage
- Anomaly predictor with rule-based and historical patterns
- CLI with CI/CD output formats (GitHub Actions, GitLab CI)
- GitHub Action wrapper for easy integration

## Quick Commands

```bash
# Run sniff command tests
uv run pytest python-packages/dataing/tests/unit/cli/test_sniff.py -v

# Run change parser tests
uv run pytest python-packages/dataing/tests/unit/parsers/test_change_parser.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/entrypoints/cli/sniff.py
```

## Acceptance

- [ ] `dataing sniff` analyzes uncommitted changes
- [ ] `dataing sniff --ref=HEAD~1..HEAD` analyzes commit range
- [ ] `dataing sniff --pr=123` fetches PR diff from GitHub
- [ ] Change parser detects: column added/removed, join changed, filter modified
- [ ] Downstream impact shows affected tables with distance
- [ ] Anomaly predictor outputs likelihood and reasoning
- [ ] `--exit-code` returns 1 on high/critical risk
- [ ] `--format=github` outputs GitHub Actions annotations
- [ ] GitHub Action published to marketplace
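
For the `--format=github` item, GitHub Actions picks up annotations from workflow commands printed to stdout. A minimal sketch of that output shape (file path, line, and message are hypothetical examples):

```python
# GitHub Actions annotations are plain workflow commands on stdout, in the
# documented ::level file=...,line=...::message shape. Values are examples.
def github_annotation(level, path, line, message):
    return f"::{level} file={path},line={line}::{message}"

print(github_annotation(
    "warning", "models/orders.sql", 12,
    "predicted anomaly: null spike in downstream orders_daily",
))
```

Pairing these annotations with `--exit-code` lets a workflow both surface the prediction inline on the PR and fail the check on high/critical risk.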

## References

- Lineage API: `python-packages/dataing/src/dataing/adapters/lineage/`
- Git integration: `python-packages/dataing/src/dataing/adapters/git/`
- CLI framework: `python-packages/dataing/src/dataing/entrypoints/cli/`