Merged
13 changes: 13 additions & 0 deletions .flow/epics/fn-49.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-49-ghost-state-hydrator",
"created_at": "2026-01-28T22:11:05.146952Z",
"depends_on_epics": [],
"id": "fn-49",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-49.md",
"status": "open",
"title": "Ghost State Hydrator - Remote Investigation State Hydration",
"updated_at": "2026-01-28T22:13:14.583012Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-50.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-50-fix-it-button",
"created_at": "2026-01-28T22:11:12.693638Z",
"depends_on_epics": [],
"id": "fn-50",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-50.md",
"status": "open",
"title": "Fix-It Button - Agent-Proposed Code/SQL Fixes",
"updated_at": "2026-01-28T22:13:14.787518Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-51.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-51-trace-to-test",
"created_at": "2026-01-28T22:11:12.881243Z",
"depends_on_epics": [],
"id": "fn-51",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-51.md",
"status": "open",
"title": "Trace-to-Test Codify - Generate Regression Tests from Investigations",
"updated_at": "2026-01-28T22:13:15.008304Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-52.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-52-dataing-blame",
"created_at": "2026-01-28T22:11:13.104974Z",
"depends_on_epics": [],
"id": "fn-52",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-52.md",
"status": "open",
"title": "dataing blame - Git Blame for Data",
"updated_at": "2026-01-28T22:13:15.284674Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-53.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-53-dataing-sniff",
"created_at": "2026-01-28T22:11:13.316451Z",
"depends_on_epics": [],
"id": "fn-53",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-53.md",
"status": "open",
"title": "dataing sniff - Pre-Merge Impact Analysis",
"updated_at": "2026-01-28T22:13:15.482151Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-54.json
@@ -0,0 +1,13 @@
{
"branch_name": "feat/fn-54-dataing-mock",
"created_at": "2026-01-28T22:11:13.504079Z",
"depends_on_epics": [],
"id": "fn-54",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-54.md",
"status": "open",
"title": "dataing mock - Local Mock Data Environments",
"updated_at": "2026-01-28T22:13:15.681845Z"
}
58 changes: 58 additions & 0 deletions .flow/specs/fn-49.md
@@ -0,0 +1,58 @@
# Ghost State Hydrator

## Overview

Enable engineers to pull a remote production investigation state into their local Jupyter notebook, fully hydrated with DataFrames, schema snapshots, and lineage context.

## User Value

"It works on my machine" becomes "I can reproduce the exact production state on my machine." The notebook becomes a forensic time machine.

## Competitor Gap

Nobody does this. Monte Carlo investigations live in their UI. GX has no investigation concept. Soda has no state capture.

## Approach

1. Bond agent serializes investigation state (DataFrames, schema, context) to cloud storage during workflow execution
2. JupyterLab widget fetches and deserializes state into notebook kernel
3. User gets populated variables matching the production failure moment
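
The serialized payload from step 1 could look roughly like this. A minimal sketch using stdlib dataclasses (the spec calls for a Pydantic model; field names like `parquet_uri` and `lineage_context` are illustrative assumptions, not the shipped schema):

```python
# Hypothetical shape of the InvestigationSnapshot payload. Stdlib dataclasses
# stand in for Pydantic here; all field names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TableArtifact:
    name: str         # variable name the DataFrame will hydrate into
    parquet_uri: str  # pointer into cloud storage, not inline bytes
    row_count: int

@dataclass
class InvestigationSnapshot:
    schema_version: int     # the spec's "version field": gates deserializer compat
    investigation_id: str
    checkpoint: str         # e.g. "start", "post_hypothesis", "complete"
    captured_at: str        # ISO-8601 UTC timestamp
    tables: list[TableArtifact] = field(default_factory=list)
    lineage_context: dict = field(default_factory=dict)

snap = InvestigationSnapshot(
    schema_version=1,
    investigation_id="inv_abc123",
    checkpoint="post_hypothesis",
    captured_at=datetime.now(timezone.utc).isoformat(),
    tables=[TableArtifact("orders_df", "s3://bucket/inv_abc123/orders.parquet", 10432)],
)
print(snap.checkpoint, len(snap.tables))
```

Storing URIs rather than inline DataFrames keeps snapshots small and lets the download API stream artifacts lazily.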

## Scope

- InvestigationSnapshot Pydantic model with version field
- Snapshot capture in Temporal workflow at key checkpoints
- Snapshot download API with streaming and auth
- SDK deserializer for Python objects
- JupyterLab "Hydrate" widget button and magic command
- Snapshot diff comparison

## Quick Commands

```bash
# Run unit tests for snapshot module
uv run pytest python-packages/dataing/tests/unit/core/test_snapshot.py -v

# Run SDK tests
uv run pytest python-packages/dataing/tests/unit/sdk/ -v

# Type check
uv run mypy python-packages/dataing/src/dataing/core/snapshot.py
```

## Acceptance

- [ ] InvestigationSnapshot model defined with all required fields
- [ ] Snapshots captured at investigation start, after hypothesis testing, on completion
- [ ] Snapshots stored to configurable backend (local, S3, GCS)
- [ ] Download API with streaming, gzip, auth, tenant isolation
- [ ] SDK `load_snapshot()` returns HydratedState with DataFrames
- [ ] JupyterLab widget shows investigations and "Hydrate" button
- [ ] Magic command `%dataing hydrate inv_abc123` works
- [ ] Snapshot diff comparison shows changes between checkpoints

## References

- Existing investigation workflow: `python-packages/dataing/src/dataing/temporal/workflows/investigation.py`
- Domain types: `python-packages/dataing/src/dataing/core/domain_types.py`
- JupyterLab extension: `frontend/` (React + TypeScript)
58 changes: 58 additions & 0 deletions .flow/specs/fn-50.md
@@ -0,0 +1,58 @@
# Fix-It Button

## Overview

After investigation identifies root cause, agent proposes a specific code/SQL fix that the user can apply with one click.

## User Value

Closes the loop from "here's the problem" to "here's the fix." No more copy-pasting SQL from ChatGPT. The agent that investigated also remediates.

## Competitor Gap

Every observability tool stops at diagnosis. None propose fixes. This makes Dataing an agent, not a dashboard.

## Approach

1. Agent generates fix proposal during synthesis phase
2. Fix proposals are typed (SQL DDL, SQL DML, dbt patch, Python patch)
3. Widget renders fix with "Apply" button
4. Apply executes in transaction with rollback on failure
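
Step 4 can be sketched as follows. This toy uses an in-memory SQLite database and assumed `FixProposal` fields; the real engine would run against the user's warehouse with the rollback window described in Acceptance:

```python
# Illustrative sketch of apply-in-transaction (step 4). FixProposal fields
# and the SQLite target are demo assumptions, not the shipped implementation.
import sqlite3
from dataclasses import dataclass

@dataclass
class FixProposal:
    fix_type: str       # "sql_ddl" | "sql_dml" | "dbt_patch" | "python_patch"
    code: str
    risks: list
    rollback_sql: str

def apply_fix(conn, fix):
    """Execute the fix inside a transaction; roll back on any failure."""
    try:
        with conn:  # sqlite3 connection ctx manager: commit on success, rollback on error
            conn.execute(fix.code)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, -5.0)")

fix = FixProposal(
    fix_type="sql_dml",
    code="UPDATE orders SET amount = 0 WHERE amount < 0",
    risks=["overwrites negative amounts"],
    rollback_sql="-- restore from pre-fix snapshot",
)
print(apply_fix(conn, fix))  # True
```

A failed statement never partially commits, which is the property the "transaction safety" scope item is after.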

## Scope

- FixProposal Pydantic model with fix_type, code, risks, rollback
- Synthesis agent extended to generate fixes when confidence > 0.7
- SQL fix validator using sqlglot
- Fix preview widget with syntax highlighting
- Fix execution engine with transaction safety
- Feedback loop for fix effectiveness tracking

## Quick Commands

```bash
# Run fix proposal tests
uv run pytest python-packages/dataing/tests/unit/agents/test_fix_proposal.py -v

# Run validator tests
uv run pytest python-packages/dataing/tests/unit/safety/test_fix_validator.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/agents/models.py
```

## Acceptance

- [ ] FixProposal model supports sql_ddl, sql_dml, dbt_patch, python_patch, manual_instruction
- [ ] Synthesis prompt generates fixes when confidence > 0.7
- [ ] Fixes validated for syntax and safety (no DROP TABLE, etc.)
- [ ] Widget shows fix with syntax highlighting and risk warnings
- [ ] "Apply" button executes fix in transaction
- [ ] Rollback available for 5 minutes after apply
- [ ] Feedback collected on fix effectiveness

## References

- Synthesis agent: `python-packages/dataing/src/dataing/agents/prompts/synthesis.py`
- SQL validation: `python-packages/dataing/src/dataing/safety/`
- Agent models: `python-packages/dataing/src/dataing/agents/models.py`
60 changes: 60 additions & 0 deletions .flow/specs/fn-51.md
@@ -0,0 +1,60 @@
# Trace-to-Test Codify

## Overview

Generate a regression test from a solved investigation that can be added to CI/CD to prevent recurrence.

## User Value

Every bug fixed becomes a permanent test. Institutional knowledge compounds. The platform gets stickier and the data gets safer over time.

## Competitor Gap

GX has tests but you write them manually. Soda has checks but no auto-generation. Nobody turns investigations into tests.

## Approach

1. Investigation synthesis includes test-relevant assertions
2. "Codify" action generates test in user's preferred format (GX, dbt, Soda, SQL)
3. Test is scoped to the specific failure mode discovered
4. User can customize and export to their test framework
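
The extract-then-render flow above might look like this. A hedged sketch: the mapping table, `DataQualityTest` fields, and renderer output are assumptions illustrating the shape, not the shipped API:

```python
# Sketch of codify: map a root-cause type to a framework-agnostic test,
# then render for dbt. Mapping and field names are illustrative assumptions.
from dataclasses import dataclass

ROOT_CAUSE_TO_TEST = {
    "null_spike": "not_null",     # the spec's example: null -> NOT_NULL
    "duplicate_keys": "unique",
}

@dataclass
class DataQualityTest:
    table: str
    column: str
    assertion: str
    description: str

def render_dbt(test):
    """Emit a dbt schema.yml fragment declaring the test on its column."""
    return (
        f"models:\n"
        f"  - name: {test.table}\n"
        f"    columns:\n"
        f"      - name: {test.column}\n"
        f"        tests:\n"
        f"          - {test.assertion}\n"
    )

t = DataQualityTest(
    table="orders",
    column="customer_id",
    assertion=ROOT_CAUSE_TO_TEST["null_spike"],
    description="Regression test from a solved null-spike investigation",
)
print(render_dbt(t))
```

The GX, Soda, and SQL renderers would consume the same `DataQualityTest` and differ only in output format.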

## Scope

- DataQualityTest abstract model (framework-agnostic)
- Test extraction from synthesis results
- Renderers for GX, dbt, Soda, SQL formats
- JupyterLab "Codify Test" widget button
- CLI `dataing codify` command
- Test adoption and effectiveness tracking

## Quick Commands

```bash
# Run test codify tests
uv run pytest python-packages/dataing/tests/unit/core/test_codify.py -v

# Run renderer tests
uv run pytest python-packages/dataing/tests/unit/renderers/ -v

# Type check
uv run mypy python-packages/dataing/src/dataing/core/codify.py
```

## Acceptance

- [ ] DataQualityTest model captures assertion, scope, threshold, description
- [ ] Extraction maps root cause types to test types (null → NOT_NULL, etc.)
- [ ] GX renderer outputs expectation suite JSON
- [ ] dbt renderer outputs schema.yml test definition
- [ ] Soda renderer outputs SodaCL check YAML
- [ ] SQL renderer outputs assertion query
- [ ] Widget shows format selector and copy/download buttons
- [ ] CLI `dataing codify <id> --format=dbt` works
- [ ] Test adoption tracked in analytics

## References

- Synthesis response: `python-packages/dataing/src/dataing/agents/models.py`
- CLI: `python-packages/dataing/src/dataing/entrypoints/cli/`
- Domain types: `python-packages/dataing/src/dataing/core/domain_types.py`
59 changes: 59 additions & 0 deletions .flow/specs/fn-52.md
@@ -0,0 +1,59 @@
# dataing blame - Git Blame for Data

## Overview

Provide "git blame for data"—show the history of changes affecting a table/column, linking to PRs, authors, and correlated anomalies.

## User Value

When paged at 3am, run one command to see what changed, who changed it, and when. Instant context for debugging.

## Competitor Gap

Nobody connects data state to code history in a single CLI command. This is the purest expression of "causal debugging."

## Approach

1. Scan git history to find commits that affected a specific table or column
2. Use dataset-to-repo mapping to know which files to scan
3. Correlate commits with anomaly history within ±24h window
4. Output rich CLI table with PR links and anomaly correlations
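
Step 3's correlation can be sketched with stdlib datetimes. The strength thresholds (2h/8h) are illustrative assumptions, not the shipped heuristics:

```python
# Sketch of commit-to-anomaly correlation within a ±24h window (step 3).
# Strength cutoffs are illustrative assumptions.
from datetime import datetime, timedelta

def correlate(commit_time, anomaly_times, window=timedelta(hours=24)):
    """Return (anomaly_time, strength) pairs within ±window of the commit."""
    matches = []
    for anomaly in anomaly_times:
        gap = abs(anomaly - commit_time)
        if gap > window:
            continue  # outside the ±24h correlation window
        if gap <= timedelta(hours=2):
            strength = "strong"
        elif gap <= timedelta(hours=8):
            strength = "medium"
        else:
            strength = "weak"
        matches.append((anomaly, strength))
    return matches

commit = datetime(2026, 1, 28, 3, 0)
anomalies = [
    datetime(2026, 1, 28, 4, 30),  # 1.5h after the commit
    datetime(2026, 1, 29, 6, 0),   # 27h after: outside the window
]
print(correlate(commit, anomalies))
```

Time proximity alone produces false positives, so the real scorer would likely also weight lineage distance between the changed file and the anomalous table.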

## Scope

- BlameResult and BlameEntry data models
- Git history scanner with file-to-table mapping
- Anomaly correlation with strength indicators
- Rich CLI output with colors and emoji
- Deep-dive mode for full commit details
- JSON output for scripting

## Quick Commands

```bash
# Run blame command tests
uv run pytest python-packages/dataing/tests/unit/cli/test_blame.py -v

# Run integration tests
uv run pytest python-packages/dataing/tests/integration/cli/test_blame_integration.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/entrypoints/cli/blame.py
```

## Acceptance

- [ ] `dataing blame <table>` shows table-level blame
- [ ] `dataing blame <table>.<column>` shows column-level blame
- [ ] `--since=<date>` filters to changes after date
- [ ] `--limit=N` limits to N most recent changes
- [ ] `--json` outputs structured JSON
- [ ] PR links are clickable in supported terminals
- [ ] Anomaly correlations shown with strength (strong/medium/weak)
- [ ] `--expand=<sha>` shows full commit details

## References

- Git integration: `python-packages/dataing/src/dataing/adapters/git/`
- CLI framework: `python-packages/dataing/src/dataing/entrypoints/cli/`
- Anomaly history: `python-packages/dataing/src/dataing/adapters/db/`
60 changes: 60 additions & 0 deletions .flow/specs/fn-53.md
@@ -0,0 +1,60 @@
# dataing sniff - Pre-Merge Impact Analysis

## Overview

Before merging a dbt/SQL change, predict which downstream tables will break.

## User Value

Shift-left data quality. Catch problems before they reach production. CI/CD integration makes Dataing un-churnable.

## Competitor Gap

Datafold does "data diff" but not anomaly prediction. Monte Carlo's "Circuit Breaker" is post-merge. Nobody predicts anomalies pre-merge.

## Approach

1. Parse code changes to understand what's being modified (tables, columns, transformations)
2. Trace lineage to find all downstream tables that could be affected
3. Predict which anomalies are likely based on change type and historical patterns
4. Output CI/CD-friendly format with exit codes
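
Step 1's change detection can be sketched as a SELECT-list diff. A toy, stdlib-only stand-in: the real parser would compare sqlglot ASTs rather than regex-split projections:

```python
# Toy sketch of change parsing (step 1): detect columns added/removed between
# two versions of a query. Naive regex stands in for the sqlglot AST diff.
import re

def select_columns(sql):
    """Extract projected column names from a simple 'SELECT a, b FROM t'."""
    m = re.search(r"select\s+(.*?)\s+from", sql, re.IGNORECASE | re.DOTALL)
    return {c.strip() for c in m.group(1).split(",")} if m else set()

def detect_changes(old_sql, new_sql):
    old, new = select_columns(old_sql), select_columns(new_sql)
    return {
        "columns_added": sorted(new - old),
        "columns_removed": sorted(old - new),
    }

changes = detect_changes(
    "SELECT id, amount FROM orders",
    "SELECT id, amount, discount FROM orders",
)
print(changes)  # {'columns_added': ['discount'], 'columns_removed': []}
```

Each detected change then seeds step 2: walk the lineage graph from the modified table to enumerate downstream candidates.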

## Scope

- ImpactPrediction model with affected tables, predicted anomalies
- Change parser for dbt/SQL using sqlglot
- Downstream impact analyzer using lineage
- Anomaly predictor with rule-based and historical patterns
- CLI with CI/CD output formats (GitHub Actions, GitLab CI)
- GitHub Action wrapper for easy integration

## Quick Commands

```bash
# Run sniff command tests
uv run pytest python-packages/dataing/tests/unit/cli/test_sniff.py -v

# Run change parser tests
uv run pytest python-packages/dataing/tests/unit/parsers/test_change_parser.py -v

# Type check
uv run mypy python-packages/dataing/src/dataing/entrypoints/cli/sniff.py
```

## Acceptance

- [ ] `dataing sniff` analyzes uncommitted changes
- [ ] `dataing sniff --ref=HEAD~1..HEAD` analyzes commit range
- [ ] `dataing sniff --pr=123` fetches PR diff from GitHub
- [ ] Change parser detects: column added/removed, join changed, filter modified
- [ ] Downstream impact shows affected tables with distance
- [ ] Anomaly predictor outputs likelihood and reasoning
- [ ] `--exit-code` returns 1 on high/critical risk
- [ ] `--format=github` outputs GitHub Actions annotations
- [ ] GitHub Action published to marketplace
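
For the `--format=github` item, GitHub Actions picks up annotations from workflow commands printed to stdout. A minimal sketch of that output shape (file path, line, and message are hypothetical examples):

```python
# GitHub Actions annotations are plain workflow commands on stdout, in the
# documented ::level file=...,line=...::message shape. Values are examples.
def github_annotation(level, path, line, message):
    return f"::{level} file={path},line={line}::{message}"

print(github_annotation(
    "warning", "models/orders.sql", 12,
    "predicted anomaly: null spike in downstream orders_daily",
))
```

Pairing these annotations with `--exit-code` lets a workflow both surface the prediction inline on the PR and fail the check on high/critical risk.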

## References

- Lineage API: `python-packages/dataing/src/dataing/adapters/lineage/`
- Git integration: `python-packages/dataing/src/dataing/adapters/git/`
- CLI framework: `python-packages/dataing/src/dataing/entrypoints/cli/`