
Epic 0.2: Notebook Investigation Experience #93

@bordumb


Goal: A data scientist can run investigations directly in Jupyter with rich output.
User Value: Stay in the notebook environment for debugging—no context switching.
Competitor Weakness: Monte Carlo has no notebook story; GX is validation-only; Soda is testing-only.


Task 0.2.1: Enhanced %dataing ask with Rich Output

Title: Upgrade notebook ask command with streaming rich widgets

Description:
The existing %dataing ask magic should be enhanced with proper Jupyter widgets for streaming output. Hypotheses, queries, and evidence should render as interactive collapsible sections. The final synthesis should render as formatted HTML.

Why: Notebooks are the natural home for data scientists. Rich output makes investigations feel native to the Jupyter experience.

Acceptance Criteria:

  • %dataing ask "<question>" streams investigation to output
  • Each hypothesis renders as collapsible accordion widget
  • SQL queries render with syntax highlighting (pygments)
  • Query results show as pandas DataFrames (truncated)
  • Evidence sections show support/refute badges
  • Final synthesis renders as styled HTML box
  • Confidence displayed as progress bar
  • Timeline visualization shows investigation flow
  • Output is reproducible—re-running cell shows cached result
  • %%dataing ask cell magic for multi-line questions

Key Design Notes:

  • Use ipywidgets for interactive elements
  • Fallback to plain text for non-widget environments (VS Code, etc.)
  • Cache investigation results in notebook metadata for reproducibility
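The feature-detection-and-fallback approach from the design notes can be sketched as follows. This is illustrative only; helper names like `render_hypothesis` are assumptions, not the actual %dataing internals:

```python
# Sketch: detect whether ipywidgets can render in this frontend, and fall
# back to plain text otherwise. Names here are hypothetical.

def widgets_available() -> bool:
    """Return True if ipywidgets is importable inside an IPython session."""
    try:
        import ipywidgets  # noqa: F401
        from IPython import get_ipython
        return get_ipython() is not None
    except ImportError:
        return False

def render_hypothesis(title: str, body: str) -> None:
    """Render a hypothesis as a collapsible accordion, or plain text."""
    if widgets_available():
        import ipywidgets as widgets
        from IPython.display import display
        accordion = widgets.Accordion(children=[widgets.HTML(body)])
        accordion.set_title(0, title)
        display(accordion)
    else:
        # Plain-text fallback for VS Code, nbconvert, terminal IPython, etc.
        print(f"== {title} ==\n{body}")
```

The same `widgets_available()` check would gate the progress bar, badges, and timeline widgets listed in the acceptance criteria.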

Key APIs:

  • Existing investigation APIs
  • SSE stream handling
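The SSE stream handling could follow a minimal parser like this sketch, which assumes the stream arrives as raw text lines; the real client would wrap the existing investigation endpoint:

```python
# Minimal Server-Sent Events parser: groups `event:`/`data:` lines into
# dicts, one per blank-line-terminated event. Illustrative, not the
# production client.
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'event': ..., 'data': ...} dicts from raw SSE lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":  # blank line terminates an event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
```

Each yielded event would then be routed to the matching widget (hypothesis accordion, query panel, synthesis box).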

Dependencies:

  • Existing notebook magic infrastructure

Risks + Mitigations:

  • Risk: Widget rendering varies across Jupyter environments → Mitigation: Feature detection, graceful fallback
  • Risk: Large DataFrames crash output → Mitigation: Always truncate, show row count
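The truncation mitigation might look like this sketch (the helper name and the use of `DataFrame.attrs` to carry the omitted-row count are assumptions):

```python
import pandas as pd

def truncate_preview(df: pd.DataFrame, max_rows: int = 20) -> pd.DataFrame:
    """Return a bounded head of df, recording how many rows were omitted."""
    preview = df.head(max_rows).copy()
    # Stash the omitted count so the renderer can show "... and N more rows".
    preview.attrs["omitted_rows"] = max(len(df) - max_rows, 0)
    return preview
```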

Effort: M (4-5 days)

Designation: OSS


Task 0.2.2: Notebook Lineage Visualization

Title: Interactive lineage graph in notebook cells

Description:
%dataing lineage should render an interactive lineage graph showing upstream and downstream dependencies of the current context. Clicking nodes should show dataset details. The graph should support pan/zoom.

Why: Understanding data flow is essential for debugging. An interactive graph in the notebook keeps engineers in their environment.

Acceptance Criteria:

  • %dataing lineage renders graph for current context
  • %dataing lineage <dataset> renders graph for specific dataset
  • --depth <n> controls traversal depth (default 2)
  • --direction upstream|downstream|both controls direction
  • Nodes show dataset name, type (table/view), and data source
  • Edges show job names and last run time
  • Clicking node shows panel with: schema, metrics, recent investigations
  • Jobs (transformations) shown as diamond nodes
  • Pan/zoom/reset controls
  • Export to PNG/SVG
  • Fallback to ASCII art when widgets unavailable

Key Design Notes:

  • Use pyvis or ipycytoscape for graph rendering
  • Layout algorithm: dagre for hierarchy
  • Color coding: tables=blue, views=green, current=highlighted
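The ASCII fallback mentioned in the risks below could be a simple indented tree over the lineage API's adjacency data. This sketch assumes the graph has been reduced to an upstream adjacency map; names are illustrative:

```python
# Sketch: render upstream lineage as indented ASCII, honoring --depth.
# `upstream` maps each dataset to its direct upstream dependencies.

def render_ascii(root: str, upstream: dict[str, list[str]], depth: int = 2) -> str:
    """Render upstream lineage of `root` as an indented tree."""
    lines = [root]

    def walk(node: str, level: int) -> None:
        if level > depth:  # stop at the requested traversal depth
            return
        for parent in upstream.get(node, []):
            lines.append("  " * level + "<- " + parent)
            walk(parent, level + 1)

    walk(root, 1)
    return "\n".join(lines)
```

A downstream or bidirectional traversal would mirror this with a downstream adjacency map, per the --direction flag.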

Key APIs:

  • GET /api/v1/lineage/graph (exists)
  • GET /api/v1/lineage/job/{id} (exists)

Dependencies:

  • Task 0.1.4 (context management)

Risks + Mitigations:

  • Risk: Large lineage graphs unreadable → Mitigation: Collapse distant nodes, expand on click
  • Risk: Different notebook environments → Mitigation: Multiple backends (pyvis, graphviz, ASCII)

Effort: M (4-5 days)

Designation: OSS


Task 0.2.3: Investigation History and Replay

Title: Investigation history browser in notebook

Description:
%dataing history should show past investigations with the ability to load and replay them. This enables comparing investigations over time and building institutional knowledge.

Why: Investigations are valuable artifacts. Being able to browse and replay them turns debugging sessions into reusable knowledge.

Acceptance Criteria:

  • %dataing history shows list of recent investigations
  • Each entry shows: dataset, goal, status, duration, root cause summary
  • %dataing history --dataset <id> filters to specific dataset
  • %dataing history --days <n> filters by recency
  • Clicking entry loads investigation details
  • %dataing replay <investigation_id> loads investigation into context
  • Replayed investigation shows all evidence and synthesis
  • Compare mode: %dataing compare <id1> <id2> shows diff
  • Pagination for large history

Key Design Notes:

  • Use existing GET /api/v1/investigations endpoint
  • History widget as selectable list
  • Compare mode highlights different hypotheses and findings
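The compare-mode highlighting could start from a set diff over the two investigations' hypotheses. This sketch assumes each investigation payload carries a `hypotheses` list of strings; the real API response shape may differ:

```python
def diff_hypotheses(inv_a: dict, inv_b: dict) -> dict:
    """Classify hypotheses as shared or unique to each investigation."""
    a = set(inv_a.get("hypotheses", []))
    b = set(inv_b.get("hypotheses", []))
    return {
        "shared": sorted(a & b),   # explored in both investigations
        "only_a": sorted(a - b),   # highlight as unique to <id1>
        "only_b": sorted(b - a),   # highlight as unique to <id2>
    }
```

The same shape of diff would apply to findings and evidence sections.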

Key APIs:

  • GET /api/v1/investigations (exists)
  • GET /api/v1/investigations/{id} (exists)

Dependencies:

  • Task 0.2.1 (rich output infrastructure)

Risks + Mitigations:

  • Risk: Large history slows load → Mitigation: Pagination, lazy loading

Effort: S (3 days)

Designation: OSS
