Epic 0.2: Notebook Investigation Experience
Goal: A data scientist can run investigations directly in Jupyter with rich output.
User Value: Stay in the notebook environment for debugging—no context switching.
Competitor Weakness: Monte Carlo has no notebook story; GX is validation-only; Soda is testing-only.
Task 0.2.1: Enhanced %dataing ask with Rich Output
Title: Upgrade notebook ask command with streaming rich widgets
Description:
The existing `%dataing ask` magic should be enhanced with proper Jupyter widgets for streaming output. Hypotheses, queries, and evidence should render as interactive collapsible sections. The final synthesis should display with rich formatting.
Why: Notebooks are the natural home for data scientists. Rich output makes investigations feel native to the Jupyter experience.
Acceptance Criteria:
- `%dataing ask "<question>"` streams investigation to output
- Each hypothesis renders as a collapsible accordion widget
- SQL queries render with syntax highlighting (pygments)
- Query results show as pandas DataFrames (truncated)
- Evidence sections show support/refute badges
- Final synthesis renders as a styled HTML box
- Confidence displayed as a progress bar
- Timeline visualization shows investigation flow
- Output is reproducible: re-running the cell shows the cached result
- `%%dataing ask` cell magic for multi-line questions
Key Design Notes:
- Use `ipywidgets` for interactive elements
- Fall back to plain text for non-widget environments (VS Code, etc.)
- Cache investigation results in notebook metadata for reproducibility
Key APIs:
- Existing investigation APIs
- SSE stream handling
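Since the magic consumes the investigation as an SSE stream, handling could look like this minimal parser for the `text/event-stream` wire format; the `event:`/`data:` field syntax comes from the SSE spec, while the event names in the test are assumptions:

```python
def parse_sse_events(lines):
    """Yield (event, data) pairs from raw text/event-stream lines.

    Minimal sketch: ignores `id:`/`retry:` fields and comment lines.
    """
    event, data = "message", []
    for line in lines:
        if line == "":  # blank line dispatches the pending event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:  # flush a final event with no trailing blank line
        yield event, "\n".join(data)
```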
Dependencies:
- Existing notebook magic infrastructure
Risks + Mitigations:
- Risk: Widget rendering varies across Jupyter environments → Mitigation: Feature detection, graceful fallback
- Risk: Large DataFrames crash output → Mitigation: Always truncate, show row count
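The truncation mitigation can be as simple as a hard row cap plus a caption reporting the full row count; the cap value below is an assumption, not a documented default:

```python
import pandas as pd

MAX_DISPLAY_ROWS = 20  # illustrative cap, not a documented default


def truncate_result(df: pd.DataFrame, max_rows: int = MAX_DISPLAY_ROWS):
    """Return (display_df, caption) so large query results never crash output."""
    total = len(df)
    if total <= max_rows:
        return df, f"{total} rows"
    return df.head(max_rows), f"showing {max_rows} of {total} rows"
```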
Effort: M (4-5 days)
Designation: OSS
Task 0.2.2: Notebook Lineage Visualization
Title: Interactive lineage graph in notebook cells
Description:
`%dataing lineage` should render an interactive lineage graph showing upstream and downstream dependencies of the current context. Clicking nodes should show dataset details. The graph should support pan/zoom.
Why: Understanding data flow is essential for debugging. An interactive graph in the notebook keeps engineers in their environment.
Acceptance Criteria:
- `%dataing lineage` renders graph for current context
- `%dataing lineage <dataset>` renders graph for specific dataset
- `--depth <n>` controls traversal depth (default 2)
- `--direction upstream|downstream|both` controls direction
- Nodes show dataset name, type (table/view), and data source
- Edges show job names and last run time
- Clicking a node shows a panel with: schema, metrics, recent investigations
- Jobs (transformations) shown as diamond nodes
- Pan/zoom/reset controls
- Export to PNG/SVG
- Fallback to ASCII art when widgets unavailable
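The `--depth`/`--direction` semantics above amount to a bounded BFS over the lineage graph. A sketch, assuming adjacency in an `{"upstream": [...], "downstream": [...]}` shape (that shape is an assumption, not the actual API payload):

```python
from collections import deque


def traverse_lineage(graph, start, depth=2, direction="both"):
    """Collect node names within `depth` hops of `start`.

    `graph` maps node -> {"upstream": [...], "downstream": [...]};
    this adjacency shape is assumed, not the real API payload.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    keys = ["upstream", "downstream"] if direction == "both" else [direction]
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # stop expanding past the requested depth
        for key in keys:
            for nxt in graph.get(node, {}).get(key, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
    return seen
```

The same traversal output can feed any of the rendering backends, which keeps the depth/direction logic independent of pyvis vs. ASCII.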
Key Design Notes:
- Use `pyvis` or `ipycytoscape` for graph rendering
- Layout algorithm: dagre for hierarchy
- Color coding: tables=blue, views=green, current=highlighted
Key APIs:
- `GET /api/v1/lineage/graph` (exists)
- `GET /api/v1/lineage/job/{id}` (exists)
Dependencies:
- Task 0.1.4 (context management)
Risks + Mitigations:
- Risk: Large lineage graphs unreadable → Mitigation: Collapse distant nodes, expand on click
- Risk: Different notebook environments → Mitigation: Multiple backends (pyvis, graphviz, ASCII)
Effort: M (4-5 days)
Designation: OSS
Task 0.2.3: Investigation History and Replay
Title: Investigation history browser in notebook
Description:
`%dataing history` should show past investigations with the ability to load and replay them. This enables comparing investigations over time and building institutional knowledge.
Why: Investigations are valuable artifacts. Being able to browse and replay them turns debugging sessions into reusable knowledge.
Acceptance Criteria:
- `%dataing history` shows list of recent investigations
- Each entry shows: dataset, goal, status, duration, root cause summary
- `%dataing history --dataset <id>` filters to specific dataset
- `%dataing history --days <n>` filters by recency
- Clicking an entry loads investigation details
- `%dataing replay <investigation_id>` loads investigation into context
- Replayed investigation shows all evidence and synthesis
- Compare mode: `%dataing compare <id1> <id2>` shows diff
- Pagination for large history
Key Design Notes:
- Use existing `GET /api/v1/investigations` endpoint
- History widget as selectable list
- Compare mode highlights differing hypotheses and findings
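Compare mode reduces to a set diff over the two investigations' hypotheses. A sketch, assuming each investigation is a dict carrying a `hypotheses` list of `{"title": ...}` entries (an assumed payload shape, not the documented one):

```python
def compare_investigations(inv_a, inv_b):
    """Diff the hypothesis sets of two investigations (payload shape assumed)."""
    titles_a = {h["title"] for h in inv_a.get("hypotheses", [])}
    titles_b = {h["title"] for h in inv_b.get("hypotheses", [])}
    return {
        "only_a": sorted(titles_a - titles_b),
        "only_b": sorted(titles_b - titles_a),
        "shared": sorted(titles_a & titles_b),
    }
```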
Key APIs:
- `GET /api/v1/investigations` (exists)
- `GET /api/v1/investigations/{id}` (exists)
Dependencies:
- Task 0.2.1 (rich output infrastructure)
Risks + Mitigations:
- Risk: Large history slows load → Mitigation: Pagination, lazy loading
Effort: S (3 days)
Designation: OSS