bordumb · Feb 2, 2026
@@ -0,0 +1,62 @@
+# Git
+.git
+.gitignore
+
+# Python virtual environments (including nested)
+**/.venv
+**/venv
+**/*.egg-info
+**/dist
+**/build
+
+# Python cache (including nested)
+**/__pycache__
+**/*.pyc
+**/*.pyo
+**/.pytest_cache
+**/.mypy_cache
+**/.ruff_cache
+**/.coverage
+**/htmlcov
+
+# Node (including nested)
+**/node_modules
+**/.npm
+**/.pnpm-store
+
+# IDE
+.idea
+.vscode
+*.swp
+*.swo
+
+# Local config
+.env
+.env.*
+!.env.example
+
+# Flow tracking
+.flow
+
+# Demo fixtures (large)
+demo/fixtures
+
+# Documentation build
+site
+docs/_build
+
+# Test artifacts
+.hypothesis
+
+# OS files
+.DS_Store
+Thumbs.db
+
+# Logs
+*.log
+logs
+
+# Temporary
+tmp
+temp
+*.tmp
@@ -0,0 +1,13 @@
+{
+  "branch_name": "fn-56",
+  "created_at": "2026-02-02T22:00:58.309446Z",
+  "depends_on_epics": [],
+  "id": "fn-56",
+  "next_task": 1,
+  "plan_review_status": "unknown",
+  "plan_reviewed_at": null,
+  "spec_path": ".flow/specs/fn-56.md",
+  "status": "open",
+  "title": "Self-Debugging Chat Widget (Dogfooding)",
+  "updated_at": "2026-02-02T22:43:29.882512Z"
+}
@@ -0,0 +1,278 @@
+# Dataing Assistant (fn-56)
+
+A unified AI assistant for Dataing that handles infrastructure debugging, data questions, and investigation support.
+
+## Overview
+
+**Problem**: Users need help with a range of Dataing tasks: debugging infrastructure issues, understanding data quality problems, querying connected datasources, and getting context on investigations. Currently they must switch to external tools or ask another person for help.
+
+**Solution**: Persistent chat widget ("Dataing Assistant") that provides a unified AI assistant with access to:
+- Local files, configs, and git history
+- Docker container status and logs
+- Connected datasources (reusing existing query tools)
+- Investigation context and findings
+- User's recent activity for contextual suggestions
+
+## Key Decisions (from interview)
+
+### Agent Configuration
+- **LLM Model**: Claude Sonnet (fast, cost-effective)
+- **Response time target**: First token under 3 seconds
+- **Agent focus**: Balanced - explain root cause AND provide fix steps with code snippets
+- **Out-of-scope handling**: Polite decline, redirect to docs
+- **Tone**: Match existing Dataing UI voice
+
+### Tools & Capabilities (Priority Order)
+
+1. **File Access**
+   - Read any UTF-8 text file in allowlisted directories
+   - Smart chunking: request specific line ranges
+   - Grep-like search across files (max 100 results)
+   - Include logs, data samples (CSV/parquet first N rows)
+   - Centralized parsers in `core/parsing/` organized by file type
+
+2. **Git Access**
+   - Full read access via githunter tools
+   - `blame_line`, `find_pr_discussion`, `get_file_experts`
+   - Recent commits, branches, diffs
+
+3. **Docker Access**
+   - Container status via Docker API
+   - Log reading via pluggable LogProvider interface
+   - Auth: Configurable per deployment (socket, TCP+TLS, env auto-detect)
+
+4. **Log Providers** (pluggable interface)
+   - LocalFileLogProvider
+   - DockerLogProvider
+   - CloudWatchLogProvider (IAM role auth)
+
+5. **Datasource Access**
+   - Reuse existing query tools from investigation agents
+   - Full read access to connected datasources
+   - Unified tool registry for all capabilities
+
+6. **Environment Access**
+   - Read non-sensitive env vars (filter `*SECRET*`, `*KEY*`, `*PASSWORD*`, `*TOKEN*`)
+   - Compare current config with .env.example defaults
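The environment-access rules above could be sketched roughly as follows. This is a minimal illustration, not the implementation in this PR: the function names (`is_sensitive`, `filter_env`, `parse_env_example`, `diff_from_defaults`) are hypothetical, matching is assumed to be case-insensitive, and `.env.example` is assumed to contain plain `KEY=VALUE` lines.

```python
import fnmatch

# Patterns from the spec. Case-insensitive matching is an assumption.
SENSITIVE_PATTERNS = ("*SECRET*", "*KEY*", "*PASSWORD*", "*TOKEN*")


def is_sensitive(name: str) -> bool:
    """Return True if the variable name matches any sensitive pattern."""
    upper = name.upper()
    return any(fnmatch.fnmatchcase(upper, p) for p in SENSITIVE_PATTERNS)


def filter_env(env: dict[str, str]) -> dict[str, str]:
    """Drop variables whose names look sensitive."""
    return {k: v for k, v in env.items() if not is_sensitive(k)}


def parse_env_example(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    defaults: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        defaults[key.strip()] = value.strip()
    return defaults


def diff_from_defaults(
    env: dict[str, str], defaults: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each non-sensitive key to (default, current) where they differ."""
    visible = filter_env(env)
    return {
        key: (defaults[key], visible[key])
        for key in defaults
        if key in visible and visible[key] != defaults[key]
    }
```

Filtering before the comparison means a sensitive value never reaches the diff, even when `.env.example` lists the same key with a placeholder.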
+
+### Security
+
+- **Path canonicalization** before allowlist check (prevent traversal)
+- **Blocked patterns**: `.env`, `*.pem`, `*.key`, `*secret*`, `*credential*`
+- **Security-blocked errors**: Suggest alternatives ("Can't read .env, but can check .env.example")
+- **Security findings**: Alert immediately if exposed secrets are discovered
+- **Audit log**: Full log of every file read, search, and tool call
+- **Tool indicators**: Show detailed progress ("Reading docker-compose.yml...")
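One way to read "path canonicalization before allowlist check" is sketched below. The names `resolve_allowed` and `is_blocked` are hypothetical, and the sketch assumes blocked patterns are matched against the file name only; the actual tool in `local_files.py` may differ.

```python
import fnmatch
from pathlib import Path

# Blocked patterns from the spec, matched against the lowercased file name.
BLOCKED_PATTERNS = (".env", "*.pem", "*.key", "*secret*", "*credential*")


def is_blocked(path: Path) -> bool:
    """Return True if the file name matches a blocked pattern."""
    name = path.name.lower()
    return any(fnmatch.fnmatchcase(name, p) for p in BLOCKED_PATTERNS)


def resolve_allowed(requested: str, allowed_roots: list[Path]) -> Path:
    """Canonicalize first, then check the allowlist and the blocked patterns."""
    # resolve() collapses ".." segments and follows symlinks, so the
    # allowlist check runs against the real location on disk.
    resolved = Path(requested).resolve()
    if not any(resolved.is_relative_to(root.resolve()) for root in allowed_roots):
        raise PermissionError(f"{requested} is outside the allowlisted directories")
    if is_blocked(resolved):
        raise PermissionError(f"{resolved.name} matches a blocked pattern")
    return resolved
```

Checking the allowlist against the raw string instead would let a request such as `project/../outside.txt` pass a naive prefix test, which is why the order of the two steps matters.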
+
+### Data Model
+
+**Debug chats are investigations** with parent/child relationships:
+- Each chat session gets its own `investigation_id`
+- Can be linked to existing investigations as parent OR child
+- Child chats have full access to parent investigation context
+- DebugChatSession model with FK to Investigation when linked
+
+**Storage**: Hybrid Redis/Postgres
+- Recent sessions in Redis for fast access
+- Old sessions archived to Postgres
+- Retention: Configurable per tenant
+
+**Schema migration**: Add to existing migrations (013_dataing_assistant.sql)
+
+### User Experience
+
+- **Visibility**: All authenticated users (no restriction)
+- **Widget position**: Fixed bottom-20 right-4 (above DemoToggle)
+- **Panel width**: Resizable; the size is remembered as a per-user preference
+- **Keyboard shortcut**: None for MVP
+- **Markdown**: Full rendering (headers, lists, code blocks, links, tables)
+
+**Chat behavior**:
+- Smart placeholder text with example questions
+- Permanent history with session list (new sessions start fresh, can reopen old)
+- Minimize to button (badge shows unread), preserves state
+- Collapsible sections for long responses
+- Copy code button always visible on code blocks
+- Edit and resubmit previous messages
+
+**Streaming & errors**:
+- Queue messages if the user sends one while a response is streaming
+- Auto-retry 3x on errors before showing error
+- Offline: Retry with exponential backoff + "Reconnecting..." indicator
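The retry behaviour above can be illustrated with a small sketch. It is written in Python for consistency with the backend, although the same logic could equally live in the frontend hook. The helper name `retry_with_backoff`, the delay values, and the jitter are assumptions; the spec only fixes the count of three retries and the use of exponential backoff.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    retries: int = 3,          # "Auto-retry 3x" from the spec
    base_delay: float = 0.5,   # assumed starting delay in seconds
    max_delay: float = 8.0,    # assumed cap on the delay
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run operation, retrying failures with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception:
            if attempt >= retries:
                # Retries exhausted: surface the error to the caller/UI.
                raise
            delay = min(max_delay, base_delay * 2**attempt)
            # Small random jitter so many clients do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay / 10))
            attempt += 1
```

With the defaults, a request is attempted at most four times (one initial call plus three retries) before the error is shown.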
+
+### Concurrency & Limits
+
+- **Message queueing**: Complete current response, then process next
+- **Context limit**: Token-based, summarize when approaching model limit
+- **Rate limiting**: Admin-set token budget per tenant
+- **Limit exceeded**: Soft block with override for urgent issues
+- **Usage display**: Always visible ("X of Y tokens used this month")
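A rough sketch of the soft-block rule and the usage label follows. `TenantBudget` and `BudgetDecision` are hypothetical names, and the sketch assumes the budget is a single monthly token count per tenant. How an override is authorized is left out.

```python
from dataclasses import dataclass
from enum import Enum


class BudgetDecision(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "soft_block"


@dataclass
class TenantBudget:
    """Admin-set monthly token budget for one tenant (illustrative only)."""

    monthly_limit: int
    used: int = 0

    def record(self, tokens: int) -> None:
        """Add the tokens consumed by a completed request."""
        self.used += tokens

    def check(self, override: bool = False) -> BudgetDecision:
        """Soft block once the budget is spent, unless the user overrides."""
        if self.used < self.monthly_limit or override:
            return BudgetDecision.ALLOW
        return BudgetDecision.SOFT_BLOCK

    def usage_label(self) -> str:
        """Text for the always-visible usage display."""
        return f"{self.used:,} of {self.monthly_limit:,} tokens used this month"
```

For example, a tenant with a limit of 1,000 tokens that has used 1,100 gets `SOFT_BLOCK` from `check()` and `ALLOW` from `check(override=True)`.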
+
+### Context & Memory
+
+- **User context**: Full access to recent investigations, alerts, queries
+- **Memory integration**: User confirms "This was helpful" to save to agent memory (fn-55)
+- **Multi-tenancy**: Tenant isolation; each tenant gets an isolated agent instance
+
+### Export
+
+- **Formats**: Both JSON and Markdown export
+- **Sharing**: No sharing for MVP (export and send manually)
+
+### Testing & Telemetry
+
+- **Testing**: Unit tests with mocked LLM
+- **Dry run**: No special mode, use real APIs in test environment
+- **Telemetry**: Full integration with existing Dataing telemetry
+- **Metrics**: Defer to later (analyze datasets first)
+- **Analytics**: No query tracking (privacy-first)
+
+## Architecture
+
+### Backend Components
+
+```
+dataing/
+  agents/
+    assistant.py              # DataingAssistant (was SelfDebugAgent)
+    tools/
+      registry.py             # Unified tool registry
+      local_files.py          # File reading with safety
+      docker.py               # Docker API access
+      log_providers/
+        __init__.py           # LogProvider protocol
+        local.py              # LocalFileLogProvider
+        docker.py             # DockerLogProvider
+        cloudwatch.py         # CloudWatchLogProvider
+  core/
+    parsing/                  # Centralized file parsers
+      yaml_parser.py
+      json_parser.py
+      text_parser.py
+      log_parser.py
+      data_parser.py          # CSV, parquet sampling
+  entrypoints/api/routes/
+    assistant.py              # API routes (was debug_chat.py)
+  models/
+    assistant.py              # DebugChatSession, DebugChatMessage
+```
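The pluggable `LogProvider` interface in `log_providers/__init__.py` might look something like the sketch below. The method name `read_logs`, its parameters, and the constructor of `LocalFileLogProvider` are assumptions made for illustration; the spec names the classes but not their signatures.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class LogProvider(Protocol):
    """Anything that can return the last lines of a named log source."""

    def read_logs(self, source: str, tail: int = 100) -> list[str]: ...


class LocalFileLogProvider:
    """Reads logs from files under a single root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def read_logs(self, source: str, tail: int = 100) -> list[str]:
        path = self.root / source
        # errors="replace" keeps the read from failing on stray non-UTF-8 bytes.
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        return lines[-tail:] if tail > 0 else []
```

Because `Protocol` uses structural typing, a Docker or CloudWatch provider would only need to expose the same method and would not have to inherit from a shared base class. A real provider would also pass `source` through the path allowlist before reading.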
+
+### Frontend Components
+
+```
+features/assistant/
+  index.ts
+  AssistantWidget.tsx         # Floating button + resizable panel
+  AssistantPanel.tsx          # Chat interface
+  AssistantMessage.tsx        # Message with collapsible sections
+  useAssistant.ts             # State management hook
+  SessionList.tsx             # Previous session selector
+```
+
+### Database Schema
+
+```sql
+-- 013_dataing_assistant.sql
+
+CREATE TABLE assistant_sessions (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    investigation_id UUID NOT NULL,  -- Each session IS an investigation
+    tenant_id UUID NOT NULL,
+    user_id UUID NOT NULL,
+    parent_investigation_id UUID REFERENCES investigations(id),
+    is_parent BOOLEAN DEFAULT false,
+    created_at TIMESTAMPTZ DEFAULT NOW(),
+    last_activity TIMESTAMPTZ DEFAULT NOW(),
+    token_count INTEGER DEFAULT 0,
+    metadata JSONB
+);
+
+CREATE TABLE assistant_messages (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    session_id UUID REFERENCES assistant_sessions(id),
+    role TEXT NOT NULL,  -- 'user', 'assistant', 'system', 'tool'
+    content TEXT NOT NULL,
+    tool_calls JSONB,    -- For tool execution tracking
+    created_at TIMESTAMPTZ DEFAULT NOW(),
+    token_count INTEGER
+);
+
+CREATE TABLE assistant_audit_log (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    session_id UUID REFERENCES assistant_sessions(id),
+    action TEXT NOT NULL,  -- 'file_read', 'search', 'query', 'docker_status'
+    target TEXT NOT NULL,  -- File path, query, etc.
+    result_summary TEXT,
+    created_at TIMESTAMPTZ DEFAULT NOW()
+);
+
+CREATE INDEX idx_assistant_sessions_tenant ON assistant_sessions(tenant_id);
+CREATE INDEX idx_assistant_sessions_user ON assistant_sessions(user_id);
+CREATE INDEX idx_assistant_messages_session ON assistant_messages(session_id);
+```
+
+## Quick Commands
+
+```bash
+# Run backend
+just dev-backend
+
+# Run frontend
+just dev-frontend
+
+# Run tests
+uv run pytest python-packages/dataing/tests/unit/agents/test_assistant.py -v
+
+# Generate OpenAPI client
+just generate-client
+
+# Run migrations
+just migrate
+```
+
+## Acceptance Criteria
+
+- [ ] Assistant widget visible on all authenticated pages
+- [ ] Resizable panel that remembers size per-user
+- [ ] Full markdown rendering with syntax-highlighted code blocks
+- [ ] Copy code button on all code blocks
+- [ ] Agent streams response in real-time with tool progress indicators
+- [ ] Can read files from allowlisted directories with smart chunking
+- [ ] Can search across files (grep-like) with result limits
+- [ ] Can access git history via githunter tools
+- [ ] Can check Docker container status via API
+- [ ] Can read logs via pluggable LogProvider interface
+- [ ] Can query connected datasources (reuses existing tools)
+- [ ] Has full context of user's recent activity
+- [ ] Sessions persist permanently with session history browser
+- [ ] Parent/child investigation linking works
+- [ ] Path traversal attempts rejected with helpful alternatives
+- [ ] Security findings alert user immediately
+- [ ] Full audit log of tool usage
+- [ ] Token-based usage tracking with admin-set budgets
+- [ ] Soft block on limit exceeded with override option
+- [ ] Auto-retry 3x on errors
+- [ ] "This was helpful" saves to agent memory
+- [ ] Export to JSON and Markdown works
+
+## Tasks (Updated)
+
+1. **Create unified tool registry** - Central registry for all assistant tools
+2. **Create centralized file parsers** - core/parsing/ module by file type
+3. **Create DataingAssistant agent** - Main agent with unified tools
+4. **Create log provider interface + implementations** - Pluggable log access
+5. **Create Docker status tool** - Container status via Docker API
+6. **Create assistant API routes** - Sessions, messages, streaming
+7. **Create database migration** - 013_dataing_assistant.sql
+8. **Create frontend AssistantWidget** - Resizable floating panel
+9. **Create frontend AssistantPanel** - Chat UI with all features
+10. **Integrate with existing query tools** - Datasource access
+11. **Add investigation linking** - Parent/child relationships
+12. **Add memory integration** - "This was helpful" feedback
+
+## References
+
+- Existing patterns: `agents/client.py`, `routes/investigations.py`
+- Bond-agent tools: `/Users/bordumb/workspace/repositories/bond-agent/src/bond/tools/`
+- SSE-starlette: https://pypi.org/project/sse-starlette/
+- shadcn/ui Sheet: https://ui.shadcn.com/docs/components/sheet
@@ -0,0 +1,28 @@
+{
+  "assignee": "bordumbb@gmail.com",
+  "claim_note": "",
+  "claimed_at": "2026-02-02T23:38:26.263668Z",
+  "created_at": "2026-02-02T22:01:48.610812Z",
+  "depends_on": [
+    "fn-56.7",
+    "fn-56.2",
+    "fn-56.9",
+    "fn-56.10"
+  ],
+  "epic": "fn-56",
+  "evidence": {
+    "files_created": [
+      "python-packages/dataing/src/dataing/agents/assistant.py",
+      "python-packages/dataing/tests/unit/agents/test_assistant.py"
+    ],
+    "pre_commit_passed": true,
+    "tests_failed": 0,
+    "tests_passed": 22
+  },
+  "id": "fn-56.1",
+  "priority": null,
+  "spec_path": ".flow/tasks/fn-56.1.md",
+  "status": "done",
+  "title": "Create DataingAssistant agent (agents/assistant.py)",
+  "updated_at": "2026-02-02T23:41:27.015915Z"
+}