Skip to content

Latest commit

 

History

History
331 lines (280 loc) · 13.3 KB

File metadata and controls

331 lines (280 loc) · 13.3 KB

DataChat Production Architecture

Overview

A production-ready, scalable Text2SQL system with self-learning capabilities.

Tech Stack

Layer Technology
Frontend Next.js 14, shadcn/ui, Vercel AI SDK
Backend Python 3.12, FastAPI, Pydantic
LLM Groq API (configurable: GPT-4, Claude, Ollama)
Vector DB ChromaDB
Embeddings nomic-embed-text
Database PostgreSQL (metadata), SQLite (local dev)
Queue Redis (optional, for async jobs)

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              FRONTEND (Next.js)                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  Chat UI │ Project Settings │ Data Source Config │ Human Validation │ KB UI │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │ REST/WebSocket
┌────────────────────────────────▼────────────────────────────────────────────┐
│                           API GATEWAY (FastAPI)                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  /api/chat │ /api/projects │ /api/datasources │ /api/feedback │ /api/kb    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────────────────┐
│                          ORCHESTRATOR AGENT                                  │
│  Manages conversation flow, plan creation, execution coordination           │
└────────┬──────────────────┬──────────────────┬──────────────────┬───────────┘
         │                  │                  │                  │
    ┌────▼────┐       ┌─────▼─────┐      ┌─────▼─────┐     ┌──────▼──────┐
    │ PLANNER │       │   SQL     │      │ VALIDATOR │     │   LEARNING  │
    │  AGENT  │       │   AGENT   │      │   AGENT   │     │    AGENT    │
    └────┬────┘       └─────┬─────┘      └─────┬─────┘     └──────┬──────┘
         │                  │                  │                  │
         └──────────────────┴──────────────────┴──────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────────────────┐
│                           SERVICES LAYER                                     │
├──────────────┬──────────────┬──────────────┬──────────────┬─────────────────┤
│ LLM Service  │ DataSource   │ Schema       │ Knowledge    │ Embedding       │
│ (Groq/etc)   │ Service      │ Indexer      │ Base         │ Service         │
└──────┬───────┴──────┬───────┴──────┬───────┴──────┬───────┴────────┬────────┘
       │              │              │              │                │
       ▼              ▼              ▼              ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATA LAYER                                           │
├─────────────────┬─────────────────┬─────────────────┬───────────────────────┤
│   PostgreSQL    │    ChromaDB     │  User Data      │   External DBs        │
│   (Metadata)    │    (Vectors)    │  Sources        │   (via Connectors)    │
└─────────────────┴─────────────────┴─────────────────┴───────────────────────┘

Agent Responsibilities (SOLID Principles)

1. Orchestrator Agent

Single Responsibility: Coordinate conversation flow and agent delegation

  • Manages chat context and history
  • Routes requests to appropriate agents
  • Handles plan confirmation workflow

2. Planner Agent

Single Responsibility: Create execution plans from user queries

  • Analyzes user intent
  • Creates step-by-step execution plans
  • Handles clarification requests

3. SQL Agent

Single Responsibility: Generate and optimize SQL queries

  • Retrieves relevant schema context
  • Generates SQL from natural language
  • Handles query optimization

4. Validator Agent

Single Responsibility: Validate query correctness

  • Syntax validation
  • Back-translation validation
  • Semantic correctness checks
  • Routes to human validation when needed

5. Learning Agent

Single Responsibility: Manage knowledge base updates

  • Processes feedback
  • Updates query patterns
  • Manages schema learnings

Data Source Interface

class DataSourceInterface(ABC):
    """Interface for all data source connectors (ISP - Interface Segregation)"""

    @abstractmethod
    async def connect(self, config: ConnectionConfig) -> bool: ...

    @abstractmethod
    async def get_schema(self) -> SchemaInfo: ...

    @abstractmethod
    async def execute_query(self, query: str) -> QueryResult: ...

    @abstractmethod
    async def test_connection(self) -> bool: ...

Supported Connectors

  • PostgreSQL
  • MySQL
  • SQLite
  • BigQuery
  • Snowflake
  • CSV/Parquet (via DuckDB)
  • REST APIs
  • MCP Servers

Chat Flow

User Message
     │
     ▼
┌─────────────┐
│ Orchestrator│ ──── Load Context (history, instructions, KB)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Planner   │ ──── Create Plan
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ User Confirm│ ◄─── "I'll query products table, filter by rating..."
└──────┬──────┘
       │ (Confirmed)
       ▼
┌─────────────┐
│  SQL Agent  │ ──── Generate SQL
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Validator  │ ──── Check correctness
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Execute    │ ──── Run query on data source
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Response   │ ──── Return results + feedback buttons
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Learning   │ ──── Process feedback, update KB
└─────────────┘

Project Structure

text2sql/
├── backend/
│   ├── api/                    # FastAPI routes
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   ├── projects.py
│   │   ├── datasources.py
│   │   ├── feedback.py
│   │   └── knowledge.py
│   │
│   ├── agents/                 # Agent implementations
│   │   ├── __init__.py
│   │   ├── base.py            # Base agent interface
│   │   ├── orchestrator.py
│   │   ├── planner.py
│   │   ├── sql_agent.py
│   │   ├── validator.py
│   │   └── learning.py
│   │
│   ├── services/              # Business logic services
│   │   ├── __init__.py
│   │   ├── llm/
│   │   │   ├── __init__.py
│   │   │   ├── base.py        # LLM interface
│   │   │   ├── groq.py
│   │   │   ├── openai.py
│   │   │   └── ollama.py
│   │   ├── datasources/
│   │   │   ├── __init__.py
│   │   │   ├── base.py        # DataSource interface
│   │   │   ├── postgres.py
│   │   │   ├── mysql.py
│   │   │   ├── sqlite.py
│   │   │   └── registry.py
│   │   ├── embeddings/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   └── nomic.py
│   │   ├── knowledge/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── schema_indexer.py
│   │   │   └── query_patterns.py
│   │   └── vector_store/
│   │       ├── __init__.py
│   │       └── chroma.py
│   │
│   ├── models/                # Pydantic models
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   ├── project.py
│   │   ├── datasource.py
│   │   └── feedback.py
│   │
│   ├── db/                    # Database layer
│   │   ├── __init__.py
│   │   ├── database.py
│   │   └── models.py
│   │
│   ├── core/                  # Core utilities
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── logging.py
│   │
│   ├── main.py               # FastAPI app
│   └── requirements.txt
│
├── frontend/                  # Next.js app
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx
│   │   ├── chat/
│   │   ├── projects/
│   │   └── settings/
│   ├── components/
│   │   ├── ui/               # shadcn components
│   │   ├── chat/
│   │   ├── datasource/
│   │   └── validation/
│   ├── lib/
│   ├── package.json
│   └── tailwind.config.js
│
├── docker-compose.yml
├── Dockerfile.backend
├── Dockerfile.frontend
└── README.md

Phase-wise Implementation

Phase 1: Core Backend Foundation ✓

  • Project structure
  • Base interfaces (SOLID)
  • Configuration system
  • Database models

Phase 2: Data Source Layer

  • DataSource interface
  • PostgreSQL, MySQL, SQLite connectors
  • Schema extraction
  • Connection management

Phase 3: Vector DB & Indexing

  • ChromaDB integration
  • Nomic embeddings
  • Schema indexer
  • Periodic sync

Phase 4: Agent System

  • Base agent interface
  • Orchestrator
  • Planner (with confirmation)
  • SQL Agent
  • Validator

Phase 5: LLM Integration

  • Groq integration
  • Model configuration
  • Prompt management

Phase 6: Knowledge Base

  • Feedback processing
  • Query pattern learning
  • Self-improvement pipeline

Phase 7: Frontend

  • Next.js setup
  • Chat UI
  • Multi-chat support
  • Custom instructions
  • Human validation UI

Key Design Decisions

  1. Dependency Injection: All services use constructor injection for testability
  2. Interface-based Design: Every major component has an interface (ISP)
  3. Event-driven Learning: Feedback triggers async KB updates
  4. Streaming Responses: WebSocket for real-time chat
  5. Modular Connectors: New data sources can be added without core changes