DataChat Production Architecture

Overview

A production-ready, scalable Text2SQL system with self-learning capabilities.

Tech Stack

Layer	Technology
Frontend	Next.js 14, shadcn/ui, Vercel AI SDK
Backend	Python 3.12, FastAPI, Pydantic
LLM	Groq API (configurable: GPT-4, Claude, Ollama)
Vector DB	ChromaDB
Embeddings	nomic-embed-text
Database	PostgreSQL (metadata), SQLite (local dev)
Queue	Redis (optional, for async jobs)

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              FRONTEND (Next.js)                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  Chat UI │ Project Settings │ Data Source Config │ Human Validation │ KB UI │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │ REST/WebSocket
┌────────────────────────────────▼────────────────────────────────────────────┐
│                           API GATEWAY (FastAPI)                              │
├─────────────────────────────────────────────────────────────────────────────┤
│  /api/chat │ /api/projects │ /api/datasources │ /api/feedback │ /api/kb    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────────────────┐
│                          ORCHESTRATOR AGENT                                  │
│  Manages conversation flow, plan creation, execution coordination           │
└────────┬──────────────────┬──────────────────┬──────────────────┬───────────┘
         │                  │                  │                  │
    ┌────▼────┐       ┌─────▼─────┐      ┌─────▼─────┐     ┌──────▼──────┐
    │ PLANNER │       │   SQL     │      │ VALIDATOR │     │   LEARNING  │
    │  AGENT  │       │   AGENT   │      │   AGENT   │     │    AGENT    │
    └────┬────┘       └─────┬─────┘      └─────┬─────┘     └──────┬──────┘
         │                  │                  │                  │
         └──────────────────┴──────────────────┴──────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────────────────┐
│                           SERVICES LAYER                                     │
├──────────────┬──────────────┬──────────────┬──────────────┬─────────────────┤
│ LLM Service  │ DataSource   │ Schema       │ Knowledge    │ Embedding       │
│ (Groq/etc)   │ Service      │ Indexer      │ Base         │ Service         │
└──────┬───────┴──────┬───────┴──────┬───────┴──────┬───────┴────────┬────────┘
       │              │              │              │                │
       ▼              ▼              ▼              ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         DATA LAYER                                           │
├─────────────────┬─────────────────┬─────────────────┬───────────────────────┤
│   PostgreSQL    │    ChromaDB     │  User Data      │   External DBs        │
│   (Metadata)    │    (Vectors)    │  Sources        │   (via Connectors)    │
└─────────────────┴─────────────────┴─────────────────┴───────────────────────┘

Agent Responsibilities (SOLID Principles)

1. Orchestrator Agent

Single Responsibility: Coordinate conversation flow and agent delegation

Manages chat context and history
Routes requests to appropriate agents
Handles plan confirmation workflow

2. Planner Agent

Single Responsibility: Create execution plans from user queries

Analyzes user intent
Creates step-by-step execution plans
Handles clarification requests

3. SQL Agent

Single Responsibility: Generate and optimize SQL queries

Retrieves relevant schema context
Generates SQL from natural language
Handles query optimization

4. Validator Agent

Single Responsibility: Validate query correctness

Syntax validation
Back-translation validation
Semantic correctness checks
Routes to human validation when needed

5. Learning Agent

Single Responsibility: Manage knowledge base updates

Processes feedback
Updates query patterns
Manages schema learnings

Data Source Interface

class DataSourceInterface(ABC):
    """Interface for all data source connectors (ISP - Interface Segregation)"""

    @abstractmethod
    async def connect(self, config: ConnectionConfig) -> bool: ...

    @abstractmethod
    async def get_schema(self) -> SchemaInfo: ...

    @abstractmethod
    async def execute_query(self, query: str) -> QueryResult: ...

    @abstractmethod
    async def test_connection(self) -> bool: ...

Supported Connectors

PostgreSQL
MySQL
SQLite
BigQuery
Snowflake
CSV/Parquet (via DuckDB)
REST APIs
MCP Servers

Chat Flow

User Message
     │
     ▼
┌─────────────┐
│ Orchestrator│ ──── Load Context (history, instructions, KB)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Planner   │ ──── Create Plan
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ User Confirm│ ◄─── "I'll query products table, filter by rating..."
└──────┬──────┘
       │ (Confirmed)
       ▼
┌─────────────┐
│  SQL Agent  │ ──── Generate SQL
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Validator  │ ──── Check correctness
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Execute    │ ──── Run query on data source
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Response   │ ──── Return results + feedback buttons
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Learning   │ ──── Process feedback, update KB
└─────────────┘

Project Structure

text2sql/
├── backend/
│   ├── api/                    # FastAPI routes
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   ├── projects.py
│   │   ├── datasources.py
│   │   ├── feedback.py
│   │   └── knowledge.py
│   │
│   ├── agents/                 # Agent implementations
│   │   ├── __init__.py
│   │   ├── base.py            # Base agent interface
│   │   ├── orchestrator.py
│   │   ├── planner.py
│   │   ├── sql_agent.py
│   │   ├── validator.py
│   │   └── learning.py
│   │
│   ├── services/              # Business logic services
│   │   ├── __init__.py
│   │   ├── llm/
│   │   │   ├── __init__.py
│   │   │   ├── base.py        # LLM interface
│   │   │   ├── groq.py
│   │   │   ├── openai.py
│   │   │   └── ollama.py
│   │   ├── datasources/
│   │   │   ├── __init__.py
│   │   │   ├── base.py        # DataSource interface
│   │   │   ├── postgres.py
│   │   │   ├── mysql.py
│   │   │   ├── sqlite.py
│   │   │   └── registry.py
│   │   ├── embeddings/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   └── nomic.py
│   │   ├── knowledge/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── schema_indexer.py
│   │   │   └── query_patterns.py
│   │   └── vector_store/
│   │       ├── __init__.py
│   │       └── chroma.py
│   │
│   ├── models/                # Pydantic models
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   ├── project.py
│   │   ├── datasource.py
│   │   └── feedback.py
│   │
│   ├── db/                    # Database layer
│   │   ├── __init__.py
│   │   ├── database.py
│   │   └── models.py
│   │
│   ├── core/                  # Core utilities
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── logging.py
│   │
│   ├── main.py               # FastAPI app
│   └── requirements.txt
│
├── frontend/                  # Next.js app
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx
│   │   ├── chat/
│   │   ├── projects/
│   │   └── settings/
│   ├── components/
│   │   ├── ui/               # shadcn components
│   │   ├── chat/
│   │   ├── datasource/
│   │   └── validation/
│   ├── lib/
│   ├── package.json
│   └── tailwind.config.js
│
├── docker-compose.yml
├── Dockerfile.backend
├── Dockerfile.frontend
└── README.md

Phase-wise Implementation

Phase 1: Core Backend Foundation ✓

Project structure
Base interfaces (SOLID)
Configuration system
Database models

Phase 2: Data Source Layer

DataSource interface
PostgreSQL, MySQL, SQLite connectors
Schema extraction
Connection management

Phase 3: Vector DB & Indexing

ChromaDB integration
Nomic embeddings
Schema indexer
Periodic sync

Phase 4: Agent System

Base agent interface
Orchestrator
Planner (with confirmation)
SQL Agent
Validator

Phase 5: LLM Integration

Groq integration
Model configuration
Prompt management

Phase 6: Knowledge Base

Feedback processing
Query pattern learning
Self-improvement pipeline

Phase 7: Frontend

Next.js setup
Chat UI
Multi-chat support
Custom instructions
Human validation UI

Key Design Decisions

Dependency Injection: All services use constructor injection for testability
Interface-based Design: Every major component has an interface (ISP)
Event-driven Learning: Feedback triggers async KB updates
Streaming Responses: WebSocket for real-time chat
Modular Connectors: New data sources can be added without core changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataChat Production Architecture

Overview

Tech Stack

System Architecture

Agent Responsibilities (SOLID Principles)

1. Orchestrator Agent

2. Planner Agent

3. SQL Agent

4. Validator Agent

5. Learning Agent

Data Source Interface

Supported Connectors

Chat Flow

Project Structure

Phase-wise Implementation

Phase 1: Core Backend Foundation ✓

Phase 2: Data Source Layer

Phase 3: Vector DB & Indexing

Phase 4: Agent System

Phase 5: LLM Integration

Phase 6: Knowledge Base

Phase 7: Frontend

Key Design Decisions

FilesExpand file tree

PRODUCTION_ARCHITECTURE.md

Latest commit

History

PRODUCTION_ARCHITECTURE.md

File metadata and controls

DataChat Production Architecture

Overview

Tech Stack

System Architecture

Agent Responsibilities (SOLID Principles)

1. Orchestrator Agent

2. Planner Agent

3. SQL Agent

4. Validator Agent

5. Learning Agent

Data Source Interface

Supported Connectors

Chat Flow

Project Structure

Phase-wise Implementation

Phase 1: Core Backend Foundation ✓

Phase 2: Data Source Layer

Phase 3: Vector DB & Indexing

Phase 4: Agent System

Phase 5: LLM Integration

Phase 6: Knowledge Base

Phase 7: Frontend

Key Design Decisions