A production-ready, scalable Text2SQL system with self-learning capabilities.
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, shadcn/ui, Vercel AI SDK |
| Backend | Python 3.12, FastAPI, Pydantic |
| LLM | Groq API (configurable: GPT-4, Claude, Ollama) |
| Vector DB | ChromaDB |
| Embeddings | nomic-embed-text |
| Database | PostgreSQL (metadata), SQLite (local dev) |
| Queue | Redis (optional, for async jobs) |
┌─────────────────────────────────────────────────────────────────────────────┐
│ FRONTEND (Next.js) │
├─────────────────────────────────────────────────────────────────────────────┤
│ Chat UI │ Project Settings │ Data Source Config │ Human Validation │ KB UI │
└────────────────────────────────┬────────────────────────────────────────────┘
│ REST/WebSocket
┌────────────────────────────────▼────────────────────────────────────────────┐
│ API GATEWAY (FastAPI) │
├─────────────────────────────────────────────────────────────────────────────┤
│ /api/chat │ /api/projects │ /api/datasources │ /api/feedback │ /api/kb │
└────────────────────────────────┬────────────────────────────────────────────┘
│
┌────────────────────────────────▼────────────────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ Manages conversation flow, plan creation, execution coordination │
└────────┬──────────────────┬──────────────────┬──────────────────┬───────────┘
│ │ │ │
┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌──────▼──────┐
│ PLANNER │ │ SQL │ │ VALIDATOR │ │ LEARNING │
│ AGENT │ │ AGENT │ │ AGENT │ │ AGENT │
└────┬────┘ └─────┬─────┘ └─────┬─────┘ └──────┬──────┘
│ │ │ │
└──────────────────┴──────────────────┴──────────────────┘
│
┌────────────────────────────────▼────────────────────────────────────────────┐
│ SERVICES LAYER │
├──────────────┬──────────────┬──────────────┬──────────────┬─────────────────┤
│ LLM Service │ DataSource │ Schema │ Knowledge │ Embedding │
│ (Groq/etc) │ Service │ Indexer │ Base │ Service │
└──────┬───────┴──────┬───────┴──────┬───────┴──────┬───────┴────────┬────────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
├─────────────────┬─────────────────┬─────────────────┬───────────────────────┤
│ PostgreSQL │ ChromaDB │ User Data │ External DBs │
│ (Metadata) │ (Vectors) │ Sources │ (via Connectors) │
└─────────────────┴─────────────────┴─────────────────┴───────────────────────┘
Single Responsibility: Coordinate conversation flow and agent delegation
- Manages chat context and history
- Routes requests to appropriate agents
- Handles plan confirmation workflow
Single Responsibility: Create execution plans from user queries
- Analyzes user intent
- Creates step-by-step execution plans
- Handles clarification requests
Single Responsibility: Generate and optimize SQL queries
- Retrieves relevant schema context
- Generates SQL from natural language
- Handles query optimization
Single Responsibility: Validate query correctness
- Syntax validation
- Back-translation validation
- Semantic correctness checks
- Routes to human validation when needed
Single Responsibility: Manage knowledge base updates
- Processes feedback
- Updates query patterns
- Manages schema learnings
class DataSourceInterface(ABC):
"""Interface for all data source connectors (ISP - Interface Segregation)"""
@abstractmethod
async def connect(self, config: ConnectionConfig) -> bool: ...
@abstractmethod
async def get_schema(self) -> SchemaInfo: ...
@abstractmethod
async def execute_query(self, query: str) -> QueryResult: ...
@abstractmethod
async def test_connection(self) -> bool: ...- PostgreSQL
- MySQL
- SQLite
- BigQuery
- Snowflake
- CSV/Parquet (via DuckDB)
- REST APIs
- MCP Servers
User Message
│
▼
┌─────────────┐
│ Orchestrator│ ──── Load Context (history, instructions, KB)
└──────┬──────┘
│
▼
┌─────────────┐
│ Planner │ ──── Create Plan
└──────┬──────┘
│
▼
┌─────────────┐
│ User Confirm│ ◄─── "I'll query products table, filter by rating..."
└──────┬──────┘
│ (Confirmed)
▼
┌─────────────┐
│ SQL Agent │ ──── Generate SQL
└──────┬──────┘
│
▼
┌─────────────┐
│ Validator │ ──── Check correctness
└──────┬──────┘
│
▼
┌─────────────┐
│ Execute │ ──── Run query on data source
└──────┬──────┘
│
▼
┌─────────────┐
│ Response │ ──── Return results + feedback buttons
└──────┬──────┘
│
▼
┌─────────────┐
│ Learning │ ──── Process feedback, update KB
└─────────────┘
text2sql/
├── backend/
│ ├── api/ # FastAPI routes
│ │ ├── __init__.py
│ │ ├── chat.py
│ │ ├── projects.py
│ │ ├── datasources.py
│ │ ├── feedback.py
│ │ └── knowledge.py
│ │
│ ├── agents/ # Agent implementations
│ │ ├── __init__.py
│ │ ├── base.py # Base agent interface
│ │ ├── orchestrator.py
│ │ ├── planner.py
│ │ ├── sql_agent.py
│ │ ├── validator.py
│ │ └── learning.py
│ │
│ ├── services/ # Business logic services
│ │ ├── __init__.py
│ │ ├── llm/
│ │ │ ├── __init__.py
│ │ │ ├── base.py # LLM interface
│ │ │ ├── groq.py
│ │ │ ├── openai.py
│ │ │ └── ollama.py
│ │ ├── datasources/
│ │ │ ├── __init__.py
│ │ │ ├── base.py # DataSource interface
│ │ │ ├── postgres.py
│ │ │ ├── mysql.py
│ │ │ ├── sqlite.py
│ │ │ └── registry.py
│ │ ├── embeddings/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ └── nomic.py
│ │ ├── knowledge/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── schema_indexer.py
│ │ │ └── query_patterns.py
│ │ └── vector_store/
│ │ ├── __init__.py
│ │ └── chroma.py
│ │
│ ├── models/ # Pydantic models
│ │ ├── __init__.py
│ │ ├── chat.py
│ │ ├── project.py
│ │ ├── datasource.py
│ │ └── feedback.py
│ │
│ ├── db/ # Database layer
│ │ ├── __init__.py
│ │ ├── database.py
│ │ └── models.py
│ │
│ ├── core/ # Core utilities
│ │ ├── __init__.py
│ │ ├── config.py
│ │ ├── exceptions.py
│ │ └── logging.py
│ │
│ ├── main.py # FastAPI app
│ └── requirements.txt
│
├── frontend/ # Next.js app
│ ├── app/
│ │ ├── layout.tsx
│ │ ├── page.tsx
│ │ ├── chat/
│ │ ├── projects/
│ │ └── settings/
│ ├── components/
│ │ ├── ui/ # shadcn components
│ │ ├── chat/
│ │ ├── datasource/
│ │ └── validation/
│ ├── lib/
│ ├── package.json
│ └── tailwind.config.js
│
├── docker-compose.yml
├── Dockerfile.backend
├── Dockerfile.frontend
└── README.md
- Project structure
- Base interfaces (SOLID)
- Configuration system
- Database models
- DataSource interface
- PostgreSQL, MySQL, SQLite connectors
- Schema extraction
- Connection management
- ChromaDB integration
- Nomic embeddings
- Schema indexer
- Periodic sync
- Base agent interface
- Orchestrator
- Planner (with confirmation)
- SQL Agent
- Validator
- Groq integration
- Model configuration
- Prompt management
- Feedback processing
- Query pattern learning
- Self-improvement pipeline
- Next.js setup
- Chat UI
- Multi-chat support
- Custom instructions
- Human validation UI
- Dependency Injection: All services use constructor injection for testability
- Interface-based Design: Every major component has an interface (ISP)
- Event-driven Learning: Feedback triggers async KB updates
- Streaming Responses: WebSocket for real-time chat
- Modular Connectors: New data sources can be added without core changes