|
| 1 | +# BYK-RAG Module - Copilot Instructions |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +BYK-RAG is a Retrieval-Augmented Generation module for Estonian government digital services (Bürokratt ecosystem). It provides secure, multilingual AI-powered responses by integrating multiple LLM providers, contextual retrieval, and guardrails. |
| 6 | + |
| 7 | +## Build, Test, and Lint Commands |
| 8 | + |
| 9 | +### Environment Setup |
| 10 | +```bash |
| 11 | +# Install Python 3.12.10 and create virtual environment |
| 12 | +uv python install 3.12.10 |
| 13 | +uv sync --frozen |
| 14 | + |
| 15 | +# Install pre-commit hooks |
| 16 | +uv run pre-commit install |
| 17 | +``` |
| 18 | + |
| 19 | +### Running Services |
| 20 | +```bash |
| 21 | +# Always use uv run for Python scripts (whether venv is activated or not) |
| 22 | +uv run python <script.py> |
| 23 | + |
| 24 | +# Start all services with Docker Compose |
| 25 | +docker compose up |
| 26 | + |
| 27 | +# Run FastAPI orchestration service locally |
| 28 | +uv run uvicorn src.llm_orchestration_service_api:app --reload |
| 29 | +``` |
| 30 | + |
| 31 | +### Testing |
| 32 | +```bash |
| 33 | +# Run all tests |
| 34 | +uv run pytest |
| 35 | + |
| 36 | +# Run specific test file |
| 37 | +uv run pytest tests/test_query_validator.py -v |
| 38 | + |
| 39 | +# Run integration tests (requires Docker and secrets) |
| 40 | +uv run pytest tests/integration_tests/ -v --tb=short --log-cli-level=INFO |
| 41 | + |
| 42 | +# Run deepeval tests |
| 43 | +uv run pytest tests/deepeval_tests/standard_tests.py -v --tb=short |
| 44 | +``` |
| 45 | + |
| 46 | +### Linting and Formatting |
| 47 | +```bash |
| 48 | +# Check code formatting (does NOT modify files) |
| 49 | +uv run ruff format --check |
| 50 | + |
| 51 | +# Apply code formatting (SAFE - layout only, no logic changes) |
| 52 | +uv run ruff format |
| 53 | + |
| 54 | +# Check linting issues (manual fixes required) |
| 55 | +uv run ruff check . |
| 56 | + |
| 57 | +# Get explanation for specific lint rule |
| 58 | +uv run ruff rule <rule-code> # e.g., ANN204 |
| 59 | + |
| 60 | +# NEVER use ruff check --fix (can alter logic/control flow) |
| 61 | +``` |
| 62 | + |
| 63 | +### Type Checking |
| 64 | +```bash |
| 65 | +# Run Pyright type checker (runs on src/ only, not tests/) |
| 66 | +uv run pyright |
| 67 | +``` |
| 68 | + |
| 69 | +### Pre-commit Hooks |
| 70 | +```bash |
| 71 | +# Run all pre-commit hooks manually |
| 72 | +uv run pre-commit run --all-files |
| 73 | +``` |
| 74 | + |
| 75 | +## Architecture |
| 76 | + |
| 77 | +### Core Components |
| 78 | + |
| 79 | +1. **LLM Orchestration Service** (`src/llm_orchestration_service.py`) |
| 80 | + - Central business logic for RAG orchestration |
| 81 | + - Coordinates prompt refinement, retrieval, generation, and guardrails |
| 82 | + - Integrates with Langfuse for observability |
| 83 | + |
| 84 | +2. **FastAPI Application** (`src/llm_orchestration_service_api.py`) |
| 85 | + - HTTP API layer exposing `/orchestrate` endpoint |
| 86 | + - Handles streaming responses and rate limiting |
| 87 | + - Request/response validation via Pydantic models |
| 88 | + |
| 89 | +3. **Contextual Retrieval** (`src/contextual_retrieval/`) |
| 90 | + - Implements Anthropic's Contextual Retrieval methodology |
| 91 | + - Hybrid search: Vector (semantic) + BM25 (lexical) with RRF fusion |
| 92 | + - Multi-query expansion (6 queries per user query: the original plus 5 refined variations) |
| 93 | + - Qdrant vector database integration |
| 94 | + |
| 95 | +4. **Prompt Refinement** (`src/prompt_refine_manager/`) |
| 96 | + - DSPy-based query expansion |
| 97 | + - Generates 5 refined variations + original query |
| 98 | + |
| 99 | +5. **Response Generation** (`src/response_generator/`) |
| 100 | + - DSPy-based response synthesis |
| 101 | + - Supports streaming via SSE (Server-Sent Events) |
| 102 | + - Uses top-K retrieved chunks (default: 10) |
| 103 | + |
| 104 | +6. **Guardrails** (`src/guardrails/`) |
| 105 | + - NeMo Guardrails integration with DSPy |
| 106 | + - Input guardrails (pre-refinement) and output guardrails (post-generation) |
| 107 | + - Blocks out-of-scope queries and harmful content |
| 108 | + |
| 109 | +7. **LLM Manager** (`src/llm_orchestrator_config/llm_manager.py`) |
| 110 | + - Multi-provider support: AWS Bedrock, Azure OpenAI, Google Cloud, OpenAI, Anthropic |
| 111 | + - HashiCorp Vault integration for secret management |
| 112 | + - RSA-2048 encrypted credentials storage |
| 113 | + |
| 114 | +8. **Vector Indexer** (`src/vector_indexer/`) |
| 115 | + - Qdrant collection management |
| 116 | + - Embedding generation and indexing |
| 117 | + - BM25 index creation |
| 118 | + |
| 119 | +### Supporting Services (Docker Compose) |
| 120 | + |
| 121 | +- **Ruuter** (Public/Private): API gateway and routing |
| 122 | +- **DataMapper**: Data transformation layer |
| 123 | +- **Resql**: PostgreSQL query builder |
| 124 | +- **CronManager**: Scheduled jobs (knowledge base sync) |
| 125 | +- **Qdrant**: Vector database |
| 126 | +- **MinIO**: S3-compatible object storage |
| 127 | +- **HashiCorp Vault**: Secret management |
| 128 | +- **Grafana Loki**: Log aggregation |
| 129 | +- **Langfuse**: LLM observability dashboard |
| 130 | + |
| 131 | +### Key Data Flow |
| 132 | + |
| 133 | +``` |
| 134 | +User Query |
| 135 | + ↓ |
| 136 | +Input Guardrails (NeMo Rails) |
| 137 | + ↓ |
| 138 | +Prompt Refinement (DSPy) → 6 queries |
| 139 | + ↓ |
| 140 | +Parallel Hybrid Search (each query) |
| 141 | + ├─→ Semantic Search (Qdrant, top-40 per query, threshold ≥0.4) |
| 142 | + └─→ BM25 Search (top-40 per query) |
| 143 | + ↓ |
| 144 | +RRF Fusion → Top-K chunks (10 default) |
| 145 | + ↓ |
| 146 | +Response Generation (DSPy) |
| 147 | + ↓ |
| 148 | +Output Guardrails (NeMo Rails) |
| 149 | + ↓ |
| 150 | +Response to User (JSON or SSE stream) |
| 151 | +``` |
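
The fusion step in the flow above can be illustrated with a minimal sketch of Reciprocal Rank Fusion (RRF). This is not the module's actual implementation: the function name `rrf_fuse`, the chunk IDs, and the smoothing constant `k=60` (a commonly used default for RRF) are assumptions made for illustration. Only the top-K default of 10 comes from this document.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]], k: int = 60, top_k: int = 10
) -> List[str]:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is an assumed value, not a project constant.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; break ties by chunk ID so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_k]]


# Hypothetical result lists for a single query.
semantic_hits = ["c1", "c2", "c3"]
bm25_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([semantic_hits, bm25_hits], top_k=3)
```

Chunks that rank well in both lists (`c1`, `c3`) end up ahead of chunks that appear in only one, which is the reason for combining semantic and lexical results by rank rather than by raw score.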
| 152 | + |
| 153 | +## Key Conventions |
| 154 | + |
| 155 | +### Dependency Management |
| 156 | + |
| 157 | +- **ALWAYS use `uv add <package>`** to add dependencies (never `pip install`) |
| 158 | +- **ALWAYS commit both `pyproject.toml` AND `uv.lock`** together |
| 159 | +- Use bounded version ranges: `uv add "package>=x.y,<x.(y+1)"` |
| 160 | +- After adding/removing deps: `uv sync --reinstall` |
| 161 | +- **NEVER edit `uv.lock` manually** or use `requirements.txt` |
| 162 | + |
| 163 | +### Python Execution |
| 164 | + |
| 165 | +```bash |
| 166 | +# Correct |
| 167 | +uv run python app.py |
| 168 | +uv run pytest |
| 169 | +uv run pyright |
| 170 | + |
| 171 | +# Wrong (bypasses uv's environment management) |
| 172 | +python3 app.py |
| 173 | +pytest |
| 174 | +``` |
| 175 | + |
| 176 | +### Type Safety |
| 177 | + |
| 178 | +- **Pyright in `standard` mode** (configured in `pyproject.toml`) |
| 179 | +- Type checks enforced by CI, but **NOT on test files** (src/ only) |
| 180 | +- **Runtime validation at system boundaries**: FastAPI endpoints use Pydantic models |
| 181 | +- Prefer type inference over explicit annotations where clear |
| 182 | +- Third-party libraries without stubs are treated as `Any` |
| 183 | + |
| 184 | +### Linting Rules (Ruff) |
| 185 | + |
| 186 | +Selected categories (see `pyproject.toml` for full config): |
| 187 | +- **E4, E7, E9**: Pycodestyle errors (imports, indentation, syntax) |
| 188 | +- **F**: Pyflakes (undefined names, unused imports) |
| 189 | +- **B**: Flake8-bugbear (mutable defaults, exception handling) |
| 190 | +- **T20**: Flake8-print (flags `print()` statements) |
| 191 | +- **N**: PEP8-naming conventions |
| 192 | +- **ANN**: Flake8-annotations (type annotation discipline) |
| 193 | +- **ERA**: Eradicate (no commented-out code) |
| 194 | +- **PERF**: Perflint (performance anti-patterns) |
| 195 | + |
| 196 | +**Fixing linting issues:** |
| 197 | +- ALWAYS fix manually (never use `ruff check --fix`) |
| 198 | +- Use `uv run ruff rule <rule-code>` for explanations |
| 199 | +- Autofixes can alter control flow/logic unintentionally |
| 200 | + |
| 201 | +### Formatting (Ruff Formatter) |
| 202 | + |
| 203 | +- Double quotes for strings |
| 204 | +- Spaces for indentation (4 spaces) |
| 205 | +- Respects magic trailing commas |
| 206 | +- Auto-detects line endings (LF/CRLF) |
| 207 | +- Does NOT reformat docstring code blocks |
| 208 | +- `uv run ruff format` is SAFE (layout only, no logic changes) |
| 209 | + |
| 210 | +### DSPy Usage |
| 211 | + |
| 212 | +- Used for prompt refinement (multi-query expansion) and response generation |
| 213 | +- Custom LLM adapters integrate DSPy with NeMo Guardrails |
| 214 | +- Optimization modules under `src/optimization/` for tuning prompts/metrics |
| 215 | +- Compiled DSPy modules are loaded via `optimized_module_loader.py` |
| 216 | + |
| 217 | +### HashiCorp Vault Integration |
| 218 | + |
| 219 | +- Secrets stored at `secret/users/<user>/<connection_id>/` |
| 220 | +- Each connection has `provider`, `environment`, and provider-specific keys |
| 221 | +- RSA-2048 encryption layer BEFORE Vault storage |
| 222 | +- GUI encrypts with public key; CronManager decrypts with private key |
| 223 | +- If Vault is unavailable, the service degrades gracefully and fails securely |
| 224 | + |
| 225 | +### Logging |
| 226 | + |
| 227 | +- **loguru** for application logging |
| 228 | +- Grafana Loki integration for centralized logs |
| 229 | +- Use `logger.info()`, `logger.warning()`, `logger.error()` (NOT `print()`) |
| 230 | +- Loki logger available at `grafana-configs/loki_logger.py` |
| 231 | + |
| 232 | +### Streaming Responses |
| 233 | + |
| 234 | +- Implemented via Server-Sent Events (SSE) in FastAPI |
| 235 | +- `StreamConfig` and `stream_manager` coordinate streaming state |
| 236 | +- `stream_response_native()` in response_generator yields tokens |
| 237 | +- Timeout handling via `stream_timeout` utility |
| 238 | +- Environment-gated: check `STREAMING_ALLOWED_ENVS` |
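
As a rough illustration of the streaming conventions above, the sketch below formats tokens as SSE frames and gates streaming on an environment allow-list. The helper names `streaming_allowed` and `sse_events`, the comma-separated format assumed for `STREAMING_ALLOWED_ENVS`, the JSON payload shape, and the `[DONE]` sentinel are all assumptions for illustration. The real `stream_response_native()` and `stream_manager` may behave differently.

```python
import json
from typing import Iterable, Iterator


def streaming_allowed(current_env: str, allowed_envs_csv: str) -> bool:
    """Return True if current_env is in a comma-separated allow-list.

    The comma-separated format of STREAMING_ALLOWED_ENVS is an assumption.
    """
    allowed = {
        env.strip().lower() for env in allowed_envs_csv.split(",") if env.strip()
    }
    return current_env.strip().lower() in allowed


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame: a 'data:' line plus a blank line."""
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical end-of-stream marker so clients know when to stop reading.
    yield "data: [DONE]\n\n"


frames = list(sse_events(["Tere", "!"]))
```

In a FastAPI handler, a generator like `sse_events` would typically be passed to a `StreamingResponse` with `media_type="text/event-stream"`.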
| 239 | + |
| 240 | +### Configuration Loading |
| 241 | + |
| 242 | +- `PromptConfigurationLoader` fetches prompt configs from Ruuter endpoint |
| 243 | +- Cache TTL: `PROMPT_CONFIG_CACHE_TTL` |
| 244 | +- Custom prompts per user/organization (stored in Vault/database) |
| 245 | +- Fallback to defaults if Ruuter unavailable |
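
The cache-with-fallback behaviour described above could look roughly like the following. This is a sketch only: `TtlCachedConfig`, its parameters, and the choice to serve a stale value when a refresh fails are assumptions, not the actual `PromptConfigurationLoader` API.

```python
import time
from typing import Callable, Dict, Optional


class TtlCachedConfig:
    """Cache a fetched config for ttl_seconds and fall back to defaults."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        defaults: Dict[str, str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # e.g. an HTTP call to the config endpoint
        self._defaults = defaults
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, str]:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value  # cached value is still fresh
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            # Fetch failed: serve the stale value if there is one, else defaults.
            if self._value is not None:
                return self._value
            return dict(self._defaults)
        return self._value
```

A caller would construct it once, for example with `ttl_seconds` read from `PROMPT_CONFIG_CACHE_TTL`, and call `get()` on every request.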
| 246 | + |
| 247 | +### Error Handling |
| 248 | + |
| 249 | +- `generate_error_id()` creates unique error IDs for tracking |
| 250 | +- `log_error_with_context()` for structured error logging |
| 251 | +- Localized error messages via `get_localized_message()` (multilingual support) |
| 252 | +- Predefined message constants in `llm_orchestrator_constants.py` |
| 253 | + |
| 254 | +### Testing Conventions |
| 255 | + |
| 256 | +- Test files under `tests/` (unit, integration, deepeval) |
| 257 | +- Integration tests use `testcontainers` for Docker orchestration |
| 258 | +- Secrets required for integration tests (Azure OpenAI keys, etc.) |
| 259 | +- Mock data in `tests/mocks/` and `tests/data/` |
| 260 | + |
| 261 | +### CI/CD Checks |
| 262 | + |
| 263 | +1. **uv-env-check**: Lockfile vs. pyproject.toml consistency |
| 264 | +2. **pyright-type-check**: Type checking on src/ (`standard` mode, as configured in `pyproject.toml`) |
| 265 | +3. **ruff-format-check**: Code formatting compliance |
| 266 | +4. **ruff-lint-check**: Linting standards |
| 267 | +5. **pytest-integration-check**: Full integration tests (requires secrets) |
| 268 | +6. **deepeval-tests**: LLM evaluation metrics |
| 269 | +7. **gitleaks-check**: Secret detection (pre-commit + CI) |
| 270 | + |
| 271 | +### Pre-commit Hooks |
| 272 | + |
| 273 | +Configured in `.pre-commit-config.yaml`: |
| 274 | +- **gitleaks**: Secret scanning |
| 275 | +- **uv-lock**: Ensures lockfile consistency |
| 276 | + |
| 277 | +### Constants and Thresholds |
| 278 | + |
| 279 | +Key retrieval constants (`src/vector_indexer/constants.py` and contextual retrieval): |
| 280 | +- **Semantic search top-K**: 40 per query |
| 281 | +- **Semantic threshold**: 0.4 (only chunks with cosine similarity ≥0.4 are kept) |
| 282 | +- **BM25 top-K**: 40 per query |
| 283 | +- **Response generation top-K**: 10 chunks (after RRF fusion) |
| 284 | +- **Query refinement count**: 5 variations + original = 6 total |
| 285 | +- **Search timeout**: 2 seconds per query |
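
To show how the semantic threshold and per-query top-K listed above interact, here is a small sketch. The constant names and the `filter_semantic_hits` helper are hypothetical and are not taken from `src/vector_indexer/constants.py`. Only the values 40 and 0.4 come from this document.

```python
from typing import List, Tuple

# Values from the list above; the names are illustrative assumptions.
SEMANTIC_TOP_K = 40
SEMANTIC_THRESHOLD = 0.4


def filter_semantic_hits(
    hits: List[Tuple[str, float]],
    threshold: float = SEMANTIC_THRESHOLD,
    top_k: int = SEMANTIC_TOP_K,
) -> List[Tuple[str, float]]:
    """Keep hits scoring at or above the threshold, best first, capped at top_k."""
    kept = [(chunk_id, score) for chunk_id, score in hits if score >= threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:top_k]


# Hypothetical (chunk_id, cosine_similarity) pairs for one query.
raw_hits = [("a", 0.82), ("b", 0.39), ("c", 0.4), ("d", 0.55)]
filtered = filter_semantic_hits(raw_hits)
```

The threshold removes weak matches before fusion, so a query with few relevant chunks contributes fewer than 40 candidates instead of padding the list with noise.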
| 286 | + |
| 287 | +### Docker and Services |
| 288 | + |
| 289 | +- Use `docker compose` (not `docker-compose`) |
| 290 | +- Services communicate via `bykstack` network |
| 291 | +- Shared volumes: `shared-volume`, `cron_data` |
| 292 | +- Vault agent containers per service (llm, gui, cron) |
| 293 | +- Resource limits: CPU and memory constraints defined in docker-compose.yml |
| 294 | + |
| 295 | +## Important Notes |
| 296 | + |
| 297 | +- **Python version pinned to 3.12.10** (see `pyproject.toml` and `.python-version`) |
| 298 | +- **Line length: 88** (Black-compatible, enforced by Ruff) |
| 299 | +- **No print() statements** in production code (use loguru logger) |
| 300 | +- **Pydantic for runtime validation** at API boundaries (FastAPI endpoints) |
| 301 | +- **Langfuse tracing** for observability (public/secret keys from Vault) |
| 302 | +- **Rate limiting** via `RateLimiter` utility (token and request budgets) |
| 303 | +- **Cost tracking** via `calculate_total_costs()` and budget tracker |
| 304 | +- **Language detection** for multilingual support (Estonian primary) |