Commit 42f688a

Merge pull request #120 from buerokratt/wip
Sync wip branches
2 parents 25e33b9 + 05f0f94 commit 42f688a

26 files changed, +2962 -674 lines changed

.github/copilot-instructions.md

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
1+
# BYK-RAG Module - Copilot Instructions
2+
3+
## Project Overview
4+
5+
BYK-RAG is a Retrieval-Augmented Generation module for Estonian government digital services (Bürokratt ecosystem). It provides secure, multilingual AI-powered responses by integrating multiple LLM providers, contextual retrieval, and guardrails.
6+
7+
## Build, Test, and Lint Commands
8+
9+
### Environment Setup
10+
```bash
# Install Python 3.12.10 and create virtual environment
uv python install 3.12.10
uv sync --frozen

# Install pre-commit hooks
uv run pre-commit install
```
18+
19+
### Running Services
20+
```bash
# Always use uv run for Python scripts (whether or not the venv is activated)
uv run python <script.py>

# Start all services with Docker Compose
docker compose up

# Run the FastAPI orchestration service locally
uv run uvicorn src.llm_orchestration_service_api:app --reload
```
30+
31+
### Testing
32+
```bash
# Run all tests
uv run pytest

# Run a specific test file
uv run pytest tests/test_query_validator.py -v

# Run integration tests (requires Docker and secrets)
uv run pytest tests/integration_tests/ -v --tb=short --log-cli-level=INFO

# Run deepeval tests
uv run pytest tests/deepeval_tests/standard_tests.py -v --tb=short
```
45+
46+
### Linting and Formatting
47+
```bash
# Check code formatting (does NOT modify files)
uv run ruff format --check

# Apply code formatting (SAFE - layout only, no logic changes)
uv run ruff format

# Check linting issues (manual fixes required)
uv run ruff check .

# Get an explanation for a specific lint rule
uv run ruff rule <rule-code>  # e.g., ANN204

# NEVER use ruff check --fix (can alter logic/control flow)
```
62+
63+
### Type Checking
64+
```bash
# Run Pyright type checker (runs on src/ only, not tests/)
uv run pyright
```
68+
69+
### Pre-commit Hooks
70+
```bash
# Run all pre-commit hooks manually
uv run pre-commit run --all-files
```
74+
75+
## Architecture
76+
77+
### Core Components
78+
79+
1. **LLM Orchestration Service** (`src/llm_orchestration_service.py`)
80+
- Central business logic for RAG orchestration
81+
- Coordinates prompt refinement, retrieval, generation, and guardrails
82+
- Integrates with Langfuse for observability
83+
84+
2. **FastAPI Application** (`src/llm_orchestration_service_api.py`)
85+
- HTTP API layer exposing `/orchestrate` endpoint
86+
- Handles streaming responses and rate limiting
87+
- Request/response validation via Pydantic models
88+
89+
3. **Contextual Retrieval** (`src/contextual_retrieval/`)
90+
- Implements Anthropic's Contextual Retrieval methodology
91+
- Hybrid search: Vector (semantic) + BM25 (lexical) with RRF fusion
92+
- Multi-query expansion (6 queries per user query: the original plus 5 refined variations)
93+
- Qdrant vector database integration
94+
95+
4. **Prompt Refinement** (`src/prompt_refine_manager/`)
96+
- DSPy-based query expansion
97+
- Generates 5 refined variations + original query
98+
99+
5. **Response Generation** (`src/response_generator/`)
100+
- DSPy-based response synthesis
101+
- Supports streaming via SSE (Server-Sent Events)
102+
- Uses top-K retrieved chunks (default: 10)
103+
104+
6. **Guardrails** (`src/guardrails/`)
105+
- NeMo Guardrails integration with DSPy
106+
- Input guardrails (pre-refinement) and output guardrails (post-generation)
107+
- Blocks out-of-scope queries and harmful content
108+
109+
7. **LLM Manager** (`src/llm_orchestrator_config/llm_manager.py`)
110+
- Multi-provider support: AWS Bedrock, Azure OpenAI, Google Cloud, OpenAI, Anthropic
111+
- HashiCorp Vault integration for secret management
112+
- RSA-2048-encrypted credential storage
113+
114+
8. **Vector Indexer** (`src/vector_indexer/`)
115+
- Qdrant collection management
116+
- Embedding generation and indexing
117+
- BM25 index creation
118+
119+
### Supporting Services (Docker Compose)
120+
121+
- **Ruuter** (Public/Private): API gateway and routing
122+
- **DataMapper**: Data transformation layer
123+
- **Resql**: PostgreSQL query builder
124+
- **CronManager**: Scheduled jobs (knowledge base sync)
125+
- **Qdrant**: Vector database
126+
- **MinIO**: S3-compatible object storage
127+
- **HashiCorp Vault**: Secret management
128+
- **Grafana Loki**: Log aggregation
129+
- **Langfuse**: LLM observability dashboard
130+
131+
### Key Data Flow
132+
133+
```
User Query

Input Guardrails (NeMo Rails)

Prompt Refinement (DSPy) → 6 queries

Parallel Hybrid Search (each query)
├─→ Semantic Search (Qdrant, top-40 per query, threshold ≥0.4)
└─→ BM25 Search (top-40 per query)

RRF Fusion → Top-K chunks (10 default)

Response Generation (DSPy)

Output Guardrails (NeMo Rails)

Response to User (JSON or SSE stream)
```
152+
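The fusion step in the flow above can be sketched in plain Python. This is a minimal illustration of Reciprocal Rank Fusion, not the module's actual implementation: the function name `rrf_fuse`, the chunk IDs, and the smoothing constant `k = 60` (a commonly used RRF default) are assumptions, since the source does not state them.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top_k: int = 10) -> list[str]:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Each list contributes 1 / (k + rank) for every chunk it contains.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]


# Hypothetical result lists for one query (best hit first).
semantic_hits = ["c1", "c2", "c3"]
bm25_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([semantic_hits, bm25_hits], top_k=3)
```

Chunks that rank well in both lists (`c1` and `c3` here) end up ahead of chunks found by only one retriever, which is the property hybrid search relies on.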
153+
## Key Conventions
154+
155+
### Dependency Management
156+
157+
- **ALWAYS use `uv add <package>`** to add dependencies (never `pip install`)
158+
- **ALWAYS commit both `pyproject.toml` AND `uv.lock`** together
159+
- Use bounded version ranges: `uv add "package>=x.y,<x.(y+1)"`
160+
- After adding/removing deps: `uv sync --reinstall`
161+
- **NEVER edit `uv.lock` manually** or use `requirements.txt`
162+
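The `x.(y+1)` notation above means "stay within one minor release". As a small sketch of that arithmetic, the helper below builds such a specifier from a version string. The function name and `example-package` are hypothetical and not part of the project.

```python
def bounded_specifier(package: str, version: str) -> str:
    """Build a 'package>=x.y,<x.(y+1)' specifier from an 'x.y' or 'x.y.z' version."""
    # Only the major and minor components are used; a bare 'x' raises ValueError.
    major, minor = (int(part) for part in version.split(".")[:2])
    return f"{package}>={major}.{minor},<{major}.{minor + 1}"


spec = bounded_specifier("example-package", "2.3")
```

The resulting string is what would be passed, quoted, to `uv add`.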
163+
### Python Execution
164+
165+
```bash
# Correct
uv run python app.py
uv run pytest
uv run pyright

# Wrong (bypasses uv's environment management)
python3 app.py
pytest
```
175+
176+
### Type Safety
177+
178+
- **Pyright in `standard` mode** (configured in `pyproject.toml`)
179+
- Type checks enforced by CI, but **NOT on test files** (src/ only)
180+
- **Runtime validation at system boundaries**: FastAPI endpoints use Pydantic models
181+
- Prefer type inference over explicit annotations where clear
182+
- Third-party libraries without stubs treated as `Any`
183+
184+
### Linting Rules (Ruff)
185+
186+
Selected categories (see `pyproject.toml` for full config):
187+
- **E4, E7, E9**: Pycodestyle errors (imports, statements, syntax and I/O errors)
188+
- **F**: Pyflakes (undefined names, unused imports)
189+
- **B**: Flake8-bugbear (mutable defaults, exception handling)
190+
- **T20**: Flake8-print (flags `print()` statements)
191+
- **N**: PEP8-naming conventions
192+
- **ANN**: Flake8-annotations (type annotation discipline)
193+
- **ERA**: Eradicate (no commented-out code)
194+
- **PERF**: Perflint (performance anti-patterns)
195+
196+
**Fixing linting issues:**
197+
- ALWAYS fix manually (never use `ruff check --fix`)
198+
- Use `uv run ruff rule <rule-code>` for explanations
199+
- Autofixes can alter control flow/logic unintentionally
200+
201+
### Formatting (Ruff Formatter)
202+
203+
- Double quotes for strings
204+
- Spaces for indentation (4 spaces)
205+
- Respects magic trailing commas
206+
- Auto-detects line endings (LF/CRLF)
207+
- Does NOT reformat docstring code blocks
208+
- `uv run ruff format` is SAFE (layout only, no logic changes)
209+
210+
### DSPy Usage
211+
212+
- Used for prompt refinement (multi-query expansion) and response generation
213+
- Custom LLM adapters integrate DSPy with NeMo Guardrails
214+
- Optimization modules under `src/optimization/` for tuning prompts/metrics
215+
- Compiled DSPy modules are loaded via `optimized_module_loader.py`
216+
217+
### HashiCorp Vault Integration
218+
219+
- Secrets stored at `secret/users/<user>/<connection_id>/`
220+
- Each connection has `provider`, `environment`, and provider-specific keys
221+
- RSA-2048 encryption layer BEFORE Vault storage
222+
- GUI encrypts with public key; CronManager decrypts with private key
223+
- If Vault is unavailable, degrade gracefully and fail securely
224+
225+
### Logging
226+
227+
- **loguru** for application logging
228+
- Grafana Loki integration for centralized logs
229+
- Use `logger.info()`, `logger.warning()`, `logger.error()` (NOT `print()`)
230+
- Loki logger available at `grafana-configs/loki_logger.py`
231+
232+
### Streaming Responses
233+
234+
- Implemented via Server-Sent Events (SSE) in FastAPI
235+
- `StreamConfig` and `stream_manager` coordinate streaming state
236+
- `stream_response_native()` in response_generator yields tokens
237+
- Timeout handling via `stream_timeout` utility
238+
- Environment-gated: check `STREAMING_ALLOWED_ENVS`
239+
240+
### Configuration Loading
241+
242+
- `PromptConfigurationLoader` fetches prompt configs from Ruuter endpoint
243+
- Cache TTL: `PROMPT_CONFIG_CACHE_TTL`
244+
- Custom prompts per user/organization (stored in Vault/database)
245+
- Fallback to defaults if Ruuter unavailable
246+
247+
### Error Handling
248+
249+
- `generate_error_id()` creates unique error IDs for tracking
250+
- `log_error_with_context()` for structured error logging
251+
- Localized error messages via `get_localized_message()` (multilingual support)
252+
- Predefined message constants in `llm_orchestrator_constants.py`
253+
254+
### Testing Conventions
255+
256+
- Test files under `tests/` (unit, integration, deepeval)
257+
- Integration tests use `testcontainers` for Docker orchestration
258+
- Secrets required for integration tests (Azure OpenAI keys, etc.)
259+
- Mock data in `tests/mocks/` and `tests/data/`
260+
261+
### CI/CD Checks
262+
263+
1. **uv-env-check**: Lockfile vs. pyproject.toml consistency
264+
2. **pyright-type-check**: Type checking on src/ only (Pyright mode as configured in `pyproject.toml`)
265+
3. **ruff-format-check**: Code formatting compliance
266+
4. **ruff-lint-check**: Linting standards
267+
5. **pytest-integration-check**: Full integration tests (requires secrets)
268+
6. **deepeval-tests**: LLM evaluation metrics
269+
7. **gitleaks-check**: Secret detection (pre-commit + CI)
270+
271+
### Pre-commit Hooks
272+
273+
Configured in `.pre-commit-config.yaml`:
274+
- **gitleaks**: Secret scanning
275+
- **uv-lock**: Ensures lockfile consistency
276+
277+
### Constants and Thresholds
278+
279+
Key retrieval constants (`src/vector_indexer/constants.py` and contextual retrieval):
280+
- **Semantic search top-K**: 40 per query
281+
- **Semantic threshold**: 0.4 (chunks with cosine similarity below 0.4 are discarded; described informally as 50-60% alignment)
282+
- **BM25 top-K**: 40 per query
283+
- **Response generation top-K**: 10 chunks (after RRF fusion)
284+
- **Query refinement count**: 5 variations + original = 6 total
285+
- **Search timeout**: 2 seconds per query
286+
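To show how these constants might interact, the sketch below runs several searches in parallel, applies a per-query timeout, filters by the similarity threshold, and caps the result at top-K. The search function is a hypothetical stand-in that returns canned scores, and treating a timed-out query as "no results" is an assumption rather than documented behaviour.

```python
import asyncio

SEMANTIC_TOP_K = 40
SEMANTIC_THRESHOLD = 0.4
SEARCH_TIMEOUT_SECONDS = 2.0

Hit = tuple[str, float]  # (chunk_id, cosine similarity)


async def fake_semantic_search(query: str, delay: float) -> list[Hit]:
    """Hypothetical stand-in for a vector search; hits are sorted best first."""
    await asyncio.sleep(delay)
    return [(f"{query}-a", 0.82), (f"{query}-b", 0.35)]


async def search_one(query: str, delay: float, timeout: float) -> list[Hit]:
    try:
        hits = await asyncio.wait_for(fake_semantic_search(query, delay), timeout)
    except asyncio.TimeoutError:
        return []  # assumption: a slow query contributes nothing instead of failing
    kept = [hit for hit in hits if hit[1] >= SEMANTIC_THRESHOLD]
    return kept[:SEMANTIC_TOP_K]


async def search_all(
    delays: dict[str, float], timeout: float = SEARCH_TIMEOUT_SECONDS
) -> dict[str, list[Hit]]:
    """Run one search per query concurrently and collect results by query."""
    results = await asyncio.gather(
        *(search_one(query, delay, timeout) for query, delay in delays.items())
    )
    return dict(zip(delays, results))


# A short timeout is used here only so the demonstration finishes quickly.
results = asyncio.run(search_all({"fast": 0.0, "slow": 0.2}, timeout=0.05))
```

In this run the `fast` query keeps only the hit scoring 0.82, and the `slow` query exceeds the timeout and returns an empty list.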
287+
### Docker and Services
288+
289+
- Use `docker compose` (not `docker-compose`)
290+
- Services communicate via `bykstack` network
291+
- Shared volumes: `shared-volume`, `cron_data`
292+
- Vault agent containers per service (llm, gui, cron)
293+
- Resource limits: CPU and memory constraints defined in docker-compose.yml
294+
295+
## Important Notes
296+
297+
- **Python version pinned to 3.12.10** (see `pyproject.toml` and `.python-version`)
298+
- **Line length: 88** (Black-compatible, enforced by Ruff)
299+
- **No print() statements** in production code (use loguru logger)
300+
- **Pydantic for runtime validation** at API boundaries (FastAPI endpoints)
301+
- **Langfuse tracing** for observability (public/secret keys from Vault)
302+
- **Rate limiting** via `RateLimiter` utility (token and request budgets)
303+
- **Cost tracking** via `calculate_total_costs()` and budget tracker
304+
- **Language detection** for multilingual support (Estonian primary)
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
---
2+
name: code-review
3+
description: Make sure all Python coding standards in the pyproject.toml file are followed, and that the code is clean, well-structured, maintainable, and efficient. Provide constructive feedback and suggestions for improvement.
4+
---

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ datasets
 logs/
 data_sets
 vault/agent-out
+.vscode/

 # RSA Private Keys - DO NOT COMMIT
 vault/keys/rsa_private_key.pem

GUI/.env.development

Lines changed: 2 additions & 2 deletions
@@ -2,6 +2,6 @@ REACT_APP_RUUTER_API_URL=http://localhost:8086
 REACT_APP_RUUTER_PRIVATE_API_URL=http://localhost:8088
 REACT_APP_CUSTOMER_SERVICE_LOGIN=http://localhost:3004/et/dev-auth
 REACT_APP_SERVICE_ID=conversations,settings,monitoring
-REACT_APP_NOTIFICATION_NODE_URL=http://localhost:3005
-REACT_APP_CSP=upgrade-insecure-requests; default-src 'self'; font-src 'self' data:; img-src 'self' data:; script-src 'self' 'unsafe-eval' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; object-src 'none'; connect-src 'self' http://localhost:8086 http://localhost:8088 http://localhost:3004 http://localhost:3005 ws://localhost;
+REACT_APP_NOTIFICATION_NODE_URL=http://localhost:4040
+REACT_APP_CSP=upgrade-insecure-requests; default-src 'self'; font-src 'self' data:; img-src 'self' data:; script-src 'self' 'unsafe-eval' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; object-src 'none'; connect-src 'self' http://localhost:8086 http://localhost:8088 http://localhost:3004 http://localhost:4040 ws://localhost;
 REACT_APP_ENABLE_HIDDEN_FEATURES=TRUE
