```mermaid
graph TB
    subgraph Frontend["Frontend (React 19 + TypeScript)"]
        Dashboard[Dashboard]
        Playground[Playground]
        AuditUI[Audit Log]
        RulesUI[Rules Mgmt]
    end
    Client[Client App]
    subgraph API["Azure Functions (Python 3.10)"]
        Validate["POST /validate"]
        Ingest["POST /rules/ingest"]
        Rules["GET /rules"]
        Audit["GET /audit"]
        Metrics["GET /metrics"]
    end
    subgraph Core["Core Business Logic (zero Azure imports)"]
        OpenAI[OpenAI Client]
        Engine["Compliance Engine<br/>PII · Bias · Safety · Rules"]
        RAG[RAG Pipeline]
        Logger[Audit Logger]
    end
    subgraph External["External Services"]
        GPT["Azure OpenAI<br/>GPT-4o"]
        FAISS["FAISS Index<br/>(embeddings)"]
        HF["HuggingFace<br/>all-MiniLM-L6-v2"]
        Blob["Azure Blob<br/>Storage"]
    end
    Frontend -->|HTTP| API
    Client -->|POST /validate| Validate
    Validate --> OpenAI
    Validate --> Engine
    Ingest --> RAG
    Audit --> Logger
    Metrics --> Logger
    Rules --> RAG
    OpenAI --> GPT
    Engine -->|validate| RAG
    RAG --> FAISS
    RAG --> HF
    Logger --> Blob
    style Frontend fill:#dbe4ff,stroke:#4a9eed
    style API fill:#e5dbff,stroke:#8b5cf6
    style Core fill:#d3f9d8,stroke:#22c55e
    style External fill:#ffd8a8,stroke:#f59e0b
```
SafeGen is a serverless middleware that sits between client applications and Azure OpenAI. It uses a multi-layer compliance engine with RAG-based policy retrieval to validate LLM outputs against dynamically loaded rule documents.
- Serverless-first — Azure Functions for zero-ops scaling
- Policy-as-data — Compliance rules are documents, not code; update without redeployment
- Audit everything — Every request/response pair logged for regulatory compliance
- Clean Architecture — `core/` has zero Azure Functions imports; business logic is fully testable without the runtime
- Type-safe end-to-end — Pydantic v2 on the backend, strict TypeScript on the frontend, snake_case throughout
```
safegen/
├── backend/
│   ├── function_app.py              # Azure Functions v2 entry point (blueprint registration)
│   ├── requirements.txt             # Python dependencies
│   ├── host.json                    # Azure Functions host config
│   ├── local.settings.example.json  # Template for local settings
│   ├── pyproject.toml               # Ruff + pytest config
│   │
│   ├── core/                        # Business logic (no Azure Functions dependencies)
│   │   ├── models.py                # Pydantic v2: ValidateRequest/Response, ComplianceResult, AuditRecord, MetricsResponse
│   │   ├── openai_client.py         # Azure OpenAI wrapper with GenerationResult dataclass
│   │   ├── rag_pipeline.py          # Text extraction → chunking → embedding → FAISS index
│   │   ├── blob_storage.py          # Azure Blob Storage CRUD with BlobMetadata
│   │   ├── compliance_engine.py     # Orchestrates all validators, computes compliance score
│   │   ├── validators.py            # PIIDetector, BiasChecker, SafetyFilter
│   │   └── audit_logger.py          # Dual-backend audit store (FileAuditStore / BlobAuditStore)
│   │
│   ├── functions/                   # HTTP triggers (thin Blueprint wrappers)
│   │   ├── validate.py              # POST /api/validate (LLM + compliance + audit)
│   │   ├── ingest_rules.py          # POST /api/rules/ingest (file upload → FAISS)
│   │   ├── list_rules.py            # GET /api/rules (list ingested rules)
│   │   ├── audit.py                 # GET /api/audit (paginated, date/status filter)
│   │   └── metrics.py               # GET /api/metrics (aggregated stats, time series)
│   │
│   └── tests/                       # 150 tests, all passing
│       ├── conftest.py              # Shared fixtures: mock_env, mock_openai_client
│       ├── test_models.py           # 17 tests — Pydantic model validation
│       ├── test_openai_client.py    # 7 tests — Azure OpenAI wrapper
│       ├── test_validate.py         # 13 tests — /api/validate endpoint
│       ├── test_rag_pipeline.py     # 16 tests — extract, chunk, embed, FAISS, semantic search
│       ├── test_ingest_rules.py     # 8 tests — /api/rules/ingest endpoint
│       ├── test_compliance_engine.py # 27 tests — scoring, flag aggregation
│       ├── test_validators.py       # 40 tests — PII/bias/safety validators
│       ├── test_audit.py            # 10 tests — /api/audit endpoint
│       ├── test_audit_logger.py     # 6 tests — audit store backends
│       └── test_metrics.py          # 6 tests — /api/metrics endpoint
│
├── frontend/
│   ├── vite.config.ts               # Vite + Tailwind + @/ alias + /api proxy
│   ├── vitest.config.ts             # jsdom test env + path aliases
│   ├── components.json              # shadcn/ui config
│   └── src/
│       ├── App.tsx                  # BrowserRouter + route definitions
│       ├── main.tsx                 # React entry point (StrictMode)
│       ├── index.css                # Tailwind v4 + light/dark tokens
│       ├── types/index.ts           # 1:1 mirror of backend Pydantic models
│       ├── services/api.ts          # Typed fetch wrappers + ApiError class
│       ├── hooks/                   # useApi<T>, useTheme
│       ├── lib/                     # cn(), formatters, constants
│       ├── components/
│       │   ├── ui/                  # 10 shadcn components
│       │   ├── layout/              # Sidebar + Header + AppLayout
│       │   ├── dashboard/           # KpiCard, TrendChart, FlagBreakdownChart, ScoreGauge
│       │   ├── playground/          # PromptInput, ResultPanel, FlagList, ExamplePrompts
│       │   ├── audit/               # AuditFilters, AuditTable, AuditPagination, AuditDetailModal
│       │   └── rules/               # RuleUploader (drag-and-drop), RuleList
│       ├── pages/                   # DashboardPage, PlaygroundPage, AuditPage, RulesPage
│       └── test/                    # 53 tests (setup, mocks, component/page/service tests)
│
├── rules/                           # Sample compliance documents
│   ├── gdpr_content_rules.md
│   ├── bias_detection_policy.md
│   └── pii_handling_rules.md
│
├── ARCHITECTURE.md
├── BUILDPLAN.md
├── ROADMAP.md
└── README.md
```
1. Client sends { prompt, context?, rules_category? }
2. Azure Function receives and validates request (Pydantic)
3. Call Azure OpenAI GPT-4o with prompt → raw LLM response
4. Compliance Engine (sequential layers):
a. PIIDetector — regex for email, phone, SSN, credit card, IPv4
b. BiasChecker — gendered job titles, ableist terms, stereotype patterns
c. SafetyFilter — hate speech, violence instructions, self-harm
d. Score: 1.0 base, -0.3 per critical, -0.1 per warning
5. Audit Logger — write full record (request + response + compliance) to store
6. Return { response, compliance: { passed, score, flags }, model }
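The scoring rule in step 4d can be sketched as a fold over the aggregated flags. This is an illustrative sketch: the `Flag` shape and function name are assumptions, not the exact definitions in `compliance_engine.py`.

```python
# Illustrative sketch of the scoring rule: 1.0 base, -0.3 per critical,
# -0.1 per warning, clamped to [0.0, 1.0].
from dataclasses import dataclass

@dataclass
class Flag:
    layer: str       # e.g. "pii", "bias", "safety"
    severity: str    # "critical" or "warning"
    message: str

PENALTIES = {"critical": 0.3, "warning": 0.1}

def compliance_score(flags: list[Flag]) -> float:
    score = 1.0 - sum(PENALTIES.get(f.severity, 0.0) for f in flags)
    return max(0.0, min(1.0, score))

flags = [
    Flag("pii", "critical", "SSN detected"),
    Flag("bias", "warning", "gendered job title"),
]
print(round(compliance_score(flags), 2))  # 0.6
```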
1. Upload PDF/DOCX/MD/TXT compliance document
2. Extract text (PyMuPDF for PDF, python-docx for DOCX)
3. Chunk into ~500 token segments with 50 token overlap
4. Generate embeddings (HuggingFace all-MiniLM-L6-v2)
5. Add to FAISS index, persist to disk
6. Return { message, chunk_count }
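The chunking in step 3 can be sketched as a sliding window. This sketch approximates tokens with whitespace-separated words; the real `rag_pipeline.py` may count tokens differently.

```python
# Sliding-window chunker: ~500-token chunks with 50-token overlap,
# approximating tokens by whitespace-separated words.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_text(doc)
print(len(chunks))  # 3 chunks: words 0-499, 450-949, 900-999
```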
- DashboardPage → GET /api/metrics → audit store → O(n) aggregation → KPIs + charts
- PlaygroundPage → POST /api/validate → OpenAI + compliance engine → live results
- AuditPage → GET /api/audit → audit store → paginated records → table + modal
- RulesPage → GET /api/rules → FAISS index metadata → rule cards
- RulesPage → POST /api/rules/ingest → file upload → chunk + embed → FAISS
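The O(n) aggregation behind GET /api/metrics amounts to a single pass over audit records. A sketch, assuming record fields mirror the `ComplianceResult` shape (`passed`, `score`, `flags`); the exact field names in `models.py` may differ.

```python
# One-pass aggregation over audit records into KPI-style metrics.
from collections import Counter

def aggregate_metrics(records: list[dict]) -> dict:
    total = len(records)
    passed = sum(1 for r in records if r["passed"])
    flag_counts = Counter(f["layer"] for r in records for f in r["flags"])
    avg_score = sum(r["score"] for r in records) / total if total else 0.0
    return {
        "total_requests": total,
        "pass_rate": passed / total if total else 0.0,
        "avg_score": avg_score,
        "flags_by_layer": dict(flag_counts),
    }

records = [
    {"passed": True, "score": 1.0, "flags": []},
    {"passed": False, "score": 0.7, "flags": [{"layer": "pii"}]},
]
print(aggregate_metrics(records)["pass_rate"])  # 0.5
```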
| Decision | Choice | Rationale |
|---|---|---|
| Serverless runtime | Azure Functions v2 (Python) | Scales to zero; pay-per-use; matches Azure ecosystem |
| Vector store | FAISS (in-memory) | Fast, no infrastructure; sufficient for rule-set sizes (<10k docs) |
| Embeddings | HuggingFace all-MiniLM-L6-v2 | Free, fast, good quality for semantic search |
| Audit storage | Dual-backend (File + Blob) | FileAuditStore for local dev; BlobAuditStore for production |
| Frontend UI | shadcn/ui (copy-paste) | Full control, Tailwind-native, no runtime dependency |
| Types strategy | snake_case everywhere | TypeScript interfaces match JSON responses; no transformation |
| Chart library | Recharts | Composable, React-native, supports area/bar charts out of the box |
| State management | useState + useApi hook | Minimal deps; no React Query needed at current scale |
| API proxy | Vite dev proxy /api → :7071 | No CORS changes needed; clean development experience |
The engine runs validation layers based on `rules_category`:

| Category | Layers Run |
|---|---|
| `all` (default) | PII + Bias + Safety |
| `pii` | PII only |
| `bias` | Bias only |
| `safety` | Safety only |
| `regulatory` | PII + Bias + Safety |
Each layer returns `ValidationFlag` objects with `layer`, `severity`, `message`, and `details`. The engine aggregates these into a `ComplianceResult` with a boolean pass/fail and a numeric score.
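Putting the category table and the flag aggregation together, a minimal sketch of the dispatch. Validators are stubbed as callables returning flag dicts; the exact pass/fail rule lives in `compliance_engine.py`, and this sketch assumes any critical flag fails the request.

```python
# Sketch of rules_category dispatch and flag aggregation.
LAYERS = {
    "all": ["pii", "bias", "safety"],
    "regulatory": ["pii", "bias", "safety"],
    "pii": ["pii"],
    "bias": ["bias"],
    "safety": ["safety"],
}

def run_engine(text: str, validators: dict, category: str = "all") -> dict:
    flags = []
    for layer in LAYERS.get(category, LAYERS["all"]):
        flags.extend(validators[layer](text))
    score = max(0.0, 1.0 - sum(
        0.3 if f["severity"] == "critical" else 0.1 for f in flags))
    return {"passed": not any(f["severity"] == "critical" for f in flags),
            "score": score, "flags": flags}

validators = {
    "pii": lambda t: [{"layer": "pii", "severity": "critical",
                       "message": "SSN"}] if "123-45-6789" in t else [],
    "bias": lambda t: [],
    "safety": lambda t: [],
}
result = run_engine("My SSN is 123-45-6789", validators)
print(result["passed"])  # False
```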
The validators include context-aware exclusions to reduce false positives:
- PII: `example.com` emails excluded; date-like SSN patterns (2024-01-15) excluded; version-like IPs (v1.2.3.4) excluded
- Bias: only flags terms in isolation, not when part of legitimate compound words
- Safety: educational and clinical context detection prevents over-flagging medical/academic content
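A minimal sketch of the email exclusion, using a simplified pattern rather than the exact regexes in `validators.py`:

```python
import re

# Simplified email detector that skips documentation-style example.com
# addresses, illustrating the context-aware exclusion approach.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def find_pii_emails(text: str) -> list[str]:
    return [m for m in EMAIL_RE.findall(text)
            if not m.lower().endswith("@example.com")]

text = "Contact alice@corp.io or see user@example.com in the docs."
print(find_pii_emails(text))  # ['alice@corp.io']
```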
Each HTTP endpoint is an `azure.functions.Blueprint` registered in `function_app.py`:

```python
# functions/validate.py
from azure.functions import Blueprint, HttpRequest, HttpResponse

bp = Blueprint()

@bp.route(route="validate", methods=["POST"])
def validate(req: HttpRequest) -> HttpResponse:
    ...

# function_app.py
from azure.functions import FunctionApp

app = FunctionApp()
app.register_functions(validate_bp)
app.register_functions(ingest_bp)
# ...
```

This keeps each endpoint isolated and testable. To add a new endpoint, create a Blueprint in `functions/` and register it in `function_app.py`.
Module-level singletons for expensive resources:
- OpenAI client (`validate.py`): created on first request, reused across invocations
- Embedding model (`rag_pipeline.py`): sentence-transformers model loaded once, cached
Lazy creation keeps Azure Functions cold starts fast, since nothing heavy loads at import time, while caching gives warm invocations connection reuse.
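The pattern can be sketched as a module-level lazy singleton (names here are illustrative, not the exact ones in `validate.py`):

```python
# Module-level lazy singleton: the expensive client is created on the
# first call and reused by every later invocation in the same worker
# process.
_client = None

def get_openai_client():
    global _client
    if _client is None:
        _client = make_client()  # expensive: auth, connection setup, etc.
    return _client

def make_client():
    return object()  # stand-in for the real Azure OpenAI client

a = get_openai_client()
b = get_openai_client()
print(a is b)  # True
```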