diff --git a/docs/refactor_plan.md b/docs/refactor_plan.md index edf8df8..8d83bac 100644 --- a/docs/refactor_plan.md +++ b/docs/refactor_plan.md @@ -1,65 +1,118 @@ -# Refactor Plan for `app/main.py` - -## Current pain points -- `app/main.py` mixes HTTP routing, session persistence, AI agent orchestration, realtime voice plumbing, and fallback heuristics inside a single 1,100+ line module, making it hard to reason about or extend.【F:app/main.py†L1-L336】【F:app/main.py†L337-L676】【F:app/main.py†L677-L1023】【F:app/main.py†L1024-L1131】 -- Domain concepts (session metadata, voice messaging, evaluation summaries) are implicit dictionaries that are mutated from multiple endpoints, which invites inconsistent shapes and brittle tests.【F:app/main.py†L74-L164】【F:app/main.py†L336-L571】【F:app/main.py†L572-L856】 -- OpenAI realtime/voice configuration is hard-coded inside the route, preventing reuse for the upcoming general chat experience and any “team of agents” orchestration with new parameters.【F:app/main.py†L856-L1023】 - -## Target architecture -1. **API routers** - - Introduce `app/api/__init__.py` and split FastAPI routes into dedicated routers: `sessions.py`, `questions.py`, `evaluations.py`, and `voice.py`. Each router imports thin service functions instead of touching globals directly. - - Mount routers from a slim `app/main.py` (now mostly FastAPI initialization and middleware wiring). - -2. **Service layer** - - Create `app/services/session_service.py` encapsulating `_get_session`, `_persist_session_state`, and CRUD helpers. Use dataclasses or Pydantic models for strongly typed session structures to make mutation explicit. - - Move question generation, evaluation, and example-answer fallbacks into `app/services/qa_service.py`, parameterized to accept an injected `InterviewPracticeAgent` (or a more general `AgentTeam` factory for future multi-agent support). - - Extract voice-specific helpers (`_build_voice_instructions`, catalog helpers, preview synthesis) into `app/services/voice_service.py` with a `VoiceConfig` settings object so different features can reuse the same plumbing. - - Encapsulate OpenAI realtime session bootstrapping into `app/services/realtime_client.py`, allowing dependency injection of API clients and customizable parameters for new agent teams or general chat. - -3. **Agent orchestration** - - Replace the `active_sessions` global dict with a dedicated `SessionStore` abstraction that composes the existing persistent store and in-memory cache. Expose a strategy interface so the forthcoming “team of agents” feature can register multiple agents per session (e.g., `PrimaryCoach`, `FeedbackCoach`, `GeneralChatCoach`). - - Model a reusable `AgentTeam` concept that aggregates multiple agent instances behind a single facade. Each team exposes lifecycle hooks (`boot`, `delegate`, `summarize`) so the API layer never talks to raw agents. - - Add a lightweight `AgentFactory` that accepts configuration (model, persona, parameters) and returns initialized agents. `start_agent` becomes a service function that registers whichever agents the feature flag requires. - - Define configuration-driven personas (e.g., YAML or Pydantic settings) so product teams can roll out additional agents by adding parameter bundles instead of code changes. - -4. **Schemas & DTOs** - - Move all request/response models into `app/schemas/*.py`. Group by concern (e.g., `session.py`, `voice.py`). This keeps the routers concise and centralizes validation logic. - - Provide explicit models for session state slices (e.g., `SessionSummary`, `VoiceTranscript`) so downstream tests can assert structure without brittle dict-key checks. - -5. **Configuration** - - Introduce `app/settings.py` or extend `app/config.py` with structured Pydantic settings classes (e.g., `RealtimeSettings`, `AgentSettings`). Dependency injection via FastAPI’s `Depends` can supply the configuration to routers/services, simplifying overrides for new environments. - - Capture the general chat defaults in a `GeneralChatSettings` object (model id, tone, allowed tools) so the new agent team can opt-in without mutating interview-specific values. - - Externalize runtime-tunable parameters (temperature, max tokens, concurrency) for each agent persona to support experimentation. - -6. **Observability** - - Instrument the new service layer with structured logs/metrics (agent decision traces, queue times) to validate multi-agent coordination. - - Emit tracing spans around agent-team delegation to pinpoint latency regressions when general chat launches. - -7. **Extensibility for additional teams** - - Ensure routers/services accept an injected `AgentTeamResolver` that selects the right team based on feature flags, session metadata, or tenant configuration. - - Provide clear extension points for future teams (e.g., skill-assessment, onboarding) so new personas can ship without touching existing flows. - -## Testing strategy -1. **Unit tests** - - Add tests for the new service modules (`tests/services/test_session_service.py`, `test_voice_service.py`) that validate business rules such as transcript aggregation, session naming, and fallback scoring. - - Use dependency injection to pass fake agents into `qa_service` tests to cover both agent and fallback paths without live API calls. - - Mock the HTTP client in `realtime_client` tests to validate payload composition, ensuring the new agent parameters are forwarded correctly. - -2. **Integration tests** - - With routers split, write FastAPI `TestClient` tests per router module (e.g., `tests/api/test_voice_routes.py`) asserting HTTP status codes, validation errors, and session state mutations. - - Introduce fixtures for `SessionStore` that start with seeded session data to test rename/delete flows and verify persistence hooks. - - Add general-chat scenarios that assert the correct agent team is selected and that team-specific parameters (model, style guides) propagate to downstream services. - -3. **End-to-end contract** - - Keep a small set of smoke tests hitting the high-level flows (`upload -> generate -> evaluate`). These should mock outbound HTTP calls but run against the assembled FastAPI app to ensure routers and middleware wiring remain intact. - - Add a general-chat smoke test covering `start chat -> exchange messages -> summarize` to detect regressions in the agent-team orchestration. - -## Incremental adoption plan -1. Extract session helper functions into `app/services/session_service.py` and refactor a single router (e.g., `/sessions` endpoints) to use it. Add unit tests for the new service. -2. Move the question/evaluation endpoints into `app/api/questions.py` + `qa_service`. Introduce agent factory abstraction and tests for fallback scoring logic. -3. Carve out voice-specific routes and helpers into `voice.py` + `voice_service` and `realtime_client`. Backfill catalog/preview tests using local fixtures. -4. Once services are covered, slim down `app/main.py` to FastAPI initialization, router registration, and the `if __name__ == "__main__"` block. -5. Introduce new general-chat feature module that composes the shared services and registers additional agent teams by supplying different agent factory parameters. -6. Ship the general-chat configuration behind a feature flag, validate telemetry, and then document how partner teams can add new agent personas via configuration plus targeted service tests. - -This staged approach keeps the app deployable while enabling the upcoming agent-team functionality and richer configuration without rewriting everything at once. +# Refactor Plan for Multi-Agent General Chat Expansion + +This document captures the refactor strategy needed to evolve the current interview-practice application into a compartmentalized platform that can host multiple **agent teams** (e.g., the existing interview coach plus a new general chat team for Veneo Inc.) with configurable parameters. + +## 1. Context and goals + +* `app/main.py` currently centralizes HTTP routing, session state management, agent orchestration, and voice/realtime plumbing in a single file of more than 1,100 lines.【F:app/main.py†L1-L1131】 +* Domain objects (session metadata, transcripts, evaluation summaries) are shaped as loosely typed dictionaries shared across endpoints, making it difficult to safely add new personas or parameter sets.【F:app/main.py†L74-L1023】 +* The new requirement—introducing another team of agents that supports a distinct parameter bundle for Veneo's general chat—requires clearer seams for routing, state, configuration, and agent instantiation so that features can coexist without interference. + +**Primary objectives** + +1. Decouple HTTP concerns from domain logic so additional features (general chat, future agent teams) can ship independently. +2. Provide explicit data contracts and persistence boundaries for session state to prevent cross-feature regressions. +3. Introduce an extensible agent-team orchestration layer that can register multiple personas per session and surface configuration-driven parameters. +4. Preserve existing interview flows while enabling an opt-in Veneo general chat experience behind feature gates. + +## 2. Observed pain points (code review summary) + +| Area | Issue | Impact on new agent team | +| --- | --- | --- | +| Routing (`app/main.py`) | All endpoints (upload, questions, evaluation, voice, realtime) live in one module, interleaving FastAPI routing with business logic and cross-cutting concerns.【F:app/main.py†L1-L571】【F:app/main.py†L572-L1023】 | Hard to isolate changes for Veneo chat without risking regressions in interview flows. | +| Session state | Global `active_sessions` dict and helper functions mutate nested dicts in place.【F:app/main.py†L74-L388】 | No schema enforcement; adding a general chat session risks type mismatches and race conditions. | +| Agent orchestration | The `InterviewPracticeAgent` is instantiated and invoked inside route handlers with hard-coded model/temperature parameters.【F:app/main.py†L336-L856】 | Cannot register multiple agents or swap parameter bundles per customer. | +| Realtime/voice setup | Realtime connection payloads and voice catalog helpers are baked into the route logic.【F:app/main.py†L856-L1023】 | Reuse for general chat would require duplicating code or branching logic inline. | + +## 3. Target compartmentalized architecture + +### 3.1 API boundary + +* Create `app/api/` package with routers grouped by concern: `sessions.py`, `questions.py`, `evaluations.py`, `voice.py`, and `chat.py` (new for Veneo general chat). +* Slim down `app/main.py` to FastAPI initialization, dependency wiring, and router inclusion. This file should not hold business rules. +* Each router should delegate to service-layer functions and operate solely on typed request/response models. + +### 3.2 Service layer + +* **SessionService (`app/services/session_service.py`)**: encapsulate CRUD operations on session state; expose methods like `create_session`, `get_session`, `update_transcript`. Internally depend on a `SessionStore` abstraction. +* **QAService (`app/services/qa_service.py`)**: orchestrate question generation, evaluation, and fallback heuristics. Accept injected agents via an `AgentTeam` facade so both interview and general chat flows can share infrastructure while swapping personas/parameters. +* **ChatService (`app/services/chat_service.py`)**: handle Veneo-specific conversation flows, applying their parameter bundle, conversation memory rules, and guardrails. +* **VoiceService & RealtimeClient**: extract voice catalog lookup, preview synthesis, and realtime session bootstrapping into reusable modules to prevent duplication between interview and general chat experiences. + +### 3.3 Agent team orchestration + +* Model an `AgentTeam` interface exposing lifecycle hooks (`boot`, `delegate`, `summarize`). Each feature registers its team composition (e.g., Veneo general chat might use `PrimaryResponder` + `SafetyReviewer`). +* Implement an `AgentFactory` that accepts persona definitions (model, parameters, prompt templates, safety settings) and returns configured agent instances. Factor out repeated configuration to support future teams without modifying routes. +* Introduce an `AgentTeamResolver` responsible for selecting the appropriate team per session based on feature flags, tenant metadata, or request payloads. +* Ensure the resolver is injected into services/routers via FastAPI dependencies to keep the API layer declarative. + +### 3.4 Data contracts and persistence + +* Define Pydantic models under `app/schemas/` for session state slices (e.g., `SessionSummary`, `TranscriptEntry`, `AgentRunConfig`). +* Replace dictionary mutation with typed updates to guarantee compatibility when multiple features read/write the same session record. +* Consider persisting session data via an interface (in-memory + optional durable backend) to support parallel development of new agent teams without reworking storage. + +### 3.5 Configuration management + +* Create `app/settings.py` (Pydantic `BaseSettings`) containing structured configuration classes: `AgentSettings`, `RealtimeSettings`, `VoiceSettings`, and feature-specific `GeneralChatSettings` for Veneo. +* Store persona/parameter bundles in configuration (YAML or JSON) to enable non-engineering teams to adjust prompts, temperatures, and tool permissions. +* Use FastAPI dependency injection to fetch settings per request, enabling environment overrides and A/B experiments. + +### 3.6 Observability and compliance + +* Instrument the new services with structured logging around agent delegation, latency, and error handling so that introducing additional teams remains debuggable. +* Emit metrics/traces for team selection decisions to verify that Veneo tenants route to the general chat persona while others stay on the interview coach. +* Centralize audit logging (message content, agent responses, safety filters) to satisfy enterprise compliance requirements when multiple agent teams coexist. + +## 4. Implementation roadmap + +1. **Lay the foundation** + * Add `app/api/`, `app/services/`, `app/schemas/`, and `app/settings.py` modules. + * Introduce `SessionStore` abstraction with unit tests to cover basic CRUD plus concurrency edge cases. + +2. **Extract session and QA flows** + * Move existing session and question/evaluation logic into `SessionService` and `QAService` respectively. + * Update existing routes to use the services while keeping their public API unchanged; add targeted tests. + +3. **Introduce agent-team infrastructure** + * Implement `AgentFactory`, `AgentTeam`, and `AgentTeamResolver`. + * Wrap current `InterviewPracticeAgent` usage into an interview-specific team for backwards compatibility. + +4. **Enable Veneo general chat** + * Create `ChatService` and `app/api/chat.py` router exposing the general chat endpoints. + * Define Veneo configuration bundle and register it with the resolver behind a feature flag. + * Verify routing/tests ensure only opted-in sessions receive the new team. + +5. **Refine realtime & voice modules** + * Extract voice/realtime helpers into dedicated services. + * Update both interview and general chat flows to consume the shared implementations. + +6. **Hardening & rollout** + * Expand test coverage (unit + integration + smoke tests) for both agent teams. + * Instrument telemetry dashboards; conduct load testing to ensure the resolver/agent factory scale. + * Document extension points for additional teams and finalize migration guide. + +## 5. Testing strategy + +* **Unit tests**: cover service classes, resolver logic, and agent factory configuration parsing. Use fakes/mocks for agents to avoid live API calls. +* **Integration tests**: FastAPI `TestClient` suites per router validating HTTP contracts, feature flag behaviour, and session persistence. +* **Contract/smoke tests**: orchestrate end-to-end flows (upload → generate → evaluate, start general chat → exchange messages → summarize) with mocked outbound calls to ensure the assembled app behaves correctly. + +## 6. Risks and mitigations + +| Risk | Mitigation | +| --- | --- | +| Refactor touches many files simultaneously | Ship in the staged roadmap above with feature-flagged Veneo chat; keep tests green at each step. | +| Configuration drift between agent teams | Centralize persona definitions in configuration files validated by CI (schema linting). | +| Session schema incompatibilities | Use versioned Pydantic models and migration helpers when storing sessions. | +| Performance regressions with multi-agent orchestration | Add tracing and load tests to monitor latency; allow teams to toggle agents per feature until tuning is complete. | + +## 7. Definition of done + +* `app/main.py` reduced to bootstrapper with <200 lines and no embedded business logic. +* All routes reside under `app/api/` and rely on typed schemas and service abstractions. +* `SessionStore`, `AgentFactory`, and `AgentTeamResolver` support both interview and Veneo general chat flows, with tests demonstrating persona selection and parameter propagation. +* Configuration-driven persona bundles documented and validated in CI. +* Observability dashboards updated to reflect multi-agent metrics. + +Executing this plan will compartmentalize the application, enabling Veneo's general chat team—and future agent teams—to plug into a consistent architecture without destabilizing existing interview functionality.