Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
183 changes: 118 additions & 65 deletions docs/refactor_plan.md
Original file line number Diff line number Diff line change
@@ -1,65 +1,118 @@
# Refactor Plan for `app/main.py`

## Current pain points
- `app/main.py` mixes HTTP routing, session persistence, AI agent orchestration, realtime voice plumbing, and fallback heuristics inside a single 1,100+ line module, making it hard to reason about or extend.【F:app/main.py†L1-L336】【F:app/main.py†L337-L676】【F:app/main.py†L677-L1023】【F:app/main.py†L1024-L1131】
- Domain concepts (session metadata, voice messaging, evaluation summaries) are implicit dictionaries that are mutated from multiple endpoints, which invites inconsistent shapes and brittle tests.【F:app/main.py†L74-L164】【F:app/main.py†L336-L571】【F:app/main.py†L572-L856】
- OpenAI realtime/voice configuration is hard-coded inside the route, preventing reuse for the upcoming general chat experience and any “team of agents” orchestration with new parameters.【F:app/main.py†L856-L1023】

## Target architecture
1. **API routers**
- Introduce `app/api/__init__.py` and split FastAPI routes into dedicated routers: `sessions.py`, `questions.py`, `evaluations.py`, and `voice.py`. Each router imports thin service functions instead of touching globals directly.
- Mount routers from a slim `app/main.py` (now mostly FastAPI initialization and middleware wiring).

2. **Service layer**
- Create `app/services/session_service.py` encapsulating `_get_session`, `_persist_session_state`, and CRUD helpers. Use dataclasses or Pydantic models for strongly typed session structures to make mutation explicit.
- Move question generation, evaluation, and example-answer fallbacks into `app/services/qa_service.py`, parameterized to accept an injected `InterviewPracticeAgent` (or a more general `AgentTeam` factory for future multi-agent support).
- Extract voice-specific helpers (`_build_voice_instructions`, catalog helpers, preview synthesis) into `app/services/voice_service.py` with a `VoiceConfig` settings object so different features can reuse the same plumbing.
- Encapsulate OpenAI realtime session bootstrapping into `app/services/realtime_client.py`, allowing dependency injection of API clients and customizable parameters for new agent teams or general chat.

3. **Agent orchestration**
- Replace the `active_sessions` global dict with a dedicated `SessionStore` abstraction that composes the existing persistent store and in-memory cache. Expose a strategy interface so the forthcoming “team of agents” feature can register multiple agents per session (e.g., `PrimaryCoach`, `FeedbackCoach`, `GeneralChatCoach`).
- Model a reusable `AgentTeam` concept that aggregates multiple agent instances behind a single facade. Each team exposes lifecycle hooks (`boot`, `delegate`, `summarize`) so the API layer never talks to raw agents.
- Add a lightweight `AgentFactory` that accepts configuration (model, persona, parameters) and returns initialized agents. `start_agent` becomes a service function that registers whichever agents the feature flag requires.
- Define configuration-driven personas (e.g., YAML or Pydantic settings) so product teams can roll out additional agents by adding parameter bundles instead of code changes.

4. **Schemas & DTOs**
- Move all request/response models into `app/schemas/*.py`. Group by concern (e.g., `session.py`, `voice.py`). This keeps the routers concise and centralizes validation logic.
- Provide explicit models for session state slices (e.g., `SessionSummary`, `VoiceTranscript`) so downstream tests can assert structure without brittle dict-key checks.

5. **Configuration**
- Introduce `app/settings.py` or extend `app/config.py` with structured Pydantic settings classes (e.g., `RealtimeSettings`, `AgentSettings`). Dependency injection via FastAPI’s `Depends` can supply the configuration to routers/services, simplifying overrides for new environments.
- Capture the general chat defaults in a `GeneralChatSettings` object (model id, tone, allowed tools) so the new agent team can opt-in without mutating interview-specific values.
- Externalize runtime-tunable parameters (temperature, max tokens, concurrency) for each agent persona to support experimentation.

6. **Observability**
- Instrument the new service layer with structured logs/metrics (agent decision traces, queue times) to validate multi-agent coordination.
- Emit tracing spans around agent-team delegation to pinpoint latency regressions when general chat launches.

7. **Extensibility for additional teams**
- Ensure routers/services accept an injected `AgentTeamResolver` that selects the right team based on feature flags, session metadata, or tenant configuration.
- Provide clear extension points for future teams (e.g., skill-assessment, onboarding) so new personas can ship without touching existing flows.

## Testing strategy
1. **Unit tests**
- Add tests for the new service modules (`tests/services/test_session_service.py`, `test_voice_service.py`) that validate business rules such as transcript aggregation, session naming, and fallback scoring.
- Use dependency injection to pass fake agents into `qa_service` tests to cover both agent and fallback paths without live API calls.
- Mock the HTTP client in `realtime_client` tests to validate payload composition, ensuring the new agent parameters are forwarded correctly.

2. **Integration tests**
- With routers split, write FastAPI `TestClient` tests per router module (e.g., `tests/api/test_voice_routes.py`) asserting HTTP status codes, validation errors, and session state mutations.
- Introduce fixtures for `SessionStore` that start with seeded session data to test rename/delete flows and verify persistence hooks.
- Add general-chat scenarios that assert the correct agent team is selected and that team-specific parameters (model, style guides) propagate to downstream services.

3. **End-to-end contract**
- Keep a small set of smoke tests hitting the high-level flows (`upload -> generate -> evaluate`). These should mock outbound HTTP calls but run against the assembled FastAPI app to ensure routers and middleware wiring remain intact.
- Add a general-chat smoke test covering `start chat -> exchange messages -> summarize` to detect regressions in the agent-team orchestration.

## Incremental adoption plan
1. Extract session helper functions into `app/services/session_service.py` and refactor a single router (e.g., `/sessions` endpoints) to use it. Add unit tests for the new service.
2. Move the question/evaluation endpoints into `app/api/questions.py` + `qa_service`. Introduce agent factory abstraction and tests for fallback scoring logic.
3. Carve out voice-specific routes and helpers into `voice.py` + `voice_service` and `realtime_client`. Backfill catalog/preview tests using local fixtures.
4. Once services are covered, slim down `app/main.py` to FastAPI initialization, router registration, and the `if __name__ == "__main__"` block.
5. Introduce new general-chat feature module that composes the shared services and registers additional agent teams by supplying different agent factory parameters.
6. Ship the general-chat configuration behind a feature flag, validate telemetry, and then document how partner teams can add new agent personas via configuration plus targeted service tests.

This staged approach keeps the app deployable while enabling the upcoming agent-team functionality and richer configuration without rewriting everything at once.
# Refactor Plan for Multi-Agent General Chat Expansion

This document captures the refactor strategy needed to evolve the current interview-practice application into a compartmentalized platform that can host multiple **agent teams** (e.g., the existing interview coach plus a new general chat team for Veneo Inc.) with configurable parameters.

## 1. Context and goals

* `app/main.py` currently centralizes HTTP routing, session state management, agent orchestration, and voice/realtime plumbing in a single file of more than 1,100 lines.【F:app/main.py†L1-L1131】
* Domain objects (session metadata, transcripts, evaluation summaries) are shaped as loosely typed dictionaries shared across endpoints, making it difficult to safely add new personas or parameter sets.【F:app/main.py†L74-L1023】
* The new requirement—introducing another team of agents that supports a distinct parameter bundle for Veneo's general chat—requires clearer seams for routing, state, configuration, and agent instantiation so that features can coexist without interference.

**Primary objectives**

1. Decouple HTTP concerns from domain logic so additional features (general chat, future agent teams) can ship independently.
2. Provide explicit data contracts and persistence boundaries for session state to prevent cross-feature regressions.
3. Introduce an extensible agent-team orchestration layer that can register multiple personas per session and surface configuration-driven parameters.
4. Preserve existing interview flows while enabling an opt-in Veneo general chat experience behind feature gates.

## 2. Observed pain points (code review summary)

| Area | Issue | Impact on new agent team |
| --- | --- | --- |
| Routing (`app/main.py`) | All endpoints (upload, questions, evaluation, voice, realtime) live in one module, interleaving FastAPI routing with business logic and cross-cutting concerns.【F:app/main.py†L1-L571】【F:app/main.py†L572-L1023】 | Hard to isolate changes for Veneo chat without risking regressions in interview flows. |
| Session state | Global `active_sessions` dict and helper functions mutate nested dicts in place.【F:app/main.py†L74-L388】 | No schema enforcement; adding a general chat session risks type mismatches and race conditions. |
| Agent orchestration | The `InterviewPracticeAgent` is instantiated and invoked inside route handlers with hard-coded model/temperature parameters.【F:app/main.py†L336-L856】 | Cannot register multiple agents or swap parameter bundles per customer. |
| Realtime/voice setup | Realtime connection payloads and voice catalog helpers are baked into the route logic.【F:app/main.py†L856-L1023】 | Reuse for general chat would require duplicating code or branching logic inline. |

## 3. Target compartmentalized architecture

### 3.1 API boundary

* Create `app/api/` package with routers grouped by concern: `sessions.py`, `questions.py`, `evaluations.py`, `voice.py`, and `chat.py` (new for Veneo general chat).
* Slim down `app/main.py` to FastAPI initialization, dependency wiring, and router inclusion. This file should not hold business rules.
* Each router should delegate to service-layer functions and operate solely on typed request/response models.

### 3.2 Service layer

* **SessionService (`app/services/session_service.py`)**: encapsulate CRUD operations on session state; expose methods like `create_session`, `get_session`, `update_transcript`. Internally depend on a `SessionStore` abstraction.
* **QAService (`app/services/qa_service.py`)**: orchestrate question generation, evaluation, and fallback heuristics. Accept injected agents via an `AgentTeam` facade so both interview and general chat flows can share infrastructure while swapping personas/parameters.
* **ChatService (`app/services/chat_service.py`)**: handle Veneo-specific conversation flows, applying their parameter bundle, conversation memory rules, and guardrails.
* **VoiceService & RealtimeClient**: extract voice catalog lookup, preview synthesis, and realtime session bootstrapping into reusable modules to prevent duplication between interview and general chat experiences.

### 3.3 Agent team orchestration

* Model an `AgentTeam` interface exposing lifecycle hooks (`boot`, `delegate`, `summarize`). Each feature registers its team composition (e.g., Veneo general chat might use `PrimaryResponder` + `SafetyReviewer`).
* Implement an `AgentFactory` that accepts persona definitions (model, parameters, prompt templates, safety settings) and returns configured agent instances. Factor out repeated configuration to support future teams without modifying routes.
* Introduce an `AgentTeamResolver` responsible for selecting the appropriate team per session based on feature flags, tenant metadata, or request payloads.
* Ensure the resolver is injected into services/routers via FastAPI dependencies to keep the API layer declarative.

### 3.4 Data contracts and persistence

* Define Pydantic models under `app/schemas/` for session state slices (e.g., `SessionSummary`, `TranscriptEntry`, `AgentRunConfig`).
* Replace dictionary mutation with typed updates to guarantee compatibility when multiple features read/write the same session record.
* Consider persisting session data via an interface (in-memory + optional durable backend) to support parallel development of new agent teams without reworking storage.

### 3.5 Configuration management

* Create `app/settings.py` (Pydantic `BaseSettings`) containing structured configuration classes: `AgentSettings`, `RealtimeSettings`, `VoiceSettings`, and feature-specific `GeneralChatSettings` for Veneo.
* Store persona/parameter bundles in configuration (YAML or JSON) to enable non-engineering teams to adjust prompts, temperatures, and tool permissions.
* Use FastAPI dependency injection to fetch settings per request, enabling environment overrides and A/B experiments.

### 3.6 Observability and compliance

* Instrument the new services with structured logging around agent delegation, latency, and error handling so that introducing additional teams remains debuggable.
* Emit metrics/traces for team selection decisions to verify that Veneo tenants route to the general chat persona while others stay on the interview coach.
* Centralize audit logging (message content, agent responses, safety filters) to satisfy enterprise compliance requirements when multiple agent teams coexist.

## 4. Implementation roadmap

1. **Lay the foundation**
* Add `app/api/`, `app/services/`, `app/schemas/`, and `app/settings.py` modules.
* Introduce `SessionStore` abstraction with unit tests to cover basic CRUD plus concurrency edge cases.

2. **Extract session and QA flows**
* Move existing session and question/evaluation logic into `SessionService` and `QAService` respectively.
* Update existing routes to use the services while keeping their public API unchanged; add targeted tests.

3. **Introduce agent-team infrastructure**
* Implement `AgentFactory`, `AgentTeam`, and `AgentTeamResolver`.
* Wrap current `InterviewPracticeAgent` usage into an interview-specific team for backwards compatibility.

4. **Enable Veneo general chat**
* Create `ChatService` and `app/api/chat.py` router exposing the general chat endpoints.
* Define Veneo configuration bundle and register it with the resolver behind a feature flag.
* Verify routing/tests ensure only opted-in sessions receive the new team.

5. **Refine realtime & voice modules**
* Extract voice/realtime helpers into dedicated services.
* Update both interview and general chat flows to consume the shared implementations.

6. **Hardening & rollout**
* Expand test coverage (unit + integration + smoke tests) for both agent teams.
* Instrument telemetry dashboards; conduct load testing to ensure the resolver/agent factory scale.
* Document extension points for additional teams and finalize migration guide.

## 5. Testing strategy

* **Unit tests**: cover service classes, resolver logic, and agent factory configuration parsing. Use fakes/mocks for agents to avoid live API calls.
* **Integration tests**: FastAPI `TestClient` suites per router validating HTTP contracts, feature flag behaviour, and session persistence.
* **Contract/smoke tests**: orchestrate end-to-end flows (upload → generate → evaluate, start general chat → exchange messages → summarize) with mocked outbound calls to ensure the assembled app behaves correctly.

## 6. Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| Refactor touches many files simultaneously | Ship in the staged roadmap above with feature-flagged Veneo chat; keep tests green at each step. |
| Configuration drift between agent teams | Centralize persona definitions in configuration files validated by CI (schema linting). |
| Session schema incompatibilities | Use versioned Pydantic models and migration helpers when storing sessions. |
| Performance regressions with multi-agent orchestration | Add tracing and load tests to monitor latency; allow teams to toggle agents per feature until tuning is complete. |

## 7. Definition of done

* `app/main.py` reduced to bootstrapper with <200 lines and no embedded business logic.
* All routes reside under `app/api/` and rely on typed schemas and service abstractions.
* `SessionStore`, `AgentFactory`, and `AgentTeamResolver` support both interview and Veneo general chat flows, with tests demonstrating persona selection and parameter propagation.
* Configuration-driven persona bundles documented and validated in CI.
* Observability dashboards updated to reflect multi-agent metrics.

Executing this plan will compartmentalize the application, enabling Veneo's general chat team—and future agent teams—to plug into a consistent architecture without destabilizing existing interview functionality.