Skip to content

Refactoring Proposal: Large Classes Identified for Decomposition #2274

@VascoSch92

Description

@VascoSch92

I asked OpenHands to review our codebase and search for possible classes that we could simplify. Moreover, I asked it to come up with design patterns that we should implement for that, and I gave a threshold of what constitutes a class that is too big. This was with the idea of reducing technical debt and making life easier in the future when adding new features.

Summary

A comprehensive code architecture analysis identified 6 classes that exceed complexity thresholds and should be refactored. These classes have:

  • More than 300 lines of code
  • More than 7-8 public methods
  • Multiple mixed responsibilities
  • High change frequency (hotspots)

📊 Prioritization Matrix

Priority Class File Lines Methods Changes (3mo) Impact
🔴 1 LLM openhands-sdk/.../llm/llm.py 1,472 37 22 Critical
🟠 2 Skill openhands-sdk/.../skills/skill.py 1,099 28 21 High
🟡 3 RemoteConversation openhands-sdk/.../remote_conversation.py 1,257 66 16 Medium-High
🟡 4 LocalConversation openhands-sdk/.../local_conversation.py 976 23 17 Medium
🟢 5 EventService openhands-agent-server/.../event_service.py 691 40 11 Medium
🟢 6 ConversationService openhands-agent-server/.../conversation_service.py 751 30 10 Lower

🔴 Priority 1: LLM Class

File: openhands-sdk/openhands/sdk/llm/llm.py

Current State

Metric Value Threshold
Lines 1,472 300
Methods 37 7-8
Dependencies 15+ 5-6
Git changes (3mo) 22 -

Responsibilities (God Class)

  1. Configuration management (30+ config fields)
  2. API key/authentication handling (AWS, OpenRouter, etc.)
  3. Model capability detection (vision, caching, responses API)
  4. Message formatting for Chat API
  5. Message formatting for Responses API
  6. Completion execution with retry logic
  7. Responses API execution with streaming
  8. Metrics and telemetry tracking
  9. Serialization/deserialization
  10. Subscription-based authentication flow

Code Smells

  • God Class: 10+ unrelated responsibilities
  • Hardcoded Dependencies: Creates Metrics, Telemetry, _tokenizer internally
  • Long Methods: completion() ~160 lines, responses() ~200 lines
  • Divergent Change: Auth, formatting, capabilities, metrics all modify this file

Proposed Solution

1. Extract LLMCapabilities class

# New file: openhands/sdk/llm/capabilities.py
class LLMCapabilities:
    """Detects and caches model capabilities."""
    
    def __init__(self, model: str, base_url: str | None, model_info: dict | None): ...
    def supports_vision(self) -> bool: ...
    def supports_prompt_cache(self) -> bool: ...
    def uses_responses_api(self) -> bool: ...
    def validate_context_window(self) -> None: ...

Methods to extract:

  • _init_model_info_and_caps()
  • _validate_context_window_size()
  • vision_is_active()
  • _supports_vision()
  • is_caching_prompt_active()
  • uses_responses_api()
  • _model_name_for_capabilities()

2. Extract MessageFormatter class

# New file: openhands/sdk/llm/formatter.py
class MessageFormatter:
    """Formats Message objects for different LLM APIs."""
    
    def __init__(self, capabilities: LLMCapabilities, config: FormatterConfig): ...
    def format_for_chat(self, messages: list[Message]) -> list[dict]: ...
    def format_for_responses(self, messages: list[Message]) -> tuple[str | None, list[dict]]: ...
    def apply_prompt_caching(self, messages: list[Message]) -> None: ...

Methods to extract:

  • format_messages_for_llm()
  • format_messages_for_responses()
  • _apply_prompt_caching()

3. Dependency Injection for Metrics & Telemetry

# Before (hardcoded in validator):
self._metrics = Metrics(model_name=self.model)
self._telemetry = Telemetry(...)

# After (injectable):
class LLM(BaseModel):
    metrics_factory: Callable[[], Metrics] | None = Field(default=None, exclude=True)
    telemetry_factory: Callable[[], Telemetry] | None = Field(default=None, exclude=True)

Target Structure

openhands/sdk/llm/
├── llm.py              # ~400 lines (config + orchestration)
├── capabilities.py     # ~150 lines (NEW)
├── formatter.py        # ~200 lines (NEW)
├── utils/
│   ├── metrics.py      # (existing)
│   ├── telemetry.py    # (existing)
│   └── retry_mixin.py  # (existing)

🟠 Priority 2: Skill Module

File: openhands-sdk/openhands/sdk/context/skills/skill.py

Current State

Metric Value Threshold
Lines 1,099 300
Methods 28 7-8
Module functions 10+ -
Git changes (3mo) 21 -

Responsibilities (God Class)

  1. Skill model definition with validation
  2. Legacy OpenHands skill loading (_load_legacy_openhands_skill)
  3. AgentSkills-format loading (_load_agentskills_skill)
  4. Third-party file handling (.cursorrules, agents.md)
  5. Skill resources management
  6. Trigger matching
  7. Variable extraction
  8. Prompt generation (to_prompt())
  9. Directory discovery (load_skills_from_dir(), load_project_skills())
  10. Marketplace integration (load_marketplace_skill_names(), load_public_skills())

Code Smells

  • God Class: Too many loading strategies in one class
  • Divergent Change: AgentSkills format, legacy format, third-party files all modify this
  • Long Method: _create_skill_from_metadata() ~95 lines

Proposed Solution

1. Strategy Pattern: Skill Loaders

# New: openhands/sdk/context/skills/loaders/__init__.py
from typing import Protocol

class SkillLoader(Protocol):
    def can_load(self, path: Path) -> bool: ...
    def load(self, path: Path, strict: bool) -> Skill: ...

# New: openhands/sdk/context/skills/loaders/agentskills.py
class AgentSkillsLoader(SkillLoader):
    """Loads SKILL.md files following AgentSkills standard."""

# New: openhands/sdk/context/skills/loaders/legacy.py
class LegacyOpenHandsLoader(SkillLoader):
    """Loads legacy .md files with frontmatter."""

# New: openhands/sdk/context/skills/loaders/third_party.py
class ThirdPartyLoader(SkillLoader):
    """Loads .cursorrules, agents.md, etc."""

2. Extract SkillDiscovery class

# New: openhands/sdk/context/skills/discovery.py
class SkillDiscovery:
    """Discovers and loads skills from various sources."""
    
    def __init__(self, loaders: list[SkillLoader]): ...
    def load_from_directory(self, dir: Path) -> list[Skill]: ...
    def load_project_skills(self, work_dir: Path) -> list[Skill]: ...
    def load_user_skills(self) -> list[Skill]: ...

3. Extract MarketplaceClient class

# New: openhands/sdk/context/skills/marketplace.py
class MarketplaceClient:
    """Interacts with public skill marketplace."""
    
    def load_skill_names(self) -> list[str]: ...
    def load_public_skills(self, skill_names: list[str] | None) -> list[Skill]: ...

Target Structure

openhands/sdk/context/skills/
├── skill.py            # ~200 lines (pure data model)
├── discovery.py        # ~150 lines (NEW)
├── marketplace.py      # ~100 lines (NEW)
├── loaders/            # (NEW)
│   ├── __init__.py
│   ├── agentskills.py
│   ├── legacy.py
│   └── third_party.py
├── utils.py            # (existing)
└── trigger.py          # (existing)

🟡 Priority 3: RemoteConversation

File: openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py

Current State

Metric Value Threshold
Lines 1,257 300
Methods 66 (across 4 classes) 7-8
Git changes (3mo) 16 -

Issue

Contains 4 classes in one file:

  • WebSocketCallbackClient (lines 97-220)
  • RemoteEventsList (lines 221-380)
  • RemoteState (lines 381-550)
  • RemoteConversation (lines 551-1257)

Code Smells

  • Shotgun Surgery: Protocol changes touch multiple inline classes
  • Long Method: __init__ is 175+ lines

Proposed Solution

Extract to separate modules (pure reorganization, no behavior change):

openhands/sdk/conversation/impl/remote/
├── __init__.py         # Re-exports RemoteConversation
├── conversation.py     # RemoteConversation (~350 lines)
├── state.py            # RemoteState (~170 lines)
├── events.py           # RemoteEventsList (~160 lines)
└── websocket.py        # WebSocketCallbackClient (~125 lines)

🟡 Priority 4: LocalConversation

File: openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

Current State

Metric Value Threshold
Lines 976 300
Methods 23 7-8
Git changes (3mo) 17 -

Code Smells

  • Hardcoded Dependencies: Creates ConversationState, StuckDetector internally
  • Long Method: run() is ~135 lines
  • Divergent Change: Plugin logic, agent logic, run loop intertwined

Proposed Solution

1. Extract PluginManager class

# New: openhands/sdk/conversation/plugin_manager.py
class PluginManager:
    """Manages plugin loading and resolution."""
    
    def __init__(self, plugin_specs: list[PluginSource] | None): ...
    def ensure_loaded(self) -> list[ResolvedPluginSource]: ...
    def get_combined_hook_config(self, base_config: HookConfig | None) -> HookConfig: ...

2. Dependency Injection

# Add optional factories with backward-compatible defaults
class LocalConversation:
    def __init__(
        self,
        ...,
        state_factory: Callable[..., ConversationState] | None = None,
        stuck_detector_factory: Callable[[ConversationState], StuckDetector] | None = None,
    ): ...

🟢 Priority 5: EventService

File: openhands-agent-server/openhands/agent_server/event_service.py

Current State

Metric Value Threshold
Lines 691 300
Methods 40 7-8
Git changes (3mo) 11 -

Code Smells

  • God Class: Manages events AND conversation lifecycle
  • Feature Envy: generate_title(), ask_agent(), condense() delegate to conversation

Proposed Solution

Apply Facade Pattern - split into focused services:

EventService (keep):

  • get_event(), search_events(), count_events(), batch_get_events()
  • subscribe_to_events(), unsubscribe_from_events()
  • send_message()

ConversationLifecycleManager (extract):

  • start(), run(), pause(), close()
  • generate_title(), ask_agent(), condense()
  • set_confirmation_policy(), set_security_analyzer()

🟢 Priority 6: ConversationService

File: openhands-agent-server/openhands/agent_server/conversation_service.py

Current State

Metric Value Threshold
Lines 751 300
Methods 30 7-8
Git changes (3mo) 10 -

Proposed Solution

  1. Extract inline subscriber classes to separate module
  2. Extract webhook handling to WebhookManager

Implementation Plan

Phase 1: Low-Risk Extractions

  • Extract LLM.MessageFormatter (isolated, high impact)
  • Split remote_conversation.py into modules (reorganization only)
  • Extract Skill loader strategies

Phase 2: Dependency Injection

  • Add factory parameters to LLM for Metrics/Telemetry
  • Add factory parameters to LocalConversation for State/Detector

Phase 3: Server-Side Refactoring

  • Split EventService responsibilities
  • Extract ConversationService subscribers

Acceptance Criteria

For each refactoring:

  • No external behavior changes
  • All existing tests pass
  • New unit tests for extracted classes
  • Line count of original class reduced by 30%+
  • Each new class has single responsibility

Metadata

Metadata

Assignees

No one assigned

    Labels

    architectureRelated to core architecture.llmAbout LLMs.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions