Refactoring Proposal: Large Classes Identified for Decomposition


_I asked OpenHands to review our codebase and search for possible classes that we could simplify. Moreover, I asked it to come up with design patterns that we should implement for that, and I gave a threshold of what constitutes a class that is too big. This was with the idea of reducing technical debt and making life easier in the future when adding new features._

## Summary

A comprehensive code architecture analysis identified **6 classes** that exceed complexity thresholds and should be refactored. These classes have:
- More than 300 lines of code
- More than 7-8 public methods
- Multiple mixed responsibilities
- High change frequency (hotspots)

---

## 📊 Prioritization Matrix

| Priority | Class | File | Lines | Methods | Changes (3mo) | Impact |
|:--------:|-------|------|------:|--------:|--------------:|--------|
| 🔴 1 | `LLM` | `openhands-sdk/.../llm/llm.py` | 1,472 | 37 | 22 | Critical |
| 🟠 2 | `Skill` | `openhands-sdk/.../skills/skill.py` | 1,099 | 28 | 21 | High |
| 🟡 3 | `RemoteConversation` | `openhands-sdk/.../remote_conversation.py` | 1,257 | 66 | 16 | Medium-High |
| 🟡 4 | `LocalConversation` | `openhands-sdk/.../local_conversation.py` | 976 | 23 | 17 | Medium |
| 🟢 5 | `EventService` | `openhands-agent-server/.../event_service.py` | 691 | 40 | 11 | Medium |
| 🟢 6 | `ConversationService` | `openhands-agent-server/.../conversation_service.py` | 751 | 30 | 10 | Lower |

---

## 🔴 Priority 1: LLM Class

**File:** `openhands-sdk/openhands/sdk/llm/llm.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 1,472 | 300 |
| Methods | 37 | 7-8 |
| Dependencies | 15+ | 5-6 |
| Git changes (3mo) | 22 | - |

### Responsibilities (God Class)

1. Configuration management (30+ config fields)
2. API key/authentication handling (AWS, OpenRouter, etc.)
3. Model capability detection (vision, caching, responses API)
4. Message formatting for Chat API
5. Message formatting for Responses API  
6. Completion execution with retry logic
7. Responses API execution with streaming
8. Metrics and telemetry tracking
9. Serialization/deserialization
10. Subscription-based authentication flow

### Code Smells

- **God Class**: 10+ unrelated responsibilities
- **Hardcoded Dependencies**: Creates `Metrics`, `Telemetry`, `_tokenizer` internally
- **Long Methods**: `completion()` ~160 lines, `responses()` ~200 lines
- **Divergent Change**: Auth, formatting, capabilities, metrics all modify this file

### Proposed Solution

#### 1. Extract `LLMCapabilities` class

```python
# New file: openhands/sdk/llm/capabilities.py
class LLMCapabilities:
    """Detects and caches model capabilities."""
    
    def __init__(self, model: str, base_url: str | None, model_info: dict | None): ...
    def supports_vision(self) -> bool: ...
    def supports_prompt_cache(self) -> bool: ...
    def uses_responses_api(self) -> bool: ...
    def validate_context_window(self) -> None: ...
```

**Methods to extract:**
- `_init_model_info_and_caps()`
- `_validate_context_window_size()`
- `vision_is_active()`
- `_supports_vision()`
- `is_caching_prompt_active()`
- `uses_responses_api()`
- `_model_name_for_capabilities()`

#### 2. Extract `MessageFormatter` class

```python
# New file: openhands/sdk/llm/formatter.py
class MessageFormatter:
    """Formats Message objects for different LLM APIs."""
    
    def __init__(self, capabilities: LLMCapabilities, config: FormatterConfig): ...
    def format_for_chat(self, messages: list[Message]) -> list[dict]: ...
    def format_for_responses(self, messages: list[Message]) -> tuple[str | None, list[dict]]: ...
    def apply_prompt_caching(self, messages: list[Message]) -> None: ...
```

**Methods to extract:**
- `format_messages_for_llm()`
- `format_messages_for_responses()`
- `_apply_prompt_caching()`

#### 3. Dependency Injection for Metrics & Telemetry

```python
# Before (hardcoded in validator):
self._metrics = Metrics(model_name=self.model)
self._telemetry = Telemetry(...)

# After (injectable):
class LLM(BaseModel):
    metrics_factory: Callable[[], Metrics] | None = Field(default=None, exclude=True)
    telemetry_factory: Callable[[], Telemetry] | None = Field(default=None, exclude=True)
```

### Target Structure

```
openhands/sdk/llm/
├── llm.py              # ~400 lines (config + orchestration)
├── capabilities.py     # ~150 lines (NEW)
├── formatter.py        # ~200 lines (NEW)
├── utils/
│   ├── metrics.py      # (existing)
│   ├── telemetry.py    # (existing)
│   └── retry_mixin.py  # (existing)
```

---

## 🟠 Priority 2: Skill Module

**File:** `openhands-sdk/openhands/sdk/context/skills/skill.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 1,099 | 300 |
| Methods | 28 | 7-8 |
| Module functions | 10+ | - |
| Git changes (3mo) | 21 | - |

### Responsibilities (God Class)

1. Skill model definition with validation
2. Legacy OpenHands skill loading (`_load_legacy_openhands_skill`)
3. AgentSkills-format loading (`_load_agentskills_skill`)
4. Third-party file handling (`.cursorrules`, `agents.md`)
5. Skill resources management
6. Trigger matching
7. Variable extraction
8. Prompt generation (`to_prompt()`)
9. Directory discovery (`load_skills_from_dir()`, `load_project_skills()`)
10. Marketplace integration (`load_marketplace_skill_names()`, `load_public_skills()`)

### Code Smells

- **God Class**: Too many loading strategies in one class
- **Divergent Change**: AgentSkills format, legacy format, third-party files all modify this
- **Long Method**: `_create_skill_from_metadata()` ~95 lines

### Proposed Solution

#### 1. Strategy Pattern: Skill Loaders

```python
# New: openhands/sdk/context/skills/loaders/__init__.py
from typing import Protocol

class SkillLoader(Protocol):
    def can_load(self, path: Path) -> bool: ...
    def load(self, path: Path, strict: bool) -> Skill: ...

# New: openhands/sdk/context/skills/loaders/agentskills.py
class AgentSkillsLoader(SkillLoader):
    """Loads SKILL.md files following AgentSkills standard."""

# New: openhands/sdk/context/skills/loaders/legacy.py
class LegacyOpenHandsLoader(SkillLoader):
    """Loads legacy .md files with frontmatter."""

# New: openhands/sdk/context/skills/loaders/third_party.py
class ThirdPartyLoader(SkillLoader):
    """Loads .cursorrules, agents.md, etc."""
```

#### 2. Extract `SkillDiscovery` class

```python
# New: openhands/sdk/context/skills/discovery.py
class SkillDiscovery:
    """Discovers and loads skills from various sources."""
    
    def __init__(self, loaders: list[SkillLoader]): ...
    def load_from_directory(self, dir: Path) -> list[Skill]: ...
    def load_project_skills(self, work_dir: Path) -> list[Skill]: ...
    def load_user_skills(self) -> list[Skill]: ...
```

#### 3. Extract `MarketplaceClient` class

```python
# New: openhands/sdk/context/skills/marketplace.py
class MarketplaceClient:
    """Interacts with public skill marketplace."""
    
    def load_skill_names(self) -> list[str]: ...
    def load_public_skills(self, skill_names: list[str] | None) -> list[Skill]: ...
```

### Target Structure

```
openhands/sdk/context/skills/
├── skill.py            # ~200 lines (pure data model)
├── discovery.py        # ~150 lines (NEW)
├── marketplace.py      # ~100 lines (NEW)
├── loaders/            # (NEW)
│   ├── __init__.py
│   ├── agentskills.py
│   ├── legacy.py
│   └── third_party.py
├── utils.py            # (existing)
└── trigger.py          # (existing)
```

---

## 🟡 Priority 3: RemoteConversation

**File:** `openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 1,257 | 300 |
| Methods | 66 (across 4 classes) | 7-8 |
| Git changes (3mo) | 16 | - |

### Issue

Contains 4 classes in one file:
- `WebSocketCallbackClient` (lines 97-220)
- `RemoteEventsList` (lines 221-380)
- `RemoteState` (lines 381-550)
- `RemoteConversation` (lines 551-1257)

### Code Smells

- **Shotgun Surgery**: Protocol changes touch multiple inline classes
- **Long Method**: `__init__` is 175+ lines

### Proposed Solution

Extract to separate modules (pure reorganization, no behavior change):

```
openhands/sdk/conversation/impl/remote/
├── __init__.py         # Re-exports RemoteConversation
├── conversation.py     # RemoteConversation (~350 lines)
├── state.py            # RemoteState (~170 lines)
├── events.py           # RemoteEventsList (~160 lines)
└── websocket.py        # WebSocketCallbackClient (~125 lines)
```

---

## 🟡 Priority 4: LocalConversation

**File:** `openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 976 | 300 |
| Methods | 23 | 7-8 |
| Git changes (3mo) | 17 | - |

### Code Smells

- **Hardcoded Dependencies**: Creates `ConversationState`, `StuckDetector` internally
- **Long Method**: `run()` is ~135 lines
- **Divergent Change**: Plugin logic, agent logic, run loop intertwined

### Proposed Solution

#### 1. Extract `PluginManager` class

```python
# New: openhands/sdk/conversation/plugin_manager.py
class PluginManager:
    """Manages plugin loading and resolution."""
    
    def __init__(self, plugin_specs: list[PluginSource] | None): ...
    def ensure_loaded(self) -> list[ResolvedPluginSource]: ...
    def get_combined_hook_config(self, base_config: HookConfig | None) -> HookConfig: ...
```

#### 2. Dependency Injection

```python
# Add optional factories with backward-compatible defaults
class LocalConversation:
    def __init__(
        self,
        ...,
        state_factory: Callable[..., ConversationState] | None = None,
        stuck_detector_factory: Callable[[ConversationState], StuckDetector] | None = None,
    ): ...
```

---

## 🟢 Priority 5: EventService

**File:** `openhands-agent-server/openhands/agent_server/event_service.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 691 | 300 |
| Methods | 40 | 7-8 |
| Git changes (3mo) | 11 | - |

### Code Smells

- **God Class**: Manages events AND conversation lifecycle
- **Feature Envy**: `generate_title()`, `ask_agent()`, `condense()` delegate to conversation

### Proposed Solution

Apply **Facade Pattern** - split into focused services:

**EventService (keep):**
- `get_event()`, `search_events()`, `count_events()`, `batch_get_events()`
- `subscribe_to_events()`, `unsubscribe_from_events()`
- `send_message()`

**ConversationLifecycleManager (extract):**
- `start()`, `run()`, `pause()`, `close()`
- `generate_title()`, `ask_agent()`, `condense()`
- `set_confirmation_policy()`, `set_security_analyzer()`

---

## 🟢 Priority 6: ConversationService

**File:** `openhands-agent-server/openhands/agent_server/conversation_service.py`

### Current State

| Metric | Value | Threshold |
|--------|------:|----------:|
| Lines | 751 | 300 |
| Methods | 30 | 7-8 |
| Git changes (3mo) | 10 | - |

### Proposed Solution

1. Extract inline subscriber classes to separate module
2. Extract webhook handling to `WebhookManager`

---

## Implementation Plan

### Phase 1: Low-Risk Extractions
- [ ] Extract `LLM.MessageFormatter` (isolated, high impact)
- [ ] Split `remote_conversation.py` into modules (reorganization only)
- [ ] Extract Skill loader strategies

### Phase 2: Dependency Injection
- [ ] Add factory parameters to `LLM` for Metrics/Telemetry
- [ ] Add factory parameters to `LocalConversation` for State/Detector

### Phase 3: Server-Side Refactoring
- [ ] Split `EventService` responsibilities
- [ ] Extract `ConversationService` subscribers

---

## Acceptance Criteria

For each refactoring:
- [ ] No external behavior changes
- [ ] All existing tests pass
- [ ] New unit tests for extracted classes
- [ ] Line count of original class reduced by 30%+
- [ ] Each new class has single responsibility

Priority	Class	File	Lines	Methods	Changes (3mo)	Impact
🔴 1	`LLM`	`openhands-sdk/.../llm/llm.py`	1,472	37	22	Critical
🟠 2	`Skill`	`openhands-sdk/.../skills/skill.py`	1,099	28	21	High
🟡 3	`RemoteConversation`	`openhands-sdk/.../remote_conversation.py`	1,257	66	16	Medium-High
🟡 4	`LocalConversation`	`openhands-sdk/.../local_conversation.py`	976	23	17	Medium
🟢 5	`EventService`	`openhands-agent-server/.../event_service.py`	691	40	11	Medium
🟢 6	`ConversationService`	`openhands-agent-server/.../conversation_service.py`	751	30	10	Lower

Metric	Value	Threshold
Lines	1,472	300
Methods	37	7-8
Dependencies	15+	5-6
Git changes (3mo)	22	-

Refactoring Proposal: Large Classes Identified for Decomposition #2274

Description

Summary

📊 Prioritization Matrix

🔴 Priority 1: LLM Class

Current State

Responsibilities (God Class)

Code Smells

Proposed Solution

1. Extract LLMCapabilities class

2. Extract MessageFormatter class

3. Dependency Injection for Metrics & Telemetry

Target Structure

🟠 Priority 2: Skill Module

Current State

Responsibilities (God Class)

Code Smells

Proposed Solution

1. Strategy Pattern: Skill Loaders

2. Extract SkillDiscovery class

3. Extract MarketplaceClient class

Target Structure

🟡 Priority 3: RemoteConversation

Current State

Issue

Code Smells

Proposed Solution

🟡 Priority 4: LocalConversation

Current State

Code Smells

Proposed Solution

1. Extract PluginManager class

2. Dependency Injection

🟢 Priority 5: EventService

Current State

Code Smells

Proposed Solution

🟢 Priority 6: ConversationService

Current State

Proposed Solution

Implementation Plan

Phase 1: Low-Risk Extractions

Phase 2: Dependency Injection

Phase 3: Server-Side Refactoring

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

1. Extract `LLMCapabilities` class

2. Extract `MessageFormatter` class

2. Extract `SkillDiscovery` class

3. Extract `MarketplaceClient` class

1. Extract `PluginManager` class