feat: Add conversation.interrupt() for immediate LLM cancellation (#2206)
Implement Option 3 (Async Internally with Sync API) for interrupting agent execution with immediate LLM cancellation:

- Add LLMCancelledError exception for cancelled LLM calls
- Add InterruptEvent for visibility when agent is interrupted
- Modify LLM class to use async completion internally (litellm.acompletion) while maintaining the sync API, enabling Task.cancel() for immediate cancellation
- Add LLM.cancel() method that can be called from any thread
- Add conversation.interrupt() method that cancels LLM and sets PAUSED status
- Add comprehensive unit tests for interrupt functionality

The async internal implementation allows:

- Instant cancellation (no polling delay)
- HTTP connection closure on cancel
- Sync API preserved (no breaking changes)

Co-authored-by: openhands <openhands@all-hands.dev>
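The pattern this commit describes (a background event-loop thread driving the async call behind a sync API, cancellable from any thread) can be sketched roughly as follows. `AsyncRunner` and its method names here are illustrative, not the SDK's exact code:

```python
import asyncio
import concurrent.futures
import threading


class AsyncRunner:
    """Minimal sketch of the "async internally, sync API" pattern: a daemon
    thread owns an event loop; sync callers submit coroutines with run() and
    any other thread can cancel the in-flight call via cancel()."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._lock = threading.Lock()
        self._current: concurrent.futures.Future | None = None

    def run(self, coro):
        # Schedule the coroutine on the background loop and block the
        # sync caller until it finishes (or is cancelled).
        with self._lock:
            fut = asyncio.run_coroutine_threadsafe(coro, self._loop)
            self._current = fut
        try:
            return fut.result()
        finally:
            with self._lock:
                self._current = None

    def cancel(self) -> bool:
        # Safe from any thread: cancelling the concurrent Future chains a
        # cancellation to the underlying asyncio task on the loop.
        with self._lock:
            return self._current.cancel() if self._current is not None else False

    def close(self) -> None:
        # Stop the loop and join the thread.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=2)
```

A cancelled call surfaces to the sync caller as a `CancelledError` raised out of `fut.result()`, which an outer layer could translate into something like the PR's `LLMCancelledError`.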
Add example script that shows how to use conversation.interrupt() to immediately cancel in-flight LLM calls when the user presses Ctrl+C.

Features demonstrated:

- Signal handling for Ctrl+C
- conversation.interrupt() for immediate cancellation
- Timing measurement from Ctrl+C to full stop
- Complex reasoning task that benefits from interrupt capability

Co-authored-by: openhands <openhands@all-hands.dev>
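The shape of that example's signal handling can be sketched as below; `install_interrupt_handler` and the timing dict are hypothetical helpers, and the `conversation` object is only assumed to expose the `interrupt()` method this PR adds:

```python
import signal
import time


def install_interrupt_handler(conversation):
    """Hypothetical sketch: register a SIGINT handler that calls
    conversation.interrupt() and records when Ctrl+C arrived, so the
    time from keypress to full stop can be measured afterwards."""
    timing = {"pressed_at": None}

    def on_sigint(signum, frame):
        timing["pressed_at"] = time.monotonic()
        conversation.interrupt()  # cancels the in-flight LLM call

    signal.signal(signal.SIGINT, on_sigint)
    return timing
```

After `conversation.run(...)` returns (or raises), the caller can compute `time.monotonic() - timing["pressed_at"]` to report the cancellation latency.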
API breakage checks (Griffe): Passed
enyst left a comment:
Thank you!
🤔 I always thought that when we do this, we really should make an acompletion ourselves: an async path, following the usual code design patterns of async alternatives to sync methods/execution paths.
I think I heard concerns that async ran amok in V0 and that therefore we're too worried to let it happen again. I don't think those concerns are warranted in this case, if we can follow proven ways to make it work reasonably. Hey, this is not the SaaS.
Just food for thought. I think there's a bit of confusion somewhere between the SaaS adding asyncs everywhere and making async pathways the way every Python app out there does, but maybe it's fine the way the PR goes, too.
- Add interrupt() method to RemoteConversation
- Add __deepcopy__ to LLM to handle unpicklable thread state
- Update test mocks from litellm_completion to litellm_acompletion
- Fix streaming tests to use async iterators
- Add interrupt() method to MockConversation in tests
- Keep litellm_completion_cost as sync (not async)

All 2515 tests pass (2448 SDK + 67 cross tests).

Co-authored-by: openhands <openhands@all-hands.dev>
Yeah, I think there is overhead in maintaining both sync and async contracts - ideally we'd have both, though. WRT SaaS influencing the sync-only APIs in the SDK, @tofarr has the best context.
all-hands-bot left a comment:
🟡 Acceptable - Worth Merging with Considerations
Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model is reasonably simple, but adds non-trivial complexity to the LLM class.
Key Insight: You're maintaining a sync API by running async internally with background threads. This preserves backward compatibility but increases cognitive load - every LLM instance now manages its own event loop thread. The __deepcopy__ workaround is a symptom of this complexity.
What's Good ✅
- Solves actual user pain (can't interrupt expensive LLM calls)
- Thread-safe implementation with proper locking
- Comprehensive test coverage (27 tests + example)
- Tests verify real behavior, not just mocks
- Backward compatible sync API
Critical Observations 🔴
Resource Management: Event loop threads are never explicitly cleaned up. While daemon threads die with the process, long-running applications that create/destroy many LLM instances will leak threads. Consider adding a close() or cleanup method.
Thread Proliferation: Each LLM gets its own event loop thread. In systems with many LLMs (llm_registry), this creates many threads. You claim "minimal overhead" but this could be noticeable in high-concurrency scenarios.
Test Breaking: Changing from litellm_completion to litellm_acompletion is a semi-breaking change for anyone mocking SDK internals. The PR acknowledges this, but it's worth highlighting the impact.
Improvement Opportunities 🟡
See inline comments for specific suggestions.
VERDICT: ✅ Worth merging - the complexity is justified by the feature, implementation is sound, and tests are thorough.
Add close() method to LLM class to stop the background event loop and clean up resources. This addresses the thread leak issue in long-running applications that create/destroy many LLM instances. The close() method:

- Cancels any in-flight task first
- Stops the event loop
- Waits for the thread to finish (with timeout)
- Is thread-safe and idempotent

After calling close(), the LLM can still be used - the event loop will be lazily recreated on the next LLM call.

Co-authored-by: openhands <openhands@all-hands.dev>
Document that cancel() schedules cancellation on the background event loop but does not block waiting for confirmation. The cancellation takes effect at the next await point in the LLM call. Co-authored-by: openhands <openhands@all-hands.dev>
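The "takes effect at the next await point" behavior documented here is standard asyncio semantics, which a small self-contained example (independent of the SDK's internals) makes concrete:

```python
import asyncio


async def main() -> str:
    async def work():
        try:
            # Cancellation is delivered here, at the suspended await,
            # not at the moment cancel() is called.
            await asyncio.sleep(60)
        except asyncio.CancelledError:
            return "cancelled at await point"  # swallowed for demonstration

    task = asyncio.create_task(work())
    await asyncio.sleep(0)  # let the task start and reach its await
    task.cancel()           # only *schedules* the cancellation
    return await task


print(asyncio.run(main()))
```

CPU-bound code between await points would keep running until it next suspends, which is why cancellation latency depends on where the awaits fall.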
You want a tough review? :) To clarify a detail:
I was talking about V0. I meant it as a psychological thing: the reasoning "let's not do as much async as V0" seemed persuasive at one point in time, because V0 had done it too much. Async scars, if you will. 😅 Not a direct relation today. I think the best way to deal with it is to be careful, sure, but maybe we can consider applying good, known practices rather than avoiding async at all costs.
The __deepcopy__ method is required because:

1. LLM contains threading primitives (asyncio.AbstractEventLoop, threading.Thread, threading.Lock, asyncio.Task) that cannot be pickled or deepcopied by Python's standard mechanisms
2. It is invoked by both copy.deepcopy() and Pydantic's model_copy(deep=True)

While the current codebase uses shallow copies, this ensures the LLM class remains copyable for future use or by external users.

Co-authored-by: openhands <openhands@all-hands.dev>
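The general shape of such a __deepcopy__ can be sketched with an illustrative stand-in class (not the SDK's LLM): copy the plain data, but give the copy fresh threading primitives instead of trying to deepcopy unpicklable ones.

```python
import copy
import threading


class ThreadedResource:
    """Illustrative stand-in showing the __deepcopy__ pattern for objects
    that hold threading primitives."""

    def __init__(self, config: dict):
        self.config = config
        self._lock = threading.Lock()  # deepcopy of a lock raises TypeError
        self._thread = None            # would hold a threading.Thread

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new
        new.config = copy.deepcopy(self.config, memo)  # real data is copied
        new._lock = threading.Lock()   # fresh primitives for the copy
        new._thread = None
        return new
```

Because Pydantic's model_copy(deep=True) also routes through copy.deepcopy, defining __deepcopy__ once covers both entry points mentioned in the commit.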
…ancellation

Replace _current_task (asyncio.Task) with _current_future (concurrent.futures.Future) for LLM call cancellation. The Future can be cancelled directly from any thread, eliminating the race condition where the event loop could stop before processing a scheduled task cancellation.

Co-authored-by: openhands <openhands@all-hands.dev>
Agent server REST API breakage checks (OpenAPI): Failed
Log excerpt (first 1000 characters):
Move async event loop management and cancellation support from the LLM class into a dedicated mixin to improve code readability and maintainability.

Changes:

- Create AsyncCancellationMixin in openhands/sdk/llm/mixins/async_cancellation.py
  - Contains _ensure_async_loop(), cancel(), is_cancelled(), _close_async_resources()
  - Contains _run_async_with_cancellation() helper for running async code with cancel
  - Contains _reset_async_state() for deepcopy support
- Update LLM class to inherit from AsyncCancellationMixin
- Simplify _transport_call and the responses call to use the mixin helper
- Keep LLM.close() as a thin wrapper around _close_async_resources()

This separates async/threading machinery from sync LLM logic, making the code easier to understand and maintain while preserving all functionality.

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
🟡 Acceptable - Eval Risk Requires Human Review
Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model adds complexity but enables independent cancellation.
Eval Risk Flag
🟠 Important: This PR changes how LLM calls execute internally (async vs sync). While the API is preserved, execution paths differ. According to repo guidelines, this requires human review with lightweight evals before merging to verify no unexpected impact on benchmark performance.
Changes affecting execution:

- Internal switch from `litellm_completion` to `litellm_acompletion`
- Background event loop thread for async execution
- Future-based cancellation mechanism
- Async streaming iteration vs sync streaming
Recommendation: Run lightweight eval on a representative benchmark (e.g., SWE-bench Verified, mini subset) to confirm no performance regression before merging.
Key Observations
Threading Model Tradeoff (🟢 Acceptable):
Every LLM instance now gets its own event loop thread. In long-running applications with many LLM instances, this could accumulate threads. However, this is a reasonable tradeoff:
- ✅ Enables per-instance cancellation
- ✅ Avoids shared state complexity
- ✅ Daemon threads are cleaned up on process exit
- ✅ `close()` method provides explicit cleanup when needed
- ✅ Mixin separation keeps complexity isolated
If thread count becomes an issue in practice, you could consider a shared event loop pool, but YAGNI applies here - solve the real problem first.
Resource Management (🟡 Suggestion):
The close() method requires explicit calls, which developers might forget. Consider implementing the context manager protocol (__enter__/__exit__) for automatic cleanup:
```python
def __enter__(self) -> "LLM":
    return self

def __exit__(self, *args) -> None:
    self.close()
```

This would enable:

```python
with LLM(...) as llm:
    llm.completion(...)  # Automatic cleanup on exit
```

Test Coverage (✅ Good):
Comprehensive tests (13 LLM + 14 conversation interrupt tests). Good coverage of:
- Thread safety
- Resource cleanup
- Cancellation timing
- Multiple interrupt scenarios
- Edge cases (cancel when no task, reuse after cancel, etc.)
Verdict
Worth merging after eval verification. The core implementation is sound and solves a real user need. The complexity is justified by the benefit (immediate cancellation) and is well-isolated in the mixin.
Next Steps:
- Human maintainer runs lightweight eval to verify no performance impact
- Consider adding context manager protocol (optional enhancement)
- Merge after eval verification
Replace the mixin pattern with a standalone AsyncRunner class that the LLM owns via composition. This eliminates implicit attribute contracts between the mixin and the host class.

Changes:

- AsyncRunner is now a self-contained class with an explicit interface: run(), cancel(), is_cancelled(), close()
- LLM class uses an _async_runner attribute instead of inheriting the mixin
- Simplified __deepcopy__ - just creates a fresh AsyncRunner for the copy
- Fixed close() to perform cleanup outside the lock (avoids holding the lock during thread.join())
- Updated tests to use the new AsyncRunner interface

Co-authored-by: openhands <openhands@all-hands.dev>
…enser tests

The AsyncRunner refactor requires these attributes on LLM instances. Update the mock fixture to include them.

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
🟡 Acceptable - Pragmatic Solution with Eval Risk
Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model is inherent to the problem (immediate cancellation + backward compatibility). Implementation is clean and well-tested.
Key Insight: The async-internal/sync-external pattern is the right trade-off here. You maintain backward compatibility while enabling immediate cancellation through task cancellation rather than polling.
Eval Risk Flag:
Minor Improvements: See inline comments for optional enhancements.
> The event loop runs in a daemon thread and is used to execute async
> coroutines. This allows synchronous callers to use async internally
> while supporting immediate cancellation.
> """
🟡 Suggestion: Theoretical race condition - if two threads call run() simultaneously before the loop is created, both could create event loops. This is unlikely in practice (only agent loop calls run(), other threads call cancel()), but worth noting.
Consider adding a lock around the check-and-create:
```python
with self._lock:
    if self._loop is None:
        self._loop = asyncio.new_event_loop()
        # ... rest of initialization
```

Not blocking - this is a corner case that is unlikely to manifest given typical usage patterns.
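The effect of that guard can be demonstrated with a small stand-in (hypothetical `LazyResource` class) that counts creations under concurrent access:

```python
import threading


class LazyResource:
    """Sketch of the suggested check-and-create guard: the lock ensures
    only one thread ever creates the underlying resource, even when many
    threads race through get() before initialization."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resource = None
        self.creations = 0

    def get(self):
        with self._lock:
            if self._resource is None:
                self.creations += 1  # would be asyncio.new_event_loop() etc.
                self._resource = object()
            return self._resource
```

Without the lock, two threads could both observe `self._resource is None` and each create a loop; with it, the second thread waits and sees the first thread's result.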
> API authentication, retry logic, tool calling capabilities, and async
> cancellation support.
>
> Example:
🟡 Suggestion: The class docstring could mention that long-running applications should call close() when the LLM is no longer needed to prevent thread leaks.
Add a note like:
Note:
In long-running applications that create/destroy many LLM instances,
call `close()` when done to clean up the background event loop thread.
The LLM can still be used after close() - the loop will be recreated lazily.
The close() method itself is well-documented, but surfacing this in the main docstring would help users discover it.
Summary
Implement `conversation.interrupt()` for immediate agent interruption with LLM cancellation. This uses Option 3: Async Internally with Sync API - running `litellm.acompletion()` in a background event loop thread and cancelling via `asyncio.Task.cancel()`.

Key Changes
New Exception: `LLMCancelledError` - raised when LLM calls are cancelled via interrupt

New Event: `InterruptEvent` - emitted when the agent is interrupted (distinct from `PauseEvent`)

LLM Class Modifications:

- Use async completion internally (`litellm.acompletion`) while maintaining the sync API
- `cancel()` method - can be called from any thread for immediate cancellation
- `is_cancelled()` method - check if the current task was cancelled

Conversation:

- `interrupt()` method that cancels all LLMs and sets status to PAUSED

Benefits
- Instant cancellation (no polling delay)
- HTTP connection closure on cancel
- Sync API preserved (no breaking changes)
- Cancellation propagates via `CancelledError` at the next await point

Example Usage
Checklist

- Unit tests for LLM interrupt (`tests/sdk/llm/test_llm_interrupt.py`)
- Unit tests for conversation interrupt (`tests/sdk/conversation/test_conversation_interrupt.py`)
- Example script: `examples/01_standalone_sdk/43_interrupt_example.py`
- Internals use `litellm_acompletion` instead of `litellm_completion`

Note on Test Failures
The change from sync to async completion internally requires updating test mocks from `litellm_completion` to `litellm_acompletion`. The new interrupt tests pass, but some existing tests need this update.

Fixes #2208
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- `eclipse-temurin:17-jdk`
- `nikolaik/python-nodejs:python3.13-nodejs22`
- `golang:1.21-bookworm`

Pull (multi-arch manifest)
```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:faff3c9-python
```

Run
All tags pushed for this build
About Multi-Architecture Support

- The default tag (`faff3c9-python`) is a multi-arch manifest supporting both amd64 and arm64
- Arch-specific tags (`faff3c9-python-amd64`) are also available if needed