
feat: Add conversation.interrupt() for immediate LLM cancellation #2206

Draft
malhotra5 wants to merge 15 commits into main from feature/conversation-interrupt

Conversation


@malhotra5 malhotra5 commented Feb 25, 2026

Summary

Implement conversation.interrupt() for immediate agent interruption with LLM cancellation. This uses Option 3: Async Internally with Sync API - running litellm.acompletion() in a background event loop thread and cancelling via asyncio.Task.cancel().

Key Changes

New Exception:

  • LLMCancelledError - raised when LLM calls are cancelled via interrupt

New Event:

  • InterruptEvent - emitted when agent is interrupted (distinct from PauseEvent)

LLM Class Modifications:

  • Now uses async completion internally (litellm.acompletion) while maintaining sync API
  • Added cancel() method - can be called from any thread for immediate cancellation
  • Added is_cancelled() method - check if current task was cancelled
  • Background event loop runs in a daemon thread, created lazily on first LLM call
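The lazy daemon-loop pattern the bullets above describe can be sketched roughly as follows (the names BackgroundLoop, _ensure_loop, and call_sync are illustrative, not the PR's actual API):

```python
import asyncio
import threading

class BackgroundLoop:
    """Runs an event loop in a daemon thread, created lazily on first use."""

    def __init__(self):
        self._loop = None
        self._thread = None
        self._lock = threading.Lock()

    def _ensure_loop(self):
        # Lazily create the event loop and its daemon thread on first call,
        # holding a lock so concurrent first calls create only one loop.
        with self._lock:
            if self._loop is None:
                self._loop = asyncio.new_event_loop()
                self._thread = threading.Thread(
                    target=self._loop.run_forever, daemon=True
                )
                self._thread.start()
        return self._loop

    def call_sync(self, coro):
        # Submit the coroutine to the background loop and block the sync
        # caller until it finishes. The returned concurrent.futures.Future
        # can be cancelled from any thread.
        future = asyncio.run_coroutine_threadsafe(coro, self._ensure_loop())
        return future.result()
```

The sync caller blocks on future.result() while the coroutine runs on the background loop; cancelling that future from another thread is what makes interruption immediate rather than polled.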

Conversation:

  • Added interrupt() method that cancels all LLMs and sets status to PAUSED

Benefits

  • Instant cancellation - No 100ms polling delay
  • HTTP connection closed - httpx closes connection on CancelledError
  • Sync API preserved - No breaking changes to existing code
  • Minimal overhead - Single daemon thread vs thread pool

Example Usage

import signal
from openhands.sdk import LLM, Agent, Conversation

conversation = Conversation(agent=agent, workspace="./workspace")

def signal_handler(_signum, _frame):
    conversation.interrupt()  # Immediately cancels any in-flight LLM call!

signal.signal(signal.SIGINT, signal_handler)

conversation.send_message("Write a very long essay...")
conversation.run()  # Press Ctrl+C to interrupt immediately

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
    • 13 tests for LLM interrupt (tests/sdk/llm/test_llm_interrupt.py)
    • 14 tests for conversation interrupt (tests/sdk/conversation/test_conversation_interrupt.py)
  • If there is an example, have you run the example to make sure that it works?
    • Added examples/01_standalone_sdk/43_interrupt_example.py
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?
    • ⚠️ Some existing tests need updating to mock litellm_acompletion instead of litellm_completion

Note on Test Failures

The change from sync to async completion internally requires updating test mocks from litellm_completion to litellm_acompletion. The new interrupt tests pass, but some existing tests need this update.
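The mock migration can be illustrated with a minimal, self-contained sketch (transport_call here stands in for the SDK code path that now awaits the completion; it is not the SDK's real function):

```python
import asyncio
from unittest.mock import AsyncMock

# Stand-in for the internal code path that now awaits litellm.acompletion.
async def transport_call(acompletion, **kwargs):
    return await acompletion(**kwargs)

# A plain Mock(return_value=...) is no longer enough once the result is
# awaited; tests must patch with AsyncMock (or an async function) instead.
mock_acompletion = AsyncMock(return_value={"choices": ["mocked"]})

result = asyncio.run(transport_call(mock_acompletion, model="gpt-x"))
assert result == {"choices": ["mocked"]}
mock_acompletion.assert_awaited_once_with(model="gpt-x")
```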

Fixes #2208


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image                                 | Docs / Tags
java    | amd64, arm64  | eclipse-temurin:17-jdk                     | Link
python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22 | Link
golang  | amd64, arm64  | golang:1.21-bookworm                       | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:faff3c9-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-faff3c9-python \
  ghcr.io/openhands/agent-server:faff3c9-python

All tags pushed for this build

ghcr.io/openhands/agent-server:faff3c9-golang-amd64
ghcr.io/openhands/agent-server:faff3c9-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:faff3c9-golang-arm64
ghcr.io/openhands/agent-server:faff3c9-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:faff3c9-java-amd64
ghcr.io/openhands/agent-server:faff3c9-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:faff3c9-java-arm64
ghcr.io/openhands/agent-server:faff3c9-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:faff3c9-python-amd64
ghcr.io/openhands/agent-server:faff3c9-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:faff3c9-python-arm64
ghcr.io/openhands/agent-server:faff3c9-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:faff3c9-golang
ghcr.io/openhands/agent-server:faff3c9-java
ghcr.io/openhands/agent-server:faff3c9-python

About Multi-Architecture Support

  • Each variant tag (e.g., faff3c9-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., faff3c9-python-amd64) are also available if needed

Implement Option 3 (Async Internally with Sync API) for interrupting
agent execution with immediate LLM cancellation:

- Add LLMCancelledError exception for cancelled LLM calls
- Add InterruptEvent for visibility when agent is interrupted
- Modify LLM class to use async completion internally (litellm.acompletion)
  while maintaining sync API, enabling Task.cancel() for immediate cancellation
- Add LLM.cancel() method that can be called from any thread
- Add conversation.interrupt() method that cancels LLM and sets PAUSED status
- Add comprehensive unit tests for interrupt functionality

The async internal implementation allows:
- Instant cancellation (no polling delay)
- HTTP connection closure on cancel
- Sync API preserved (no breaking changes)

Co-authored-by: openhands <openhands@all-hands.dev>
Add example script that shows how to use conversation.interrupt() to
immediately cancel in-flight LLM calls when user presses Ctrl+C.

Features demonstrated:
- Signal handling for Ctrl+C
- conversation.interrupt() for immediate cancellation
- Timing measurement from Ctrl+C to full stop
- Complex reasoning task that benefits from interrupt capability

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.


github-actions bot commented Feb 25, 2026

API breakage checks (Griffe)

Result: Passed

Action log


@enyst enyst left a comment


Thank you!

🤔 I always thought that when we do this, we really should make an acompletion ourselves. An async path, following the usual code design patterns of async alternatives to sync methods/execution paths.

I think I heard concerns that async ran amok in V0 and therefore we're too worried to let it happen again. I don't think those concerns are warranted in this case, if we can follow proven ways to make it work reasonably. Hey, this is not the SaaS.

Just food for thought. I think it's a bit of a confusion somewhere, between the SaaS adding asyncs everywhere, and making async pathways like every Python app out there does, but maybe it's fine the way the PR goes, too.

- Add interrupt() method to RemoteConversation
- Add __deepcopy__ to LLM to handle unpicklable thread state
- Update test mocks from litellm_completion to litellm_acompletion
- Fix streaming tests to use async iterators
- Add interrupt() method to MockConversation in tests
- Keep litellm_completion_cost as sync (not async)

All 2515 tests pass (2448 SDK + 67 cross tests).

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5
Collaborator Author

Yeah, I think there's overhead in maintaining both sync and async contracts - ideally we'd have both, though.

WRT to SAAS influencing the sync only APIs in the sdk, @tofarr has the best context


github-actions bot commented Feb 25, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py | 407 | 24 | 94% | 289, 294, 322, 365, 383, 399, 461, 635–636, 639, 687, 825, 833, 835, 846, 848–850, 875, 1068, 1072, 1142, 1149–1150
   remote_conversation.py | 596 | 107 | 82% | 132, 159, 172, 174–177, 187, 209–210, 215–218, 294, 304–306, 312, 353, 485–488, 490, 510–514, 519–522, 525, 668–669, 673–674, 685, 704–705, 724, 735–736, 756–759, 761–762, 780, 786–787, 791, 796–797, 803–805, 808–812, 814–815, 819, 821–829, 831, 868, 995, 1063–1064, 1068, 1073–1077, 1083–1089, 1102–1103, 1128, 1187, 1194, 1200–1201, 1251, 1257–1258, 1272–1273
openhands-sdk/openhands/sdk/event
   user_action.py | 24 | 1 | 95% | 21
openhands-sdk/openhands/sdk/llm
   llm.py | 518 | 82 | 84% | 439, 496, 621, 625, 630, 667, 806, 912, 914–915, 943, 1163, 1173–1175, 1179–1183, 1191–1193, 1201–1203, 1206–1207, 1211, 1213–1215, 1217, 1293–1294, 1491–1492, 1501, 1514, 1516–1521, 1523–1540, 1543–1547, 1549–1550, 1556–1565, 1616, 1618
openhands-sdk/openhands/sdk/llm/exceptions
   types.py | 63 | 2 | 96% | 104, 109
openhands-sdk/openhands/sdk/llm/mixins
   async_cancellation.py | 65 | 2 | 96% | 124, 160
TOTAL | 19938 | 5764 | 71%

@malhotra5 malhotra5 marked this pull request as ready for review February 25, 2026 21:29
@malhotra5 malhotra5 marked this pull request as draft February 25, 2026 21:29

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - Worth Merging with Considerations

Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model is reasonably simple, but adds non-trivial complexity to the LLM class.

Key Insight: You're maintaining a sync API by running async internally with background threads. This preserves backward compatibility but increases cognitive load - every LLM instance now manages its own event loop thread. The __deepcopy__ workaround is a symptom of this complexity.

What's Good ✅

  • Solves actual user pain (can't interrupt expensive LLM calls)
  • Thread-safe implementation with proper locking
  • Comprehensive test coverage (27 tests + example)
  • Tests verify real behavior, not just mocks
  • Backward compatible sync API

Critical Observations 🔴

Resource Management: Event loop threads are never explicitly cleaned up. While daemon threads die with the process, long-running applications that create/destroy many LLM instances will leak threads. Consider adding a close() or cleanup method.

Thread Proliferation: Each LLM gets its own event loop thread. In systems with many LLMs (llm_registry), this creates many threads. You claim "minimal overhead" but this could be noticeable in high-concurrency scenarios.

Test Breaking: Changing from litellm_completion to litellm_acompletion is a semi-breaking change for anyone mocking SDK internals. The PR acknowledges this, but it's worth highlighting the impact.

Improvement Opportunities 🟡

See inline comments for specific suggestions.

VERDICT: ✅ Worth merging - the complexity is justified by the feature, implementation is sound, and tests are thorough.

Add close() method to LLM class to stop the background event loop
and cleanup resources. This addresses the thread leak issue in
long-running applications that create/destroy many LLM instances.

The close() method:
- Cancels any in-flight task first
- Stops the event loop
- Waits for the thread to finish (with timeout)
- Is thread-safe and idempotent

After calling close(), the LLM can still be used - the event loop
will be lazily recreated on the next LLM call.

Co-authored-by: openhands <openhands@all-hands.dev>
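The close() sequence the commit message above describes might look roughly like this; a sketch under the assumption of a loop-per-instance design, where LoopOwner and its attribute names are illustrative, not the SDK's:

```python
import asyncio
import threading

class LoopOwner:
    """Owns a background event loop thread, as the LLM class is described to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._future = None  # in-flight concurrent.futures.Future, if any

    def close(self, timeout: float = 5.0) -> None:
        # Thread-safe and idempotent: detach state under the lock, then
        # perform the potentially blocking cleanup outside it.
        with self._lock:
            loop, thread = self._loop, self._thread
            self._loop = self._thread = None
        if loop is None:
            return  # second close() is a no-op
        if self._future is not None:
            self._future.cancel()                # 1. cancel in-flight work first
        loop.call_soon_threadsafe(loop.stop)     # 2. stop the event loop
        thread.join(timeout=timeout)             # 3. wait, bounded by a timeout
        if not thread.is_alive():
            loop.close()                         # release loop resources
```

Lazy re-creation after close(), mentioned in the commit message, would live in an _ensure_loop()-style helper on the call path, not shown here.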
Document that cancel() schedules cancellation on the background event
loop but does not block waiting for confirmation. The cancellation
takes effect at the next await point in the LLM call.

Co-authored-by: openhands <openhands@all-hands.dev>

enyst commented Feb 25, 2026

You want a tough review? :)


To clarify a detail,

WRT to SAAS influencing the sync only APIs in the sdk, @tofarr has the best context

I was talking about V0.

I meant as a psychological thing, the reasoning "let's not do many async like V0" seemed persuasive at a point in time, because V0 has done it too much. Async scars, if you will. 😅

Not a direct relation today.

I think the best way to deal with it is to be careful, sure, but maybe we can consider applying good, known practices, rather than avoiding it at all costs.

The __deepcopy__ method is required because:
1. LLM contains threading primitives (asyncio.AbstractEventLoop,
   threading.Thread, threading.Lock, asyncio.Task) that cannot be
   pickled or deepcopied by Python's standard mechanisms
2. It is invoked by both copy.deepcopy() and Pydantic's
   model_copy(deep=True)

While the current codebase uses shallow copies, this ensures the LLM
class remains copyable for future use or by external users.

Co-authored-by: openhands <openhands@all-hands.dev>
…ancellation

Replace _current_task (asyncio.Task) with _current_future (concurrent.futures.Future)
for LLM call cancellation. The Future can be cancelled directly from any thread,
eliminating the race condition where the event loop could stop before processing
a scheduled task cancellation.

Co-authored-by: openhands <openhands@all-hands.dev>
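Under this design, cancellation rides on the concurrent.futures.Future that asyncio.run_coroutine_threadsafe returns, which is documented to be cancellable from other threads. A minimal sketch, where slow_llm_call stands in for the real acompletion await:

```python
import asyncio
import concurrent.futures
import threading

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def slow_llm_call():
    await asyncio.sleep(60)  # stands in for awaiting litellm.acompletion(...)
    return "never reached"

# run_coroutine_threadsafe returns a concurrent.futures.Future; unlike an
# asyncio.Task, it can be cancelled directly from any thread.
future = asyncio.run_coroutine_threadsafe(slow_llm_call(), loop)

# Simulate interrupt() arriving from another thread shortly afterwards.
threading.Timer(0.1, future.cancel).start()

try:
    future.result()
except concurrent.futures.CancelledError:
    pass  # cancellation propagates to the coroutine at its next await point

loop.call_soon_threadsafe(loop.stop)
```

Because the concurrent future is cancelled directly, there is no window in which the event loop could stop before a scheduled task cancellation runs, which is the race the commit describes eliminating.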
@github-actions
Copy link
Contributor

github-actions bot commented Mar 5, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-09 18:42:02,707", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.12.0 -> 1.12.0).

Breaking REST API changes detected compared to baseline release:
- added '#/components/schemas/InterruptEvent' to the '/items/anyOf[#/components/schemas/Event]/' response property 'oneOf' list for the response status '200'
- added '#/components/schemas/InterruptEvent' to the 'items/items/' response property 'oneOf' list for the response status '200'
- added '#/components/schemas/InterruptEvent' to the response body 'oneOf' list for the response status '200'
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/pytho

Action log

Move async event loop management and cancellation support from LLM class
into a dedicated mixin to improve code readability and maintainability.

Changes:
- Create AsyncCancellationMixin in openhands/sdk/llm/mixins/async_cancellation.py
  - Contains _ensure_async_loop(), cancel(), is_cancelled(), _close_async_resources()
  - Contains _run_async_with_cancellation() helper for running async code with cancel
  - Contains _reset_async_state() for deepcopy support
- Update LLM class to inherit from AsyncCancellationMixin
- Simplify _transport_call and responses call to use mixin helper
- Keep LLM.close() as a thin wrapper around _close_async_resources()

This separates async/threading machinery from sync LLM logic, making the
code easier to understand and maintain while preserving all functionality.

Co-authored-by: openhands <openhands@all-hands.dev>
@malhotra5 malhotra5 requested a review from all-hands-bot March 9, 2026 17:20
@malhotra5 malhotra5 marked this pull request as ready for review March 9, 2026 17:20
@malhotra5 malhotra5 marked this pull request as draft March 9, 2026 17:20
@malhotra5 malhotra5 added the review-this This label triggers a PR review by OpenHands label Mar 9, 2026

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - Eval Risk Requires Human Review

Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model adds complexity but enables independent cancellation.


Eval Risk Flag

🟠 Important: This PR changes how LLM calls execute internally (async vs sync). While the API is preserved, execution paths differ. According to repo guidelines, this requires human review with lightweight evals before merging to verify no unexpected impact on benchmark performance.

Changes affecting execution:

  • Internal switch from litellm_completion to litellm_acompletion
  • Background event loop thread for async execution
  • Future-based cancellation mechanism
  • Async streaming iteration vs sync streaming

Recommendation: Run lightweight eval on a representative benchmark (e.g., SWE-bench Verified, mini subset) to confirm no performance regression before merging.


Key Observations

Threading Model Tradeoff (🟢 Acceptable):
Every LLM instance now gets its own event loop thread. In long-running applications with many LLM instances, this could accumulate threads. However, this is a reasonable tradeoff:

  • ✅ Enables per-instance cancellation
  • ✅ Avoids shared state complexity
  • ✅ Daemon threads are cleaned up on process exit
  • close() method provides explicit cleanup when needed
  • ✅ Mixin separation keeps complexity isolated

If thread count becomes an issue in practice, you could consider a shared event loop pool, but YAGNI applies here - solve the real problem first.

Resource Management (🟡 Suggestion):
The close() method requires explicit calls, which developers might forget. Consider implementing the context manager protocol (__enter__/__exit__) for automatic cleanup:

def __enter__(self) -> "LLM":
    return self

def __exit__(self, *args) -> None:
    self.close()

This would enable:

with LLM(...) as llm:
    llm.completion(...)  # Automatic cleanup on exit

Test Coverage (✅ Good):
Comprehensive tests (13 LLM + 14 conversation interrupt tests). Good coverage of:

  • Thread safety
  • Resource cleanup
  • Cancellation timing
  • Multiple interrupt scenarios
  • Edge cases (cancel when no task, reuse after cancel, etc.)

Verdict

Worth merging after eval verification. The core implementation is sound and solves a real user need. The complexity is justified by the benefit (immediate cancellation) and is well-isolated in the mixin.

Next Steps:

  1. Human maintainer runs lightweight eval to verify no performance impact
  2. Consider adding context manager protocol (optional enhancement)
  3. Merge after eval verification

Replace the mixin pattern with a standalone AsyncRunner class that the LLM
owns via composition. This eliminates implicit attribute contracts between
the mixin and host class.

Changes:
- AsyncRunner is now a self-contained class with explicit interface:
  run(), cancel(), is_cancelled(), close()
- LLM class uses _async_runner attribute instead of inheriting mixin
- Simplified __deepcopy__ - just creates fresh AsyncRunner for copy
- Fixed close() to perform cleanup outside the lock (avoids holding
  lock during thread.join())
- Updated tests to use new AsyncRunner interface

Co-authored-by: openhands <openhands@all-hands.dev>
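The composition shape this commit describes can be sketched as follows; all names and signatures are illustrative, not the SDK's actual interface:

```python
import asyncio
import threading

class AsyncRunner:
    """Self-contained runner with an explicit interface:
    run(), cancel(), is_cancelled(), close()."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._future = None

    def run(self, coro):
        self._future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return self._future.result()

    def cancel(self):
        if self._future is not None:
            self._future.cancel()

    def is_cancelled(self):
        return self._future is not None and self._future.cancelled()

    def close(self):
        self.cancel()
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5.0)

class LLM:
    def __init__(self):
        # Composition: the LLM owns the runner; there is no implicit
        # attribute contract between a mixin and its host class.
        self._async_runner = AsyncRunner()

    def completion(self, prompt: str) -> str:
        async def fake_call():
            await asyncio.sleep(0)  # stands in for litellm.acompletion
            return f"echo: {prompt}"
        return self._async_runner.run(fake_call())
```

Deepcopy support then reduces to giving the copy a fresh AsyncRunner, since no threading primitives live on the LLM itself.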
…enser tests

The AsyncRunner refactor requires these attributes on LLM instances.
Update the mock fixture to include them.

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww requested a review from all-hands-bot March 10, 2026 16:33

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - Pragmatic Solution with Eval Risk

Taste Rating: This solves a real problem (immediate LLM interruption) with a pragmatic approach. The threading model is inherent to the problem (immediate cancellation + backward compatibility). Implementation is clean and well-tested.

Key Insight: The async-internal/sync-external pattern is the right trade-off here. You maintain backward compatibility while enabling immediate cancellation through task cancellation rather than polling.

Eval Risk Flag: ⚠️ This PR changes LLM execution paths (sync → async internally). Per repo guidelines, this requires human review with lightweight evals before merging to verify no unexpected impact on benchmark performance.

Minor Improvements: See inline comments for optional enhancements.

The event loop runs in a daemon thread and is used to execute async
coroutines. This allows synchronous callers to use async internally
while supporting immediate cancellation.
"""

🟡 Suggestion: Theoretical race condition - if two threads call run() simultaneously before the loop is created, both could create event loops. This is unlikely in practice (only agent loop calls run(), other threads call cancel()), but worth noting.

Consider adding a lock around the check-and-create:

with self._lock:
    if self._loop is None:
        self._loop = asyncio.new_event_loop()
        # ... rest of initialization

Not blocking - this is a corner case that is unlikely to manifest given typical usage patterns.

API authentication, retry logic, tool calling capabilities, and async
cancellation support.

Example:

🟡 Suggestion: The class docstring could mention that long-running applications should call close() when the LLM is no longer needed to prevent thread leaks.

Add a note like:

    Note:
        In long-running applications that create/destroy many LLM instances,
        call `close()` when done to clean up the background event loop thread.
        The LLM can still be used after close() - the loop will be recreated lazily.

The close() method itself is well-documented, but surfacing this in the main docstring would help users discover it.


Labels

review-this This label triggers a PR review by OpenHands


Development

Successfully merging this pull request may close these issues.

Add ability to immediately terminate agent

5 participants