Fix: Check Process.alive? before sending to SSE handler #239

Open
mellelieuwes wants to merge 2 commits into cloudwalk:main from eyra:fix/stale-sse-handler

mellelieuwes commented Jan 16, 2026

Summary

This PR includes two related fixes for session handling:

1. Fix stale SSE handler race condition

When an SSE handler process dies but the transport hasn't yet processed the :DOWN message, responses could be silently lost.

The issue:

  1. SSE handler process dies (network drop, crash, etc.)
  2. :DOWN message is queued to transport GenServer
  3. Before transport processes :DOWN, a new request calls get_sse_handler
  4. get_sse_handler returns the stale PID
  5. send/2 silently drops the message to the dead process
  6. Client receives HTTP 202 but never gets the actual response
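
Steps 4 to 6 hinge on the fact that send/2 to a dead local PID does not raise. A minimal sketch of that behaviour, using a throwaway process in place of the real SSE handler (the `{:sse_message, ...}` tuple is a made-up message shape, not necessarily what the library sends):

```elixir
# Spawn a stand-in for the SSE handler and wait until it has exited.
pid = spawn(fn -> :ok end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# send/2 to a dead PID does not raise; it just returns the message,
# so the caller has no signal that nothing was delivered.
result = send(pid, {:sse_message, "response"})

# Only an explicit liveness check reveals that the PID is stale.
alive = Process.alive?(pid)
```

Here `result` is the message itself and `alive` is `false`, which is why the caller cannot tell a delivered response from a dropped one without checking.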

The fix:

  • Add Process.alive? check in route_sse_response before sending
  • If the handler is stale, clean up the entry and establish a new SSE connection
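
The shape of the fix can be sketched roughly as follows. This is not the PR's actual code: `StaleHandlerSketch`, `route/3`, the return tuples, and the plain map of session ID to PID are all hypothetical stand-ins for the transport's real state and for route_sse_response.

```elixir
defmodule StaleHandlerSketch do
  # Hypothetical handler table: a plain map of session_id => handler PID.
  def route(handlers, session_id, message) do
    case Map.fetch(handlers, session_id) do
      {:ok, pid} ->
        if Process.alive?(pid) do
          send(pid, {:sse_message, message})
          {:sent, handlers}
        else
          # Stale entry: remove it so the caller can open a new SSE connection.
          {:stale, Map.delete(handlers, session_id)}
        end

      :error ->
        {:no_handler, handlers}
    end
  end
end

# Usage: one live handler, one that has already exited.
live = spawn(fn -> receive do _ -> :ok end end)
dead = spawn(fn -> :ok end)
ref = Process.monitor(dead)

receive do
  {:DOWN, ^ref, :process, ^dead, _reason} -> :ok
end

handlers = %{"a" => live, "b" => dead}
sent_result = StaleHandlerSketch.route(handlers, "a", "hi")
stale_result = StaleHandlerSketch.route(handlers, "b", "hi")
```

Note that a handler can still die between the Process.alive? check and the send/2 call, so a check of this kind narrows the window rather than closing it completely.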

2. Add session_expired error type

When a session is missing or expired (e.g., after server restart), return a clear error message instead of the vague "Server not initialized".

Changes:

  • New error code -32001 for session_expired
  • Clear message: "Session expired or not initialized. Please reconnect."
  • Enables MCP clients to detect and potentially auto-reconnect
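
As an illustration of how a client might use the new code, here is a sketch that builds an error response and detects it. It assumes the error travels in a standard JSON-RPC 2.0 error object (the -32000 to -32099 range is reserved there for implementation-defined server errors); `SessionErrorSketch` and its functions are hypothetical names, and only the code and message come from this PR.

```elixir
defmodule SessionErrorSketch do
  # Builds a JSON-RPC 2.0 error response for the given request id.
  def session_expired(id) do
    %{
      "jsonrpc" => "2.0",
      "id" => id,
      "error" => %{
        "code" => -32001,
        "message" => "Session expired or not initialized. Please reconnect."
      }
    }
  end

  # Client-side check: true only for the session_expired code.
  def reconnect?(%{"error" => %{"code" => -32001}}), do: true
  def reconnect?(_response), do: false
end

response = SessionErrorSketch.session_expired(1)
should_reconnect = SessionErrorSketch.reconnect?(response)
```

Matching on the numeric code rather than the message text keeps the client check stable if the wording changes later.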

Test plan

  • Added unit tests demonstrating the race condition
  • All existing tests pass (445 tests, 0 failures)
  • Verified fixes in production application (Flux MCP server)

🤖 Generated with Claude Code

This fixes a race condition where responses could be silently lost when
an SSE handler process died but the transport hadn't yet processed the
:DOWN message.

The issue:
1. SSE handler process dies (network drop, crash, etc.)
2. :DOWN message is queued to transport GenServer
3. Before transport processes :DOWN, a new request calls get_sse_handler
4. get_sse_handler returns the stale PID
5. send/2 silently drops the message to the dead process
6. Client receives HTTP 202 but never gets the actual response

The fix adds a Process.alive? check in route_sse_response before
sending. If the handler is stale, it cleans up the entry and
establishes a new SSE connection for the request.

Instead of returning a generic "Server not initialized" error when a
session is missing or expired, return a specific session_expired error
with code -32001 and message "Session expired or not initialized.
Please reconnect."

This gives clients:
1. A specific error code (-32001) they can detect and handle
2. A clear message telling users what to do
3. Potential for auto-reconnect in MCP clients