[Feature]: Stateful MCP Transport Migration Plan #37

@johlju

Problem Statement

As a user, I want to call the MCP server tools without getting error messages when the session has dropped. The errors can be seen in the client's log output when using VS Code as the MCP client. I don't have the actual error message. Note that even when it errors, the MCP server still works in stateless mode.

Proposed Solution

Summary

Migrate the MCP transport from the current stateless Next route to a stateful
Cloudflare-backed Streamable HTTP implementation that preserves /api/mcp,
supports persistent sessions across requests, and properly serves the
standalone GET SSE channel used for async notifications.

The target runtime is:

  • production: /api/mcp on the existing Cloudflare deployment
  • local MCP testing: http://localhost:8787/api/mcp via Wrangler preview
  • web app dev: http://localhost:3000 can remain on next dev, but it is no
    longer the source of truth for notification-capable MCP behavior

Implementation Changes

Cloudflare and routing

  • Replace the direct main: ".open-next/worker.js" deployment entry with a
    custom worker wrapper that:
    • intercepts /api/mcp
    • delegates all non-MCP traffic to the generated OpenNext worker
  • Keep the public MCP path /api/mcp unchanged.
  • Add a Durable Object binding in wrangler.jsonc, for example
    MCP_SESSION_DO.
  • Add a Wrangler migration entry with a new tag and new_classes for the MCP
    session Durable Object.
  • Export the Durable Object class from the custom worker entrypoint so Wrangler
    can register it.
  • Do not change current routes, assets, images, or D1 bindings unless required
    by the custom wrapper.
  • Ensure /api/mcp is never cached and that streaming headers are preserved.
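
A minimal sketch of the wrapper's dispatch logic, assuming both handlers are injected as plain functions. The names `createWrapperFetch`, `isMcpPath`, and `Handler` are hypothetical; in the real entrypoint the app handler would be the generated OpenNext worker and the MCP handler would forward to the session Durable Object. Defaulting to `Cache-Control: no-store` is also an assumption about how "never cached" could be enforced.

```typescript
// Hypothetical handler shape: anything that turns a Request into a Response.
type Handler = (request: Request) => Promise<Response>;

const MCP_PATH = "/api/mcp";

// Only the exact public MCP path is intercepted in this sketch.
function isMcpPath(pathname: string): boolean {
  return pathname === MCP_PATH;
}

// Build the wrapper's fetch function from two injected handlers.
function createWrapperFetch(mcp: Handler, app: Handler): Handler {
  return async (request: Request): Promise<Response> => {
    const { pathname } = new URL(request.url);
    if (!isMcpPath(pathname)) {
      // All non-MCP traffic is delegated unchanged.
      return app(request);
    }
    const response = await mcp(request);
    // Copy the headers so they are mutable, and keep every header the MCP
    // handler set (for example Content-Type: text/event-stream).
    const headers = new Headers(response.headers);
    if (!headers.has("Cache-Control")) {
      // Assumption: default to no-store when the handler set nothing.
      headers.set("Cache-Control", "no-store");
    }
    // Passing response.body through keeps the response streaming.
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers,
    });
  };
}
```

The wrapper never reads the body, so SSE responses keep streaming, and a `Cache-Control` value already set by the MCP handler is left alone.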

Stateful MCP sessions

  • Use one Durable Object per MCP session.
  • Route requests by Mcp-Session-Id:
    • POST initialize without a session header creates a new session and
      returns the new session ID
    • later GET, POST, and DELETE requests route to the same Durable Object
      using Mcp-Session-Id
    • non-initialize requests without a session header are rejected
  • Keep these objects alive for the session lifetime inside the Durable Object:
    • RequirementsService
    • McpServer
    • stateful WebStandardStreamableHTTPServerTransport
    • session-scoped EventStore
  • Persist enough session state in Durable Object storage to survive cold
    starts:
    • session_id
    • initialized
    • created_at
    • last_access_at
    • initialize request metadata
    • stored SSE replay events
  • On Durable Object cold start, rebuild the server and transport, restore
    state, and continue serving the same session.
  • On DELETE, close the transport and remove session metadata and replay
    events.
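
The routing rules above can be sketched as a pure decision function. This is an illustration under stated assumptions, not the planned implementation: `routeMcpRequest`, `McpRoute`, and `isInitializeRequest` are hypothetical names, the ID generator is injected, and the 400 status for a missing session header is an assumed choice.

```typescript
// Outcome of inspecting one incoming MCP request.
type McpRoute =
  | { kind: "create"; sessionId: string }
  | { kind: "existing"; sessionId: string }
  | { kind: "reject"; status: number; reason: string };

// A JSON-RPC initialize request is an object whose method is "initialize".
function isInitializeRequest(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { method?: unknown }).method === "initialize"
  );
}

// sessionId is the value of the Mcp-Session-Id header, or null if absent.
// body is the parsed JSON body for POST requests, otherwise undefined.
function routeMcpRequest(
  method: string,
  sessionId: string | null,
  body: unknown,
  newId: () => string,
): McpRoute {
  if (sessionId) {
    // GET, POST, and DELETE with a session header go to that session.
    return { kind: "existing", sessionId };
  }
  if (method === "POST" && isInitializeRequest(body)) {
    // POST initialize without a session header starts a new session.
    return { kind: "create", sessionId: newId() };
  }
  // Every other request without a session header is rejected.
  return { kind: "reject", status: 400, reason: "Missing Mcp-Session-Id header" };
}
```

In the worker, the `create` and `existing` branches would then presumably resolve the Durable Object through the `MCP_SESSION_DO` binding (for example with `idFromName(sessionId)`) and forward the request to it.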

SSE and async notifications

  • Enable the standalone GET SSE notification channel by reusing the same
    transport instance for the full session lifetime.
  • Implement a Durable Object storage-backed EventStore with:
    • storeEvent
    • getStreamIdForEventId
    • replayEventsAfter
  • Configure a transport retryInterval.
  • Send an initial SSE comment or heartbeat chunk when the standalone GET
    stream is opened so VS Code/Copilot does not interpret the channel as failed
    before the first real notification.
  • Add session and event cleanup for stale sessions and old replay history.
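
A hedged sketch of what a storage-backed EventStore could look like. The three method names come from the list above; everything else is an assumption made for illustration: the narrow `KeyValueStorage` interface (a stand-in for the subset of Durable Object storage used here), the `event:<n>` key layout, the sequence-number event IDs, and the in-memory `MemoryStorage` used to run it locally. The message type is simplified to a plain record.

```typescript
// Assumed minimal storage surface; Durable Object storage is richer.
interface KeyValueStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

type JsonRpcMessage = Record<string, unknown>;

interface StoredEvent {
  eventId: string;
  streamId: string;
  message: JsonRpcMessage;
}

class StorageBackedEventStore {
  private storage: KeyValueStorage;

  constructor(storage: KeyValueStorage) {
    this.storage = storage;
  }

  // Persist one event and return its ID (a session-wide sequence number).
  async storeEvent(streamId: string, message: JsonRpcMessage): Promise<string> {
    const seq = ((await this.storage.get<number>("seq")) ?? 0) + 1;
    const eventId = String(seq);
    await this.storage.put<StoredEvent>(`event:${eventId}`, { eventId, streamId, message });
    await this.storage.put<number>("seq", seq);
    return eventId;
  }

  // Look up which stream an event belongs to, if the event is still stored.
  async getStreamIdForEventId(eventId: string): Promise<string | undefined> {
    const event = await this.storage.get<StoredEvent>(`event:${eventId}`);
    return event?.streamId;
  }

  // Re-send every later event of the same stream, oldest first.
  async replayEventsAfter(
    lastEventId: string,
    options: { send: (eventId: string, message: JsonRpcMessage) => Promise<void> },
  ): Promise<string> {
    const last = await this.storage.get<StoredEvent>(`event:${lastEventId}`);
    if (!last) throw new Error(`Unknown event ID: ${lastEventId}`);
    const newest = (await this.storage.get<number>("seq")) ?? 0;
    for (let seq = Number(lastEventId) + 1; seq <= newest; seq++) {
      const event = await this.storage.get<StoredEvent>(`event:${seq}`);
      if (event && event.streamId === last.streamId) {
        await options.send(event.eventId, event.message);
      }
    }
    return last.streamId;
  }
}

// In-memory stand-in so the sketch can run outside a Durable Object.
class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, unknown>();

  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }

  async put<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
}
```

Skipping missing keys during replay means cleanup of old history can delete `event:<n>` entries without breaking later replays; a client that reconnects with an already deleted `Last-Event-ID` gets an error in this sketch.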

MCP server refactor

  • Refactor createKravhanteringMcpServer so it no longer closes over a
    concrete Request.
  • Derive MCP request context from SDK callback extra.requestInfo.headers and
    extra.sessionId.
  • Add a request-context helper for MCP that accepts headers instead of a full
    Request.
  • Keep all tool names, schemas, resource URIs, and requirement business logic
    unchanged.
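
One possible shape for the header-based request-context helper, as a sketch only. `buildMcpRequestContext`, `getHeader`, and the fields of `McpRequestContext` are hypothetical; which values the real context needs is not stated in this issue. The header bag type is an assumption about what `extra.requestInfo.headers` looks like (string keys, with string or string-array values).

```typescript
// Assumed shape of extra.requestInfo.headers.
type HeaderBag = Record<string, string | string[] | undefined>;

// Hypothetical context fields, chosen only for illustration.
interface McpRequestContext {
  sessionId: string | undefined;
  authorization: string | undefined;
  protocolVersion: string | undefined;
}

// Case-insensitive lookup that returns the first value of a header.
function getHeader(headers: HeaderBag, name: string): string | undefined {
  const wanted = name.toLowerCase();
  const key = Object.keys(headers).find((k) => k.toLowerCase() === wanted);
  const value = key === undefined ? undefined : headers[key];
  return Array.isArray(value) ? value[0] : value;
}

// Build the context from the callback's extra argument, not from a Request.
function buildMcpRequestContext(extra: {
  sessionId?: string;
  requestInfo?: { headers: HeaderBag };
}): McpRequestContext {
  const headers = extra.requestInfo?.headers ?? {};
  return {
    // Prefer the session ID reported by the transport, fall back to the header.
    sessionId: extra.sessionId ?? getHeader(headers, "mcp-session-id"),
    authorization: getHeader(headers, "authorization"),
    protocolVersion: getHeader(headers, "mcp-protocol-version"),
  };
}
```

Because the helper depends only on headers and the session ID, the same server instance can serve many requests over the session lifetime without closing over any single `Request`.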

Local workflow

  • Use Wrangler preview on port 8787 as the supported local MCP endpoint.
  • Keep next dev on port 3000 for the web app only.
  • No devcontainer port change is required if 8787 remains forwarded.
  • Update the MCP docs so local VS Code/Copilot configuration points to
    http://localhost:8787/api/mcp for notification-capable testing.

Cloudflare Configuration Required

  • wrangler.jsonc
    • set main to the custom worker wrapper entrypoint
    • add durable_objects.bindings for the MCP session class
    • add migrations with a unique tag and the new Durable Object class
  • Cloudflare deployment
    • deploy the worker wrapper and Durable Object migration in the same release
    • keep the existing custom domain route for the app so /api/mcp remains
      reachable on the same hostname
  • Response/header behavior for /api/mcp
    • preserve Mcp-Session-Id
    • preserve MCP-Protocol-Version
    • preserve Last-Event-ID
    • preserve Authorization
    • preserve Accept
    • send Content-Type: text/event-stream, Cache-Control: no-cache,
      no-transform, and Connection: keep-alive on SSE responses
  • Operational setup
    • add logging for session lifecycle and SSE lifecycle
    • temporarily increase observability around /api/mcp during rollout
    • avoid putting redirect or auth layers in front of /api/mcp that break
      streaming or strip MCP headers
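
To make the SSE response requirements concrete, here is a small sketch that opens a stream with the headers listed above and immediately emits an SSE comment line. `openSseStream` and `sseComment` are hypothetical helpers, and the `connected` comment text is arbitrary. In the planned design the transport produces the SSE response itself, so this only illustrates the wire format, not where the heartbeat would be hooked in.

```typescript
// Headers listed in this plan for SSE responses.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache, no-transform",
  Connection: "keep-alive",
};

// An SSE comment is a line starting with ":" followed by a blank line.
// Clients ignore it, but it proves the channel is alive.
function sseComment(text: string): string {
  return `: ${text}\n\n`;
}

// Open a streaming response whose first chunk is a comment.
function openSseStream(): {
  response: Response;
  send: (chunk: string) => void;
  close: () => void;
} {
  const encoder = new TextEncoder();
  let controller!: ReadableStreamDefaultController<Uint8Array>;
  const body = new ReadableStream<Uint8Array>({
    // start runs synchronously inside the constructor.
    start(c) {
      controller = c;
      c.enqueue(encoder.encode(sseComment("connected")));
    },
  });
  return {
    response: new Response(body, { headers: SSE_HEADERS }),
    send: (chunk: string) => controller.enqueue(encoder.encode(chunk)),
    close: () => controller.close(),
  };
}
```

Later heartbeats could reuse `send(sseComment("keepalive"))` on a timer; whether that is needed on top of the initial comment is something the manual verification step would have to show.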

Test Plan

  • Unit tests
    • initialize creates a session and returns a session ID
    • requests with Mcp-Session-Id route to the same Durable Object
    • non-initialize requests without a session header fail
    • DELETE closes and removes the session
    • Durable Object event store persists and replays events after
      Last-Event-ID
  • Transport tests
    • initialize plus subsequent calls reuse the same session
    • standalone GET SSE stream opens successfully
    • reconnect resumes from stored events
    • tools, resources, and MCP Apps still work on the stateful transport
  • Manual verification
    • run Wrangler preview on 8787
    • connect VS Code/Copilot to http://localhost:8787/api/mcp
    • confirm that the repeated async-notification retry logs disappear or
      occur only on real disconnects
    • keep the session open for at least 15 minutes and verify no periodic retry
      noise
  • Documentation updates
    • user guide: replace the current stateless local MCP guidance with Wrangler
      preview guidance
    • contributor guide: document the custom worker and Durable Object session
      architecture
    • remove the note that async-notification retry logs are expected for the
      primary runtime

Assumptions And Defaults

  • Keep /api/mcp as the production endpoint.
  • Use a custom OpenNext worker plus Durable Objects, not a separate public MCP
    service.
  • Use Wrangler preview on 8787 as the supported local MCP runtime.
  • Do not change tool schemas, requirement service behavior, or auth scope in
    this phase.
  • Optimize for correct stateful sessions and async notification support rather
    than preserving next dev on port 3000 as the MCP endpoint.

Additional Context

No response

Labels: enhancement (New feature or request)
