Both SSE and StreamableHttp transport require sticky sessions #330


Open
dglozic opened this issue Apr 13, 2025 · 24 comments
Labels
enhancement New feature or request

Comments

@dglozic

dglozic commented Apr 13, 2025

Is your feature request related to a problem? Please describe.
Both SSE and StreamableHttp transport rely on caching the transport in memory in the server pod:

const transports: { [sessionId: string]: StreamableHTTPServerTransport } = {};

This does not work when there are multiple replicas/pods behind a load balancer (which covers most k8s deployments of MCP servers as services). If a non-initial call lands on a replica whose `transports` map does not contain the cached transport, an error is sent out.

We need a solution (and example) that does not involve storing transports in memory because real world deployments of remote MCP servers will have multiple replicas, and demanding sticky session for them is too much of a burden.

If we can ask implementors to cache state that can survive a JSON.stringify/parse round trip, they can keep that state in distributed caches like a Redis session store. But a transport object cannot be stored in Redis.
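The round-trip idea above can be sketched with a plain Map standing in for the Redis store. Note that the SessionState shape, saveSession, and loadSession are all hypothetical names for illustration; the SDK defines no such types:

```typescript
// Hypothetical shape of session state that survives a JSON round trip.
// (The real SDK does not define this type; it is an assumption.)
interface SessionState {
  sessionId: string;
  initialized: boolean;
  pendingRequestIds: string[];
}

// A Map standing in for a Redis session store shared by all replicas.
const sessionStore = new Map<string, string>();

// Persist state as a string, as a redis SET would.
function saveSession(state: SessionState): void {
  sessionStore.set(state.sessionId, JSON.stringify(state));
}

// Restore state on any replica, as a redis GET would.
function loadSession(sessionId: string): SessionState | undefined {
  const raw = sessionStore.get(sessionId);
  return raw === undefined ? undefined : (JSON.parse(raw) as SessionState);
}

// Replica A writes the state; replica B (sharing the store) reads it back.
saveSession({ sessionId: "s-1", initialized: true, pendingRequestIds: ["42"] });
const restored = loadSession("s-1");
```

The point is only that everything stored is string-safe; live sockets and callbacks never enter the store.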

Describe the solution you'd like
A stateless solution that does not involve caching transports in memory, or a stateful solution that can be serialized and stored as strings in caches like Redis.

Describe alternatives you've considered
It seems that the StreamableHttp transport may be able to work without session ids completely, but the example in the code does not show how. I will try to create an example where the session ID generator is undefined, but if so, we would always need to create a new transport inside the request handlers, and not try to reuse them from the in-memory cache. I am not sure this will work.

@dglozic dglozic added the enhancement New feature or request label Apr 13, 2025
@smurf28

smurf28 commented Apr 14, 2025

This also troubles me. I want distributed deployment, but it is difficult to achieve.

@dglozic
Author

dglozic commented Apr 14, 2025

Nobody deploys a single service replica in a cluster, so this affects all real-world remote MCP server implementations. A WebSockets-based transport would work since it opens the socket and keeps it open, and that also makes sense: you want a transport solution that opens the socket for the session duration and then closes it to conserve resources.

@dglozic dglozic changed the title Both SSE and StreamedHttp transport require sticky sessions Both SSE and StreamableHttp transport require sticky sessions Apr 14, 2025
@asprouse

It seems possible to use this transport in a "stateless" mode according to comments in the code:

// Stateless mode - explicitly set session ID to undefined
const statelessTransport = new StreamableHTTPServerTransport({
  sessionId: undefined,
});

// Using with pre-parsed request body
app.post('/mcp', (req, res) => {
  transport.handleRequest(req, res, req.body);
});

In stateful mode:

  • Session ID is generated and included in response headers
  • Session ID is always included in initialization responses
  • Requests with invalid session IDs are rejected with 404 Not Found
  • Non-initialization requests without a session ID are rejected with 400 Bad Request
  • State is maintained in-memory (connections, message history)

In stateless mode:

  • Session ID is only included in initialization responses
  • No session validation is performed

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@dglozic
Author

dglozic commented Apr 16, 2025

It is hard to test until the StreamableHttp transport is published to NPM. Right now, the example implies only stateful mode, but the description of the transport constructor implies that stateless mode is supported.

@asprouse

It is hard to test until the StreamableHttp transport is published to NPM. Right now, the example implies only stateful mode, but the description of the transport constructor implies that stateless mode is supported.

@gylove1994 @ihrpr I see that you've been working on this. Is stateless mode really something that is usable?

@jspahrsummers @jerome3o-anthropic Do you know the timeline for when this stateless code will be published to NPM?

@ihrpr
Contributor

ihrpr commented Apr 18, 2025

@asprouse, it's available now in version 1.10.1 of the package. Also added a folder with examples -- see the README and an example for stateless mode

@ihrpr
Contributor

ihrpr commented Apr 18, 2025

@dglozic some high-level notes on distributed deployment are here

@dglozic
Author

dglozic commented Apr 18, 2025

Yeah, this seems like what I was looking for, let me try it.

@dglozic
Author

dglozic commented Apr 18, 2025

Actually, there is still room for clarity in README.md:

you can use a database to persist session data while still allowing any node to handle requests.

In order to persist session data in a database (say, a Redis cache), what exactly is the state, and how can it be serialized (ideally into JSON that can be shared between nodes)?

@buggyhunter

@dglozic did you find a way to use/try non-sticky sessions? As you mentioned in the first comment:

 Both SSE and StreamableHttp transport rely on caching the transport in memory in the server pod

I encountered the same problem.

We need to find a way to persist transport objects. (serialization/deserialization)

@ihrpr any insights would be very helpful.

@dglozic
Author

dglozic commented Apr 21, 2025

It seems that with the latest release, StreamableHttp can be used in stateless mode (but it is still recommended to serialize state if possible). I am waiting to see how exactly :-).

@cliffhall
Contributor

cliffhall commented Apr 24, 2025

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

@ihrpr
Contributor

ihrpr commented Apr 24, 2025

@dglozic did you find a way to use/try non-sticky sessions? As you mentioned in the first comment:

 Both SSE and StreamableHttp transport rely on caching the transport in memory in the server pod

I encountered the same problem.

We need to find a way to persist transport objects. (serialization/deserialization)

@ihrpr any insights would be very helpful.

@buggyhunter why do you need to persist transport objects in a stateless server? If you need a stateless server, the transport (and the server object) will be created on every request; here is an example
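The per-request lifecycle ihrpr describes can be sketched without the SDK. FakeTransport below is a stand-in for StreamableHTTPServerTransport (which the real stateless example constructs with sessionIdGenerator: undefined); this only illustrates the control flow, not the actual API:

```typescript
// FakeTransport is a stand-in for StreamableHTTPServerTransport; it only
// illustrates the per-request lifecycle, not the real API.
class FakeTransport {
  handleRequest(body: unknown): string {
    // A real transport would run the JSON-RPC message through the server.
    return JSON.stringify({ ok: true, echo: body });
  }
  close(): void {
    // Release per-request resources; nothing outlives the request.
  }
}

// Stateless handler: no shared transports map, so any replica can serve
// any request without a session lookup.
function handleMcpPost(body: unknown): string {
  const transport = new FakeTransport(); // fresh instance per request
  try {
    return transport.handleRequest(body);
  } finally {
    transport.close();
  }
}

// Two requests "landing on different replicas" both succeed, because
// neither depends on state cached by the other.
const r1 = handleMcpPost({ method: "tools/list" });
const r2 = handleMcpPost({ method: "tools/call" });
```

Because nothing is looked up or persisted, there is nothing to go stale when the load balancer picks a different pod.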

@ihrpr
Contributor

ihrpr commented Apr 24, 2025

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

yeah, and in addition, things like resumability, which is useful for long running tasks

@meetAndEgg

meetAndEgg commented Apr 27, 2025

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

yeah, and in addition, things like resumability, which is useful for long running tasks

In what cases do we need to keep a session for the server to send events? I mean, most servers don't send notifications.
In addition, resumability seems to have nothing to do with the session. As long as the server has SSE capabilities (even if the server sets sessionId to undefined), it can be restored through a GET request.

@dglozic
Author

dglozic commented Apr 27, 2025

I am not saying I want to give up on state, only that the examples showing how state is implemented are not useful for actual implementations in the real world, where multiple server replicas are running. It is relatively easy to set up a Redis service to keep state between replicas (using the session id for the key). But Redis requires this state to be serializable into a string, and the examples are caching the entire transport object.

What we would need is an example where the transport state is extracted from the transport as a JSON-serializable object, and on each request a new transport instance can be created by passing in that state. We cannot cache entire transport objects; that's what's broken in the examples. We can still cache these transports in memory once created, but if a request from a session started on one replica hits another, there should be a path for re-creating the transport from the string-only state in Redis.

@ihrpr
Contributor

ihrpr commented Apr 27, 2025

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

I am not sure what you give up going stateless but it seems like this would be preferred for a production server. Do most major clients support this?

@asprouse You give up the ability to have server sent events. For that, there needs to be a session id.

yeah, and in addition, things like resumability, which is useful for long running tasks

In what cases do we need to keep a session for the server to send events? I mean, most servers don't send notifications. In addition, the resumability seems to have nothing to do with the session. As long as the server has SSE capabilities (even if the server sets sessionId to undefined), it can be restored through a get request.

It's not the mcp-session-id (sessionId) you need for resumability; it's the sticky session (or any other way to route to the same node). The GET request should be sent to the same node as the original POST request to resume the long-running task and continue receiving notifications and, eventually, the response.

@ihrpr
Contributor

ihrpr commented Apr 27, 2025

I am not saying I want to give up on state, only that the examples showing how state is implemented are not useful for actual implementations in the real world, where multiple server replicas are running. It is relatively easy to set up a Redis service to keep state between replicas (using the session id for the key). But Redis requires this state to be serializable into a string, and the examples are caching the entire transport object.

What we would need is an example where the transport state is extracted from the transport as a JSON-serializable object, and on each request a new transport instance can be created by passing in that state. We cannot cache entire transport objects; that's what's broken in the examples. We can still cache these transports in memory once created, but if a request from a session started on one replica hits another, there should be a path for re-creating the transport from the string-only state in Redis.

The transport is created on each request only for stateless servers. Here is a very high-level example of a multi-node deployment with state management. We can try to add a more detailed example in a bit if that would be helpful.

@dglozic
Author

dglozic commented Apr 27, 2025

I was toying with the idea of pub/sub, but for HA it is inferior to serializing/deserializing state. When pub/sub is used, what we are saying is that there is a 'correct' node to handle each request based on session id, and we pass handling of the request on to that node. The problems here:

  1. We are now engaged in complex piping of requests between nodes (because the response needs to be piped back to the original requestor)
  2. If the 'correct' node for a particular session id crashes, we are doomed. The node does not even need to crash abnormally; it may simply be stopped during a normal rolling deploy (which happens all the time in SaaS). This means that our MCP server will malfunction any time something bad happens OR we deploy a new version.

Essentially, using pub/sub is just re-implementing sticky sessions. Kubernetes will handle this automatically for you if you use its sticky session support (the load balancer routes requests to upstream nodes based on varying criteria, and a sticky session id 'rigs' that by forcing the load balancer to always pass requests for that session to the same node).

Meanwhile, if state can be serialized into a string and cached in Redis, any node is the 'right' node, because if the transport for that session id is missing, we can recreate it from the state in Redis. Crashing or restarting nodes is not a problem because the load balancer will re-route new requests to the healthy nodes.
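The recreate-on-miss path described above can be sketched as follows. The marshal/unmarshal methods and the TransportState shape are hypothetical (the real transport exposes no such methods today), and Maps stand in for Redis and the per-replica cache:

```typescript
// Hypothetical marshal/unmarshal flow; the real transport exposes no such
// methods today, so the names and the TransportState shape are assumptions.
interface TransportState {
  sessionId: string;
  initialized: boolean;
}

class RebuildableTransport {
  constructor(public state: TransportState) {}
  marshal(): string {
    return JSON.stringify(this.state); // string-only state, Redis-safe
  }
  static unmarshal(raw: string): RebuildableTransport {
    return new RebuildableTransport(JSON.parse(raw) as TransportState);
  }
}

const redisStandIn = new Map<string, string>();             // shared by replicas
const localCache = new Map<string, RebuildableTransport>(); // this replica only

function getTransport(sessionId: string): RebuildableTransport | undefined {
  const cached = localCache.get(sessionId);
  if (cached) return cached;                 // fast path: same replica
  const raw = redisStandIn.get(sessionId);   // miss: request changed replicas
  if (raw === undefined) return undefined;   // unknown session -> reject
  const rebuilt = RebuildableTransport.unmarshal(raw);
  localCache.set(sessionId, rebuilt);        // warm this replica's cache
  return rebuilt;
}

// Replica A creates and persists the transport state...
const original = new RebuildableTransport({ sessionId: "s-9", initialized: true });
redisStandIn.set("s-9", original.marshal());
// ...then we simulate replica B by clearing the local cache.
localCache.clear();
const onOtherReplica = getTransport("s-9");
```

With this shape, a crashed or redeployed node costs nothing: the next replica rebuilds the transport from the string in Redis.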

@cliffhall
Contributor

cliffhall commented Apr 28, 2025

Regarding the serialization, which does seem like a terrific idea, I had a chat with Gemini (because it was handy in my IDE) that identified a bit of the challenge involved. Since it was fairly on point, I will include it here.

Me

I would like to add instance method marshal and static method unmarshal to the Transport class so its implementors could be serialized and stored. This would avoid the problems of caching transports and requests having to be routed to the "right" server. Theoretically, any server could deserialize the state of a transport stored in a Redis db by session id and respond. How would I implement those methods in the current file (streamableHttps.ts)

Gemini

Okay, that's an interesting approach to achieving statelessness on the server-side by persisting the transport's state! However, there's a significant challenge here: the StreamableHTTPServerTransport inherently manages live network connections (ServerResponse objects), which cannot be serialized and restored across different processes or machines.

Let's break down how you could implement serialization for the non-connection state and discuss the implications:

1. Identify Serializable and Non-Serializable State

1.1. Serializable State

First, identify the properties that represent the logical state of the transport and can be serialized (primitive types, plain objects, arrays, Maps of serializable types):

  • sessionId: string | undefined
  • _started: boolean
  • _initialized: boolean
  • _enableJsonResponse: boolean
  • _standaloneSseStreamId: string
  • _requestToStreamMapping: Map<RequestId, string> (Needs conversion for serialization)
  • _requestResponseMap: Map<RequestId, JSONRPCMessage> (Needs conversion, but see note below)

1.2 Non-Serializable State:

  • _streamMapping: Map<string, ServerResponse> (This holds live connections - cannot be serialized)
  • sessionIdGenerator: Function
  • _eventStore: EventStore instance (You'd need to re-instantiate this based on configuration, not serialize the instance itself)
  • _onsessioninitialized, onclose, onerror, onmessage: Callback functions

Important Note on _requestResponseMap:

This map holds responses potentially waiting to be sent back on a specific connection. Since the connection (ServerResponse) itself cannot be restored, storing these pending responses might not be directly useful upon deserialization. The client would likely need to re-establish a connection anyway. Relying on the EventStore for message replay upon reconnection is probably the more robust pattern here. For simplicity in this example, we'll omit _requestResponseMap from serialization, assuming the EventStore handles message persistence.

2. Define the Serialized State Structure

// Define an interface for the data structure we'll serialize/deserialize
interface StreamableHTTPServerTransportState {
    sessionId: string | undefined;
    started: boolean;
    initialized: boolean;
    enableJsonResponse: boolean;
    standaloneSseStreamId: string;
    // Store maps as arrays of [key, value] pairs for easy JSON serialization
    requestToStreamMapping: [RequestId, string][];
    // We are omitting requestResponseMap as discussed above
}

// Define options needed during deserialization, including non-serializable parts
interface StreamableHTTPServerTransportUnmarshalOptions extends StreamableHTTPServerTransportOptions {
    // Callbacks need to be re-attached
    onclose?: () => void;
    onerror?: (error: Error) => void;
    onmessage?: (message: JSONRPCMessage) => void;
    // The EventStore instance needs to be provided
    eventStore?: EventStore; // Make sure this is the actual instance to use
}

3. Implement marshal (Instance Method)

  /**
   * Serializes the transport's non-connection state.
   * Note: Live connections (ServerResponse) and callbacks are not serialized.
   * @returns A serializable representation of the transport's state.
   */
  public marshal(): StreamableHTTPServerTransportState {
    return {
      sessionId: this.sessionId,
      started: this._started,
      initialized: this._initialized,
      enableJsonResponse: this._enableJsonResponse,
      standaloneSseStreamId: this._standaloneSseStreamId,
      // Convert Map to a serializable array format
      requestToStreamMapping: Array.from(this._requestToStreamMapping.entries()),
    };
  }

4. Implement unmarshal (Static Method)

  /**
   * Deserializes the transport state and creates a new instance.
   * Requires providing runtime dependencies like callbacks and the EventStore.
   * Note: This restores the logical state, but clients need to re-establish connections.
   * @param state The serialized state object.
   * @param options Options containing callbacks and the EventStore instance.
   * @returns A new StreamableHTTPServerTransport instance populated with the deserialized state.
   */
  public static unmarshal(
    state: StreamableHTTPServerTransportState,
    options: StreamableHTTPServerTransportUnmarshalOptions
  ): StreamableHTTPServerTransport {

    // Create a new instance, passing necessary config that might influence initial setup
    // Note: sessionIdGenerator might be irrelevant if state.sessionId exists,
    // but pass it for consistency or potential future use.
    const transport = new StreamableHTTPServerTransport({
        sessionIdGenerator: options.sessionIdGenerator,
        enableJsonResponse: state.enableJsonResponse, // Use state's value
        eventStore: options.eventStore, // Use provided instance
        onsessioninitialized: options.onsessioninitialized, // Use provided callback
    });

    // Restore state from the serialized object
    transport.sessionId = state.sessionId;
    transport._started = state.started;
    transport._initialized = state.initialized;
    // transport._enableJsonResponse is set via constructor options above
    transport._standaloneSseStreamId = state.standaloneSseStreamId;

    // Restore Maps from the serialized array format
    transport._requestToStreamMapping = new Map(state.requestToStreamMapping);

    // Re-attach essential callbacks provided in options
    transport.onclose = options.onclose;
    transport.onerror = options.onerror;
    transport.onmessage = options.onmessage;

    // _streamMapping and _requestResponseMap start empty, as connections
    // need to be re-established by clients.

    return transport;
  }

How to Use and Implications

1. Storage

  • When you want to persist the state (e.g., before a server might shut down or hand off), you'd call transport.marshal() and store the resulting StreamableHTTPServerTransportState object (likely as JSON) in Redis, keyed by the sessionId.

2. Restoration

  • When a request comes in for a specific sessionId
    • Fetch the serialized state from Redis.
    • Prepare the StreamableHTTPServerTransportUnmarshalOptions with the necessary callbacks (likely configured globally on your server) and the appropriate EventStore instance.
    • Call StreamableHTTPServerTransport.unmarshal(state, options) to get a new transport instance.
    • Use this restored transport instance to handle the incoming request (handleRequest).

3. Client Reconnection

  • This is crucial. Since the original ServerResponse connections are lost, the client must reconnect.
    • For SSE: The client would need to make a new GET request with the Mcp-Session-Id and potentially the Last-Event-ID header. The restored transport (using the EventStore via replayEvents) would then handle resuming the message stream.
    • For POST responses: If a POST request was expecting a response via SSE or JSON, and the server handling it changed, the client might not receive that specific response directly. The client-side logic would need to handle potential timeouts and potentially rely on subsequent messages or state updates received after reconnecting via GET.

In Summary

You can serialize the logical state (session ID, initialization status, request-to-stream mappings). However, you cannot serialize the live HTTP connections. This approach shifts the burden of re-establishing the connection to the client, making the EventStore (for resumability) absolutely critical for a seamless experience. Any server can restore the state, but it needs the client to initiate a new connection to actually communicate.
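The client-initiated reconnect described above might look like the sketch below. The Mcp-Session-Id and Last-Event-ID header names follow the Streamable HTTP transport, but buildResumeHeaders itself is a hypothetical helper:

```typescript
// Hypothetical helper a client might use to resume after being routed to a
// different replica. The Mcp-Session-Id and Last-Event-ID header names come
// from the Streamable HTTP transport; the function itself is an assumption.
function buildResumeHeaders(
  sessionId: string,
  lastEventId?: string,
): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "text/event-stream",
    "Mcp-Session-Id": sessionId,
  };
  if (lastEventId !== undefined) {
    // Lets a server with an EventStore replay events the client missed.
    headers["Last-Event-ID"] = lastEventId;
  }
  return headers;
}

const headers = buildResumeHeaders("s-1", "evt-17");
// The client would then reconnect with something like:
//   fetch(mcpUrl, { method: "GET", headers });
```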

@dglozic
Author

dglozic commented Apr 29, 2025

That Gemini guy is smart!

I think this all boils down to intent. If you want to open a bi-directional connection and keep it open, I fear HTTP is the wrong protocol to use. If THAT is the intent, then Web Sockets is the right protocol - it will use HTTP Upgrade to switch from HTTP to WS(S) transport (status code 101) and keep the socket open for bidirectional traffic (reconnecting in the case of a socket drop).

HTTP is really not intended for long-lived connections - you send a request, you get your response, and then the connection closes (unless you use keepAlive for performance reasons). Load balancers sitting in front of multiple upstream nodes assume they can round-robin traffic to any of the upstreams as long as they are alive. Nodes cannot really keep anything in memory - they should be able to process the request from the request itself, plus state that can be marshalled and unmarshalled safely. As the Gemini dude said, open connections cannot be serialized.

@djMax

djMax commented May 12, 2025

Seems like there is a very well-established pattern for this via express-session and its many connectors to things like Redis. We don't have to require Express per se, but it seems like we should match the pattern given that we are already "Express adjacent"?

@dglozic
Author

dglozic commented May 12, 2025

That was my point - most people using Express also use Redis-based sessions. They work well with multiple service instances, allowing those instances to 'see' the same cached values. But it seems like the resumable, bi-directional message traffic required here is a bad fit for HTTP. Web Sockets seem like a better fit than attempting to cache the entire connection in memory (which requires sticky session support).

@alko89

alko89 commented May 16, 2025

StreamableHttp can be stateless; the example server works in kube with multiple pods:

https://github.com/modelcontextprotocol/typescript-sdk/blob/main/src/examples/server/simpleStatelessStreamableHttp.ts

SSE seems to be deprecated, so I wouldn't count on it being updated. If anyone needs to implement SSE, it will require manually handling sessions in Redis.


9 participants