War Story / Infrastructure
MCP stdio vs HTTP Hub: The 128 GB RAM Crash That Changed How We Ship MCP
Running MCP servers as stdio processes feels fine at 3 sessions. At 8 sessions across 50 servers, it crashed the box. Here is the math, the failure mode, and why HTTP hub is the only answer at scale.
We killed a 128 GB Windows box running eight concurrent Claude Code sessions. The culprit was not the models, not our code, not even the tools themselves. It was the transport. Every MCP server in our `.mcp.json` was configured as a stdio command, and stdio spawns one process per session. This article is the exact failure mode, the arithmetic that explains it, and the migration path that permanently fixed it.
If you run more than one long-lived MCP client (Claude Code, Cursor, Windsurf, Cline, Zed) against more than a handful of servers, this is your future. Read before you hit it.
The Crash: What Actually Happened
The setup: a single Windows 11 workstation, 128 GB RAM, used as a venture factory cockpit. Eight Claude Code sessions live at once, one per company. Each session reads the same `~/.mcp.json` with 50+ MCP server entries covering Supabase, Vercel, GitHub, Composio, Stripe, Playwright, context7, ref, desktop-commander, screen-capture, and two dozen others.
Every entry looked like this:
```json
{
  "supabase": {
    "command": "npx",
    "args": ["-y", "@supabase/mcp-server-supabase"]
  },
  "playwright": {
    "command": "npx",
    "args": ["@playwright/mcp"]
  }
  // ...48 more entries, all "command"-style
}
```
Claude Code boots. The harness reads the config, forks one child process per server. Forty seconds in, Task Manager shows 400+ `node.exe` entries. Some MCP servers ship additional subprocesses: Playwright launches a Chromium watcher, desktop-commander runs a shell shim, Supabase dev binaries compile on first call. By the time the second session opened we were at 800 processes. By the fifth we passed 2,000. At session eight the page file thrashed, the kernel killed processes at random, and every Claude Code window died in sequence.
The Arithmetic Nobody Does Up Front
Stdio MCP is process-per-client-per-server. The multiplication compounds faster than any intuition prepares you for.
| Variable | Value | Subtotal |
|---|---|---|
| MCP servers in config | 50 | 50 |
| Concurrent CC sessions | 8 | 400 top-level node processes |
| Processes per server instance | ~8 (the server plus watchers, binaries, workers) | 3,200 processes |
| Overhead (npx spawners, hooks) | +~424 | ~3,624 total |
| Avg RAM per node process | 35 MB resident | ~127 GB resident |
| Available on 128 GB box | ~112 GB (OS + other apps) | CRASH |
The failure mode is O(sessions x servers x subprocesses). On a single-session developer laptop (say, one Claude Code + 10 MCP servers), stdio spawns somewhere between 40 and 80 processes using 1.5-3 GB of RAM. That is completely fine. The curve only bends against you when you scale either axis.
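The table's multiplication fits in a few lines of Node. The per-instance process count and per-process RAM figures are this article's estimates, not universal constants:

```javascript
// Back-of-envelope model of the stdio blow-up (figures from the table above).
const servers = 50;
const sessions = 8;
const instances = servers * sessions;             // 400 server instances
const perInstance = 8;                            // server + watchers, binaries, workers
const overhead = 424;                             // npx spawners, hooks
const total = instances * perInstance + overhead; // ~3,624 processes
const ramGB = (total * 35) / 1000;                // ~35 MB resident each
console.log(total, ramGB.toFixed(1));             // → 3624 126.8
```

Change `sessions` to 1 and `servers` to 10 and the same model lands in laptop territory, which is why nobody notices until they scale an axis.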
Why stdio Was Designed This Way (and Why It Does Not Scale)
The MCP spec supports multiple transports on purpose. Stdio was the first transport because it is dead simple: the client forks the server as a child process, communicates over stdin/stdout pipes, and kills it on exit. No networking, no ports, no auth, no cross-origin issues, no daemon lifecycle to manage. For a single developer iterating on a single MCP server during development, it is perfect.
The design assumption is that every client has its own short-lived instance. That assumption breaks the moment you:
- Run multiple editors / clients in parallel (we run 8 Claude Code sessions)
- Keep sessions alive for hours or days
- Register many servers globally instead of per-project
- Have MCP servers that ship their own heavyweight dependencies (browsers, compilers, ML models)
Each of those violates the implicit assumption, and each one multiplies the process count.
The Fix: HTTP Hub Transport
The MCP spec defines an HTTP transport (Streamable HTTP / SSE) for exactly this reason. An HTTP hub is one long-running process that hosts every MCP server behind a local port, started once at boot. Every client, every session, connects to the same hub via HTTP. The process count becomes O(servers), independent of sessions.
Our hub lives at `~/mcp-hub/start-mcp-hub.sh` and assigns each server a port in the 18901-18926 range. The startup script is 40 lines. The hub runs as a detached background process that survives reboots via a scheduled task.
The `.mcp.json` entries change from `command` to `http`:
```json
{
  "supabase": {
    "type": "http",
    "url": "http://localhost:18901/mcp"
  },
  "playwright": {
    "type": "http",
    "url": "http://localhost:18902/mcp"
  }
  // ...one port per server, all pointing at the hub
}
```
When Claude Code boots, instead of forking 50 processes it opens 50 HTTP connections. Opening a ninth session opens 50 more connections to the same hub - no new processes. Memory stays flat regardless of how many sessions run.
Before and After: The Numbers
| Metric | stdio (50 servers, 8 sessions) | HTTP hub (same load) |
|---|---|---|
| Node processes | ~3,624 | ~55 (hub + children) |
| RAM used | ~127 GB | ~2.1 GB |
| Cold start per session | 42 seconds | 1.3 seconds |
| Per-call latency | ~0 ms (stdio pipe) | ~2 ms (localhost HTTP) |
| OS stability at 8 sessions | crashes | stable indefinitely |
The per-call latency penalty is 2 ms. Tool execution typically runs 50-5000 ms. The overhead is invisible. What is very visible: cold-start dropped from 42 seconds to 1.3, because new sessions no longer spawn 50 npx installs and module loads - the hub already has everything resident.
How To Build an MCP HTTP Hub (4 Options)
Option 1: supergateway (simplest)
`supergateway` wraps any stdio MCP server in HTTP/SSE with one command: `supergateway --stdio "npx -y @supabase/mcp-server-supabase" --port 18901`. Run one instance per server under a process manager (pm2, systemd, Windows Task Scheduler). Point `.mcp.json` entries at each port.
Option 2: mcp-proxy (Python)
`mcp-proxy` does the same thing in Python. Lower memory footprint per shim; a better fit if your box is already Node-heavy.
Option 3: Custom Express hub
If you want one port with routing by server name, write a ~150-line Node script that spawns each MCP server as a child process inside the hub and exposes POST /mcp/:server. This is what we run. Benefits: one port instead of 50, one process-manager entry, shared logging, shared auth middleware.
Option 4: Skip the hub, use a gateway
If you do not want to operate the hub at all, use a managed gateway like ToolRoute. You get one HTTP endpoint, one API key, 87+ tools, and zero processes running on your machine. We detail the tradeoff in self-hosted vs cloud MCP servers and what is an MCP gateway.
The Hard Rule We Wrote After This
After losing a full day to the crash and subsequent recovery, this became rule #9 in our global agent constitution, loaded into every Claude Code session:
"MCP servers: HTTP hub ONLY, NEVER stdio/command. Every MCP entry in .mcp.json must be type: http pointing to a hub port (18901-18926) or remote URL. If you see command in MCP config, that is a bug - move it to the hub. Stdio spawns one process PER SESSION. 8 sessions x 50 servers = ~3,624 node processes = ~127 GB RAM = crash. Hub runs once after boot."
Common Objections (And Honest Answers)
"HTTP adds attack surface."
Bind the hub to `127.0.0.1` only. No remote access, no open ports. Localhost-only HTTP has the same trust boundary as localhost stdio.
"Debugging is easier with stdio (I can see stdout)."
True for single-server iteration. At that point, temporarily switch one `.mcp.json` entry back to `command` for that specific server. Debug. Switch back. You do not need to choose globally. Also see our guide to debugging MCP tool calls.
"My team only runs one session."
Then stdio is fine today. Write it down as a known constraint. The moment anyone runs two sessions, or the server count crosses 15, migrate. Do not wait for the crash.
"Isn't this just what SaaS MCP platforms do?"
Yes - a managed gateway is effectively "someone else's HTTP hub, hosted and maintained." If you do not want to run the hub, that is the point of gateways. ToolRoute is one example. Kong MCP Gateway is another. See MCP gateway vs API gateway for the architectural comparison.
Frequently Asked Questions
Why does MCP stdio transport crash at scale?
Stdio MCP servers spawn one process per client session. 50 servers x 8 sessions = 400 server instances; at ~8 processes each (the server plus watchers and workers), plus ~424 npx and hook overhead processes, that is ~3,624 processes and ~127 GB RAM. The Windows page file thrashes, processes get OOM-killed, sessions die.
What is an MCP HTTP hub?
A single long-running process that hosts all MCP servers behind local HTTP ports. Clients connect via HTTP instead of spawning stdio subprocesses. RAM scales with servers, not with sessions-times-servers.
How do I convert stdio MCP servers to HTTP hub?
Wrap each stdio server with supergateway, mcp-proxy, or a custom Express shim. Assign ports. Start all shims at boot. Change `.mcp.json` entries from `command` to `http` with `url: http://localhost:PORT/mcp`.
When is stdio MCP still acceptable?
Single developer, single session, under 10 MCP servers. At that scale stdio stays under 100 processes and 4 GB of RAM. Above it, switch to an HTTP hub.
Does HTTP hub add latency compared to stdio?
Localhost HTTP adds 1-3 ms per tool call. Tool execution takes 50-5000 ms. The overhead is invisible. Cold-start per session drops from ~42 s to ~1.3 s, which is a much bigger win than the 2 ms penalty.
Related Articles
ToolRoute is a managed MCP gateway, which is effectively a hosted HTTP hub. One API key, 87 tools, zero processes on your machine. Browse the catalog or read the docs.