War Story / Infrastructure
MCP stdio vs HTTP Hub: The 128 GB RAM Crash That Changed How We Ship MCP
Running MCP servers as stdio processes feels fine at 3 sessions. At 8 sessions across 50 servers, it crashed the box. Here is the math, the failure mode, and why HTTP hub is the only answer at scale.
We killed a 128 GB Windows box running eight concurrent Claude Code sessions. The culprit was not the models, not our code, not even the tools themselves. It was the transport. Every MCP server in our `.mcp.json` was configured as a stdio command, and stdio spawns one process per session. This article is the exact failure mode, the arithmetic that explains it, and the migration path that permanently fixed it.
If you run more than one long-lived MCP client (Claude Code, Cursor, Windsurf, Cline, Zed) against more than a handful of servers, this is your future. Read before you hit it.
The Crash: What Actually Happened
The setup: a single Windows 11 workstation, 128 GB RAM, used as a venture factory cockpit. Eight Claude Code sessions live at once, one per company. Each session reads the same `~/.mcp.json` with 50+ MCP server entries covering Supabase, Vercel, GitHub, Composio, Stripe, Playwright, context7, ref, desktop-commander, screen-capture, and two dozen others.
Every entry looked like this:
```json
{
  "supabase": {
    "command": "npx",
    "args": ["-y", "@supabase/mcp-server-supabase"]
  },
  "playwright": {
    "command": "npx",
    "args": ["@playwright/mcp"]
  }
  // ...48 more entries, all "command"-style
}
```
Claude Code boots. The harness reads the config, forks one child process per server. Forty seconds in, Task Manager shows 400+ `node.exe` entries. Some MCP servers ship additional subprocesses: Playwright launches a Chromium watcher, desktop-commander runs a shell shim, Supabase dev binaries compile on first call. By the time the second session opened we were at 800 processes. By the fifth we passed 2,000. At session eight the page file thrashed, the kernel killed processes at random, and every Claude Code window died in sequence.
The Arithmetic Nobody Does Up Front
Stdio MCP is process-per-client-per-server. The multiplication compounds faster than any intuition prepares you for.
| Variable | Value | Subtotal |
|---|---|---|
| MCP servers in config | 50 | 50 |
| Concurrent CC sessions | 8 | 400 top-level node processes |
| Processes per server instance | ~8 (the server plus watchers, binaries, workers) | 3,200 processes |
| Overhead (npx spawners, hooks) | +~424 | ~3,624 total |
| Avg RAM per node process | 35 MB resident | ~127 GB resident |
| Available on 128 GB box | ~112 GB (OS + other apps) | CRASH |
The failure mode is O(sessions x servers x subprocesses). On a single-session developer laptop (say, one Claude Code + 10 MCP servers), stdio spawns somewhere between 40 and 80 processes using 1.5-3 GB of RAM. That is completely fine. The curve only bends against you when you scale either axis.
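The table's multiplication fits in a few lines of Node. The per-instance process count and per-process RAM figures are this article's estimates, not universal constants:

```javascript
// Back-of-envelope model of the stdio blow-up (figures from the table above).
const servers = 50;
const sessions = 8;
const instances = servers * sessions;             // 400 server instances
const perInstance = 8;                            // server + watchers, binaries, workers
const overhead = 424;                             // npx spawners, hooks
const total = instances * perInstance + overhead; // ~3,624 processes
const ramGB = (total * 35) / 1000;                // ~35 MB resident each
console.log(total, ramGB.toFixed(1));             // → 3624 126.8
```

Change `sessions` to 1 and `servers` to 10 and the same model lands in laptop territory, which is why nobody notices until they scale an axis.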
Why stdio Was Designed This Way (and Why It Does Not Scale)
The MCP spec supports multiple transports on purpose. Stdio was the first transport because it is dead simple: the client forks the server as a child process, communicates over stdin/stdout pipes, and kills it on exit. No networking, no ports, no auth, no cross-origin issues, no daemon lifecycle to manage. For a single developer iterating on a single MCP server during development, it is perfect.
The design assumption is that every client has its own short-lived instance. That assumption breaks the moment you:
- Run multiple editors / clients in parallel (we run 8 Claude Code sessions)
- Keep sessions alive for hours or days
- Register many servers globally instead of per-project
- Have MCP servers that ship their own heavyweight dependencies (browsers, compilers, ML models)
Each of those violates the implicit assumption, and each one multiplies the process count.
The Fix: HTTP Hub Transport
The MCP spec defines an HTTP transport (Streamable HTTP / SSE) for exactly this reason. An HTTP hub is one long-running process that hosts every MCP server behind a local port, started once at boot. Every client, every session, connects to the same hub via HTTP. The process count becomes O(servers), independent of sessions.
Our hub lives at `~/mcp-hub/start-mcp-hub.sh` and assigns each server a port in the 18901-18926 range. The startup script is 40 lines. The hub runs as a detached background process that survives reboots via a scheduled task.
The `.mcp.json` entries change from `command` to `http`:
```json
{
  "supabase": {
    "type": "http",
    "url": "http://localhost:18901/mcp"
  },
  "playwright": {
    "type": "http",
    "url": "http://localhost:18902/mcp"
  }
  // ...one port per server, all pointing at the hub
}
```
When Claude Code boots, instead of forking 50 processes it opens 50 HTTP connections. Opening a ninth session opens 50 more connections to the same hub - no new processes. Memory stays flat regardless of how many sessions run.
Before and After: The Numbers
| Metric | stdio (50 servers, 8 sessions) | HTTP hub (same load) |
|---|---|---|
| Node processes | ~3,624 | ~55 (hub + children) |
| RAM used | ~127 GB | ~2.1 GB |
| Cold start per session | 42 seconds | 1.3 seconds |
| Per-call latency | ~0 ms (stdio pipe) | ~2 ms (localhost HTTP) |
| OS stability at 8 sessions | crashes | stable indefinitely |
The per-call latency penalty is 2 ms. Tool execution typically runs 50-5000 ms. The overhead is invisible. What is very visible: cold-start dropped from 42 seconds to 1.3, because new sessions no longer spawn 50 npx installs and module loads - the hub already has everything resident.
How To Build an MCP HTTP Hub (4 Options)
Option 1: supergateway (simplest)
`supergateway` wraps any stdio MCP server in HTTP/SSE with one command: `supergateway --stdio "npx -y @supabase/mcp-server-supabase" --port 18901`. Run one instance per server under a process manager (pm2, systemd, Windows Task Scheduler). Point `.mcp.json` entries at each port.
Option 2: mcp-proxy (Python)
`mcp-proxy` does the same thing in Python. Lower memory footprint per shim; a better fit if your box is already Node-heavy.
Option 3: Custom Express hub
If you want one port with routing by server name, write a ~150-line Node script that spawns each MCP server as a child process inside the hub and exposes POST /mcp/:server. This is what we run. Benefits: one port instead of 50, one process-manager entry, shared logging, shared auth middleware.
Option 4: Skip the hub, use a gateway
If you do not want to operate the hub at all, use a managed gateway like ToolRoute. You get one HTTP endpoint, one API key, 87+ tools, and zero processes running on your machine. We detail the tradeoff in self-hosted vs cloud MCP servers and what is an MCP gateway.
The Hard Rule We Wrote After This
After losing a full day to the crash and subsequent recovery, this became rule #9 in our global agent constitution, loaded into every Claude Code session:
"MCP servers: HTTP hub ONLY, NEVER stdio/command. Every MCP entry in .mcp.json must be type: http pointing to a hub port (18901-18926) or remote URL. If you see command in MCP config, that is a bug - move it to the hub. Stdio spawns one process PER SESSION. 8 sessions x 50 servers = ~3,624 node processes = ~127 GB RAM = crash. Hub runs once after boot."
Common Objections (And Honest Answers)
"HTTP adds attack surface."
Bind the hub to `127.0.0.1` only. No remote access, no open ports. Localhost-only HTTP has the same trust boundary as localhost stdio.
"Debugging is easier with stdio (I can see stdout)."
True for single-server iteration. At that point, temporarily switch one `.mcp.json` entry back to `command` for that specific server. Debug. Switch back. You do not need to choose globally. Also see our guide to debugging MCP tool calls.
"My team only runs one session."
Then stdio is fine today. Write it down as a known constraint. The moment anyone runs two sessions, or the server count crosses 15, migrate. Do not wait for the crash.
"Isn't this just what SaaS MCP platforms do?"
Yes - a managed gateway is effectively "someone else's HTTP hub, hosted and maintained." If you do not want to run the hub, that is the point of gateways. ToolRoute is one example. Kong MCP Gateway is another. See MCP gateway vs API gateway for the architectural comparison.
Frequently Asked Questions
Why does MCP stdio transport crash at scale?
Stdio MCP servers spawn one process per client session. 50 servers x 8 sessions = 400 server instances; at ~8 processes each (the server plus watchers and workers), plus ~424 npx and hook overhead processes, that is ~3,624 processes and ~127 GB RAM. The Windows page file thrashes, processes get OOM-killed, sessions die.
What is an MCP HTTP hub?
A single long-running process that hosts all MCP servers behind local HTTP ports. Clients connect via HTTP instead of spawning stdio subprocesses. RAM scales with servers, not with sessions-times-servers.
How do I convert stdio MCP servers to HTTP hub?
Wrap each stdio server with supergateway, mcp-proxy, or a custom Express shim. Assign ports. Start all shims at boot. Change `.mcp.json` entries from `command` to `http` with `url: http://localhost:PORT/mcp`.
When is stdio MCP still acceptable?
Single developer, single session, under 10 MCP servers. At that scale stdio stays under 100 processes and 4 GB of RAM. Above it, switch to an HTTP hub.
Does HTTP hub add latency compared to stdio?
Localhost HTTP adds 1-3 ms per tool call. Tool execution takes 50-5000 ms. The overhead is invisible. Cold-start per session drops from ~42 s to ~1.3 s, which is a much bigger win than the 2 ms penalty.
Related Articles
ToolRoute is a managed MCP gateway, which is effectively a hosted HTTP hub. One API key, 87 tools, zero processes on your machine. Browse the catalog or read the docs.