Tutorial
Building Custom MCP Servers: A Complete Guide for 2026
A production-oriented walkthrough for building your own MCP server. Transport, tools, schemas, auth, testing, deployment, and the mistakes worth skipping.
The Model Context Protocol is the universal socket for AI agents. Any tool that speaks MCP can be called by Claude, Cursor, Claude Code, OpenAI Agents SDK, and every other framework that has added MCP support. Building an MCP server is how you expose your own capability to all of them at once.
This guide is the opinionated version of the official docs. It assumes you want to ship a server to real users rather than just experiment, so it covers the operational questions you will hit in production: auth, versioning, testing, and deployment. The code samples are TypeScript; Python looks nearly identical. For what MCP actually is, start with the protocol primer.
Decide Whether to Build at All
Before writing code, check whether someone has already solved your problem. A good heuristic: search three places, in this order: the official MCP Registry, the ToolRoute catalog, and community lists like PulseMCP. If an existing server covers 80 percent of what you need, fork or extend it. You should build your own when:
- You are exposing a private internal API.
- Your capability is proprietary and cannot be shared.
- You need a composition of operations no public server handles.
- You need specific latency or reliability guarantees no public server meets.
If none of those apply, integrate an existing server. A custom server is ongoing maintenance; a public one is somebody else's maintenance.
Choose Your Transport
MCP supports two transports. Pick based on deployment target.
stdio
- Runs as a local subprocess of the client
- Zero network exposure, single-user
- Ideal for IDE plugins and desktop tools
- Config: Claude Desktop, Cursor, Claude Code all support stdio commands
- Limitation: cannot be shared across users
Streamable HTTP
- Runs as a normal HTTP server
- Multi-user, authenticatable, observable
- Ideal for production SaaS and shared tools
- Config: any client that supports remote MCP URLs
- Requires auth, rate limiting, observability
You can expose both from the same codebase. The SDK separates transport from logic, which means a well-written server can run locally for dev and remotely for prod without forking.
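For example, registering a stdio server in Claude Desktop means adding an entry to `claude_desktop_config.json` (shown here in the Claude Desktop format; the file name and exact keys vary by client, and the command, args, and env values are placeholders):

```json
{
  "mcpServers": {
    "my-weather-server": {
      "command": "node",
      "args": ["dist/index.js"],
      "env": { "WEATHER_API_KEY": "your-key-here" }
    }
  }
}
```

A Streamable HTTP server skips this file entirely: the client is pointed at a URL instead of a command.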
A Minimal Working Server
Here is the smallest TypeScript server that does real work. Copy this, replace the tool body with yours, and you have a working MCP server.
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({
  name: "my-weather-server",
  version: "0.1.0",
});

server.tool(
  "get_forecast",
  "Returns a 3-day forecast for a city",
  {
    city: z.string().describe("City name, e.g. 'Austin'"),
    units: z.enum(["metric", "imperial"]).default("metric"),
  },
  async ({ city, units }) => {
    // Encode user input before splicing it into the URL.
    const res = await fetch(
      `https://api.weather.example/v1/forecast?city=${encodeURIComponent(city)}&units=${units}`
    );
    if (!res.ok) {
      return {
        content: [{ type: "text", text: `Upstream error: ${res.status}` }],
        isError: true,
      };
    }
    const data = await res.json();
    return {
      content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
    };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
```

Three things worth noticing. First, the Zod schema is the source of truth for both validation and the agent-facing tool description. Second, errors are explicit: return `isError: true` with a message the agent can reason about. Third, the transport is a separate object, so swapping stdio for Streamable HTTP is a one-line change.
Tool Schema Design: The Part That Matters Most
An agent's ability to use your tool correctly is proportional to how well your schema is written. A weak description gets your tool called at the wrong moments; a strong description keeps the agent on-task.
Three rules we enforce on every adapter at ToolRoute:
- Describe the what, when, and when-not. A tool called `get_forecast` should say "Use for future weather lookups. Do not use for historical or real-time current conditions." Agents use negative examples as much as positive ones.
- Include expected output shape. "Returns JSON with fields `days[].high`, `days[].low`, `days[].summary`." Agents plan better when they know the shape.
- Give each parameter a description and an example. "city: City name, e.g. Austin." The example alone reduces input validation errors by a large margin.
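Applied to the forecast tool, the three rules might produce a description block like this (a sketch; the field names mirror the Zod example earlier, and the exact registration API depends on your SDK version):

```typescript
// Sketch: a tool description that states what, when, when-not,
// expected output shape, and a per-parameter example.
const getForecastDescription = {
  name: "get_forecast",
  description: [
    "Returns a 3-day forecast for a city.",
    "Use for future weather lookups.",
    "Do not use for historical or real-time current conditions.",
    "Returns JSON with fields days[].high, days[].low, days[].summary.",
  ].join(" "),
  params: {
    city: "City name, e.g. 'Austin'",
    units: "'metric' or 'imperial'; defaults to 'metric'",
  },
};
```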
For more on how tool selection works from the agent side, see choosing the right MCP tool and building agents with multiple tools.
Authentication: Stdio vs HTTP
Stdio servers inherit the caller's trust. The user has already authenticated to the IDE; the server runs in their process space; credentials usually come from environment variables the user sets locally.
Streamable HTTP servers need real auth. Two patterns work:
- Bearer API keys over TLS. Simplest. Good for server-to-server. Rotate on a schedule. This is what most MCP gateways use.
- OAuth 2.1 with PKCE. Required if your tool acts on behalf of an end user (e.g., a user's Gmail or Notion). The MCP spec includes an authorization flow; the community has battle-tested patterns in Composio vs manual OAuth.
Whatever you choose, log auth failures separately from other failures. 401s look like transient errors to agents unless you surface them explicitly, and agents will retry until the rate limit hammer falls.
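A minimal sketch of the bearer-key pattern, with auth failures classified as their own category so they can be logged separately from other errors (the key store and result shape here are illustrative assumptions):

```typescript
// Sketch: validate a bearer key and classify the failure mode.
// VALID_KEYS stands in for your real key store.
const VALID_KEYS = new Set(["sk-test-123"]);

type AuthResult =
  | { ok: true; key: string }
  | { ok: false; reason: "missing_header" | "bad_scheme" | "unknown_key" };

function checkBearer(authHeader: string | undefined): AuthResult {
  if (!authHeader) return { ok: false, reason: "missing_header" };
  const [scheme, key] = authHeader.split(" ");
  if (scheme !== "Bearer" || !key) return { ok: false, reason: "bad_scheme" };
  if (!VALID_KEYS.has(key)) return { ok: false, reason: "unknown_key" };
  return { ok: true, key };
}
```

Emitting `reason` under an auth-specific metric is what keeps 401s from blending into the generic error count.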
Versioning Your Tools
MCP servers are contracts. Clients cache tool schemas. If you change a parameter name, remove a field from a response, or change a type, you can break live agents.
Three rules for safe evolution:
- Additive changes only on a published tool. Add new optional parameters, add new fields, add new tools.
- Version in the tool name for breaking changes: `get_forecast_v2`. Keep the old one alive for a deprecation window.
- Document deprecations in the tool description: "Deprecated. Use get_forecast_v2 for unit-configurable output."
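In practice the deprecation window can be as simple as shipping both descriptors side by side (a sketch; the actual registration syntax depends on your SDK):

```typescript
// Sketch: v1 stays registered, but its description steers agents to v2.
const tools = [
  {
    name: "get_forecast",
    description:
      "Deprecated. Use get_forecast_v2 for unit-configurable output.",
  },
  {
    name: "get_forecast_v2",
    description:
      "Returns a 3-day forecast; units configurable via the 'units' parameter.",
  },
];
```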
Error Handling Agents Can Actually Use
The agent reading your error message is a language model. It responds to structure, not stack traces. Return errors with a stable shape:
```json
{
  "content": [{
    "type": "text",
    "text": "Validation failed: 'city' must be a non-empty string"
  }],
  "isError": true
}
```

Prefer error messages that tell the agent what to do next: "City not found. Try a valid city name or country code." vs "ERROR 404." The first recovers; the second loops. The same principle underlies the reliability patterns every production agent needs.
Testing Your MCP Server
Three testing layers, in order of value.
- Unit tests on tool handlers. Call the handler function directly with fixtures. Test validation, happy paths, and upstream error paths. Fast, deterministic, catches regressions.
- Integration tests via MCP Inspector. Run `npx @modelcontextprotocol/inspector` against your server in CI. The Inspector gives you a UI for listing tools, calling them, and inspecting raw frames. It also catches schema problems that unit tests miss.
- Live agent smoke tests. A handful of scripted prompts that exercise each tool end-to-end with a real client (Claude Desktop, Claude Code, or OpenAI Agents). Runs once before every release; catches "tool description is unclear" bugs that unit tests cannot.
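The first layer needs no MCP machinery at all: call the handler as a plain async function with fixtures. A sketch, with a simplified stand-in for the handler passed to `server.tool()` (names and logic here are illustrative):

```typescript
// Sketch: unit-testing a tool handler directly, no transport involved.
async function forecastHandler({ city }: { city: string }): Promise<{
  content: { type: string; text: string }[];
  isError?: boolean;
}> {
  if (!city.trim()) {
    return {
      content: [
        {
          type: "text",
          text: "Validation failed: 'city' must be a non-empty string",
        },
      ],
      isError: true,
    };
  }
  // In a real test, the upstream fetch would be stubbed here.
  return { content: [{ type: "text", text: `forecast for ${city}` }] };
}

// Happy path and validation path as fixtures.
const ok = await forecastHandler({ city: "Austin" });
const bad = await forecastHandler({ city: "   " });
```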
Deployment Patterns
Where you deploy depends on transport and scale.
- Stdio servers: publish as npm or PyPI packages. Users install and configure their client. Zero infrastructure on your side.
- HTTP servers on Vercel or Fly: works for stateless tools. Cold starts can hurt latency-sensitive workflows.
- HTTP servers on long-running infra (Fly VMs, ECS, Railway): required for stateful servers or streaming workloads.
- Behind a gateway: if your tool is one of many, put it behind an MCP gateway so agents hit one auth surface, one billing flow, one trace format. This is what ToolRoute does for 51 adapters.
Observability Day One, Not Day 90
Every MCP server needs three signals from day one.
- Per-tool latency histograms (p50, p95, p99). You cannot set an agent-side timeout without knowing your p99.
- Per-tool error rate broken down by error type (validation, upstream, auth, timeout). 1 percent validation errors is fine; 1 percent upstream errors is an incident.
- Tool-call traces that include the input, the upstream call, and the final response. Without this, debugging is guesswork. Covered in depth in how to debug MCP tool calls.
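A per-tool latency histogram does not need a metrics vendor on day one; an in-process recorder is enough to start (a sketch using nearest-rank percentiles; swap in Prometheus or OpenTelemetry when you outgrow it):

```typescript
// Sketch: record per-tool call durations and read off percentiles.
class LatencyRecorder {
  private samples = new Map<string, number[]>();

  record(tool: string, ms: number): void {
    const arr = this.samples.get(tool) ?? [];
    arr.push(ms);
    this.samples.set(tool, arr);
  }

  // Nearest-rank percentile, p in [0, 100].
  percentile(tool: string, p: number): number | undefined {
    const arr = [...(this.samples.get(tool) ?? [])].sort((a, b) => a - b);
    if (arr.length === 0) return undefined;
    const idx = Math.min(arr.length - 1, Math.ceil((p / 100) * arr.length) - 1);
    return arr[Math.max(0, idx)];
  }
}

const latency = new LatencyRecorder();
[120, 80, 95, 400, 110].forEach((ms) => latency.record("get_forecast", ms));
```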
Mistakes Worth Skipping
- Printing to stdout in stdio servers. A single stray `console.log` corrupts the JSON-RPC frame and breaks the whole session. Log to stderr only.
- Overly generic tool names. `search` is ambiguous; `search_internal_wiki` is specific. Names are how agents decide which tool to call.
- Returning raw HTML or XML. Agents work best with structured JSON or readable markdown. Transform before returning.
- Missing rate limits. A misbehaving agent can DoS your upstream if you do not cap calls per key per minute.
- Not reviewing security. Every remote MCP server should be reviewed against the checklist in MCP server security best practices.
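The rate-limit point above can be sketched as a fixed-window counter per key (illustrative only; in production prefer a shared store such as Redis so limits hold across server instances):

```typescript
// Sketch: cap calls per key per minute with a fixed window.
class RateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit: number,
    private windowMs: number = 60_000
  ) {}

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    // New key or expired window: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```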
When to Self-Host vs Put It on a Gateway
You built an MCP server. Should you run it yourself or hand it to a gateway? A rough decision table:
| Scenario | Self-Host | Gateway |
|---|---|---|
| Internal-only, single tenant | Fine | Overkill |
| One tool, internal team | Fine | Optional |
| 5+ tools, mixed users | Painful | Recommended |
| Need unified billing | Build yourself | Included |
| Multi-tenant SaaS | High burden | Recommended |
The decision is not religious. Ship the server. If it grows, move it behind a gateway when the operational load justifies it. The self-hosted vs cloud MCP comparison covers the trade-offs in more depth.
Frequently Asked Questions
Do I need to build a custom MCP server?
Only if no existing MCP server covers your capability. Check the official MCP registry, the ToolRoute catalog, and community lists first. Build when you need to expose a private internal API, proprietary data, or a workflow no public server handles.
What language should I use?
TypeScript and Python have the most mature SDKs. Use TypeScript for front-end-adjacent and Vercel-deployable servers, Python when your data logic already lives in Python. Both handle stdio and Streamable HTTP.
Stdio or Streamable HTTP?
Stdio for local developer tools inside an IDE. Streamable HTTP for remote servers that multiple users access over the network. You can expose both transports from the same server.
How do I test without a client?
Use the MCP Inspector: `npx @modelcontextprotocol/inspector`. For automated tests, unit-test your tool handlers and integration-test a loopback client using the SDK.
Related Articles
Ready to ship it? See the ToolRoute docs for gateway integration, browse the tool catalog for inspiration, or check the glossary and FAQ.