Engineering

How to Debug MCP Tool Calls: A Practical Troubleshooting Guide

MCP tool calls fail in nine repeatable ways. Here is how to recognize each one from the logs, reproduce it in isolation, and fix it fast.

April 15, 2026 · 11 min read · ToolRoute Team

Your agent stops mid-workflow. The UI says "tool call failed." The tool looks fine when you test it by hand. You restart, the error goes away for an hour, and then it comes back. If this sounds familiar, you have an MCP tool call bug.

The Model Context Protocol standardized how agents discover and invoke tools, but it did not standardize how tools fail. Underneath the JSON-RPC envelope, an MCP tool call is still a network request against a third-party service that can time out, rate-limit you, return malformed data, or silently change its contract. This guide covers the nine failure modes we see most often across 51 adapters at ToolRoute, how to reproduce each one deterministically, and the exact fix for each.

Step 1: Get a Trace Before You Guess

Ninety percent of MCP debugging time is wasted because engineers guess at the cause before capturing a trace. A good trace has four fields:

  • Request: the exact JSON payload the agent sent, including tool name, operation, and input.
  • Upstream status: the HTTP status code, headers, and raw body returned by the underlying service.
  • Elapsed time: total wall-clock time from agent call to response, broken into adapter overhead and upstream latency.
  • Run ID: a correlation identifier so you can tie the agent's log line to the adapter log line.

If your stack cannot emit all four, add them before you do anything else. Most of the time the bug reveals itself the moment you can see the raw upstream response.
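A minimal sketch of what emitting those four fields can look like, with a stubbed `send` transport standing in for your real adapter (the field names and wrapper function are illustrative, not a ToolRoute API):

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class ToolCallTrace:
    run_id: str           # correlation ID tying agent log to adapter log
    request: dict         # exact JSON payload the agent sent
    upstream_status: int  # HTTP status from the underlying service
    elapsed_ms: float     # wall-clock time from agent call to response

def trace_call(request: dict, send) -> tuple[dict, ToolCallTrace]:
    """Wrap a tool call so every invocation emits all four trace fields."""
    run_id = str(uuid.uuid4())
    start = time.monotonic()
    status, body = send(request)  # `send` is a stand-in for your transport
    elapsed_ms = (time.monotonic() - start) * 1000
    trace = ToolCallTrace(run_id, request, status, elapsed_ms)
    print(json.dumps(asdict(trace)))  # one structured log line per call
    return body, trace

# usage with a stubbed transport
body, trace = trace_call(
    {"tool": "tavily", "operation": "search", "input": {"query": "MCP"}},
    send=lambda req: (200, {"results": []}),
)
```

The point is that the trace is emitted on every call, success or failure, so it already exists by the time something breaks.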

The 9 Failure Modes of MCP Tool Calls

1. Tool Not Found (Schema Drift)

The agent tries to call tavily.search, but the server only exposes tavily_search. This happens when the tool schema changed after the agent's system prompt was written. The fix: pull the live tool catalog at the start of every session instead of hard-coding names.
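One way to sketch that fix: `tools/list` is the real MCP catalog method, but the `rpc_call` transport stub and the dot-to-underscore fallback below are illustrative assumptions.

```python
def fetch_tool_catalog(rpc_call) -> set[str]:
    """Pull the live tool list at session start instead of hard-coding names."""
    response = rpc_call({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    return {tool["name"] for tool in response["result"]["tools"]}

def resolve_tool(name: str, catalog: set[str]) -> str:
    """Map an agent-supplied name onto the live catalog."""
    if name in catalog:
        return name
    alt = name.replace(".", "_")  # common drift: dots vs underscores
    if alt in catalog:
        return alt
    raise KeyError(f"unknown tool {name!r}; catalog has {sorted(catalog)}")

# usage against a stubbed server response
catalog = fetch_tool_catalog(
    lambda req: {"result": {"tools": [{"name": "tavily_search"}]}}
)
```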

2. Invalid Input (Parameter Validation)

The agent sends {"query": 123} to a tool that expects a string. MCP servers usually return a validation error with the offending field. If your tool does not, add Zod or Pydantic validation at the adapter boundary and return a structured error. Agents recover from structured errors 3-4x better than from free-form strings.

3. Auth Expired or Scoped Incorrectly

OAuth tokens expire. API keys get rotated. Long-running agents hit this the hardest because their auth state drifts from the production reality. Two symptoms: HTTP 401 from the upstream, or HTTP 200 with an auth-error envelope. Fix with explicit token refresh before calls, or delegate to a platform like Composio for OAuth. Also verify the token scope matches the operation: a read-only GitHub token will fail on create_issue without looking expired.
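A minimal sketch of proactive refresh, assuming a hypothetical `refresh_fn` that returns a token and its TTL in seconds:

```python
import time

class TokenCache:
    """Refresh a token before it expires instead of after the first 401."""
    def __init__(self, refresh_fn, skew_seconds: float = 60.0):
        self.refresh_fn = refresh_fn  # hypothetical: returns (token, ttl_s)
        self.skew = skew_seconds
        self.token = None
        self.expires_at = 0.0

    def get(self) -> str:
        # refresh proactively, leaving a skew window before real expiry
        if self.token is None or time.time() >= self.expires_at - self.skew:
            self.token, ttl = self.refresh_fn()
            self.expires_at = time.time() + ttl
        return self.token

# usage with a counting stub in place of a real OAuth refresh
refreshes = []
cache = TokenCache(lambda: (refreshes.append(1) or f"tok-{len(refreshes)}", 3600))
```

The skew window matters for long-running agents: without it, a token that expires mid-call still produces the 401 you were trying to avoid.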

4. Rate Limit or Quota Exhausted

HTTP 429 is the honest version. The silent version is HTTP 200 with an empty result array because you hit a soft cap. Log the X-RateLimit-Remaining header from every response, and set alerts when remaining drops below 10 percent of limit. Back off exponentially with jitter; fixed retry intervals make the problem worse. This pattern is covered in depth in our reliability patterns guide.
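The "exponentially with jitter" advice can be sketched as full-jitter backoff, where each delay is drawn uniformly from a growing window (the base and cap values are illustrative):

```python
import random

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Full-jitter exponential backoff: each delay is uniform in
    [0, min(cap, base * 2**attempt)], so retries do not synchronize."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

delays = backoff_delays(6)
```

If the 429 response carries a Retry-After header, honor it instead; the computed delay is only a fallback.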

5. Timeout (Request or Stream)

There are two kinds of timeouts. The request timeout fires when the full response takes too long. The stream timeout fires when the tool stops sending bytes for a configured inactivity window. If the agent logs a timeout but the upstream logs a 200, the bottleneck is almost always your adapter's inactivity window. Default 10s request timeouts are wrong for search APIs like Tavily (median 1.1s, p99 18s) and video APIs like Creatify (30-90s is normal).
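One way to stop fighting a single global default is a per-tool timeout table sized to observed p99 latency. The latency figures below come from the examples in this section; the table itself is an illustrative sketch, not a ToolRoute config format:

```python
# (request_timeout_s, stream_inactivity_s) per tool
TIMEOUTS = {
    "tavily":   (25, 10),   # search: median 1.1s, p99 18s
    "creatify": (120, 60),  # video generation: 30-90s is normal
}
DEFAULT = (10, 5)           # only safe for genuinely fast tools

def timeout_for(tool: str) -> tuple[int, int]:
    """Look up request and stream timeouts for a tool, with a fallback."""
    return TIMEOUTS.get(tool, DEFAULT)
```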

6. Empty Response That Looks Wrong

The tool returns 200 OK and a well-formed envelope with zero results. The agent, reading the empty array, hallucinates or refuses to proceed. This is not a tool bug; it is a prompt bug. Either the query was wrong (misspelled brand, unsupported region), or the agent needs explicit "what to do when empty" guidance in its system prompt. Compare to the search tool benchmarks to see expected result counts for common queries.
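Alongside the prompt fix, the adapter can attach explicit guidance to empty envelopes so the agent has something concrete to act on. A hypothetical sketch:

```python
def annotate_empty(result: dict, query: str) -> dict:
    """Attach guidance when a well-formed envelope has zero results,
    so the agent retries or reports honestly instead of hallucinating."""
    if result.get("results") == []:
        result["note"] = (
            f"No results for {query!r}. Check spelling, broaden the query, "
            "or report 'no results found' rather than inventing sources."
        )
    return result
```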

7. Malformed Response (Contract Drift)

The tool added a field, removed a field, or changed a type. Your parser chokes. This is why every MCP adapter should parse with a tolerant schema (extra fields ignored, missing fields defaulted) and log a warning on unknown fields so you find contract drift before it becomes an outage.
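A stdlib sketch of that tolerant-parse pattern, with an illustrative two-field contract standing in for a real tool schema:

```python
import warnings

KNOWN_FIELDS = {"results": list, "query": str}  # illustrative contract

def parse_tolerant(payload: dict) -> dict:
    """Ignore extra fields, default missing ones, and warn on unknown
    fields so contract drift surfaces before it becomes an outage."""
    extras = set(payload) - set(KNOWN_FIELDS)
    if extras:
        warnings.warn(f"unknown upstream fields: {sorted(extras)}")
    # keep only known fields, defaulting any the upstream dropped
    return {k: payload.get(k, t()) for k, t in KNOWN_FIELDS.items()}
```

Wire those warnings into your alerting and you learn about a provider's schema change from a log line instead of a parse exception.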

8. Transport Layer Failure (stdio or HTTP)

MCP supports stdio and Streamable HTTP transports. Stdio breaks in two fun ways: the subprocess dies silently (check exit code), or a stray stdout print corrupts the JSON-RPC frame (redirect all tool logs to stderr). HTTP breaks in the usual DNS, TLS, and proxy ways. If you cannot tell which transport layer is failing, dump raw bytes at the adapter boundary.
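The stdout-corruption failure has a one-line cure worth showing. In this sketch (the helper names are illustrative), diagnostics go to stderr and only JSON-RPC frames ever touch stdout:

```python
import json
import sys

# In a stdio MCP server, stdout carries JSON-RPC frames; a single stray
# print() corrupts a frame. Route all diagnostics to stderr instead.
def log(msg: str) -> None:
    print(msg, file=sys.stderr)  # never touches the frame stream

def send_frame(frame: dict, out=None) -> None:
    out = out or sys.stdout
    out.write(json.dumps(frame) + "\n")  # frames only, one per line
    out.flush()

log("tool starting")  # safe debug output
send_frame({"jsonrpc": "2.0", "id": 1, "result": {}})
```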

9. Agent-Side Tool Selection Error

Sometimes the tool is fine and the agent chose the wrong one. The agent calls firecrawl.scrape on a JavaScript-rendered page that needs playwright.navigate, or uses web search when it should have hit a known API. Fix this in the tool descriptions, not the tool code. Descriptions should include "use this when" and "do not use this when" guardrails.
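What those guardrails look like in a tool schema, using hypothetical entries for the two tools mentioned above:

```python
# Hypothetical catalog entries; the guardrail language lives in the
# description, which is the only part the model actually reads.
TOOLS = [
    {
        "name": "firecrawl_scrape",
        "description": (
            "Fetch the raw HTML of a static page. "
            "Use this when: the page renders without JavaScript. "
            "Do not use this when: the page is a JS-rendered app; "
            "use playwright_navigate instead."
        ),
    },
    {
        "name": "playwright_navigate",
        "description": (
            "Drive a headless browser to render a page. "
            "Use this when: the page needs JavaScript to render. "
            "Do not use this when: a plain HTTP fetch would work; "
            "this is slower and more expensive."
        ),
    },
]
```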

Failure-Mode Lookup Table

| Symptom | Likely Cause | First Fix |
| --- | --- | --- |
| HTTP 404 or unknown tool | Schema drift | Refresh tool catalog on session start |
| HTTP 400 with field name | Input validation | Log request payload, add Zod at boundary |
| HTTP 401 or 403 | Auth expired or wrong scope | Refresh token, check permission scope |
| HTTP 429 or silent cap | Rate limit | Log X-RateLimit headers, exponential backoff |
| Timeout on agent side | Default timeout too aggressive | Raise timeout to match p99 upstream latency |
| Empty response envelope | Prompt or query issue | Replay query by hand, adjust system prompt |
| JSON parse error | Contract drift | Tolerant parser, alert on unknown fields |
| Connection closed unexpectedly | Transport failure | Log raw bytes, check subprocess exit code |
| Wrong tool chosen | Weak tool description | Add use/do-not-use guardrails in schema |

Step 2: Reproduce Without the Agent

Once you have a trace, strip the agent out of the loop. Take the raw request payload and replay it directly against the MCP endpoint with curl:

curl -X POST https://api.toolroute.ai/api/v1/execute \
  -H "Authorization: Bearer $TOOLROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"tool":"tavily","operation":"search","input":{"query":"MCP debugging"}}'

If the curl call succeeds and the agent call fails, the bug is in prompt interpretation, tool selection, or input construction on the agent side. If the curl call fails with the same error, the bug is below the agent and you can keep stripping layers.

Step 3: Turn Verbose Logs On (the Right Ones)

Every MCP client has a way to dump the JSON-RPC frames. These are the knobs worth knowing:

  • Claude Desktop: logs at ~/Library/Logs/Claude/mcp*.log (macOS) or %APPDATA%/Claude/logs (Windows). Set MCP_LOG_LEVEL=debug on the server process.
  • Claude Code: CLAUDE_MCP_DEBUG=1 emits structured logs per call.
  • Cursor: open Developer Tools and filter the console by "MCP". Tool startup errors show in Output → MCP.
  • OpenAI Agents SDK: set OPENAI_LOG=debug to see raw function call payloads.

The related walkthroughs on Claude Code MCP setup and Cursor MCP setup cover log locations and auth for each client in more detail.

Step 4: Observability That Pays for Itself

Debugging is faster when every call already has a trace. Three pieces of instrumentation pay for themselves within a week:

  1. Per-tool latency histograms with p50, p95, p99 so you know what "normal" looks like. Sentry, Datadog, or PostHog all do this well; compare them here.
  2. Error rate alerts per tool, per operation. A 2 percent error rate on a fallback tool is fine; a 2 percent rate on your payments tool is an incident.
  3. Run-level traces that link agent reasoning to tool calls. If you can click a failed agent run and see the exact tool payload, your median debug time drops by 5-10x.
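Item 1 above can be prototyped in a few lines before you wire up a real observability vendor. This sketch computes per-tool percentiles from raw samples; Sentry, Datadog, or PostHog would replace it in production:

```python
import statistics

class LatencyTracker:
    """Minimal per-tool latency percentiles so you know what 'normal' is."""
    def __init__(self):
        self.samples: dict[str, list[float]] = {}

    def record(self, tool: str, ms: float) -> None:
        self.samples.setdefault(tool, []).append(ms)

    def percentile(self, tool: str, p: int) -> float:
        data = sorted(self.samples[tool])
        # quantiles(n=100) yields the 1st..99th percentile cut points
        return statistics.quantiles(data, n=100)[p - 1]

# usage with fake samples: 1ms .. 100ms for one tool
tracker = LatencyTracker()
for ms in range(1, 101):
    tracker.record("tavily", float(ms))
```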

Why Gateways Make Debugging Easier

Every failure mode above has one thing in common: you cannot debug it without structured data. Running 51 direct MCP integrations means 51 log formats, 51 error shapes, 51 timeout defaults. That is the hidden cost of doing it yourself.

A gateway normalizes this. At ToolRoute, every tool call emits the same five fields: run ID, tool, operation, upstream status, elapsed time. Errors follow one schema regardless of whether the underlying tool is Tavily, Resend, or Stripe. The docs cover trace headers and the logs endpoint, and the tools directory lists every adapter with its expected latency envelope and known failure modes. The MCP gateway primer explains why this matters architecturally.

A Debugging Checklist You Can Steal

  1. Capture request, upstream status, elapsed time, run ID.
  2. Replay the exact payload with curl. Does it still fail?
  3. If curl works, look at agent prompts and tool selection.
  4. If curl fails, read the raw upstream body for an error hint.
  5. Check auth expiry and scope first, rate limits second.
  6. Compare latency to the tool's historical p99. If spiked, suspect upstream.
  7. If empty response, replay the query manually in the tool UI.
  8. If contract drift, read the provider changelog.
  9. Ship a structured error and a regression test with the fix.

Print this, pin it to your wall, save yourself the next outage. Most MCP bugs are one of these nine, and working the list in order beats guessing every time.

Frequently Asked Questions

Why is my MCP tool call returning an empty response?

Usually one of three things: the tool returned 200 OK with a success envelope but zero results (most common), the adapter silently swallowed an error and returned null, or the agent timed out before the response arrived. Check the upstream HTTP status, the adapter log, and the total elapsed time before assuming the tool is broken.

How do I enable verbose logging for MCP?

Most MCP servers respect MCP_LOG_LEVEL=debug. For Claude Desktop, check ~/Library/Logs/Claude/mcp*.log. Claude Code uses CLAUDE_MCP_DEBUG=1. Cursor exposes MCP logs in Developer Tools.

Why does my MCP tool work locally but fail in production?

Four common causes: missing env vars on the production host, stricter egress rules in the runtime (Vercel Edge vs Node), stdio processes that cannot run on serverless, or auth tokens scoped to a local account. Gateways eliminate the last three.

What is the fastest way to reproduce an MCP failure?

Isolate the tool from the agent. Capture the exact JSON payload, then replay it directly against the MCP endpoint with curl. If curl fails, the bug is in the tool or adapter. If only the agent fails, the bug is in prompt interpretation or tool selection.

Want structured traces across every tool by default? See the ToolRoute docs, browse the glossary, or check the FAQ for platform specifics.