Compliance

How to Audit MCP Tool Calls for Compliance: Schema, Retention, and SIEM Integration

Every auditor, incident responder, and debugging session eventually asks the same question: what did the agent actually do, and when? Here is the exact schema, retention policy, and SIEM pipeline that answers it without gaps.

April 16, 2026 · 12 min read · ToolRoute Team

When an AI agent does something surprising, your first move is always the same: pull the logs. Within thirty seconds you need to see every tool it called, every input it sent, every response it got back, and the precise sequence. If your audit log is missing any of those fields, you are going to spend the next two days reconstructing a timeline from circumstantial evidence. Worse, if the incident turns into a compliance question, you are going to explain to an auditor why your primary control failed.

This is the practical companion to our MCP Server Security Best Practices and MCP Governance and SOC 2 Compliance pieces. If those articles explain why audit logging matters, this one is the schema you paste into your design doc.

1. The Minimum Viable Audit Schema

Every MCP tool call produces exactly one audit record. The record has a small required core and an optional extension area. Start with this minimum schema; you can evolve it without breaking anything if you include a schema version field from day one.

-- mcp_audit_log table, append-only
CREATE TABLE mcp_audit_log (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  event_ts        TIMESTAMPTZ NOT NULL DEFAULT now(), -- UTC, ms precision
  schema_version  INT NOT NULL DEFAULT 1,
  call_id         TEXT NOT NULL,        -- unique per invocation
  trace_id        TEXT,                 -- links calls in one agent run
  caller_id       TEXT NOT NULL,        -- API key hash or OAuth sub
  caller_type     TEXT NOT NULL,        -- 'agent' | 'user' | 'system'
  source_ip       INET,
  user_agent      TEXT,
  tool_name       TEXT NOT NULL,        -- e.g. 'postgres'
  operation       TEXT NOT NULL,        -- e.g. 'query'
  input_redacted  JSONB NOT NULL,       -- secrets stripped
  status          TEXT NOT NULL,        -- 'ok' | 'error' | 'denied' | 'timeout'
  error_code      TEXT,
  response_bytes  INT,
  latency_ms      INT NOT NULL,
  region          TEXT,                 -- for residency controls
  cost_cents      NUMERIC(12, 4),       -- spend per call, optional
  extra           JSONB                 -- future-proofing
);

CREATE INDEX idx_audit_caller_ts ON mcp_audit_log (caller_id, event_ts DESC);
CREATE INDEX idx_audit_tool_ts   ON mcp_audit_log (tool_name, event_ts DESC);
CREATE INDEX idx_audit_trace     ON mcp_audit_log (trace_id);

Three fields people forget on the first pass, every single time:

  • trace_id. One agent decision can trigger a dozen tool calls. Without a trace ID you cannot reconstruct the chain, and every incident post-mortem turns into a scavenger hunt.
  • schema_version. You will change this table within six months. Without a version field, new fields break old queries and old records break new tooling.
  • region. Auditors ask where a call was processed. Answer from the log, not from memory.

2. Field-By-Field: What Goes In and Why

2a. Identity fields

Store a hash of the API key, never the raw key. Prefer a prefix-preserving hash (keep the first 8 characters, hash the rest) so incident responders can correlate without exposing the credential. For OAuth-based callers, store the subject (sub) claim plus the client ID. If you want to tie calls back to a human developer, add a human_id column that references your SSO directory.
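As a sketch of that prefix-preserving scheme: the key format and the digest truncation below are illustrative assumptions, not a fixed standard.

```python
import hashlib

def hash_api_key(raw_key: str, prefix_len: int = 8) -> str:
    """Keep the first 8 characters for correlation, hash the remainder.

    The stored value lets responders group events by key without
    the log ever containing a usable credential.
    """
    prefix = raw_key[:prefix_len]
    digest = hashlib.sha256(raw_key[prefix_len:].encode()).hexdigest()
    return f"{prefix}...sha256:{digest[:16]}"
```

The truncated digest trades a little collision resistance for shorter caller_id values; keep the full digest if your key space is large.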

2b. Tool and operation fields

Split tool name and operation into two columns, not one. This lets you filter for "every write to postgres" or "every send on gmail" with a clean index. If your MCP servers return structured metadata (tool version, server commit SHA), log those too in extra.

2c. Input and output

Log redacted input parameters as JSONB. Never log raw output: the output often contains the exact data you are trying to protect. Instead, store response size in bytes and a hash of the output for integrity checks. If a regulator or court requires the actual output, you retrieve it from the upstream system using the tool name, operation, and input parameters from the log.

Why Not Log The Output?

Every byte of logged output is a byte your security boundary now has to protect. If the output was sensitive enough to care about the tool call, it is sensitive enough to care about the log storage. Logging size and hash gives you integrity verification without doubling your data-at-risk. If you need the full output for debugging, route it to a separately-authenticated debug store with shorter retention.
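The size-plus-hash summary described above fits in a few lines. A sketch; the field names match the schema in section 1, but the helper itself is illustrative:

```python
import hashlib

def summarize_output(response_body: bytes) -> dict:
    """Record the size and a digest of the output, never the output itself.

    The digest supports later integrity checks against the upstream
    system without the log storing any sensitive response data.
    """
    return {
        "response_bytes": len(response_body),
        "response_sha256": hashlib.sha256(response_body).hexdigest(),
    }
```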

2d. Status and errors

Status should be a small enum: ok, error, denied, timeout. Keep the error code separate so you can GROUP BY it. Your alerting depends on being able to say "five percent of calls from this key are returning denied" in a single query.

2e. Performance fields

Latency in milliseconds and response size in bytes pay for themselves the first time an agent starts behaving weirdly and you want to know whether it is slowness or malice. Add them from day one; adding them later requires backfilling or breaking trend continuity.

3. Storage and Retention Rules

An auditor reading a log they cannot trust is worse than a missing log. Make the log tamper-evident by design.

  • Append-only by permission. The writer role on the audit table must have INSERT only. Nobody gets UPDATE or DELETE. Run a migration to revoke those grants explicitly; do not rely on default role behavior.
  • Hash chain or merkle tree for high-assurance environments. Each record includes the hash of the previous record; tampering with any record breaks the chain. Not every team needs this, but financial services and healthcare often do.
  • Separate storage tier. Stream logs to an immutable object store or SIEM with distinct credentials and distinct incident-response runbook. A compromise of your app database should not compromise your audit trail.
  • Retention matrix that matches the regulatory floor: 12 months for SOC 2, 6 years for HIPAA, 7 years for some financial services frameworks. Set lifecycle rules so deletion happens automatically and consistently.
| Framework | Retention Floor | Immediately Accessible |
| --- | --- | --- |
| SOC 2 Type II | 12-month audit period | Practical minimum is hot retrieval in minutes |
| ISO 27001 | Policy-driven, typically 12-24 months | Must be retrievable during audits |
| HIPAA | 6 years | First 12 months hot; archive acceptable after |
| PCI DSS | 1 year | First 3 months hot |
| SEC / FINRA | Up to 7 years WORM | Immediate retrieval on demand |
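For teams that do need the hash chain, the core mechanism is small. A sketch, assuming records are canonicalized as sorted-key JSON before hashing:

```python
import hashlib
import json

def chain_hash(prev_hash: str, record: dict) -> str:
    """Each record's hash covers the previous hash plus its own canonical
    bytes, so altering any historical record breaks every hash after it."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# The first record chains from a fixed genesis value.
genesis = "0" * 64
h1 = chain_hash(genesis, {"call_id": "c1", "tool_name": "postgres", "status": "ok"})
h2 = chain_hash(h1, {"call_id": "c2", "tool_name": "gmail", "status": "ok"})
```

Verification is a linear re-walk of the table; store each record's chain hash in a dedicated column (or in extra) so an auditor can recompute it.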

4. Redaction That Actually Works

Redaction is the boring, high-stakes layer that turns into a postmortem if you get it wrong. Three rules prevent almost every failure mode:

  • Redact before writing. The gateway strips secrets before the event hits the log. Never rely on a downstream scrubber that runs "eventually." A crash between write and scrub is a breach.
  • Schema-aware redaction. Each tool has an input schema. Mark fields as log-safe, log-hashed, or log-omitted. An API key field is always log-omitted. An email is log-hashed with a rotating salt. A SKU is log-safe.
  • Pattern redaction as a backstop. Even with schema-aware rules, run a regex pass for well-known formats (API key prefixes, JWTs, credit card numbers, connection strings) and nuke anything that matches. Log the redaction event so you know which tool emitted an unexpected secret.

# Input before redaction
{
  "query": "SELECT * FROM users WHERE email='alice@example.com'",
  "connection": "postgres://admin:P@ssw0rd@db/main"
}

# Input after redaction (what goes in the log)
{
  "query": "SELECT * FROM users WHERE email='sha256:abc123...'",
  "connection": "[REDACTED:connection-string]"
}
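A minimal version of the regex backstop might look like the following. The patterns are illustrative; tune them to the secret formats your tools actually emit, and keep the hit count so a miss can be logged as an event.

```python
import re

# Hypothetical patterns: connection strings, JWTs, card-number-length digit runs.
BACKSTOP_PATTERNS = [
    (re.compile(r"postgres://[^\s\"']+"), "[REDACTED:connection-string]"),
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED:jwt]"),
    (re.compile(r"\b\d{13,16}\b"), "[REDACTED:pan]"),
]

def backstop_redact(text: str) -> tuple[str, int]:
    """Final regex pass over already schema-redacted input.

    Returns the cleaned text plus the number of hits, so an unexpected
    secret surviving schema redaction can be surfaced as its own event.
    """
    hits = 0
    for pattern, replacement in BACKSTOP_PATTERNS:
        text, n = pattern.subn(replacement, text)
        hits += n
    return text, hits
```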

5. SIEM Integration Patterns

Your audit table is useful for forensics. Your SIEM is useful for detection. You want both, and you want them streaming from the same event source so they never drift.

  • Write once, fan out. Emit each event to a durable queue. One consumer writes to the audit store. Another ships to the SIEM. Backpressure on either consumer never blocks the gateway.
  • Use a consistent event name like mcp.tool.call. Splunk, Datadog, Elastic, and Sentinel all let you key detection content on the event name.
  • Map high-value fields into SIEM-native fields: caller into user, source_ip into src, tool into app, status into outcome. This makes existing detection content work without rewrites.
  • Build these five detections on day one: (a) authentication failures per caller per five minutes; (b) calls denied by the gateway policy engine; (c) response-size anomalies versus the per-tool baseline; (d) tool use outside declared business hours for the caller; (e) a sudden spike in calls per minute from any single key.
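The field mapping above can be a literal dictionary in the fan-out consumer. A sketch, assuming CIM-style target names (user, src, app, outcome); substitute your platform's data model:

```python
# Assumed SIEM field names; adjust to your platform's data model.
SIEM_FIELD_MAP = {
    "caller_id": "user",
    "source_ip": "src",
    "tool_name": "app",
    "status": "outcome",
}

def to_siem_event(record: dict) -> dict:
    """Rename high-value audit fields into SIEM-native names and tag the event."""
    event = {"event_type": "mcp.tool.call"}
    for ours, theirs in SIEM_FIELD_MAP.items():
        if ours in record:
            event[theirs] = record[ours]
    # Keep every unmapped field under its original name.
    for key, value in record.items():
        if key not in SIEM_FIELD_MAP:
            event[key] = value
    return event
```

Because the mapping runs in the consumer, not the gateway, you can change SIEM vendors without touching the write path or the audit store.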

6. Real Queries Auditors Actually Run

Practice these queries before your Type II, not during it. If any of them take more than thirty seconds, your indexes are wrong.

-- Every call by a specific caller in a window
SELECT event_ts, tool_name, operation, status, latency_ms
FROM mcp_audit_log
WHERE caller_id = 'sha256:abc...'
  AND event_ts BETWEEN '2026-04-01' AND '2026-04-30'
ORDER BY event_ts;

-- Error rate per tool per day
SELECT date_trunc('day', event_ts) AS day, tool_name,
       count(*) AS total,
       count(*) FILTER (WHERE status = 'error') AS errors
FROM mcp_audit_log
WHERE event_ts > now() - interval '30 days'
GROUP BY 1, 2 ORDER BY 1, 2;

-- Reconstruct an agent run
SELECT event_ts, tool_name, operation, status, input_redacted
FROM mcp_audit_log
WHERE trace_id = 'trc_01HX...'
ORDER BY event_ts;

-- Callers touching a restricted tool
SELECT caller_id, count(*) AS calls
FROM mcp_audit_log
WHERE tool_name = 'stripe' AND operation = 'refund'
  AND event_ts > now() - interval '90 days'
GROUP BY caller_id ORDER BY calls DESC;

7. How a Gateway Closes the Loop

Every piece of this schema assumes you control the write path. Self-hosted MCP servers fight you at every step: some log in JSON, some in text, some to stdout, some to files, most with different fields and none with a consistent timestamp format. By the time you normalize everything into a single audit store, you have built a gateway whether you meant to or not.

An MCP gateway lets you skip the detour. Every tool call passes through one process that already knows the schema, already redacts secrets, already emits a normalized event, and already streams to both your audit store and your SIEM. At ToolRoute, this is exactly how the 51 tools in the registry are instrumented by default: one schema, one log, one retention policy, one SIEM pipeline across every call.

8. A Pre-Audit Readiness Checklist

  • [ ] Schema with every required field and a schema_version column.
  • [ ] Append-only writes enforced by database grants or storage immutability.
  • [ ] Gateway-side redaction runs before the event hits the log.
  • [ ] Retention policy implemented as lifecycle rules, not a runbook step.
  • [ ] SIEM ingestion flowing with consistent event naming and field mapping.
  • [ ] Five baseline alerts in place for auth failures, denies, size anomalies, off-hours use, and rate spikes.
  • [ ] Benchmarked queries that complete in under thirty seconds on a realistic dataset.
  • [ ] Tested restore from archive to verify retention media is actually readable.
  • [ ] Documented runbook for producing an auditor sample in under one business hour.

Frequently Asked Questions

What fields should an MCP audit log capture?

At minimum: a unique call ID, UTC timestamp with millisecond precision, caller identity, source IP, tool name, operation, sanitized input, response status, response size, latency, trace ID, and a schema version. Secrets and personal data must be redacted before the log is written.

How long should I retain MCP audit logs?

The baseline is twelve months to cover a full SOC 2 Type II period. Regulated industries need longer: HIPAA six years, PCI DSS one year with at least three months immediately accessible, some financial services up to seven. Set a written policy, enforce it technically, and verify deletions happen on schedule.

Where should MCP audit logs live?

Append-only media. A SIEM that ingests via syslog or HTTPS, an immutable object store with object lock enabled, a purpose-built audit log service, or a WORM-configured log database. Writable SQL tables on your production database are not acceptable.

How do I integrate MCP logs with a SIEM?

Stream every tool call as a structured JSON event over HTTPS or syslog with a consistent event type like mcp.tool.call. Map high-value fields into SIEM-native fields so existing detection content works. Build alerts for auth failures, rate breaches, unusual data volumes, off-hours use, and repeated errors from a single caller.

How do I redact secrets and PII from audit logs?

Redact at the gateway before writing the log. Use deterministic pattern matching for well-known secret formats plus schema-aware rules for tool inputs. Hash identifiers like email with a rotating salt so you can correlate events without storing raw values. Treat redaction misses as incidents, not minor bugs.

ToolRoute ships this exact audit schema out of the box across 51 curated tools. One schema, one log, one retention policy. Read the docs, browse the glossary, or check the FAQ.