Authorization telemetry for AI agents: tracing tool-call decisions

Authorization telemetry is the record of the authorization decision on every tool call an agent makes: which identity acted, which tool it invoked, with what arguments, under which policy version, and whether the call was allowed, sent for review or blocked. It is separate from the traces most teams already collect, and it is the one an auditor, an incident responder or your own postmortem will reach for first.

The gap is easy to miss because agent observability looks healthy. You have spans for the model turn, the retrieval, the tool latency, the token spend. What you usually do not have is a span that answers the security question (was this call authorized?), and none of the other spans can answer it.

Why isn't tool-call latency enough?

Because latency tells you the call happened and how long it took, not whether it should have happened at all. A tool call can be fast, successful and completely unauthorized. The security event is not the duration; it is the decision that let the call through.

The common fallback is to log the chat: the prompt, the model's reasoning, the response. That is useful for debugging behavior and useless for authorization. The chat is upstream of the action and easy to manipulate. The authorization decision sits right at the boundary between intent and effect, which is exactly where you want your evidence.

Trace the decision, not the conversation. The conversation is why the agent wanted to act; the decision is whether it was allowed to.

What belongs in an authorization span?

Five attributes cover almost every question you will be asked later. Keep them stable across services so the span is queryable as one thing.

Identity: the authenticated agent instance, not the connection or the shared account.
On behalf of: the human or process that delegated the work.
The call: the tool name and the arguments that matter (redact secrets, keep shape).
Policy version: the exact ruleset in force when the decision was made.
Verdict: allowed, review or blocked, plus the rule that decided.
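
The five attributes above can be collected into one flat mapping before they are set on the span. This is a minimal sketch, not a fixed schema: the tool.args and authz.rule keys, the SENSITIVE_KEYS set and both helper functions are assumptions made for illustration, and the sample values are invented.

```python
import json

# Hypothetical set of argument names treated as secrets; adjust to your tools.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret", "authorization"}


def redact_args(value):
    """Replace secret values with a marker while keeping the argument shape."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact_args(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact_args(item) for item in value]
    return value


def authorization_attributes(agent_id, principal, tool, args, policy_ver, verdict, rule):
    """Build the flat attribute mapping for one authorization decision."""
    return {
        "agent.identity": agent_id,
        "agent.on_behalf_of": principal,
        "tool.name": tool,
        # OpenTelemetry attribute values are primitives or arrays of primitives,
        # so nested arguments are serialized to a JSON string (assumed key name).
        "tool.args": json.dumps(redact_args(args), sort_keys=True),
        "policy.version": policy_ver,
        "authz.verdict": verdict,
        "authz.rule": rule,  # assumed key name for "the rule that decided"
    }


attrs = authorization_attributes(
    agent_id="agent-7",
    principal="alice@example.com",
    tool="crm.update_record",
    args={"record_id": 42, "fields": {"email": "new@example.com"}, "api_key": "sk-123"},
    policy_ver="2024-06-01.3",
    verdict="allow",
    rule="crm-write-owners",
)
print(attrs["tool.args"])
```

Building the mapping in one place is one way to keep the attribute names stable across services; the whole mapping can then be handed to the span in a single call such as span.set_attributes(attrs).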

How do you wire it with OpenTelemetry in Python?

Emit one span per tool call, named consistently, before the tool runs. The name matters: pick one identifier (trace_authorization) and use it everywhere, so the whole decision stream is one query in any backend. A minimal shape:

trace_authorization · one span per tool call (Python)

    from opentelemetry import trace

    tracer = trace.get_tracer("agent.authorization")

    # agent_id, principal, tool, args, policy_ver, policy and Denied
    # are supplied by your application.
    with tracer.start_as_current_span("trace_authorization") as span:
        span.set_attribute("agent.identity", agent_id)
        span.set_attribute("agent.on_behalf_of", principal)
        span.set_attribute("tool.name", tool)
        span.set_attribute("policy.version", policy_ver)
        verdict = policy.decide(agent_id, tool, args)
        span.set_attribute("authz.verdict", verdict.result)  # allow · review · block
        if verdict.result == "block":
            raise Denied(verdict.rule)  # refuse before the tool runs

One span, one stable name, emitted before the call runs. Everything downstream queries it as a single stream.

A few decisions make this hold up in production. Emit the span before the tool executes, so a blocked call still leaves a record. Attach the verdict rather than inferring it from an exception later. Use a bearer or workload token to establish agent.identity; positional identity ("it came in on this connection") does not survive shared gateways or retries. And export through your existing OpenTelemetry pipeline: the point is that this lands in the backend your team already watches, not a separate console.
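
The ordering described above can be sketched without any telemetry backend: decide, record, then run the tool only if the verdict allows it. In this sketch Verdict, Denied, decide, guarded_call and the in-memory DECISIONS list are all hypothetical stand-ins. In a real service the record would be the trace_authorization span, and holding a call on a review verdict is an assumption that depends on your workflow.

```python
from dataclasses import dataclass

DECISIONS = []  # stand-in for the exported trace_authorization spans


@dataclass
class Verdict:
    result: str  # "allow", "review" or "block"
    rule: str


class Denied(Exception):
    """Raised when the policy does not let the call proceed."""


def decide(agent_id, tool, args):
    """Toy policy: deletes are blocked, exports need review, the rest is allowed."""
    if tool.endswith(".delete"):
        return Verdict("block", "no-deletes")
    if tool.endswith(".export"):
        return Verdict("review", "exports-need-approval")
    return Verdict("allow", "default-allow")


def guarded_call(agent_id, principal, tool, args, policy_ver, tool_fn):
    verdict = decide(agent_id, tool, args)
    # Record the decision before the tool runs, so blocked calls leave evidence too.
    DECISIONS.append({
        "agent.identity": agent_id,
        "agent.on_behalf_of": principal,
        "tool.name": tool,
        "policy.version": policy_ver,
        "authz.verdict": verdict.result,
    })
    if verdict.result != "allow":
        # Assumption: a review verdict holds the call until someone approves it.
        raise Denied(verdict.rule)
    return tool_fn(**args)


executed = []


def delete_record(record_id):
    executed.append(record_id)


try:
    guarded_call("agent-7", "alice@example.com", "crm.delete",
                 {"record_id": 42}, "v12", delete_record)
except Denied as denied:
    print("blocked by rule:", denied)
```

The verdict is written to the record directly, so nothing downstream has to infer it from the exception, and the blocked call never reaches delete_record.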

What questions does this answer later?

The ones that are impossible to answer without it. "Which agent modified that record, on whose behalf, under what policy?" is a single query over trace_authorization spans. "Did any agent get a verdict of review that we never actioned?" is a filter. "What changed the week the incident started?" is a diff of policy versions. Session logs cannot answer these once connections are shared or the protocol goes stateless, which is the direction MCP is heading. Per-call decision spans can.
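
As an illustration, the three questions above reduce to small filters once the spans are exported. The sketch below runs them over a plain list of dictionaries standing in for exported trace_authorization spans. The sample data, the timestamp field and the review.actioned flag are invented for the example, and a real backend would express the same filters in its own query language.

```python
# Hypothetical exported spans; tool.args is kept as a dict here for readability.
SPANS = [
    {"timestamp": "2024-06-03T10:00:00Z", "agent.identity": "agent-7",
     "agent.on_behalf_of": "alice@example.com", "tool.name": "crm.update_record",
     "tool.args": {"record_id": 42}, "policy.version": "v11",
     "authz.verdict": "allow", "review.actioned": None},
    {"timestamp": "2024-06-04T09:30:00Z", "agent.identity": "agent-9",
     "agent.on_behalf_of": "bob@example.com", "tool.name": "payments.refund",
     "tool.args": {"order_id": 7}, "policy.version": "v12",
     "authz.verdict": "review", "review.actioned": False},
    {"timestamp": "2024-06-05T14:15:00Z", "agent.identity": "agent-7",
     "agent.on_behalf_of": "alice@example.com", "tool.name": "crm.update_record",
     "tool.args": {"record_id": 43}, "policy.version": "v12",
     "authz.verdict": "allow", "review.actioned": None},
]


def who_modified(spans, tool, record_id):
    """Which agent modified the record, on whose behalf, under what policy?"""
    return [
        (s["agent.identity"], s["agent.on_behalf_of"], s["policy.version"])
        for s in spans
        if s["tool.name"] == tool
        and s["tool.args"].get("record_id") == record_id
        and s["authz.verdict"] == "allow"
    ]


def unactioned_reviews(spans):
    """Review verdicts that nobody has actioned yet."""
    return [s for s in spans if s["authz.verdict"] == "review" and not s["review.actioned"]]


def policy_versions_between(spans, start, end):
    """Distinct policy versions seen in [start, end), in order of first use."""
    seen = []
    # Same-format ISO 8601 UTC timestamps sort correctly as strings.
    for s in sorted(spans, key=lambda s: s["timestamp"]):
        if start <= s["timestamp"] < end and s["policy.version"] not in seen:
            seen.append(s["policy.version"])
    return seen


print(who_modified(SPANS, "crm.update_record", 42))
print(len(unactioned_reviews(SPANS)))
print(policy_versions_between(SPANS, "2024-06-01T00:00:00Z", "2024-06-08T00:00:00Z"))
```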

This is the instrumentation view of a larger idea: the authorization decision, not the model output, is the security event worth recording. We wrote about why that record is the product in "Agent work needs evidence, not trust", and why the decision has to be deterministic in "Detection versus authorization".

The takeaway

If your agents call tools with real credentials, add one span to every call: the authorization decision, named consistently, exported through the telemetry you already run. It costs almost nothing and it is the difference between reconstructing an incident and guessing at it.

Oktsec Control produces this decision record for approved agent environments, verified against the signed policy that was in force.
