AI Agent Guardrails — How to Control Autonomous Agents

Why AI Agents Need Guardrails

AI agents are increasingly autonomous. They make purchases, send emails, book appointments, file documents, and call external APIs — all without human intervention. This autonomy is powerful, but it creates risk. Without guardrails, an agent can overspend, contact the wrong person, or take irreversible actions that its owner never intended.

Guardrails are the policies, limits, and controls that keep an agent operating within acceptable boundaries. They answer the question: "What should this agent be allowed to do, and under what conditions?"

Types of AI Agent Guardrails

Effective guardrails operate at multiple levels, from simple hard limits to context-aware policy evaluation:

Spending limits — Cap how much an agent can spend per transaction, per day, or per vendor. A policy might allow purchases under $50 automatically but require approval for anything higher.
Action allowlists — Define exactly which action types an agent can perform. An email agent might be allowed to draft messages but not send them without approval.
Counterparty trust levels — Differentiate between trusted and untrusted recipients. An agent might auto-approve actions with known vendors but flag interactions with new ones.
Time-based constraints — Restrict when actions can occur. A booking agent might only operate during business hours.
Human-in-the-loop approval — Require explicit human sign-off for high-risk or high-cost actions before the agent proceeds.
Step-up authentication — Require the owner to verify their identity (e.g., via BankID or MFA) before the agent executes sensitive operations like government filings.
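
Several of the guardrail types above can be combined into one check. The following is a minimal sketch, not OpenLeash code: the Action fields, the vendor name, the 09:00 to 17:00 window, and the lowercase decision strings are all assumptions chosen for illustration. Only the $50 threshold comes from the example above.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Action:
    kind: str          # action type, e.g. "purchase" or "email.send" (hypothetical names)
    amount: float      # cost in dollars; 0 for free actions
    counterparty: str  # who the agent is dealing with
    at: time           # local time of day the action would run


ALLOWED_KINDS = {"purchase", "email.draft"}   # action allowlist
AUTO_APPROVE_BELOW = 50.0                     # spending limit from the example above
TRUSTED = {"acme-supplies"}                   # hypothetical known vendor
BUSINESS_HOURS = (time(9, 0), time(17, 0))    # assumed time window


def check(action: Action) -> str:
    """Apply hard limits first, then the rules that escalate to a human."""
    if action.kind not in ALLOWED_KINDS:
        return "deny"
    start, end = BUSINESS_HOURS
    if not (start <= action.at < end):
        return "deny"
    if action.amount >= AUTO_APPROVE_BELOW or action.counterparty not in TRUSTED:
        return "require_approval"
    return "allow"


print(check(Action("purchase", 20.0, "acme-supplies", time(10, 0))))
print(check(Action("purchase", 75.0, "acme-supplies", time(10, 0))))
```

Ordering matters in a check like this: the allowlist and time window act as hard limits, while cost and counterparty trust only decide whether a human needs to be involved.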

Policy-Based Guardrails vs. Hardcoded Limits

The simplest form of guardrail is a hardcoded check: if (amount > 100) deny(). This works for trivial cases, but every new rule or threshold change means editing and redeploying the agent's code. Real-world agent behavior also requires nuance: the same action might be fine in one context and risky in another.

Policy-based guardrails express rules as declarative YAML documents that the authorization engine evaluates at runtime. This approach separates the "what's allowed" logic from the agent's application code. Policies can be updated, versioned, and audited without redeploying the agent.
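
To make the separation concrete, here is a rough sketch in which the rules live in data and a small generic function evaluates them. The schema (version, rules, amount_below, effect, default) is hypothetical and is not the OpenLeash policy language. The dictionary stands in for what a parsed YAML document might look like.

```python
# Hypothetical policy, shown as the dict a YAML parser might produce.
# The equivalent YAML could read:
#   version: 2
#   default: deny
#   rules:
#     - {action: purchase, amount_below: 50, effect: allow}
#     - {action: purchase, effect: require_approval}
POLICY = {
    "version": 2,
    "default": "deny",
    "rules": [
        {"action": "purchase", "amount_below": 50, "effect": "allow"},
        {"action": "purchase", "effect": "require_approval"},
    ],
}


def evaluate(policy: dict, action: str, amount: float) -> str:
    """Return the effect of the first matching rule, else the policy default."""
    for rule in policy["rules"]:
        if rule["action"] != action:
            continue
        limit = rule.get("amount_below")
        if limit is not None and amount >= limit:
            continue  # rule only applies below its limit
        return rule["effect"]
    return policy["default"]


print(evaluate(POLICY, "purchase", 20))
print(evaluate(POLICY, "purchase", 80))
```

Tightening the limit here means changing a number in the policy document. The evaluate function and the agent code that calls it stay untouched.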

OpenLeash uses this policy-based approach. Each authorization request is evaluated against all applicable policies, considering the action type, cost, counterparty trust, time window, and any custom constraints. The result is a deterministic decision: allow, deny, require approval, or require step-up authentication.
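
One way to get a deterministic result from several applicable policies is to let the most restrictive decision win. The sketch below assumes that rule and assumes a particular restrictiveness ordering. Neither is confirmed as how OpenLeash combines policies, and the names DENY and REQUIRE_STEP_UP are hypothetical.

```python
from enum import IntEnum
from typing import List


class Decision(IntEnum):
    # Ordered from least to most restrictive (this ordering is an assumption).
    ALLOW = 0
    REQUIRE_APPROVAL = 1
    REQUIRE_STEP_UP = 2
    DENY = 3


def combine(decisions: List[Decision]) -> Decision:
    """Most restrictive decision wins; with no applicable policy, deny."""
    if not decisions:
        return Decision.DENY
    return max(decisions)


print(combine([Decision.ALLOW, Decision.REQUIRE_APPROVAL]).name)
```

Because max does not depend on the order of its inputs, the same set of per-policy decisions always yields the same outcome.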

Cryptographic Proof of Authorization

Guardrails are only effective if they're verifiable. When OpenLeash allows an action, it issues a PASETO v4.public proof token, a cryptographically signed record of the authorization decision. The token shows that the owner authorized the agent, which action was approved, and when the approval was granted.

Counterparties (the services receiving the agent's requests) can verify this token offline using the owner's public key. This creates a chain of accountability: the agent proves it had permission, and the proof is tamper-evident and independently verifiable.
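
After the signature has been verified, a counterparty would still need to compare the token's claims with the request it actually received. The sketch below covers only that second step. It assumes the signature was already checked with a PASETO library (not shown), and the claim names "agent" and "action" are hypothetical. The "exp" claim follows the PASETO convention of an ISO 8601 timestamp.

```python
from datetime import datetime, timezone


def claims_permit(claims: dict, expected_action: str, now: datetime) -> bool:
    """Check already-verified claims against the incoming request.

    This does NOT verify the signature; it must only be called on claims
    extracted from a token whose signature has been validated.
    """
    expires = datetime.fromisoformat(claims["exp"])
    if now >= expires:
        return False  # proof is no longer valid
    return claims.get("action") == expected_action


# Hypothetical claims, as they might look once decoded from a verified token.
claims = {
    "agent": "agent-123",
    "action": "purchase",
    "exp": "2030-01-01T00:00:00+00:00",
}
now = datetime(2029, 6, 1, tzinfo=timezone.utc)
print(claims_permit(claims, "purchase", now))
```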

Guardrails for Multi-Agent Systems

When multiple agents collaborate — for example, a research agent that delegates to a purchasing agent — guardrails become even more critical. Each agent in the chain needs its own authorization boundary. The purchasing agent should not inherit the research agent's permissions, and the research agent should not be able to escalate its own privileges through delegation.

OpenLeash handles this by binding policies to individual agents. Each agent has its own Ed25519 keypair and its own set of applicable policies. Authorization decisions are per-agent, per-action — there's no implicit trust inheritance between agents.
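
The idea of no implicit inheritance can be sketched as a lookup that only ever consults the acting agent's own policy. The agent IDs, action names, and table layout below are hypothetical, and the keypair handling is left out entirely.

```python
from typing import Optional

# Hypothetical per-agent policy table: agent ID -> permitted action types.
AGENT_POLICIES = {
    "research-agent": {"web.search", "delegate"},
    "purchasing-agent": {"purchase"},
}


def is_permitted(agent_id: str, action: str, on_behalf_of: Optional[str] = None) -> bool:
    """Decide using only the acting agent's own policy.

    `on_behalf_of` names the delegating agent. It is accepted so it can be
    logged, but it is deliberately never consulted for permissions.
    """
    return action in AGENT_POLICIES.get(agent_id, set())


# The purchasing agent may purchase, whoever delegated the task.
print(is_permitted("purchasing-agent", "purchase", on_behalf_of="research-agent"))
# The research agent cannot purchase, and delegation does not change that.
print(is_permitted("research-agent", "purchase"))
```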

Implementing Guardrails with OpenLeash

Getting started with AI agent guardrails takes three steps:

Define policies — Write YAML policies that express your guardrails: spending limits, allowed actions, trust levels, and escalation rules.
Integrate the SDK — Add the OpenLeash SDK (TypeScript, Python, or Go) to your agent. Call authorize() before any risky action.
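Handle decisions — Respect the authorization decision. If the result is ALLOW, proceed and pass the proof token to the counterparty. If it's REQUIRE_APPROVAL, wait for the owner's sign-off before proceeding. If the request is denied, do not perform the action; if step-up authentication is required, wait until the owner has verified their identity.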
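
The decision-handling step might look roughly like the sketch below. It does not use the real OpenLeash SDK: AuthResult, run_guarded, and fake_authorize are hypothetical stand-ins, and the actual authorize() signature and return type may differ. Only the ALLOW and REQUIRE_APPROVAL names come from the steps above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuthResult:
    decision: str                      # e.g. "ALLOW" or "REQUIRE_APPROVAL"
    proof_token: Optional[str] = None  # present only when the action is allowed


def run_guarded(
    authorize: Callable[[str, float], AuthResult],
    perform: Callable[[Optional[str]], str],
    action: str,
    amount: float,
) -> str:
    """Ask for authorization first; act only on an explicit ALLOW."""
    result = authorize(action, amount)
    if result.decision == "ALLOW":
        return perform(result.proof_token)  # hand the proof to the counterparty
    if result.decision == "REQUIRE_APPROVAL":
        return "pending: waiting for owner approval"
    return f"blocked: {result.decision}"   # anything else is treated as a refusal


def fake_authorize(action: str, amount: float) -> AuthResult:
    """Stand-in for an SDK call, using the $50 example threshold."""
    if amount < 50:
        return AuthResult("ALLOW", proof_token="v4.public.example-token")
    return AuthResult("REQUIRE_APPROVAL")


print(run_guarded(fake_authorize, lambda tok: f"sent with {tok}", "purchase", 20))
print(run_guarded(fake_authorize, lambda tok: f"sent with {tok}", "purchase", 80))
```

Treating every unrecognized decision as a refusal keeps the agent safe by default if new decision types are introduced later.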

Explore the documentation for the full policy language reference, or try the policy playground to test guardrails against sample scenarios.