AI Agent Safety: The Substrate Pattern for LLM-Powered Systems
AI agents—systems where LLMs take autonomous actions—are the frontier of AI-assisted development. They promise to move beyond code generation to task completion: deploy this service, fix this bug, respond to this incident. But they also introduce risks that code generation doesn’t: a code generator produces text; an agent takes actions with real-world consequences.
The fundamental problem: AI agents are non-deterministic. The same prompt may produce different outputs. The same context may lead to different actions. And unlike traditional software, you can’t unit test every possible behavior because the behavior space is unbounded.
This article presents the Substrate Pattern—an architectural approach that separates AI proposals from deterministic execution, providing the safety guarantees production systems require while preserving the capabilities that make agents valuable.
The Agent Fallacy
The Agent Fallacy is the belief that AI agents can be trusted to self-constrain. It manifests in several forms:
“The model will follow instructions”: LLMs are trained on human data. They exhibit the full range of human behaviors, including ignoring instructions, misinterpreting context, and producing confident nonsense. Instructions are probabilistic guidance, not deterministic constraints.
“We’ll prompt engineer the risks away”: Prompt engineering is valuable but insufficient. No prompt can anticipate every context. No instruction set can prevent every failure mode. The attack surface is the entire space of possible inputs.
“The agent will ask before doing anything dangerous”: This assumes the agent can identify danger, that its judgment aligns with yours, and that it will consistently choose to ask rather than act. None of these are guaranteed.
Lisanne Bainbridge’s “Ironies of Automation” (1983) applies perfectly: the more autonomous we make systems, the more critical human oversight becomes, yet the harder that oversight is to provide. AI agents amplify this irony.
The Substrate Pattern
The Substrate Pattern addresses the Agent Fallacy by separating concerns:
The Agent (non-deterministic): Proposes actions based on intent and context. Can be creative, exploratory, and unpredictable. This is where LLM capabilities shine.
The Substrate (deterministic): Evaluates proposals against constraints. Executes permitted actions through controlled pathways. Logs everything. This is traditional software engineering.
The agent proposes. The substrate disposes.
Core Architecture
┌─────────────────────────────────────────────────────┐
│ SUBSTRATE │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Permission │ │ Execution │ │ Audit │ │
│ │ Boundary │→→│ Engine │→→│ Log │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ↑ │
│ │ Proposals │
│ ┌─────────────────────────────────────────────────┤
│ │ AGENT │
│ │ Intent → Context → Reasoning → Proposed Action │
│ └─────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────┘
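The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not a production implementation; the names `Proposal`, `substrate_handle`, and the `PERMITTED` registry are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Proposal:
    action: str
    parameters: dict = field(default_factory=dict)

audit_log: list[tuple[str, str]] = []

# Hypothetical registry mapping permitted action names to executors.
PERMITTED: dict[str, Callable[[dict], str]] = {
    "scale_up": lambda p: f"scaled to {p['replicas']} replicas",
}

def substrate_handle(proposal: Proposal) -> str:
    # Permission boundary: only enumerated actions can execute.
    if proposal.action not in PERMITTED:
        audit_log.append((proposal.action, "rejected"))
        return "rejected"
    # Execution engine: deterministic dispatch of the validated proposal.
    result = PERMITTED[proposal.action](proposal.parameters)
    # Audit log: every decision is recorded.
    audit_log.append((proposal.action, "executed"))
    return result
```

The agent may construct any `Proposal` it likes; only the substrate's registry determines what actually runs.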
Permission Boundaries
The permission boundary defines what actions are possible, not just what actions are intended. This is the key insight: constraint enforcement must be mechanical, not cultural.
Input Boundaries: What information can the agent access? Scope data access to the minimum required. Don’t give a deployment agent access to production databases if it only needs deployment configurations.
Output Boundaries: What actions can the agent take? Enumerate permitted actions explicitly. If an action isn’t on the list, it can’t happen—regardless of what the agent proposes.
Resource Boundaries: What resources can the agent consume? Limit API calls, compute time, and cost. Unbounded agents become expensive agents.
Blast Radius: If the agent fails catastrophically, what’s the worst outcome? Design the substrate so the worst case is acceptable.
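Resource boundaries in particular are easy to enforce mechanically. One sketch, with illustrative limits and a hypothetical `ResourceBudget` class, is a per-run budget the substrate consults before each action:

```python
class ResourceBudget:
    """Caps what a single agent run may consume (limits are illustrative)."""

    def __init__(self, max_calls: int, max_cost_usd: float):
        self.max_calls = max_calls
        self.max_cost_usd = max_cost_usd
        self.calls = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record the action if it fits the budget; refuse otherwise."""
        if self.calls + 1 > self.max_calls or self.cost_usd + cost_usd > self.max_cost_usd:
            return False  # over budget: the substrate refuses, whatever the agent intends
        self.calls += 1
        self.cost_usd += cost_usd
        return True
```

When `charge` returns `False`, the substrate rejects the proposal and logs it; the worst-case spend is bounded by construction.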
Execution Engine
The execution engine translates validated proposals into actions. It provides:
Deterministic Execution: Same proposal, same action. The execution engine is traditional software—testable, predictable, auditable.
Idempotency: Actions can be safely retried. If the agent proposes the same thing twice, the system handles it gracefully.
Rollback Capability: Actions can be reversed. This is critical for recovery from agent errors that pass permission checks but produce undesired outcomes.
Rate Limiting: Actions are throttled. An agent in a bad loop can’t execute unlimited actions before humans notice.
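Idempotency is often implemented with idempotency keys: the substrate caches the result of each key and returns it on retry instead of re-executing. A minimal sketch (the `IdempotentExecutor` name and caching strategy are assumptions, not a prescribed design):

```python
from typing import Callable

class IdempotentExecutor:
    """Executes each proposal key at most once; retries return the cached result."""

    def __init__(self):
        self._results: dict[str, str] = {}

    def execute(self, key: str, action: Callable[[], str]) -> str:
        if key in self._results:
            return self._results[key]  # duplicate proposal: no side effects re-run
        result = action()
        self._results[key] = result
        return result
```

If the agent proposes the same deployment twice, the side effect happens once and both proposals receive the same result.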
Audit Log
Every proposal is logged. Every execution is logged. Every rejection is logged. The audit log provides:
Forensics: When something goes wrong, you can reconstruct exactly what happened—the proposal, the context, the decision, the outcome.
Learning: Rejected proposals reveal what the agent is trying to do that it shouldn’t. Accepted proposals that led to bad outcomes reveal gaps in permission boundaries.
Compliance: For regulated systems, the audit log provides the evidence trail that manual oversight can’t.
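A workable audit entry captures the proposal, the decision, and the outcome as structured data. The field names below are illustrative, not a required schema:

```python
import json
from datetime import datetime, timezone

def audit_entry(proposal: dict, decision: str, outcome: str = "") -> str:
    """One append-only JSON line per proposal (field names are illustrative)."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "proposal": proposal,   # what the agent asked for
        "decision": decision,   # e.g. "executed" or "rejected"
        "outcome": outcome,     # result on success, reason on rejection
    })
```

Structured entries make the forensic and compliance questions above answerable with a query rather than an archaeology project.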
Implementation Patterns
Pattern 1: Action Enumeration
Don’t allow arbitrary actions. Enumerate them.
PERMITTED_ACTIONS = {
    "deploy_staging": DeployStagingAction,
    "deploy_production": DeployProductionAction,
    "rollback": RollbackAction,
    "scale_up": ScaleUpAction,
    "scale_down": ScaleDownAction,
}

def execute_proposal(proposal: AgentProposal) -> Result:
    if proposal.action not in PERMITTED_ACTIONS:
        log_rejection(proposal, "action_not_permitted")
        return Rejection("Action not in permitted set")
    action_class = PERMITTED_ACTIONS[proposal.action]
    action = action_class(proposal.parameters)
    if not action.validate():
        log_rejection(proposal, "validation_failed")
        return Rejection("Action parameters invalid")
    result = action.execute()
    log_execution(proposal, result)
    return result
The agent can propose anything. Only enumerated actions execute.
Pattern 2: Graduated Permissions
Not all actions are equal. Dangerous actions require more validation.
Tier 1 (Automatic): Read-only actions, reversible changes, sandbox operations. Execute immediately after basic validation.
Tier 2 (Confirmed): Production changes, resource allocation, external API calls. Require explicit confirmation or cooldown period.
Tier 3 (Supervised): Destructive operations, security-sensitive actions, compliance-relevant changes. Require human approval before execution.
The permission tier is determined by the action, not the agent’s confidence. High confidence from an LLM is not the same as low risk.
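One way to make the tiering mechanical is a static action-to-tier map that routes every proposal, with unknown actions defaulting to the most restrictive tier. The action names and `route` function here are hypothetical:

```python
from enum import Enum

class Tier(Enum):
    AUTOMATIC = 1   # execute after basic validation
    CONFIRMED = 2   # require confirmation or cooldown
    SUPERVISED = 3  # require human approval

# The tier is a property of the action, never of the agent's confidence.
ACTION_TIERS = {
    "read_metrics": Tier.AUTOMATIC,
    "deploy_production": Tier.CONFIRMED,
    "delete_data": Tier.SUPERVISED,
}

def route(action: str) -> str:
    tier = ACTION_TIERS.get(action, Tier.SUPERVISED)  # unknown -> safest tier
    if tier is Tier.AUTOMATIC:
        return "execute"
    if tier is Tier.CONFIRMED:
        return "await_confirmation"
    return "await_human_approval"
```

Defaulting unknown actions to Tier 3 means a gap in the map fails safe rather than open.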
Pattern 3: Capability Tokens
Grant capabilities explicitly, not implicitly.
class AgentCapabilities:
    def __init__(self, token: CapabilityToken):
        self.can_read_staging = token.has_capability("read:staging")
        self.can_write_staging = token.has_capability("write:staging")
        self.can_read_production = token.has_capability("read:production")
        self.can_write_production = token.has_capability("write:production")

def create_deployment_agent() -> Agent:
    # Deployment agent can write staging, read production
    token = CapabilityToken([
        "read:staging",
        "write:staging",
        "read:production"
        # Note: no write:production
    ])
    return Agent(capabilities=AgentCapabilities(token))
Capabilities are granted at agent creation, not inferred from context. The agent can’t acquire capabilities it wasn’t given.
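For the pattern to hold, the substrate must also check capabilities at execution time, not just at agent creation. A sketch of one possible enforcement side, with an assumed `CapabilityToken` shape and a hypothetical `require` helper:

```python
class CapabilityError(Exception):
    pass

class CapabilityToken:
    def __init__(self, capabilities: list[str]):
        self._caps = frozenset(capabilities)  # immutable after creation

    def has_capability(self, cap: str) -> bool:
        return cap in self._caps

def require(token: CapabilityToken, capability: str) -> None:
    """Substrate-side check before any action touches a resource."""
    if not token.has_capability(capability):
        raise CapabilityError(f"missing capability: {capability}")
```

Because the token is frozen at creation, nothing the agent says mid-run can widen its access.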
Pattern 4: Circuit Breakers
Stop runaway agents automatically.
from datetime import datetime, timedelta

class AgentCircuitBreaker:
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.failure_count = 0
        self.failure_threshold = 5
        self.reset_timeout = timedelta(minutes=15)
        self.last_failure = None

    def record_failure(self):
        self.failure_count += 1
        self.last_failure = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.trip_breaker()

    def trip_breaker(self):
        notify_operators(f"Agent {self.agent_id} circuit breaker tripped")
        disable_agent(self.agent_id)
When agents fail repeatedly, stop them. Investigate before resuming.
Common Mistakes in Agent Architecture
Mistake 1: Trust by default
Agents should have zero capabilities until explicitly granted. Never start with full access and try to restrict—start with no access and carefully expand.
Mistake 2: Logging as afterthought
The audit log is not optional. Design it first. An agent without comprehensive logging is an agent you can’t debug, can’t audit, and can’t trust.
Mistake 3: Assuming rollback is possible
Some actions are irreversible—sent emails, deleted data, deployed contracts. Design permission boundaries with irreversibility in mind.
Mistake 4: Human review as security theater
If humans must approve every action, agents provide no value. If humans never review anything, agents are unsupervised. Find the right level of oversight for your risk profile.
When to Seek Expert Help
Agent architecture requires getting safety right the first time. Organizations often benefit from external expertise when:
- Deploying production agents for the first time: The patterns that work in demos fail at scale
- Operating in regulated environments: Compliance requirements for autonomous systems are evolving and complex
- Experiencing agent incidents: A misbehaving agent is a sign of architectural gaps
- Scaling agent deployments: More agents means more attack surface
I help engineering teams design and implement the Substrate Pattern through architecture reviews, security assessments, and implementation guidance.
Related Reading
- The Complete Guide to Production-Ready AI Development - The Vibes Inside Guardrails framework
- Invariants in AI-Generated Code - What AI can’t infer
- Scaling AI-Assisted Development - Organizational patterns
Dipankar Sarkar is a technology advisor specializing in AI-native development and production systems. He has architected ML systems serving millions of users and helps organizations build safe, reliable AI agent infrastructure.