Building Safe Financial Agents
$45M lost to AI trading agent exploits. 60% of financial firms say agent misconfiguration is their top AI concern. The SEC is watching. Here's how to build financial agents that don't become liabilities.
In March 2025, a misconfigured AI trading agent at a mid-size hedge fund executed a series of unauthorized leveraged positions that resulted in $45M in losses before a human noticed. The agent had passed all functional testing. It had correct API credentials. It was authenticated. What it lacked was authorization: no per-trade limits, no approval thresholds, no segregation between the agent's ability to recommend a trade and its ability to execute one. That gap is not unique to trading. 60% of financial firms now cite agent misconfiguration as their top AI risk, and the SEC has made AI-related examination a priority for 2025 and 2026. The regulatory ground is shifting under financial agents, and the ones that survive will be the ones built with guardrails from the start.
The Regulatory Landscape
Three regulatory frameworks converge on AI agents in financial services, and each demands specific controls:
- SEC AI Examination Priorities (2025-2026). The SEC's Division of Examinations added "AI and emerging technology" as a standalone priority for the first time. Examiners are looking at how firms use AI in trading, portfolio management, and client communications. They specifically flag: whether AI outputs are subject to human review before execution, whether firms have tested AI tools for bias and accuracy, and whether adequate disclosures are made to clients about AI use. An AI agent that autonomously executes trades without documented guardrails is a finding waiting to happen.
- SR 11-7 (Federal Reserve Model Risk Management). The Fed's guidance on model risk management was written for statistical models but maps directly to AI agents. It requires: effective challenge of model outputs (an independent party must validate results), ongoing monitoring with quantitative thresholds, and a model inventory with documented limitations. An AI trading agent is a model. Its tool calls are model outputs. SR 11-7 requires that those outputs be challengeable, monitorable, and documented.
- SOX Sections 302 and 404. For publicly traded companies, Sarbanes-Oxley requires CEO/CFO certification that internal controls over financial reporting are effective (Section 302) and independent auditor attestation of those controls (Section 404). If an AI agent can initiate, approve, or modify financial transactions, it is part of your internal control environment. An agent that can both initiate and approve a transaction violates segregation of duties. An agent without audit logs makes Section 404 attestation impossible.
Five Financial Agent Risks
Financial agents create risk at five specific points. Each requires a distinct control:
- Unauthorized transactions. The agent executes a trade or transfer that no human authorized. This is the $45M scenario. The control: per-transaction authorization with amount thresholds and mandatory human approval above the threshold.
- Limit breaches. The agent stays within per-transaction limits but violates aggregate limits: position concentration, daily volume caps, counterparty exposure. The control: budget-scoped authorization that tracks cumulative exposure, not just individual transactions.
- Regulatory reporting failures. The agent executes reportable transactions without generating the required reports (SARs, CTRs, large trader reports). The control: post-action hooks that trigger reporting workflows when transaction characteristics match reporting thresholds.
- Segregation of duties violations. The agent both recommends and executes, or both initiates and approves. SOX and most internal control frameworks require separation between these functions. The control: role-scoped policies where an agent with "analyst" context can recommend but not execute, and an agent with "trader" context can execute but only pre-approved recommendations.
- Audit trail gaps. The agent takes actions that are not logged, or logged without sufficient context for reconstruction. The control: structured decision logging on every protect() call, with full tool call arguments, policy evaluation details, and execution results.
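Two of these controls, the per-transaction threshold and the segregation-of-duties check, can be sketched as plain logic. This is a hypothetical, self-contained illustration of the shape of the checks, not Veto's implementation; the names `check_transaction` and `APPROVAL_THRESHOLD` are invented for this example:

```python
from dataclasses import dataclass

# Illustrative per-transaction limit; real limits live in policy, not code.
APPROVAL_THRESHOLD = 50_000

@dataclass
class Decision:
    action: str          # "allow" | "deny" | "approval_required"
    reason: str = ""

def check_transaction(tool: str, amount: float, role: str) -> Decision:
    # Segregation of duties: the recommending role may never execute.
    if tool == "execute_trade" and role == "analyst":
        return Decision("deny", "analysts recommend; they do not execute")
    # Per-transaction threshold: large amounts require a human approver.
    if amount > APPROVAL_THRESHOLD:
        return Decision("approval_required",
                        f"amount {amount} exceeds {APPROVAL_THRESHOLD}")
    return Decision("allow")
```

The point of the sketch is what it lacks: hardcoding this logic per tool handler is exactly the DIY pattern the rest of this article argues against.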
Implementation: Financial Agent with protect()
The core pattern wraps every financial tool call in protect(). The agent code is clean. The authorization logic lives entirely in the policy, not in application code:
```python
import anthropic
from veto import Veto, Decision

client = anthropic.Anthropic()
veto = Veto(api_key="veto_live_xxx", project="trading-agent")

TOOLS = [
    {
        "name": "get_market_data",
        "description": "Fetch current price and volume for a ticker",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "exchange": {"type": "string", "enum": ["NYSE", "NASDAQ", "LSE"]},
            },
            "required": ["ticker"],
        },
    },
    {
        "name": "execute_trade",
        "description": "Submit a trade order to the broker",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "side": {"type": "string", "enum": ["buy", "sell"]},
                "quantity": {"type": "integer"},
                "order_type": {"type": "string", "enum": ["market", "limit"]},
                "limit_price": {"type": "number"},
            },
            "required": ["ticker", "side", "quantity", "order_type"],
        },
    },
    {
        "name": "transfer_funds",
        "description": "Transfer funds between accounts",
        "input_schema": {
            "type": "object",
            "properties": {
                "from_account": {"type": "string"},
                "to_account": {"type": "string"},
                "amount": {"type": "number"},
                "currency": {"type": "string"},
            },
            "required": ["from_account", "to_account", "amount", "currency"],
        },
    },
    {
        "name": "approve_recommendation",
        "description": "Record approval of a trading recommendation",
        "input_schema": {
            "type": "object",
            "properties": {
                "recommendation_id": {"type": "string"},
                "approved_by": {"type": "string"},
            },
            "required": ["recommendation_id", "approved_by"],
        },
    },
]

async def run_financial_agent(user_message: str, user_context: dict):
    """Financial agent with Veto authorization on every tool call."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response

        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        tool_results = []
        for block in tool_blocks:
            # Every tool call passes through Veto before execution.
            decision = veto.protect(
                tool=block.name,
                arguments=block.input,
                context={
                    "user_id": user_context["user_id"],
                    "role": user_context["role"],
                    "account_id": user_context["account_id"],
                    "desk": user_context.get("desk", "general"),
                },
            )
            if decision.action == Decision.ALLOW:
                result = await execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
            elif decision.action == Decision.DENY:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"BLOCKED: {decision.reason}",
                    "is_error": True,
                })
            elif decision.action == Decision.APPROVAL_REQUIRED:
                approval = veto.wait_for_approval(
                    decision_id=decision.id,
                    timeout=decision.approval_timeout,
                )
                if approval.granted:
                    result = await execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
                else:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"DENIED by {approval.reviewer}: {approval.reason}",
                        "is_error": True,
                    })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
```
Policy: Transaction Limits, Approvals, and Budget Caps
The policy defines the authorization rules declaratively. The agent code never changes regardless of how complex the rules get. Note the budget blocks: these are Veto's economic authorization feature, which tracks cumulative spend across multiple scopes (per-trade, daily, and weekly) without requiring any state management in your application code:
```yaml
name: trading-agent-production
description: Authorization policy for trading desk agent

rules:
  - tool: get_market_data
    action: allow
    constraints:
      rate_limit: 500/hour

  - tool: execute_trade
    conditions:
      # Small trades: auto-approve
      - match:
          arguments.quantity: "<= 100"
          arguments.order_type: "limit"
        action: allow
        budget:
          scope: daily
          limit: 50000
          currency: USD
          track_by: context.account_id

      # Medium trades: require senior trader approval
      - match:
          arguments.quantity: "<= 1000"
        action: require_approval
        approval:
          channel: dashboard
          timeout: 300s
          reviewers:
            - role: senior_trader
          context_shown:
            - tool_name
            - arguments
            - session_history
            - portfolio_exposure

      # Large trades: require desk head + compliance
      - match:
          arguments.quantity: "> 1000"
        action: require_approval
        approval:
          tiers:
            - level: 1
              reviewers:
                - role: desk_head
              timeout: 600s
            - level: 2
              reviewers:
                - role: compliance_officer
              timeout: 1800s
          final_escalation: deny

  - tool: transfer_funds
    conditions:
      - match:
          arguments.amount: "<= 10000"
        action: allow
        budget:
          scope: daily
          limit: 100000
          currency: USD
          track_by: context.account_id

      - match:
          arguments.amount: "<= 100000"
        action: require_approval
        approval:
          channel: dashboard
          timeout: 600s
          reviewers:
            - role: treasury_ops

      - match:
          arguments.amount: "> 100000"
        action: deny
        reason: "Transfers > $100K require manual processing"

    budget:
      scope: weekly
      limit: 500000
      currency: USD
      track_by: context.account_id

  - tool: approve_recommendation
    conditions:
      # Segregation of duties: analyst role cannot approve
      - match:
          context.role: "analyst"
        action: deny
        reason: "Segregation of duties: analysts cannot approve recommendations"
      - match:
          context.role: "(senior_trader|desk_head)"
        action: allow

default_action: deny

logging:
  level: full
  retention: 7years
  reason: "SOX Section 802 — record retention"
```
Economic Authorization: Multi-Scope Budgets
Per-transaction limits are necessary but not sufficient. The $45M incident happened because each individual trade was within limits, but the aggregate exposure was catastrophic. Veto's economic authorization tracks budgets across multiple scopes simultaneously:
```yaml
# Budget tracking across multiple time windows and dimensions
budgets:
  per_trade:
    execute_trade:
      max_notional: 50000
      currency: USD

  daily:
    execute_trade:
      max_notional: 500000
      max_trades: 200
      currency: USD
      reset: "16:00 America/New_York"
    transfer_funds:
      max_amount: 100000
      max_transfers: 20
      currency: USD

  weekly:
    execute_trade:
      max_notional: 2000000
      currency: USD
    transfer_funds:
      max_amount: 500000
      currency: USD

  per_ticker:
    execute_trade:
      max_position_pct: 25
      basis: portfolio_value
      track_by: arguments.ticker

  per_counterparty:
    transfer_funds:
      max_exposure: 250000
      currency: USD
      track_by: arguments.to_account
```
When the daily budget for execute_trade hits $500,000, the next trade is denied regardless of its individual size. When a single ticker exceeds 25% of portfolio value, further buys of that ticker are blocked. The agent does not need to track any of this. Veto maintains the running totals and evaluates them on every protect() call.
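To see why every scope must be evaluated on every call, here is a minimal in-memory sketch of multi-scope tracking. The names `BudgetTracker` and `check_and_record` are invented for illustration; Veto maintains these totals server-side, and a real tracker would also handle window resets and persistence:

```python
from collections import defaultdict

# Illustrative limits mirroring the per_trade / daily / weekly scopes above.
LIMITS = {"per_trade": 50_000, "daily": 500_000, "weekly": 2_000_000}

class BudgetTracker:
    def __init__(self):
        self.spent = defaultdict(float)  # scope -> cumulative notional

    def check_and_record(self, notional: float) -> tuple[bool, str]:
        # A trade must clear every scope before it counts against any of them.
        if notional > LIMITS["per_trade"]:
            return False, "per_trade limit exceeded"
        for scope in ("daily", "weekly"):
            if self.spent[scope] + notional > LIMITS[scope]:
                return False, f"{scope} budget exhausted"
        for scope in ("daily", "weekly"):
            self.spent[scope] += notional
        return True, "ok"
```

With these numbers, ten $50,000 trades each pass every scope, and the eleventh is denied by the daily cap even though it is individually in-limit: the $45M failure mode in miniature.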
DIY Limit Checking vs Declarative Policy
Teams that build financial agents without a policy engine end up with limit checking scattered across application code. Every tool handler has its own validation logic, its own threshold constants, and its own logging format. Changing a limit means deploying new code. Adding a new approval tier means refactoring the execution loop. The comparison:
| DIY Limit Checking | Declarative Policy (Veto) |
| --- | --- |
| Limits hardcoded in application code | Limits defined in YAML policy |
| Change requires code deploy | Change requires policy update (no deploy) |
| Each tool has its own validation | One protect() call per tool |
| Budget tracking is manual state mgmt | Budget tracking is automatic |
| Audit log format varies per tool | Structured audit log on every decision |
| Segregation of duties is ad-hoc | Segregation enforced by role in context |
| No aggregate limit tracking | Multi-scope budgets (daily/weekly/ticker) |
| Testing requires mocking business logic + external services | Testing is policy evaluation (unit testable) |
| Compliance evidence is scattered | Every decision is a compliance record |
| Adding approval tiers = refactor | Adding approval tiers = YAML change |
The declarative approach is not just cleaner. It is auditable. When your SOX auditor asks "show me the control that prevents unauthorized transactions over $50,000," you point to the YAML policy and the decision log. With DIY limit checking, you point to a code review and hope the reviewer caught every edge case.
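The "unit testable" row deserves a concrete shape. This is a hypothetical pure-Python mirror of the trade-size tiers from the policy, not Veto's evaluator: because the rules are data, pinning their behavior in tests requires no broker connection, no LLM, and no mocks. `TRADE_RULES` and `evaluate_trade` are invented names for this sketch:

```python
# Trade-size tiers as data, evaluated in order; a limit change is a
# one-line edit to this table, and the existing tests still pin behavior.
TRADE_RULES = [  # (max_quantity, action)
    (100, "allow"),
    (1000, "require_approval"),
    (float("inf"), "require_tiered_approval"),
]

def evaluate_trade(quantity: int) -> str:
    for max_qty, action in TRADE_RULES:
        if quantity <= max_qty:
            return action
    return "deny"
```

Contrast this with the DIY column: the same tiers hardcoded across three tool handlers can only be tested by standing up (or mocking) the systems those handlers call.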
SR 11-7 Mapping: Agent Controls as Model Risk Management
SR 11-7 requires three things for every model: effective challenge, ongoing monitoring, and documentation. Here is how Veto maps to each:
- Effective challenge. The require_approval action is effective challenge by definition. An independent human reviews the agent's proposed action and can approve, deny, or modify it. The approval log records who challenged, what they decided, and why.
- Ongoing monitoring. Budget tracking provides quantitative monitoring. When the agent approaches a limit, Veto can alert before the limit is hit. Decision logs provide qualitative monitoring: denial rates, approval rates, and patterns that indicate drift in agent behavior.
- Documentation. The YAML policy is the model's documentation. It describes exactly what the agent is authorized to do, under what conditions, with what limits. Policy version history tracks changes over time. Decision logs provide the evidence that the documented controls are actually enforced.
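The ongoing-monitoring requirement reduces to arithmetic over decision logs. A hedged sketch, assuming a simple list-of-dicts log format (not Veto's actual schema; `denial_rate` and `flag_drift` are invented for illustration):

```python
from collections import Counter

def denial_rate(decisions: list[dict]) -> float:
    """Fraction of logged decisions that were denials."""
    counts = Counter(d["action"] for d in decisions)
    total = sum(counts.values())
    return counts["deny"] / total if total else 0.0

def flag_drift(decisions: list[dict], baseline: float,
               tolerance: float = 0.05) -> bool:
    # A denial rate drifting well above its baseline suggests the agent
    # is repeatedly attempting actions outside its authorization.
    return denial_rate(decisions) > baseline + tolerance
```

Running a check like this on a schedule, with an alert threshold, is the kind of quantitative monitoring SR 11-7 examiners expect to see documented.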
Getting Started
Adding financial controls to an existing agent is a single integration point: wrap your tool execution with protect(), define your limits and approval thresholds in a YAML policy, and Veto handles budget tracking, approval routing, and audit logging. Your agent code stays identical. The controls are entirely external and auditable.
Start free to add financial guardrails to your agent, or read the financial agents implementation guide and SOC 2 compliance documentation for detailed walkthroughs.
Build your first policy