Multi-Tenant AI Agent Architecture
Three isolation models for multi-tenant AI agents, with ASCII architecture diagrams, per-tenant YAML policies, vector database isolation patterns, and defense-in-depth strategies for SaaS at scale.
By the end of 2026, an estimated 40% of enterprise applications will embed task-specific AI agents, up from less than 5% in early 2025. Most of these applications are multi-tenant SaaS. Unlike conventional SaaS, where row-level database policies do most of the isolation work, AI agents introduce a new isolation challenge: they work with vector databases, long-term memory, tool access, and external APIs. A misconfigured permission or a prompt injection can cross tenant boundaries in ways that traditional access control models did not anticipate.
Why Traditional Tenant Isolation Breaks
In a standard SaaS app, tenant isolation is straightforward: every database query includes a WHERE tenant_id = ? clause, API requests are scoped to an org, and the application never leaks data across tenants. AI agents break this model in five ways:
- Agents don't log in. A traditional user has a session with a tenant context. An agent is invoked on behalf of a user, but it does not authenticate as a tenant, so every tool call must be scoped explicitly.
- Vector search is approximate. Semantic search returns nearest neighbors, not exact matches. If you mix tenant embeddings in one index without strict metadata filtering, a query from Tenant A can return Tenant B's documents.
- Tool access is lateral. An agent with access to a file system, database, or API can reach across tenant boundaries unless every tool enforces tenant scoping independently.
- Memory persists across sessions. Agent memory systems (conversation history, learned preferences, cached tool results) can leak information between tenants if not isolated.
- Prompt injection crosses boundaries. A malicious user in Tenant A can craft input that causes the agent to access Tenant B's data through tool calls. Prompt-level isolation does not prevent this.
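To make "every tool enforces tenant scoping independently" concrete, here is a minimal sketch of a file tool that confines paths to the caller's tenant prefix. The function name, the exception, and the /data/<tenant_id>/ layout are assumptions chosen to match the paths used in the policies later in this article; this is not part of any library.

```python
import posixpath


class CrossTenantAccess(Exception):
    """Raised when a tool call resolves outside the caller's tenant prefix."""


def resolve_tenant_path(tenant_id: str, requested: str, root: str = "/data") -> str:
    """Return a normalized path guaranteed to sit under <root>/<tenant_id>/.

    tenant_id must come from the authenticated request context, never from
    model output or user-supplied text.
    """
    tenant_root = posixpath.join(root, tenant_id)
    # Relative paths are resolved against the tenant root. normpath collapses
    # ".." segments, so traversal attempts are exposed before the prefix check.
    candidate = posixpath.normpath(posixpath.join(tenant_root, requested))
    if candidate != tenant_root and not candidate.startswith(tenant_root + "/"):
        raise CrossTenantAccess(f"{requested!r} is outside {tenant_root}/")
    return candidate


if __name__ == "__main__":
    print(resolve_tenant_path("tenant_acme", "reports/q1.csv"))
    try:
        resolve_tenant_path("tenant_acme", "/data/tenant_acme/../tenant_xyz/secrets.txt")
    except CrossTenantAccess as exc:
        print("blocked:", exc)
```

The check is purely lexical, so it does not follow symlinks. A real tool would also need to resolve the path on disk, or rely on storage-level permissions, before trusting it.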
Architecture: Three Isolation Models
There are three patterns for structuring multi-tenant AI agents, each with different cost-isolation tradeoffs:
Model 1: Fully Siloed (highest cost, strongest isolation)
┌───────────────────────────────┬───────────────────────────────┐
│ Tenant A                      │ Tenant B                      │
│ ┌───────────────────────────┐ │ ┌───────────────────────────┐ │
│ │ Agent Instance            │ │ │ Agent Instance            │ │
│ │ Vector DB (dedicated)     │ │ │ Vector DB (dedicated)     │ │
│ │ File Storage (dedicated)  │ │ │ File Storage (dedicated)  │ │
│ │ Memory Store (dedicated)  │ │ │ Memory Store (dedicated)  │ │
│ │ Tool Credentials (own)    │ │ │ Tool Credentials (own)    │ │
│ └───────────────────────────┘ │ └───────────────────────────┘ │
└───────────────────────────────┴───────────────────────────────┘
Cost: ~$150-500/tenant/month
Blast radius: single tenant

Model 2: Shared Infra, Logical Isolation (balanced)
┌─────────────────────────────────────────────────────────┐
│ Shared Infrastructure                                   │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Agent Runtime (shared, context-injected per tenant) │ │
│ └──────────────────────┬──────────────────────────────┘ │
│                        │                                │
│ ┌──────────────┐ ┌─────┴──────────┐ ┌──────────────┐    │
│ │ Vector DB    │ │ Veto Policy    │ │ File Storage │    │
│ │ (namespaced) │ │ Engine         │ │ (prefixed)   │    │
│ │ tenant_a/*   │ │ per-tenant     │ │ /tenant_a/*  │    │
│ │ tenant_b/*   │ │ policies       │ │ /tenant_b/*  │    │
│ └──────────────┘ └────────────────┘ └──────────────┘    │
└─────────────────────────────────────────────────────────┘
Cost: ~$20-50/tenant/month
Blast radius: controlled by policy

Model 3: Fully Shared (lowest cost, weakest isolation)
┌─────────────────────────────────────────────────────────┐
│ Single shared agent, single DB, tenant_id column only   │
│ ⚠ NOT RECOMMENDED for AI agents: too many leak vectors  │
└─────────────────────────────────────────────────────────┘
Model 2 is where most production systems land. Shared infrastructure keeps costs manageable. Logical isolation through a policy engine prevents cross-tenant access without the operational burden of managing hundreds of isolated deployments.
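To see what "keeps costs manageable" means in numbers, the short calculation below scales the per-tenant cost ranges from the diagrams to a fleet. The fleet size of 200 tenants is a hypothetical figure chosen for illustration; the per-tenant ranges are the ones quoted for Model 1 and Model 2.

```python
def monthly_cost_range(tenants: int, low: float, high: float) -> tuple[float, float]:
    """Scale a per-tenant monthly cost range (USD) to a whole fleet."""
    return tenants * low, tenants * high


TENANTS = 200  # hypothetical fleet size

siloed = monthly_cost_range(TENANTS, 150, 500)  # Model 1: ~$150-500/tenant/month
shared = monthly_cost_range(TENANTS, 20, 50)    # Model 2: ~$20-50/tenant/month

print(f"Model 1 (siloed): ${siloed[0]:,.0f} to ${siloed[1]:,.0f} per month")
print(f"Model 2 (shared): ${shared[0]:,.0f} to ${shared[1]:,.0f} per month")
```

At this fleet size the siloed model comes to $30,000 to $100,000 per month, against $4,000 to $10,000 for shared infrastructure. That is roughly 7.5x to 10x more, before counting the operational work of running 200 separate deployments.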
Per-Tenant Policies with Veto
Different tenants have different security requirements. An enterprise customer on a SOC 2 compliant plan needs approval workflows and full audit trails. A startup on a free tier needs basic rate limiting. Veto lets you define per-tenant policies that the same agent runtime evaluates at execution time:
# Acme Corp: Enterprise plan, SOC 2 compliant
name: acme-corp
tenant_id: tenant_acme
plan: enterprise
rules:
  - tool: query_database
    conditions:
      # Enforce tenant scoping on every query
      # (backslashes are doubled because YAML double-quoted strings reject \s)
      - match:
          arguments.query: "tenant_id\\s*=\\s*'tenant_acme'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny
        reason: "Query must include tenant_id = 'tenant_acme'"
  - tool: send_email
    constraints:
      rate_limit: 50/hour
    conditions:
      # The dot is escaped so the pattern matches only the literal domain
      - match:
          arguments.to: "@acmecorp\\.com$"
        action: allow
      - match:
          arguments.to: ".*"
        action: require_approval
        approval:
          channel: slack
          webhook: "https://hooks.slack.com/services/acme/xxx"
          timeout: 600s
  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_acme/"
        action: allow
      - match:
          arguments.path: "^/data/"
        action: deny
        reason: "Cross-tenant file access denied"
default_action: deny
logging:
  level: full
  retention: 3years

# Startup XYZ: Free tier, basic controls
name: startup-xyz
tenant_id: tenant_xyz
plan: free
rules:
  - tool: query_database
    conditions:
      - match:
          arguments.query: "tenant_id\\s*=\\s*'tenant_xyz'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny
  - tool: send_email
    constraints:
      rate_limit: 10/hour  # lower limit for free tier
    action: allow
  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_xyz/"
        action: allow
      - match:
          arguments.path: ".*"
        action: deny
default_action: deny
logging:
  level: decisions_only  # reduced logging for free tier
  retention: 90days

The Runtime: Tenant Context Injection
The agent runtime is shared. On every request, Veto receives the tenant context and evaluates the matching policy. Your agent code contains no per-tenant branching. The same protect() call handles every tenant:
from veto import Veto, Decision
import anthropic

# AsyncAnthropic keeps the model call from blocking the event loop
client = anthropic.AsyncAnthropic()
veto = Veto(api_key="veto_live_xxx", project="multi-tenant-agent")

# SHARED_TOOLS, get_tenant_plan() and execute_tool() are defined by your application.


async def handle_agent_request(user_message: str, tenant_id: str, user_id: str):
    """Same agent code for every tenant. Veto handles policy routing."""
    # tenant_id must come from the authenticated session, never from model output
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=SHARED_TOOLS,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response

        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        tool_results = []

        for block in tool_blocks:
            # Veto looks up the tenant's policy and evaluates against it
            decision = veto.protect(
                tool=block.name,
                arguments=block.input,
                context={
                    "tenant_id": tenant_id,  # this determines which policy applies
                    "user_id": user_id,
                    "plan": get_tenant_plan(tenant_id),
                },
            )

            if decision.action == Decision.DENY:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"BLOCKED: {decision.reason}",
                    "is_error": True,
                })
            elif decision.action == Decision.APPROVAL_REQUIRED:
                approval = veto.wait_for_approval(
                    decision_id=decision.id, timeout=600
                )
                if approval.granted:
                    result = await execute_tool(block.name, block.input, tenant_id)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
                else:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"DENIED: {approval.reason}",
                        "is_error": True,
                    })
            else:
                result = await execute_tool(block.name, block.input, tenant_id)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

        # Feed the assistant turn and every tool result back to the model
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

Vector Database Isolation
Vector databases are the most dangerous point of cross-tenant leakage. Semantic search is approximate by design. Two approaches prevent leakage:
- Namespaces. Give each tenant its own namespace (or its own index) and scope every query to it. This is the strongest guarantee, provided the namespace is derived from the authenticated tenant context rather than from model output. It costs more to operate because you provision and manage one partition per tenant.
- Metadata filtering. Stamp every vector with a tenant_id metadata field and enforce a mandatory filter on every query. Lower cost but requires discipline: every write must include the metadata, every read must include the filter. A single missing filter leaks data.
Veto enforces the second approach at the policy level. If an agent's vector_search tool call does not include the correct tenant filter in its arguments, the call is denied before it reaches the database.
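The metadata-filtering approach can be sketched with a toy in-memory index. Everything here is illustrative: the class, the brute-force cosine ranking, and the assumed argument shape {"filter": {"tenant_id": ...}} for a vector_search call are hypothetical stand-ins, not the API of Veto or of any vector database.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    tenant_id: str
    text: str
    embedding: tuple[float, ...]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantFilteredIndex:
    """Toy shared index: every write is stamped, every read is filtered."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def upsert(self, tenant_id: str, text: str, embedding) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required on every write")
        self._records.append(Record(tenant_id, text, tuple(embedding)))

    def search(self, tenant_id: str, query, top_k: int = 3) -> list[str]:
        if not tenant_id:
            raise ValueError("tenant_id is required on every read")
        # Filter first, then rank: other tenants' vectors are never scored.
        candidates = [r for r in self._records if r.tenant_id == tenant_id]
        ranked = sorted(candidates, key=lambda r: cosine(query, r.embedding), reverse=True)
        return [r.text for r in ranked[:top_k]]


def check_vector_search_call(arguments: dict, tenant_id: str) -> bool:
    """Policy-style check: allow only if the call's filter pins the caller's tenant."""
    tenant_filter = arguments.get("filter") or {}
    return tenant_filter.get("tenant_id") == tenant_id
```

Two documents with identical embeddings from different tenants show why the filter matters: without it, both would tie for nearest neighbor. With the filter applied before ranking, a query for tenant_acme can only return tenant_acme's text.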
Rate Limiting Per Tenant
A single global rate limit does not protect tenants from each other. If your limit is 1,000 tool calls per hour across all tenants, one noisy tenant can consume the whole budget and starve the rest. Veto enforces rate limits per tenant, per tool, tracked by tenant context:
# Rate limits scale with plan tier
rate_limits:
  free:
    query_database: 100/hour
    send_email: 10/hour
    vector_search: 200/hour
    total_tool_calls: 500/hour
  pro:
    query_database: 1000/hour
    send_email: 100/hour
    vector_search: 2000/hour
    total_tool_calls: 5000/hour
  enterprise:
    query_database: 10000/hour
    send_email: 1000/hour
    vector_search: 20000/hour
    total_tool_calls: 50000/hour

Audit Trails Per Tenant
Enterprise tenants expect dedicated audit logs. When Acme Corp's compliance team asks "show me every action the agent took on our data last quarter," you need to produce that report instantly. Veto's decision logs are inherently tenant-scoped. Every log entry includes the tenant context, and logs can be exported per tenant for compliance review.
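As a rough illustration of a per-tenant export, the sketch below filters JSON-lines decision logs by tenant and time window. The log format and field names (timestamp, tenant_id, tool, action) are assumptions made for this example; Veto's actual log schema may differ.

```python
import json
from datetime import datetime, timezone


def export_tenant_audit(log_lines, tenant_id: str, start: datetime, end: datetime) -> list[dict]:
    """Return one tenant's log entries with start <= timestamp < end."""
    selected = []
    for line in log_lines:
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry["timestamp"])
        if entry["tenant_id"] == tenant_id and start <= ts < end:
            selected.append(entry)
    return selected


# Hypothetical log entries, one JSON object per line
LOG = [
    '{"timestamp": "2025-01-15T09:30:00+00:00", "tenant_id": "tenant_acme", "tool": "query_database", "action": "allow"}',
    '{"timestamp": "2025-02-02T14:05:00+00:00", "tenant_id": "tenant_xyz", "tool": "send_email", "action": "allow"}',
    '{"timestamp": "2025-03-20T11:00:00+00:00", "tenant_id": "tenant_acme", "tool": "access_file", "action": "deny"}',
    '{"timestamp": "2025-04-01T00:00:00+00:00", "tenant_id": "tenant_acme", "tool": "send_email", "action": "require_approval"}',
]

q1_report = export_tenant_audit(
    LOG,
    "tenant_acme",
    start=datetime(2025, 1, 1, tzinfo=timezone.utc),
    end=datetime(2025, 4, 1, tzinfo=timezone.utc),
)
for entry in q1_report:
    print(entry["timestamp"], entry["tool"], entry["action"])
```

The end of the window is exclusive, so the entry stamped exactly at the start of the next quarter is left out, and entries from tenant_xyz never appear in Acme's report.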
Defense in Depth
No single isolation mechanism is sufficient. A production multi-tenant agent system layers defenses:
- Network layer — Tenant-scoped credentials for external APIs. An agent acting on behalf of Tenant A cannot use Tenant B's API keys.
- Storage layer — Namespaced or prefixed access to files, vectors, and databases. The storage system enforces boundaries independently.
- Policy layer (Veto) — Runtime authorization on every tool call. Even if the storage layer has a misconfiguration, the policy engine blocks cross-tenant access.
- Monitoring layer — Anomaly detection on cross-tenant access patterns. If Tenant A's agent suddenly starts querying paths outside its prefix, alert immediately.
If a credential is scoped to the wrong tenant, the storage layer still enforces its own boundaries. If the storage layer is misconfigured, the policy layer blocks the call before it runs. If both fail, monitoring surfaces the anomaly. Each layer reduces the blast radius.
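The monitoring layer's "paths outside its prefix" check might look like the following sketch. The event shape (dicts with tenant_id and path) and the /data/<tenant_id>/ layout are assumptions for illustration, and paths are assumed to be normalized before they are logged.

```python
from collections import Counter


def out_of_prefix_attempts(events, root: str = "/data") -> Counter:
    """Count, per tenant, file-tool calls whose path is outside <root>/<tenant_id>/."""
    counts: Counter = Counter()
    for event in events:
        prefix = f"{root}/{event['tenant_id']}/"
        if not event["path"].startswith(prefix):
            counts[event["tenant_id"]] += 1
    return counts


def tenants_to_alert(events, threshold: int = 1) -> list[str]:
    """Return tenants with at least `threshold` out-of-prefix attempts."""
    counts = out_of_prefix_attempts(events)
    return sorted(t for t, n in counts.items() if n >= threshold)


# Hypothetical tool-call events, e.g. drawn from decision logs
EVENTS = [
    {"tenant_id": "tenant_acme", "path": "/data/tenant_acme/reports/q1.csv"},
    {"tenant_id": "tenant_acme", "path": "/data/tenant_xyz/contracts.pdf"},
    {"tenant_id": "tenant_xyz", "path": "/data/tenant_xyz/notes.txt"},
]

print(tenants_to_alert(EVENTS))
```

Because the policy layer should already have denied these calls, a nonzero count is a signal to investigate, such as a prompt-injection attempt or a buggy tool, and not evidence of a successful leak.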
Getting Started
Adding multi-tenant support to an existing agent takes two steps in Veto: include tenant_id in your protect() context and define tenant-specific policies. Your agent code stays identical across tenants.
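To see why passing tenant_id in the context is enough to route a call, here is a toy stand-in for a policy lookup. It is not Veto's implementation. It assumes rules are checked top to bottom with the first match winning, which the ordering of the policies earlier in this article suggests, and the POLICIES structure and evaluate() function are hypothetical.

```python
import re

# Hypothetical in-memory policies: (argument name, regex, action), checked in order.
POLICIES = {
    "tenant_acme": {
        "rules": {
            "access_file": [
                ("path", r"^/data/tenant_acme/", "allow"),
                ("path", r"^/data/", "deny"),
            ],
        },
        "default_action": "deny",
    },
    "tenant_xyz": {
        "rules": {
            "access_file": [
                ("path", r"^/data/tenant_xyz/", "allow"),
                ("path", r".*", "deny"),
            ],
        },
        "default_action": "deny",
    },
}


def evaluate(tool: str, arguments: dict, context: dict) -> str:
    """Pick the policy by tenant_id, then return the first matching rule's action."""
    policy = POLICIES.get(context["tenant_id"])
    if policy is None:
        return "deny"  # unknown tenant: fail closed
    for arg_name, pattern, action in policy["rules"].get(tool, []):
        if re.search(pattern, str(arguments.get(arg_name, ""))):
            return action
    return policy["default_action"]


if __name__ == "__main__":
    call = {"path": "/data/tenant_xyz/contracts.pdf"}
    print(evaluate("access_file", call, {"tenant_id": "tenant_acme"}))
    print(evaluate("access_file", call, {"tenant_id": "tenant_xyz"}))
```

The same tool call is denied for tenant_acme and allowed for tenant_xyz. The agent code that issued it is identical in both cases, and only the context differs.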
Start free to add tenant isolation to your agent, or read about audit trails for per-tenant compliance logging.