Multi-Tenant AI Agent Architecture

Three isolation models for multi-tenant AI agents, with ASCII architecture diagrams, per-tenant YAML policies, vector database isolation patterns, and defense-in-depth strategies for SaaS at scale.

Anirudh Patel | February 7, 2026 | 18 min read

By the end of 2026, an estimated 40% of enterprise applications will embed task-specific AI agents, up from less than 5% in early 2025. Most of these applications are multi-tenant SaaS. Unlike standard SaaS, where a row-level database policy often suffices, AI agents introduce a new isolation challenge: they work with vector databases, long-term memory, tools, and external APIs. A misconfigured permission or a prompt injection can cross tenant boundaries in ways traditional access control models did not anticipate.

Why Traditional Tenant Isolation Breaks

In a standard SaaS app, tenant isolation is straightforward: every database query includes a WHERE tenant_id = ? clause, API requests are scoped to an org, and the application never leaks data across tenants. AI agents break this model in five ways:

  1. Agents don't log in. A traditional user has a session with a tenant context. An agent is invoked on behalf of a user, but it does not authenticate as a tenant. Every tool call must be manually scoped.
  2. Vector search is approximate. Semantic search returns nearest neighbors, not exact matches. If you mix tenant embeddings in one index without strict metadata filtering, a query from Tenant A can return Tenant B's documents.
  3. Tool access is lateral. An agent with access to a file system, database, or API can reach across tenant boundaries unless every tool enforces tenant scoping independently.
  4. Memory persists across sessions. Agent memory systems (conversation history, learned preferences, cached tool results) can leak information between tenants if not isolated.
  5. Prompt injection crosses boundaries. A malicious user in Tenant A can craft input that causes the agent to access Tenant B's data through tool calls. Prompt-level isolation does not prevent this.
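
Points 1 and 3 suggest one pattern: take the tenant identifier from the server-side request context, never from model-generated arguments, and bind it inside the tool itself. The sketch below is illustrative only. `TenantContext`, `make_search_tool`, and the `documents` table are hypothetical names, and SQLite stands in for whatever database the tools actually use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Trusted context built by the server from the authenticated request."""
    tenant_id: str
    user_id: str


def make_search_tool(conn: sqlite3.Connection, ctx: TenantContext):
    """Return a tool function whose tenant scope is fixed at creation time."""

    def search_documents(keyword: str) -> list[str]:
        # The model supplies only `keyword`. tenant_id comes from ctx and is
        # passed as a bound parameter, so model output cannot change it.
        rows = conn.execute(
            "SELECT title FROM documents "
            "WHERE tenant_id = ? AND title LIKE ? ORDER BY title",
            (ctx.tenant_id, f"%{keyword}%"),
        )
        return [title for (title,) in rows]

    return search_documents


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with rows for two hypothetical tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (tenant_id TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [
            ("tenant_a", "Q3 roadmap"),
            ("tenant_a", "Pricing roadmap"),
            ("tenant_b", "Acquisition roadmap"),
        ],
    )
    return conn
```

Because the tool signature exposes no tenant parameter, an injected instruction such as "search Tenant B's documents" has nothing to set: the scope was decided before the model produced any output.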

Architecture: Three Isolation Models

There are three patterns for structuring multi-tenant AI agents, each with different cost-isolation tradeoffs:

isolation_models.txt (text)
Model 1: Fully Siloed (highest cost, strongest isolation)
┌─────────────────────────────────────────────────────────┐
│ Tenant A                    │ Tenant B                  │
│ ┌─────────────────────────┐ │ ┌─────────────────────────┐
│ │ Agent Instance           │ │ │ Agent Instance           │
│ │ Vector DB (dedicated)    │ │ │ Vector DB (dedicated)    │
│ │ File Storage (dedicated) │ │ │ File Storage (dedicated) │
│ │ Memory Store (dedicated) │ │ │ Memory Store (dedicated) │
│ │ Tool Credentials (own)   │ │ │ Tool Credentials (own)   │
│ └─────────────────────────┘ │ └─────────────────────────┘
└─────────────────────────────────────────────────────────┘
Cost: ~$150-500/tenant/month   Blast radius: single tenant

Model 2: Shared Infra, Logical Isolation (balanced)
┌─────────────────────────────────────────────────────────┐
│                    Shared Infrastructure                  │
│ ┌─────────────────────────────────────────────────────┐  │
│ │ Agent Runtime (shared, context-injected per tenant)  │  │
│ └──────────────────────┬──────────────────────────────┘  │
│                        │                                  │
│ ┌──────────────┐ ┌─────┴──────────┐ ┌──────────────┐    │
│ │ Vector DB    │ │ Veto Policy    │ │ File Storage  │    │
│ │ (namespaced) │ │ Engine         │ │ (prefixed)    │    │
│ │ tenant_a/*   │ │ per-tenant     │ │ /tenant_a/*   │    │
│ │ tenant_b/*   │ │ policies       │ │ /tenant_b/*   │    │
│ └──────────────┘ └────────────────┘ └──────────────┘    │
└─────────────────────────────────────────────────────────┘
Cost: ~$20-50/tenant/month     Blast radius: controlled by policy

Model 3: Fully Shared (lowest cost, weakest isolation)
┌─────────────────────────────────────────────────────────┐
│ Single shared agent, single DB, tenant_id column only    │
│ ⚠ NOT RECOMMENDED for AI agents — too many leak vectors │
└─────────────────────────────────────────────────────────┘

Model 2 is where most production systems land. Shared infrastructure keeps costs manageable. Logical isolation through a policy engine prevents cross-tenant access without the operational burden of managing hundreds of isolated deployments.
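
To make the cost gap concrete, the per-tenant figures from the diagram can be multiplied out. This is a rough sketch that assumes cost scales linearly with tenant count, which real deployments may not; the tenant count of 200 is a hypothetical example.

```python
# Per-tenant monthly cost ranges quoted in the diagram above (USD).
COST_RANGES = {
    "siloed": (150, 500),
    "shared_logical": (20, 50),
}


def monthly_cost_range(model: str, tenants: int) -> tuple[int, int]:
    """Return the (low, high) monthly cost for a given number of tenants."""
    low, high = COST_RANGES[model]
    return low * tenants, high * tenants


if __name__ == "__main__":
    for model in COST_RANGES:
        low, high = monthly_cost_range(model, tenants=200)
        print(f"{model}: ${low:,} to ${high:,} per month")
```

At 200 tenants, the siloed model comes to $30,000 to $100,000 per month against $4,000 to $10,000 for shared infrastructure with logical isolation.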

Per-Tenant Policies with Veto

Different tenants have different security requirements. An enterprise customer on a SOC 2 compliant plan needs approval workflows and full audit trails. A startup on a free tier needs basic rate limiting. Veto lets you define per-tenant policies that the same agent runtime evaluates at execution time:

policies/tenant-acme-corp.yaml (yaml)
# Acme Corp — Enterprise plan, SOC2 compliant
name: acme-corp
tenant_id: tenant_acme
plan: enterprise

rules:
  - tool: query_database
    conditions:
      # Enforce tenant scoping on every query
      - match:
          arguments.query: "tenant_id\\s*=\\s*'tenant_acme'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny
        reason: "Query must include tenant_id = 'tenant_acme'"

  - tool: send_email
    constraints:
      rate_limit: 50/hour
    conditions:
      - match:
          arguments.to: "@acmecorp\\.com$"
        action: allow
      - match:
          arguments.to: ".*"
        action: require_approval
        approval:
          channel: slack
          webhook: "https://hooks.slack.com/services/acme/xxx"
          timeout: 600s

  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_acme/"
        action: allow
      - match:
          arguments.path: "^/data/"
        action: deny
        reason: "Cross-tenant file access denied"

default_action: deny
logging:
  level: full
  retention: 3years

policies/tenant-startup-xyz.yaml (yaml)
# Startup XYZ — Free tier, basic controls
name: startup-xyz
tenant_id: tenant_xyz
plan: free

rules:
  - tool: query_database
    conditions:
      - match:
          arguments.query: "tenant_id\\s*=\\s*'tenant_xyz'"
        action: allow
      - match:
          arguments.query: ".*"
        action: deny

  - tool: send_email
    constraints:
      rate_limit: 10/hour    # lower limit for free tier
    action: allow

  - tool: access_file
    conditions:
      - match:
          arguments.path: "^/data/tenant_xyz/"
        action: allow
      - match:
          arguments.path: ".*"
        action: deny

default_action: deny
logging:
  level: decisions_only     # reduced logging for free tier
  retention: 90days
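
Both policies rely on condition order: the specific allow pattern comes first and a broad deny pattern catches everything else. The sketch below shows that first-match-wins behavior in plain Python. It is an assumption about how such rules are evaluated, not Veto's actual implementation, and `evaluate` and the condition dictionaries are hypothetical.

```python
import re

# Conditions for one tool, mirroring the access_file rule above.
ACCESS_FILE_CONDITIONS = [
    {"field": "path", "pattern": r"^/data/tenant_acme/", "action": "allow"},
    {"field": "path", "pattern": r"^/data/", "action": "deny",
     "reason": "Cross-tenant file access denied"},
]


def evaluate(conditions: list[dict], arguments: dict,
             default_action: str = "deny") -> tuple[str, str]:
    """Return (action, reason) from the first condition whose pattern matches."""
    for cond in conditions:
        value = str(arguments.get(cond["field"], ""))
        if re.search(cond["pattern"], value):
            return cond["action"], cond.get("reason", "")
    # Nothing matched: fall back to the policy-wide default.
    return default_action, "no condition matched"
```

Swapping the two conditions would deny every path under /data/, including the tenant's own, so the order is part of the policy. Pattern matching on raw argument text is also a coarse check, which is why it works best alongside the storage-level enforcement described under Defense in Depth.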

The Runtime: Tenant Context Injection

The agent runtime is shared. On every request, Veto receives the tenant context and evaluates the correct policy. There is no conditional logic in your agent code. The same protect() call handles every tenant:

multi_tenant_runtime.py (python)
from veto import Veto, Decision
import anthropic

# SHARED_TOOLS, get_tenant_plan(), and execute_tool() are assumed to be
# defined elsewhere in the application.
# AsyncAnthropic keeps the model call from blocking the event loop.
client = anthropic.AsyncAnthropic()
veto = Veto(api_key="veto_live_xxx", project="multi-tenant-agent")

async def handle_agent_request(user_message: str, tenant_id: str, user_id: str):
    """Same agent code for every tenant. Veto handles policy routing."""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=SHARED_TOOLS,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response

        tool_blocks = [b for b in response.content if b.type == "tool_use"]
        tool_results = []

        for block in tool_blocks:
            # Veto looks up the tenant's policy and evaluates against it
            decision = veto.protect(
                tool=block.name,
                arguments=block.input,
                context={
                    "tenant_id": tenant_id,  # this determines which policy applies
                    "user_id": user_id,
                    "plan": get_tenant_plan(tenant_id),
                }
            )

            if decision.action == Decision.DENY:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"BLOCKED: {decision.reason}",
                    "is_error": True,
                })
            elif decision.action == Decision.APPROVAL_REQUIRED:
                approval = veto.wait_for_approval(
                    decision_id=decision.id, timeout=600
                )
                if approval.granted:
                    result = await execute_tool(block.name, block.input, tenant_id)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
                else:
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"DENIED: {approval.reason}",
                        "is_error": True,
                    })
            else:
                result = await execute_tool(block.name, block.input, tenant_id)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

Vector Database Isolation

Vector databases are the most dangerous point of cross-tenant leakage. Semantic search is approximate by design. Two approaches prevent leakage:

  • Namespaces: Create a dedicated namespace per tenant (or, in databases without namespaces, a dedicated index or collection). Every query is scoped to one namespace, so it cannot return another tenant's vectors as long as the namespace is derived from the trusted tenant context. Higher cost and operational overhead when this means one index per tenant, but the strongest guarantee.
  • Metadata filtering: Stamp every vector with a tenant_id metadata field and enforce a mandatory filter on every query. Lower cost, but it requires discipline: every write must include the metadata and every read must include the filter. A single missing filter leaks data.

Veto enforces the second approach at the policy level. If an agent's vector_search tool call does not include the correct tenant filter in its arguments, the call is denied before it reaches the database.
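
The metadata-filtering approach can be made harder to misuse by wrapping the index so that a query without a tenant filter cannot be expressed. The following is a minimal in-memory sketch, not a real vector database client. `TenantScopedIndex` and `VectorRecord` are hypothetical names, and brute-force cosine similarity stands in for approximate nearest-neighbor search.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorRecord:
    doc_id: str
    tenant_id: str
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedIndex:
    """Toy index where tenant_id is required on every write and every read."""

    def __init__(self) -> None:
        self._records: list[VectorRecord] = []

    def add(self, record: VectorRecord) -> None:
        if not record.tenant_id:
            raise ValueError("every vector must carry a tenant_id")
        self._records.append(record)

    def search(self, query: tuple[float, ...], tenant_id: str,
               top_k: int = 3) -> list[str]:
        if not tenant_id:
            raise ValueError("tenant_id filter is mandatory")
        # Filter before ranking, so another tenant's vectors are never
        # candidates no matter how close they are to the query.
        candidates = [r for r in self._records if r.tenant_id == tenant_id]
        ranked = sorted(candidates,
                        key=lambda r: cosine(query, r.embedding),
                        reverse=True)
        return [r.doc_id for r in ranked[:top_k]]
```

Filtering before ranking matters here: applying the filter after taking the top results could return fewer matches than requested, and it would also mean another tenant's vectors had been scored against the query.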

Rate Limiting Per Tenant

A single global rate limit is not enough in a multi-tenant system. If the limit is 1,000 tool calls per hour across all tenants, one noisy tenant can exhaust it and starve the others. Veto enforces rate limits per tenant and per tool, keyed by the tenant context:

rate_limiting.yaml (yaml)
# Rate limits scale with plan tier
rate_limits:
  free:
    query_database: 100/hour
    send_email: 10/hour
    vector_search: 200/hour
    total_tool_calls: 500/hour

  pro:
    query_database: 1000/hour
    send_email: 100/hour
    vector_search: 2000/hour
    total_tool_calls: 5000/hour

  enterprise:
    query_database: 10000/hour
    send_email: 1000/hour
    vector_search: 20000/hour
    total_tool_calls: 50000/hour
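
To illustrate what "per tenant, per tool" means mechanically, here is a minimal fixed-window counter keyed by tenant and tool. It is a sketch under simplifying assumptions, not how Veto tracks limits: `TenantRateLimiter` is a hypothetical class, only a subset of the limits above is copied in, the `total_tool_calls` cap is not modeled, and old windows are never purged.

```python
import time
from collections import defaultdict
from typing import Callable

# Hourly limits per plan, copied from a subset of the YAML above.
PLAN_LIMITS = {
    "free": {"send_email": 10, "query_database": 100},
    "pro": {"send_email": 100, "query_database": 1000},
}

WINDOW_SECONDS = 3600


class TenantRateLimiter:
    """Fixed-window counter keyed by (tenant_id, tool, window index)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._counts: dict[tuple[str, str, int], int] = defaultdict(int)

    def allow(self, tenant_id: str, plan: str, tool: str) -> bool:
        limit = PLAN_LIMITS[plan].get(tool)
        if limit is None:
            return False  # unknown tool: fail closed
        window = int(self._clock() // WINDOW_SECONDS)
        key = (tenant_id, tool, window)
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True
```

Because the counter key includes the tenant, one tenant hitting its limit has no effect on another tenant's budget. A sliding window or token bucket would smooth out bursts at window boundaries.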

Audit Trails Per Tenant

Enterprise tenants expect dedicated audit logs. When Acme Corp's compliance team asks "show me every action the agent took on our data last quarter," you need to produce that report instantly. Veto's decision logs are inherently tenant-scoped. Every log entry includes the tenant context, and logs can be exported per tenant for compliance review.
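
As a rough sketch of what a per-tenant export involves, the function below filters JSON Lines log entries by tenant and time range. The log schema (`timestamp`, `tenant_id`, `tool`, `action`) is a hypothetical example rather than Veto's actual log format, and `export_tenant_audit` is an illustrative name.

```python
import json
from datetime import datetime, timezone

# Hypothetical decision-log lines, one JSON object per line.
SAMPLE_LOG = [
    '{"timestamp": "2025-10-03T09:15:00+00:00", "tenant_id": "tenant_acme",'
    ' "tool": "send_email", "action": "allow"}',
    '{"timestamp": "2025-10-04T11:00:00+00:00", "tenant_id": "tenant_xyz",'
    ' "tool": "access_file", "action": "deny"}',
    '{"timestamp": "2026-01-02T08:00:00+00:00", "tenant_id": "tenant_acme",'
    ' "tool": "query_database", "action": "allow"}',
]


def export_tenant_audit(log_lines: list[str], tenant_id: str,
                        start: datetime, end: datetime) -> list[dict]:
    """Return one tenant's entries with start <= timestamp < end, oldest first."""
    selected = []
    for line in log_lines:
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry["timestamp"])
        if entry["tenant_id"] == tenant_id and start <= ts < end:
            selected.append((ts, entry))
    selected.sort(key=lambda pair: pair[0])
    return [entry for _, entry in selected]


if __name__ == "__main__":
    q4_start = datetime(2025, 10, 1, tzinfo=timezone.utc)
    q4_end = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for entry in export_tenant_audit(SAMPLE_LOG, "tenant_acme", q4_start, q4_end):
        print(entry["timestamp"], entry["tool"], entry["action"])
```

The half-open time range avoids counting an entry in two adjacent quarters.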

Defense in Depth

No single isolation mechanism is sufficient. A production multi-tenant agent system layers defenses:

  1. Network layer — Tenant-scoped credentials for external APIs. An agent acting on behalf of Tenant A cannot use Tenant B's API keys.
  2. Storage layer — Namespaced or prefixed access to files, vectors, and databases. The storage system enforces boundaries independently.
  3. Policy layer (Veto) — Runtime authorization on every tool call. Even if the storage layer has a misconfiguration, the policy engine blocks cross-tenant access.
  4. Monitoring layer — Anomaly detection on cross-tenant access patterns. If Tenant A's agent suddenly starts querying paths outside its prefix, alert immediately.

If a storage namespace is misconfigured, the policy layer still denies the cross-tenant call. If a policy has a gap, tenant-scoped credentials and storage boundaries still limit what the agent can reach, and monitoring flags the attempt. Each layer reduces the blast radius.
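
As one example of the storage layer enforcing boundaries independently, a file tool can normalize each requested path and confirm it stays inside the tenant's prefix before touching the filesystem. This is a minimal sketch with hypothetical names (`resolve_tenant_path`, `DATA_ROOT`). It assumes `tenant_id` comes from trusted server-side context, and it does not resolve symlinks, which a real filesystem check would also need to handle.

```python
import posixpath

DATA_ROOT = "/data"


def resolve_tenant_path(tenant_id: str, requested: str) -> str:
    """Normalize `requested` and confirm it stays inside the tenant's prefix."""
    prefix = posixpath.join(DATA_ROOT, tenant_id) + "/"
    # Relative paths are resolved against the tenant's directory;
    # absolute paths replace it and are checked as given.
    normalized = posixpath.normpath(posixpath.join(prefix, requested))
    # Compare with a trailing slash so /data/tenant_acme_evil does not
    # pass as a match for /data/tenant_acme.
    if not (normalized + "/").startswith(prefix):
        raise PermissionError(f"path escapes tenant prefix: {requested!r}")
    return normalized


if __name__ == "__main__":
    print(resolve_tenant_path("tenant_acme", "reports/q3.pdf"))
    try:
        resolve_tenant_path("tenant_acme",
                            "/data/tenant_acme/../tenant_xyz/secret.txt")
    except PermissionError as exc:
        print("blocked:", exc)
```

A prefix pattern applied to the raw string would accept the second path in the demo, because it begins with /data/tenant_acme/ before the `..` segment. Running the check after normalization closes that gap.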

Getting Started

Adding multi-tenant support to an existing agent takes two changes in Veto: include tenant_id in your protect() context, and define tenant-specific policies. Your agent code stays identical across tenants.

Start free to add tenant isolation to your agent, or read about audit trails for per-tenant compliance logging.
