Tenant Isolation Patterns for Multi-Tenant LLM Infrastructure

Kevin McGrath, November 14, 2025

Multi-tenant LLM infrastructure is the default architecture for enterprises deploying AI across multiple internal departments or serving multiple external clients from a shared platform. It's also one of the more underspecified parts of enterprise AI governance discussions, which tend to focus on model behavior and data privacy while treating the isolation layer between tenants as a deployment detail.

Tenant bleed — where one tenant's data, context, or inference behavior leaks into another tenant's session — is not theoretical. It's an architectural inevitability without explicit, layered isolation controls. The question isn't whether shared LLM infrastructure can bleed; it's whether your isolation model is sufficient to make bleed a detectable, auditable, bounded event rather than an invisible, persistent condition.

This post compares three isolation models at three different layers of the stack: the prompt/context layer, the vector store layer, and the log layer. Each has different cost and latency implications, and the right combination depends on the trust boundaries between your tenants.

Why Tenant Isolation in LLM Infrastructure Is Different from Traditional Multi-Tenancy

In traditional multi-tenant SaaS, isolation is primarily a data access problem: tenant A's database rows should not be accessible to tenant B's queries. The isolation boundary is the access control layer in front of the database. The data itself is relatively inert — it doesn't interpret or respond to what tenant B does.

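LLM multi-tenancy has an additional problem: although the model weights are stateless between requests, the serving stack around the model (shared system prompts, caches, and retrieval indexes) holds state that can carry information across tenant boundaries. Specifically: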

Prompt injection cross-tenant leakage: If tenant B can cause the model to receive content from a system prompt or context intended for tenant A — whether through a direct injection attack or through a misconfigured shared system prompt — tenant B has access to tenant A's operational context, not just their data.
KV cache sharing: Some LLM serving stacks cache key-value computations for prompt prefixes and reuse them across requests for performance. If that cache is not partitioned by tenant, a request from tenant B that begins with the same tokens as tenant A's prompt can hit entries computed for tenant A, and the resulting difference in response latency can reveal that the prefix has been seen before. This is a subtle channel that most teams aren't thinking about.
RAG corpus cross-contamination: If tenants share a vector store without namespace isolation, a query from tenant B may retrieve documents indexed by tenant A, even if tenant B has no business accessing tenant A's knowledge base.

NIST SP 800-53 control SC-4 (Information in Shared System Resources) requires that systems prevent unauthorized and unintended information transfer via shared system resources. LLM inference infrastructure, particularly shared context windows and vector stores, is a shared resource in the SC-4 sense, and it requires explicit isolation controls to satisfy that control.

Model 1: Namespace Isolation (Soft Isolation)

The lightest-weight isolation model uses per-tenant namespaces within a shared infrastructure stack:

At the vector store layer: Pinecone namespace-per-tenant, Weaviate multi-tenancy with tenant-specific shard allocation, or pgvector with schema-per-tenant on a shared Postgres instance.
At the prompt layer: per-tenant system prompt configurations that don't share content between tenants.
At the log layer: every log entry tagged with tenant_id, and query interfaces that enforce tenant scoping so that cross-tenant queries are possible only for explicitly authorized admin roles.

Namespace isolation is appropriate when:

Tenants are internal departments with low adversarial threat levels against each other
The data processed per tenant doesn't carry distinct regulatory classification (no tenant is HIPAA-covered while another is not)
The cost of stronger isolation at scale is prohibitive

The blast radius of a failure in the namespace isolation model is bounded by the namespace enforcement mechanism. A misconfigured query filter — a WHERE clause missing the tenant_id predicate, for example — can expose cross-tenant documents. Mitigation requires consistent filter enforcement at every query path, which is why the policy plane is the right enforcement point: if all LLM calls pass through a central proxy that appends the tenant namespace predicate before dispatching to the vector store, no application code path can accidentally omit it.
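The central-proxy enforcement described above can be sketched as follows. This is a minimal illustration, not any particular vector store's API: `VectorQuery`, `scope_query`, and `TenantScopeError` are hypothetical names, and the sketch assumes the tenant identity comes from the authenticated session rather than from the request body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorQuery:
    """A simplified vector-store query: an embedding plus metadata filters."""
    embedding: tuple
    filters: dict
    top_k: int = 5


class TenantScopeError(Exception):
    """Raised when a caller asks for a tenant other than its own."""


def scope_query(query: VectorQuery, authenticated_tenant: str) -> VectorQuery:
    """Return a copy of the query with the tenant predicate forced on.

    Application code cannot omit the predicate (it is always added here)
    and cannot override it (a conflicting value is rejected).
    """
    requested = query.filters.get("tenant_id")
    if requested is not None and requested != authenticated_tenant:
        raise TenantScopeError(f"cross-tenant filter rejected: {requested!r}")
    scoped = dict(query.filters)
    scoped["tenant_id"] = authenticated_tenant
    return VectorQuery(query.embedding, scoped, query.top_k)


# Usage: the application forgot the tenant filter; the proxy adds it anyway.
unscoped = VectorQuery(embedding=(0.1, 0.2), filters={"doc_type": "policy"})
scoped = scope_query(unscoped, authenticated_tenant="tenant-b")
```

Because the predicate is added in one place, reviewing the isolation guarantee means reviewing `scope_query` rather than every application query path.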

Model 2: Schema-Per-Tenant with KMS Key Isolation (Mid-Tier Isolation)

The mid-tier model extends namespace isolation with dedicated encryption keys per tenant, stored in a KMS (AWS KMS, Azure Key Vault, or equivalent) with per-tenant key policies. This means that even if a query filter is misconfigured and tenant B's query retrieves a document from tenant A's schema, the document cannot be decrypted without tenant A's KMS key, which tenant B's service credentials don't have access to.
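A toy sketch of that failure mode, under loud assumptions: `ToyKms` is an in-memory stand-in for a real KMS, and `toy_xor` is a placeholder cipher used only so the example runs with the standard library. It is not real cryptography. A production system would typically use envelope encryption with an authenticated cipher and the provider's actual key policies.

```python
import hashlib
import secrets


class ToyKms:
    """In-memory stand-in for a KMS: one key and one key policy per tenant."""

    def __init__(self):
        self._keys = {}      # tenant_id -> key bytes
        self._policies = {}  # tenant_id -> principals allowed to use the key

    def create_key(self, tenant_id, allowed_principals):
        self._keys[tenant_id] = secrets.token_bytes(32)
        self._policies[tenant_id] = set(allowed_principals)

    def get_key(self, tenant_id, principal):
        # The key policy check is the second, independent isolation boundary.
        if principal not in self._policies.get(tenant_id, set()):
            raise PermissionError(f"{principal} may not use the key for {tenant_id}")
        return self._keys[tenant_id]


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (SHA-256 keystream XOR). Illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_document(kms, tenant_id, principal, plaintext: bytes) -> bytes:
    return toy_xor(kms.get_key(tenant_id, principal), plaintext)


def decrypt_document(kms, tenant_id, principal, ciphertext: bytes) -> bytes:
    return toy_xor(kms.get_key(tenant_id, principal), ciphertext)


kms = ToyKms()
kms.create_key("tenant-a", allowed_principals={"svc-tenant-a"})
stored = encrypt_document(kms, "tenant-a", "svc-tenant-a", b"claims memo")

# A misconfigured filter hands tenant B's service the stored row,
# but the key policy still refuses to release tenant A's key.
try:
    decrypt_document(kms, "tenant-a", "svc-tenant-b", stored)
    leaked = True
except PermissionError:
    leaked = False
```

The point of the sketch is that the filter failure and the key policy are checked independently, so a single misconfiguration does not expose plaintext.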

The KMS-per-tenant model adds meaningful cryptographic assurance on top of access control isolation. Consider a top-20 reinsurer managing LLM deployments across multiple lines of business (property, casualty, life, specialty), where each line has distinct data classification requirements and potentially distinct regulatory obligations. There, KMS key isolation ensures that a misconfiguration at the access control layer doesn't immediately become a data exposure event. The cryptographic layer is the defense-in-depth catch.

At the vector store layer, schema-per-tenant (pgvector) or dedicated collection-per-tenant (Weaviate) with per-tenant encryption at rest ensures that the storage layer is isolated even from storage-layer administrators. The blast radius of a credential compromise is bounded to the KMS keys accessible to the compromised credential.

Latency implication: KMS key operations for encryption and decryption add latency per operation — typically in the 1–5ms range per call for modern KMS implementations — which is acceptable for most enterprise use cases but meaningful for high-throughput real-time inference pipelines.
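As rough arithmetic on that range: the overhead scales with the number of KMS calls on the request path. The sketch below assumes, purely for illustration, eight retrieved chunks that each need one sequential KMS call; the function name and the chunk count are not from any measured deployment.

```python
def kms_overhead_ms(kms_calls_per_request: int, per_call_ms: float) -> float:
    """Added latency when the KMS calls in one request run one after another."""
    return kms_calls_per_request * per_call_ms


# Assumed workload: 8 retrieved chunks, one KMS call each, no parallelism.
low = kms_overhead_ms(8, 1.0)   # at 1 ms per call
high = kms_overhead_ms(8, 5.0)  # at 5 ms per call
print(f"{low:.0f} to {high:.0f} ms added per request")
```

Under those assumptions the per-call cost of a few milliseconds becomes tens of milliseconds per request, which is why the overhead matters more for real-time pipelines than for batch workloads.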

Model 3: Full Infrastructure Isolation (Hard Isolation)

The strongest isolation model dedicates separate inference endpoints, vector stores, and log storage to each tenant. This is the architecture appropriate when tenants have adversarial trust relationships, when regulatory requirements prohibit shared infrastructure between tenant classes (for example, a public sector agency whose data has FedRAMP authorization requirements cannot share infrastructure with a commercial tenant), or when a data breach in one tenant must have zero blast radius on others.
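One way to make "nothing is shared" checkable is to keep an explicit per-tenant resource map and validate it. The sketch below is hypothetical: `TenantStack`, the field names, and the `example.internal` addresses are illustrative placeholders, not a real deployment layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantStack:
    """The dedicated resources assigned to one tenant."""
    inference_endpoint: str
    vector_store_url: str
    log_bucket: str


def assert_no_shared_resources(stacks: dict) -> None:
    """Raise ValueError if two different tenants point at the same resource."""
    owner = {}
    for tenant, stack in stacks.items():
        for resource in (stack.inference_endpoint,
                         stack.vector_store_url,
                         stack.log_bucket):
            if resource in owner and owner[resource] != tenant:
                raise ValueError(
                    f"{resource} is shared by {owner[resource]} and {tenant}")
            owner[resource] = tenant


def stack_for(stacks: dict, tenant_id: str) -> TenantStack:
    # No shared default: an unknown tenant raises KeyError instead of
    # silently falling back to someone else's infrastructure.
    return stacks[tenant_id]


stacks = {
    "agency-a": TenantStack("https://llm-a.example.internal",
                            "https://vec-a.example.internal",
                            "logs-agency-a"),
    "client-b": TenantStack("https://llm-b.example.internal",
                            "https://vec-b.example.internal",
                            "logs-client-b"),
}
assert_no_shared_resources(stacks)
```

Running a check like this in deployment validation turns the isolation claim into something that fails loudly when a new tenant is accidentally pointed at an existing endpoint.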

In deployments we've reviewed, full infrastructure isolation is most commonly required for:

Multi-tenant platforms where external clients — not internal departments — are the tenants, and where each client has its own data processing agreement and regulatory posture
Public sector deployments where a single state agency cannot share LLM infrastructure with other agencies because of distinct data sensitivity classifications
Healthcare deployments where different hospital systems are tenants and where any cross-hospital PHI leakage would be a HIPAA BAA violation, however low its probability

The cost of full infrastructure isolation is significant: dedicated model endpoints mean higher inference costs (no request batching across tenants), dedicated vector stores mean higher storage and index maintenance costs, and dedicated log storage means higher operational overhead. For a platform serving 50 external enterprise clients with full isolation, the infrastructure cost profile is meaningfully higher than a shared namespace model. That cost is the price of the isolation guarantee — and for the tenant types that require it, there is no negotiating the guarantee down on cost grounds.

Log Isolation: The Layer Most Teams Forget

Tenant isolation at the compute and storage layers is commonly understood. Log isolation is less often implemented. An enterprise LLM deployment where all tenants' audit logs land in a single log aggregation system with a shared query interface has effectively failed tenant isolation at the observability layer, even if the inference and storage layers are fully isolated.

The log isolation requirement has two components:

Storage isolation: Tenant logs should be stored in tenant-scoped partitions, with access controls that prevent tenant B's authorized users from querying tenant A's logs. For compliance purposes, this is a data segregation requirement: a Tier-1 bank deploying an LLM for wealth advisory applications should not have its audit log data co-mingled with the log data of another financial institution on the same platform.

Query interface isolation: The interface through which compliance personnel query audit logs must enforce tenant scoping. A log query API that requires authentication and returns only results tagged with the authenticated user's tenant is technically straightforward but requires deliberate design — it doesn't happen automatically.
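A tenant-scoped query interface of the kind described above might look like the following sketch. `Principal` and `query_audit_log` are hypothetical names, the log is an in-memory list of dictionaries standing in for a real log store, and the admin exception is modeled as a single boolean flag for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    """An authenticated caller of the log query API."""
    user: str
    tenant_id: str
    is_platform_admin: bool = False


def query_audit_log(entries, principal: Principal,
                    requested_tenant: Optional[str] = None):
    """Return only the entries the principal is allowed to see.

    The tenant scope defaults to the caller's own tenant. Asking for a
    different tenant is rejected unless the caller holds the admin role.
    """
    target = requested_tenant or principal.tenant_id
    if target != principal.tenant_id and not principal.is_platform_admin:
        raise PermissionError(f"{principal.user} may not read logs for {target}")
    return [entry for entry in entries if entry["tenant_id"] == target]


log = [
    {"tenant_id": "bank-a", "event": "inference", "request_id": "r1"},
    {"tenant_id": "bank-b", "event": "inference", "request_id": "r2"},
]
auditor = Principal(user="auditor@bank-a", tenant_id="bank-a")
visible = query_audit_log(log, auditor)
```

The scope is derived from the authenticated principal, so a compliance user never supplies the tenant filter themselves.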

We're not saying that shared log infrastructure with tenant-scoped filters is always wrong. We're saying that when tenants have distinct regulatory audit obligations (a financial services tenant under OCC Heightened Standards, a healthcare tenant under HIPAA, a public sector tenant under state data governance requirements), shared log storage with filter-based isolation creates a dependency on filter correctness that can be challenged in an audit. Dedicated log storage per tenant eliminates that dependency.

Prompt Injection as a Cross-Tenant Attack Vector

Prompt injection in a multi-tenant LLM deployment is qualitatively different from prompt injection in a single-tenant deployment. In a single-tenant context, a successful injection attack compromises the attacker's own session or escalates their privileges within the same tenant. In a multi-tenant context, a prompt injection attack that reaches the system prompt or crosses a namespace boundary can expose another tenant's configuration or data.

The concrete scenario: tenant B's user submits a prompt that, through indirect injection in a retrieved document, causes the model to output content from tenant A's system prompt configuration. This requires that the system prompt injection vulnerability and the tenant namespace misconfiguration both be present simultaneously — defense in depth means that closing either gap prevents the cross-tenant exposure even if the other gap exists.

This is why Meibel's isolation model enforces tenant boundaries at the proxy layer, before the prompt reaches the model. The proxy layer is the only place in the stack where all three isolation requirements — prompt isolation, retrieval isolation, and log isolation — can be enforced consistently and atomically. Application-level isolation that depends on each application correctly implementing tenant filters is not a durable architecture; the proxy layer that all applications must pass through is.
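To illustrate why a single choke point helps, the sketch below applies all three requirements in one function. It is a generic illustration and not a description of Meibel's implementation: `build_model_request` is a hypothetical name, and the retrieval step is a plain metadata filter standing in for a real similarity search.

```python
def build_model_request(tenant_id, user_prompt, system_prompts, corpus, audit_log):
    """Assemble one model request with all three isolation checks applied.

    tenant_id is assumed to come from the authenticated session.
    """
    # Prompt isolation: only this tenant's system prompt is ever loaded.
    # A tenant without a configuration raises KeyError; there is no shared default.
    system_prompt = system_prompts[tenant_id]

    # Retrieval isolation: only documents tagged with this tenant are eligible.
    context = [doc["text"] for doc in corpus if doc["tenant_id"] == tenant_id]

    # Log isolation: every entry carries the tenant tag at write time.
    audit_log.append({"tenant_id": tenant_id,
                      "event": "inference",
                      "context_docs": len(context)})

    return {"system": system_prompt, "context": context, "user": user_prompt}


system_prompts = {"tenant-a": "You assist tenant A underwriters.",
                  "tenant-b": "You assist tenant B claims staff."}
corpus = [{"tenant_id": "tenant-a", "text": "A: treaty pricing notes"},
          {"tenant_id": "tenant-b", "text": "B: claims handling guide"}]
audit_log = []
request = build_model_request("tenant-b", "Summarize the guide.",
                              system_prompts, corpus, audit_log)
```

Because tenant A's system prompt and documents are never placed in tenant B's request, an injected instruction in a retrieved document has nothing of tenant A's to reveal.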

Choosing the Right Model for Your Context

The three isolation models represent a spectrum from cost-efficient to guarantee-strong. Most enterprise deployments don't need full infrastructure isolation for all their tenants — but almost no enterprise deployment can justify namespace isolation alone when the tenant population includes external clients or regulated data classifications that differ between tenants.

The practical decision process:

Map each tenant's data classification: PHI, PII, CUI (Controlled Unclassified Information), financial account data, attorney-client privileged material. If any two tenants have different classification requirements, they need at minimum schema/KMS isolation — not just namespace isolation.
Assess adversarial threat model between tenants: are tenants internal departments with low adversarial motivation, or external clients who have commercial interests in accessing each other's data? External clients require stronger isolation.
Review regulatory requirements for your tenant population: if any tenant is subject to a data isolation requirement (FedRAMP, HIPAA BAA, OCC Heightened Standards data segregation expectations), that requirement sets the floor for all tenants on the platform — you cannot offer a weaker isolation model to one tenant without it affecting the assurance level you can claim for others.
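The three steps above can be sketched as a small function. The names (`Tier`, `Tenant`, `required_tier`) are hypothetical, and two mappings are assumptions of this sketch rather than rules from the text: external tenants are treated as needing at least schema/KMS isolation, and each regulatory requirement is expressed as the minimum tier it demands.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    NAMESPACE = 1
    SCHEMA_KMS = 2
    FULL = 3


@dataclass(frozen=True)
class Tenant:
    name: str
    classifications: frozenset           # e.g. frozenset({"PHI"})
    external: bool = False
    mandated_tier: Optional[Tier] = None  # tier a regulation demands, if any


def required_tier(tenants) -> Tier:
    """Return the platform-wide isolation floor for a tenant population."""
    tier = Tier.NAMESPACE
    # Step 1: differing data classifications rule out namespace-only isolation.
    if len({t.classifications for t in tenants}) > 1:
        tier = max(tier, Tier.SCHEMA_KMS)
    # Step 2 (assumed mapping): external tenants need more than namespaces.
    if any(t.external for t in tenants):
        tier = max(tier, Tier.SCHEMA_KMS)
    # Step 3: the strictest regulatory mandate sets the floor for everyone.
    for t in tenants:
        if t.mandated_tier is not None:
            tier = max(tier, t.mandated_tier)
    return tier


claims = Tenant("claims", frozenset({"PII"}))
underwriting = Tenant("underwriting", frozenset({"PII"}))
clinic = Tenant("clinic", frozenset({"PHI"}))
agency = Tenant("agency", frozenset({"CUI"}), external=True,
                mandated_tier=Tier.FULL)

internal_only = required_tier([claims, underwriting])
mixed_classes = required_tier([claims, clinic])
with_mandate = required_tier([claims, agency])
```

Encoding the decision this way keeps the reasoning reviewable: when a new tenant is onboarded, the resulting tier can be recomputed and compared against what the platform currently provides.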

The isolation model you choose determines not just your technical architecture but the content of your security documentation, your DPAs, and the answers you give in vendor security questionnaires. Getting it right at the architecture stage is significantly less expensive than retrofitting it after you've already told enterprise clients that their data is isolated.

