Policy-based rate limits and cost controls

Per-tenant, per-model, and global call budgets enforced at the proxy layer. Soft warnings before hard limits. Cost attribution by department that finance teams can actually read.

Policy types

Combine multiple limit types to match your cost governance structure.

Policy type       Scope                                  Window                    On exceed
per_tenant        One tenant's calls across all models   hourly / daily / monthly  soft-warn / hard-block
per_model         All tenants calling a specific model   hourly / daily            soft-warn / hard-block
per_tenant_model  One tenant × one model combination     hourly / daily            hard-block
global            Entire project across all tenants      monthly                   hard-block
cost_budget       Estimated token cost per tenant        monthly                   soft-warn at 80% / hard-block at 100%
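
To make the soft-warn and hard-block outcomes concrete, here is a minimal sketch of how a proxy might apply one windowed limit. It assumes a simple fixed-window counter; the Policy and FixedWindowLimiter names, the "tenant:model" key format, and the in-memory storage are illustrative assumptions, not the product's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Monthly windows are left out of this sketch because calendar months vary in length.
WINDOW_SECONDS = {"hourly": 3600, "daily": 86400}


@dataclass
class Policy:
    limit: int       # maximum calls per window
    window: str      # "hourly" or "daily"
    on_exceed: str   # "soft-warn" or "hard-block"


@dataclass
class FixedWindowLimiter:
    """Hypothetical in-memory limiter; old windows are never evicted in this sketch."""
    policy: Policy
    # (key, window index) -> calls counted in that window
    counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def check(self, key: str, now: float) -> str:
        """Record one call for `key` and return 'allow', 'warn', or 'block'."""
        bucket = (key, int(now // WINDOW_SECONDS[self.policy.window]))
        used = self.counts.get(bucket, 0)
        if used >= self.policy.limit:
            if self.policy.on_exceed == "hard-block":
                return "block"            # rejected calls are not counted
            self.counts[bucket] = used + 1
            return "warn"                 # soft-warn: the call still proceeds
        self.counts[bucket] = used + 1
        return "allow"


# A per_tenant_model style limit of 2 calls per hour with hard-block.
limiter = FixedWindowLimiter(Policy(limit=2, window="hourly", on_exceed="hard-block"))
results = [limiter.check("acme:model-a", now=0.0) for _ in range(3)]
# results == ["allow", "allow", "block"]
```

The key decides the scope: a tenant ID alone would approximate per_tenant, a model name alone per_model, and a constant key the global limit. A production proxy would more likely keep counters in a shared store so that every proxy instance sees the same totals.
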

Rate limit configuration

Declare all limits in the same policy YAML as your redaction and isolation rules, so every control has a single source of truth.

policy.yaml
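rate_limiting:
  # Each tenant: 500 calls per hour across all models. Exceeding it warns but does not block.
  per_tenant:
    window: hour
    limit: 500
    action: soft-warn
  # Whole project: 500,000 calls per month. Further calls are rejected.
  global_monthly:
    limit: 500000
    action: hard-block
  # Estimated token spend per tenant, in USD per month.
  cost_budget:
    per_tenant_monthly_usd: 150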
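
As a rough illustration of how a cost budget like the one above could be evaluated, the sketch below maps a tenant's month-to-date spend to an action. The 80% and 100% thresholds come from the policy table; the CostBudget class, the estimate_cost_usd helper, the per-1k-token prices, and the choice to treat spend exactly at a threshold as having reached it are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBudget:
    monthly_usd: float
    warn_fraction: float = 0.80    # soft-warn threshold from the policy table
    block_fraction: float = 1.00   # hard-block threshold from the policy table

    def evaluate(self, spent_usd: float) -> str:
        """Return 'allow', 'warn', or 'block' for month-to-date spend."""
        if spent_usd >= self.monthly_usd * self.block_fraction:
            return "block"
        if spent_usd >= self.monthly_usd * self.warn_fraction:
            return "warn"
        return "allow"


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
    """Estimate one call's cost from token counts and hypothetical per-1k-token prices."""
    return (prompt_tokens / 1000 * usd_per_1k_prompt
            + completion_tokens / 1000 * usd_per_1k_completion)


# Mirrors per_tenant_monthly_usd: 150 from policy.yaml.
budget = CostBudget(monthly_usd=150)
states = [budget.evaluate(spend) for spend in (100.0, 130.0, 150.0)]
# states == ["allow", "warn", "block"]
```

With a 150 USD budget, the warning starts at 120 USD of estimated spend, which leaves 30 USD of headroom before calls are blocked.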

Stop surprises on your LLM bill.

Soft warnings before hard cutoffs. Cost attribution finance can read. Setup in under a day.