Policy-based rate limits and cost controls

Per-tenant, per-model, and global call budgets enforced at the proxy layer. Soft warnings before hard limits. Cost attribution by department that finance teams can actually read.

Policy types

Combine multiple limit types to match your cost governance structure.

Policy type	Scope	Window	On exceed
per_tenant	One tenant's calls across all models	hourly / daily / monthly	soft-warn / hard-block
per_model	All tenants calling a specific model	hourly / daily	soft-warn / hard-block
per_tenant_model	One tenant × one model combination	hourly / daily	hard-block
global	Entire project across all tenants	monthly	hard-block
cost_budget	Estimated token cost per tenant	monthly	soft-warn at 80% / hard-block at 100%
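
As a rough illustration of how these policy types might interact, the sketch below maps cost_budget usage onto the 80% / 100% thresholds from the table and resolves several policy outcomes to the strictest one. The names Action, cost_budget_action, and combine are hypothetical, and both "strictest action wins" and treating exactly 80% as a warning are assumptions, not documented proxy behavior.

```python
from enum import IntEnum


class Action(IntEnum):
    """Outcomes ordered by severity so the strictest can be picked with max()."""
    ALLOW = 0
    SOFT_WARN = 1
    HARD_BLOCK = 2


def cost_budget_action(spent_usd: float, budget_usd: float) -> Action:
    """Apply the table's cost_budget thresholds: warn at 80%, block at 100%."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    used = spent_usd / budget_usd
    if used >= 1.0:
        return Action.HARD_BLOCK
    if used >= 0.8:  # assumption: the 80% boundary itself triggers the warning
        return Action.SOFT_WARN
    return Action.ALLOW


def combine(*actions: Action) -> Action:
    """Assumption: when several policies apply to a call, the strictest wins."""
    return max(actions, default=Action.ALLOW)


if __name__ == "__main__":
    budget = cost_budget_action(spent_usd=130, budget_usd=150)
    print(combine(Action.ALLOW, budget).name)
```

Ordering the outcomes as integers keeps the combination rule to a single max() call, which makes it easy to swap in a different precedence if the real proxy resolves conflicts another way.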

Rate limit configuration

Declare all limits in the same policy YAML as your redaction and isolation rules. Single source of truth.

policy.yaml

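rate_limiting:
  # Each tenant may make up to 500 calls per hour; exceeding it warns, not blocks.
  per_tenant:
    window: hour
    limit: 500
    action: soft-warn
  # Project-wide monthly cap across all tenants; calls beyond it are rejected.
  global_monthly:
    limit: 500000
    action: hard-block
  # Estimated token spend per tenant per month, in USD.
  cost_budget:
    per_tenant_monthly_usd: 150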
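
To make the per_tenant block above concrete, here is a minimal sketch of a fixed-window counter that applies a 500-calls-per-hour limit with a soft-warn action. The FixedWindowCounter class and its record method are hypothetical, and fixed windows are an assumption: the proxy may use a sliding window or another algorithm. The values are copied into a plain dict rather than parsed from policy.yaml.

```python
from collections import defaultdict

# Values copied by hand from the per_tenant block of policy.yaml;
# "window: hour" is expressed here as 3600 seconds.
PER_TENANT = {"window_seconds": 3600, "limit": 500, "action": "soft-warn"}


class FixedWindowCounter:
    """Counts calls per (tenant, window) and reports the configured action on exceed."""

    def __init__(self, window_seconds: int, limit: int, action: str) -> None:
        self.window_seconds = window_seconds
        self.limit = limit
        self.action = action
        # Old windows are never evicted in this sketch; a real service would expire them.
        self._counts = defaultdict(int)

    def record(self, tenant: str, now: float) -> str:
        """Record one call at timestamp `now` (seconds) and return 'allow' or the action."""
        window = int(now // self.window_seconds)
        key = (tenant, window)
        self._counts[key] += 1
        if self._counts[key] > self.limit:
            return self.action
        return "allow"


if __name__ == "__main__":
    counter = FixedWindowCounter(**PER_TENANT)
    results = [counter.record("acme", now=10.0) for _ in range(501)]
    print(results[499], results[500])
```

Because the configured action is soft-warn, the caller would still forward the 501st request and attach a warning; with hard-block the same return value would signal a rejection.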

Stop surprises on your LLM bill.

Soft warnings before hard cutoffs. Cost attribution finance can read. Set up in under a day.