Per-tenant, per-model, and global call budgets enforced at the proxy layer. Soft warnings before hard limits. Cost attribution by department that finance teams can actually read.
Combine multiple limit types to match your cost governance structure.
| Policy type | Scope | Window | On exceed |
|---|---|---|---|
| per_tenant | One tenant's calls across all models | hourly / daily / monthly | soft-warn / hard-block |
| per_model | All tenants calling a specific model | hourly / daily | soft-warn / hard-block |
| per_tenant_model | One tenant × one model combination | hourly / daily | hard-block |
| global | Entire project across all tenants | monthly | hard-block |
| cost_budget | Estimated token cost per tenant | monthly | soft-warn at 80% / hard at 100% |
Declare all limits in the same policy YAML as your redaction and isolation rules. Single source of truth.
rate_limiting: per_tenant: window: hour limit: 500 action: soft-warn global_monthly: limit: 500000 action: hard-block cost_budget: per_tenant_monthly_usd: 150
Soft warnings before hard cutoffs. Cost attribution finance can read. Setup in under a day.