Smart Model Routing Policy Template
Automatically route AI agent requests to the most cost-effective model based on task complexity, token count, or agent role. Simple queries go to fast, cheap models, while complex tasks use premium models. Cut your LLM API costs by 60-80% without changing a single line of agent code.
What this prevents
A team was running all their AI agent queries through GPT-4o at $5/million input tokens. After analyzing their logs, they found that 70% of requests were simple classifications or short Q&A that GPT-4o-mini ($0.15/million tokens) could handle equally well. By adding smart routing, they redirected the simple queries to the cheaper model — cutting their monthly LLM bill from $2,400 to $680 while maintaining the same output quality for complex tasks.
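A back-of-the-envelope calculation shows how the savings compound. The sketch below is illustrative, not the team's actual token accounting: it assumes spend is roughly proportional to input tokens and ignores output-token pricing, so the exact figures will differ from a real bill.

```python
# Back-of-the-envelope blended-cost estimate for smart routing.
# All figures are illustrative assumptions: real bills also depend on
# output tokens, request sizes, and each provider's full price sheet.

PRICE_PER_M = {          # input price per million tokens (USD)
    "gpt-4o": 5.00,
    "gpt-4o-mini": 0.15,
}

def blended_cost(total_tokens_m: float, cheap_fraction: float) -> float:
    """Monthly cost when `cheap_fraction` of tokens go to the cheap model."""
    cheap = total_tokens_m * cheap_fraction * PRICE_PER_M["gpt-4o-mini"]
    premium = total_tokens_m * (1 - cheap_fraction) * PRICE_PER_M["gpt-4o"]
    return cheap + premium

baseline = blended_cost(480, 0.0)  # everything on gpt-4o: 480M tokens/month
routed = blended_cost(480, 0.7)    # 70% of tokens moved to gpt-4o-mini
print(f"${baseline:,.0f} -> ${routed:,.0f}")
```

Because gpt-4o-mini costs roughly 3% of gpt-4o per input token, nearly all of the redirected traffic's cost disappears.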
Policy template
Copy this into your `govyn.yaml` and adjust the values to match your requirements.
````yaml
routing:
  rules:
    - name: short_queries_to_mini
      condition:
        input_tokens_below: 500
      route_to: gpt-4o-mini

    - name: code_to_sonnet
      condition:
        content_contains: ["```", "function", "class", "def "]
      route_to: claude-sonnet-4-20250514

    - name: default_to_haiku
      condition:
        always: true
      route_to: claude-haiku-4-5-20251001

agents:
  analyst_agent:
    routing: auto
    models:
      allow: [gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514, claude-haiku-4-5-20251001]
    budget:
      daily: $10.00

fallback:
  model: gpt-4o-mini
  on: [rate_limit, timeout]
````
How it works
Agent sends a request through Govyn
The agent requests a model as usual (e.g. gpt-4o). Govyn intercepts the request before forwarding it to the provider.
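Concretely, the agent's request is a standard chat-completions payload; the only change is where it is sent. The proxy URL below is an assumption for illustration, not a documented Govyn default:

```python
# Sketch of the request an agent sends: a standard chat-completions
# payload, but aimed at the Govyn proxy instead of the provider.
# The proxy address below is an assumption, not a documented default.
import json

GOVYN_PROXY = "http://localhost:4000/v1/chat/completions"  # assumed address

payload = {
    "model": "gpt-4o",  # what the agent asks for; Govyn may rewrite this
    "messages": [{"role": "user", "content": "Classify: 'refund please'"}],
}

# With any HTTP client this is a normal POST; only the URL changed:
# requests.post(GOVYN_PROXY, json=payload, headers={"Authorization": ...})
print(json.dumps(payload, indent=2))
```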
Routing rules are evaluated in order
Govyn checks each routing rule against the request. Rules can match on input token count, content patterns, agent identity, or custom conditions. The first matching rule determines the target model.
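First-match evaluation can be sketched in a few lines. The field names below mirror the policy template above; the matching logic itself is an illustration, not Govyn's actual implementation:

````python
# Minimal sketch of first-match rule evaluation, mirroring the YAML
# template. Field names follow the template; the logic is illustrative,
# not Govyn's actual implementation.

RULES = [
    {"name": "short_queries_to_mini",
     "condition": {"input_tokens_below": 500},
     "route_to": "gpt-4o-mini"},
    {"name": "code_to_sonnet",
     "condition": {"content_contains": ["```", "function", "class", "def "]},
     "route_to": "claude-sonnet-4-20250514"},
    {"name": "default_to_haiku",
     "condition": {"always": True},
     "route_to": "claude-haiku-4-5-20251001"},
]

def matches(cond: dict, text: str, tokens: int) -> bool:
    if "always" in cond:
        return cond["always"]
    if "input_tokens_below" in cond:
        return tokens < cond["input_tokens_below"]
    if "content_contains" in cond:
        return any(s in text for s in cond["content_contains"])
    return False

def route(text: str, tokens: int) -> str:
    # First matching rule wins; the always-true final rule guarantees a hit.
    for rule in RULES:
        if matches(rule["condition"], text, tokens):
            return rule["route_to"]
    return "gpt-4o"  # unreachable with an always-true final rule

print(route("What is our refund window?", tokens=42))   # short -> mini
print(route("def parse(x): ...", tokens=1200))          # code -> sonnet
````

Rule order matters: putting `default_to_haiku` first would shadow every other rule, so the catch-all belongs last.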
Request is rerouted to the optimal model
If a rule matches, Govyn silently rewrites the model field to the cheaper alternative. The agent doesn't know or care — it gets back a response in the same format regardless of which model served it.
Fallback on errors
If the routed model hits a rate limit or timeout, Govyn automatically falls back to the configured fallback model, ensuring the agent always gets a response.
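The fallback behavior amounts to one guarded retry. The sketch below is a simplified illustration of that pattern; real providers raise typed errors rather than strings, and Govyn's internal retry logic may differ:

```python
# Illustrative fallback loop: try the routed model, and on rate limits
# or timeouts retry once with the configured fallback model. The error
# classification here is simplified; real providers use typed errors.

FALLBACK_MODEL = "gpt-4o-mini"
FALLBACK_ON = {"rate_limit", "timeout"}

def call_with_fallback(call, model: str, classify) -> str:
    try:
        return call(model)
    except Exception as exc:
        if classify(exc) in FALLBACK_ON and model != FALLBACK_MODEL:
            return call(FALLBACK_MODEL)  # one retry on the fallback model
        raise

# Tiny demo with a fake provider that rate-limits the premium model:
def fake_call(model: str) -> str:
    if model == "claude-haiku-4-5-20251001":
        raise RuntimeError("rate_limit")
    return f"answer from {model}"

print(call_with_fallback(fake_call, "claude-haiku-4-5-20251001",
                         classify=lambda e: str(e)))
# -> answer from gpt-4o-mini
```

The `model != FALLBACK_MODEL` guard prevents a pointless second call when the fallback model was the one that failed.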
Configuration options
| Option | Description | Example |
|---|---|---|
| `routing.rules.condition` | Match criteria for routing (token count, content, etc.) | `input_tokens_below: 500` |
| `routing.rules.route_to` | Target model when the condition matches | `gpt-4o-mini` |
| `agents.*.routing` | Enable auto-routing for a specific agent | `auto` |
| `fallback.model` | Model to use when the primary model fails | `gpt-4o-mini` |
| `fallback.on` | Error conditions that trigger fallback | `[rate_limit, timeout]` |
Add this policy to your config
Start Govyn with this policy in under 5 minutes. No code changes needed.
Related policy templates
Set daily and monthly spending limits for AI agents. Prevent runaway costs with hard budget caps enforced at the proxy level.
Detect and stop AI agent infinite loops automatically. Prevent runaway tool calls and recursive chains with proxy-level loop detection.
Protect production environments from AI agent damage. Model restrictions, rate limits, and approval gates for high-risk operations.
Explore more
How smart model routing through a proxy cut our OpenAI and Anthropic bill from $2,140/mo to $578/mo. Zero code changes. Just YAML.
INTEGRATION: Add budget limits, policy enforcement, and full replay to LangChain agents using OpenAI. Five-minute setup, zero code changes.
INTEGRATION: Govern local LLM agents running on Ollama. Enforce model policies, rate limits, and full audit trails for self-hosted models.
COMPARISON: Compare Govyn and LiteLLM for AI agent governance. See how a governance-first proxy differs from a multi-provider routing gateway.