Smart Model Routing Policy Template
Automatically route AI agent requests to the most cost-effective model based on task complexity, token count, or agent role. Simple queries go to fast, cheap models, while complex tasks use premium models. Cut your LLM API costs by 60-80% without changing a single line of agent code.
What this prevents
A team was running all their AI agent queries through GPT-4o at $5/million input tokens. After analyzing their logs, they found that 70% of requests were simple classifications or short Q&A that GPT-4o-mini ($0.15/million tokens) could handle equally well. By adding smart routing, they redirected the simple queries to the cheaper model — cutting their monthly LLM bill from $2,400 to $680 while maintaining the same output quality for complex tasks.
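A back-of-the-envelope calculation shows how the savings compound. The sketch below is illustrative, not the team's actual token accounting: it assumes spend is roughly proportional to input tokens and ignores output-token pricing, so the exact figures will differ from a real bill.

```python
# Back-of-the-envelope blended-cost estimate for smart routing.
# All figures are illustrative assumptions: real bills also depend on
# output tokens, request sizes, and each provider's full price sheet.

PRICE_PER_M = {          # input price per million tokens (USD)
    "gpt-4o": 5.00,
    "gpt-4o-mini": 0.15,
}

def blended_cost(total_tokens_m: float, cheap_fraction: float) -> float:
    """Monthly cost when `cheap_fraction` of tokens go to the cheap model."""
    cheap = total_tokens_m * cheap_fraction * PRICE_PER_M["gpt-4o-mini"]
    premium = total_tokens_m * (1 - cheap_fraction) * PRICE_PER_M["gpt-4o"]
    return cheap + premium

baseline = blended_cost(480, 0.0)  # everything on gpt-4o: 480M tokens/month
routed = blended_cost(480, 0.7)    # 70% of tokens moved to gpt-4o-mini
print(f"${baseline:,.0f} -> ${routed:,.0f}")
```

Because gpt-4o-mini costs roughly 3% of gpt-4o per input token, nearly all of the redirected traffic's cost disappears.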
Policy template
Copy this into your `govyn.yaml` and adjust the values to match your requirements.
````yaml
routing:
  rules:
    - name: short_queries_to_mini
      condition:
        input_tokens_below: 500
      route_to: gpt-4o-mini

    - name: code_to_sonnet
      condition:
        content_contains: ["```", "function", "class", "def "]
      route_to: claude-sonnet-4-20250514

    - name: default_to_haiku
      condition:
        always: true
      route_to: claude-haiku-4-5-20251001

agents:
  analyst_agent:
    routing: auto
    models:
      allow: [gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514, claude-haiku-4-5-20251001]
    budget:
      daily: $10.00

fallback:
  model: gpt-4o-mini
  on: [rate_limit, timeout]
````
How it works
Agent sends a request through Govyn
The agent requests a model as usual (e.g. gpt-4o). Govyn intercepts the request before forwarding it to the provider.
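Concretely, the agent's request is a standard chat-completions payload; the only change is where it is sent. The proxy URL below is an assumption for illustration, not a documented Govyn default:

```python
# Sketch of the request an agent sends: a standard chat-completions
# payload, but aimed at the Govyn proxy instead of the provider.
# The proxy address below is an assumption, not a documented default.
import json

GOVYN_PROXY = "http://localhost:4000/v1/chat/completions"  # assumed address

payload = {
    "model": "gpt-4o",  # what the agent asks for; Govyn may rewrite this
    "messages": [{"role": "user", "content": "Classify: 'refund please'"}],
}

# With any HTTP client this is a normal POST; only the URL changed:
# requests.post(GOVYN_PROXY, json=payload, headers={"Authorization": ...})
print(json.dumps(payload, indent=2))
```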
Routing rules are evaluated in order
Govyn checks each routing rule against the request. Rules can match on input token count, content patterns, agent identity, or custom conditions. The first matching rule determines the target model.
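First-match evaluation can be sketched in a few lines. The field names below mirror the policy template above; the matching logic itself is an illustration, not Govyn's actual implementation:

````python
# Minimal sketch of first-match rule evaluation, mirroring the YAML
# template. Field names follow the template; the logic is illustrative,
# not Govyn's actual implementation.

RULES = [
    {"name": "short_queries_to_mini",
     "condition": {"input_tokens_below": 500},
     "route_to": "gpt-4o-mini"},
    {"name": "code_to_sonnet",
     "condition": {"content_contains": ["```", "function", "class", "def "]},
     "route_to": "claude-sonnet-4-20250514"},
    {"name": "default_to_haiku",
     "condition": {"always": True},
     "route_to": "claude-haiku-4-5-20251001"},
]

def matches(cond: dict, text: str, tokens: int) -> bool:
    if "always" in cond:
        return cond["always"]
    if "input_tokens_below" in cond:
        return tokens < cond["input_tokens_below"]
    if "content_contains" in cond:
        return any(s in text for s in cond["content_contains"])
    return False

def route(text: str, tokens: int) -> str:
    # First matching rule wins; the always-true final rule guarantees a hit.
    for rule in RULES:
        if matches(rule["condition"], text, tokens):
            return rule["route_to"]
    return "gpt-4o"  # unreachable with an always-true final rule

print(route("What is our refund window?", tokens=42))   # short -> mini
print(route("def parse(x): ...", tokens=1200))          # code -> sonnet
````

Rule order matters: putting `default_to_haiku` first would shadow every other rule, so the catch-all belongs last.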
Request is rerouted to the optimal model
If a rule matches, Govyn silently rewrites the model field to the cheaper alternative. The agent doesn't know or care — it gets back a response in the same format regardless of which model served it.
Fallback on errors
If the routed model hits a rate limit or timeout, Govyn automatically falls back to the configured fallback model, ensuring the agent always gets a response.
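The fallback behavior amounts to one guarded retry. The sketch below is a simplified illustration of that pattern; real providers raise typed errors rather than strings, and Govyn's internal retry logic may differ:

```python
# Illustrative fallback loop: try the routed model, and on rate limits
# or timeouts retry once with the configured fallback model. The error
# classification here is simplified; real providers use typed errors.

FALLBACK_MODEL = "gpt-4o-mini"
FALLBACK_ON = {"rate_limit", "timeout"}

def call_with_fallback(call, model: str, classify) -> str:
    try:
        return call(model)
    except Exception as exc:
        if classify(exc) in FALLBACK_ON and model != FALLBACK_MODEL:
            return call(FALLBACK_MODEL)  # one retry on the fallback model
        raise

# Tiny demo with a fake provider that rate-limits the premium model:
def fake_call(model: str) -> str:
    if model == "claude-haiku-4-5-20251001":
        raise RuntimeError("rate_limit")
    return f"answer from {model}"

print(call_with_fallback(fake_call, "claude-haiku-4-5-20251001",
                         classify=lambda e: str(e)))
# -> answer from gpt-4o-mini
```

The `model != FALLBACK_MODEL` guard prevents a pointless second call when the fallback model was the one that failed.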
Configuration options
| Option | Description | Example |
|---|---|---|
| `routing.rules.condition` | Match criteria for routing (token count, content, etc.) | `input_tokens_below: 500` |
| `routing.rules.route_to` | Target model when the condition matches | `gpt-4o-mini` |
| `agents.*.routing` | Enable auto-routing for a specific agent | `auto` |
| `fallback.model` | Model to use when the primary model fails | `gpt-4o-mini` |
| `fallback.on` | Error conditions that trigger fallback | `[rate_limit, timeout]` |
Add this policy to your config
Start Govyn with this policy in under 5 minutes. No code changes needed.
Related policy templates
Set daily and monthly spending limits for AI agents. Prevent runaway costs with hard budget caps enforced at the proxy level.
Detect and stop AI agent infinite loops automatically. Prevent runaway tool calls and recursive chains with proxy-level loop detection.
Protect production environments from AI agent damage. Model restrictions, rate limits, and approval gates for high-risk operations.
Explore more
How smart model routing through a proxy cut our OpenAI and Anthropic bill from $2,140/mo to $578/mo. Zero code changes. Just YAML.
INTEGRATION: Add budget limits, policy enforcement, and full replay to LangChain agents using OpenAI. Five-minute setup, zero code changes.
INTEGRATION: Govern local LLM agents running on Ollama. Enforce model policies, rate limits, and full audit trails for self-hosted models.
COMPARISON: Compare Govyn and LiteLLM for AI agent governance. See how a governance-first proxy differs from a multi-provider routing gateway.