Smart Model Routing Policy Template

Automatically route AI agent requests to the most cost-effective model based on task complexity, token count, or agent role. Simple queries go to fast, cheap models, while complex tasks use premium models. Cut your LLM API costs by 60-80% without changing a single line of agent code.

What this prevents

A team was running all their AI agent queries through GPT-4o at $5/million input tokens. After analyzing their logs, they found that 70% of requests were simple classifications or short Q&A that GPT-4o-mini ($0.15/million tokens) could handle equally well. By adding smart routing, they redirected the simple queries to the cheaper model — cutting their monthly LLM bill from $2,400 to $680 while maintaining the same output quality for complex tasks.
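Savings like these can be sanity-checked with a quick blended-cost estimate. The token volume below is a hypothetical assumption (enough input tokens to produce a $2,400 GPT-4o bill), and it counts input tokens only; real bills also include output tokens, which is why the result doesn't match the story's $680 exactly:

```python
# Input-token pricing from the example above ($ per token).
PRICE_4O = 5.00 / 1_000_000       # GPT-4o input
PRICE_MINI = 0.15 / 1_000_000     # GPT-4o-mini input

# Hypothetical monthly volume: enough input tokens to cost $2,400 on GPT-4o.
tokens = 480_000_000

before = tokens * PRICE_4O  # all traffic on GPT-4o
# Route 70% of tokens to the cheaper model, keep 30% on GPT-4o.
after = 0.7 * tokens * PRICE_MINI + 0.3 * tokens * PRICE_4O

print(f"${before:,.0f} -> ${after:,.0f}")  # $2,400 -> $770
```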

Policy template

Copy this into your govyn.yaml and adjust the values to match your requirements.

govyn.yaml
routing:
  rules:
    - name: short_queries_to_mini
      condition:
        input_tokens_below: 500
      route_to: gpt-4o-mini
      
    - name: code_to_sonnet
      condition:
        content_contains: ["```", "function", "class", "def "]
      route_to: claude-sonnet-4-20250514
      
    - name: default_to_haiku
      condition:
        always: true
      route_to: claude-haiku-4-5-20251001

agents:
  analyst_agent:
    routing: auto
    models:
      allow: [gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514, claude-haiku-4-5-20251001]
    budget:
      daily: $10.00
    fallback:
      model: gpt-4o-mini
      on: [rate_limit, timeout]
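The first-match semantics of the rules above can be sketched in Python. This is a simplified stand-in for Govyn's routing engine, not its actual implementation; `route` and the dict shapes are illustrative, and only the condition types shown in the template are handled:

```python
def route(request, rules):
    """Return the model for `request`: the first matching rule wins."""
    for rule in rules:
        cond = rule["condition"]
        matched = (
            cond.get("always")
            or ("input_tokens_below" in cond
                and request["input_tokens"] < cond["input_tokens_below"])
            or ("content_contains" in cond
                and any(p in request["content"]
                        for p in cond["content_contains"]))
        )
        if matched:
            return rule["route_to"]
    return request["model"]  # no rule matched: keep the requested model

# The template's rules, minus the backtick pattern, to keep the sketch short.
rules = [
    {"condition": {"input_tokens_below": 500}, "route_to": "gpt-4o-mini"},
    {"condition": {"content_contains": ["function", "class", "def "]},
     "route_to": "claude-sonnet-4-20250514"},
    {"condition": {"always": True}, "route_to": "claude-haiku-4-5-20251001"},
]

# A 1,200-token request containing code skips rule 1 and matches rule 2.
print(route({"model": "gpt-4o", "input_tokens": 1200,
             "content": "def fib(n): return n"}, rules))
# claude-sonnet-4-20250514
```

Note that rule order matters: putting `default_to_haiku` first would shadow every other rule, so catch-all rules belong last.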

How it works

1. Agent sends a request through Govyn

The agent requests a model as usual (e.g. gpt-4o). Govyn intercepts the request before forwarding it to the provider.

2. Routing rules are evaluated in order

Govyn checks each routing rule against the request. Rules can match on input token count, content patterns, agent identity, or custom conditions. The first matching rule determines the target model.

3. Request is rerouted to the optimal model

If a rule matches, Govyn silently rewrites the model field to the cheaper alternative. The agent doesn't know or care: it gets back a response in the same format regardless of which model served it.

4. Fallback on errors

If the routed model hits a rate limit or timeout, Govyn automatically falls back to the configured fallback model, ensuring the agent always gets a response.
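Step 4 amounts to a retry against a different model. A minimal sketch, where `call_with_fallback`, `flaky_call`, and the exception types are hypothetical stand-ins rather than Govyn's API:

```python
class RateLimitError(Exception):
    """Stand-in for a provider 429 error."""

def call_with_fallback(call_model, request, fallback_model,
                       retry_on=(RateLimitError, TimeoutError)):
    """Try the routed model once; on a retryable error, use the fallback."""
    try:
        return call_model(request["model"], request)
    except retry_on:
        return call_model(fallback_model, request)

def flaky_call(model, request):
    # Simulate the primary model being rate-limited.
    if model == "claude-sonnet-4-20250514":
        raise RateLimitError("429 Too Many Requests")
    return f"response from {model}"

print(call_with_fallback(flaky_call,
                         {"model": "claude-sonnet-4-20250514"},
                         "gpt-4o-mini"))
# response from gpt-4o-mini
```

Only the errors listed in `fallback.on` trigger the retry; anything else (e.g. an invalid request) still surfaces to the agent.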

Configuration options

Option                   Description                                              Example
routing.rules.condition  Match criteria for routing (token count, content, etc.)  input_tokens_below: 500
routing.rules.route_to   Target model when the condition matches                  gpt-4o-mini
agents.*.routing         Enable auto-routing for a specific agent                 auto
agents.*.fallback.model  Model to use when the primary fails                      gpt-4o-mini
agents.*.fallback.on     Error conditions that trigger fallback                   [rate_limit, timeout]

Add this policy to your config

Start Govyn with this policy in under 5 minutes. No code changes needed.

