Ollama + Govyn — Govern Your Local LLM Agents

Running local LLMs with Ollama gives you privacy and cost savings, but out of the box it gives you no visibility into what your agents are doing. Without governance, you can't enforce model restrictions, rate-limit agents, or maintain audit trails, which makes compliance and debugging a guessing game.

How it works

Your agents connect to the Govyn proxy over HTTPS. Govyn applies policy, budget, and logging rules, then forwards each request to the Ollama API.

[Diagram: Your agents → HTTPS → Govyn Proxy (Policy · Budget · Logs) → Ollama API → Ollama (LLM provider)]

Step-by-step setup

1. Start Ollama with your model

```bash
ollama pull llama3.1
ollama serve
```
2. Configure Govyn to route to Ollama

```yaml
# govyn.yaml
routing:
  ollama:
    upstream: http://localhost:11434
    format: openai-compatible
```
3. Point your agent at Govyn

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4111/v1",  # Govyn proxy, not Ollama directly
    api_key="gvn_agent_ollama_01",        # Govyn agent key
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize this document"}],
)
```

Example policy

Define governance rules for your Ollama agents in a simple YAML file.

```yaml
# govyn.yaml
routing:
  ollama:
    upstream: http://localhost:11434
    format: openai-compatible

agents:
  ollama_01:
    models:
      allow: [llama3.1, codellama, mistral]
      deny: [llama3.1:70b]
    rate_limit:
      requests_per_minute: 20
      concurrent: 2
    logging:
      replay: true
      log_prompts: true
```
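One way to picture the allow/deny semantics in the policy above is as a pure function over model names. This is an illustrative sketch only, not Govyn's actual matching logic; the tag-aware matching (an untagged allow entry covering all tags of a family, with deny entries taking precedence) is an assumption for demonstration.

```python
def model_allowed(model: str, policy: dict) -> bool:
    """Illustrative policy check: deny entries win over allow entries.

    Under these assumed semantics, allowing "llama3.1" while denying
    "llama3.1:70b" permits the default and small tags but blocks the
    70B variant.
    """
    if model in policy.get("deny", []):
        return False
    # An allow entry without a tag matches any tag of that model family.
    family = model.split(":", 1)[0]
    allow = policy.get("allow", [])
    return model in allow or family in allow


policy = {
    "allow": ["llama3.1", "codellama", "mistral"],
    "deny": ["llama3.1:70b"],
}
```

With this sketch, `model_allowed("llama3.1:8b", policy)` passes while `model_allowed("llama3.1:70b", policy)` is blocked, matching the intent of the example policy.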

Why use Govyn with Ollama?

- Model allowlists for local models
- Rate limiting to protect GPU resources
- Concurrency limits per agent
- Full prompt and response logging
- Works with Ollama's OpenAI-compatible API
- Combine local and cloud models through one proxy

Get started in 5 minutes

Add governance to your Ollama agents with a single config change. No code rewrites.

Read the docs

Frequently asked questions

Why do I need governance for free local models?

Even though local models don't have per-token costs, they consume GPU/CPU resources. Govyn lets you rate limit agents, restrict model sizes (blocking 70B models, for example), limit concurrency, and maintain audit trails — all critical for production local LLM deployments.
Can I route some requests to Ollama and others to OpenAI?

Yes. Govyn's smart routing lets you send requests to different backends based on model name or agent key. You can run cheap tasks on local Ollama models and route complex tasks to OpenAI — all through a single proxy endpoint.
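The model-name routing described above can be sketched as a simple lookup. The backend table, model list, and function below are hypothetical illustrations of the idea, not Govyn's configuration schema.

```python
# Hypothetical routing table for illustration only.
BACKENDS = {
    "ollama": "http://localhost:11434",
    "openai": "https://api.openai.com",
}

LOCAL_MODELS = {"llama3.1", "codellama", "mistral"}


def pick_backend(model: str) -> str:
    """Route local model families to Ollama, everything else to OpenAI."""
    family = model.split(":", 1)[0]
    return "ollama" if family in LOCAL_MODELS else "openai"
```

So a request for `llama3.1:8b` would be forwarded to the Ollama upstream, while `gpt-4o` would go to OpenAI, all behind the same proxy endpoint.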
Does Govyn work with Ollama's OpenAI-compatible API?

Yes. Ollama exposes an OpenAI-compatible API, and Govyn supports it natively. Your agents talk to Govyn using the standard OpenAI SDK, and Govyn forwards to Ollama — no custom adapters needed.
