Ollama + Govyn — Govern Your Local LLM Agents

Running local LLMs with Ollama gives you privacy and cost savings, but out of the box it gives you no visibility into what your agents are doing. Without governance, you can't enforce model restrictions, rate-limit agents, or maintain audit trails, which makes compliance and debugging a guessing game.

How it works

Your agents connect to the Govyn proxy over HTTPS. Govyn applies policy, budget, and logging rules, then forwards each request to the Ollama API.

[Diagram: Your agents → HTTPS → Govyn Proxy (Policy · Budget · Logs) → Ollama API → Ollama (LLM provider)]

Step-by-step setup

1. Start Ollama with your model

```bash
ollama pull llama3.1
ollama serve
```
2. Configure Govyn to route to Ollama

```yaml
# govyn.yaml
routing:
  ollama:
    upstream: http://localhost:11434
    format: openai-compatible
```
3. Point your agent at Govyn

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4111/v1",  # Govyn proxy, not Ollama directly
    api_key="gvn_agent_ollama_01",        # Govyn agent key
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize this document"}],
)
```

Example policy

Define governance rules for your Ollama agents in a simple YAML file.

```yaml
# govyn.yaml
routing:
  ollama:
    upstream: http://localhost:11434
    format: openai-compatible

agents:
  ollama_01:
    models:
      allow: [llama3.1, codellama, mistral]
      deny: [llama3.1:70b]
    rate_limit:
      requests_per_minute: 20
      concurrent: 2
    logging:
      replay: true
      log_prompts: true
```
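One way to picture the allow/deny semantics in the policy above is as a pure function over model names. This is an illustrative sketch only, not Govyn's actual matching logic; the tag-aware matching (an untagged allow entry covering all tags of a family, with deny entries taking precedence) is an assumption for demonstration.

```python
def model_allowed(model: str, policy: dict) -> bool:
    """Illustrative policy check: deny entries win over allow entries.

    Under these assumed semantics, allowing "llama3.1" while denying
    "llama3.1:70b" permits the default and small tags but blocks the
    70B variant.
    """
    if model in policy.get("deny", []):
        return False
    # An allow entry without a tag matches any tag of that model family.
    family = model.split(":", 1)[0]
    allow = policy.get("allow", [])
    return model in allow or family in allow


policy = {
    "allow": ["llama3.1", "codellama", "mistral"],
    "deny": ["llama3.1:70b"],
}
```

With this sketch, `model_allowed("llama3.1:8b", policy)` passes while `model_allowed("llama3.1:70b", policy)` is blocked, matching the intent of the example policy.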

Why use Govyn with Ollama?

- Model allowlists for local models
- Rate limiting to protect GPU resources
- Concurrency limits per agent
- Full prompt and response logging
- Works with Ollama's OpenAI-compatible API
- Combine local and cloud models through one proxy

Get started in 5 minutes

Add governance to your Ollama agents with a single config change. No code rewrites.

Read the docs

Frequently asked questions

Why do I need governance for free local models?

Even though local models don't have per-token costs, they consume GPU/CPU resources. Govyn lets you rate limit agents, restrict model sizes (blocking 70B models, for example), limit concurrency, and maintain audit trails — all critical for production local LLM deployments.
Can I route some requests to Ollama and others to OpenAI?

Yes. Govyn's smart routing lets you send requests to different backends based on model name or agent key. You can run cheap tasks on local Ollama models and route complex tasks to OpenAI — all through a single proxy endpoint.
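The model-name routing described above can be sketched as a simple lookup. The backend table, model list, and function below are hypothetical illustrations of the idea, not Govyn's configuration schema.

```python
# Hypothetical routing table for illustration only.
BACKENDS = {
    "ollama": "http://localhost:11434",
    "openai": "https://api.openai.com",
}

LOCAL_MODELS = {"llama3.1", "codellama", "mistral"}


def pick_backend(model: str) -> str:
    """Route local model families to Ollama, everything else to OpenAI."""
    family = model.split(":", 1)[0]
    return "ollama" if family in LOCAL_MODELS else "openai"
```

So a request for `llama3.1:8b` would be forwarded to the Ollama upstream, while `gpt-4o` would go to OpenAI, all behind the same proxy endpoint.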
Does Govyn work with Ollama's OpenAI-compatible API?

Yes. Ollama exposes an OpenAI-compatible API, and Govyn supports it natively. Your agents talk to Govyn using the standard OpenAI SDK, and Govyn forwards to Ollama — no custom adapters needed.
