Ollama Provider

Ollama is a lightweight framework for running large language models locally. Rensei integrates with Ollama to enable local model serving without cloud dependency or API costs.

Provider Summary

Attribute	Value
Provider ID	`ollama`
Display name	Ollama
Config namespace	`openai` (OpenAI-compatible endpoint config)
Supported auth modes	`local`
Requires endpoint	Yes (must provide Ollama server URL)
Category	Local

Ollama supports only the local auth mode. Profiles must include local in their auth_modes array, and the org and project access policies must also allow local for this mode to be reachable. See Auth Modes.
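
The reachability rule above can be sketched as a small check. This is illustrative only: the function name, its parameters, and the `api_key` mode name are hypothetical, and Rensei's actual policy evaluation is not described on this page.

```python
def local_mode_reachable(profile_auth_modes, org_allowed_modes, project_allowed_modes):
    """Return True only if `local` is permitted at every layer.

    The three arguments are hypothetical stand-ins (lists of auth mode
    names) for the profile's auth_modes array and the org and project
    access policies.
    """
    layers = (profile_auth_modes, org_allowed_modes, project_allowed_modes)
    return all("local" in modes for modes in layers)


# A profile that lists `local` is still unreachable if the project policy omits it.
print(local_mode_reachable(["local"], ["local", "api_key"], ["api_key"]))  # False
print(local_mode_reachable(["local"], ["local"], ["local"]))               # True
```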

When to Use Ollama

Use Ollama when you want:

No API costs - Models run on your hardware; you pay nothing per token.
Privacy - All data stays on your machine or network; nothing leaves your infrastructure.
Full control - You choose which models run, when, and how.
Offline capability - Once models are downloaded, Ollama works without internet.

Prerequisites

Ollama installed - Download from ollama.ai.
Ollama server running - Start with ollama serve (default: localhost:11434).
A model downloaded - Pull a model with ollama pull mistral (or any supported model).

Auth Mode: Local Only

Ollama supports only the local auth mode because it runs as a local endpoint. No API keys are required.

Setup

1. Start Ollama Server

ollama serve

This starts the Ollama HTTP server on localhost:11434 (or your configured port).

2. Pull a Model

ollama pull mistral          # ~4GB
ollama pull neural-chat      # ~5GB
ollama pull orca-mini        # ~1.5GB

List the models you have downloaded:

ollama list

3. Create a Rensei Profile

In Settings → Integrations, ensure you have a local capacity pool configured.
In Settings → Model Profiles, click New Profile.
Fill in:
- Name: e.g., "local-mistral"
- Provider: Ollama
- Auth mode: local
- Endpoint: http://localhost:11434 (or your server URL)
- Model ID: mistral (or the exact name from ollama list)
Click Test Connection - Rensei will call GET /api/tags to list your models.
Click Create.
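
To reproduce the Test Connection check by hand, a minimal sketch along these lines may help. It assumes the endpoint serves Ollama's native /api/tags route, which returns a JSON object with a models array whose entries carry a tagged name such as "mistral:latest". The function names are illustrative, not part of Rensei.

```python
import json
import urllib.request


def fetch_tags(endpoint, timeout=5.0):
    """GET <endpoint>/api/tags and return the decoded JSON body."""
    url = endpoint.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_is_installed(tags_payload, model_id):
    """Check whether model_id appears in an /api/tags response.

    Ollama reports names with a tag, so a bare model ID is compared
    against its ":latest" form.
    """
    wanted = model_id if ":" in model_id else model_id + ":latest"
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return wanted in names


# Offline demonstration with a hand-written sample payload.
sample = {"models": [{"name": "mistral:latest"}]}
print(model_is_installed(sample, "mistral"))    # True
print(model_is_installed(sample, "orca-mini"))  # False
```

Against a running server, `model_is_installed(fetch_tags("http://localhost:11434"), "mistral")` performs the same check on live data.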

4. Dispatch

In a workflow LLM node:

nodeId: local_inference
nodeType: action/llm.inference
config:
  profileId: prof_local_mistral
  systemPrompt: "You are a helpful assistant."

Or dispatch from CLI:

rensei dispatch \
  --project my-project \
  --profile prof_local_mistral \
  "What is the capital of France?"

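If you script dispatches, the CLI call above can be wrapped as follows. This sketch uses only the flags shown in the command (--project, --profile); the wrapper function names are hypothetical, and the format of the CLI's output is not documented here, so the raw stdout is returned unparsed.

```python
import subprocess


def build_dispatch_argv(project, profile, prompt):
    """Assemble the argument list for the `rensei dispatch` call shown above."""
    return ["rensei", "dispatch", "--project", project, "--profile", profile, prompt]


def dispatch(project, profile, prompt):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_dispatch_argv(project, profile, prompt),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Building the argument list does not require the CLI to be installed.
print(build_dispatch_argv("my-project", "prof_local_mistral", "What is the capital of France?"))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.
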
Available Models

Popular Ollama models:

Model	Size	Speed	Quality	Use case
mistral	4GB	⚡⚡⚡ Fast	⭐⭐⭐ Good	General purpose; recommended default
neural-chat	5GB	⚡⚡ Medium	⭐⭐⭐⭐ Excellent	Chat and reasoning
orca-mini	1.5GB	⚡⚡⚡⚡ Very fast	⭐⭐ Adequate	Lightweight; mobile/edge
llama2	7GB	⚡⚡ Medium	⭐⭐⭐⭐ Excellent	Code + text; versatile
codellama	6GB	⚡⚡⚡ Fast	⭐⭐⭐⭐ Excellent	Code generation specialist
dolphin-mixtral	26GB	⚡ Slow	⭐⭐⭐⭐⭐ Excellent	Advanced reasoning

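Run ollama list to see the models you have downloaded. The table above is only a sample; the Ollama model library lists everything you can pull.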
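
One way to use the size column is to pick the largest model that fits a memory budget. The sketch below copies a few rows from the table and treats download size as a rough lower bound on required memory. That is an assumption: actual usage also depends on context length and quantization.

```python
# Download sizes in GB, copied from a few rows of the table above.
MODEL_SIZES_GB = {
    "orca-mini": 1.5,
    "mistral": 4.0,
    "neural-chat": 5.0,
    "dolphin-mixtral": 26.0,
}


def largest_model_within(budget_gb, sizes=MODEL_SIZES_GB):
    """Return the largest listed model whose size fits the budget, or None."""
    fitting = [(size, name) for name, size in sizes.items() if size <= budget_gb]
    if not fitting:
        return None
    return max(fitting)[1]


print(largest_model_within(8))  # neural-chat
print(largest_model_within(1))  # None
```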

Configuration

Ollama shares the openai config namespace because it exposes an OpenAI-compatible API:

{
  "providerConfig": {
    "openai": {
      "temperature": 0.7,
      "topP": 0.9,
      "contextWindow": 4096
    }
  }
}

Supported fields:

Field	Type	Description
`temperature`	number	Randomness (0=deterministic, 1=creative; default 0.7)
`topP`	number	Nucleus sampling (0-1; default 0.9)
`topK`	number	Top-K sampling
`contextWindow`	number	Max context tokens (must not exceed model's max)
`numPredict`	number	Max output tokens (optional)
`repeatPenalty`	number	Penalize repetition (0-2; default 1.0)
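
The ranges in the table can be checked before a profile is saved. This is a minimal sketch, not Rensei's own validation: the function name and the `model_max_context` argument are hypothetical, and only the bounds stated in the table are enforced.

```python
def validate_provider_config(cfg, model_max_context=None):
    """Return a list of problems found in the `openai` namespace settings.

    Defaults mirror the table above; an empty list means no issues found.
    """
    problems = []
    if cfg.get("temperature", 0.7) < 0:
        problems.append("temperature must be >= 0")
    if not 0 <= cfg.get("topP", 0.9) <= 1:
        problems.append("topP must be between 0 and 1")
    if not 0 <= cfg.get("repeatPenalty", 1.0) <= 2:
        problems.append("repeatPenalty must be between 0 and 2")
    ctx = cfg.get("contextWindow")
    if ctx is not None and model_max_context is not None and ctx > model_max_context:
        problems.append("contextWindow exceeds the model's maximum")
    return problems


config = {"providerConfig": {"openai": {"temperature": 0.7, "topP": 0.9, "contextWindow": 4096}}}
print(validate_provider_config(config["providerConfig"]["openai"], model_max_context=8192))  # []
```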

Example: Code Generation Locally

# Pull the code-specific model (the Ollama library name is codellama, without a hyphen)
ollama pull codellama

# Create a Rensei profile
rensei profile create \
  --name "local-code-llama" \
  --provider ollama \
  --model-id "codellama" \
  --auth-mode local \
  --scope project \
  --project-id my-project

# Dispatch code generation
rensei dispatch \
  --project my-project \
  --profile prof_local_code_llama \
  "Write a Python function to compute Fibonacci numbers"

Network Setup

By default, Ollama listens on localhost:11434. To expose it over the network:

On macOS / Linux

Bind to 0.0.0.0:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Then, from another machine, set the profile endpoint to the server's address (192.168.1.100 is an example):

# Rensei profile endpoint
endpoint: "http://192.168.1.100:11434"

Security Note

Ollama has no built-in authentication. Only expose it on trusted networks. For security:

Don't expose on the public internet.
Use a private VPN or corporate network.
Place behind an authenticating reverse proxy if exposed remotely.
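
Before saving a remote endpoint, it can be useful to check whether its address is loopback, private, or public. The sketch below uses Python's standard ipaddress module; the function name and the four category labels are illustrative choices, not Rensei behavior.

```python
import ipaddress
from urllib.parse import urlparse


def endpoint_exposure(endpoint):
    """Classify an endpoint URL as "loopback", "private", "public", or "hostname"."""
    host = urlparse(endpoint).hostname
    if host is None:
        raise ValueError(f"no host found in {endpoint!r}; include the http:// scheme")
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"  # a DNS name; resolve it before judging exposure
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


print(endpoint_exposure("http://localhost:11434"))      # loopback
print(endpoint_exposure("http://192.168.1.100:11434"))  # private
print(endpoint_exposure("http://8.8.8.8:11434"))        # public
```

A "public" result is a signal to put the server behind a VPN or an authenticating reverse proxy, as described above.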

Resource Management

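Ollama consumes CPU and GPU resources while a model is loaded and running. To limit usage:

Limit CPU threads. Upstream Ollama sets the thread count through the num_thread model parameter rather than a server environment variable, for example in a Modelfile:

FROM mistral
PARAMETER num_thread 8

Limit GPU offload. The num_gpu parameter is the number of model layers placed on the GPU, not a size in gigabytes; lowering it reduces VRAM use and runs the remaining layers on the CPU:

FROM mistral
PARAMETER num_gpu 20

Build the variant with ollama create <name> -f Modelfile and use that name as the profile's Model ID.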

Unload model after timeout:

# Unload after 5 minutes of inactivity
OLLAMA_KEEP_ALIVE=5m ollama serve
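
Thread count, GPU offload, and keep-alive can also be set per request through Ollama's native API, where num_thread and num_gpu go under options and keep_alive is a top-level field of POST /api/generate. The sketch below only builds the request body. Whether Rensei forwards these fields is not stated on this page, so treat it as a way to test settings against Ollama directly.

```python
import json


def build_generate_request(model, prompt, num_thread=None, num_gpu=None, keep_alive=None):
    """Build a JSON body for Ollama's POST /api/generate.

    Values left as None are omitted so the server defaults apply.
    """
    body = {"model": model, "prompt": prompt, "stream": False}
    options = {}
    if num_thread is not None:
        options["num_thread"] = num_thread
    if num_gpu is not None:
        options["num_gpu"] = num_gpu  # number of layers offloaded to the GPU
    if options:
        body["options"] = options
    if keep_alive is not None:
        body["keep_alive"] = keep_alive
    return body


print(json.dumps(build_generate_request("mistral", "Hello", num_thread=8, keep_alive="5m"), indent=2))
```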

Troubleshooting

"Connection refused at http://localhost:11434"

Ollama server is not running. Start it:

ollama serve

"Model not found: mistral"

The model is not downloaded. Pull it first:

ollama pull mistral

"Out of memory"

Your machine doesn't have enough RAM for the model. Either:

Switch to a smaller model (orca-mini instead of mistral).
Reduce contextWindow, or lower the GPU offload setting if the error refers to VRAM (see Resource Management).
Add more RAM.

Inference is very slow

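Ollama is probably running without GPU acceleration. Check that you have a supported GPU (NVIDIA CUDA, Apple Metal) and that Ollama is using it; ollama ps shows whether a loaded model is running on the CPU or the GPU. Depending on the model and hardware, GPU inference is often 5-20× faster.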

"Endpoint validation failed"

Endpoint URL is wrong or Ollama is not responding. Verify:

curl http://localhost:11434/api/tags

This should return a JSON object containing a models array.
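
The same check can be turned into a small diagnostic that maps each failure to one of the hints in this section. In this sketch the fetch step is passed in as a callable so the logic can be exercised without a server; the function names and messages are illustrative.

```python
import urllib.error


def diagnose_endpoint(fetch_tags):
    """Call fetch_tags() and map the outcome to a troubleshooting hint.

    fetch_tags is any zero-argument callable that returns the decoded
    /api/tags JSON or raises.
    """
    try:
        payload = fetch_tags()
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            return "connection refused: start the server with `ollama serve`"
        return f"endpoint unreachable: {exc.reason}"
    except ValueError:
        return "endpoint responded, but not with JSON: check the URL"
    if not isinstance(payload, dict) or "models" not in payload:
        return "unexpected response: is this an Ollama server?"
    if not payload["models"]:
        return "server is up but no models are installed: run `ollama pull`"
    return "ok"


def simulate_refused():
    """Stand-in for a fetch against a server that is not running."""
    raise urllib.error.URLError(ConnectionRefusedError(111, "Connection refused"))


print(diagnose_endpoint(simulate_refused))
print(diagnose_endpoint(lambda: {"models": [{"name": "mistral:latest"}]}))  # ok
```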

Pricing & Cost

Ollama incurs no provider charges. Compute is entirely local, so your only costs are hardware and electricity.

Cost events are still emitted for audit purposes, but they show $0.00 usage because there is no provider billing.