Ollama Provider
Ollama model provider configuration.
Ollama is a lightweight framework for running large language models locally. Rensei integrates with Ollama to enable local model serving without cloud dependency or API costs.
Provider Summary
| Attribute | Value |
|---|---|
| Provider ID | ollama |
| Display name | Ollama |
| Config namespace | openai (OpenAI-compatible endpoint config) |
| Supported auth modes | local |
| Requires endpoint | Yes (must provide Ollama server URL) |
| Category | Local |
Ollama only supports the local auth mode. Profiles must include local in their auth_modes array. The org and project access policy must also allow local for this mode to be reachable. See Auth Modes.
When to Use Ollama
Use Ollama when you want:
- No API costs - Models run on your hardware; you pay nothing per token.
- Privacy - All data stays on your machine or network; nothing leaves your infrastructure.
- Full control - You choose which models run, when, and how.
- Offline capability - Once models are downloaded, Ollama works without internet.
Prerequisites
- Ollama installed - Download from ollama.ai.
- Ollama server running - Start with ollama serve (default: localhost:11434).
- A model downloaded - Pull a model with ollama pull mistral (or any supported model).
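The prerequisites above can be checked from a terminal before creating a profile. The following is a minimal sketch, not part of Rensei or Ollama: the helper names require_cmd and check_prereqs are hypothetical, and the script assumes curl is installed and that the server uses the default address unless another base URL is passed.

```shell
# Hypothetical helpers for illustration; not shipped with Rensei or Ollama.

require_cmd() {
  # Succeeds when the named program is found on PATH.
  command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
  # $1 = Ollama base URL (defaults to the standard local address)
  url=${1:-http://localhost:11434}
  require_cmd ollama || { echo "ollama is not installed or not on PATH"; return 1; }
  require_cmd curl   || { echo "curl is needed for the server check"; return 1; }
  # GET /api/tags lists local models; any successful answer means the server is up.
  curl -fsS --max-time 5 "$url/api/tags" >/dev/null 2>&1 \
    || { echo "no Ollama server answering at $url (try: ollama serve)"; return 1; }
  echo "prerequisites look fine"
}
```

Run check_prereqs with no arguments for a local server, or pass the base URL of a remote one. The sketch does not verify that a model has been pulled; ollama list covers that.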
Auth Mode: Local Only
Ollama only supports local auth mode because it's a local endpoint. No API keys required.
Setup
1. Start Ollama Server
ollama serve
This starts the Ollama HTTP server on localhost:11434 (or your configured port).
2. Pull a Model
ollama pull mistral # ~4GB
ollama pull neural-chat # ~5GB
ollama pull orca-mini # ~1.5GB
List available models:
ollama list
3. Create a Rensei Profile
- In Settings → Integrations, ensure you have a local capacity pool configured.
- In Settings → Model Profiles, click New Profile.
- Fill in:
  - Name: e.g., "local-mistral"
  - Provider: Ollama
  - Auth mode: local
  - Endpoint: http://localhost:11434 (or your server URL)
  - Model ID: mistral (or the exact name from ollama list)
- Click Test Connection - Rensei will call GET /api/tags to list your models.
- Click Create.
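To reproduce the connection test by hand, you can fetch GET /api/tags yourself and check that the Model ID you entered appears in the response. The sketch below assumes the response shape Ollama returns for that endpoint (a models array whose entries carry a name such as mistral:latest). The helper has_model is hypothetical, and nothing here describes how Rensei itself evaluates the response.

```shell
# Hypothetical helper: reads an /api/tags JSON response on stdin and succeeds
# when the given model is listed. A bare name also matches its ":latest" tag.
has_model() {
  # Escape dots so names like "llama3.1" are matched literally.
  name=$(printf '%s' "$1" | sed 's/\./\\./g')
  case "$1" in
    *:*) pattern="\"name\" *: *\"${name}\"" ;;
    *)   pattern="\"name\" *: *\"${name}(:latest)?\"" ;;
  esac
  grep -Eq "$pattern"
}

# Usage against a running server (not executed here):
#   curl -fsS http://localhost:11434/api/tags | has_model mistral && echo "model is available"
```

Matching with grep keeps the example free of extra dependencies; if jq is available, parsing the JSON properly is the more robust choice.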
4. Dispatch
In a workflow LLM node:
nodeId: local_inference
nodeType: action/llm.inference
config:
  profileId: prof_local_mistral
  systemPrompt: "You are a helpful assistant."
Or dispatch from CLI:
rensei dispatch \
--project my-project \
--profile prof_local_mistral \
"What is the capital of France?"
Available Models
Popular Ollama models:
| Model | Size | Speed | Quality | Use case |
|---|---|---|---|---|
| mistral | 4GB | ⚡⚡⚡ Fast | ⭐⭐⭐ Good | General purpose; recommended default |
| neural-chat | 5GB | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | Chat and reasoning |
| orca-mini | 1.5GB | ⚡⚡⚡⚡ Very fast | ⭐⭐ Adequate | Lightweight; mobile/edge |
| llama2 | 7GB | ⚡⚡ Medium | ⭐⭐⭐⭐ Excellent | Code + text; versatile |
| codellama | 6GB | ⚡⚡⚡ Fast | ⭐⭐⭐⭐ Excellent | Code generation specialist |
| dolphin-mixtral | 26GB | ⚡ Slow | ⭐⭐⭐⭐⭐ Excellent | Advanced reasoning |
Run ollama list to see the models already downloaded on your machine.
Configuration
Ollama shares the openai config namespace (since it mimics OpenAI's API):
{
"providerConfig": {
"openai": {
"temperature": 0.7,
"topP": 0.9,
"contextWindow": 4096
}
}
}
Supported fields:
| Field | Type | Description |
|---|---|---|
| temperature | number | Randomness (0=deterministic, 1=creative; default 0.7) |
| topP | number | Nucleus sampling (0-1; default 0.9) |
| topK | number | Top-K sampling |
| contextWindow | number | Max context tokens (must not exceed model's max) |
| numPredict | number | Max output tokens (optional) |
| repeatPenalty | number | Penalize repetition (0-2; default 1.0) |
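When a setting does not seem to take effect, it can help to send the same values straight to Ollama and compare. The sketch below builds a request body for Ollama's native POST /api/generate endpoint, whose options object uses snake_case names (temperature, top_p, num_ctx). The mapping from the Rensei fields above to these option names is an assumption made for illustration, and both helper names are hypothetical.

```shell
json_escape() {
  # Escape backslashes and double quotes so the value can sit inside a JSON string.
  # Single-line values only; newlines are not handled.
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

build_generate_body() {
  # $1 = model, $2 = prompt, $3 = temperature, $4 = top_p, $5 = num_ctx
  # Assumed mapping: temperature -> temperature, topP -> top_p, contextWindow -> num_ctx
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":%s,"top_p":%s,"num_ctx":%s}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$3" "$4" "$5"
}

# Usage against a running server (not executed here):
#   build_generate_body mistral "What is the capital of France?" 0.7 0.9 4096 \
#     | curl -fsS http://localhost:11434/api/generate -d @-
```

Setting stream to false makes Ollama return one JSON object instead of a stream of chunks, which is easier to read when debugging.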
Example: Code Generation Locally
# Pull the code-specific model (published in the Ollama library as codellama)
ollama pull codellama
# Create a Rensei profile
rensei profile create \
--name "local-code-llama" \
--provider ollama \
--model-id "codellama" \
--auth-mode local \
--scope project \
--project-id my-project
# Dispatch code generation
rensei dispatch \
--project my-project \
--profile prof_local_code_llama \
"Write a Python function to compute Fibonacci numbers"
Network Setup
By default, Ollama listens on localhost:11434. To expose it over the network:
On macOS / Linux
Bind to 0.0.0.0:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
Then from another machine:
# Rensei profile endpoint
endpoint: "http://192.168.1.100:11434"
Security Note
Ollama has no built-in authentication. Only expose it on trusted networks. For security:
- Don't expose on the public internet.
- Use a private VPN or corporate network.
- Place behind an authenticating reverse proxy if exposed remotely.
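Because a wildcard bind address is what makes the server reachable from other machines, a start-up script can warn before launching Ollama that way. This is a small sketch under stated assumptions: binds_all_interfaces is a hypothetical helper, and it only recognizes the IPv4 and IPv6 wildcard addresses in an OLLAMA_HOST value.

```shell
# Hypothetical helper: succeeds when an OLLAMA_HOST value binds every interface.
binds_all_interfaces() {
  # $1 = an OLLAMA_HOST value such as "0.0.0.0:11434" or "127.0.0.1:11434"
  host=${1#http://}
  host=${host#https://}
  case "$host" in
    0.0.0.0|0.0.0.0:*|"[::]"|"[::]":*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example guard for a start-up script; prints nothing when OLLAMA_HOST is unset.
if binds_all_interfaces "${OLLAMA_HOST:-127.0.0.1:11434}"; then
  echo "warning: Ollama will listen on all interfaces and has no built-in authentication" >&2
fi
```

The guard only warns; restricting who can actually reach the port is still the job of the network, VPN, or reverse proxy.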
Resource Management
Ollama will consume CPU/GPU when models run. To limit resource usage:
Limit CPU threads:
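OLLAMA_NUM_THREAD=8 ollama serve
Limit GPU offload:
# Offload 2 model layers to the GPU (num_gpu counts layers, not gigabytes of VRAM)
OLLAMA_NUM_GPU=2 ollama serve
Unload model after timeout:
# Unload after 5 minutes of inactivity
OLLAMA_KEEP_ALIVE=5m ollama serve
Upstream Ollama documents num_thread and num_gpu as model parameters (set in a Modelfile or in the request options). Confirm that your Ollama version honors the OLLAMA_NUM_THREAD and OLLAMA_NUM_GPU environment variables before relying on them.
Troubleshooting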
"Connection refused at http://localhost:11434"
Ollama server is not running. Start it:
ollama serve
"Model not found: mistral"
The model is not downloaded. Pull it first:
ollama pull mistral
"Out of memory"
Your machine doesn't have enough RAM for the model. Either:
- Switch to a smaller model (orca-mini instead of mistral).
- Reduce OLLAMA_NUM_THREAD or OLLAMA_NUM_GPU.
- Add more RAM.
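Before pulling a larger model, you can compare the machine's memory against the rule of thumb in the Ollama README: roughly 8 GB of RAM for 7B models, 16 GB for 13B, and 32 GB for 33B. The sketch below encodes only that rule. All three helper names are hypothetical, and real requirements vary with quantization and context size.

```shell
min_ram_gb() {
  # $1 = parameter count in billions; values follow the Ollama README rule of thumb.
  case "$1" in
    7)  echo 8 ;;
    13) echo 16 ;;
    33) echo 32 ;;
    *)  echo "unknown"; return 1 ;;
  esac
}

total_ram_gb() {
  # Installed RAM rounded to the nearest GB (Linux via /proc/meminfo, else macOS sysctl).
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
  else
    sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1073741824 + 0.5 }'
  fi
}

enough_ram() {
  # $1 = parameter count in billions, $2 = RAM in GB (defaults to this machine)
  need=$(min_ram_gb "$1") || return 2
  have=${2:-$(total_ram_gb)}
  [ "$have" -ge "$need" ]
}
```

For example, enough_ram 7 checks the current machine against the 7B guideline (mistral is a 7B model), while enough_ram 13 8 reports failure because 8 GB is below the 13B guideline.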
Inference is very slow
Ollama is likely running without GPU acceleration. Check that you have a supported GPU (NVIDIA CUDA, Apple Metal) and that Ollama is configured to use it. Models typically run 5-20× faster with GPU.
"Endpoint validation failed"
Endpoint URL is wrong or Ollama is not responding. Verify:
curl http://localhost:11434/api/tags
Should return a JSON list of models.
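If the curl check succeeds but validation still fails, the endpoint value itself may be malformed. The sketch below flags two common slips: a missing scheme and a trailing slash or path. It assumes Rensei expects the bare base URL, as in the examples in this page; check_endpoint_format is a hypothetical helper, not a Rensei command.

```shell
# Hypothetical helper: prints "ok" for a bare base URL, otherwise a hint.
check_endpoint_format() {
  case "$1" in
    http://*/*|https://*/*)
      echo "has a path or trailing slash; use the bare base URL"; return 1 ;;
    http://?*|https://?*)
      echo "ok"; return 0 ;;
    *)
      echo "missing http:// or https:// scheme"; return 1 ;;
  esac
}
```

For example, check_endpoint_format localhost:11434 reports the missing scheme, while http://192.168.1.100:11434 passes.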
Pricing & Cost
Ollama incurs zero platform cost beyond your machine's electricity. Compute is entirely local.
Cost events still emit for audit purposes, but you see $0.00 usage since there's no provider billing.
Further Reading
- Auth Modes - Local auth mode details
- Model Catalog & Routing - Profile management
- OpenCode Provider - Alternative local provider
- Ollama Official Docs - Full Ollama reference