Rensei docs
Providers

Ollama Provider

Ollama model provider configuration.

Ollama is a lightweight framework for running large language models locally. Rensei integrates with Ollama to enable local model serving without cloud dependency or API costs.

Provider Summary

AttributeValue
Provider IDollama
Display nameOllama
Config namespaceopenai (OpenAI-compatible endpoint config)
Supported auth modeslocal
Requires endpointYes (must provide Ollama server URL)
CategoryLocal

Ollama only supports the local auth mode. Profiles must include local in their auth_modes array. The org and project access policy must also allow local for this mode to be reachable. See Auth Modes.

When to Use Ollama

Use Ollama when you want:

  • No API costs - Models run on your hardware; you pay zero per-token.
  • Privacy - All data stays on your machine or network; nothing leaves your infrastructure.
  • Full control - You choose which models run, when, and how.
  • Offline capability - Once models are downloaded, Ollama works without internet.

Prerequisites

  1. Ollama installed - Download from ollama.ai.
  2. Ollama server running - Start with ollama serve (default: localhost:11434).
  3. A model downloaded - Pull a model with ollama pull mistral (or any supported model).

Auth Mode: Local Only

Ollama only supports local auth mode because it's a local endpoint. No API keys required.

Setup

1. Start Ollama Server

ollama serve

This starts the Ollama HTTP server on localhost:11434 (or your configured port).

2. Pull a Model

ollama pull mistral          # ~4GB
ollama pull neural-chat      # ~5GB
ollama pull orca-mini        # ~1.5GB

List available models:

ollama list

3. Create a Rensei Profile

  1. In Settings → Integrations, ensure you have a local capacity pool configured.
  2. In Settings → Model Profiles, click New Profile.
  3. Fill in:
    • Name: e.g., "local-mistral"
    • Provider: Ollama
    • Auth mode: local
    • Endpoint: http://localhost:11434 (or your server URL)
    • Model ID: mistral (or the exact name from ollama list)
  4. Click Test Connection - Rensei will call GET /api/tags to list your models.
  5. Click Create.

4. Dispatch

In a workflow LLM node:

nodeId: local_inference
nodeType: action/llm.inference
config:
  profileId: prof_local_mistral
  systemPrompt: "You are a helpful assistant."

Or dispatch from CLI:

rensei dispatch \
  --project my-project \
  --profile prof_local_mistral \
  "What is the capital of France?"

Available Models

Popular Ollama models:

ModelSizeSpeedQualityUse case
mistral4GB⚡⚡⚡ Fast⭐⭐⭐ GoodGeneral purpose; recommended default
neural-chat5GB⚡⚡ Medium⭐⭐⭐⭐ ExcellentChat and reasoning
orca-mini1.5GB⚡⚡⚡⚡ Very fast⭐⭐ AdequateLightweight; mobile/edge
llama27GB⚡⚡ Medium⭐⭐⭐⭐ ExcellentCode + text; versatile
code-llama6GB⚡⚡⚡ Fast⭐⭐⭐⭐ ExcellentCode generation specialist
dolphin-mixtral26GB⚡ Slow⭐⭐⭐⭐⭐ ExcellentAdvanced reasoning

Run ollama list to see everything available.

Configuration

Ollama shares the openai config namespace (since it mimics OpenAI's API):

{
  "providerConfig": {
    "openai": {
      "temperature": 0.7,
      "topP": 0.9,
      "contextWindow": 4096
    }
  }
}

Supported fields:

FieldTypeDescription
temperaturenumberRandomness (0=deterministic, 1=creative; default 0.7)
topPnumberNucleus sampling (0-1; default 0.9)
topKnumberTop-K sampling
contextWindownumberMax context tokens (must not exceed model's max)
numPredictnumberMax output tokens (optional)
repeatPenaltynumberPenalize repetition (0-2; default 1.0)

Example: Code Generation Locally

# Pull the code-specific model
ollama pull code-llama

# Create a Rensei profile
rensei profile create \
  --name "local-code-llama" \
  --provider ollama \
  --model-id "code-llama" \
  --auth-mode local \
  --scope project \
  --project-id my-project

# Dispatch code generation
rensei dispatch \
  --project my-project \
  --profile prof_local_code_llama \
  "Write a Python function to compute Fibonacci numbers"

Network Setup

By default, Ollama listens on localhost:11434. To expose it over the network:

On macOS / Linux

Bind to 0.0.0.0:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Then from another machine:

# Rensei profile endpoint
endpoint: "http://192.168.1.100:11434"

Security Note

Ollama has no built-in authentication. Only expose it on trusted networks. For security:

  • Don't expose on the public internet.
  • Use a private VPN or corporate network.
  • Place behind an authenticating reverse proxy if exposed remotely.

Resource Management

Ollama will consume CPU/GPU when models run. To limit resource usage:

Limit CPU threads:

OLLAMA_NUM_THREAD=8 ollama serve

Limit model memory (GPU):

# Load only 2GB of model weights into VRAM
OLLAMA_NUM_GPU=2 ollama serve

Unload model after timeout:

# Unload after 5 minutes of inactivity
OLLAMA_KEEP_ALIVE=5m ollama serve

Troubleshooting

"Connection refused at http://localhost:11434"

Ollama server is not running. Start it:

ollama serve

"Model not found: mistral"

The model is not downloaded. Pull it first:

ollama pull mistral

"Out of memory"

Your machine doesn't have enough RAM for the model. Either:

  • Switch to a smaller model (orca-mini instead of mistral).
  • Reduce OLLAMA_NUM_THREAD or OLLAMA_NUM_GPU.
  • Add more RAM.

Inference is very slow

Running without GPU acceleration. Check if you have a supported GPU (NVIDIA CUDA, Apple Metal) and ensure Ollama is configured to use it. Models run 5-20× faster with GPU.

"Endpoint validation failed"

Endpoint URL is wrong or Ollama is not responding. Verify:

curl http://localhost:11434/api/tags

Should return a JSON list of models.

Pricing & Cost

Ollama incurs zero platform cost beyond your machine's electricity. Compute is entirely local.

Cost events still emit for audit purposes, but you see $0.00 usage since there's no provider billing.

Further Reading

On this page