Local LLM Setup Guide

Run AI models locally for privacy and cost savings using Ollama.

Why Use Local LLMs?

Advantages

  • Privacy: Code never leaves your machine
  • Cost: Free after initial download
  • No API Keys: No account setup needed
  • Offline: Works without internet
  • Unlimited: No rate limits

Trade-offs

  • Performance: Slower than cloud APIs
  • Quality: May not match GPT-4/Claude level
  • Hardware: Requires sufficient RAM

Prerequisites

Hardware Requirements

Model Size     RAM Required   Recommended
3B params      6 GB           Older MacBooks, 8 GB RAM systems
7B params      10 GB          Modern laptops, 16 GB RAM recommended
14B+ params    20+ GB         Desktop computers, M1/M2/M3 Macs
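
If you're not sure how much RAM a machine has, these commands report it (macOS and Linux shown):

bash
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable memory summary
free -h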

Supported Operating Systems

  • macOS: 11+ (Big Sur or later)
  • Linux: All modern distributions
  • Windows: WSL2 or native (experimental)

Installing Ollama

macOS

bash
# Download and install
curl -fsSL https://ollama.ai/install.sh | sh

Or download from ollama.ai
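
After installing, a quick check that the CLI is available (this assumes the installer put ollama on your PATH, which is the default):

bash
# Print the installed Ollama version
ollama --version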

Linux

bash
curl -fsSL https://ollama.ai/install.sh | sh

Windows

  1. Download Ollama for Windows from ollama.ai
  2. Run the installer
  3. Start Ollama from Start Menu

Recommended Models

Qwen2.5-Coder

Best for: Coding, development, technical tasks

bash
# Download 7B version (10GB RAM)
ollama pull qwen2.5-coder:7b

# Download 3B version (6GB RAM)
ollama pull qwen2.5-coder:3b
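
Once a model is pulled, you can sanity-check it straight from the terminal before wiring it into IfAI (the prompt here is just an example):

bash
# One-off prompt; the model is loaded on first use, so the first reply is slower
ollama run qwen2.5-coder:7b "Write a Python function that checks whether a number is prime."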

Llama 3.2

Best for: General purpose, chat

bash
# 3B version
ollama pull llama3.2:3b

# Latest Llama
ollama pull llama3.2

DeepSeek-Coder

Best for: Code generation, debugging

bash
ollama pull deepseek-coder:6.7b

Configuring IfAI

Step 1: Start Ollama

Make sure Ollama is running:

bash
# Check Ollama status
ollama list

# Should show available models
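
On most systems the installer sets Ollama up as a background service. If ollama list fails with a connection error, you can start the server manually and confirm it answers on the default port (11434):

bash
# Start the Ollama server in the foreground (keep this terminal open)
ollama serve

# In another terminal, list installed models via the HTTP API
curl http://localhost:11434/api/tags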

Step 2: Configure in IfAI

  1. Open IfAI Settings (Cmd+,)
  2. Go to AI Provider > Ollama
  3. Select model from dropdown:
    • qwen2.5-coder:7b
    • qwen2.5-coder:3b
    • llama3.2
    • Or any model you've downloaded

Step 3: Test Connection

  1. Open AI Chat (Cmd+K)
  2. Send a message:
    Hello! Can you hear me?
  3. You should receive a response (if nothing comes back, see the check below)
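
If the chat stays silent, you can confirm the Ollama server itself responds outside IfAI. This is a minimal check against the default local endpoint (swap in any model you have pulled):

bash
# Send one chat message to the local Ollama API (non-streaming)
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Hello! Can you hear me?" }],
  "stream": false
}'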

Performance Optimization

Model Selection

For coding tasks (best first):

qwen2.5-coder:7b > deepseek-coder:6.7b > llama3.2

For general chat (best first):

llama3.2 > qwen2.5-coder:7b

For low-memory systems (best first):

qwen2.5-coder:3b > phi3

Reducing Memory Usage

  1. Use smaller models: 3B instead of 7B
  2. Reduce context: Limit to 2048 tokens
  3. Adjust num_ctx: In settings, reduce the context window (see the sketch after this list)
  4. Quantization: Use Q4 quantized models
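
One way to make the smaller context window the default, rather than setting it per request, is to derive a variant with a Modelfile. The name qwen2.5-coder-2k below is just an example:

bash
# Write a Modelfile that caps the context window at 2048 tokens
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 2048
EOF

# Build the derived model, then select it in IfAI like any other
ollama create qwen2.5-coder-2k -f Modelfile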

Improving Speed

  1. Use GPU acceleration: Ollama auto-detects a supported GPU (see the check after this list)
  2. Reduce context size: Fewer tokens = faster
  3. Close other apps: Free up RAM
  4. Use smaller model: 3B vs 7B
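
To confirm the GPU is actually being used, ollama ps reports the CPU/GPU split for any model currently loaded (send at least one prompt first so a model is in memory):

bash
# The PROCESSOR column shows how the model is split between CPU and GPU
ollama ps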

Advanced Configuration

Custom Model Parameters

In IfAI Settings > Ollama > Advanced:

json
{
  "num_ctx": 2048,
  "temperature": 0.7,
  "top_p": 0.9,
  "num_predict": 512
}
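
If you want to check what these parameters do independently of IfAI, the same fields can be passed straight to Ollama through the options object of its generate API (the model name here is just an example):

bash
# Same parameters, sent directly to the local Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false,
  "options": { "num_ctx": 2048, "temperature": 0.7, "top_p": 0.9, "num_predict": 512 }
}'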

Multiple Models

Switch between models in Settings:

  • Use qwen2.5-coder for coding
  • Use llama3.2 for explanations and general chat
  • Use deepseek-coder for debugging
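
If you plan to switch regularly, pull all three ahead of time; they only take disk space until a model is loaded:

bash
ollama pull qwen2.5-coder:7b
ollama pull llama3.2
ollama pull deepseek-coder:6.7b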

Troubleshooting

"Ollama not detected"

Solutions:

  1. Verify Ollama is running: ollama list
  2. Restart IfAI
  3. Check Ollama is in PATH
  4. Set the Ollama URL manually in settings: http://localhost:11434 (see the check below)
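
Before pointing IfAI at a manual URL, confirm the server is reachable there (default port shown):

bash
# A healthy server replies with: Ollama is running
curl http://localhost:11434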

"Out of memory"

Solutions:

  1. Use smaller model (3B instead of 7B)
  2. Close other applications
  3. Reduce context window
  4. Restart Ollama

Very slow responses

Solutions:

  1. Check if GPU is being used
  2. Use smaller model
  3. Reduce context size
  4. Try different quantization

Model download fails

Solutions:

  1. Check internet connection
  2. Try again later (server may be busy)
  3. Retry the pull directly from the terminal:
    bash
    ollama pull qwen2.5-coder:7b

Hybrid Mode

Configure IfAI to use both local and cloud:

Smart routing:

  • Local: Simple tasks, quick questions
  • Cloud: Complex tasks, large codebases

Setup:

  1. Settings > AI Provider > Hybrid
  2. Set local as primary
  3. Configure cloud as fallback

Next Steps
