Local LLM Setup Guide
Run AI models locally for privacy and cost savings using Ollama.
Why Use Local LLMs?
Advantages
- Privacy: Code never leaves your machine
- Cost: Free after initial download
- No API Keys: No account setup needed
- Offline: Works without internet
- Unlimited: No rate limits
Trade-offs
- Performance: Slower than cloud APIs
- Quality: May not match GPT-4/Claude level
- Hardware: Requires sufficient RAM
Prerequisites
Hardware Requirements
| Model Size | RAM Required | Recommended Hardware |
|---|---|---|
| 3B params | 6 GB | Older MacBooks, 8GB RAM systems |
| 7B params | 10 GB | Modern laptops, 16GB RAM recommended |
| 14B+ params | 20+ GB | Desktop computers, M1/M2/M3 Macs |
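If you're not sure how much memory your machine has, you can check from a terminal before picking a model size (these are standard OS utilities, not part of Ollama):

```bash
# Linux: total and available memory in human-readable units
free -h

# macOS: total physical memory in bytes
sysctl -n hw.memsize
```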
Supported Operating Systems
- macOS: 11+ (Big Sur or later)
- Linux: All modern distributions
- Windows: WSL2 or native (experimental)
Installing Ollama
macOS
```bash
# Download and install
curl -fsSL https://ollama.ai/install.sh | sh
```

Or download the installer from ollama.ai.
Linux
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

Windows
- Download Ollama for Windows from ollama.ai
- Run the installer
- Start Ollama from Start Menu
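Whichever platform you installed on, you can confirm the install from a terminal. `ollama serve` is only needed if the background service or menu-bar app isn't already running:

```bash
# Print the installed Ollama version
ollama --version

# Start the Ollama server in the foreground (skip if it's already running)
ollama serve
```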
Recommended Models
Qwen2.5-Coder (Recommended)
Best for: Coding, development, technical tasks
```bash
# Download 7B version (10 GB RAM)
ollama pull qwen2.5-coder:7b

# Download 3B version (6 GB RAM)
ollama pull qwen2.5-coder:3b
```

Llama 3.2
Best for: General purpose, chat
```bash
# 3B version
ollama pull llama3.2:3b

# Latest Llama 3.2
ollama pull llama3.2
```

DeepSeek-Coder
Best for: Code generation, debugging
```bash
ollama pull deepseek-coder:6.7b
```
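Once a model has finished downloading, you can sanity-check it straight from the terminal before configuring IfAI. The example below assumes you pulled `qwen2.5-coder:7b`; substitute whichever model you downloaded:

```bash
# Send a one-off prompt; Ollama prints the reply and exits
ollama run qwen2.5-coder:7b "Write a Python function that reverses a string"
```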
Configuring IfAI

Step 1: Start Ollama
Make sure Ollama is running:
```bash
# Check Ollama status
ollama list
# Should show available models
```
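If `ollama list` works but IfAI still can't connect, you can also check the local HTTP API directly. This is the endpoint editor integrations such as IfAI are assumed to talk to, on the default port 11434:

```bash
# Lists your downloaded models as JSON if the Ollama server is reachable
curl http://localhost:11434/api/tags
```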
Step 2: Configure in IfAI

- Open IfAI Settings (`Cmd+,`)
- Go to AI Provider > Ollama
- Select a model from the dropdown:
  - `qwen2.5-coder:7b`
  - `qwen2.5-coder:3b`
  - `llama3.2`
  - Or any model you've downloaded
Step 3: Test Connection
- Open AI Chat (`Cmd+K`)
- Send a message: `Hello! Can you hear me?`
- You should receive a response
Performance Optimization
Model Selection
- For coding tasks: `qwen2.5-coder:7b` > `deepseek-coder:6.7b` > `llama3.2`
- For general chat: `llama3.2` > `qwen2.5-coder:7b`
- For low-memory systems: `qwen2.5-coder:3b` > `phi3`
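If you're unsure which of your downloaded models to prefer, `ollama show` prints a model's details (parameter count, context length, quantization), which makes the comparison concrete:

```bash
# Inspect a downloaded model's architecture, parameters, and context length
ollama show qwen2.5-coder:7b
```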
Reducing Memory Usage

- Use smaller models: 3B instead of 7B
- Reduce context: Limit to 2048 tokens
- Adjust num_ctx: In settings, reduce context window
- Quantization: Use Q4 quantized models
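To see what is actually occupying memory right now, `ollama ps` lists the models currently loaded, their size, and whether they are running on CPU or GPU:

```bash
# Show currently loaded models and their memory footprint
ollama ps
```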
Improving Speed
- Use GPU acceleration: Ollama auto-detects GPU
- Reduce context size: Fewer tokens = faster
- Close other apps: Free up RAM
- Use smaller model: 3B vs 7B
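On machines with an NVIDIA GPU, a quick way to confirm the GPU is actually being used is to watch utilization while a prompt is generating (this is a generic NVIDIA tool, not part of Ollama; Apple Silicon Macs use Metal automatically):

```bash
# Refresh GPU utilization every second while Ollama is generating (NVIDIA only)
nvidia-smi -l 1
```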
Advanced Configuration
Custom Model Parameters
In IfAI Settings > Ollama > Advanced:
```json
{
  "num_ctx": 2048,
  "temperature": 0.7,
  "top_p": 0.9,
  "num_predict": 512
}
```
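These keys correspond to Ollama's standard generation parameters. If you want the same limits to apply anywhere the model is used, not just from IfAI, you can also bake them into a derived model with a Modelfile; the name `qwen2.5-coder-small` below is just an example:

```bash
# Write a Modelfile that derives from an existing model with a smaller context window
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER num_ctx 2048
PARAMETER temperature 0.7
EOF

# Build the derived model, then run it like any other
ollama create qwen2.5-coder-small -f Modelfile
ollama run qwen2.5-coder-small
```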
Multiple Models

Switch between models in Settings:
- Use `qwen2.5-coder` for coding
- Use `llama3.2` for explanations
- Use `deepseek-coder` for debugging
Troubleshooting
"Ollama not detected"
Solutions:
- Verify Ollama is running: `ollama list`
- Restart IfAI
- Check that Ollama is in your PATH
- Set the Ollama URL manually in settings: `http://localhost:11434`
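A quick way to tell whether the problem is Ollama itself or the IfAI connection is to hit the server's root endpoint at the URL above:

```bash
# Prints "Ollama is running" if the server is up on the default port
curl http://localhost:11434
```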
"Out of memory"
Solutions:
- Use smaller model (3B instead of 7B)
- Close other applications
- Reduce context window
- Restart Ollama
Very slow responses
Solutions:
- Check if GPU is being used
- Use smaller model
- Reduce context size
- Try different quantization
Model download fails
Solutions:
- Check internet connection
- Try again later (server may be busy)
- Download manually:

```bash
ollama pull qwen2.5-coder:7b
```
Hybrid Mode
Configure IfAI to use both local and cloud models:
Smart routing:
- Local: Simple tasks, quick questions
- Cloud: Complex tasks, large codebases
Setup:
- Settings > AI Provider > Hybrid
- Set local as primary
- Configure cloud as fallback
Next Steps
- AI Chat Guide - Using AI with local models
- Settings Reference - Advanced configuration
- Basic Usage - Core features