Deep Dive · 7 min read · 2026-03-05

Ollama Setup Guide: From Zero to Running AI in 10 Minutes

The no-bullshit guide to running your first local LLM. Install Ollama, pick a model, and start chatting — with zero cloud dependencies.

What is Ollama?

Ollama is the easiest way to run AI models on your own computer. One command to install, one command to run any model. No Python environment, no Docker, no config files.

It wraps llama.cpp (one of the fastest open-source inference engines) with a simple CLI and an OpenAI-compatible API. Most people on r/LocalLLaMA use Ollama as their daily driver.

Install (30 seconds)

macOS/Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download from ollama.com/download

That's it. Ollama auto-detects your GPU (NVIDIA, AMD, Apple Silicon) and configures everything.
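To confirm the install worked, check the CLI and the background server (a quick sanity check; the server starts automatically after install):

```shell
# Confirm the CLI is on your PATH and see which version you got:
ollama --version

# The Ollama server listens on localhost:11434 by default; it should
# reply with "Ollama is running":
curl -s http://localhost:11434/
```

If the curl check fails, start the server manually with `ollama serve`.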

Pick Your First Model

Based on your VRAM:

VRAM      Best starter model        Command
4-6 GB    Phi-4 Mini 3.8B           ollama run phi4-mini
8 GB      Qwen3 8B                  ollama run qwen3:8b
12 GB     Qwen3 14B                 ollama run qwen3:14b
16 GB     Mistral Small 24B (Q4)    ollama run mistral-small
24 GB     Qwen 2.5 Coder 32B        ollama run qwen2.5-coder:32b

Don't know your VRAM? Use FitMyLLM — it auto-detects your GPU and recommends the best model.
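The table above boils down to a simple lookup. As a sketch, here is a toy helper (not part of Ollama; the function name is made up) that maps VRAM in GB to a starter model tag:

```shell
# Toy helper: pick a starter model tag from the table above,
# given your VRAM in whole GB.
pick_model() {
  vram=$1
  if   [ "$vram" -ge 24 ]; then echo "qwen2.5-coder:32b"
  elif [ "$vram" -ge 16 ]; then echo "mistral-small"
  elif [ "$vram" -ge 12 ]; then echo "qwen3:14b"
  elif [ "$vram" -ge 8  ]; then echo "qwen3:8b"
  else                          echo "phi4-mini"
  fi
}

pick_model 8    # prints qwen3:8b
# Then: ollama run "$(pick_model 8)"
```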

Add a Web UI (5 minutes)

Ollama runs in the terminal by default. For a ChatGPT-like interface:

Open WebUI (the most popular option):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000. You now have a fully private ChatGPT running on your machine.
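One caveat with the one-liner above: your chats live inside the container and disappear if you remove it. A common variant (a sketch; the volume and container names here are arbitrary) mounts a named volume so data survives upgrades:

```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```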

No Docker? Try LM Studio — a desktop app with built-in GUI. Or Jan — another great GUI option.

Essential Commands

  • ollama list — see downloaded models
  • ollama run qwen3:8b — chat with a model
  • ollama pull llama3.3:70b — download without running
  • ollama rm model-name — delete a model
  • ollama ps — see what's running (and if it's using GPU)
  • ollama run model --verbose — show speed stats

API access: Ollama runs an OpenAI-compatible server on localhost:11434. Any tool that works with OpenAI API works with Ollama — just change the base URL.
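For example, a raw curl call against the OpenAI-compatible chat endpoint (this assumes qwen3:8b is already pulled; swap in any tag from `ollama list`) looks like:

```shell
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

Point any OpenAI SDK at base URL `http://localhost:11434/v1` and it works the same way.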

