Deep Dive · 7 min read · 2026-03-05

Ollama Setup Guide: From Zero to Running AI in 10 Minutes

The no-bullshit guide to running your first local LLM. Install Ollama, pick a model, and start chatting — with zero cloud dependencies.

What is Ollama?

Ollama is the easiest way to run AI models on your own computer. One command to install, one command to run any model. No Python environment, no Docker, no config files.

It wraps llama.cpp (one of the fastest open-source inference engines) with a simple CLI and an OpenAI-compatible API. Most people on r/LocalLLaMA use Ollama as their daily driver.

Install (30 seconds)

macOS/Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download from ollama.com/download

That's it. Ollama auto-detects your GPU (NVIDIA, AMD, Apple Silicon) and configures everything.
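To confirm the install worked, check the CLI and the background server (a quick sanity check; the server starts automatically after install):

```shell
# Confirm the CLI is on your PATH and see which version you got:
ollama --version

# The Ollama server listens on localhost:11434 by default; it should
# reply with "Ollama is running":
curl -s http://localhost:11434/
```

If the curl check fails, start the server manually with `ollama serve`.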

Pick Your First Model

Based on your VRAM:

VRAM      Best starter model        Command
4-6 GB    Phi-4 Mini 3.8B           ollama run phi4-mini
8 GB      Qwen3 8B                  ollama run qwen3:8b
12 GB     Qwen3 14B                 ollama run qwen3:14b
16 GB     Mistral Small 24B (Q4)    ollama run mistral-small
24 GB     Qwen 2.5 Coder 32B        ollama run qwen2.5-coder:32b

Don't know your VRAM? Use FitMyLLM — it auto-detects your GPU and recommends the best model.
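The table above boils down to a simple lookup. As a sketch, here is a toy helper (not part of Ollama; the function name is made up) that maps VRAM in GB to a starter model tag:

```shell
# Toy helper: pick a starter model tag from the table above,
# given your VRAM in whole GB.
pick_model() {
  vram=$1
  if   [ "$vram" -ge 24 ]; then echo "qwen2.5-coder:32b"
  elif [ "$vram" -ge 16 ]; then echo "mistral-small"
  elif [ "$vram" -ge 12 ]; then echo "qwen3:14b"
  elif [ "$vram" -ge 8  ]; then echo "qwen3:8b"
  else                          echo "phi4-mini"
  fi
}

pick_model 8    # prints qwen3:8b
# Then: ollama run "$(pick_model 8)"
```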

Add a Web UI (5 minutes)

Ollama runs in the terminal by default. For a ChatGPT-like interface:

Open WebUI (the most popular option):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000. You now have a fully private ChatGPT running on your machine.
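One caveat with the one-liner above: your chats live inside the container and disappear if you remove it. A common variant (a sketch; the volume and container names here are arbitrary) mounts a named volume so data survives upgrades:

```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```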

No Docker? Try LM Studio — a desktop app with built-in GUI. Or Jan — another great GUI option.

Essential Commands

  • ollama list — see downloaded models
  • ollama run qwen3:8b — chat with a model
  • ollama pull llama3.3:70b — download without running
  • ollama rm model-name — delete a model
  • ollama ps — see what's running (and if it's using GPU)
  • ollama run model --verbose — show speed stats

API access: Ollama runs an OpenAI-compatible server on localhost:11434. Any tool that works with OpenAI API works with Ollama — just change the base URL.
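For example, a raw curl call against the OpenAI-compatible chat endpoint (this assumes qwen3:8b is already pulled; swap in any tag from `ollama list`) looks like:

```shell
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```

Point any OpenAI SDK at base URL `http://localhost:11434/v1` and it works the same way.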

