fitmyllm CLI

Run the right LLM locally. Detects your GPU, recommends the best model, downloads it, and starts chatting — all from your terminal.

pip install fitmyllm

Available on PyPI.

Python 3.10+ • Works on Linux, macOS, Windows (WSL) • Free & open source

▸ GET STARTED IN 30 SECONDS

pip install fitmyllm

fitmyllm setup    # paste your free API key

fitmyllm          # opens the interactive TUI

▸ FEATURES

Quick Run

Detect GPU, pick best model, download GGUF, start server, chat. Zero config.

Find Models

18+ filters: size, family, quant, speed, capabilities, benchmark minimums.

Tier List

Models and GPUs ranked S-F with scores and cloud alternatives.

Live tok/s

Real-time speed counter during chat. See exactly how fast your model runs.

Benchmarks

Run standardized speed tests. Compare your results with the community.

GGUF Downloads

Download models from HuggingFace by quant level. No Ollama required.

Compare

Side-by-side comparison of up to 4 models with all metrics.

Enterprise

10-tab deployment analysis: TCO, scaling, SLA, GPU matrix.
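
The Live tok/s counter and the benchmark feature listed above both come down to timing a token stream. The sketch below shows one way such a measurement can be made. It is not fitmyllm's actual implementation: `SpeedMeter` and its methods are hypothetical names, and the choice to exclude the first token from the tok/s figure is an assumption.

```python
import time


class SpeedMeter:
    """Hypothetical helper: tracks TTFT and tokens/second for one streamed reply."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()   # moment the request was sent
        self._first = None      # arrival time of the first token
        self._last = None       # arrival time of the most recent token
        self.tokens = 0

    def on_token(self):
        """Call once for every token received from the stream."""
        now = self._clock()
        if self._first is None:
            self._first = now
        self._last = now
        self.tokens += 1

    @property
    def ttft(self):
        """Time to first token in seconds, or None if nothing has arrived yet."""
        return None if self._first is None else self._first - self._start

    @property
    def tok_per_s(self):
        """Decode speed: tokens after the first, divided by time since the first."""
        if self._first is None or self.tokens < 2:
            return 0.0
        elapsed = self._last - self._first
        return (self.tokens - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `on_token()` inside the streaming loop and reading `tok_per_s` after each token gives a live counter. Measuring from the first token onward keeps prompt-processing time out of the speed figure, and `ttft` reports that time separately.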

▸ COMMANDS

Command	Description
`fitmyllm`	Interactive TUI with 9 modes
`fitmyllm chat <model>`	Chat directly with a model
`fitmyllm benchmark`	Run a speed benchmark
`fitmyllm my-benchmarks`	View your submitted results
`fitmyllm telemetry on|off`	Toggle anonymous speed sharing
`fitmyllm setup`	Save your API key

▸ SUPPORTED ENGINES

Auto-detects running backends. Works with any of these:

Ollama • llama-server • vLLM • LM Studio • KoboldCpp • Jan • Docker Model Runner • llama.cpp
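
How fitmyllm detects running backends is not documented here. One simple approach is to probe the local ports these servers commonly listen on, as in the sketch below. The port numbers are the commonly used defaults for each project and are assumptions (any of them can be reconfigured), and `DEFAULT_PORTS`, `port_is_open`, and `detect_backends` are hypothetical names.

```python
import socket

# Commonly used default ports (assumptions for illustration; all are configurable).
DEFAULT_PORTS = {
    "Ollama": 11434,
    "llama-server": 8080,
    "vLLM": 8000,
    "LM Studio": 1234,
    "KoboldCpp": 5001,
    "Jan": 1337,
}


def port_is_open(host: str, port: int, timeout: float = 0.25) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(host: str = "127.0.0.1", probe=port_is_open) -> list[str]:
    """List the backends whose default port accepts connections on the given host."""
    return [name for name, port in DEFAULT_PORTS.items() if probe(host, port)]
```

An open port only suggests that a backend is running, since another program may be using the same port. A more careful check would follow up with a request to the server's API before trusting the result.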

▸ COMMUNITY SPEED TELEMETRY

Opt in to anonymously share tok/s and TTFT (time to first token) while you chat. This data improves speed predictions for your GPU and helps other users with similar hardware. No message content is ever sent: only the model name, GPU, and speed metrics.

fitmyllm telemetry on
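
To make the privacy claim concrete, here is a sketch of what a telemetry record limited to those fields could look like. The real wire format is not documented on this page, so the `SpeedSample` class, its field names, and the JSON encoding are all assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpeedSample:
    """Hypothetical telemetry record: no prompt or response text is included."""
    model: str        # model name
    gpu: str          # GPU name
    tok_per_s: float  # measured generation speed
    ttft_s: float     # time to first token, in seconds


def to_payload(sample: SpeedSample) -> str:
    """Serialize a sample to JSON; only the four fields above are ever emitted."""
    return json.dumps(asdict(sample), sort_keys=True)


# Placeholder values, not real measurements.
print(to_payload(SpeedSample("example-model-q4", "Example GPU", 42.5, 0.31)))
```

Using a fixed record type like this means chat content cannot end up in the payload by accident, because the type has no field to hold it.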

Get your free API key to unlock all features.