fitmyllm CLI

Run the right LLM locally. Detects your GPU, recommends the best model, downloads it, and starts chatting — all from your terminal.

pip install fitmyllm (PyPI)

Python 3.10+ • Works on Linux, macOS, Windows (WSL) • Free & open source

▸ GET STARTED IN 30 SECONDS
1. pip install fitmyllm
2. fitmyllm setup (paste your free API key)
3. fitmyllm (opens the interactive TUI)
▸ FEATURES
Quick Run: Detect GPU, pick best model, download GGUF, start server, chat. Zero config.
Find Models: 18+ filters: size, family, quant, speed, capabilities, benchmark minimums.
Tier List: Models and GPUs ranked S-F with scores and cloud alternatives.
Live tok/s: Real-time speed counter during chat. See exactly how fast your model runs.
Benchmarks: Run standardized speed tests. Compare your results with the community.
GGUF Downloads: Download models from HuggingFace by quant level. No Ollama required.
Compare: Side-by-side comparison of up to 4 models with all metrics.
Enterprise: 10-tab deployment analysis: TCO, scaling, SLA, GPU matrix.
▸ COMMANDS
Command                     Description
fitmyllm                    Interactive TUI with 9 modes
fitmyllm chat <model>       Chat directly with a model
fitmyllm benchmark          Run a speed benchmark
fitmyllm my-benchmarks      View your submitted results
fitmyllm telemetry on|off   Toggle anonymous speed sharing
fitmyllm setup              Save your API key
▸ SUPPORTED ENGINES

Auto-detects running backends. Works with any of these:

Ollama, llama-server, vLLM, LM Studio, KoboldCpp, Jan, Docker Model Runner, llama.cpp
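One plausible way to auto-detect running backends is to probe the local ports these servers commonly listen on. The sketch below shows that approach only; how fitmyllm actually detects engines is not documented here. The port numbers are the usual defaults for each project and are an assumption, since any of them can be reconfigured, and `detect_backends` is a hypothetical helper name.

```python
import socket

# Commonly used default ports (assumption; each server can be reconfigured).
DEFAULT_PORTS = {
    "Ollama": 11434,
    "llama-server": 8080,
    "vLLM": 8000,
    "LM Studio": 1234,
    "KoboldCpp": 5001,
    "Jan": 1337,
}


def port_is_open(host: str, port: int, timeout: float = 0.25) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(ports=DEFAULT_PORTS, host="127.0.0.1", probe=port_is_open):
    """List backend names whose default port accepts connections.

    `probe` is injectable so the logic can be exercised without real servers.
    """
    return [name for name, port in ports.items() if probe(host, port)]


if __name__ == "__main__":
    print(detect_backends())
```

An open port only shows that something is listening. A more careful check would also send a request to the server's HTTP API and confirm the response.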
▸ COMMUNITY SPEED TELEMETRY

Opt in to anonymously share tok/s and TTFT while you chat. This data improves speed predictions for your GPU and helps other users with similar hardware. No message content is ever sent — only model name, GPU, and speed metrics.

fitmyllm telemetry on
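As an illustration of the metrics described above, this sketch derives TTFT and tok/s from token arrival timestamps and packs them into a record that holds only the model name, GPU, and speed figures. The `SpeedSample` type, its field names, and the `measure` helper are hypothetical. The real payload format is not specified here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SpeedSample:
    """Hypothetical telemetry record: no prompt or response text, by construction."""
    model: str
    gpu: str
    ttft_s: float     # time to first token, in seconds
    tok_per_s: float  # decode speed after the first token


def measure(model: str, gpu: str, request_time: float, token_times: list[float]) -> SpeedSample:
    """Compute TTFT and tok/s from the request time and per-token arrival times."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times[0] - request_time
    decode_window = token_times[-1] - token_times[0]
    # Tokens generated after the first one, divided by the time they took.
    tok_per_s = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return SpeedSample(model, gpu, round(ttft, 3), round(tok_per_s, 2))


if __name__ == "__main__":
    sample = measure("example-7b-q4", "Example GPU", 0.0, [1.0, 1.5, 2.0, 2.5])
    print(asdict(sample))
```

Keeping message text out of the record type itself, rather than filtering it later, makes the "no message content" property easy to verify.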

Get your free API key to unlock all features.