fitmyllm CLI
Run the right LLM locally. Detects your GPU, recommends the best model, downloads it, and starts chatting — all from your terminal.
pip install fitmyllm
PyPI • Python 3.10+ • Works on Linux, macOS, Windows (WSL) • Free & open source
▸ GET STARTED IN 30 SECONDS
▸ FEATURES
Quick Run: Detect GPU, pick best model, download GGUF, start server, chat. Zero config.
Find Models: 18+ filters: size, family, quant, speed, capabilities, benchmark minimums.
Tier List: Models and GPUs ranked S-F with scores and cloud alternatives.
Live tok/s: Real-time speed counter during chat. See exactly how fast your model runs.
Benchmarks: Run standardized speed tests. Compare your results with the community.
GGUF Downloads: Download models from HuggingFace by quant level. No Ollama required.
Compare: Side-by-side comparison of up to 4 models with all metrics.
Enterprise: 10-tab deployment analysis: TCO, scaling, SLA, GPU matrix.
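To give a feel for the kind of check behind "pick best model", here is a rough sketch of estimating whether a quantized model fits in GPU memory. It is not fitmyllm's actual logic: the function names, the flat `overhead_gb` allowance for KV cache and runtime buffers, and its 1.5 GB default are all assumptions made for illustration.

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough memory estimate for a quantized model, in decimal GB.

    Weights take params * bits / 8 bytes. overhead_gb is an assumed flat
    allowance for KV cache and runtime buffers, not a measured value.
    """
    # 1e9 parameters * (bits / 8) bytes each = that many GB of weights.
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billions: float,
         bits_per_weight: float,
         vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """Return True if the estimate is within the available VRAM."""
    return estimate_vram_gb(params_billions, bits_per_weight, overhead_gb) <= vram_gb


if __name__ == "__main__":
    # Hypothetical example: a 7B model at about 4.5 bits per weight.
    print(estimate_vram_gb(7, 4.5))
    print(fits(7, 4.5, vram_gb=8))
```

Real memory use also depends on context length and the inference engine, so an estimate like this is only a first filter.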
▸ COMMANDS
| Command | Description |
|---|---|
| `fitmyllm` | Interactive TUI with 9 modes |
| `fitmyllm chat <model>` | Chat directly with a model |
| `fitmyllm benchmark` | Run a speed benchmark |
| `fitmyllm my-benchmarks` | View your submitted results |
| `fitmyllm telemetry on\|off` | Toggle anonymous speed sharing |
| `fitmyllm setup` | Save your API key |
▸ SUPPORTED ENGINES
Auto-detects running backends. Works with any of these:
Ollama, llama-server, vLLM, LM Studio, KoboldCpp, Jan, Docker Model Runner, llama.cpp
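One plausible way to auto-detect running backends is to probe their usual local ports. The sketch below illustrates that idea only and is not how fitmyllm is known to work: the `DEFAULT_PORTS` table lists commonly used defaults for a few of the engines (an assumption, since any of them can be reconfigured), and `port_is_open` and `detect_backends` are hypothetical helper names.

```python
import socket
from typing import Callable, Dict, List

# Commonly used default ports. These are assumptions for illustration and
# are not taken from fitmyllm; users can change them in each engine.
DEFAULT_PORTS: Dict[str, int] = {
    "Ollama": 11434,
    "llama-server": 8080,
    "vLLM": 8000,
    "LM Studio": 1234,
    "KoboldCpp": 5001,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(probe: Callable[[int], bool] = port_is_open) -> List[str]:
    """List the engines whose default port answers the probe."""
    return [name for name, port in DEFAULT_PORTS.items() if probe(port)]


if __name__ == "__main__":
    print(detect_backends())
```

An open port only shows that some service is listening there. A real detector would likely follow up with an HTTP request to confirm which engine answered.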
▸ COMMUNITY SPEED TELEMETRY
Opt in to anonymously share tok/s and TTFT while you chat. This data improves speed predictions for your GPU and helps other users with similar hardware. No message content is ever sent — only model name, GPU, and speed metrics.
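For reference, the two speed metrics can be derived from token arrival times. This is a minimal sketch under assumptions: `SpeedSample` and `measure_speed` are hypothetical names, the record shape is illustrative rather than fitmyllm's real payload, and tok/s is computed here over the decode phase only (after the first token), which is one common convention among several.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SpeedSample:
    """Illustrative record: model name, GPU, and speed metrics only."""
    model: str
    gpu: str
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # decode throughput


def measure_speed(model: str, gpu: str,
                  request_time: float,
                  token_times: Sequence[float]) -> SpeedSample:
    """Compute TTFT and tok/s from per-token arrival timestamps (seconds)."""
    if not token_times:
        raise ValueError("no tokens were generated")
    ttft = token_times[0] - request_time
    decode_window = token_times[-1] - token_times[0]
    # Tokens after the first, divided by the time spent producing them.
    tok_s = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return SpeedSample(model, gpu, ttft, tok_s)


if __name__ == "__main__":
    sample = measure_speed("example-model", "example-gpu",
                           request_time=0.0,
                           token_times=[0.5, 1.0, 1.5, 2.0, 2.5])
    print(sample)
```

The record carries no prompt or response text, matching the description above.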
fitmyllm telemetry on
Get your free API key to unlock all features.