56 open-weight models — Qwen, Llama, DeepSeek, Mistral, Gemma and more — all served as 2-bit (Q2_K) compressed weights. Prices are per 1M tokens. Call any of them at /v1/chat/completions with your API key.
Quality: independent benchmarks (Unsloth Dynamic 2.0, 5-shot MMLU) show 2-bit Q2_K_XL retains ≈96% of full-precision accuracy (e.g. Gemma-3-27B: 68.7% vs 71.5%), with perplexity within ~2%. A deliberate cost/quality trade-off — great for high-volume and cost-sensitive workloads.
Tiny · ≤2B8
Model
Input
Output
Tier
smollm2-360m-q2
$0.20
$0.60
L4 · standard
qwen3-0.6b-q2
$0.20
$0.60
L4 · standard
gemma-3-1b-q2
$0.20
$0.60
L4 · standard
llama-3.2-1b-q2
$0.20
$0.60
L4 · standard
qwen3-1.7b-q2
$0.20
$0.60
L4 · standard
smollm2-1.7b-q2
$0.20
$0.60
L4 · standard
granite-3.2-2b-q2
$0.20
$0.60
L4 · standard
qwen3.5-2b-q2
$0.20
$0.60
L4 · standard
Small · 3–9B23
Model
Input
Output
Tier
llama-3.2-3b-q2
$0.50
$1.50
L4 · standard
phi-4-mini-q2
$0.50
$1.50
L4 · standard
gemma-3-4b-q2
$0.50
$1.50
L4 · standard
minicpm-3-4b-q2
$0.50
$1.50
L4 · standard
qwen3-4b-q2
$0.50
$1.50
L4 · standard
qwen3.5-4b-q2
$0.50
$1.50
L4 · standard
yi-1.5-6b-q2
$0.50
$1.50
L4 · standard
deepseek-r1-distill-qwen-7b-q2
$0.50
$1.50
L4 · standard
internlm-2.5-7b-q2
$0.50
$1.50
L4 · standard
marco-o1-7b-q2
$0.50
$1.50
L4 · standard
mathstral-7b-q2
$0.50
$1.50
L4 · standard
mistral-7b-v0.3-q2
$0.50
$1.50
L4 · standard
olmo-2-7b-q2
$0.50
$1.50
L4 · standard
olmoe-7b-q2
$0.50
$1.50
L4 · standard
granite-3.2-8b-q2
$0.50
$1.50
L4 · standard
llama-3.1-8b-q2
$0.50
$1.50
L4 · standard
ministral-8b-q2
$0.50
$1.50
L4 · standard
qwen3-8b-q2
$0.50
$1.50
L4 · standard
gemma-2-9b-q2
$0.50
$1.50
L4 · standard
glm-4-9b-q2
$0.50
$1.50
L4 · standard
qwen3.5-9b-q2
$0.50
$1.50
L4 · standard
yi-1.5-9b-q2
$0.50
$1.50
L4 · standard
yi-coder-9b-q2
$0.50
$1.50
L4 · standard
Mid · 10–20B8
Model
Input
Output
Tier
gemma-3-12b-q2
$1.00
$3.00
L4 · standard
mistral-nemo-12b-q2
$1.00
$3.00
L4 · standard
olmo-2-13b-q2
$1.00
$3.00
L4 · standard
deepseek-r1-distill-qwen-14b-q2
$1.00
$3.00
L4 · standard
phi-4-14b-q2
$1.00
$3.00
L4 · standard
qwen3-14b-q2
$1.00
$3.00
L4 · standard
deepseek-coder-v2-lite-16b-q2
$1.00
$3.00
L4 · standard
internlm-2.5-20b-q2
$1.00
$3.00
L4 · standard
Large · 22–40B12
Model
Input
Output
Tier
codestral-22b-q2
$1.60
$4.80
L4 · standard
mistral-small-24b-q2
$1.60
$4.80
L4 · standard
gemma-2-27b-q2
$1.60
$4.80
L4 · standard
gemma-3-27b-q2
$1.60
$4.80
L4 · standard
qwen3.6-27b-q2
$1.60
$4.80
L4 · standard
qwen3-coder-30b-a3b-q2
$1.60
$4.80
L4 · standard
deepseek-r1-distill-qwen-32b-q2
$1.60
$4.80
L4 · standard
olmo-2-32b-q2
$1.60
$4.80
L4 · standard
qwen3-32b-q2
$1.60
$4.80
L4 · standard
qwq-32b-q2
$1.60
$4.80
L4 · standard
yi-1.5-34b-q2
$1.60
$4.80
L4 · standard
qwen3.6-35b-a3b-q2
$1.60
$4.80
L4 · standard
XL · 70B+5
Model
Input
Output
Tier
mixtral-8x7b-q2
$3.00
$9.00
L4 · standard
deepseek-r1-distill-llama-70b-q2
$3.00
$9.00
A100 · premium
llama-3.3-70b-q2
$3.00
$9.00
A100 · premium
qwen3.5-122b-a10b-q2
$3.00
$9.00
A100 · premium
mixtral-8x22b-q2
$3.00
$9.00
A100 · premium
All models are 2-bit (Q2_K-family) quantized — a deliberate cost/quality trade-off, not the vendors' full-precision endpoints. The live list is always at GET /v1/models. See the docs to start.