Models

8 open-weight models — Qwen, Llama, DeepSeek, Mistral, Gemma and more — all served as 2-bit (Q2_K) compressed weights. Prices are per 1M tokens. Call any of them at /v1/chat/completions with your API key.

Quality: independent benchmarks (Unsloth Dynamic 2.0, 5-shot MMLU) show 2-bit Q2_K_XL retains ≈96% of full-precision accuracy (e.g. Gemma-3-27B: 68.7% vs 71.5%), with perplexity within ~2%. A deliberate cost/quality trade-off — great for high-volume and cost-sensitive workloads.

Tiny · ≤2B1

Model	Input	Output	Tier
qwen3-1.7b-q2	$0.20	$0.60	L4 · standard

Small · 3–9B2

Model	Input	Output	Tier
gemma-3-4b-vision	$1.00	$3.00	L4 · standard
qwen3-8b-q2	$0.50	$1.50	L4 · standard

Mid · 10–20B2

Model	Input	Output	Tier
deepseek-r1-distill-qwen-14b-q2	$1.00	$3.00	L4 · standard
qwen3-14b-q2	$1.00	$3.00	L4 · standard

Large · 22–40B3

Model	Input	Output	Tier
gemma-3-27b-q2	$1.60	$4.80	L4 · standard
qwen3-coder-30b-a3b-q2	$1.60	$4.80	L4 · standard
qwen3-32b-q2	$1.60	$4.80	L4 · standard

All models are 2-bit (Q2_K-family) quantized — a deliberate cost/quality trade-off, not the vendors' full-precision endpoints. The live list is always at GET /v1/models. See the docs to start.