API Reference

Quick start

Our API is fully OpenAI-compatible. Set base_url to your gateway URL and api_key to the key from /dashboard, and you're done.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # your gateway URL
    api_key="sk-..."                          # the key from /dashboard
)

resp = client.chat.completions.create(
    model="qwen3.6-27b-q2",                   # see full list at GET /v1/models
    messages=[{"role": "user", "content": "Hello!"}]
)

print(resp.choices[0].message.content)
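
The comment in the quick start points to GET /v1/models for the full model list. A minimal sketch of that call using only the standard library follows. It assumes the gateway accepts the usual OpenAI-style "Authorization: Bearer" header and returns the usual OpenAI-style list shape ({"object": "list", "data": [{"id": ...}]}); neither detail is confirmed on this page. The helper names (build_models_request, extract_model_ids, list_model_ids) are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder gateway URL from the quick start


def build_models_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /v1/models request without sending it.

    Assumption: the key is passed as an OpenAI-style Bearer token.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Return the sorted model IDs from an OpenAI-style list response.

    Assumption: the payload looks like {"object": "list", "data": [{"id": ...}]}.
    """
    return sorted(item["id"] for item in payload.get("data", []))


def list_model_ids(api_key: str, base_url: str = BASE_URL) -> list[str]:
    """Fetch the live model list. This performs a network call."""
    request = build_models_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Calling list_model_ids("sk-...") with a real key should return the same IDs that appear in the Models & pricing table, if the assumptions above hold.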

Endpoints

Models & pricing

We offer 56 open-weight models, all served as 2-bit (Q2_K) compressed weights, optimized for low cost and high throughput. Output quality is close to, but slightly below, that of the full-precision originals; pick the size and price that fit your use case. Usage is billed per token, with no rounding. Rates per 1M tokens:

Model                             Size    Input   Output  Quant
smollm2-360m-q2                   0.36B   $0.20   $0.60   2-bit
qwen3-0.6b-q2                     0.6B    $0.20   $0.60   2-bit
gemma-3-1b-q2                     1B      $0.20   $0.60   2-bit
llama-3.2-1b-q2                   1B      $0.20   $0.60   2-bit
qwen3-1.7b-q2                     1.7B    $0.20   $0.60   2-bit
smollm2-1.7b-q2                   1.7B    $0.20   $0.60   2-bit
granite-3.2-2b-q2                 2B      $0.20   $0.60   2-bit
qwen3.5-2b-q2                     2B      $0.20   $0.60   2-bit
llama-3.2-3b-q2                   3B      $0.50   $1.50   2-bit
phi-4-mini-q2                     3.8B    $0.50   $1.50   2-bit
gemma-3-4b-q2                     4B      $0.50   $1.50   2-bit
minicpm-3-4b-q2                   4B      $0.50   $1.50   2-bit
qwen3-4b-q2                       4B      $0.50   $1.50   2-bit
qwen3.5-4b-q2                     4B      $0.50   $1.50   2-bit
yi-1.5-6b-q2                      6B      $0.50   $1.50   2-bit
deepseek-r1-distill-qwen-7b-q2    7B      $0.50   $1.50   2-bit
internlm-2.5-7b-q2                7B      $0.50   $1.50   2-bit
marco-o1-7b-q2                    7B      $0.50   $1.50   2-bit
mathstral-7b-q2                   7B      $0.50   $1.50   2-bit
mistral-7b-v0.3-q2                7B      $0.50   $1.50   2-bit
olmo-2-7b-q2                      7B      $0.50   $1.50   2-bit
olmoe-7b-q2                       7B      $0.50   $1.50   2-bit
granite-3.2-8b-q2                 8B      $0.50   $1.50   2-bit
llama-3.1-8b-q2                   8B      $0.50   $1.50   2-bit
ministral-8b-q2                   8B      $0.50   $1.50   2-bit
qwen3-8b-q2                       8B      $0.50   $1.50   2-bit
gemma-2-9b-q2                     9B      $0.50   $1.50   2-bit
glm-4-9b-q2                       9B      $0.50   $1.50   2-bit
qwen3.5-9b-q2                     9B      $0.50   $1.50   2-bit
yi-1.5-9b-q2                      9B      $0.50   $1.50   2-bit
yi-coder-9b-q2                    9B      $0.50   $1.50   2-bit
gemma-3-12b-q2                    12B     $1.00   $3.00   2-bit
mistral-nemo-12b-q2               12B     $1.00   $3.00   2-bit
olmo-2-13b-q2                     13B     $1.00   $3.00   2-bit
deepseek-r1-distill-qwen-14b-q2   14B     $1.00   $3.00   2-bit
phi-4-14b-q2                      14B     $1.00   $3.00   2-bit
qwen3-14b-q2                      14B     $1.00   $3.00   2-bit
deepseek-coder-v2-lite-16b-q2     16B     $1.00   $3.00   2-bit
internlm-2.5-20b-q2               20B     $1.00   $3.00   2-bit
codestral-22b-q2                  22B     $1.60   $4.80   2-bit
mistral-small-24b-q2              24B     $1.60   $4.80   2-bit
gemma-2-27b-q2                    27B     $1.60   $4.80   2-bit
gemma-3-27b-q2                    27B     $1.60   $4.80   2-bit
qwen3.6-27b-q2                    27B     $1.60   $4.80   2-bit
qwen3-coder-30b-a3b-q2            30B     $1.60   $4.80   2-bit
deepseek-r1-distill-qwen-32b-q2   32B     $1.60   $4.80   2-bit
olmo-2-32b-q2                     32B     $1.60   $4.80   2-bit
qwen3-32b-q2                      32B     $1.60   $4.80   2-bit
qwq-32b-q2                        32B     $1.60   $4.80   2-bit
yi-1.5-34b-q2                     34B     $1.60   $4.80   2-bit
qwen3.6-35b-a3b-q2                35B     $1.60   $4.80   2-bit
mixtral-8x7b-q2                   47B     $3.00   $9.00   2-bit
deepseek-r1-distill-llama-70b-q2  70B     $3.00   $9.00   2-bit
llama-3.3-70b-q2                  70B     $3.00   $9.00   2-bit
qwen3.5-122b-a10b-q2              122B    $3.00   $9.00   2-bit
mixtral-8x22b-q2                  141B    $3.00   $9.00   2-bit

Quantization note: these are 2-bit (Q2_K-family) versions of the named open models — a deliberate cost/quality trade-off, not the vendors' full-precision endpoints. The full live list is always at GET /v1/models.
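
To make the per-token billing concrete, here is a small sketch that estimates the cost of one request from the rates in the table above. The RATES dictionary copies only a few rows for illustration, and estimate_cost is a hypothetical helper, not part of the API. Decimal is used so that fractions of a cent are kept exactly, in line with the "no rounding" note.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

# (input rate, output rate) per 1M tokens, copied from a few rows of the pricing table.
RATES = {
    "qwen3-0.6b-q2": (Decimal("0.20"), Decimal("0.60")),
    "llama-3.1-8b-q2": (Decimal("0.50"), Decimal("1.50")),
    "qwen3.6-27b-q2": (Decimal("1.60"), Decimal("4.80")),
    "llama-3.3-70b-q2": (Decimal("3.00"), Decimal("9.00")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge for one request, with no rounding applied."""
    input_rate, output_rate = RATES[model]
    total = input_rate * input_tokens + output_rate * output_tokens
    return total / TOKENS_PER_MILLION


# 1,200 input tokens and 350 output tokens on qwen3.6-27b-q2:
# (1.60 * 1200 + 4.80 * 350) / 1,000,000 = 0.0036
print(estimate_cost("qwen3.6-27b-q2", 1200, 350))
```

If the gateway returns OpenAI-style usage data (an assumption based on the compatibility claim), the token counts would come from resp.usage.prompt_tokens and resp.usage.completion_tokens.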

Top up via the dashboard. Larger top-up tiers include bonus credit (+20% on $5, +25% on $20, +30% on $100). See our Refund Policy for credit refund terms.
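
The bonus tiers above can be worked through as simple arithmetic. The sketch below assumes each tier amount is a minimum threshold (so a $50 top-up would earn the $20 tier's +25%); the page does not say whether tiers are thresholds or fixed amounts, so check the dashboard before relying on this. credited_amount is a hypothetical helper.

```python
from decimal import Decimal

# (minimum top-up, bonus fraction), highest tier first, taken from the text above.
BONUS_TIERS = [
    (Decimal("100"), Decimal("0.30")),
    (Decimal("20"), Decimal("0.25")),
    (Decimal("5"), Decimal("0.20")),
]


def credited_amount(top_up: Decimal) -> Decimal:
    """Return the credit received for a top-up, including any bonus.

    Assumption: tier amounts are minimum thresholds, not exact amounts.
    """
    for threshold, bonus in BONUS_TIERS:
        if top_up >= threshold:
            return top_up * (1 + bonus)
    return top_up  # below the smallest tier: no bonus


for amount in ("5", "20", "100"):
    print(amount, "->", credited_amount(Decimal(amount)))
```

Under that assumption, the three listed tiers yield $6, $25, and $130 of credit respectively.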