API Reference
Quick start
Our API is fully OpenAI-compatible. Switch your base_url and you're done.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # your gateway URL
    api_key="sk-...",  # the key from /dashboard
)

resp = client.chat.completions.create(
    model="qwen3.6-27b-q2",  # see full list at GET /v1/models
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Endpoints
- POST /v1/chat/completions (streaming and non-streaming)
- POST /v1/completions
- POST /v1/embeddings
- GET /v1/models
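
Since POST /v1/chat/completions supports streaming, a minimal sketch of consuming a streamed reply with the `openai` Python client might look like the following. The helper name `stream_chat` is hypothetical, and the sketch assumes the gateway emits the same chunk shape as the OpenAI API (`choices[0].delta.content`), which follows from the compatibility claim but is not spelled out on this page.

```python
def stream_chat(client, model, prompt):
    """Yield text fragments from a streaming chat completion as they arrive.

    `client` is an OpenAI client configured as in the quick start.
    ASSUMPTION: chunks follow the OpenAI streaming shape (choices[0].delta.content).
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # request incremental chunks instead of one full response
    )
    for chunk in stream:
        # Skip chunks that carry no choices or no text (e.g. role-only chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

With the quick-start `client`, the fragments can be printed as they arrive: `for piece in stream_chat(client, "qwen3.6-27b-q2", "Hello!"): print(piece, end="", flush=True)`.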
Models & pricing
We offer 56 open-weight models, all served as 2-bit (Q2_K) compressed weights, optimized for low cost and high throughput. Output quality is close to, but slightly below, the full-precision originals; pick the size and price that fit your use case. Usage is billed per token, with no rounding. Rates are per 1M tokens:
| Model | Size (parameters) | Input ($/1M tokens) | Output ($/1M tokens) | Quant |
|---|---|---|---|---|
| smollm2-360m-q2 | 0.36B | $0.20 | $0.60 | 2-bit |
| qwen3-0.6b-q2 | 0.6B | $0.20 | $0.60 | 2-bit |
| gemma-3-1b-q2 | 1B | $0.20 | $0.60 | 2-bit |
| llama-3.2-1b-q2 | 1B | $0.20 | $0.60 | 2-bit |
| qwen3-1.7b-q2 | 1.7B | $0.20 | $0.60 | 2-bit |
| smollm2-1.7b-q2 | 1.7B | $0.20 | $0.60 | 2-bit |
| granite-3.2-2b-q2 | 2B | $0.20 | $0.60 | 2-bit |
| qwen3.5-2b-q2 | 2B | $0.20 | $0.60 | 2-bit |
| llama-3.2-3b-q2 | 3B | $0.50 | $1.50 | 2-bit |
| phi-4-mini-q2 | 3.8B | $0.50 | $1.50 | 2-bit |
| gemma-3-4b-q2 | 4B | $0.50 | $1.50 | 2-bit |
| minicpm-3-4b-q2 | 4B | $0.50 | $1.50 | 2-bit |
| qwen3-4b-q2 | 4B | $0.50 | $1.50 | 2-bit |
| qwen3.5-4b-q2 | 4B | $0.50 | $1.50 | 2-bit |
| yi-1.5-6b-q2 | 6B | $0.50 | $1.50 | 2-bit |
| deepseek-r1-distill-qwen-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| internlm-2.5-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| marco-o1-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| mathstral-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| mistral-7b-v0.3-q2 | 7B | $0.50 | $1.50 | 2-bit |
| olmo-2-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| olmoe-7b-q2 | 7B | $0.50 | $1.50 | 2-bit |
| granite-3.2-8b-q2 | 8B | $0.50 | $1.50 | 2-bit |
| llama-3.1-8b-q2 | 8B | $0.50 | $1.50 | 2-bit |
| ministral-8b-q2 | 8B | $0.50 | $1.50 | 2-bit |
| qwen3-8b-q2 | 8B | $0.50 | $1.50 | 2-bit |
| gemma-2-9b-q2 | 9B | $0.50 | $1.50 | 2-bit |
| glm-4-9b-q2 | 9B | $0.50 | $1.50 | 2-bit |
| qwen3.5-9b-q2 | 9B | $0.50 | $1.50 | 2-bit |
| yi-1.5-9b-q2 | 9B | $0.50 | $1.50 | 2-bit |
| yi-coder-9b-q2 | 9B | $0.50 | $1.50 | 2-bit |
| gemma-3-12b-q2 | 12B | $1.00 | $3.00 | 2-bit |
| mistral-nemo-12b-q2 | 12B | $1.00 | $3.00 | 2-bit |
| olmo-2-13b-q2 | 13B | $1.00 | $3.00 | 2-bit |
| deepseek-r1-distill-qwen-14b-q2 | 14B | $1.00 | $3.00 | 2-bit |
| phi-4-14b-q2 | 14B | $1.00 | $3.00 | 2-bit |
| qwen3-14b-q2 | 14B | $1.00 | $3.00 | 2-bit |
| deepseek-coder-v2-lite-16b-q2 | 16B | $1.00 | $3.00 | 2-bit |
| internlm-2.5-20b-q2 | 20B | $1.00 | $3.00 | 2-bit |
| codestral-22b-q2 | 22B | $1.60 | $4.80 | 2-bit |
| mistral-small-24b-q2 | 24B | $1.60 | $4.80 | 2-bit |
| gemma-2-27b-q2 | 27B | $1.60 | $4.80 | 2-bit |
| gemma-3-27b-q2 | 27B | $1.60 | $4.80 | 2-bit |
| qwen3.6-27b-q2 | 27B | $1.60 | $4.80 | 2-bit |
| qwen3-coder-30b-a3b-q2 | 30B | $1.60 | $4.80 | 2-bit |
| deepseek-r1-distill-qwen-32b-q2 | 32B | $1.60 | $4.80 | 2-bit |
| olmo-2-32b-q2 | 32B | $1.60 | $4.80 | 2-bit |
| qwen3-32b-q2 | 32B | $1.60 | $4.80 | 2-bit |
| qwq-32b-q2 | 32B | $1.60 | $4.80 | 2-bit |
| yi-1.5-34b-q2 | 34B | $1.60 | $4.80 | 2-bit |
| qwen3.6-35b-a3b-q2 | 35B | $1.60 | $4.80 | 2-bit |
| mixtral-8x7b-q2 | 47B | $3.00 | $9.00 | 2-bit |
| deepseek-r1-distill-llama-70b-q2 | 70B | $3.00 | $9.00 | 2-bit |
| llama-3.3-70b-q2 | 70B | $3.00 | $9.00 | 2-bit |
| qwen3.5-122b-a10b-q2 | 122B | $3.00 | $9.00 | 2-bit |
| mixtral-8x22b-q2 | 141B | $3.00 | $9.00 | 2-bit |
Quantization note: these are 2-bit (Q2_K-family) versions of the named open models — a deliberate cost/quality trade-off, not the vendors' full-precision endpoints. The full live list is always at GET /v1/models.
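
To make the per-token billing concrete, here is a rough sketch of estimating the cost of a single request from the rates in the table. The `RATES` dictionary copies only a few rows for illustration, and the function name `estimate_cost` is hypothetical; it is not part of the API.

```python
from decimal import Decimal

# (input rate, output rate) in dollars per 1M tokens, copied from the table.
# Only a handful of rows are included here for illustration.
RATES = {
    "qwen3-0.6b-q2": (Decimal("0.20"), Decimal("0.60")),
    "llama-3.1-8b-q2": (Decimal("0.50"), Decimal("1.50")),
    "qwen3.6-27b-q2": (Decimal("1.60"), Decimal("4.80")),
    "llama-3.3-70b-q2": (Decimal("3.00"), Decimal("9.00")),
}
PER_MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated charge in dollars, billed per token with no rounding."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / PER_MILLION


# Example: 1,200 input tokens and 350 output tokens on qwen3.6-27b-q2.
print(estimate_cost("qwen3.6-27b-q2", 1200, 350))
```

If the gateway fills in the usage field the way the OpenAI API does (an assumption), the token counts can be taken from `resp.usage.prompt_tokens` and `resp.usage.completion_tokens`. `Decimal` is used so that fractional cents are not distorted by floating-point error.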
Top up via the dashboard. Larger top-up tiers include bonus credit (+20% on $5, +25% on $20, +30% on $100). See our Refund Policy for credit refund terms.
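
As a worked example of the bonus tiers, the sketch below computes the credit received for a given top-up. It assumes each listed amount is a minimum threshold and that the highest tier reached applies; the text above does not say how amounts between tiers are treated, so check the dashboard for the actual behavior. The name `credited_amount` is hypothetical.

```python
from decimal import Decimal

# (top-up amount in dollars, bonus rate), taken from the tiers listed above.
# ASSUMPTION: amounts are minimum thresholds and the highest tier reached applies.
BONUS_TIERS = [
    (Decimal("100"), Decimal("0.30")),
    (Decimal("20"), Decimal("0.25")),
    (Decimal("5"), Decimal("0.20")),
]


def credited_amount(top_up):
    """Return the total credit (top-up plus bonus) for a top-up in dollars."""
    top_up = Decimal(str(top_up))
    for threshold, rate in BONUS_TIERS:  # ordered from highest tier to lowest
        if top_up >= threshold:
            return top_up * (1 + rate)
    return top_up  # below the smallest tier: no bonus


for amount in (5, 20, 100):
    print(amount, credited_amount(amount))
```

Under these assumptions a $20 top-up yields $25 of credit, and a $100 top-up yields $130.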