API Reference

Quick start

Our API follows the OpenAI API format for the endpoints listed below, so existing OpenAI client code works once you change base_url and api_key.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # your gateway URL
    api_key="sk-..."                          # the key from /dashboard
)

resp = client.chat.completions.create(
    model="qwen3-8b-q2",                      # see full list at GET /v1/models
    messages=[{"role": "user", "content": "Hello!"}]
)

print(resp.choices[0].message.content)
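Because the API uses the OpenAI request format, the SDK is optional. The sketch below builds the same chat request using only Python's standard library. It assumes the gateway accepts the Authorization: Bearer header that the OpenAI SDK sends. build_chat_request is a hypothetical helper, and qwen3-8b-q2 is just one model from the table below.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request.

    Assumption: Bearer-token auth, as sent by the OpenAI SDK.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://api.example.com/v1",            # your gateway URL
    "sk-...",                                # the key from /dashboard
    "qwen3-8b-q2",
    [{"role": "user", "content": "Hello!"}],
)
# To send it: json.load(urllib.request.urlopen(req)) returns the response,
# whose reply text is at ["choices"][0]["message"]["content"].
```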

Endpoints

POST /v1/chat/completions — streaming + non-streaming
POST /v1/completions
POST /v1/embeddings
GET /v1/models
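For streaming, pass stream=True to the OpenAI SDK's chat.completions.create. The call then returns an iterator of chunks, and each chunk carries an incremental delta. Below is a minimal sketch of reassembling the reply. Several names are illustrative: collect_stream is a hypothetical helper, and EXAMPLE_API_KEY is a hypothetical environment variable (the live call runs only when it is set). The stand-in chunks only mimic the SDK's object shape so the helper can be checked offline.

```python
import os
from types import SimpleNamespace
from typing import Any, Iterable

def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the incremental text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:          # e.g. a usage-only final chunk
            continue
        text = chunk.choices[0].delta.content
        if text:                       # first/last chunks may carry "" or None
            parts.append(text)
    return "".join(parts)

# Offline check with stand-in chunks shaped like the SDK's chunk objects.
fake = [SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
        for c in (None, "Hel", "lo!", None)] + [SimpleNamespace(choices=[])]
demo = collect_stream(fake)

if os.environ.get("EXAMPLE_API_KEY"):  # hypothetical variable name
    from openai import OpenAI
    client = OpenAI(base_url="https://api.example.com/v1",
                    api_key=os.environ["EXAMPLE_API_KEY"])
    stream = client.chat.completions.create(
        model="qwen3-8b-q2",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    print(collect_stream(stream))
```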

Models & pricing

We offer 8 open-weight models, all served as 2-bit (Q2_K) compressed weights to keep cost low and throughput high. Output quality is below that of the full-precision originals. The gap depends on the model and the task, so evaluate on your own workload before picking a size and price. Billing is per token, with no rounding. Rates per 1M tokens:

Model	Params	Input ($/1M tokens)	Output ($/1M tokens)	Quant
qwen3-1.7b-q2	1.7B	$0.20	$0.60	2-bit
gemma-3-4b-vision	4B	$1.00	$3.00	2-bit
qwen3-8b-q2	8B	$0.50	$1.50	2-bit
deepseek-r1-distill-qwen-14b-q2	14B	$1.00	$3.00	2-bit
qwen3-14b-q2	14B	$1.00	$3.00	2-bit
gemma-3-27b-q2	27B	$1.60	$4.80	2-bit
qwen3-coder-30b-a3b-q2	30B (3B active)	$1.60	$4.80	2-bit
qwen3-32b-q2	32B	$1.60	$4.80	2-bit
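With per-token billing and no rounding, a request costs input_tokens × input_rate / 1,000,000 + output_tokens × output_rate / 1,000,000. Below is a minimal sketch. RATES is a hand-copied snapshot of the table above; prices may change, so treat it as illustrative. request_cost is a hypothetical helper. In the OpenAI SDK, per-request token counts appear in resp.usage.prompt_tokens and resp.usage.completion_tokens, assuming the gateway populates the usage field.

```python
from decimal import Decimal

# Dollars per 1M tokens, copied from the table above: (input, output).
RATES = {
    "qwen3-1.7b-q2": (Decimal("0.20"), Decimal("0.60")),
    "gemma-3-4b-vision": (Decimal("1.00"), Decimal("3.00")),
    "qwen3-8b-q2": (Decimal("0.50"), Decimal("1.50")),
    "deepseek-r1-distill-qwen-14b-q2": (Decimal("1.00"), Decimal("3.00")),
    "qwen3-14b-q2": (Decimal("1.00"), Decimal("3.00")),
    "gemma-3-27b-q2": (Decimal("1.60"), Decimal("4.80")),
    "qwen3-coder-30b-a3b-q2": (Decimal("1.60"), Decimal("4.80")),
    "qwen3-32b-q2": (Decimal("1.60"), Decimal("4.80")),
}
PER_MILLION = Decimal(1_000_000)

def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Exact dollar cost of one request: per-token billing, no rounding."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / PER_MILLION

cost = request_cost("qwen3-8b-q2", input_tokens=1_200, output_tokens=350)
print(cost)  # 0.001125  (1,200 × $0.50/1M + 350 × $1.50/1M)
```

Decimal keeps each result exact. Binary floats would accumulate small rounding errors when summing costs over many requests.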

Quantization note: these are 2-bit (Q2_K-family) quantizations of the named open-weight models. This is a deliberate cost/quality trade-off: they are not the full-precision weights or endpoints offered by the original model publishers. The live model list is always available at GET /v1/models.

Top up via the dashboard. Larger top-up tiers include bonus credit (+20% on $5, +25% on $20, +30% on $100). See our Refund Policy for credit refund terms.
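Each tier credits the top-up plus its bonus percentage. For example, $20 at +25% yields $25 of credit. Below is a sketch in integer cents. It assumes the bonus applies only to the three listed amounts; the text does not say how other amounts are treated, so this sketch gives them no bonus. credited_amount_cents is a hypothetical helper, and the dashboard shows the actual credit.

```python
# Top-up amount in cents -> bonus percent.
# Assumption: only these exact tiers earn a bonus.
BONUS_PERCENT = {500: 20, 2_000: 25, 10_000: 30}

def credited_amount_cents(top_up_cents: int) -> int:
    """Total credit (top-up plus bonus) in cents for a given top-up."""
    bonus = top_up_cents * BONUS_PERCENT.get(top_up_cents, 0) // 100
    return top_up_cents + bonus

for amount in (500, 2_000, 10_000):
    print(f"${amount / 100:.2f} -> ${credited_amount_cents(amount) / 100:.2f}")
# prints:
# $5.00 -> $6.00
# $20.00 -> $25.00
# $100.00 -> $130.00
```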