8 curated models · 2-bit + vision · live now

Frontier open models.
Compressed to pennies.

Pay-as-you-go API access to a curated set of open-weight LLMs — Qwen, DeepSeek, Gemma, plus a vision model — served as 2-bit compressed weights on on-demand GPUs. Same OpenAI SDK, a fraction of the bill.

Get an API key · Read docs

Models are 2-bit (Q2_K) quantized — optimized for cost, with quality slightly below the full-precision originals.

Up to 90% cheaper

Every model in the catalog, including the vision model, is served as 2-bit compressed weights. You trade a little quality for a large cost cut.

OpenAI-compatible

Drop-in /v1/chat/completions — switch your base_url and keep your existing SDK calls.

Scale-to-zero GPUs

Models spin up on demand and idle back to zero — you pay only for the tokens you actually generate.
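Scale-to-zero usually means the first request after an idle period waits for a model to load. The sketch below shows one common way to absorb that on the client side: retry with exponential backoff and jitter. It assumes a cold start can surface as a timeout or connection error, which this page does not confirm; `call_with_backoff` and its parameters are hypothetical names, not part of any SDK.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again.

    Hypothetical helper: the delay doubles per attempt (capped at
    max_delay) and gets up to 50% random jitter added on top.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a stand-in for an API call that fails twice, then succeeds.
attempts = {"count": 0}


def fake_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("model still loading (simulated)")
    return "Hello!"


# A no-op sleep keeps the demo instant; drop it to wait for real.
print(call_with_backoff(fake_request, sleep=lambda seconds: None))
```

In real use, `fn` would wrap the `client.chat.completions.create(...)` call, and `retry_on` would be narrowed to the SDK's timeout and connection error types. The OpenAI Python client also accepts `timeout` and `max_retries` arguments, which may be enough on their own.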

Pay-as-you-go

Top up in USD, EUR, or crypto. No subscription. No commitment. Refundable balance.

Pricing that scales with you

Top up any amount. Unused balance carries over forever. No subscriptions.

Starter

$5 + 20% free

~50M tokens included
All 8 models (incl. vision)
Email support

Start with Starter

Pro

$20 + 25% free

~250M tokens included
Priority routing
Usage exports

Start with Pro

Scale

$100 + 30% free

~1.5B tokens included
Dedicated channel
Direct Slack channel

Start with Scale

Per-1M-token rates: from $0.20 (input) / $0.60 (output) — up to 90% cheaper than frontier APIs. All models are 2-bit (Q2_K) compressed. See docs for the full price table.
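As a rough guide to what those rates mean for a workload, the sketch below estimates spend from token counts. It uses the "from" rates quoted above ($0.20 input, $0.60 output per 1M tokens) as defaults; actual per-model rates may be higher, so treat the result as a lower bound. `estimate_cost_usd` is a hypothetical helper, not part of the API.

```python
# Lowest advertised rates from this page, in USD per 1M tokens.
# Individual models may cost more; see the docs' price table.
INPUT_USD_PER_1M = 0.20
OUTPUT_USD_PER_1M = 0.60


def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate=INPUT_USD_PER_1M,
                      output_rate=OUTPUT_USD_PER_1M):
    """Estimate spend in USD for the given token counts."""
    total = input_tokens * input_rate + output_tokens * output_rate
    return total / 1_000_000


# Example: 2M input tokens and 500k output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"${cost:.2f}")  # prints $0.70
```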

Drop-in in 30 seconds

Switch base_url, keep your code.

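from openai import OpenAI

# Point the standard OpenAI client at this API instead of api.openai.com.
client = OpenAI(
    base_url="https://api.aiqa-inference.example/v1",
    api_key="sk-...",
)

# Same chat.completions call as before; only the model name changes.
response = client.chat.completions.create(
    model="qwen3.6-27b-q2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)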
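For clients that do not use the OpenAI SDK, the same call can be sketched as a plain HTTP request. The example below only builds the request with the standard library and does not send it. It assumes the endpoint follows the OpenAI convention of a JSON body and a Bearer token in the Authorization header, which this page implies but does not spell out; `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

# Base URL and model name are taken from the snippet above.
BASE_URL = "https://api.aiqa-inference.example/v1"


def build_chat_request(api_key, model, prompt):
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: OpenAI-style Bearer authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("sk-...", "qwen3.6-27b-q2", "Hello!")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response body should then be JSON in the same shape the SDK parses.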
