Live in production

Smart routing for every major LLM.
One API. Best model per prompt.

The router classifies every prompt by task and dispatches to the best-fit model — so you stop paying flagship rates for chat. Every major provider, one bill. 0% token markup. $3 free to start.

Get started free See how it works
0% Token markup
5% Top-up fee
2% Per BYOK request
$3 Free credit
quickstart.py
# pip install openai

client = OpenAI(
  base_url="https://api.gammainfra.com/v1",
  api_key="sk-gammainfra-...",
)

response = client.chat.completions.create(
  model="gammainfra/auto",
  messages=[{
    "role": "user",
    "content": "Write a binary search in Python"
  }]
)

# Response headers tell you exactly what served you:
#   X-GammaInfra-Endpoint:       anthropic/claude-sonnet-4-6
#   X-GammaInfra-Cost-USD:       0.000087
#   X-GammaInfra-Router-Version: v2

We charge on top-ups, not tokens. See pricing →

How the router decides.

1

Classify

Every prompt is classified by task type: code generation, reasoning, summarisation, multimodal, tool use, or chat.

2

Score

Providers are scored on live p50 latency (30-second refresh, 5-min window), token cost, and per-task quality.

3

Dispatch

The request routes to the best-fit model. Every decision is logged and transparent.

4

Track

Latency stats refresh every 30 seconds. Provider health propagates to every worker. The router's picture of the upstream stays current — without touching your prompt content.

Right model for the job.

Select a task type to see how GammaInfra routes. Chains are prioritised by quality with automatic fallback.

Write a binary search function code_gen
Explain why recursion can overflow reasoning
TL;DR this article for me summarisation
Hey, what can you help me with? chat
Search the web and summarise results tool_use
Routing decision — code_gen

Every major LLM. One key.

Access every major LLM through a single API. Direct-route to any provider, or let the router decide.

OpenAI
gpt-5.5, gpt-5.4
gpt-5.4-mini, gpt-5-mini
Anthropic
claude-opus-4-7/4-6
claude-sonnet-4-6
claude-haiku-4-5
Google
gemini-3.1-pro
gemini-3-flash
gemini-2.5-pro/flash
xAI
grok-4-1-fast
grok-code-fast
Mistral
mistral-large, devstral
codestral
DeepSeek
deepseek-v4-pro
deepseek-v4-flash
Groq
llama-3.3-70b
llama-3.1-8b, qwen3
Amazon Bedrock
Claude opus/sonnet/haiku
Nova pro/lite

Three numbers. Zero subscriptions.

Pay only for what you use. Credits never expire.

Launch special: 3% top-up fee · 1% on BYOK during launch window.
Free trial
$0
$3 welcome credit — no card required
  • Smart routing across every major LLM
  • 50 requests / day
  • Automatic fallback on provider failure
Get started
BYOK
2% per request
Prepaid balance · $5 min top-up · no top-up fee
  • Bring your own provider keys
  • Deducted from prepaid balance
  • Smart routing, unlimited
  • Providers bill you direct
  • Best for higher-volume usage
  • Unlimited routing, no caps
Get started

Everything in one API. No add-ons, no premium tiers.

Smart routing
Classifies every prompt by task and picks the best model automatically. Saves ~20% on typical workloads vs always defaulting to the flagship.
Every major LLM
OpenAI, Anthropic, Google, Mistral, Groq, DeepSeek, xAI, Amazon Bedrock — all behind one key. Add a provider, no new integration.
Transparent decisions
Every routing choice exposed in response headers: which model served you, cost in USD, fallback chain if a provider failed. No black boxes.
0% token markup
Pass-through provider pricing. You pay exactly what the upstream charges. Our fee is on the top-up, visible upfront.
Automatic fallback
Provider down? Rate limited? The router hops to the next best model in real time — your call still succeeds.
Bring Your Own Keys
Use your own provider accounts — GammaInfra still routes, you bill upstream direct. 2% routing fee, separate prepaid balance.
Cost + usage dashboard
Live breakdown by model, task type, and provider. See which routes save you money — and which don't.
No subscriptions
Pay only for what you use. Credits never expire. Cancel by not topping up — there's nothing to unsubscribe from.

Common questions about smart LLM routing.

What is GammaInfra?
GammaInfra is a managed LLM routing service. Send any OpenAI-format chat completion request and the router picks the best-fit model per prompt across every major LLM provider (OpenAI, Anthropic, Google, Mistral, Groq, DeepSeek, xAI, Amazon Bedrock), falls back across providers on failure, and reports per-request cost in response headers.
How does GammaInfra route between LLM providers?
Every request is classified by task type — one of 8 labels (reasoning, code, creative, rewrite, chat, extraction, summarize, translation) — and dispatched to the best-fit model for that task. The router uses live p50 latency (refreshed every 30 seconds, 5-minute window) and configured cost or quality preferences to pick the endpoint within the task's fallback chain. Use gammainfra/auto for smart routing, gammainfra/fast to optimize for latency, or gammainfra/cheap to optimize for cost.
Is GammaInfra OpenAI-compatible?
Yes. The API endpoint is wire-format compatible with OpenAI's /v1/chat/completions. Replace https://api.openai.com/v1 with https://api.gammainfra.com/v1 in your existing OpenAI SDK code — every other field (messages, temperature, stream, tools, tool_choice, response_format) works as-is.
How much does GammaInfra cost?
Token pricing is pass-through from each provider with 0% markup. The fee is on top-up, not per-token: 3% during the launch window (through 2026-06-23) and 5% afterward, minimum $10 top-up. There is also a BYOK option that uses your own provider API keys at a 1% per-request fee during launch (2% standard), with a separate prepaid balance and $5 minimum top-up.
Can I bring my own provider API keys (BYOK)?
Yes. Configure your OpenAI, Anthropic, Google, or other provider keys in the dashboard and GammaInfra will use them for the matching provider in any fallback chain. BYOK has a separate prepaid balance and a per-request fee of 1% during launch (2% standard). When the BYOK balance hits zero, requests return 402 byok_balance_empty — never a silent fallback to managed credits.
How do I see the cost of an individual API call?
Every successful response carries X-GammaInfra-Cost-USD (total) plus X-GammaInfra-Input-Cost-USD and X-GammaInfra-Output-Cost-USD for the per-direction split. The X-GammaInfra-Endpoint header tells you which provider/model actually served the request. Sum these across a session to know exactly what your workload cost.
Which LLM providers does GammaInfra support?
Every major LLM provider: OpenAI (gpt-5.5 family, gpt-5.4 family, gpt-5 family, gpt-oss), Anthropic (Claude Opus 4.7 and 4.6, Claude Sonnet 4.6, Claude Haiku 4.5), Google (Gemini 3.1, 3, and 2.5 family), Mistral (Large, Small, Devstral), Groq (Llama 3.x), DeepSeek (V4 Pro and Flash, V3 legacy), xAI (Grok 4 family, Grok 3, Grok Code), and Amazon Bedrock (Claude, Llama, Mistral, Amazon Nova).
What happens if a provider goes down or rate-limits a request?
Every task type has a 3-to-4 endpoint fallback chain across distinct providers. If the primary fails (timeout, rate limit, 5xx error), the router cascades to the next endpoint and the actual cascade is reported in the X-GammaInfra-Fallback-Chain response header. The customer never sees the failure unless every chain entry fails.

Your API key in seconds.

No credit card. $3 free credit. Verify your email, get your key, start in 60 seconds.

Smart routing built-in · Prompts never logged · Credits never expire · No subscriptions