Question 1

What is GammaInfra?

Accepted Answer

GammaInfra is a managed LLM routing service. Send any OpenAI-format chat completion request and the router picks the best-fit model per prompt across every major LLM provider (OpenAI, Anthropic, Google, Mistral, Groq, DeepSeek, xAI, Amazon Bedrock), falls back across providers on failure, and reports per-request cost in response headers.

Question 2

How does GammaInfra route between LLM providers?

Accepted Answer

Every request is classified by task type — one of 8 labels (reasoning, code, creative, rewrite, chat, extraction, summarize, translation) — and dispatched to the best-fit model for that task. The router uses live p50 latency (refreshed every 30 seconds, 5-minute window) and configured cost or quality preferences to pick the endpoint within the task's fallback chain. Use gammainfra/auto for smart routing, gammainfra/fast to optimize for latency, or gammainfra/cheap to optimize for cost.

Question 3

Is GammaInfra OpenAI-compatible?

Accepted Answer

Yes. The API endpoint is wire-format compatible with OpenAI's /v1/chat/completions. Replace https://api.openai.com/v1 with https://api.gammainfra.com/v1 in your existing OpenAI SDK code — every other field (messages, temperature, stream, tools, tool_choice, response_format) works as-is.

Question 4

How much does GammaInfra cost?

Accepted Answer

Token pricing is pass-through from each provider with 0% markup. The fee is on top-up, not per-token: 3% during the launch window (through 2026-06-23) and 5% afterward, minimum $10 top-up. There is also a BYOK option that uses your own provider API keys at a 1% per-request fee during launch (2% standard), with a separate prepaid balance and $5 minimum top-up.

Question 5

Can I bring my own provider API keys (BYOK)?

Accepted Answer

Yes. Configure your OpenAI, Anthropic, Google, or other provider keys in the dashboard and GammaInfra will use them for the matching provider in any fallback chain. BYOK has a separate prepaid balance and a per-request fee of 1% during launch (2% standard). When the BYOK balance hits zero, requests return 402 byok_balance_empty — never a silent fallback to managed credits.

Question 6

How do I see the cost of an individual API call?

Accepted Answer

Every successful response carries X-GammaInfra-Cost-USD (total) plus X-GammaInfra-Input-Cost-USD and X-GammaInfra-Output-Cost-USD for the per-direction split. The X-GammaInfra-Endpoint header tells you which provider/model actually served the request. Sum these across a session to know exactly what your workload cost.

Question 7

Which LLM providers does GammaInfra support?

Accepted Answer

Every major LLM provider: OpenAI (gpt-5.5 family, gpt-5.4 family, gpt-5 family, gpt-oss), Anthropic (Claude Opus 4.7 and 4.6, Claude Sonnet 4.6, Claude Haiku 4.5), Google (Gemini 3.1, 3, and 2.5 family), Mistral (Large, Small, Devstral), Groq (Llama 3.x), DeepSeek (V4 Pro and Flash, V3 legacy), xAI (Grok 4 family, Grok 3, Grok Code), and Amazon Bedrock (Claude, Llama, Mistral, Amazon Nova).

Question 8

What happens if a provider goes down or rate-limits a request?

Accepted Answer

Every task type has a 3-to-4 endpoint fallback chain across distinct providers. If the primary fails (timeout, rate limit, 5xx error), the router cascades to the next endpoint and the actual cascade is reported in the X-GammaInfra-Fallback-Chain response header. The customer never sees the failure unless every chain entry fails.

Smart routing for every major LLM.
One API. Best model per prompt.

How the router decides.

Classify

Score

Dispatch

Track

Right model for the job.

Every major LLM. One key.

Three numbers. Zero subscriptions.

Everything in one API. No add-ons, no premium tiers.

Common questions about smart LLM routing.

Smart routing for every major LLM.One API. Best model per prompt.