Question 1

What triggers a fallback?

Accepted Answer

HTTP 5xx errors, 429 rate-limit errors, connection timeouts, exceeded per-provider timeout (30 seconds default), and exceeded caller's X-GammaInfra-Max-Latency-Ms budget if set. The provider's health-check failures also drop it from candidate selection for the next 30 seconds.

Question 2

Can I customize the chain per request?

Accepted Answer

Yes. Pass models: [list, of, models, in, order] in the request body to use that explicit chain (fails 503 on exhaustion, no auto-router). Or use provider.only and provider.ignore to filter the default chain. Or X-GammaInfra-Routing: literal to force literal endpoint selection from a bare model name.

Question 3

What happens when every chain entry fails?

Accepted Answer

The gateway returns 503 with code providers_down and includes the full cascade in X-GammaInfra-Fallback-Chain and X-GammaInfra-Fallback-Reason. This is rare in practice — chains span 3+ providers, and simultaneous outages across distinct vendors are statistically uncommon. When it does happen, retry with exponential backoff; the issue usually clears within seconds.

Question 4

Does fallback work for streaming responses?

Accepted Answer

Fallback only fires before the first byte of response data has been sent. Once the gateway has begun streaming, mid-stream provider failures cannot fall back — the caller would already have received partial data, and switching providers would produce inconsistent output. Pre-stream errors (auth, rate-limit, immediate 5xx) do cascade normally.

Question 5

What's the latency cost of a fallback?

Accepted Answer

Each failed attempt adds whatever time the failed provider took to return its error — typically 0.5 to 3 seconds for upstream 5xx, up to the per-provider timeout (30 seconds default) for hangs. Setting X-GammaInfra-Max-Latency-Ms caps the total time the gateway will spend cascading.

What is an LLM fallback chain?

Why fallback chains exist

What gets included in a chain

GammaInfra's chains per task label

How callers see the cascade

Common questions

Try the gateway