Question 1

Should I always use hedged requests?

Accepted Answer

No. Hedging roughly doubles token cost. Use it when latency variance hurts your application — interactive chat with strict p95 SLAs, real-time tools, autocomplete-style features. Skip it for batch processing, background workflows, and cost-sensitive workloads where median latency is fine.

Question 2

How much does hedging actually reduce latency?

Accepted Answer

It depends on how correlated the two providers are. On uncorrelated providers (different vendors, different clouds), hedging typically cuts p95 latency by 30-60%. On correlated providers (e.g. two endpoints on the same vendor sharing infrastructure), the win is smaller because both tend to be slow at the same times. GammaInfra picks the top 2 across distinct providers so that the two calls are as independent as possible.
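To make the correlation effect concrete, here is a small simulation sketch. The latency model (about 1 s typical, with a 10% chance of a multi-second tail) and the `shared_slow_fraction` knob are illustrative assumptions, not measurements from GammaInfra or any provider, so the exact percentages it produces should not be read as the 30-60% figure above.

```python
import random
import statistics


def p95(samples):
    # statistics.quantiles with n=20 returns 19 cut points; the last is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


def draw_latency(rng):
    # Hypothetical model: ~1 s typical latency, 10% chance of an extra 2-5 s stall.
    latency = rng.uniform(0.8, 1.2)
    if rng.random() < 0.10:
        latency += rng.uniform(2.0, 5.0)
    return latency


def simulate(n=20000, shared_slow_fraction=0.0, seed=0):
    """Return (single_p95, hedged_p95) for n simulated requests.

    shared_slow_fraction is the assumed share of requests where provider B
    experiences exactly the same latency as provider A (shared infrastructure).
    """
    rng = random.Random(seed)
    single, hedged = [], []
    for _ in range(n):
        a = draw_latency(rng)
        if rng.random() < shared_slow_fraction:
            b = a                      # correlated: B is slow whenever A is
        else:
            b = draw_latency(rng)      # independent draw
        single.append(a)
        hedged.append(min(a, b))       # hedging takes whichever finishes first
    return single_hedged_pair(single, hedged)


def single_hedged_pair(single, hedged):
    return p95(single), p95(hedged)


if __name__ == "__main__":
    for frac in (0.0, 0.8):
        s, h = simulate(shared_slow_fraction=frac)
        print(f"shared_slow_fraction={frac}: single p95={s:.2f}s, hedged p95={h:.2f}s")
```

Under these assumptions, the independent pair pulls the hedged p95 out of the slow tail, while the mostly correlated pair leaves it close to the single-provider p95.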

Question 3

What if both hedged providers fail?

Accepted Answer

The gateway re-raises the error internally and falls through to the standard fallback chain, so a hedged request that fails on both primary endpoints still tries chain entries 3, 4, and so on. The customer never sees the failure unless every chain entry fails.
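The fall-through behaviour can be sketched with `asyncio`. This is a minimal illustration and not GammaInfra's implementation: `hedge`, `complete`, and `fake_provider` are hypothetical names, and each chain entry is assumed to be a zero-argument function returning a coroutine.

```python
import asyncio


async def hedge(calls):
    """Run all calls in parallel; return the first success, raise if every call fails."""
    tasks = [asyncio.create_task(call()) for call in calls]
    errors = []
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except Exception as exc:   # this call failed; wait for the other one
                errors.append(exc)
        raise RuntimeError(f"all hedged calls failed: {errors!r}")
    finally:
        for task in tasks:
            task.cancel()              # no-op for finished tasks, stops the loser


async def complete(chain):
    """Hedge the first two chain entries, then try the remaining entries in order."""
    last_error = None
    try:
        return await hedge(chain[:2])
    except Exception as exc:
        last_error = exc               # both hedged calls failed; fall through
    for call in chain[2:]:
        try:
            return await call()
        except Exception as exc:
            last_error = exc
    raise last_error                   # only reached if every chain entry failed


def fake_provider(name, delay, fail=False):
    """Hypothetical stand-in for an upstream provider call."""
    async def call():
        await asyncio.sleep(delay)
        if fail:
            raise RuntimeError(f"{name} failed")
        return name
    return call


if __name__ == "__main__":
    chain = [
        fake_provider("A", 0.01, fail=True),
        fake_provider("B", 0.02, fail=True),
        fake_provider("C", 0.01),
    ]
    print(asyncio.run(complete(chain)))  # prints "C"
```

The caller only sees an exception when the hedged pair and every later entry have all failed, which matches the behaviour described above.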

Question 4

Does hedging count once or twice against my rate limit?

Accepted Answer

Once at the gateway, twice upstream. The 240-rpm per-API-key cap on the gateway counts the customer's request once (it's one gateway call), but both upstream provider calls are real, so each provider's rate limit is charged independently. If you bring your own keys (BYOK), this means hedged requests effectively halve your per-provider rate-limit headroom.
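The accounting can be worked through with a short sketch. The provider names are placeholders, and the unhedged baseline assumes traffic alternates evenly between the two providers, which is an assumption made here so that the halving of headroom is visible; your own routing may differ.

```python
from collections import Counter

GATEWAY_RPM_CAP = 240  # per-API-key gateway cap quoted in the answer above


def account(requests, hedged, providers=("provider_a", "provider_b")):
    """Return (gateway_calls, upstream_calls_per_provider) for a batch of requests."""
    upstream = Counter()
    for i in range(requests):
        if hedged:
            upstream.update(providers)                    # one real call to each provider
        else:
            upstream[providers[i % len(providers)]] += 1  # assumed: alternate providers
    # The gateway counts each customer request once, hedged or not.
    return requests, upstream


if __name__ == "__main__":
    for hedged in (False, True):
        gateway, upstream = account(GATEWAY_RPM_CAP, hedged)
        print(f"hedged={hedged}: gateway={gateway}, upstream={dict(upstream)}")
```

With 240 requests, the unhedged baseline sends 120 calls to each provider, while the hedged run sends 240 to each. The gateway count stays at 240 in both cases.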

Question 5

Is hedging the same as fallback?

Accepted Answer

No. Fallback is sequential: try provider A, and if it fails, try provider B. Hedging is parallel: try both A and B simultaneously and take the first success. Fallback adds latency on failure (you waited for A to fail before starting B). Hedging adds cost on success (both A and B ran, but only one was needed). The two are complementary: hedge the head of a chain and use fallback for the tail.
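The latency and cost difference can be written as a tiny arithmetic model. The function names and the `(latency_seconds, succeeded)` tuples are illustrative, and the model assumes the losing hedged call is still billed in full.

```python
def fallback_outcome(a, b):
    """Sequential: B starts only after A has failed. a and b are (latency_seconds, succeeded)."""
    a_latency, a_ok = a
    b_latency, b_ok = b
    if a_ok:
        return {"latency": a_latency, "calls": 1, "ok": True}
    # A's failure time is paid in full before B even starts.
    return {"latency": a_latency + b_latency, "calls": 2, "ok": b_ok}


def hedge_outcome(a, b):
    """Parallel: both start at once; the first success wins."""
    successes = [latency for latency, ok in (a, b) if ok]
    if successes:
        latency = min(successes)
    else:
        latency = max(a[0], b[0])   # must wait for both to fail
    # Assumption: both calls are billed even though only one result is used.
    return {"latency": latency, "calls": 2, "ok": bool(successes)}


if __name__ == "__main__":
    both_ok = ((1.0, True), (1.5, True))
    a_fails = ((2.0, False), (1.5, True))
    print(fallback_outcome(*both_ok))  # {'latency': 1.0, 'calls': 1, 'ok': True}
    print(hedge_outcome(*both_ok))     # {'latency': 1.0, 'calls': 2, 'ok': True}
    print(fallback_outcome(*a_fails))  # {'latency': 3.5, 'calls': 2, 'ok': True}
    print(hedge_outcome(*a_fails))     # {'latency': 1.5, 'calls': 2, 'ok': True}
```

When both providers succeed, hedging pays for an extra call at the same latency. When A fails, fallback pays A's failure time on top of B's latency, while hedging only pays B's.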
