Question 1

What is the difference between an LLM gateway and an LLM proxy?

Accepted Answer

A proxy forwards requests verbatim. A gateway adds value: it normalizes wire format across providers, classifies prompts by task, picks the best-fit model, falls back across providers on failure, attaches per-request observability, and enforces caller-side budgets like max latency or cost-quality preference. Most production LLM gateways are also proxies, but proxies are not necessarily gateways.

Question 2

Why use an LLM gateway instead of calling each provider directly?

Accepted Answer

Three reasons. One: a single integration surface (one API key, one wire format) replaces N provider integrations. Two: provider rate limits and outages are hidden from the caller, because the gateway cascades through a fallback chain instead of failing the request. Three: per-request cost and routing decisions become observable in response headers without a separate accounting pipeline.
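The fallback cascade in the second reason can be sketched in a few lines. This is a minimal illustration, not any particular gateway's implementation: `ProviderError`, `complete_with_fallback`, and the two stand-in providers are hypothetical names, and a real gateway would wrap actual provider SDK calls and distinguish retryable errors from permanent ones.

```python
class ProviderError(Exception):
    """Hypothetical error a provider call raises on a rate limit or outage."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order and return the first success.

    Returns (text, metadata), where metadata records which provider
    answered and which providers were attempted along the way.
    """
    attempted = []
    for name, call in providers:
        attempted.append(name)
        try:
            text = call(prompt)
        except ProviderError:
            continue  # move on to the next provider in the chain
        return text, {"provider": name, "attempted": list(attempted)}
    raise ProviderError(f"all providers failed: {attempted}")


# Stand-in providers for the example: the first is rate-limited, the second works.
def rate_limited(prompt):
    raise ProviderError("429 Too Many Requests")


def healthy(prompt):
    return f"echo: {prompt}"


text, meta = complete_with_fallback(
    "hi", [("primary", rate_limited), ("secondary", healthy)]
)
```

The caller sees one successful response; the rate limit on the first provider only shows up in the metadata.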

Question 3

Is an LLM gateway the same as a model router?

Accepted Answer

A model router is a component inside a gateway. The router decides which provider and model to dispatch a given request to. The gateway is the entire system around it: authentication, rate limiting, request logging, billing, the OpenAI-compatible wire surface, and the router itself. See the LLM router entry for the routing-decision component in isolation.
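One way to picture the containment relationship is a gateway object that owns a router and calls it as one step among several. The sketch below is an assumption-laden toy: the class names, the task labels, and the provider and model strings are all placeholders, and rate limiting, billing, and the actual provider dispatch are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Routing decision only: maps a task label to (provider, model)."""
    table: dict
    default: tuple

    def route(self, task):
        return self.table.get(task, self.default)


@dataclass
class Gateway:
    """The surrounding system: here only authentication and logging, plus the router."""
    router: Router
    api_keys: set
    log: list = field(default_factory=list)

    def handle(self, api_key, task, prompt):
        if api_key not in self.api_keys:  # authentication
            raise PermissionError("unknown API key")
        provider, model = self.router.route(task)  # routing decision
        self.log.append({"task": task, "provider": provider, "model": model})
        # A real gateway would dispatch to the provider here;
        # this sketch only reports the decision it made.
        return {"provider": provider, "model": model, "prompt": prompt}


gateway = Gateway(
    router=Router(
        table={"code": ("provider-a", "model-large")},
        default=("provider-b", "model-small"),
    ),
    api_keys={"key-1"},
)
decision = gateway.handle("key-1", "code", "write a sort function")
```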

Question 4

Does an LLM gateway add latency?

Accepted Answer

Yes, typically 10 to 50 ms of overhead per request from the additional network hop and the routing decision. In return, a well-built gateway can hedge requests (race two providers in parallel and take the first success), which often reduces p95 latency relative to going direct. See the hedged requests entry.
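Hedging as described here (race providers in parallel, take the first success) can be sketched with Python's asyncio. The `hedged` helper and the simulated providers are hypothetical; the sleeps stand in for network latency, and a production gateway would also need to account for the cost of the duplicate request.

```python
import asyncio


async def hedged(calls):
    """Start every coroutine factory in `calls` at once; return the first success.

    A call that raises is skipped. Remaining calls are cancelled once a
    winner is found. Raises RuntimeError if every call fails.
    """
    tasks = [asyncio.create_task(call()) for call in calls]
    errors = []
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except Exception as exc:
                errors.append(exc)  # this provider failed; wait for the next
        raise RuntimeError(f"all hedged calls failed: {errors}")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        await asyncio.gather(*tasks, return_exceptions=True)


# Simulated providers: sleeps stand in for network round trips.
async def slow_provider():
    await asyncio.sleep(0.2)
    return "slow provider"


async def fast_provider():
    await asyncio.sleep(0.01)
    return "fast provider"


async def broken_provider():
    raise ConnectionError("provider unavailable")


winner = asyncio.run(hedged([slow_provider, fast_provider]))
survivor = asyncio.run(hedged([broken_provider, slow_provider]))
```

In the first run the faster provider wins and the slower one is cancelled; in the second, the failure is ignored and the remaining provider's answer is returned.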

Question 5

What features should an LLM gateway expose?

Accepted Answer

At minimum: OpenAI-compatible wire format, multi-provider fallback, per-request cost tracking, observability headers (cost, routing decision, fallback cascade), max-latency budgets, and BYOK support. Optional but valuable: task-aware routing, hedged requests, region constraints, cost-quality dial, structured response-format mode, tool-call ID translation across providers.
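To make per-request cost tracking and observability headers concrete, here is a small sketch that turns token counts and a routing outcome into response headers. Everything specific in it is an assumption for illustration: the `x-gateway-*` header names, the function name, and the prices are invented and do not describe any real gateway or provider.

```python
def observability_headers(provider, model, prompt_tokens, completion_tokens,
                          usd_per_1k_tokens, attempted):
    """Build response headers for cost, routing decision, and fallback cascade.

    `usd_per_1k_tokens` is a dict with "input" and "output" prices per
    1,000 tokens; `attempted` lists providers in the order they were tried.
    """
    cost = (
        prompt_tokens * usd_per_1k_tokens["input"]
        + completion_tokens * usd_per_1k_tokens["output"]
    ) / 1000
    return {
        "x-gateway-cost-usd": f"{cost:.6f}",          # hypothetical header name
        "x-gateway-route": f"{provider}/{model}",     # hypothetical header name
        "x-gateway-fallback-chain": ",".join(attempted),  # hypothetical header name
    }


headers = observability_headers(
    provider="provider-b",
    model="model-small",
    prompt_tokens=1000,
    completion_tokens=500,
    usd_per_1k_tokens={"input": 0.5, "output": 1.5},  # made-up prices
    attempted=["provider-a", "provider-b"],
)
```

With these made-up prices the request costs (1000 × 0.5 + 500 × 1.5) / 1000 = 1.25 USD, and the fallback chain header shows that the first provider was tried before the second one answered.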

What is an LLM gateway?

The shape of the problem

What an LLM gateway typically does

How GammaInfra implements an LLM gateway

Common questions

Try the gateway