What is the LLM cost-quality dial?
Cost-quality dial — a continuous request-time parameter, typically in [0.0, 1.0], that biases an LLM router's model selection along the cost-quality trade-off. 0.0 means "pure quality — pick the strongest model available for this task." 1.0 means "pure cost — pick the cheapest viable model." Intermediate values bias proportionally.
The problem the dial solves
Without a dial, model selection is discrete: cheap, balanced, quality. Three buckets, no in-between. But real applications have hundreds of distinct call sites, each with its own tolerance for the trade-off: one might want "mostly cheap but spend a little for quality on tool-heavy steps," another might want "90% quality bias, but drop to cheap if cost crosses a threshold."
The continuous dial lets each call site express its own position. The router can then make per-request model choices that aren't limited to three buckets.
How to use it on GammaInfra
Pass the value as a request header:
Per-request:
curl -X POST https://api.gammainfra.com/v1/chat/completions \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "X-GammaInfra-Cost-Quality: 0.3" \
  -H "Content-Type: application/json" \
  -d '{"model": "gammainfra/auto", "messages": [...]}'
0.3 means "mostly quality-biased — pick a stronger model when one is materially better, drop to cheap when the difference is marginal."
Per-SDK-client:
from openai import OpenAI

# Every request made through this client sends the dial header.
client = OpenAI(
    api_key="sk-gammainfra-...",
    base_url="https://api.gammainfra.com/v1",
    default_headers={"X-GammaInfra-Cost-Quality": "0.5"},
)
Every request from this client carries the dial value.
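When one client serves several call sites, each with its own dial position, a small helper can build the header per request. The sketch below is illustrative only: the helper name `dial_headers` is hypothetical, and the client-side range check is an added safeguard, not part of the gateway. It is there because the gateway drops a malformed dial silently (see Precedence rules) instead of rejecting it.

```python
import math

DIAL_HEADER = "X-GammaInfra-Cost-Quality"  # header name taken from this page


def dial_headers(value: float) -> dict:
    """Build the dial header for one request (hypothetical helper).

    Raises ValueError for values the gateway would silently ignore,
    so a bad dial fails loudly on the caller's side instead.
    """
    v = float(value)
    if math.isnan(v) or not 0.0 <= v <= 1.0:
        raise ValueError(f"cost-quality dial must be in [0.0, 1.0], got {value!r}")
    return {DIAL_HEADER: str(v)}
```

With the OpenAI Python SDK, the result can be passed as `extra_headers=dial_headers(0.3)` on an individual `client.chat.completions.create(...)` call. Per-request headers are expected to take precedence over the client's `default_headers`, but that is worth confirming against the SDK version in use.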
The response echo
When the dial drives a routing decision, the response carries an echo header such as X-GammaInfra-Cost-Quality-Applied: 0.3, holding the actual value the router used. If the request also sets an explicit X-GammaInfra-Preference, the preference takes precedence and the dial echo is absent.
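This lets caller-side tracing confirm the dial actually drove the decision. If the echo is absent on the response to a request that set the header, the dial did not drive routing: an explicit model pin or preference overrode it, the value was malformed, or something upstream stripped the header.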
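A caller-side check for the echo can be as small as the sketch below. The function name `applied_dial` is hypothetical; it only assumes that response headers are available as a string-to-string mapping, and it lowercases keys because HTTP header names are case-insensitive.

```python
from collections.abc import Mapping
from typing import Optional

APPLIED_HEADER = "X-GammaInfra-Cost-Quality-Applied"  # echo header named on this page


def applied_dial(headers: Mapping) -> Optional[float]:
    """Return the echoed dial value, or None if the echo is absent or unparseable."""
    lowered = {str(k).lower(): v for k, v in headers.items()}
    raw = lowered.get(APPLIED_HEADER.lower())
    if raw is None:
        return None  # the dial did not drive this routing decision
    try:
        return float(raw)
    except ValueError:
        return None
```

With the OpenAI Python SDK, response headers can be read by calling `client.chat.completions.with_raw_response.create(...)`, then using `.headers` on the result and `.parse()` to get the usual completion object. A tracing hook could log `applied_dial(raw.headers)` next to the value that was requested.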
Precedence rules
- Explicit model pin always wins. If the request specifies model=anthropic/claude-opus-4-7, no dial value changes the outcome.
- Explicit X-GammaInfra-Preference beats the dial. quality, cost, or latency as preference forces that bias regardless of dial.
- Dial wins over default. When neither explicit pin nor preference is set, the dial drives routing.
- Malformed dial silently falls through. A non-numeric value, NaN, or out-of-range value causes the header to be dropped silently, and the request falls back to the default preference. The gateway never returns 400 for a bad dial.
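The rules above can be read as a short decision procedure. The following is a minimal sketch of that reading, not the gateway's implementation: the function names, the `pinned_model` argument (None when the request uses a router alias such as gammainfra/auto), and the use of Python's `float()` for parsing are all assumptions made for illustration.

```python
import math
from typing import Optional

PREFERENCES = {"quality", "cost", "latency"}  # preference values listed above


def parse_dial(raw: Optional[str]) -> Optional[float]:
    """Return the dial as a float, or None if missing, non-numeric, NaN, or out of range."""
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        return None  # malformed dial: dropped silently, never an error
    return value


def routing_driver(pinned_model: Optional[str],
                   preference: Optional[str],
                   dial_raw: Optional[str]) -> str:
    """Name which input drives routing: 'pin', 'preference', 'dial', or 'default'."""
    if pinned_model is not None:
        return "pin"          # explicit model pin always wins
    if preference in PREFERENCES:
        return "preference"   # explicit preference beats the dial
    if parse_dial(dial_raw) is not None:
        return "dial"         # dial wins over the default
    return "default"          # no usable input: default preference applies
```

For example, `routing_driver(None, None, "2.7")` falls through to "default" under this sketch, which mirrors the silent fallback for out-of-range values.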
Common questions
What value should I pass for the dial?
0.3 is a sensible production default: mostly quality-biased, dropping to cheap when the model-fit difference is marginal. 0.0 forces flagship-only and gets expensive fast. 1.0 forces cheap-only and risks quality regressions on hard prompts. 0.5 is the neutral midpoint, equivalent to no dial at all.
Is the dial actually continuous or does it bucket internally?
For now it buckets: any value below 0.5 maps to the quality preset, and any value at or above 0.5 maps to the cost preset. The applied value is echoed back. Phase 2 (when the oracle response grid is fully populated) computes a continuous score per model from the grid; at that point any dial value can produce a unique model choice.
Can the dial override task-aware classification?
No. A request classified as a reasoning task and sent with cost-quality=1.0 still gets routed to the reasoning chain, but to its cheap-tier entry rather than the flagship. The chain itself is task-defined; the dial picks which entry within the chain runs first.
What happens if I pass an out-of-range value like -0.5 or 2.7?
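The header is dropped silently and the request falls back to the default preference; no 400 is returned. Check for X-GammaInfra-Cost-Quality-Applied on the response to confirm.
How does the dial relate to the auto / fast / cheap aliases?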
gammainfra/auto with no dial is equivalent to dial=0.5 (neutral). gammainfra/cheap is equivalent to setting preference=cost; gammainfra/fast is equivalent to preference=latency. The dial sits between these: it's the way to express a precise cost/quality position rather than picking one of three buckets.
Try the gateway
$3 free trial credit on signup, $10 minimum top-up. Pass-through provider rates plus a 3% top-up fee during the launch window (5% after 2026-06-23).
Last updated 2026-05-15.