What is the LLM cost-quality dial?

Cost-quality dial — a continuous request-time parameter, typically in [0.0, 1.0], that biases an LLM router's model selection along the cost-quality trade-off. 0.0 means "pure quality — pick the strongest model available for this task." 1.0 means "pure cost — pick the cheapest viable model." Intermediate values bias proportionally.

The problem the dial solves

Without a dial, model selection is discrete: cheap, balanced, quality. Three buckets, no in-between. But real applications have hundreds of distinct call sites with different tolerance for the trade-off — one might want "mostly cheap but spend a little for quality on tool-heavy steps," another might want "90% quality bias, but drop to cheap if cost crosses a threshold."

The continuous dial lets each call site express its own position. The router can then make per-request model choices that aren't limited to three buckets.

How to use it on GammaInfra

Pass the value as a request header:

Per-request:

curl -X POST https://api.gammainfra.com/v1/chat/completions \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "X-GammaInfra-Cost-Quality: 0.3" \
  -H "Content-Type: application/json" \
  -d '{"model": "gammainfra/auto", "messages": [...]}'

0.3 means "mostly quality-biased — pick a stronger model when one is materially better, drop to cheap when the difference is marginal."

Per-SDK-client:

from openai import OpenAI
client = OpenAI(
    api_key="sk-gammainfra-...",
    base_url="https://api.gammainfra.com/v1",
    default_headers={"X-GammaInfra-Cost-Quality": "0.5"},
)

Every request from this client carries the dial value.

The response echo

When the dial drives a routing decision, the response carries X-GammaInfra-Cost-Quality-Applied: 0.3 — the actual value the router used. If the request also sets an explicit X-GammaInfra-Preference, that takes precedence and the dial echo is absent.

This lets caller-side tracing confirm the dial actually drove the decision. If it's absent on a request that set the header, something upstream stripped or overrode it.

Precedence rules

Common questions

What value should I pass for the dial?
0.3 is a sensible production default — mostly quality-biased, drops to cheap when the model-fit difference is marginal. 0.0 forces flagship-only and gets expensive fast. 1.0 forces cheap-only and risks quality regressions on hard prompts. 0.5 is the neutral midpoint, equivalent to no dial at all.
Is the dial actually continuous or does it bucket internally?
Phase 1 (current default) buckets into two presets: any value below 0.5 maps to the quality preset, any value at or above 0.5 maps to the cost preset. The applied value is echoed back. Phase 2 (when the oracle response grid is fully populated) computes a continuous score per model from the grid — at that point any dial value can produce a unique model choice.
Can the dial override task-aware classification?
The dial biases endpoint selection within a task's chain — it doesn't change which task the prompt is classified as. A reasoning prompt with cost-quality=1.0 still gets routed to the reasoning chain, but to its cheap-tier entry rather than the flagship. The chain itself is task-defined; the dial picks which entry within the chain runs first.
What happens if I pass an out-of-range value like -0.5 or 2.7?
The header is silently dropped and the request falls back to default preference. The dial never returns a 400 on bad input — this is intentional, so an upstream proxy that mangles header values can't break otherwise-valid requests. If you depend on the dial taking effect, check X-GammaInfra-Cost-Quality-Applied on the response to confirm.
How does the dial relate to the auto / fast / cheap aliases?
gammainfra/auto with no dial is equivalent to dial=0.5 (neutral). gammainfra/cheap is equivalent to setting preference=cost; gammainfra/fast is equivalent to preference=latency. The dial sits between these — it's the way to express a precise cost/quality position rather than picking one of three buckets.

Try the gateway

Get a GammaInfra API key →

$3 free trial credit on signup, $10 minimum top-up. Pass-through provider rates plus 3% top-up fee during the launch window (5% after 2026-06-23).

Last updated 2026-05-15.