Question 1

What value should I pass for the dial?

Accepted Answer

0.3 is a sensible production default: it is mostly quality-biased and drops to the cheap tier only when the model-fit difference is marginal. 0.0 forces flagship-only routing and gets expensive fast. 1.0 forces cheap-only routing and risks quality regressions on hard prompts. 0.5 is the neutral midpoint, equivalent to sending no dial at all.

Question 2

Is the dial actually continuous or does it bucket internally?

Accepted Answer

Phase 1 (the current default) buckets the dial into two presets: any value below 0.5 maps to the quality preset, and any value at or above 0.5 maps to the cost preset. The applied value is echoed back on the response. Phase 2 (active once the oracle response grid is fully populated) computes a continuous score per model from the grid; at that point any dial value can produce a unique model choice.
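The Phase 1 rule can be sketched in a few lines of Python. This is only an illustration of the threshold described above, not the gateway's implementation: the function name `phase1_preset` and the labels "quality" and "cost" are assumptions, and the input is assumed to already be within 0.0 to 1.0.

```python
def phase1_preset(dial: float) -> str:
    """Map an in-range dial value to one of the two Phase 1 presets.

    The preset labels are illustrative; only the 0.5 threshold and the
    "at or above" behaviour are taken from the answer.
    """
    # Below 0.5 -> quality preset; at or above 0.5 -> cost preset.
    return "quality" if dial < 0.5 else "cost"


if __name__ == "__main__":
    for value in (0.0, 0.3, 0.49, 0.5, 1.0):
        print(value, phase1_preset(value))
```

Under this rule 0.3 and 0.0 land in the same preset, which is why intermediate values only start to matter once Phase 2 scoring is active.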

Question 3

Can the dial override task-aware classification?

Accepted Answer

No. The dial biases endpoint selection within a task's chain; it doesn't change which task the prompt is classified as. A reasoning prompt with cost-quality=1.0 still gets routed to the reasoning chain, but to its cheap-tier entry (DeepSeek V4 Flash for reasoning) rather than the flagship. The chain itself is task-defined; the dial picks which entry within the chain runs first.
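A rough Python sketch of that separation, assuming a two-preset dial as in Phase 1. The chain table, the "chat" task, the placeholder model names in angle brackets, and the function `first_entry` are all hypothetical; only "DeepSeek V4 Flash" as the cheap-tier reasoning entry comes from the answer.

```python
# Hypothetical chain table. Each chain is ordered flagship first, cheap tier last.
# Every name except "DeepSeek V4 Flash" is a placeholder, not a real routing entry.
CHAINS = {
    "reasoning": ["<reasoning-flagship>", "DeepSeek V4 Flash"],
    "chat": ["<chat-flagship>", "<chat-cheap>"],
}


def first_entry(task: str, dial: float) -> str:
    """Return the chain entry that would run first for this task and dial."""
    # The classified task alone decides which chain is used.
    chain = CHAINS[task]
    # The dial only decides where in that chain to start
    # (two-preset simplification: below 0.5 flagship, otherwise cheap tier).
    return chain[0] if dial < 0.5 else chain[-1]


if __name__ == "__main__":
    print(first_entry("reasoning", 1.0))
    print(first_entry("reasoning", 0.0))
```

Note that `dial` never appears in the chain lookup, which mirrors the point that classification and cost/quality bias are independent decisions.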

Question 4

What happens if I pass an out-of-range value like -0.5 or 2.7?

Accepted Answer

The header is silently dropped and the request falls back to the default preference. The dial never returns a 400 on bad input. This is intentional, so that an upstream proxy that mangles header values can't break otherwise-valid requests. If you depend on the dial taking effect, check X-GammaInfra-Cost-Quality-Applied on the response to confirm.
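The drop-instead-of-reject behaviour, and the client-side confirmation step, might look roughly like the Python sketch below. The function names `parse_dial` and `applied_value` are hypothetical, the treatment of non-numeric values as "dropped" is an assumption extended from the out-of-range case, and what the gateway actually puts in the echo header when the dial is dropped is not specified here.

```python
from typing import Mapping, Optional

# Response header named in the answer above.
APPLIED_HEADER = "X-GammaInfra-Cost-Quality-Applied"


def parse_dial(raw: Optional[str]) -> Optional[float]:
    """Return the dial value, or None if it should be silently dropped.

    Never raises: a missing, malformed, or out-of-range value just means
    "no dial", so the request proceeds with the default preference.
    """
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        return None  # assumption: non-numeric input is dropped like out-of-range input
    # NaN fails this comparison too, so it is dropped as well.
    if not (0.0 <= value <= 1.0):
        return None
    return value


def applied_value(response_headers: Mapping[str, str]) -> Optional[str]:
    """Client-side helper: read the echoed value, matching the header name case-insensitively."""
    lowered = {name.lower(): val for name, val in response_headers.items()}
    return lowered.get(APPLIED_HEADER.lower())
```

A caller that depends on the dial can compare `applied_value(...)` against what it sent and log or alert on a mismatch, rather than expecting an error status.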

Question 5

How does the dial relate to the auto / fast / cheap aliases?

Accepted Answer

gammainfra/auto with no dial is equivalent to dial=0.5 (neutral). gammainfra/cheap is equivalent to setting preference=cost; gammainfra/fast is equivalent to preference=latency. The dial sits between these — it's the way to express a precise cost/quality position rather than picking one of three buckets.
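As a compact restatement, the three equivalences can be written as a lookup table in Python. The alias names and the values 0.5, cost, and latency come from the answer; the dictionary layout, the key names "dial" and "preference", and the function `equivalent_settings` are illustrative assumptions. How an explicit dial combines with the cheap or fast alias is not covered by this sketch.

```python
# Equivalences as stated in the answer; the data layout is illustrative only.
ALIAS_EQUIVALENTS = {
    "gammainfra/auto": {"dial": 0.5},             # neutral, same as sending no dial
    "gammainfra/cheap": {"preference": "cost"},
    "gammainfra/fast": {"preference": "latency"},
}


def equivalent_settings(model: str) -> dict:
    """Return the settings an alias is described as equivalent to.

    Raises KeyError for names that are not one of the three aliases.
    """
    # Return a copy so callers cannot mutate the shared table.
    return dict(ALIAS_EQUIVALENTS[model])


if __name__ == "__main__":
    for alias in ALIAS_EQUIVALENTS:
        print(alias, equivalent_settings(alias))
```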

What is the LLM cost-quality dial?

The problem the dial solves

How to use it on GammaInfra

Per-request:

Per-SDK-client:

The response echo

Precedence rules

Common questions

Try the gateway