Question 1

How accurate is the task classifier?

Accepted Answer

GammaInfra's classifier hits roughly 73% top-1 accuracy on a held-out validation set of 88 prompts. That sounds low until you realize that adjacent labels (creative vs rewrite, chat vs summarize) often have equivalent best-fit models, so misclassifications don't always cost anything. The endpoint registry collapses some adjacent labels to the same chain head for exactly this reason.

Question 2

Can I override the routing decision per request?

Accepted Answer

Yes. Pin a specific model with model=anthropic/claude-opus-4-7. Or bias the router with X-GammaInfra-Preference: quality (forces toward stronger models) or X-GammaInfra-Cost-Quality: 0.3 (continuous dial, 0=quality, 1=cost). Explicit pins bypass classification entirely.

Question 3

What if my prompt doesn't fit any of the 8 labels well?

Accepted Answer

The classifier outputs probabilities for all 8 labels. If no single label exceeds the confidence threshold, the request falls through to the v1 keyword-rule router (10 task types, broader patterns). If that doesn't match either, it dispatches to the chat chain as a universal default. The X-GammaInfra-Router-Version response header tells you which path served the request: v2 (learned), v2_keyword (rule shortcut), v1 (fallback), or direct (explicit pin).

Question 4

Does task-aware routing handle multi-turn conversations?

Accepted Answer

Yes — each turn is classified independently. This sometimes routes different turns of one conversation to different providers, which is fine for text-only chat. For tool-heavy agent loops, switching providers mid-conversation can break tool_call.id continuity (each provider validates IDs it issued), so the recommended pattern is to pin a provider for one session and switch providers between sessions.

Question 5

How does task-aware routing relate to model routing in research papers (e.g. RouteLLM, FrugalGPT)?

Accepted Answer

It's the practical production form of those ideas. RouteLLM and FrugalGPT propose learning a per-prompt model-selection policy that maximizes quality-per-dollar. GammaInfra's classifier is one concrete implementation: discrete labels, a small embedding model, a calibrated classifier, and a per-label endpoint registry. The continuous cost-quality dial (X-GammaInfra-Cost-Quality) is the production hook for the cost-quality trade-off those papers parametrize.

What is task-aware LLM routing?

Why one model is the wrong default

The 8 labels GammaInfra classifies into

How the classifier works

The measured cost win

Common questions

Try the gateway