June 16, 2026

Why every GammaInfra response carries a cost-USD header

Most LLM APIs report tokens. The unit is technically correct — tokens are what get billed — but it's the wrong unit for the question developers are actually asking. The question is "how many dollars did that completion cost me?" and you can't answer it from tokens alone.

So every GammaInfra response carries this header:

X-GammaInfra-Cost-USD: 0.000123

That's a dollar amount, here to six decimal places (a millionth of a dollar), computed at the underlying provider's listed rates with no markup. The router can pick any model on any provider per prompt, but the number in the header is the canonical cost of that request.

This post explains how we compute it, why it's harder than it sounds, and how to read it from the SDK you're already using.

Why dollars matter more than tokens

Token counting was useful in 2023, when there were three frontier models and they all charged similar rates. In 2026 the models you can call span a far wider price range.

A 1000-token completion costs anywhere from $0.00013 to $0.030 depending on which model served it, a roughly 230× spread for the same token count. Token counts say nothing about cost in this world. You need the dollar value, at the request level, every time.

If you've ever opened your OpenAI invoice at the end of the month and tried to attribute spend to a specific feature, you know the alternative: aggregate counts grouped by API key, with no way to drill down to "what did this user's session cost?" The header solves that.

How the number gets computed

For each request, GammaInfra's signal collector knows three things that the upstream provider's response doesn't include directly:

  1. Which model actually served the request. When you call gammainfra/auto, the router picks. When you call openai/gpt-5, the fallback chain may have cascaded to a different model if OpenAI throttled. The X-GammaInfra-Endpoint response header tells you which one.
  2. The per-direction pricing for that model. Most models charge input and output differently, sometimes by 5×. The usage.prompt_tokens and usage.completion_tokens fields from the response, each multiplied by the matching per-direction rate, give the dollar cost.
  3. Whether any per-context surcharges apply. Some models charge double for input above a context threshold; GPT-5.5, for example, doubles the input rate beyond 272K tokens. The signal collector tracks this per model and applies it in the calculation.
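Put together, the three inputs reduce to a small calculation. The sketch below is a minimal illustration, not GammaInfra's actual implementation: the `ModelPricing` type, the per-million rates, and the rule that only input tokens beyond the threshold pay the surcharge are all assumptions for the example (providers word long-context rules differently, and some reprice the whole prompt once it crosses the threshold).

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-model price sheet, in USD per million tokens."""
    input_per_m: Decimal
    output_per_m: Decimal
    # Assumed surcharge rule: input tokens beyond this count are billed
    # at input_per_m * surcharge_multiplier.
    surcharge_threshold: Optional[int] = None
    surcharge_multiplier: Decimal = Decimal(1)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 pricing: ModelPricing) -> tuple[Decimal, Decimal, Decimal]:
    """Return (input_cost, output_cost, total_cost) in USD."""
    base_tokens = prompt_tokens
    surcharged_tokens = 0
    threshold = pricing.surcharge_threshold
    if threshold is not None and prompt_tokens > threshold:
        base_tokens = threshold
        surcharged_tokens = prompt_tokens - threshold

    # Input: normal rate up to the threshold, surcharged rate beyond it.
    input_cost = (
        base_tokens * pricing.input_per_m
        + surcharged_tokens * pricing.input_per_m * pricing.surcharge_multiplier
    ) / MILLION
    # Output: a single per-direction rate.
    output_cost = completion_tokens * pricing.output_per_m / MILLION
    return input_cost, output_cost, input_cost + output_cost


# Made-up rates for illustration only; not any real model's price list.
pricing = ModelPricing(
    input_per_m=Decimal("1.25"),
    output_per_m=Decimal("10"),
    surcharge_threshold=272_000,
    surcharge_multiplier=Decimal(2),
)

print(request_cost(1_000, 500, pricing))
print(request_cost(300_000, 500, pricing))
```

Using `Decimal` instead of `float` keeps micro-dollar values exact, so the input and output parts add up to the total without rounding drift.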

Sum the input and output costs (after surcharges) and you have a per-request dollar value. We emit it as X-GammaInfra-Cost-USD on the response. We also split it:

X-GammaInfra-Input-Cost-USD: 0.000045
X-GammaInfra-Output-Cost-USD: 0.000078
X-GammaInfra-Cost-USD: 0.000123

The split is useful for agent-loop chargeback, where one direction usually dominates. A LangChain ReAct loop that reads 80K tokens of context per tool call and writes 200 output tokens has very different cost dynamics from a creative-generation call doing the opposite.
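A chargeback script only needs the split headers. A minimal sketch, assuming the headers arrive as plain decimal strings like the example above; `split_cost` is a hypothetical helper, and the one-micro-dollar tolerance is an assumption in case each header is rounded independently.

```python
from collections.abc import Mapping
from decimal import Decimal

# Assumed tolerance: one unit in the sixth decimal place.
ROUNDING_TOLERANCE = Decimal("0.000001")


def split_cost(headers: Mapping[str, str]) -> dict:
    """Parse the three cost headers and report which direction dominates."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    input_cost = Decimal(lowered["x-gammainfra-input-cost-usd"])
    output_cost = Decimal(lowered["x-gammainfra-output-cost-usd"])
    total = Decimal(lowered["x-gammainfra-cost-usd"])

    # The two directions should add up to the total.
    if abs(input_cost + output_cost - total) > ROUNDING_TOLERANCE:
        raise ValueError(f"split {input_cost} + {output_cost} != total {total}")

    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "input_share": input_cost / total if total else Decimal(0),
        "dominant": "input" if input_cost >= output_cost else "output",
    }


example = {
    "X-GammaInfra-Input-Cost-USD": "0.000045",
    "X-GammaInfra-Output-Cost-USD": "0.000078",
    "X-GammaInfra-Cost-USD": "0.000123",
}
print(split_cost(example)["dominant"])  # output
```

Grouping the "input" and "output" fields by feature or customer is then a plain sum.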

Pass-through, not markup. The dollar value in the header is computed at the underlying provider's listed rates, not at a marked-up GammaInfra rate. There's a separate 3% top-up fee on managed credits (5% after the launch window ends on June 23), but it's charged when you top up, not per token. Pass-through tokens, fee on cash-in.

Reading the header in different SDKs

Most OpenAI-compatible SDKs hide response headers behind a "raw response" interface. Here's how to extract the cost-USD header from common stacks:

curl

curl -i https://gammainfra.com/v1/chat/completions \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gammainfra/auto","messages":[{"role":"user","content":"hi"}]}'

The -i flag prints response headers. Look for the X-GammaInfra-Cost-USD line.

openai-python

from openai import OpenAI

client = OpenAI(
    base_url="https://gammainfra.com/v1",
    api_key="sk-gammainfra-...",
)

resp = client.with_raw_response.chat.completions.create(
    model="gammainfra/auto",
    messages=[{"role": "user", "content": "hi"}],
)

cost = resp.http_response.headers["X-GammaInfra-Cost-USD"]
completion = resp.parse()

print(f"Cost: ${cost}, content: {completion.choices[0].message.content}")
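Because the cost arrives with every response, a script can keep a running total and stop itself before it overspends, with no separate billing query. A minimal sketch: `CostMeter` and `BudgetExceeded` are hypothetical names, and the meter takes a plain headers mapping so it can be fed `resp.http_response.headers` from the example above or headers from any other client.

```python
from collections.abc import Mapping
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend passes the configured budget."""


class CostMeter:
    """Accumulates X-GammaInfra-Cost-USD values across a session (illustrative)."""

    HEADER = "x-gammainfra-cost-usd"

    def __init__(self, budget_usd: str) -> None:
        self.budget = Decimal(budget_usd)
        self.spent = Decimal(0)
        self.calls = 0

    def record(self, headers: Mapping[str, str]) -> Decimal:
        """Add one response's cost and return the running total."""
        # Case-insensitive lookup, since header casing varies by client.
        value = next(
            (v for k, v in headers.items() if k.lower() == self.HEADER), None
        )
        if value is None:
            # No cost header on this response: nothing to add.
            return self.spent
        self.spent += Decimal(value)
        self.calls += 1
        if self.spent > self.budget:
            raise BudgetExceeded(f"spent ${self.spent} of ${self.budget} budget")
        return self.spent


meter = CostMeter(budget_usd="0.50")
# In a request loop, after each raw-response call:
#     meter.record(resp.http_response.headers)
```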

LangChain

LangChain's ChatOpenAI doesn't expose response headers by default. Drop down to the underlying OpenAI client (above) for per-call header access. For aggregate cost tracking across many LangChain calls, the dashboard at dashboard.gammainfra.com rolls up per-request cost by model.

Vercel AI SDK

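import { createOpenAI } from '@ai-sdk/openai';

// Wrap the global fetch so every GammaInfra response's cost header is logged.
const gammainfra = createOpenAI({
  baseURL: 'https://gammainfra.com/v1',
  apiKey: process.env.GAMMAINFRA_API_KEY,
  fetch: async (input, init) => {
    const res = await fetch(input, init);
    console.log('Cost USD:', res.headers.get('X-GammaInfra-Cost-USD'));
    return res;
  },
});

// Target the Chat Completions endpoint, e.g. gammainfra.chat('gammainfra/auto').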

The custom fetch sees every response and can log the cost header (or push it to a metrics system, or write it to your DB alongside the request).
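Writing the header next to the request is what turns per-feature or per-user attribution into a query instead of an invoice exercise. A minimal Python sketch using the standard-library `sqlite3` module; the `llm_calls` table, its columns, and the `feature` tag are hypothetical choices, and storing costs as integer micro-dollars assumes the header's precision doesn't go beyond six decimal places, as in the example header.

```python
import sqlite3
from collections.abc import Mapping
from decimal import Decimal

MICRO = Decimal(1_000_000)  # store USD * 1e6 as an integer so SQL sums stay exact


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS llm_calls (
               request_id TEXT PRIMARY KEY,
               feature    TEXT NOT NULL,
               endpoint   TEXT,
               cost_micro INTEGER NOT NULL
           )"""
    )
    return conn


def record_call(conn: sqlite3.Connection, request_id: str, feature: str,
                headers: Mapping[str, str]) -> None:
    """Store one response's cost (and serving endpoint) alongside the request."""
    lowered = {k.lower(): v for k, v in headers.items()}
    cost = Decimal(lowered.get("x-gammainfra-cost-usd", "0"))
    conn.execute(
        "INSERT INTO llm_calls VALUES (?, ?, ?, ?)",
        (request_id, feature, lowered.get("x-gammainfra-endpoint"),
         int((cost * MICRO).to_integral_value())),
    )
    conn.commit()


def spend_by_feature(conn: sqlite3.Connection) -> dict[str, Decimal]:
    """Total USD spend per feature tag."""
    rows = conn.execute(
        "SELECT feature, SUM(cost_micro) FROM llm_calls GROUP BY feature ORDER BY feature"
    )
    return {feature: Decimal(total) / MICRO for feature, total in rows}


conn = open_db()
record_call(conn, "req-1", "search", {"X-GammaInfra-Cost-USD": "0.000123"})
record_call(conn, "req-2", "search", {"X-GammaInfra-Cost-USD": "0.000200"})
record_call(conn, "req-3", "summarize", {"X-GammaInfra-Cost-USD": "0.004500"})
print(spend_by_feature(conn))
```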

What the header doesn't include

Three things the X-GammaInfra-Cost-USD value does not cover:

  1. It doesn't include the top-up fee. When you top up $100 of managed credits, the 3% fee (during launch) means you get $97 of credits. Those credits then deduct at pass-through provider rates. The header reflects the pass-through rate, not your effective cost-after-fee.
  2. It doesn't include the cost of failed attempts. If the request cascaded through three providers because the first two rate-limited you, the header reports the cost from the provider that succeeded; failed attempts that returned no tokens cost nothing. X-GammaInfra-Fallback-Chain shows which providers were tried.
  3. It doesn't apply to BYOK requests that never reach a provider. If you have BYOK keys configured but your BYOK balance is empty, the request returns 402 before any provider call is made, so no cost is incurred.
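The fee arithmetic is easy to check. If a $100 top-up yields $97 of credit, each $1 of credit cost 1 / (1 - fee) dollars, so a request's effective cash cost is the header value divided by (1 - fee). A sketch with hypothetical helper names, using the 3% launch and 5% post-launch rates quoted earlier in the post:

```python
from decimal import Decimal

LAUNCH_FEE = Decimal("0.03")    # launch window, through 2026-06-23
STANDARD_FEE = Decimal("0.05")  # after the launch window


def credits_for_topup(amount_usd: Decimal, fee: Decimal) -> Decimal:
    """Managed credits received for a top-up once the fee is taken."""
    return amount_usd * (1 - fee)


def effective_cost(header_cost_usd: Decimal, fee: Decimal) -> Decimal:
    """Cash actually spent on a request, given the top-up fee.

    Each $1 of credit cost 1 / (1 - fee) dollars to buy, so the
    pass-through header value is scaled by that factor.
    """
    return header_cost_usd / (1 - fee)


print(credits_for_topup(Decimal(100), LAUNCH_FEE))
print(effective_cost(Decimal("0.000123"), LAUNCH_FEE))
```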

The whole observability header surface

Cost-USD is the most-asked-about header, but it ships with siblings:

Header                             What it tells you
X-GammaInfra-Cost-USD              Total cost of this request in USD
X-GammaInfra-Input-Cost-USD        Cost of input tokens only
X-GammaInfra-Output-Cost-USD       Cost of output tokens only
X-GammaInfra-Endpoint              Which provider/model actually served the request
X-GammaInfra-Fallback-Chain        Cascade order if multiple providers were tried
X-GammaInfra-Logical-Model         The task classification (chat / code / reasoning / etc.)
X-GammaInfra-Cost-Quality-Applied  Echo of your X-GammaInfra-Cost-Quality dial value, if you sent one
X-GammaInfra-Provider              Provider name (also derivable from X-GammaInfra-Endpoint)
X-GammaInfra-Attempted-Count       How many fallback attempts ran

Together these answer most "why did this request go the way it did" questions in one HTTP round trip. The full reference is in the docs.
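Since they all ride on the same response, the headers can be folded into one record per request for logging. A minimal sketch: the `RoutingSummary` type is hypothetical, and it assumes X-GammaInfra-Fallback-Chain is a comma-separated list and X-GammaInfra-Attempted-Count an integer; check the docs for the actual formats before relying on either.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class RoutingSummary:
    endpoint: Optional[str]
    provider: Optional[str]
    logical_model: Optional[str]
    cost_usd: Optional[Decimal]
    attempted_count: Optional[int]
    fallback_chain: list[str] = field(default_factory=list)


def summarize(headers: Mapping[str, str]) -> RoutingSummary:
    """Collect GammaInfra's observability headers into one record."""
    h = {k.lower(): v for k, v in headers.items()}

    def get(name: str) -> Optional[str]:
        return h.get(f"x-gammainfra-{name}")

    cost = get("cost-usd")
    attempted = get("attempted-count")
    chain = get("fallback-chain")
    return RoutingSummary(
        endpoint=get("endpoint"),
        provider=get("provider"),
        logical_model=get("logical-model"),
        cost_usd=Decimal(cost) if cost is not None else None,
        attempted_count=int(attempted) if attempted is not None else None,
        # Assumed format: comma-separated provider/model identifiers.
        fallback_chain=[p.strip() for p in chain.split(",") if p.strip()] if chain else [],
    )


# Placeholder values; real endpoint identifiers will differ.
summary = summarize({
    "X-GammaInfra-Cost-USD": "0.000123",
    "X-GammaInfra-Attempted-Count": "2",
    "X-GammaInfra-Fallback-Chain": "provider-a/model-x, provider-b/model-y",
})
print(summary.attempted_count, summary.fallback_chain)
```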

The takeaway

Token counts are universal but they hide the question developers actually have. Dollar amounts answer it. Putting them in a response header instead of a separate dashboard query means every script that touches your LLM API can react to cost without an extra round trip.

If you want to try it, sign up at gammainfra.com. New accounts get $3 of trial credit, and you'll see your first X-GammaInfra-Cost-USD line on your first request.


Frequently asked questions

What is the X-GammaInfra-Cost-USD header?
The total cost of the request in USD, returned on every successful response. Sum it across a session and you know exactly what a workload cost — no client-side recompute, no separate dashboard tab, no waiting for the monthly invoice.
How is the cost number computed?
Each direction's token count (prompt and completion) times the provider's pass-through rate for that direction, plus any per-context surcharge the model applies. GammaInfra also returns the per-direction split as X-GammaInfra-Input-Cost-USD and X-GammaInfra-Output-Cost-USD for chargeback when one direction dominates (agent-loop context, for example).
Does the cost header include GammaInfra's fee?
No — the header is the pass-through provider token cost with 0% markup. GammaInfra's fee is taken at top-up time (3% during the launch window through 2026-06-23, 5% after), or 1–2% per request on the BYOK path. The header is the canonical provider spend, not your blended bill.
How do I read the header in my SDK?
It is a standard HTTP response header, so it surfaces wherever your SDK exposes response headers — the response-headers accessor on most OpenAI-SDK result objects, or an onFinish/stream-end callback for streaming. The header name is stable across SDK versions; consult your SDK's docs for the exact accessor.
What other observability headers does GammaInfra return?
X-GammaInfra-Endpoint (which provider/model served the call), X-GammaInfra-Fallback-Chain (the cascade if one fired), X-GammaInfra-Router-Version (which routing path), X-GammaInfra-Cost-Quality-Applied (your dial echoed back when it drove the decision), and the standard X-RateLimit-Limit/Remaining/Reset trio.