GammaInfra MCP server
Smart routing as a tool your agent can call.
The GammaInfra MCP server plugs into any Model Context Protocol host (Claude Code, Claude Desktop, Cursor, Cline, Continue) and gives the agent direct tool access to task-aware routing across every major LLM, with the exact cost and routing decision returned on every call.
Why call routing as a tool
Most MCP setups give an agent tools for the outside world: files, shells, browsers, APIs. The model itself is fixed by the host. That's a missed lever. A six-step research loop doesn't need the same model for query rewriting, deep reasoning, and JSON extraction, and paying flagship rates for all three is a common source of silent agent cost.
This server exposes GammaInfra's router as a callable tool, so the agent can send each step to the model that best fits that prompt and read back exactly what it cost, without reconfiguring the host or rewriting the agent loop.
The four tools
- chat_completions: the main one. Call any supported model, or gammainfra/auto to let the router pick per prompt. Accepts cost_quality, max_latency_ms, preference, and region controls. Returns the response plus a structured routing_meta object (provider, endpoint, per-direction cost in USD, fallback chain), so the routing decision is visible inline, not buried in a dashboard.
- list_models: the full catalog with per-token pricing and capability flags, so the agent can choose a model on price or capability at runtime.
- get_balance: managed balance; pass include_byok=true for the BYOK balance too. Lets a long-running agent check headroom before a big fan-out.
- get_status: overall and per-provider health, so an agent can gate a heavy run on provider availability.
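As an illustration of the get_balance / get_status gating pattern, here is a minimal Python sketch of a pre-flight check a harness might run before a large fan-out. It assumes you have already pulled a USD balance out of the get_balance result and a per-provider up/down map out of get_status; the exact JSON shapes of those results aren't documented here, so preflight, PreflightResult, the provider keys, and the 1.5x safety margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreflightResult:
    ok: bool
    reason: str


def preflight(balance_usd: float,
              provider_up: dict[str, bool],
              needed_providers: list[str],
              est_cost_per_call_usd: float,
              n_calls: int,
              margin: float = 1.5) -> PreflightResult:
    """Decide whether a fan-out of n_calls should start.

    balance_usd and provider_up are assumed to have been extracted from the
    get_balance and get_status tool results by the caller (hypothetical shapes).
    """
    # A required provider that is missing from the map or reported down blocks the run.
    down = [p for p in needed_providers if not provider_up.get(p, False)]
    if down:
        return PreflightResult(False, "providers down: " + ", ".join(down))

    # Require headroom for the estimated spend, scaled by a safety margin.
    required = est_cost_per_call_usd * n_calls * margin
    if balance_usd < required:
        return PreflightResult(False, f"need ${required:.2f}, have ${balance_usd:.2f}")
    return PreflightResult(True, "ok")
```

For example, preflight(2.0, {"provider-a": True}, ["provider-a"], 0.01, 50) passes, while the same call with a 0.50 balance fails on headroom. The margin exists because routed per-call cost varies with the prompt; tune it to the variance your steps actually show.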
Install
1. Get a GammaInfra API key
Sign up at gammainfra.com and verify your email. The $3 free trial credit is enough to exercise all four tools end-to-end before topping up. You'll need Node.js 18+ — the server runs via npx, no manual install.
2. Register the server in your MCP host
Claude Code — one command:
claude mcp add gammainfra \
--env GAMMAINFRA_API_KEY=sk-gammainfra-... \
-- npx -y @gammainfra/mcp-server
Claude Desktop — edit claude_desktop_config.json (~/Library/Application Support/Claude/ on macOS, %APPDATA%\Claude\ on Windows):
{
  "mcpServers": {
    "gammainfra": {
      "command": "npx",
      "args": ["-y", "@gammainfra/mcp-server"],
      "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." }
    }
  }
}
Cursor — edit ~/.cursor/mcp.json with the same mcpServers block. Cline (VS Code) — add it under the MCP Servers tab with "disabled": false. The command, args, and env are identical across these JSON-config hosts.
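Hand-editing a shared host config is an easy way to break other servers already registered in it. Here is a minimal Python sketch, assuming the mcpServers layout shown above, that adds or updates only the gammainfra entry and leaves everything else intact. add_gammainfra is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def add_gammainfra(config_path: Path, api_key: str) -> dict:
    """Add or update the "gammainfra" entry in an MCP host's JSON config.

    Any other servers under "mcpServers" (and any other top-level keys)
    are preserved. The file is created if it doesn't exist yet.
    """
    config: dict = {}
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())

    servers = config.setdefault("mcpServers", {})
    # Same command/args/env as the hand-written block for JSON-config hosts.
    servers["gammainfra"] = {
        "command": "npx",
        "args": ["-y", "@gammainfra/mcp-server"],
        "env": {"GAMMAINFRA_API_KEY": api_key},
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage: add_gammainfra(Path.home() / ".cursor" / "mcp.json", "sk-gammainfra-..."). Like the hand-edited version, this stores the key in the config file in plain text. For Cline, add "disabled": False to the entry if your setup expects it.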
Continue — Continue uses YAML. Add to ~/.continue/config.yaml (or a file under ~/.continue/mcpServers/):
mcpServers:
  - name: GammaInfra
    command: npx
    args:
      - -y
      - "@gammainfra/mcp-server"
    env:
      GAMMAINFRA_API_KEY: sk-gammainfra-...
3. Restart the host
Restart the MCP host. The four tools appear immediately — ask the agent to "list available models on gammainfra" to confirm the round-trip.
GAMMAINFRA_API_KEY is the only required configuration. GAMMAINFRA_BASE_URL is optional and defaults to https://api.gammainfra.com/v1; set it only to target a staging or other non-default endpoint. The server validates that the key is present at startup and exits with a clear message if it's missing, so a misconfigured host fails loudly rather than silently.
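The startup rule above is simple enough to reproduce in your own tooling, for example to check an environment before launching a host. Below is a Python sketch mirroring the documented behaviour (key required, base URL defaulting to https://api.gammainfra.com/v1). resolve_config is a hypothetical name, and this is not the server's actual source.

```python
import os
import sys

DEFAULT_BASE_URL = "https://api.gammainfra.com/v1"


def resolve_config(env=None):
    """Return (api_key, base_url), or exit with a clear message if the key is missing.

    Illustrates the documented startup check; not taken from the server's code.
    """
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value the same as an unset variable.
    api_key = env.get("GAMMAINFRA_API_KEY", "").strip()
    if not api_key:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("GAMMAINFRA_API_KEY environment variable is required")
    base_url = env.get("GAMMAINFRA_BASE_URL") or DEFAULT_BASE_URL
    return api_key, base_url
```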
What the agent gets back
Every chat_completions call returns the model response and a routing_meta object the agent can reason over directly:
- provider / endpoint: which provider and physical model actually served the call
- cost_usd, input_cost_usd, output_cost_usd: exact spend, split for per-step budgeting and chargeback
- router_version / logical_model: how the prompt was classified and routed
- fallback_chain / attempted_count: the full cascade if a provider was down and the request was rerouted
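A harness that logs or aggregates these fields may want them typed. Here is a Python sketch, assuming routing_meta arrives as a flat JSON object with exactly the field names listed above. The types, defaults, and the RoutingMeta class are assumptions. fell_back treats attempted_count > 1 as "the router had to cascade", which is inferred from the field names rather than documented.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingMeta:
    provider: str
    endpoint: str
    cost_usd: float
    input_cost_usd: float = 0.0
    output_cost_usd: float = 0.0
    router_version: str = ""
    logical_model: str = ""
    fallback_chain: list = field(default_factory=list)
    attempted_count: int = 1

    @classmethod
    def from_dict(cls, meta: dict) -> "RoutingMeta":
        """Build from a parsed routing_meta object (assumed flat JSON)."""
        return cls(
            provider=meta["provider"],
            endpoint=meta["endpoint"],
            # float() also accepts costs serialized as strings.
            cost_usd=float(meta["cost_usd"]),
            input_cost_usd=float(meta.get("input_cost_usd", 0.0)),
            output_cost_usd=float(meta.get("output_cost_usd", 0.0)),
            router_version=meta.get("router_version", ""),
            logical_model=meta.get("logical_model", ""),
            fallback_chain=list(meta.get("fallback_chain", [])),
            attempted_count=int(meta.get("attempted_count", 1)),
        )

    @property
    def fell_back(self) -> bool:
        # Assumption: more than one attempt means the first choice failed.
        return self.attempted_count > 1
```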
An agent can sum cost_usd across a loop and stop when it exceeds a budget, or switch to gammainfra/cheap once the expensive reasoning steps are done — decisions it can't make when cost is invisible until the invoice.
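The budget-and-downshift pattern can be sketched in Python as below. call(model, prompt) is a hypothetical stand-in for however your harness invokes chat_completions, assumed to return a dict containing a routing_meta object. The two-expensive-steps cutoff is arbitrary.

```python
from typing import Callable


def run_steps(steps: list[str],
              call: Callable[[str, str], dict],
              budget_usd: float,
              expensive_steps: int = 2) -> tuple[list[dict], float]:
    """Run prompts in order, stopping once cumulative cost_usd exceeds budget_usd."""
    spent = 0.0
    results: list[dict] = []
    for i, prompt in enumerate(steps):
        # Let the router pick for the reasoning-heavy early steps, then downshift.
        model = "gammainfra/auto" if i < expensive_steps else "gammainfra/cheap"
        reply = call(model, prompt)
        spent += float(reply["routing_meta"]["cost_usd"])
        results.append(reply)
        if spent > budget_usd:
            break  # budget exceeded: stop issuing further calls
    return results, spent
```

Because the check runs after each call, the loop can overshoot the budget by up to one call's cost. If you need a hard ceiling, compare against budget_usd minus an expected per-call cost instead.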
Notes & trade-offs
- Non-streaming. MCP tool responses are atomic, so the server always requests non-streamed completions. For token streaming, call the GammaInfra HTTP API directly — see the docs.
- stdio transport. The server runs as a child process of the host over stdio — the standard MCP local-server model. No ports, no inbound network.
- Key stays local. The API key is read from the host's environment and sent only to GammaInfra over TLS. It is never written to disk by the server or logged.
- Pass-through pricing. Provider token rates with no markup, plus the standard top-up fee (3% during the launch window, 5% after 2026-06-23). The bring-your-own-key (BYOK) alternative costs 1–2% per request.
Source & registry
The server is open source (MIT) and listed in the official Model Context Protocol registry:
- npm: @gammainfra/mcp-server (npx -y @gammainfra/mcp-server)
- Source: github.com/yuz0101/gammainfra-mcp-server
- MCP registry: com.gammainfra/mcp-server
Troubleshoot
- "GAMMAINFRA_API_KEY environment variable is required" in the host log. The env var didn't reach the spawned process. Re-add the server with the --env flag (Claude Code) or the env block (JSON hosts).
- Tools don't appear after install. Restart the host process; some hosts cache the MCP server list until restart.
- 401 on tool calls. The key was rejected — verify it in the dashboard and regenerate if needed.
Error codes are listed in the docs. Stuck? Ask on Discord; it's usually a quick fix.
Ready to try it?
$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus 3% top-up fee during the launch window (5% after 2026-06-23).
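Here is a worked version of the top-up arithmetic as a Python sketch. It rests on two assumptions the text above does not state: that the fee is charged on top of the top-up amount rather than deducted from the credit, and that 2026-06-23 is the last day at 3%. Treat both as placeholders.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

MIN_TOPUP_USD = Decimal("10")
# Assumption: 2026-06-23 is the last day of the 3% launch window.
LAUNCH_WINDOW_LAST_DAY = date(2026, 6, 23)


def topup_fee_rate(on: date) -> Decimal:
    """Top-up fee rate: 3% during the launch window, 5% afterwards."""
    return Decimal("0.03") if on <= LAUNCH_WINDOW_LAST_DAY else Decimal("0.05")


def topup_total(amount_usd: str, on: date) -> Decimal:
    """Total charged for a top-up, ASSUMING the fee is added on top of the credit."""
    amount = Decimal(amount_usd)
    if amount < MIN_TOPUP_USD:
        raise ValueError(f"minimum top-up is ${MIN_TOPUP_USD}")
    total = amount * (1 + topup_fee_rate(on))
    # Round to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under those assumptions, a $10 top-up on 2026-05-01 comes to Decimal("10.30"), and the same top-up in July 2026 comes to Decimal("10.50"). Decimal is used instead of float so cent amounts don't pick up binary rounding error.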
Frequently asked questions
What is the GammaInfra MCP server?
An MCP server that exposes GammaInfra's task-aware routing as tools any Model Context Protocol host (Claude Code, Claude Desktop, Cursor, Cline, Continue) can call, with the exact cost and routing decision returned on every call. It runs via npx @gammainfra/mcp-server, no manual install.
What tools does the MCP server expose?
Four: chat_completions (call any model or gammainfra/auto, accepts cost_quality / max_latency_ms / preference / region, returns a structured routing_meta object), list_models (full catalog with per-token pricing and capability flags), get_balance (managed balance; include_byok=true adds the BYOK balance), and get_status (overall and per-provider health, so an agent can gate a heavy run on availability).
How do I install the GammaInfra MCP server?
For Claude Code: claude mcp add gammainfra --env GAMMAINFRA_API_KEY=sk-gammainfra-... -- npx -y @gammainfra/mcp-server. For Claude Desktop / Cursor / Cline: add the same mcpServers JSON block to the host config. For Continue: the equivalent YAML under ~/.continue/config.yaml. Restart the host; the four tools appear immediately.
Does the MCP server support streaming responses?
No. MCP tool responses are atomic, so the server always requests non-streamed completions. For token streaming, call the GammaInfra HTTP API directly.
How does an agent see per-call cost through the MCP server?
Every chat_completions call returns a routing_meta object alongside the response: provider / endpoint (which physical model served it), cost_usd plus input_cost_usd / output_cost_usd (exact spend, split for per-step budgeting), router_version / logical_model (how the prompt was classified), and fallback_chain / attempted_count (the cascade if a provider was down). An agent can sum cost_usd across a loop and stop when it exceeds a budget, or switch to gammainfra/cheap once the expensive reasoning steps are done.
Where does my API key go when using the MCP server?
It is read from the host's environment (GAMMAINFRA_API_KEY is the only required config) and sent only to GammaInfra over TLS. It is never written to disk by the server or logged. The server validates that the key is present at startup and exits with a clear message if it's missing, so a misconfigured host fails loudly rather than silently. GAMMAINFRA_BASE_URL is optional and defaults to https://api.gammainfra.com/v1.