GammaInfra MCP server

Smart routing as a tool your agent can call. The GammaInfra MCP server plugs into any Model Context Protocol host — Claude Code, Claude Desktop, Cursor, Cline, Continue — and gives the agent direct tool access to task-aware routing across every major LLM, with the exact cost and routing decision returned on every call.

Why call routing as a tool

Most MCP setups give an agent tools for the outside world: files, shells, browsers, APIs. The model itself is fixed by the host. That's a missed lever. A six-step research loop doesn't need the same model for query rewriting, deep reasoning, and JSON extraction, and paying flagship rates for all three is a common source of silent agent cost.

This server exposes GammaInfra's router as a callable tool, so the agent can route each step to the best-fit model per prompt and read back exactly what it cost — without the host being reconfigured or the agent loop being rewritten.

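The four tools

chat_completions calls any model or gammainfra/auto. It accepts cost_quality, max_latency_ms, preference, and region, and returns a structured routing_meta object.

list_models returns the full catalog with per-token pricing and capability flags.

get_balance returns the managed balance; include_byok=true adds the BYOK balance.

get_status reports overall and per-provider health, so an agent can gate a heavy run on availability.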

Install

1. Get a GammaInfra API key

Sign up at gammainfra.com and verify your email. The $3 free trial credit is enough to exercise all four tools end-to-end before topping up. You'll need Node.js 18+ — the server runs via npx, no manual install.

2. Register the server in your MCP host

Claude Code — one command:

claude mcp add gammainfra \
  --env GAMMAINFRA_API_KEY=sk-gammainfra-... \
  -- npx -y @gammainfra/mcp-server

Claude Desktop — edit claude_desktop_config.json (~/Library/Application Support/Claude/ on macOS, %APPDATA%\Claude\ on Windows):

{
  "mcpServers": {
    "gammainfra": {
      "command": "npx",
      "args": ["-y", "@gammainfra/mcp-server"],
      "env": { "GAMMAINFRA_API_KEY": "sk-gammainfra-..." }
    }
  }
}

Cursor — edit ~/.cursor/mcp.json with the same mcpServers block. Cline (VS Code) — add it under the MCP Servers tab with "disabled": false. The command, args, and env are identical across these JSON-config hosts.

Continue — Continue uses YAML. Add to ~/.continue/config.yaml (or a file under ~/.continue/mcpServers/):

mcpServers:
  - name: GammaInfra
    command: npx
    args:
      - -y
      - "@gammainfra/mcp-server"
    env:
      GAMMAINFRA_API_KEY: sk-gammainfra-...

3. Restart the host

Restart the MCP host. The four tools appear immediately — ask the agent to "list available models on gammainfra" to confirm the round-trip.

One env var. GAMMAINFRA_API_KEY is the only required configuration. GAMMAINFRA_BASE_URL is optional and defaults to https://api.gammainfra.com/v1 — set it only for staging or a self-pointed endpoint. The server validates the key is present at startup and exits with a clear message if it's missing, so a misconfigured host fails loudly rather than silently.
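The startup behaviour described above can be sketched as follows. This is a hypothetical illustration, not the server's actual source: the names resolveConfig and GammaInfraConfig and the wording of the error message are assumptions; only the two environment variable names and the default URL come from this page.

```typescript
// Hypothetical sketch of the config check; not the server's real code.
interface GammaInfraConfig {
  apiKey: string;
  baseUrl: string;
}

// Default taken from this page.
const DEFAULT_BASE_URL = "https://api.gammainfra.com/v1";

// Takes the environment as a plain object so it can be exercised without a host.
function resolveConfig(env: Record<string, string | undefined>): GammaInfraConfig {
  const apiKey = env["GAMMAINFRA_API_KEY"]?.trim();
  if (!apiKey) {
    // Fail loudly: a missing or blank key stops startup with a readable message.
    throw new Error(
      "GAMMAINFRA_API_KEY is not set. Add it to the env block of your MCP host config."
    );
  }
  // An unset or blank base URL falls back to the default.
  const baseUrl = env["GAMMAINFRA_BASE_URL"]?.trim() || DEFAULT_BASE_URL;
  return { apiKey, baseUrl };
}
```

In a real entry point this would presumably be called once with process.env before the server begins accepting tool calls, so a misconfigured host sees the error immediately.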

What the agent gets back

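Every chat_completions call returns the model response and a routing_meta object the agent can reason over directly. It carries provider and endpoint (which physical model served the call), cost_usd with input_cost_usd and output_cost_usd (exact spend, split for per-step budgeting), router_version and logical_model (how the prompt was classified), and fallback_chain and attempted_count (the cascade if a provider was down).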

An agent can sum cost_usd across a loop and stop when it exceeds a budget, or switch to gammainfra/cheap once the expensive reasoning steps are done — decisions it can't make when cost is invisible until the invoice.
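The budgeting idea above can be sketched as a small tracker that an agent loop feeds with each call's routing_meta. This is a minimal illustration under stated assumptions: the BudgetTracker class, the CostMeta shape (cost fields typed as numbers), and the 80% downgrade threshold are hypothetical. Only the field names and the gammainfra/auto and gammainfra/cheap model names come from this page.

```typescript
// Assumed shape: only the cost fields of routing_meta, typed as numbers.
interface CostMeta {
  cost_usd: number;
  input_cost_usd: number;
  output_cost_usd: number;
}

// Hypothetical helper: accumulates spend and suggests which model to request next.
class BudgetTracker {
  private spentUsd = 0;
  private readonly budgetUsd: number;
  private readonly downgradeAtFraction: number;

  constructor(budgetUsd: number, downgradeAtFraction: number = 0.8) {
    this.budgetUsd = budgetUsd;
    this.downgradeAtFraction = downgradeAtFraction;
  }

  // Add one call's total cost to the running sum.
  record(meta: CostMeta): void {
    this.spentUsd += meta.cost_usd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // True once the running total is strictly above the budget.
  exceeded(): boolean {
    return this.spentUsd > this.budgetUsd;
  }

  // Assumed policy: switch to the cheap route once most of the budget is used.
  nextModel(): string {
    const threshold = this.budgetUsd * this.downgradeAtFraction;
    return this.spentUsd >= threshold ? "gammainfra/cheap" : "gammainfra/auto";
  }
}
```

A loop would call record() after each chat_completions result, pass nextModel() as the model for the following step, and stop when exceeded() returns true. The threshold policy is one possible choice; an agent could equally switch models by step type rather than by spend.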

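Notes & trade-offs

No streaming. MCP tool responses are atomic, so the server always requests non-streamed completions. For token streaming, call the GammaInfra HTTP API directly.

stdio transport. The server runs as a child process of the host, with no ports and no inbound network.

Key handling. The key is read from the host's environment and sent only to GammaInfra over TLS. The server never writes it to disk or logs it.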

Source & registry

The server is open source (MIT) and listed in the official Model Context Protocol registry.

Troubleshoot

Error codes are listed in the docs. If you're stuck, ask on Discord; most issues are a quick fix.

Ready to try it?

Get a GammaInfra API key →

$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus a 3% top-up fee during the launch window (5% after 2026-06-23).

Frequently asked questions

What is the GammaInfra MCP server?
An open-source (MIT) Model Context Protocol server that exposes GammaInfra's smart router as callable tools. It plugs into any MCP host — Claude Code, Claude Desktop, Cursor, Cline, Continue — and gives the agent direct tool access to task-aware routing across every major LLM, returning the exact cost and routing decision on every call. Runs via npx @gammainfra/mcp-server, no manual install.
What tools does the MCP server expose?
Four: chat_completions (call any model or gammainfra/auto, accepts cost_quality / max_latency_ms / preference / region, returns a structured routing_meta object), list_models (full catalog with per-token pricing and capability flags), get_balance (managed balance; include_byok=true adds the BYOK balance), and get_status (overall and per-provider health, so an agent can gate a heavy run on availability).
How do I install the GammaInfra MCP server?
Get an API key (the $3 trial credit is enough to exercise all four tools end-to-end). You need Node.js 18+. For Claude Code, run: claude mcp add gammainfra --env GAMMAINFRA_API_KEY=sk-gammainfra-... -- npx -y @gammainfra/mcp-server. For Claude Desktop, Cursor, or Cline, add the same mcpServers JSON block to the host config. For Continue, add the equivalent YAML to ~/.continue/config.yaml. Then restart the host; the four tools appear immediately.
Does the MCP server support streaming responses?
No. MCP tool responses are atomic, so the server always requests non-streamed completions. For token streaming, call the GammaInfra HTTP API directly (see the docs). The server uses stdio transport — it runs as a child process of the host, no ports and no inbound network.
How does an agent see per-call cost through the MCP server?
Every chat_completions call returns a routing_meta object alongside the response: provider / endpoint (which physical model served it), cost_usd plus input_cost_usd / output_cost_usd (exact spend, split for per-step budgeting), router_version / logical_model (how the prompt was classified), and fallback_chain / attempted_count (the cascade if a provider was down). An agent can sum cost_usd across a loop and stop when it exceeds a budget, or switch to gammainfra/cheap once the expensive reasoning steps are done.
Where does my API key go when using the MCP server?
The key is read from the host's environment (GAMMAINFRA_API_KEY is the only required config) and sent only to GammaInfra over TLS. It is never written to disk by the server or logged. The server validates the key is present at startup and exits with a clear message if it's missing, so a misconfigured host fails loudly rather than silently. GAMMAINFRA_BASE_URL is optional and defaults to https://api.gammainfra.com/v1.