Use Cursor with GammaInfra

Point Cursor at GammaInfra for per-completion cost visibility, automatic provider fallback when one provider rate-limits, and per-mode model pinning. Setup takes about two minutes; no extension, no plugin.

The pain

Cursor's BYOK path uses your provider API key directly. That's fine until you hit either of two walls:

No per-completion cost visibility. A heavy autocomplete session can burn through tokens fast and you only see the damage on the provider's monthly invoice.
Provider throttling kills sessions. When OpenAI or Anthropic rate-limits your Cursor activity, Cursor stops dead. Manual mitigation is "switch your config to a different provider," which means setting it up again.

What changes with GammaInfra

Point Cursor at GammaInfra's smart router (https://api.gammainfra.com/v1) instead of https://api.openai.com/v1 — same OpenAI SDK shape, three things start happening:

Every completion response carries X-GammaInfra-Cost-USD, X-GammaInfra-Input-Cost-USD, and X-GammaInfra-Output-Cost-USD headers. Sum across a session and you know exactly what that 200-completion refactor cost.
The fallback chain absorbs provider rate-limits. When OpenAI throttles your key, GammaInfra cascades to the next provider in the chain — typically Anthropic, then Google, then Mistral — transparently. Cursor sees a 200, your refactor finishes.
You can mix models per Cursor mode. gammainfra/auto for inline autocomplete (task-aware routing picks the cheap fast model), anthropic/claude-opus-4-7 for the Chat panel when you want quality. One key, many endpoints.

Setup

1. Get a GammaInfra API key

Sign up at gammainfra.com and verify your email. New accounts get $3 of trial credit after verification — enough for a few hundred Cursor completions to confirm the flow works before you top up. Copy the API key from the dashboard.

2. Open Cursor's model settings

In Cursor:

Cmd/Ctrl + , (or open Settings from the menu)
Navigate to Models → OpenAI API Key
Toggle on Override OpenAI Base URL

3. Set base URL and key

Two fields:

OpenAI Base URL: https://api.gammainfra.com/v1
OpenAI API Key:  sk-gammainfra-...   (paste your GammaInfra key here)

Click Verify. Cursor will make a test request to confirm the endpoint responds.

4. Pick a model

In the same settings panel, set the model name. Some choices:

gammainfra/auto — task-aware routing. Cursor's autocomplete prompts route to fast cheap models; longer Chat-panel prompts route to higher-quality models. This is the default we recommend for most users.
gammainfra/fast — latency-optimized. Hedged requests when enabled, p50-latency-aware picker. Best for inline completion.
gammainfra/cheap — cost-optimized. Cheapest viable model for the prompt.
openai/gpt-5-mini — pin OpenAI's small model directly.
anthropic/claude-opus-4-7 — pin Claude Opus.
anthropic/claude-sonnet-4-6 — pin Claude Sonnet, balanced quality + speed.

Per-mode pinning trick: Cursor lets you configure different models for inline autocomplete vs the Chat panel. Set inline to gammainfra/fast and Chat to anthropic/claude-opus-4-7 — you get cheap fast completions while typing, quality reasoning when you ask a hard question.

Verify it's working

Trigger any completion in Cursor (start typing a function, ask the Chat panel a question). Then check your GammaInfra dashboard at dashboard.gammainfra.com:

Your request appears in the request log with the resolved provider and model.
Cost is itemized per request.
If you check Cursor's network panel (DevTools), you'll see X-GammaInfra-Cost-USD on the response headers.

Trade-offs to know about

Latency. Routing through GammaInfra adds 10–50 ms of overhead vs hitting OpenAI directly. For Cursor's autocomplete (TTFB-sensitive), this is usually imperceptible. The hedged-request feature on gammainfra/fast can actually reduce p50 latency vs going direct, since it races two providers in parallel.
Cost. GammaInfra charges a top-up fee (3% during the launch window, 5% standard) rather than marking up tokens. For a Cursor user spending $50/month on completions, that's $1.50–$2.50/month on top — the cost-visibility and fallback features generally pay for themselves the first time a provider throttles your session.
Privacy. GammaInfra doesn't log prompts or responses by default. See the privacy policy for the full picture.

Bring your own provider keys (BYOK)

If you have existing relationships with OpenAI/Anthropic/etc. and want to keep using your own provider keys directly, you can add them to GammaInfra via the dashboard's Provider Keys tab. GammaInfra will route through your keys when present (BYOK mode) and only charge a 1–2% per-request fee on the retail cost. Use this when you have provider credits to burn or when you want direct provider billing.

Troubleshoot

401 Unauthorized. The API key is wrong or hasn't been activated. Sign in to the dashboard and re-copy it.
402 Payment required. Your trial credit is exhausted. Top up the dashboard or wait for next month's free tier.
429 Rate limit exceeded. You hit the default 240-requests-per-minute rate limit. Slow down or contact us — heavy Cursor users get higher limits on request.
503 Service unavailable. All providers in the fallback chain are down. Rare; usually self-resolves within seconds.

Detailed error codes are in the docs. Stuck? Open a ticket in Discord — we usually reply within an hour.

Ready to try it?

Get a GammaInfra API key →

$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus 3% top-up fee during the launch window (5% after 2026-06-23).

Frequently asked questions

Does using GammaInfra with Cursor add noticeable latency?

The extra network hop adds 10 to 50 ms per completion. For Cursor's inline autocomplete (TTFB-sensitive) this is usually imperceptible. If you set gammainfra/fast as the model, hedged requests can actually reduce p95 latency vs going direct because the gateway races two providers in parallel and takes the first success.

Can I see per-completion cost in Cursor?

Yes, indirectly. Cursor itself doesn't surface response headers in its UI, but GammaInfra's dashboard at dashboard.gammainfra.com shows every request with the resolved provider, model, and itemized cost. Open Cursor's DevTools network panel to see X-GammaInfra-Cost-USD on completion responses directly.

Can I configure different models for Cursor's inline autocomplete vs the Chat panel?

Yes. Cursor lets you set the model name separately for each mode. Set inline autocomplete to gammainfra/fast (latency-optimized) and the Chat panel to anthropic/claude-opus-4-7 (quality flagship). You get cheap fast completions while typing and quality reasoning when you ask a hard question.

What happens if my preferred provider rate-limits while I'm coding?

The gateway cascades to the next provider in the fallback chain transparently. When OpenAI throttles your key, GammaInfra typically tries Anthropic, then Google, then Mistral. Cursor sees a 200 response, your session keeps working. The actual cascade is reported in X-GammaInfra-Fallback-Chain on the response.

Can I use my own OpenAI key (BYOK) with Cursor via GammaInfra?

Yes. Add your OpenAI key in the GammaInfra dashboard under Provider Keys. GammaInfra will use your key when routing to OpenAI and charge 1–2% of retail cost per request from your BYOK balance instead of token markup. Useful when you have existing OpenAI credits or negotiated rates to burn through Cursor.