Use LibreChat with GammaInfra

Drop GammaInfra's smart router into LibreChat as a custom endpoint via the endpoints.custom block in librechat.yaml. Your self-hosted UI talks to every major LLM through one API key — with cost visibility and automatic provider fallback.

What changes with GammaInfra

One custom endpoint covers everything. Instead of separate endpoint definitions for OpenAI, Anthropic, Google, Mistral, etc., add GammaInfra once. The full catalog populates the model dropdown.
Per-request cost. LibreChat doesn't show cost natively. The GammaInfra dashboard rolls up exact per-request spend across users.
Fallback on provider throttle. Your local UI doesn't break when one provider rate-limits — GammaInfra cascades transparently.

Setup

1. Get a GammaInfra API key

2. Edit librechat.yaml

The configuration file lives in your LibreChat install root. If you don't have one yet, copy librechat.example.yaml to librechat.yaml and start there.

3. Add GammaInfra under endpoints.custom

endpoints:
  custom:
    - name: "GammaInfra"
      apiKey: "${KRAKEN_API_KEY}"
      baseURL: "https://api.gammainfra.com/v1"
      models:
        default:
          - "gammainfra/auto"
          - "gammainfra/fast"
          - "gammainfra/cheap"
          - "anthropic/claude-opus-4-7"
          - "anthropic/claude-sonnet-4-6"
          - "openai/gpt-5"
          - "openai/gpt-5-mini"
          - "google/gemini-3.1-pro-preview"
          - "deepseek/deepseek-v4-pro"
        fetch: false
      titleConvo: true
      titleModel: "gammainfra/cheap"
      modelDisplayLabel: "GammaInfra"

Save the file.

4. Set the env var

The ${KRAKEN_API_KEY} reference reads from your environment. In your LibreChat .env file:

KRAKEN_API_KEY=sk-gammainfra-...

5. Restart LibreChat

For Docker:

docker compose restart api

For direct installs:

pm2 restart librechat
# or whatever process manager you use

6. Verify in the UI

Open your LibreChat instance. The endpoint selector should now include "GammaInfra". Picking it shows the model list you configured under models.default.

Auto-fetch vs hardcoded model list: Setting fetch: true in the endpoint config makes LibreChat query GET /v1/models to populate the dropdown. That works (GammaInfra supports the endpoint) but you'll get 40+ models which is overwhelming. Hardcoding fetch: false with a curated models.default list keeps the UX clean.

Pricing display (optional)

LibreChat supports a tokens block under each custom endpoint to display token usage. Since GammaInfra passes through provider rates, you can set realistic per-model pricing for in-UI cost estimates:

endpoints:
  custom:
    - name: "GammaInfra"
      # ... (other fields)
      tokens:
        # Approximate — see /v1/models for actual provider rates
        "gammainfra/auto":               { input: 0.0015, output: 0.0060 }
        "anthropic/claude-opus-4-7": { input: 0.0150, output: 0.0750 }
        "openai/gpt-5-mini":         { input: 0.0003, output: 0.0024 }

Per-1k-token pricing. Refer to GET /v1/models for authoritative current rates; the GammaInfra dashboard reports exact billed cost regardless of what LibreChat estimates locally.

Trade-offs

Latency. ~10–50 ms overhead per request. Imperceptible for chat-UI use.
Cost. 3% top-up fee (launch) / 5% standard. Pass-through provider rates on tokens — no markup. BYOK 1–2% per request as an alternative.
Privacy. GammaInfra doesn't log prompts or responses by default. Privacy policy. LibreChat still stores chat history in its own MongoDB per its config.

Troubleshoot

Endpoint doesn't appear. YAML indent error. Make sure custom: is a list (dash prefix on the first field).
"Authentication failed" on chat. KRAKEN_API_KEY env var not loaded — check the .env file lives in the right directory and the process re-read it on restart.
Models missing from dropdown. Typo in a model name. Pull GET /v1/models for the authoritative list.

Ready to try it?

Get a GammaInfra API key →

$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus 3% top-up fee during the launch window.

Frequently asked questions

How do I add GammaInfra to LibreChat?

LibreChat supports custom OpenAI-compatible endpoints. Add one with the base URL https://api.gammainfra.com/v1, your API key (referenced via an environment variable), and a default model like gammainfra/auto, then restart LibreChat. LibreChat's config file name and schema have changed across versions — see LibreChat's current custom-endpoint docs and the setup section above. The GammaInfra-side values don't change between versions.

Does LibreChat's preset feature work with smart routing?

Yes. LibreChat presets capture (endpoint, model, system prompt, temperature). Create separate presets for cost-sensitive vs quality-sensitive workflows — one preset pinned to gammainfra/cheap, another to anthropic/claude-opus-4-7. Users switch between them per conversation.

Can I see per-conversation cost in LibreChat?

LibreChat's built-in token counter shows estimated usage but doesn't read X-GammaInfra-Cost-USD response headers. For exact cost, use the GammaInfra dashboard's per-API-key view. Issue separate keys per LibreChat user (or per group) for attribution.

Does GammaInfra support LibreChat's plugin system?

LibreChat plugins typically run as tool calls in the chat completion request. GammaInfra forwards the tools[] field to whichever provider the router selects, with tool_call.id translation across providers. Plugins that need synchronous parallel tool calls work as long as the selected provider supports parallel tool calling (OpenAI gpt-5 family, Anthropic Claude 4 family, Google Gemini 3.x).

How do I set per-user rate limits in LibreChat with GammaInfra?

LibreChat enforces application-level limits in its own config. GammaInfra enforces 240 rpm per API key at the gateway. The cleanest pattern: issue one GammaInfra API key per LibreChat user (or per group), so each user has their own gateway-side rate-limit headroom. Manage keys in the GammaInfra dashboard.