Grok 4.1 Fast (non-reasoning)
grok/grok-4-1-fast-non-reasoning
xAI's fast non-reasoning model, used as a fallback in GammaInfra's chat dispatcher chain. Its 256K context window suits long-conversation chat workloads.
Pricing
| Direction | USD per 1M tokens | USD per 1K tokens |
|---|---|---|
| Input | $0.3000 | $0.000300 |
| Output | $1.2000 | $0.001200 |
Pass-through provider rates via GammaInfra. No per-token markup. A 3% top-up fee (launch window through 2026-06-23, then 5%) applies on managed credits; the BYOK alternative is 1–2% per request.
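As a rough worked example of the per-token rates above, the sketch below estimates the token cost of a single call. The helper name and the sample token counts are illustrative, not part of any GammaInfra SDK. The top-up fee is left out because it is charged when credits are purchased, not per request.

```python
# Rates copied from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_M = 0.30
OUTPUT_USD_PER_M = 1.20


def estimate_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the pass-through token cost in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Illustrative call: a 200,000-token conversation with a 2,000-token reply.
# 200,000 * 0.30 / 1M = 0.06 and 2,000 * 1.20 / 1M = 0.0024.
cost = estimate_token_cost(200_000, 2_000)
print(f"${cost:.4f}")
```

For the exact figure, the X-GammaInfra-Cost-USD response header is the authoritative source; an estimate like this is only useful for budgeting ahead of time.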
Capabilities
Specifications
| Field | Value |
|---|---|
| Context window | 256K |
| Max output | 32K |
| Provider | grok |
| Streaming | Yes — OpenAI-compatible SSE |
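Because streaming is described as OpenAI-compatible SSE, a client can reassemble a reply from the `data:` lines of the stream. The sketch below assumes each event carries an OpenAI-style chat completion chunk (`choices[0].delta.content`) and that the stream ends with `data: [DONE]`. The sample lines are made up for illustration and were not captured from GammaInfra.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines (illustrative helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel, as in OpenAI's format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Made-up stream used only to exercise the parser.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_sse_text(sample_stream))
print(reply)
```

In practice an OpenAI SDK does this parsing for you when `stream=True` is set; the sketch only shows what travels over the wire.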
Best for
Task labels reflect where this model heads or appears in GammaInfra's default dispatcher chains. Override per-request with the X-GammaInfra-Preference or X-GammaInfra-Cost-Quality header.
How to call it
Through GammaInfra's smart router with one of your GammaInfra API keys:
curl https://api.gammainfra.com/v1/chat/completions \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok/grok-4-1-fast-non-reasoning",
    "messages": [
      {"role": "user", "content": "Hello, Grok 4.1 Fast (non-reasoning)!"}
    ]
  }'
Or via any OpenAI SDK — see the integrations page for setup with Cursor, Cline, LangChain, the Vercel AI SDK, and others.
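A minimal sketch of the OpenAI SDK route in Python, assuming the official `openai` package is installed. The environment variable name `GAMMAINFRA_API_KEY` and the helper names are illustrative choices, not something GammaInfra prescribes.

```python
import os

GAMMAINFRA_BASE_URL = "https://api.gammainfra.com/v1"
MODEL = "grok/grok-4-1-fast-non-reasoning"


def client_settings(api_key: str) -> dict:
    """Return the two settings that differ from a stock OpenAI setup."""
    return {"base_url": GAMMAINFRA_BASE_URL, "api_key": api_key}


def ask(prompt: str) -> str:
    """Send one chat message and return the reply text."""
    # Imported lazily so this module loads even without the SDK installed.
    from openai import OpenAI

    # GAMMAINFRA_API_KEY is an assumed environment variable name.
    client = OpenAI(**client_settings(os.environ["GAMMAINFRA_API_KEY"]))
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Calling `ask("Hello, Grok 4.1 Fast (non-reasoning)!")` would send the same request as the curl example, provided the key is set in the environment.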
Smart routing — or pin this model
You can call grok/grok-4-1-fast-non-reasoning directly (as above), or let GammaInfra's router pick the best-fit model per prompt. Use gammainfra/auto as the model name for task-aware routing, gammainfra/fast for latency-optimized hedged requests, or gammainfra/cheap for cost-optimized routing. The router considers task type, latency, and your X-GammaInfra-Cost-Quality dial when picking.
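The choice between pinning and routing comes down to the `model` string in the request body. The sketch below builds a chat payload for either case; the `mode` keywords and the helper name are hypothetical conveniences, while the model names are the ones given above.

```python
PINNED_MODEL = "grok/grok-4-1-fast-non-reasoning"

# Router aliases named in the paragraph above.
ROUTER_ALIASES = {
    "auto": "gammainfra/auto",    # task-aware routing
    "fast": "gammainfra/fast",    # latency-optimized hedged requests
    "cheap": "gammainfra/cheap",  # cost-optimized routing
}


def build_payload(prompt: str, mode: str = "pinned") -> dict:
    """Build a chat completions body for a pinned or routed call."""
    if mode == "pinned":
        model = PINNED_MODEL
    elif mode in ROUTER_ALIASES:
        model = ROUTER_ALIASES[mode]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


print(build_payload("Summarize this thread.", mode="auto")["model"])
```

Keeping the model name in one place like this makes it easy to switch a workload between a pinned model and a router alias without touching the rest of the request code.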
Related models
Ready to try it?
$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus a 3% top-up fee during the launch window (5% after 2026-06-23).
Frequently asked questions
How much does Grok 4.1 Fast (non-reasoning) cost through GammaInfra?
Grok 4.1 Fast (non-reasoning) (grok/grok-4-1-fast-non-reasoning) is billed at the xAI (Grok) pass-through rate: $0.30 per 1M input tokens and $1.20 per 1M output tokens, with 0% token markup. GammaInfra's fee is taken at top-up time (3% during the launch window through 2026-06-23, 5% after), not per token; the BYOK option is 1–2% per request instead. Every response returns X-GammaInfra-Cost-USD with the exact spend for that call.
What is Grok 4.1 Fast (non-reasoning)'s context window?
The context window is 256K, with a maximum output of 32K.
Does Grok 4.1 Fast (non-reasoning) support tool calling, vision, and JSON mode?
How do I call Grok 4.1 Fast (non-reasoning) through GammaInfra?
Point any OpenAI SDK at https://api.gammainfra.com/v1 with your sk-gammainfra-... key, then set the model to grok/grok-4-1-fast-non-reasoning to pin Grok 4.1 Fast (non-reasoning) directly, or use gammainfra/auto to let the smart router pick it when it is the best fit. Only base_url and api_key change; the rest of your OpenAI SDK code is unchanged.
When does GammaInfra's router pick Grok 4.1 Fast (non-reasoning)?
With gammainfra/auto, the router selects it when a prompt classifies into one of the task types it is listed for under Best for and your cost/quality preference fits; pin grok/grok-4-1-fast-non-reasoning to force it regardless of routing. The X-GammaInfra-Endpoint response header always reports which model actually served the request.
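To check which model served a routed call and what it cost, a client can read the X-GammaInfra-Endpoint and X-GammaInfra-Cost-USD response headers. The sketch below assumes the cost header holds a plain decimal string and that the endpoint header holds a model ID. Both formats are assumptions, and the example values are made up.

```python
from typing import Mapping


def routing_summary(headers: Mapping[str, str]) -> dict:
    """Extract the serving model and per-call cost from response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("x-gammainfra-cost-usd")
    return {
        "served_by": lowered.get("x-gammainfra-endpoint"),
        # Assumes a plain decimal string such as "0.000042".
        "cost_usd": float(cost) if cost is not None else None,
    }


# Made-up header values for illustration only.
example_headers = {
    "X-GammaInfra-Endpoint": "grok/grok-4-1-fast-non-reasoning",
    "X-GammaInfra-Cost-USD": "0.000042",
}
summary = routing_summary(example_headers)
print(summary["served_by"], summary["cost_usd"])
```

Logging this summary per request is one way to see how often gammainfra/auto lands on this model for a given workload.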