DeepSeek V4 Flash
deepseek/deepseek-v4-flash
DeepSeek's fast tier. Non-thinking by default. Used in code_gen, creative, and summarization fallback chains. Cheap enough to back high-volume agentic workloads.
Pricing
| Direction | USD per 1M tokens | USD per 1K tokens |
|---|---|---|
| Input | $0.1400 | $0.000140 |
| Output | $0.5500 | $0.000550 |
Pass-through provider rates via GammaInfra. No per-token markup. A 3% top-up fee (launch window through 2026-06-23, then 5%) applies on managed credits; the BYOK alternative is 1–2% per request.
Capabilities
Specifications
| Field | Value |
|---|---|
| Context window | 128K |
| Max output | 32K |
| Provider | deepseek |
| Streaming | Yes — OpenAI-compatible SSE |
Best for
Task labels reflect where this model heads or appears in GammaInfra's default dispatcher chains. Override per-request with the X-GammaInfra-Preference or X-GammaInfra-Cost-Quality header.
How to call it
Through GammaInfra's smart router with one of your GammaInfra API keys:
curl https://api.gammainfra.com/v1/chat/completions \
-H "Authorization: Bearer sk-gammainfra-..." \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek/deepseek-v4-flash",
"messages": [
{"role": "user", "content": "Hello, DeepSeek V4 Flash!"}
]
}'
Or via any OpenAI SDK — see the integrations page for setup with Cursor, Cline, LangChain, the Vercel AI SDK, and others.
Smart routing — or pin this model
You can call deepseek/deepseek-v4-flash directly (as above), or let GammaInfra's router pick the best-fit model per prompt. Use gammainfra/auto as the model name for task-aware routing, gammainfra/fast for latency-optimized hedged requests, or gammainfra/cheap for cost-optimized routing. The router considers task type, latency, and your X-GammaInfra-Cost-Quality dial when picking.
Related models
Ready to try it?
$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus 3% top-up fee during the launch window (5% after 2026-06-23).
Frequently asked questions
How much does DeepSeek V4 Flash cost through GammaInfra?
deepseek/deepseek-v4-flash) is billed at the DeepSeek pass-through rate — $0.14 per 1M input tokens and $0.55 per 1M output tokens, with 0% token markup. GammaInfra's fee is taken at top-up time (3% during the launch window through 2026-06-23, 5% after), not per token; the BYOK option is 1–2% per request instead. Every response returns X-GammaInfra-Cost-USD with the exact spend for that call.What is DeepSeek V4 Flash's context window?
Does DeepSeek V4 Flash support tool calling, vision, and JSON mode?
How do I call DeepSeek V4 Flash through GammaInfra?
https://api.gammainfra.com/v1 with your sk-gammainfra-... key, then set the model to deepseek/deepseek-v4-flash to pin DeepSeek V4 Flash directly — or use gammainfra/auto to let the smart router pick it when it is the best fit. Only base_url and api_key change; the rest of your OpenAI SDK code is unchanged.When does GammaInfra's router pick DeepSeek V4 Flash?
gammainfra/auto the router selects it when a prompt classifies into one of those task types and your cost/quality preference fits; pin deepseek/deepseek-v4-flash to force it regardless of routing. The X-GammaInfra-Endpoint response header always reports which model actually served the request.