Llama 3.3 70B (via Groq)
groq/llama-3.3-70b-versatile
Llama 3.3 70B served on Groq's LPU silicon. Its sub-second time to first byte (TTFB) makes it GammaInfra's default secondary in hedged-request races for gammainfra/fast. Open weights; frontier-tier on multi-step tasks.
Pricing
| Direction | USD per 1M tokens | USD per 1K tokens |
|---|---|---|
| Input | $0.5900 | $0.000590 |
| Output | $0.7900 | $0.000790 |
Pass-through provider rates via GammaInfra. No per-token markup. A 3% top-up fee (launch window through 2026-06-23, then 5%) applies on managed credits; the BYOK alternative is 1–2% per request.
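As a rough illustration of the table above, per-call token cost can be estimated from the two per-1M rates. This is a minimal sketch: `estimate_token_cost` is a hypothetical helper, not part of any GammaInfra SDK, and it covers only pass-through token cost (the top-up or BYOK fee is not included).

```python
from decimal import Decimal

# Rates copied from the pricing table, in USD per 1M tokens.
INPUT_USD_PER_1M = Decimal("0.59")
OUTPUT_USD_PER_1M = Decimal("0.79")


def estimate_token_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate pass-through token cost in USD (hypothetical helper)."""
    total_per_million = (
        INPUT_USD_PER_1M * input_tokens + OUTPUT_USD_PER_1M * output_tokens
    )
    return total_per_million / Decimal(1_000_000)


# Example: a call with 2,000 input tokens and 500 output tokens.
print(estimate_token_cost(2_000, 500))
```

For billing purposes, the X-GammaInfra-Cost-USD response header remains the authoritative figure; an estimate like this is only useful for budgeting ahead of time.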
Capabilities
Specifications
| Field | Value |
|---|---|
| Context window | 128K |
| Max output | 8K |
| Provider | groq |
| Streaming | Yes — OpenAI-compatible SSE |
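Since streaming is described as OpenAI-compatible SSE, a client would typically read `data:` lines until a `data: [DONE]` sentinel and concatenate each chunk's `choices[].delta.content`. The sketch below assumes that standard OpenAI chunk shape; `iter_sse_deltas` and the sample lines are illustrative, not captured GammaInfra output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (illustrative helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel used by OpenAI-compatible APIs
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up sample stream for demonstration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample)))
```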
Best for
Task labels reflect where this model heads or appears in GammaInfra's default dispatcher chains. Override per-request with the X-GammaInfra-Preference or X-GammaInfra-Cost-Quality header.
How to call it
Through GammaInfra's smart router with one of your GammaInfra API keys:
curl https://api.gammainfra.com/v1/chat/completions \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Hello, Llama 3.3 70B (via Groq)!"}
    ]
  }'
Or via any OpenAI SDK — see the integrations page for setup with Cursor, Cline, LangChain, the Vercel AI SDK, and others.
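For reference, the same request can be assembled in Python. This sketch uses only the standard library rather than an SDK, so it runs without extra dependencies; `build_chat_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.gammainfra.com/v1"
PINNED_MODEL = "groq/llama-3.3-70b-versatile"


def build_chat_request(
    api_key: str, prompt: str, model: str = PINNED_MODEL
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Substitute a real key for the placeholder before sending.
req = build_chat_request("sk-gammainfra-...", "Hello, Llama 3.3 70B (via Groq)!")
# To send: urllib.request.urlopen(req) and json-decode the response body.
```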
Smart routing — or pin this model
You can call groq/llama-3.3-70b-versatile directly (as above), or let GammaInfra's router pick the best-fit model per prompt. Use gammainfra/auto as the model name for task-aware routing, gammainfra/fast for latency-optimized hedged requests, or gammainfra/cheap for cost-optimized routing. The router considers task type, latency, and your X-GammaInfra-Cost-Quality dial when picking.
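One way to keep the pin-versus-route decision in a single place is a small lookup from what you are optimizing for to the model name sent in the request. The model names below come from this page; the `priority` keys and the `choose_model` helper are assumptions made for illustration.

```python
from typing import Optional

PINNED_MODEL = "groq/llama-3.3-70b-versatile"

# Router aliases named on this page, keyed by illustrative priority labels.
ROUTER_ALIASES = {
    "task": "gammainfra/auto",      # task-aware routing
    "latency": "gammainfra/fast",   # latency-optimized hedged requests
    "cost": "gammainfra/cheap",     # cost-optimized routing
}


def choose_model(priority: Optional[str] = None) -> str:
    """Return the model name to send; None pins this model (hypothetical helper)."""
    if priority is None:
        return PINNED_MODEL
    if priority not in ROUTER_ALIASES:
        raise ValueError(f"unknown priority: {priority!r}")
    return ROUTER_ALIASES[priority]


print(choose_model())           # pinned model
print(choose_model("latency"))  # router alias
```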
Ready to try it?
$3 free trial credit on signup, $10 minimum top-up. Pass-through provider token rates plus a 3% top-up fee during the launch window (5% after 2026-06-23).
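The fee schedule above can be sketched as a date check. This is a minimal illustration under stated assumptions: the launch window is treated as inclusive of 2026-06-23, the fee is rounded to whole cents, and `top_up_fee` is a hypothetical helper. Whether the fee is added to the charge or deducted from the credited amount is not stated here, so the sketch only computes the fee itself.

```python
from datetime import date
from decimal import Decimal

LAUNCH_WINDOW_END = date(2026, 6, 23)  # assumed inclusive ("through 2026-06-23")
MIN_TOP_UP_USD = Decimal("10")


def top_up_fee_rate(on: date) -> Decimal:
    """3% during the launch window, 5% afterwards."""
    return Decimal("0.03") if on <= LAUNCH_WINDOW_END else Decimal("0.05")


def top_up_fee(amount_usd: Decimal, on: date) -> Decimal:
    """Fee in USD for a top-up made on the given date (hypothetical helper)."""
    if amount_usd < MIN_TOP_UP_USD:
        raise ValueError("top-ups below the $10 minimum are not accepted")
    # Rounding to cents is an assumption, not documented behavior.
    return (amount_usd * top_up_fee_rate(on)).quantize(Decimal("0.01"))


print(top_up_fee(Decimal("10"), date(2026, 6, 23)))
print(top_up_fee(Decimal("10"), date(2026, 6, 24)))
```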
Frequently asked questions
How much does Llama 3.3 70B (via Groq) cost through GammaInfra?
Llama 3.3 70B (groq/llama-3.3-70b-versatile) is billed at the Groq (Llama) pass-through rate: $0.59 per 1M input tokens and $0.79 per 1M output tokens, with 0% token markup. GammaInfra's fee is taken at top-up time (3% during the launch window through 2026-06-23, 5% after), not per token; the BYOK option is 1–2% per request instead. Every response returns X-GammaInfra-Cost-USD with the exact spend for that call.
What is Llama 3.3 70B (via Groq)'s context window?
128K tokens, with a maximum output of 8K tokens.
Does Llama 3.3 70B (via Groq) support tool calling, vision, and JSON mode?
How do I call Llama 3.3 70B (via Groq) through GammaInfra?
Set base_url to https://api.gammainfra.com/v1 and api_key to your sk-gammainfra-... key, then set the model to groq/llama-3.3-70b-versatile to pin Llama 3.3 70B (via Groq) directly, or use gammainfra/auto to let the smart router pick it when it is the best fit. Only base_url and api_key change; the rest of your OpenAI SDK code is unchanged.
When does GammaInfra's router pick Llama 3.3 70B (via Groq)?
With gammainfra/auto, the router selects it when a prompt classifies into one of the task types listed under Best for and your cost/quality preference fits; pin groq/llama-3.3-70b-versatile to force it regardless of routing. The X-GammaInfra-Endpoint response header always reports which model actually served the request.
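The two response headers mentioned in this FAQ can be read from any HTTP client's header mapping. The sketch below assumes X-GammaInfra-Cost-USD carries a plain decimal string; that format, the `read_receipt` helper, and the sample values are illustrative assumptions rather than documented behavior.

```python
from decimal import Decimal
from typing import Mapping, NamedTuple, Optional


class CallReceipt(NamedTuple):
    cost_usd: Optional[Decimal]  # from X-GammaInfra-Cost-USD, if present
    endpoint: Optional[str]      # from X-GammaInfra-Endpoint, if present


def read_receipt(headers: Mapping[str, str]) -> CallReceipt:
    """Extract cost and serving model from response headers (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("x-gammainfra-cost-usd")
    return CallReceipt(
        cost_usd=Decimal(cost) if cost is not None else None,
        endpoint=lowered.get("x-gammainfra-endpoint"),
    )


# Made-up header values for demonstration only.
receipt = read_receipt({
    "X-GammaInfra-Cost-USD": "0.001575",
    "X-GammaInfra-Endpoint": "groq/llama-3.3-70b-versatile",
})
print(receipt.endpoint, receipt.cost_usd)
```

Comparing `receipt.endpoint` with the requested model name is a simple way to confirm whether a routed request was actually served by this model.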