Name: Llama 3.3 70B (via Groq)
Brand: Groq (Llama)

Question 1

How much does Llama 3.3 70B (via Groq) cost through GammaInfra?

Accepted Answer

Llama 3.3 70B (via Groq) (groq/llama-3.3-70b-versatile) is billed at the Groq (Llama) pass-through rate — $0.59 per 1M input tokens and $0.79 per 1M output tokens, with 0% token markup. GammaInfra's fee is taken at top-up time (3% during the launch window through 2026-06-23, 5% after), not per token; the BYOK option is 1–2% per request instead. Every response returns X-GammaInfra-Cost-USD with the exact spend for that call.

Question 2

What is Llama 3.3 70B (via Groq)'s context window?

Accepted Answer

Llama 3.3 70B (via Groq) accepts up to 128K tokens of input context and returns up to 8K output tokens per request. GammaInfra passes the full window through with no truncation. Check this model's Specifications and notes above for any provider long-context surcharge on very large prompts.

Question 3

Does Llama 3.3 70B (via Groq) support tool calling, vision, and JSON mode?

Accepted Answer

Tool / function calling: Yes. Vision (image input): No. Native JSON / structured-output mode: Yes. When tool calling is used, GammaInfra translates tool-call IDs across providers so OpenAI-shaped agent code keeps working regardless of which provider serves the request.

Question 4

How do I call Llama 3.3 70B (via Groq) through GammaInfra?

Accepted Answer

Point any OpenAI-compatible SDK at https://api.gammainfra.com/v1 with your sk-gammainfra-... key, then set the model to groq/llama-3.3-70b-versatile to pin Llama 3.3 70B (via Groq) directly — or use gammainfra/auto to let the smart router pick it when it is the best fit. Only base_url and api_key change; the rest of your OpenAI SDK code is unchanged.

Question 5

When does GammaInfra's router pick Llama 3.3 70B (via Groq)?

Accepted Answer

Llama 3.3 70B (via Groq) heads or appears in GammaInfra's chat, code dispatcher chains. With gammainfra/auto the router selects it when a prompt classifies into one of those task types and your cost/quality preference fits; pin groq/llama-3.3-70b-versatile to force it regardless of routing. The X-GammaInfra-Endpoint response header always reports which model actually served the request.

Direction	USD per 1M tokens	USD per 1K tokens
Input	$0.5900	$0.000590
Output	$0.7900	$0.000790

Field	Value
Context window	128K
Max output	8K
Provider	groq
Streaming	Yes — OpenAI-compatible SSE

Llama 3.3 70B (via Groq)

Pricing

Capabilities

Specifications

Best for

How to call it

Smart routing — or pin this model

Related models

Ready to try it?

Frequently asked questions