What is a Bedrock cross-region inference profile?

Cross-region inference profile — an Amazon Bedrock model ID prefix (e.g. us., eu., apac.) that routes inference requests across multiple AWS regions within a geographic group rather than to a single region. Required for many newer foundation models (Claude 4 family, Amazon Nova) on Bedrock; older models reject the prefix and must be invoked with the bare model ID.

What the prefix means

AWS Bedrock model IDs come in two shapes: bare (meta.llama3-70b-instruct-v1:0) and inference-profile-prefixed (us.anthropic.claude-sonnet-4-6). The prefix is geographic: us. routes across US regions (us-east-1, us-east-2, us-west-2), eu. across EU regions, apac. across APAC regions.

When a model has a cross-region inference profile, AWS load-balances the request across the regions in that group. For newer models with high demand and constrained capacity (Claude 4, Nova), this is the only way to get reliable throughput. Bare-ID invocation returns ValidationException: on-demand throughput isn't supported for these models.

When the prefix is required vs forbidden

How GammaInfra handles this

GammaInfra's Bedrock adapter ships with a hand-curated catalog of model IDs verified live against the Bedrock API. Each entry uses the correct form per model: us. prefix for Claude 4 and Nova, bare ID for Llama and Mistral.

When a region constraint is set via X-GammaInfra-Region: us (or eu, apac, or an exact AWS region like us-east-1), the router filters endpoints to those whose region tag matches. Currently only US-region Bedrock endpoints are in the catalog; EU and APAC entries will land when the AWS account configures access in those regions.

Why this matters for multi-region deployments

A multi-region application running in eu-west-1 should not call us.anthropic.claude-sonnet-4-6 — the request crosses the Atlantic, adding 100+ ms of latency per call. The correct EU-resident invocation would be eu.anthropic.claude-sonnet-4-6 from an account with EU model access provisioned.

GammaInfra is currently single-region (US-east) by architectural decision. Multi-region is on the roadmap, but the cross-region inference profile structure means that adding EU and APAC entries to the catalog is the integration surface — not running gateway pods in multiple regions.

Common questions

Why does Bedrock have this prefix at all?
To handle newer-model capacity constraints. When Anthropic shipped Claude Opus 4 on Bedrock, demand exceeded any single region's reserved capacity. AWS introduced cross-region inference profiles as the load-balancing layer — your request routes to whichever region in the group has headroom right now. It's hidden from the caller but makes the new model usable.
How do I know which prefix to use for a specific model?
Check the AWS Bedrock console for that model. Each model's detail page shows a "Cross-region inference" badge if it requires a profile prefix. Copy the exact Model ID string from the console — don't try to construct it. The console reflects which profiles your account has access to.
Does the prefix affect billing or compliance?
Billing: requests are charged at the model's standard per-token rate regardless of which region in the group served them. Compliance: data residency depends on which regions are in the inference profile group. us. routes only across US regions, so data stays in the US. eu. routes across EU regions only. apac. across APAC. If your compliance requires a specific region (e.g. eu-west-1 only), you need a single-region invocation rather than cross-region — but those are not available for the newer models.
Will the prefix become required for all Bedrock models eventually?
Probably, for newer models. AWS has consistently introduced new Anthropic and Amazon models with the prefix required, and the older bare-ID models stay on legacy invocation paths. Foundation models added in 2026+ should be assumed to require a cross-region profile.
Is this the same as AWS Bedrock Converse vs InvokeModel?
No — separate concepts. Converse vs InvokeModel is the API surface (unified vs per-model). Cross-region inference profile is the model ID shape that controls which region serves the request. Both Converse and InvokeModel accept either bare or prefixed model IDs, subject to the per-model rules above.

Try the gateway

Get a GammaInfra API key →

$3 free trial credit on signup, $10 minimum top-up. Pass-through provider rates plus 3% top-up fee during the launch window (5% after 2026-06-23).

Last updated 2026-05-15.