Blog

Posts on how we build GammaInfra — smart routing design, the bug stories from running it in production, and the methodology behind shipping a solo-founder LLM routing service.

Every agent step deserves a different model
Agent loops compound every weakness of using one model for everything. A field guide to per-step model variance, tail-latency budgets, fallback chains, cost-runaway observability, and the cross-provider tool_call.id papercut — with code samples for Claude Agent SDK, LangGraph, and OpenAI Agents SDK.
Why every GammaInfra response carries a cost-USD header
LLM API providers report tokens. Developers care about dollars. How we compute X-GammaInfra-Cost-USD per request, why it's harder than it sounds (per-direction split, long-context surcharges, fallback cascades), and how to read it from common SDKs.
Designing a continuous cost/quality dial for LLM routing
Most LLM routers force a discrete tier — cheap, balanced, quality. We added a continuous 0.0..1.0 dial via one request header. Why continuous beats discrete, what we tried and threw away, how it maps to actual model picks.
Show HN retro — what we learned launching GammaInfraAfter May 14
A founder's retro on launching GammaInfra's smart routing service to Show HN — front-page traffic numbers, comment-thread themes, the bugs HN-day traffic surfaced, and what's next.

Frequently asked questions

What does the GammaInfra blog cover?
Engineering write-ups on smart LLM routing: per-step model selection in agent loops, the continuous cost/quality dial design, why every response carries a cost-USD header, and the trade-offs behind the routing decisions. It is the deeper-dive layer above the API docs and glossary.
Is GammaInfra's routing approach documented elsewhere?
Yes — the API reference is at docs.gammainfra.com and conceptual definitions are in the glossary. The blog is where design rationale and measured results live; the docs are the reference contract.
How do I try the routing behavior described in these posts?
Point any OpenAI-compatible SDK at https://api.gammainfra.com/v1 with a GammaInfra key and use model gammainfra/auto. Every behavior the posts describe — per-step routing, the cost-quality dial, the cost-USD header, fallback chains — is live on that endpoint with no extra configuration.
How often is the blog updated?
Posts ship alongside notable routing or observability work rather than on a fixed calendar. Each post is dated and the engineering claims are reproducible against the live API at publication time.