Inference Cost
How much it costs to run an LLM inference call.
Inference cost is the per-call expense of running an AI model. Frontier APIs typically price it per million input tokens and per million output tokens, with output tokens usually costing 3-5x more than input tokens.
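The per-million-token pricing above reduces to simple arithmetic. Here is a minimal sketch; the function name `call_cost` and the example prices ($3 input, $12 output per million tokens) are illustrative assumptions, not any vendor's actual rates.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one call, given prices per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: output is 4x the input price, inside the 3-5x range.
cost = call_cost(
    input_tokens=20_000,
    output_tokens=1_000,
    input_price_per_m=3.00,
    output_price_per_m=12.00,
)
print(f"${cost:.3f} per call")
```

Because input volume often dwarfs output volume in agent workloads, the input side can dominate the bill even though each output token is priced higher.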
For operators building AI products, inference cost is the line item that determines unit economics. A naive agent design that consumes 100K tokens per user session may be unprofitable; the same agent with prompt caching, model routing (cheap models for easy steps), and bounded loops may be highly profitable.
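To make the gap between a naive and an optimized design concrete, the sketch below compares session cost with and without model routing and a step cap. Everything here is hypothetical: the `small` and `large` tier names, their prices, the `hard` flag used for routing, and the step sizes are assumptions chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens: (input, output).
PRICES = {
    "small": (0.30, 1.20),
    "large": (3.00, 12.00),
}


@dataclass
class Step:
    input_tokens: int
    output_tokens: int
    hard: bool  # assumed to be known up front; real routers must estimate this


def step_cost(step, model):
    in_price, out_price = PRICES[model]
    return (step.input_tokens * in_price + step.output_tokens * out_price) / 1_000_000


def route(step):
    """Send easy steps to the cheap tier and hard steps to the expensive one."""
    return "large" if step.hard else "small"


def session_cost(steps, use_routing=True, max_steps=None):
    """Sum step costs, stopping at max_steps so a runaway loop cannot grow the bill."""
    total = 0.0
    for i, step in enumerate(steps):
        if max_steps is not None and i >= max_steps:
            break
        model = route(step) if use_routing else "large"
        total += step_cost(step, model)
    return total


# A 10-step session with 100K input tokens in total; only the last step is hard.
steps = [Step(10_000, 500, hard=False) for _ in range(9)] + [Step(10_000, 500, hard=True)]

naive = session_cost(steps, use_routing=False)
routed = session_cost(steps, use_routing=True, max_steps=10)
print(f"naive: ${naive:.4f}  routed: ${routed:.4f}")
```

Under these assumed prices, routing nine of the ten steps to the cheaper tier cuts the session cost to roughly a fifth of the naive figure; the actual ratio depends entirely on the real price gap and on how many steps genuinely need the larger model.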
Key cost levers in 2026: prompt caching (cuts the cost of repeated prefixes by 50-90%), model selection (roughly a 10x cost gap between the fastest and smartest tiers), batch processing (cheaper when the workload tolerates latency), and avoiding accidental loops in agent code. Most teams that say "AI is too expensive for us" haven't seriously optimized any of these.
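As a rough illustration of the prompt-caching lever, the sketch below estimates input cost when a shared prefix is reused across calls. It assumes a simplified billing model: the first call pays full price for the prefix, and later calls pay a discounted rate on it. The function name, the $3 per million input price, and the 90% discount are hypothetical, and real providers may add cache-write surcharges or expiry rules that this ignores.

```python
def input_cost_with_cache(prefix_tokens, fresh_tokens, calls,
                          input_price_per_m, cache_discount):
    """Estimate total input cost in USD when a repeated prefix is cached.

    cache_discount is the fraction saved on cached prefix tokens (0.9 = 90% off).
    Assumes the first call pays full price and every later call hits the cache.
    """
    if calls <= 0:
        return 0.0
    first_call = prefix_tokens + fresh_tokens
    later_call = prefix_tokens * (1 - cache_discount) + fresh_tokens
    billable_tokens = first_call + (calls - 1) * later_call
    return billable_tokens * input_price_per_m / 1_000_000


# 20 calls sharing an 8,000-token system prompt, each with 500 new tokens.
uncached = input_cost_with_cache(8_000, 500, 20, 3.00, cache_discount=0.0)
cached = input_cost_with_cache(8_000, 500, 20, 3.00, cache_discount=0.9)
print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```

The saving grows with the share of each call taken up by the repeated prefix, so long system prompts and large shared documents benefit most.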