Context Window

The context window is the maximum number of tokens an LLM can process in a single inference call. Everything you want the model to consider (system prompt, conversation history, retrieved documents, tool definitions, user query) has to fit inside it, and on most models the generated output counts against the same limit.
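As a rough illustration of the "has to fit" constraint, the sketch below adds up the estimated token cost of each prompt component and compares it with a window size. The names estimate_tokens and fits_in_window are hypothetical, and the 4-characters-per-token ratio is an assumption standing in for a real tokenizer, so treat the numbers as approximate.

```python
from typing import Dict


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming ~4 characters per token (not a real tokenizer)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def fits_in_window(parts: Dict[str, str], window: int, reserved_output: int = 0) -> bool:
    """Return True if all prompt parts, plus room reserved for the reply, fit.

    parts maps a label (e.g. "system", "history") to its text.
    reserved_output is the number of tokens set aside for the model's answer.
    """
    used = sum(estimate_tokens(text) for text in parts.values())
    return used + reserved_output <= window


if __name__ == "__main__":
    prompt_parts = {
        "system": "You are a helpful assistant.",
        "history": "User: hi\nAssistant: hello",
        "query": "Summarize the attached report.",
    }
    # Hypothetical 8K window with 1K tokens reserved for the reply.
    print(fits_in_window(prompt_parts, window=8_000, reserved_output=1_000))
```

In practice you would swap the estimator for the tokenizer that matches your model, since token counts vary by model and by language.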

In 2026, frontier models routinely offer 200K to 2M token context windows, up from the 4K-8K standard of 2023. That shift has changed how operators build with LLMs: many use cases that previously required complex RAG pipelines can now be solved by stuffing relevant content directly into the prompt.

Caveats: long-context performance is uneven. Models often miss information in the middle of very long contexts ("lost in the middle"). Cost and latency scale with context length. For operators, the right pattern is usually: use as much context as you need, but don't reflexively dump everything in.
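One way to act on "use what you need, but don't dump everything in" is to rank candidate documents and include only the highest-ranked ones that fit a token budget. The sketch below is a minimal greedy version of that idea. The function name select_within_budget is hypothetical, the relevance scores are assumed to come from whatever ranking you already have, and the 4-characters-per-token estimate is an assumption rather than a real tokenizer.

```python
from typing import List, Tuple


def select_within_budget(docs: List[Tuple[float, str]], budget_tokens: int) -> List[str]:
    """Greedily pick the highest-scoring documents that fit the budget.

    docs is a list of (relevance_score, text) pairs.
    Token cost is estimated at ~4 characters per token (an assumption).
    """
    chosen: List[str] = []
    used = 0
    # Visit documents from most to least relevant.
    for _score, text in sorted(docs, key=lambda d: d[0], reverse=True):
        cost = max(1, len(text) // 4)
        if used + cost > budget_tokens:
            continue  # too large for the remaining budget; try smaller ones
        chosen.append(text)
        used += cost
    return chosen


if __name__ == "__main__":
    candidates = [
        (0.9, "a" * 40),   # about 10 tokens
        (0.5, "b" * 400),  # about 100 tokens
        (0.7, "c" * 40),   # about 10 tokens
    ]
    picked = select_within_budget(candidates, budget_tokens=20)
    print(len(picked))
```

Keeping the selected set small also leaves less material in the middle of the prompt, where the text above notes that models tend to miss information.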
