Full Context Applications

December 2025

Introduction

Most people use AI in a sparse way. They upload one file or maybe fifty files and ask for a summary. That works, but it barely scratches the surface of what modern language models can actually do.

A smaller group, mostly power users of the current AI wave, is doing something different. I call what they build full context applications.

In a full context application, you do not try to save tokens. At all. Cost is not regarded as a constraint. You try to use the context window to its absolute maximum.

Context is compute

Think of an LLM like a computer chip. When you buy a CPU, you do not want it running at five percent utilization. You want to push it close to its limits to extract value.

The context window is exactly that. It is usable compute. Leaving it empty means leaving capability unused.

Fine-tuning vs full context

Another approach to AI applications is fine-tuning. It absolutely has its place, especially for very narrow and highly specific tasks.

Again, the chip analogy helps. GPUs exist because graphics workloads needed specialization. But the general-purpose CPU still runs most of the applications we use every day.

Full context applications are the CPU path. They stay general, flexible, and powerful by feeding the model everything it might need instead of locking knowledge into weights.

And the different agents with their different contexts? Those are the applications running on the CPU. You can run them in parallel or one at a time.

Scaling laws

If scaling laws apply to LLMs the way they applied to computer hardware, then one thing is very likely: the cost per token will continue to fall. When that happens, the price paid per unit of value will shift dramatically in our favor.

In simple terms, we will care less and less about token cost and more about output quality. That makes full context applications economically attractive, not reckless.
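To put rough numbers on it, here is a back-of-the-envelope sketch. The window size and both prices are illustrative assumptions, not quotes from any provider:

```python
# Back-of-the-envelope cost of filling a context window once.
# All numbers below are illustrative assumptions, not provider rates.

WINDOW_TOKENS = 200_000   # assumed context window size
PRICE_TODAY = 3.00        # assumed dollars per million input tokens today
PRICE_LATER = 0.30        # assumed price after a tenfold cost drop

def full_window_cost(price_per_million_tokens: float) -> float:
    """Dollar cost of filling the whole window one time."""
    return WINDOW_TOKENS / 1_000_000 * price_per_million_tokens

print(f"Full window today: ${full_window_cost(PRICE_TODAY):.2f}")  # $0.60
print(f"Full window later: ${full_window_cost(PRICE_LATER):.2f}")  # $0.06
```

Under those assumed prices, even a completely full window costs cents, not dollars.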

Token cost will fall. Context windows will grow.

That combination means applications that feel expensive today will feel cheap tomorrow. Tools like Claude Code already operate this way, and they do not even fill the entire context window yet.

To make this tangible, I compared different context sizes so you can visualize what they actually mean in terms of text volume.
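Here is a rough version of that comparison. The conversion factors are common rules of thumb, about 0.75 English words per token and about 500 words per printed page, so treat the output as an approximation:

```python
# Rough conversion from tokens to text volume.
# Assumptions: ~0.75 English words per token, ~500 words per printed page.

WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

for tokens in (8_000, 32_000, 128_000, 200_000, 1_000_000):
    words = tokens * WORDS_PER_TOKEN
    pages = words / WORDS_PER_PAGE
    print(f"{tokens:>9,} tokens ~ {words:>9,.0f} words ~ {pages:>5,.0f} pages")
```

Under those assumptions, a 200,000 token window already holds roughly 300 pages of text at once.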

Dump everything in

If you can build agents smart enough to pick the relevant files out of whatever they are given, the optimal strategy is simple: dump everything into the context and let the model sort out what matters.

Imagine every conversation, every word you say, every WhatsApp message you have ever written, all in your context. A perfect memory, in your style, with your own autocomplete.

The same applies to programming tools. Instead of guessing which files might matter, you include every file that is open or was recently visited. Cursor does exactly this for autocomplete: every touched file becomes part of the model's awareness. The same goes for the git history. Everything gets dumped into the context.

Why? No custom adjustments for different projects. Every user and every project is treated the same. No guessing. No pruning. Just full visibility.
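A minimal sketch of what this can look like. The structure is my own illustration, not Cursor's actual implementation; it uses only plain file reads and a standard git command:

```python
import subprocess
from pathlib import Path

def build_full_context(open_files: list[Path], max_commits: int = 50) -> str:
    """Concatenate every open or recently visited file plus recent git
    history into one prompt. No filtering, no pruning, full visibility."""
    parts = []
    for path in open_files:
        parts.append(f"=== {path} ===\n{path.read_text()}")
    # Dump the recent git history in as plain text as well.
    log = subprocess.run(
        ["git", "log", "--oneline", "-n", str(max_commits)],
        capture_output=True, text=True, check=True,
    ).stdout
    parts.append(f"=== git log ===\n{log}")
    return "\n\n".join(parts)
```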

What I learned building Ankathete

I saw the same pattern while working on Ankathete.com. Instead of aggressively filtering documents through a RAG pipeline, I often put the entire document into the context.

Why? Because I want every relevant token to sit in the model's short-term memory at the same time. That maximizes the probability that relationships inside the document are preserved. The quality jump is obvious: everything is present at once.
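To show the difference in shape (this is not Ankathete's actual code, just an illustration of the two approaches):

```python
def prompt_full_context(document: str, question: str) -> str:
    """Full context: the entire document goes in, so every relationship
    inside it is visible to the model at the same time."""
    return f"Document:\n{document}\n\nQuestion: {question}"

def prompt_rag(chunks: list[str], question: str, top_k: int = 5) -> str:
    """RAG for contrast: only the top-k retrieved chunks survive, and any
    relationship spanning a dropped chunk is lost before the model sees it."""
    selected = chunks[:top_k]  # stand-in for a real retriever
    return "Context:\n" + "\n---\n".join(selected) + f"\n\nQuestion: {question}"
```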

Quality beats cost

When I program, I would rather pay more than accept worse quality. Power users of Claude Code probably cost Anthropic a lot today. But that equation flips over time.

What feels expensive now becomes standard later.

Specificity vs context

Different models will be used for different agents. You go for the best-fitting one given time and cost. You need autocomplete? Speed is critical. You need a software architect? Quality is critical.

But for the context, the rule stays the same: the more, the better.
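A small sketch of that split. The model names and latency budgets are placeholders I made up; the point is that the model choice varies while the context strategy does not:

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    model: str           # placeholder names, not real model identifiers
    max_latency_ms: int  # assumed latency budget per request
    fill_context: bool   # the one constant across all agents

AGENTS = {
    # Autocomplete: speed is critical, so pick the fast model.
    "autocomplete": AgentProfile("fast-small", max_latency_ms=200, fill_context=True),
    # Architect: quality is critical, so latency hardly matters.
    "architect": AgentProfile("best-large", max_latency_ms=60_000, fill_context=True),
}
```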

Conclusion

RAG pipelines exist because of constraints. If we could choose freely, we would put everything into the context.

Full context applications are not wasteful. They are forward looking. They assume cheaper tokens, larger windows, and higher expectations for quality.

And that is exactly where AI is heading.

Enjoy! ❤️