Research

Why a sidekick beats a bigger context window

More context isn't free. The case for offloading data-heavy work instead of paying to stuff it into your prompt.

The Lineman team

Every few months the context window gets bigger, and the temptation comes back: if the model can hold more, just give it more. Read the whole file. Paste the whole log. Pull in the entire repo. The window is huge now, so why be precious about it?

Because a bigger window isn't a free one. More context costs more, and, less obviously, it can make the model worse at the very thing you're paying it to do. Handing the data-heavy work to a sidekick beats stuffing it into a larger prompt on both counts.

More context still costs you

The cost argument is the simple one. You pay per token, and a large context window doesn't change the price of a token. It just removes the ceiling on how many you can spend. A 1,200-line file you needed three functions from costs the same whether your window is 200K or 2M. A growing window only means nothing stops you from reading all 1,200 lines at your frontier model's rate, over and over, for the rest of the session. The window didn't solve the data tax. It removed the guardrail that was holding it back.
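To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is an illustrative assumption rather than a measurement or a real price: the tokens-per-line average, the per-million-token rate, the 50-turn session, the 90 lines standing in for "three functions", and the premise that the context is re-sent in full on every turn with no prompt caching.

```python
# All constants are illustrative assumptions, not measured values or real prices.
TOKENS_PER_LINE = 12       # assumed average tokens per line of source
PRICE_PER_MILLION = 3.00   # assumed input price, in dollars per 1M tokens


def session_input_cost(lines_in_context: int, turns: int) -> float:
    """Dollar cost of re-sending `lines_in_context` lines on every turn.

    Window size is deliberately not a parameter: it caps how much you
    can send, but it never changes what each token costs.
    """
    tokens_per_turn = lines_in_context * TOKENS_PER_LINE
    return tokens_per_turn * turns * PRICE_PER_MILLION / 1_000_000


# The whole 1,200-line file versus only the functions you needed
# (assumed here to total about 90 lines), over an assumed 50-turn session.
whole_file = session_input_cost(lines_in_context=1_200, turns=50)
three_functions = session_input_cost(lines_in_context=90, turns=50)

print(f"whole file:      ${whole_file:.2f}")
print(f"three functions: ${three_functions:.2f}")
```

Under these assumed numbers the whole file comes to about $2.16 for the session and the three functions to about $0.16. The absolute figures matter less than the shape: the bill scales with what you put in the prompt, and the size of the window appears nowhere in the calculation.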

And it can blunt the model

The subtler problem is focus. A model's attention is finite even when its context isn't. Bury the one failing assertion in 400 lines of passing test output and you've made it harder to find, not easier. You've handed the model a haystack and asked it to find the needle you buried in it. A signal spread across a vast prompt is still diluted, however large the window.

A bigger window lets you carry more. It doesn't make you better at finding what you carried.

The sidekick approach

Lineman takes the other path. Rather than widening the prompt to fit the raw data, a fast secondary model compresses that data down to what matters before it reaches your primary model. The reasoning model gets a focused summary (the symbols, the failure, the relevant section) and never spends attention or budget on the rest.
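The shape of that pipeline can be sketched in a few lines. This is not Lineman's implementation: a plain keyword filter stands in for the fast secondary model so the example runs on its own, and every name in it (`compress_test_output`, `ask_primary`, the `primary` callable, the log format) is hypothetical.

```python
from typing import Callable


def compress_test_output(raw: str, context_lines: int = 2) -> str:
    """Stand-in for the secondary model: keep failing lines plus nearby context.

    A real sidekick would be a model; a keyword filter is used here only
    so that the sketch is self-contained and runnable.
    """
    lines = raw.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if "FAILED" in line or "AssertionError" in line:
            start = max(0, i - context_lines)
            end = min(len(lines), i + context_lines + 1)
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


def ask_primary(question: str, raw: str, primary: Callable[[str], str]) -> str:
    """Send the primary model a focused digest instead of the raw data."""
    digest = compress_test_output(raw)
    return primary(f"{question}\n\n{digest}")


# A made-up 400-line test log with a single failure buried in it.
raw_log = "\n".join(
    "test_case_217 FAILED: AssertionError: expected 3, got 4"
    if n == 217
    else f"test_case_{n} PASSED"
    for n in range(400)
)

digest = compress_test_output(raw_log)
print(f"{len(raw_log.splitlines())} lines in, {len(digest.splitlines())} lines out")
```

In this sketch `primary` is whatever function calls your reasoning model. The point of the design is that the raw log never appears in its prompt, only the digest does.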

The result is a context window that stays lean on purpose:

  • Lower cost. The high-volume reading happens on cheaper infrastructure, not at frontier rates.
  • Sharper focus. The reasoning model's attention lands on the problem rather than the noise.
  • Headroom. A session that isn't bloated with raw data runs longer before it has to compact or restart.

The numbers bear it out: a 53% average token reduction across 180 tasks in six benchmark suites, while retaining 98.3% of baseline output quality. Nearly the same answers for roughly half the tokens. See the benchmarks for the breakdown.

A bigger window only helps if you can afford to fill it. Most of the time you'd rather not. Start the free trial and watch the savings land on your first large file.
