Most coding work doesn’t need the most expensive model in the room. Planning needs reasoning. Implementation needs adherence. Debugging needs reasoning again. Documentation needs context window. Buying premium reasoning tokens for every prompt is the AI-tooling equivalent of taking a taxi to your own kitchen. This is the tiered setup I run, the reasoning tax I refuse to pay, and the per-session math that made me stop apologising for cheap models.
The pattern that costs you
The default behaviour in most agent UIs is one model for everything. Pick the smartest one you can afford and let it do the planning, the writing, the debugging, and the chit-chat. That model is, almost certainly, an expensive reasoning model — the kind that thinks-out-loud for half a minute before producing its first line of code.
This is the wrong default. Not because reasoning models are bad — they’re great at what they’re for — but because most prompts in a coding session don’t actually need reasoning. They need instruction-following. “Write the unit test for the function I just added.” “Add error handling around this fetch.” “Rename this variable everywhere.” A reasoning model doing those tasks is a graduate student helping you wash the dishes. They will do it. You will pay for the privilege.
The tax has a name in the bill: thinking tokens. Reasoning models bill you for the internal monologue you never see. A 30-second thinking pass before a 200-token output is paying for both, and one of them is invisible.
The four jobs in a coding session
I started writing down what I was actually asking the model to do. The list was short:
| Job | What it needs | What it doesn’t need |
|---|---|---|
| Plan | Reasoning, broad context, architectural taste | Speed, tight latency |
| Implement | Instruction-following, fast iteration, low cost-per-call | Deep reasoning between every prompt |
| Debug | Reasoning, hypothesis-testing, careful reading | Speed, low cost (it’s worth paying for) |
| Browse / ask | Massive context window, fast reads, low cost | Sophisticated reasoning |
Four jobs. Four shapes of model. One model is going to be wrong for at least three of them.
The tiered setup
Once you accept that the four jobs are different, the assignment writes itself:
flowchart LR
PLAN[Plan / architect] -->|reasoning class| A[Premium reasoning model<br/>e.g. Claude / GPT reasoning tier]
IMPL[Implement / write code] -->|instruction-following| B[Fast cheap coder model<br/>e.g. DeepSeek-class / Llama 4-class]
DEBUG[Debug / fix] -->|reasoning class again| A
ASK[Ask / browse repo] -->|long-context fast| C[Long-context fast model<br/>e.g. Gemini Flash-class]
style A fill:#e3edff,stroke:#1f4eb0
style B fill:#e6f4ea,stroke:#2c7a3f
style C fill:#fff3cd,stroke:#b58900
Three model classes, four jobs, two models doing double duty (planning and debugging both want reasoning). The “premium tier” prompts are rare — maybe 10-15 in a session. The “fast cheap” prompts are most of the volume. The “long-context fast” prompts are bursty: zero on most days, a flurry when you’re trying to understand someone else’s code.
I’m deliberately not naming specific models here. The models that fill each slot will change every few months. The slots will not.
What the per-session math looks like
A representative session of mine — feature work, not a marathon:
| Job | Prompts | $/prompt | Subtotal |
|---|---|---|---|
| Plan (reasoning model) | 10 | ~$0.10 | $1.00 |
| Implement (cheap coder) | 100 | ~$0.01 | $1.00 |
| Debug (reasoning model) | 5 | ~$0.10 | $0.50 |
| Ask (long-context Flash) | 20 | ~$0.002 | $0.04 |
| Total | 135 | ~$2.54 |
The same 135 prompts run entirely on a premium reasoning model would land somewhere between $13 and $15. The four-way split costs me less than a fifth as much, and the quality of the output isn’t measurably worse — because the cheap-coder model is doing exactly what cheap-coder models are great at (writing the code), and the reasoning model is doing exactly what it’s great at (deciding what to write).
The cost gap is what got me to switch. The quality gap is what made me stay switched.
When to spend, when not to spend
Three rules I’ve internalised:
Spend on planning. This is the prompt that decides the next ten prompts. Saving fifty cents on the planning prompt and then writing the wrong code for an hour is the worst ROI in software. A planning prompt where I pay a premium for a model that gets the architecture right pays for itself in implementation prompts I never had to throw away.
Spend on debugging. Cheap models are worse at debugging than they are at writing. They’ll happily produce three different “fixes” that all miss the actual bug, and you’ll pay for each one. A single reasoning-model debug pass that finds the real problem is cheaper than five cheap-model fix attempts that don’t.
Don’t spend on implementation. Cheap, fast, accurate models are extremely good at writing the exact code a tight spec asks for. They will gleefully regenerate a function five times to get the imports right. Let them. Use the savings to spend more on the prompts where it actually matters.
The trap I keep falling into
The trap, which I have fallen into and will fall into again, is upgrading mid-session because the cheap model got a thing wrong. Cheap model produces wrong code → frustration → “I’ll just switch to the smart model for this one prompt” → smart model also produces wrong code, because the spec was the problem, not the model.
When the cheap model gets it wrong, the right move is almost always to go back to planning, not up the model tier. The smart model has the same input you just gave the cheap model. It will hit the same wall. You’ll pay reasoning rates for the privilege of hitting it harder.
The rule, written down so I stop forgetting it: if the cheap model gets it wrong twice, the next prompt is in the planning tier, not a model upgrade.
What this is, in the end
The reasoning tax is a real thing. Premium reasoning models price their thinking time at a premium because thinking time is genuinely valuable — when you need it. You don’t need it for “add a console.log here.” You need it for “should this be a queue or a stream?” The whole game of using AI tooling well is learning which prompt is which, and paying for the right kind of intelligence on each one.
The cheap implementation models in 2026 are extraordinary. The fast long-context models are extraordinary. The premium reasoning models are extraordinary. They’re not in competition; they’re complements. Use them like complements and the bill drops 80%. Use them like substitutes and you’ll either pay too much or get worse output. There’s no third option that involves a single model and a happy ending.
Related: Spec engineering for AI-assisted delivery — because the cheaper your implementation tier, the more your specs are doing the work.