Before You Pick a Model

When you're about to run a Claude Code task, you pick a model. Consciously or not, you're making a judgment call: is this a Haiku job or an Opus job? Is the task simple enough to risk a cheaper model, or complex enough to need the best?

Most of the time, that call is a gut feeling. And gut feelings, at $15 per million output tokens, are expensive.

The pattern is familiar: you default to Sonnet because it's "usually fine." Or you always reach for Opus on anything important, because the cost of a failed run feels higher than the cost of a premium model. Both strategies have the same underlying problem. You're paying for certainty you don't actually have, with money you can't get back.

The model selection problem

There's a reason people overbuy on models: the downside of underfitting is painful. A task that fails because you picked too weak a model costs you the failed run plus a retry on a better model. Two runs, one result. So people hedge upward.

But the hedge has a cost. Claude Opus 4.8 runs $5 per million input tokens. Haiku 4.5 runs $0.80. An agentic coding task that sends 50,000 tokens of context on each of 20 turns consumes 1 million input tokens, so on input alone that's the difference between a $5 run and a $0.80 run, for a task Haiku could have handled fine.
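To make that arithmetic concrete, here is a minimal sketch that uses only the input prices quoted above. It ignores output tokens and prompt caching, so treat it as a rough estimate and not a billing calculation. The function name and dictionary keys are illustrative.

```python
# Input-token cost for a multi-turn run.
# Assumption: output tokens and prompt caching are ignored.
PRICE_PER_MILLION_INPUT = {"opus": 5.00, "haiku": 0.80}  # USD, as quoted in the text


def input_cost(model: str, tokens_per_turn: int, turns: int) -> float:
    """Return the input-side cost in USD for a run of `turns` turns."""
    total_tokens = tokens_per_turn * turns
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT[model]


opus = input_cost("opus", 50_000, 20)
haiku = input_cost("haiku", 50_000, 20)
print(f"opus ${opus:.2f}, haiku ${haiku:.2f}, ratio {opus / haiku:.2f}x")
```

With these prices the Opus run costs a little over six times the Haiku run on input alone.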

Nobody tracks this. The overspend is invisible because the successful Opus run looks like a win.

What you actually need to know before you run

The right model for a task depends on three things:

How many turns it will take. A task that takes 8 turns is a different problem from one that takes 40. Longer tasks re-send growing context on every step, so model choice compounds with task length.
Whether the cheaper model can pass it at all. Pass rate matters more than speed. A Haiku run that fails and needs an Opus retry is worse than just using Sonnet from the start.
What your codebase complexity looks like right now. The same task description means something different in a 200-line repo versus a 9,000-line one.
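The first two factors can be sketched in a few lines. This is an illustration under stated assumptions, not how synaxi-predict computes anything: context is assumed to grow by a fixed number of tokens per turn, a failed cheap run is assumed to be retried exactly once on a fallback model, and every number below is made up.

```python
def run_input_tokens(base_context: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens when every turn re-sends the context accumulated so far."""
    return sum(base_context + growth_per_turn * t for t in range(turns))


def expected_cost(cheap_cost: float, cheap_pass_rate: float, fallback_cost: float) -> float:
    """Expected spend when a failed cheap run is retried once on the fallback model."""
    return cheap_cost + (1.0 - cheap_pass_rate) * fallback_cost


# Hypothetical task: 10,000 tokens of starting context, 2,000 more per turn.
short_run = run_input_tokens(10_000, 2_000, 8)
long_run = run_input_tokens(10_000, 2_000, 40)
print(short_run, long_run, round(long_run / short_run, 1))

# Hypothetical costs: cheap run $0.80, fallback $5.00, cheap model passes half the time.
print(f"${expected_cost(0.80, 0.5, 5.00):.2f}")
```

In this toy setup, five times as many turns sends about fourteen times as many input tokens. The cheap-first strategy only pays off when the cheap model's pass rate is high enough to keep the expected retry cost below the price of starting on a stronger model.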

These are measurable. They're just not being measured.

synaxi-predict

synaxi-predict is a Claude Code plugin that predicts cost, turn count, and pass rate before a task runs. It's trained on 53,000 real agent runs from SWE-bench, SWE-smith, OpenHands, and real Claude Code sessions.

When you trigger a task, you get a table before anything runs:

Model              Est. cost   Turns   Pass
─────────────────────────────────────────────
single-haiku       $    0.35    28.1    8%  ◀ recommended
single-sonnet      $    0.62    18.4   11%

You pick the model. The subagent runs. Then synaxi-predict captures the actual turns, cost, and pass/fail result and records them against the prediction.

The predictions use TF-IDF on the task description combined with tree-sitter code complexity features extracted from your repo at prediction time: line counts, function density, branch depth, try/except blocks. The model has seen how tasks like yours, in codebases like yours, actually run.
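For a feel of what those two feature families look like, here is a rough sketch. It is not the plugin's code: the TF-IDF is a bare-bones textbook version, and Python's built-in `ast` module stands in for tree-sitter so the example runs without extra dependencies. The function and feature names are hypothetical.

```python
import ast
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Plain TF-IDF: term frequency times log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    return [
        {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in Counter(tokens).items()
        }
        for tokens in tokenized
    ]


# Stand-in for tree-sitter: Python's own parser, so this only handles Python source.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try)


def max_branch_depth(node: ast.AST, depth: int = 0) -> int:
    """Deepest nesting of if/for/while/try statements below `node`."""
    if isinstance(node, BRANCH_NODES):
        depth += 1
    children = [max_branch_depth(child, depth) for child in ast.iter_child_nodes(node)]
    return max([depth] + children)


def code_features(source: str) -> dict[str, float]:
    """Line count, function density, branch depth, and try-block count for one file."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    lines = len(source.splitlines())
    functions = sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes)
    return {
        "line_count": lines,
        "function_density": functions / lines if lines else 0.0,
        "max_branch_depth": max_branch_depth(tree),
        "try_blocks": sum(isinstance(n, ast.Try) for n in nodes),
    }


SAMPLE = '''def load(path):
    try:
        with open(path) as f:
            for line in f:
                if line.strip():
                    yield line
    except OSError:
        return
'''

vectors = tfidf(["fix failing test", "fix login bug"])
features = code_features(SAMPLE)
print(vectors[0])
print(features)
```

A word that appears in every task description, like "fix" here, gets a weight of zero, which is the point of the IDF term. A real predictor would concatenate the text vector and the code features into one input row.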

On a held-out 20% test set, turn predictions land within 2x of actual 91% of the time. The pass rate classifier hits AUC-ROC of 0.91, with 84% accuracy.
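If those two metrics are unfamiliar, the sketch below shows one common way to compute them. The data is invented for illustration, and the exact evaluation code synaxi-predict uses may differ.

```python
def within_factor(predicted: list[float], actual: list[float], factor: float = 2.0) -> float:
    """Fraction of predictions that fall between actual/factor and actual*factor."""
    hits = sum(a / factor <= p <= a * factor for p, a in zip(predicted, actual))
    return hits / len(actual)


def auc_roc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random passing run is scored above a random failing run."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented turn counts: three of the four predictions are within 2x of actual.
print(within_factor([10, 30, 8, 50], [12, 10, 16, 40]))

# Invented pass scores: three of the four pass/fail pairs are ranked correctly.
print(auc_roc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

An AUC-ROC of 0.5 means the classifier ranks passing and failing runs no better than chance, and 1.0 means it ranks every pair correctly.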

The closed loop

The prediction model is trained on benchmark data. Benchmark data isn't your code.

This is why the recording matters. Every completed task writes a ground-truth record: actual turns, actual cost, actual pass/fail, and a tree-sitter snapshot of your codebase. Over time, predictions calibrate to how agent tasks actually run on your work, not how they run on SWE-bench instances.
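As a rough picture of what a ground-truth record and a calibration step could look like, consider the sketch below. The field names, the JSON-lines format, and the median-ratio correction are all assumptions made for illustration. The plugin's actual storage format and calibration method are not described here.

```python
import json
from dataclasses import asdict, dataclass
from statistics import median


@dataclass
class RunRecord:
    # Hypothetical fields mirroring what the text says gets recorded.
    model: str
    predicted_turns: float
    actual_turns: int
    actual_cost_usd: float
    passed: bool


def to_jsonl_line(record: RunRecord) -> str:
    """Serialise one record as a single JSON line, ready to append to a log file."""
    return json.dumps(asdict(record))


def turn_calibration(records: list[RunRecord]) -> float:
    """Median ratio of actual to predicted turns; below 1.0 means predictions run high."""
    return median(r.actual_turns / r.predicted_turns for r in records)


# Invented history of three completed tasks.
history = [
    RunRecord("single-haiku", 20.0, 10, 0.21, True),
    RunRecord("single-sonnet", 30.0, 18, 0.55, True),
    RunRecord("single-haiku", 10.0, 8, 0.12, False),
]

print(to_jsonl_line(history[0]))
factor = turn_calibration(history)
print(f"scale future turn predictions by {factor:.2f}")
```

The median is used here because a single runaway task would distort a mean. A trained model could instead fold these records back in as new training rows.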

You can also contribute. bin/contribute posts your anonymised actuals as a GitHub issue: task text, model, turns, cost, pass/fail, code features. No file contents, no diffs. The more diverse the contributions, the better the model gets for everyone.

Turn predictions currently run high for everyday Claude Code sessions because the training data skews toward longer SWE-bench-style tasks. This is what real-world contributions improve most directly.

Install

Inside any Claude Code session:

/plugin marketplace add BeadW/synaxi-predict
/plugin install synaxi-predict

Once installed, the prediction runs automatically whenever Claude spawns a subagent. No extra command needed. The model artifact (~190MB) downloads once to your platform data directory and stays there.

For CLI use, the bin/predict command runs against any task description and repo path without a Claude Code session.

synaxi-predict is open source under MIT. If you're running Claude Code tasks regularly and want to know what they should cost before they run, give it a try.

Synaxi cuts token costs on the request side, stripping tool schema duplication, stale conversation history, and structural overhead from every outgoing Claude request. synaxi-predict cuts costs on the selection side, picking the right model before the task runs. Together they cover both ends of the problem.
