When you're about to run a Claude Code task, you pick a model. Consciously or not, you're making a judgment call: is this a Haiku job or an Opus job? Is the task simple enough to risk a cheaper model, or complex enough to need the best?
Most of the time, that call is a gut feeling. And gut feelings, at $15 per million output tokens, are expensive.
The pattern is familiar: you default to Sonnet because it's "usually fine." Or you always reach for Opus on anything important, because the cost of a failed run feels higher than the cost of a premium model. Both strategies have the same underlying problem. You're paying for certainty you don't actually have, with money you can't get back.
The model selection problem
There's a reason people overbuy on models: picking too weak a model is painful. A task that fails on the cheaper model costs you the failed run plus a retry on a better one. Two runs, one result. So people hedge upward.
But the hedge has a cost. Claude Opus 4.8 runs $5 per million input tokens. Haiku 4.5 runs $0.80. An agentic coding task that sends 50,000 tokens of context on each of 20 turns consumes 1 million input tokens, so on input alone that's the difference between a $5 run and a $0.80 run, for a task Haiku could have handled fine.
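The arithmetic behind that comparison is short enough to write down. This is a minimal sketch that uses only the input prices quoted above and assumes a constant context size per turn; it ignores output tokens and prompt caching, and the function name is made up for illustration.

```python
def input_cost_usd(tokens_per_turn: int, turns: int, usd_per_million: float) -> float:
    """Input-token cost of a run that sends the same context size on every turn."""
    total_tokens = tokens_per_turn * turns
    return total_tokens / 1_000_000 * usd_per_million


# 50,000 tokens of context on each of 20 turns = 1,000,000 input tokens.
opus = input_cost_usd(50_000, 20, 5.00)
haiku = input_cost_usd(50_000, 20, 0.80)

print(f"Opus input cost:  ${opus:.2f}")
print(f"Haiku input cost: ${haiku:.2f}")
```

Real runs rarely keep context flat from turn to turn, so treat this as a floor for intuition, not a billing estimate.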
Nobody tracks this. The overspend is invisible because the successful Opus run looks like a win.
What you actually need to know before you run
The right model for a task depends on three things:
- How many turns it will take. A task that takes 8 turns is a different problem from one that takes 40. Longer tasks re-send growing context on every step, so model choice compounds with task length.
- Whether the cheaper model can pass it at all. Pass rate matters more than speed. A Haiku run that fails and needs an Opus retry is worse than just using Sonnet from the start.
- How complex your codebase is right now. The same task description means something different in a 200-line repo than in a 9,000-line one.
These are measurable. They're just not being measured.
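The first two factors can be put into rough numbers. The sketch below is an illustration only: every figure in it is hypothetical, the linear context growth is an assumption, and none of it is how synaxi-predict computes anything. It shows why turn count compounds, and how a pass rate turns into an expected cost once a failed cheap run has to be retried on a stronger model.

```python
def cumulative_input_tokens(base_tokens: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens when the context grows by a fixed amount each turn."""
    return sum(base_tokens + growth_per_turn * t for t in range(turns))


def expected_cost_with_fallback(cheap_cost: float, cheap_pass_rate: float,
                                fallback_cost: float) -> float:
    """Expected spend when a failed cheap run is retried once on a stronger model."""
    if not 0.0 <= cheap_pass_rate <= 1.0:
        raise ValueError("pass rate must be between 0 and 1")
    return cheap_cost + (1.0 - cheap_pass_rate) * fallback_cost


def break_even_pass_rate(cheap_cost: float, fallback_cost: float) -> float:
    """Pass rate above which 'cheap first, then fallback' beats going straight to the fallback."""
    return cheap_cost / fallback_cost


# Hypothetical task: 10,000 tokens of starting context, 2,000 more each turn.
short_run = cumulative_input_tokens(10_000, 2_000, 8)    # 8 turns
long_run = cumulative_input_tokens(10_000, 2_000, 40)    # 40 turns
print(short_run, long_run)  # 5x the turns, far more than 5x the tokens

# Hypothetical costs: $0.80 for the cheap run, $5.00 for the fallback.
print(expected_cost_with_fallback(0.80, 0.5, 5.00))
print(break_even_pass_rate(0.80, 5.00))
```

With these made-up numbers the cheap model only needs to pass a little more than one time in six to be the better first choice, which is exactly the kind of threshold a gut feeling can't see.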
synaxi-predict
synaxi-predict is a Claude Code plugin that predicts cost, turn count, and pass rate before a task runs. It's trained on 53,000 real agent runs from SWE-bench, SWE-smith, OpenHands, and real Claude Code sessions.
When you trigger a task, you get a table before anything runs:
Model            Est. cost   Turns   Pass
─────────────────────────────────────────────
single-haiku     $ 0.35      28.1     8%   ◀ recommended
single-sonnet    $ 0.62      18.4    11%
You pick the model. The subagent runs. Then synaxi-predict captures the actual turns, cost, and pass/fail result and records them against the prediction.
The predictions use TF-IDF on the task description combined with tree-sitter code complexity features extracted from your repo at prediction time: line counts, function density, branch depth, try/except blocks. The model has seen how tasks like yours, in codebases like yours, actually run.
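To make the code-complexity half of that concrete, here is a rough sketch of the kind of features named above. It is not the plugin's extractor: it swaps tree-sitter for Python's standard `ast` module so it runs with no dependencies, it only handles Python source, and the function and feature names are assumptions chosen for illustration.

```python
import ast

# Statement types counted as one level of branch nesting (an assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def max_branch_depth(node: ast.AST, depth: int = 0) -> int:
    """Deepest nesting of branching statements found below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, BRANCH_NODES) else depth
        deepest = max(deepest, max_branch_depth(child, child_depth))
    return deepest


def code_features(source: str) -> dict:
    """Line count, function density, branch depth and try-block count for one file."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    line_count = len(source.splitlines())
    functions = sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes)
    try_blocks = sum(isinstance(n, ast.Try) for n in nodes)
    return {
        "line_count": line_count,
        "function_density": functions / line_count if line_count else 0.0,
        "max_branch_depth": max_branch_depth(tree),
        "try_blocks": try_blocks,
    }


SAMPLE = '''\
def load(path):
    try:
        with open(path) as fh:
            for line in fh:
                if line.strip():
                    yield line
    except OSError:
        return
'''

features = code_features(SAMPLE)
print(features)
```

A numeric vector like this can sit alongside a TF-IDF vector of the task description as input to a regressor or classifier; how the plugin actually combines the two is not specified here.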
On a held-out 20% test set, turn predictions land within 2x of the actual turn count 91% of the time. The pass rate classifier hits an AUC-ROC of 0.91, with 84% accuracy.
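A "within 2x" figure can be read in more than one way. The sketch below assumes the symmetric reading, where a prediction counts as a hit if it is no less than half and no more than double the actual turn count. The pairs are invented to exercise the function and are not taken from the test set.

```python
def within_factor(predicted: float, actual: float, factor: float = 2.0) -> bool:
    """True when predicted / actual lies between 1/factor and factor, inclusive."""
    if predicted <= 0 or actual <= 0:
        raise ValueError("turn counts must be positive")
    ratio = predicted / actual
    return 1.0 / factor <= ratio <= factor


def share_within_factor(pairs, factor: float = 2.0) -> float:
    """Fraction of (predicted, actual) pairs that fall within the factor."""
    hits = sum(within_factor(p, a, factor) for p, a in pairs)
    return hits / len(pairs)


# Hypothetical (predicted, actual) turn counts.
pairs = [(28.1, 20), (18.4, 40), (10.0, 5), (6.0, 30)]
share = share_within_factor(pairs)
print(f"{share:.0%} of predictions within 2x")
```

The same function works for checking your own recorded runs against their predictions once you have a few dozen of them.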
The closed loop
The prediction model is trained on benchmark data. Benchmark data isn't your code.
This is why the recording matters. Every completed task writes a ground-truth record: actual turns, actual cost, actual pass/fail, and a tree-sitter snapshot of your codebase. Over time, predictions calibrate to how agent tasks actually run on your work, not how they run on SWE-bench instances.
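As a rough picture of what one such record might hold, here is a sketch built from the fields listed above. The class, field names, and JSON layout are all hypothetical; the plugin's actual storage format is not described in this post.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """One completed task: the prediction next to what actually happened."""
    model: str
    predicted_turns: float
    actual_turns: int
    predicted_cost_usd: float
    actual_cost_usd: float
    passed: bool
    code_features: dict = field(default_factory=dict)  # complexity snapshot at run time

    def turn_error_ratio(self) -> float:
        """Values above 1 mean the prediction ran high."""
        return self.predicted_turns / self.actual_turns


# Invented values, for illustration only.
record = RunRecord(
    model="single-haiku",
    predicted_turns=28.1,
    actual_turns=14,
    predicted_cost_usd=0.35,
    actual_cost_usd=0.21,
    passed=True,
    code_features={"line_count": 9000},
)

# One JSON object per line is easy to append to and easy to re-read later.
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

Keeping the prediction and the outcome in the same record is what makes later calibration possible: the error ratio can be computed per task without joining two data sources.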
You can also contribute. bin/contribute posts your anonymised actuals as a GitHub issue: task text, model, turns, cost, pass/fail, code features. No file contents, no diffs. The more diverse the contributions, the better the model gets for everyone.
Turn predictions currently run high for everyday Claude Code sessions, because the training data skews toward longer SWE-bench-style tasks. This is what real-world contributions improve most directly.
Install
Inside any Claude Code session:
/plugin marketplace add BeadW/synaxi-predict
/plugin install synaxi-predict
Once installed, the prediction runs automatically whenever Claude spawns a subagent. No extra command needed. The model artifact (~190MB) downloads once to your platform data directory and stays there.
For CLI use, the bin/predict command runs against any task description and repo path without a Claude Code session.
synaxi-predict is open source under MIT. If you're running Claude Code tasks regularly and want to know what they should cost before they run, give it a try.
Synaxi cuts token costs on the request side, stripping tool schema duplication, stale conversation history, and structural overhead from every outgoing Claude request. synaxi-predict cuts costs on the selection side, picking the right model before the task runs. Together they cover both ends of the problem.