When you're about to run a Claude Code task, you pick a model. Consciously or not, you're making a judgment call: is this a Haiku job or an Opus job? Is the task simple enough to risk a cheaper model, or complex enough to need the best?

Most of the time, that call is a gut feeling. And gut feelings, at $15 per million output tokens, are expensive.

The pattern is familiar: you default to Sonnet because it's "usually fine." Or you always reach for Opus on anything important, because the cost of a failed run feels higher than the cost of a premium model. Both strategies have the same underlying problem. You're paying for certainty you don't actually have, with money you can't get back.

The model selection problem

There's a reason people overbuy on models: picking too weak a model hurts. A task that fails on an underpowered model costs you the failed run plus a retry on a better one. Two runs, one result. So people hedge upward.

But the hedge has a cost. Claude Opus 4.8 runs $5 per million input tokens. Haiku 4.5 runs $0.80. An agentic coding task that sends 50,000 tokens of context on each of 20 turns consumes 1 million input tokens. Counting input cost alone, that's the difference between a $5 run and a $0.80 run, for a task Haiku could have handled fine.
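The arithmetic is simple enough to check by hand. Here it is as a minimal sketch, assuming a fixed context size on every turn and ignoring output tokens and caching (the function name is made up for illustration):

```python
def input_cost_usd(tokens_per_turn: int, turns: int, usd_per_million_input: float) -> float:
    """Input-token cost of a run that re-sends a fixed-size context each turn."""
    total_input_tokens = tokens_per_turn * turns
    return total_input_tokens / 1_000_000 * usd_per_million_input


# Figures from the paragraph above: 50,000 tokens per turn, 20 turns.
opus_cost = input_cost_usd(50_000, 20, 5.00)
haiku_cost = input_cost_usd(50_000, 20, 0.80)

print(f"Opus input cost:  ${opus_cost:.2f}")
print(f"Haiku input cost: ${haiku_cost:.2f}")
```

Real runs also pay for output tokens, and context usually grows from turn to turn, so treat this as a floor rather than an estimate.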

Nobody tracks this. The overspend is invisible because the successful Opus run looks like a win.

What you actually need to know before you run

The right model for a task depends on three things:

  • How many turns it will take. A task that takes 8 turns is a different problem from one that takes 40. Longer tasks re-send growing context on every step, so model choice compounds with task length.
  • Whether the cheaper model can pass it at all. Pass rate matters more than speed. A Haiku run that fails and needs an Opus retry is worse than just using Sonnet from the start.
  • What your codebase complexity looks like right now. The same task description means something different in a 200-line repo versus a 9,000-line one.
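The first two points can be put into numbers. The sketch below is illustrative only: it assumes context grows by a fixed amount each turn, that a failed cheap run is always followed by one retry on the stronger model, and that the retry succeeds. The function names and the token figures are hypothetical. The two prices are the per-run input costs from the Opus and Haiku example earlier.

```python
def cumulative_input_tokens(base_tokens: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens when each turn re-sends a context that keeps growing."""
    return sum(base_tokens + growth_per_turn * turn for turn in range(turns))


def expected_cost_with_retry(cheap_cost: float, cheap_pass_rate: float,
                             strong_cost: float) -> float:
    """Expected spend for 'try the cheap model, retry on the strong one if it fails'."""
    return cheap_cost + (1.0 - cheap_pass_rate) * strong_cost


def break_even_pass_rate(cheap_cost: float, strong_cost: float) -> float:
    """Pass rate above which cheap-first beats going straight to the strong model."""
    return cheap_cost / strong_cost


# Task length compounds: five times the turns costs far more than five times the tokens.
short_run = cumulative_input_tokens(10_000, 2_000, 8)
long_run = cumulative_input_tokens(10_000, 2_000, 40)
print(short_run, long_run, round(long_run / short_run, 1))

# Cheap-first only pays off if the cheap model passes often enough.
for pass_rate in (0.10, 0.50, 0.90):
    cost = expected_cost_with_retry(0.80, pass_rate, 5.00)
    print(f"pass rate {pass_rate:.0%}: expected ${cost:.2f} vs $5.00 direct")

print(f"break-even pass rate: {break_even_pass_rate(0.80, 5.00):.0%}")
```

Under these assumptions, the cheap-first strategy loses money whenever the cheap model's pass rate falls below the ratio of the two run costs. That's why pass rate is the number you want before you run.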

These are measurable. They're just not being measured.

synaxi-predict

synaxi-predict is a Claude Code plugin that predicts cost, turn count, and pass rate before a task runs. It's trained on 53,000 real agent runs from SWE-bench, SWE-smith, OpenHands, and real Claude Code sessions.

When you trigger a task, you get a table before anything runs:

Model              Est. cost   Turns   Pass
─────────────────────────────────────────────
single-haiku       $    0.35    28.1    8%  ◀ recommended
single-sonnet      $    0.62    18.4   11%

You pick the model. The subagent runs. Then synaxi-predict captures the actual turns, cost, and pass/fail result and records them against the prediction.

The predictions use TF-IDF on the task description combined with tree-sitter code complexity features extracted from your repo at prediction time: line counts, function density, branch depth, try/except blocks. The model has seen how tasks like yours, in codebases like yours, actually run.
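To make the code-side features concrete, here is a rough sketch of the kind of numbers involved. It is not synaxi-predict's extractor: it swaps tree-sitter for Python's built-in ast module, so it only handles Python source, and the feature names are hypothetical. The TF-IDF side is left out.

```python
import ast

# Statement types counted as one level of branching (an assumption for this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try)


def max_branch_depth(node: ast.AST, depth: int = 0) -> int:
    """Deepest nesting of branching statements at or below `node`."""
    if isinstance(node, BRANCH_NODES):
        depth += 1
    deepest = depth
    for child in ast.iter_child_nodes(node):
        deepest = max(deepest, max_branch_depth(child, depth))
    return deepest


def code_features(source: str) -> dict:
    """Line count, function density, branch depth and try/except count for one file."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    line_count = len(source.splitlines())
    function_count = sum(
        isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes
    )
    return {
        "line_count": line_count,
        "function_count": function_count,
        "functions_per_100_lines": 100 * function_count / max(line_count, 1),
        "max_branch_depth": max_branch_depth(tree),
        "try_blocks": sum(isinstance(n, ast.Try) for n in nodes),
    }


SAMPLE = '''
def load(path):
    try:
        with open(path) as fh:
            for line in fh:
                if line.strip():
                    yield line
    except OSError:
        return
'''

features = code_features(SAMPLE)
print(features)
```

A predictor would see these numbers alongside the task text, which is how the same task description can produce different estimates in different repos.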

On a held-out 20% test set, turn predictions land within a factor of two of the actual turn count 91% of the time. The pass rate classifier reaches an AUC-ROC of 0.91, with 84% accuracy.
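If you want to run the same check on your own runs, the turn metric is easy to compute. This sketch assumes "within 2x" means the prediction falls between half and double the actual turn count. That is an interpretation, not a definition taken from the project, and the sample numbers are made up.

```python
def fraction_within_factor(predicted, actual, factor=2.0):
    """Share of predictions that fall between actual / factor and actual * factor."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if a / factor <= p <= a * factor
    )
    return hits / len(actual)


# Made-up turn counts, purely for illustration.
predicted_turns = [10, 30, 8, 50]
actual_turns = [12, 12, 20, 40]

score = fraction_within_factor(predicted_turns, actual_turns)
print(f"{score:.0%} of predictions within 2x of actual")
```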

The closed loop

The prediction model is trained largely on benchmark data. Benchmark data isn't your code.

This is why the recording matters. Every completed task writes a ground-truth record: actual turns, actual cost, actual pass/fail, and a tree-sitter snapshot of your codebase. Over time, predictions calibrate to how agent tasks actually run on your work, not how they run on SWE-bench instances.
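As a rough picture of what one of those records might hold, here is a sketch. The class and field names are hypothetical and the on-disk format is an assumption, not the plugin's actual schema. The predicted values come from the single-haiku row in the table above. The actuals and code features are invented.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RunRecord:
    """One prediction paired with what actually happened (hypothetical schema)."""
    task_text: str
    model: str
    predicted_turns: float
    actual_turns: int
    predicted_cost_usd: float
    actual_cost_usd: float
    passed: bool
    code_features: dict

    def turn_ratio(self) -> float:
        """Above 1.0 means the prediction ran high, below 1.0 means it ran low."""
        return self.predicted_turns / self.actual_turns


record = RunRecord(
    task_text="Fix the failing date parser test",  # invented task
    model="single-haiku",
    predicted_turns=28.1,
    actual_turns=20,          # invented actual
    predicted_cost_usd=0.35,
    actual_cost_usd=0.27,     # invented actual
    passed=True,
    code_features={"line_count": 9000, "try_blocks": 41},  # invented values
)

# One JSON object per line is a convenient append-only format for this kind of log.
line = json.dumps(asdict(record))
print(line)
print(f"turn ratio: {record.turn_ratio():.2f}")
```

Collect enough of these and you can see whether predictions for your own work run systematically high or low.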

You can also contribute. bin/contribute posts your anonymised actuals as a GitHub issue: task text, model, turns, cost, pass/fail, code features. No file contents, no diffs. The more diverse the contributions, the better the model gets for everyone.

Turn predictions currently run high for everyday Claude Code sessions, because the training data skews toward longer SWE-bench-style tasks. This is what real-world contributions improve most directly.

Install

Inside any Claude Code session:

/plugin marketplace add BeadW/synaxi-predict
/plugin install synaxi-predict

Once installed, the prediction runs automatically whenever Claude spawns a subagent. No extra command needed. The model artifact (~190MB) downloads once to your platform data directory and stays there.

For CLI use, the bin/predict command runs against any task description and repo path without a Claude Code session.


synaxi-predict is open source under MIT. If you're running Claude Code tasks regularly and want to know what they should cost before they run, give it a try.

Synaxi cuts token costs on the request side, stripping tool schema duplication, stale conversation history, and structural overhead from every outgoing Claude request. synaxi-predict cuts costs on the selection side, picking the right model before the task runs. Together they cover both ends of the problem.