Survival rate for CLI-first agents

A year ago, AI code mostly meant autocomplete in an editor. Not any more.

A growing share of real work happens in the terminal. An agent takes a task, edits a dozen files, runs the tests, and reports back, often with no human watching a keystroke land. If your measurement only sees VS Code, you’re blind to some of the most productive and most expensive AI on the team.

Same metric, new surface

Source Trace now recognises the common CLI-first agents, including those built around Codex, Claude, Gemini, OpenCode and Kilo Code, and attributes their work with the same line-level precision as the editor extension. We detect the agent and model, write both to git notes, and measure what survives. Their survival rates sit on the same axes as your IDE tools’, so the two are directly comparable.
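To make "measure what survives" concrete, here is a minimal sketch of a survival-rate calculation. It is not Source Trace's implementation. It assumes survival rate means the share of an agent's attributed lines that are still present in the codebase at a later point, and it matches lines by exact text, which is cruder than real line tracking. `AttributedLine`, `survival_rate` and the agent labels are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributedLine:
    """One line of code credited to an agent (hypothetical record shape)."""
    agent: str    # illustrative label, e.g. "cli-agent"
    surface: str  # "terminal" or "editor"
    path: str     # file the line was written to
    text: str     # content of the line as written


def survival_rate(attributed, current_files):
    """Return {(agent, surface): fraction of attributed lines still present}.

    `current_files` maps a path to the set of line texts in that file now.
    Assumption: a line "survives" if its exact text is still in the file.
    """
    written = defaultdict(int)
    survived = defaultdict(int)
    for line in attributed:
        key = (line.agent, line.surface)
        written[key] += 1
        if line.text in current_files.get(line.path, set()):
            survived[key] += 1
    return {key: survived[key] / written[key] for key in written}
```

Because terminal and editor work share one key shape and one formula, the resulting rates can sit side by side on the same chart.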

That comparison matters because terminal agents fail differently. An IDE assistant makes small edits you vet as you go. A CLI agent produces a large, confident batch in one shot. That’s great when it survives and painful when it doesn’t, because there’s more to throw away. In the data, some agents post excellent one-shot survival on well-specified tasks and fall off a cliff on ambiguous ones. Knowing which is which on your codebase beats a vendor’s demo.

Note: detection leans on the signals each agent leaves behind, and we’d rather show nothing than a confident wrong attribution, so we filter out the low-confidence cases. There’s one edge case we don’t cover yet: cloud and background agents that run entirely server-side and commit on their own. Attribution happens where the work happens, and server-side work is still an open problem.
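The "show nothing rather than a wrong attribution" rule can be sketched as a simple threshold. This is an illustration under stated assumptions, not the product's logic: the `Detection` record, the 0-to-1 `confidence` score and the `MIN_CONFIDENCE` cut-off of 0.8 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """A candidate attribution for one commit (hypothetical record shape)."""
    commit: str
    agent: str
    confidence: float  # assumed score between 0.0 and 1.0


MIN_CONFIDENCE = 0.8  # assumed cut-off, chosen only for this example


def attribute(detection: Detection,
              min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the agent name, or None when the signal is too weak to show."""
    if detection.confidence >= min_confidence:
        return detection.agent
    return None
```

Returning `None` rather than a best guess keeps uncertain commits out of the survival numbers instead of crediting them to the wrong agent.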

Editor and terminal now land on the same chart, as long as VS Code is active. Once you know the true code output of each AI model, what decision will you make for your AI budget?