December 2025, Revisited: The Year Assistants Became Agents

Eric Greene June 11, 2026

This post is part of our Three-Year Retrospective series: thirty-six posts, one per month, looking back at what actually mattered in software engineering. This one wraps up 2025.

Every December we try to name the year. For 2025 the name was never in doubt: it was the year assistants became agents. In January, the standard interaction with an AI coding tool was still mostly conversational — autocomplete, chat, the occasional multi-file edit you watched like a hawk. By December, every major platform offered some version of the same loop: hand the system an issue, let it explore the repository, write code, run tests, iterate on failures, and open a pull request. The unit of AI work went from the suggestion to the task.

How fast it happened

This series' own 2025 entries tell the story in increments. Terminal agents went mainstream in the first half of the year: Claude Code's rise, Codex CLI, then Gemini CLI's open-source, free-tier launch in June. The Windsurf saga in July showed how strategically valuable agentic tooling had become. GPT-5 in August and the frontier releases that followed kept raising the ceiling on how long a task a model could sustain without losing the plot, and long-horizon reliability, more than raw intelligence, was the capability that made agents practical. Asynchronous, cloud-hosted agents matured through the autumn. By year's end, "assign this issue to the agent and review the PR tomorrow" was no longer a demo but a workflow with logos on it: GitHub's coding agent, Devin, Codex's cloud tasks, Cursor's background agents, and Claude Code on the web among them.

The numbers backed the vibe. Surveys through 2025 put AI tool usage among professional developers above eighty percent, while — tellingly — trust in AI output declined in the same surveys. Developers were using these tools more and believing them less. That tension was the year's defining texture, and the teams that noticed it early built the right habits.

The supervision spectrum

The most useful mental model we taught in 2025 — and the one that organized our whole curriculum by December — was the supervision spectrum. Agentic work isn't one thing; it's a dial. At one end, synchronous pairing: the agent works in your terminal, you watch every step, approval gates on anything destructive. In the middle, delegated tasks with checkpoints: the agent works a scoped ticket while you do something else, and you inspect the plan and the diff. At the far end, autonomous issue-to-PR: the agent works unattended and your only touchpoint is review.

The mistake teams made all year was treating the dial as a maturity ladder — as if the goal were to crank everything to autonomous. The teams that got real value matched the supervision level to the task: well-specified, well-tested, low-blast-radius work could run autonomously; ambiguous, architectural, or security-sensitive work stayed synchronous. "What supervision level does this task deserve?" became, in our trainings, the first question of agentic engineering.
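To make the matching rule concrete, here is a minimal Python sketch that encodes the heuristic above as a function. The `Task` attributes, the `Supervision` enum, and `supervision_level` are hypothetical names invented for illustration, not part of any tool. The rule for the middle "checkpointed" setting is our own interpolation, since the text only names the two ends explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Supervision(Enum):
    """The three points on the supervision dial (hypothetical labels)."""
    SYNCHRONOUS = "synchronous pairing"
    CHECKPOINTED = "delegated task with checkpoints"
    AUTONOMOUS = "autonomous issue-to-PR"


@dataclass
class Task:
    # Hypothetical task attributes; a real team would define its own.
    well_specified: bool
    well_tested: bool
    low_blast_radius: bool
    architectural: bool = False
    security_sensitive: bool = False


def supervision_level(task: Task) -> Supervision:
    """Pick a supervision level for a task (illustrative rule, not a standard)."""
    # Ambiguous, architectural, or security-sensitive work stays synchronous.
    if task.architectural or task.security_sensitive or not task.well_specified:
        return Supervision.SYNCHRONOUS
    # Well-specified, well-tested, low-blast-radius work can run unattended.
    if task.well_tested and task.low_blast_radius:
        return Supervision.AUTONOMOUS
    # Assumed middle ground: delegate, but inspect the plan and the diff.
    return Supervision.CHECKPOINTED


print(supervision_level(Task(well_specified=True, well_tested=True, low_blast_radius=True)))
```

The design point is ordering: the checks that force synchronous work run first, so a well-tested task that touches security never slips through to autonomous mode.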

Review became the bottleneck — and the skill

If 2025 had a second lesson, it was this: when generation gets cheap, review becomes the constraint. Teams learned, sometimes painfully, that agent output fails differently than human output — confidently, plausibly, and at volume. Code that looks idiomatic but subtly misreads a requirement; tests that pass because they assert the bug; dependencies added casually. The countermeasures that emerged were mostly classical engineering rediscovered under pressure: smaller PRs, stronger test suites as guardrails, CI as the agent's leash, explicit standards files in the repo so agents inherit conventions, and review checklists tuned for machine-generated failure modes. By December, "how do we review agent output without becoming a rubber stamp?" was the most common question in our courses — displacing "which tool should we buy?", which had dominated January.
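Some of those checklist items can be automated as a pre-review gate. The sketch below is one hypothetical way to do it in Python: it takes a list of changed files with line counts (however your CI obtains them) and emits flags for a human reviewer. The 400-line threshold, the manifest file names, and the test-file heuristic are all assumptions chosen for illustration, not established defaults.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical threshold and manifest list; tune both to your repository.
MAX_CHANGED_LINES = 400
DEPENDENCY_MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "go.mod", "Cargo.toml"}


@dataclass
class ChangedFile:
    path: str
    added: int
    removed: int


def is_test_file(path: str) -> bool:
    # Assumed convention: tests live under a tests/ directory or start with test_.
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def review_flags(files: list[ChangedFile]) -> list[str]:
    """Return reviewer-facing warnings for common agent failure modes."""
    flags: list[str] = []
    # Smaller PRs: flag diffs too large to review carefully.
    total = sum(f.added + f.removed for f in files)
    if total > MAX_CHANGED_LINES:
        flags.append(f"large diff ({total} lines): ask for a smaller PR")
    # Casually added dependencies: flag any manifest change.
    for f in files:
        if PurePosixPath(f.path).name in DEPENDENCY_MANIFESTS:
            flags.append(f"dependency change in {f.path}: justify each new package")
    # Tests that assert the bug: tests edited in the same PR as the code deserve a closer look.
    changed_tests = [f.path for f in files if is_test_file(f.path)]
    changed_source = [f.path for f in files if not is_test_file(f.path)]
    if changed_tests and changed_source:
        flags.append("tests edited alongside code: check assertions encode the requirement")
    return flags


files = [
    ChangedFile("src/app/billing.py", 30, 5),
    ChangedFile("tests/test_billing.py", 12, 2),
    ChangedFile("pyproject.toml", 1, 0),
]
for flag in review_flags(files):
    print(flag)
```

A gate like this does not replace review; it only tells the reviewer where a rubber stamp would be most dangerous.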

Looking back from June 2026

Six months on, the assistants-to-agents framing has only hardened — 2026 so far has been about scale (fleets of agents, parallel worktrees, agent-first interfaces) rather than any reversal. The supervision spectrum is now common vocabulary, and review skills have become a hiring signal. If anything, December 2025's open question — what happens to engineering teams when generation is no longer the constraint — is still the question.

If your team is calibrating its own supervision dial, AI-Assisted Development with Python builds the day-to-day agentic workflow from the ground up, and Agentic Code Review and PR Automation tackles the review bottleneck directly — including how to use agents to review agents without losing the human judgment that makes review work.