March 2024, Revisited: Devin and the Agentic Hype Cycle

By Eric Greene, June 11, 2026

Our Three-Year Retrospective reaches March 2024 and one of the most effective product demos of the decade. On March 12, a startup called Cognition introduced Devin, billed without hedging as "the first AI software engineer." The launch video showed Devin taking a task in plain English, making a plan, writing code in its own editor, running commands in its own shell, browsing documentation, hitting errors, and debugging its way through them — autonomously, over a long horizon. For about a week, it was the only thing the industry talked about.

What the demo actually showed

Strip away the framing and the technical content was a coherent architecture: an LLM wrapped in a persistent workspace (shell, editor, browser) with a planning loop that decomposed tasks, executed steps, observed results, and revised. Cognition reported a then-striking score on SWE-bench, the benchmark of real GitHub issues from open-source Python repos: Devin resolved 13.86% of issues end-to-end without human assistance, against a previous unassisted best of 1.96%. The demos included completing a freelance job from Upwork, a framing choice that did a lot of the viral work and aged the worst.
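For readers who want the shape of that loop in code, here is a minimal sketch. It is not Cognition's implementation, which was never published in this form: the names `Step`, `run_agent`, `toy_planner`, and `make_toy_tools` are all hypothetical, the "tools" are in-memory stubs rather than a real shell or editor, and the planner is a hard-coded rule standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    tool: str  # which tool to call, e.g. "shell" or "editor"
    arg: str   # the argument passed to that tool


@dataclass
class AgentRun:
    history: list = field(default_factory=list)  # (Step, observation) pairs
    done: bool = False


def run_agent(task: str,
              plan: Callable[[str, list], Optional[Step]],
              tools: dict,
              max_steps: int = 10) -> AgentRun:
    """Plan, execute, observe, revise, until the planner stops or the budget runs out."""
    run = AgentRun()
    for _ in range(max_steps):
        step = plan(task, run.history)           # plan (or revise) from what has been seen
        if step is None:                         # planner decides the task is finished
            run.done = True
            break
        observation = tools[step.tool](step.arg)  # execute in the workspace
        run.history.append((step, observation))   # observe and remember
    return run


def toy_planner(task: str, history: list) -> Optional[Step]:
    """Hypothetical stand-in for an LLM: run tests, fix on failure, re-run."""
    if not history:
        return Step("shell", "run tests")
    last_step, last_obs = history[-1]
    if last_step.tool == "shell" and "FAIL" in last_obs:
        return Step("editor", "apply fix")
    if last_step.tool == "editor":
        return Step("shell", "run tests")
    return None  # last test run passed


def make_toy_tools() -> dict:
    """Stub workspace: tests fail until the 'editor' has been called once."""
    state = {"fixed": False}

    def shell(cmd: str) -> str:
        return "PASS" if state["fixed"] else "FAIL: test_example"

    def editor(arg: str) -> str:
        state["fixed"] = True
        return "edited"

    return {"shell": shell, "editor": editor}


run = run_agent("fix the failing test", toy_planner, make_toy_tools())
for step, obs in run.history:
    print(f"{step.tool}({step.arg!r}) -> {obs}")
```

The `max_steps` budget is the one design choice worth noticing: without a hard cap, a planner that never returns `None` loops forever, which is a small-scale version of the wall-clock and cost problems the demo videos edited out.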

Reactions split on predictable lines: breathless "the end of programming" takes, equally reflexive dismissals, and a smaller group asking the productive question — not "is this real?" but "what is the actual delta between this demo and a tool I would trust?"

What the skeptics got right

Quite a lot, and faster than usual. Within weeks, careful reviewers were picking the demos apart, showing that some showcased tasks were simpler than presented, that Devin sometimes fabricated work (including debugging errors it had itself introduced), and that wall-clock times were far longer than the edited videos implied. The deeper critiques held up too. A 13.86% resolve rate means roughly six failures in every seven attempts, and in software an agent that fails about 86% of the time while sounding confident is worse than useless without supervision: someone must review everything, and reviewing an agent's sprawling, plausible-looking work is genuinely harder than reviewing a colleague's. The demo format hid exactly the things practitioners needed to see: the failure cases, the cost, the babysitting.
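The arithmetic behind the six-in-seven figure is worth spelling out. The sketch below uses only the 13.86% resolve rate reported above; the function names are ours, and it assumes every attempt is independent and equally likely to succeed, which real issue sets are not.

```python
def failure_rate(resolve_rate: float) -> float:
    """Share of attempts that do not resolve the issue."""
    return 1.0 - resolve_rate


def failures_per_success(resolve_rate: float) -> float:
    """Expected failed attempts for each resolved issue, assuming independent attempts."""
    return failure_rate(resolve_rate) / resolve_rate


devin_rate = 0.1386  # resolve rate reported at launch
print(f"failure rate: {failure_rate(devin_rate):.2%}")
print(f"failures per success: {failures_per_success(devin_rate):.1f}")
```

On those assumptions, a reviewer wades through a little over six failed attempts for each resolved issue, and nothing in the output marks which is which.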

The "AI software engineer" framing earned its backlash as well. Engineering is requirements negotiation, trade-off judgment, and accountability — not just ticket-to-PR conversion. Naming the product like a hire rather than a tool generated headlines and burned trust, a trade that looked clever in March and expensive by summer.

What the demo got right anyway

Here is the part that mattered: the architecture was correct. Agent-with-tools — plan, execute in a real environment, observe, retry — was the right shape, and essentially everything that came after used it. The demo's core insight, that models had become capable enough to sustain long-horizon multi-step work rather than single-shot completions, was true and consequential. SWE-bench, for all the arguments about it, became the shared yardstick the field actually optimized against, and those scores did not stay at 13.86% for long.

Our advice to teams that spring was to ignore both the hype and the debunkings and watch the trendline: autocomplete had gone from toy to standard in two years, and there was no reason to expect agents to move slower. That read aged well.

Looking back from June 2026

The hype cycle ran its course — peak, backlash, trough — and then the technology quietly climbed out the other side. Coding agents in 2026 are normal tools: scoped to tasks rather than job titles, run under review gates and CI like any other contributor, and genuinely useful in a way the March 2024 demo promised but could not deliver. Cognition's framing was wrong and its thesis was right, which is roughly the modal outcome for demos that launch an era. The durable skill was never picking the winning vendor; it was learning to specify, supervise, and review delegated work — a skill that looks suspiciously like engineering management.

That supervisory craft is what we teach in Building with Coding Agents — scoping tasks agents can actually complete, structuring repos and tests so agents fail loudly instead of plausibly, and reviewing agent output without rubber-stamping it. If your team is earlier in the journey, AI-Assisted Software Engineering Fundamentals builds the judgment that the March 2024 discourse, on both sides, mostly skipped.