January 2025, Revisited: DeepSeek R1 and the Open-Weight Shock
Eric Greene, June 11, 2026
This post is part of our Three-Year Retrospective series: thirty-six posts, one per month, looking back at what actually mattered in software engineering. This one covers January 2025.
On January 20, 2025, a Chinese lab that most working engineers had never thought much about released DeepSeek R1: a reasoning model competitive with OpenAI's o1 on math and coding benchmarks, with open weights under an MIT license, a published paper describing how it was trained, and a family of distilled smaller models from 1.5B to 70B parameters. Within a week it was the most-discussed release in AI, the DeepSeek app topped the App Store, and on January 27 the markets delivered their opinion: Nvidia shed hundreds of billions of dollars in market value in a single day. For one strange week, a model release was front-page general news.
What R1 actually demonstrated
Strip away the market drama and three technical facts mattered. First, reasoning was reproducible. Four months after o1 introduced test-time compute behind a curtain of hidden tokens, R1 delivered comparable benchmark performance with its chain of thought fully visible, trained largely through reinforcement learning on verifiable problems. The paper's R1-Zero result, which showed reasoning behavior emerging from RL without supervised examples of reasoning, was the part that startled researchers most.
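To make "reinforcement learning on verifiable problems" concrete, here is a minimal sketch of a rule-based reward of the kind the R1 paper describes: one signal for a checkable final answer and one for keeping the reasoning inside think tags. This is our illustration, not DeepSeek's code. The function names, the `\boxed{...}` answer convention, the exact-string comparison, and the equal weighting of the two signals are all assumptions.

```python
import re

# Assumed output shape: a <think>...</think> block followed by a final answer.
THINK_RE = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)
# Assumed answer convention: the final answer is wrapped in \boxed{...}.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion is a think block followed by an answer."""
    return 1.0 if THINK_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last boxed answer equals the reference answer."""
    answers = BOXED_RE.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == expected.strip() else 0.0


def total_reward(completion: str, expected: str) -> float:
    """Sum the two signals; equal weighting is an assumption of this sketch."""
    return accuracy_reward(completion, expected) + format_reward(completion)
```

The point of the sketch is that nothing here needs a learned judge: the reward is computed by a rule, which is what makes a problem "verifiable" in this sense. A production checker would normalize answers (fractions, units, whitespace) rather than compare raw strings.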
Second, the cost narrative cracked. DeepSeek's published figure for training its V3 base model (on the order of a few million dollars of compute, on export-restricted hardware) was widely debated and caveated, and it probably understated the full program cost. But the precise number mattered less than the direction: frontier-adjacent capability did not require frontier-lab budgets. Efficiency was a frontier too.
Third, distillation worked. The released Qwen- and Llama-based distillations meant a 32B model you could run on a beefy workstation now carried a meaningful fraction of frontier reasoning ability. That single fact rewrote what "local model" meant.
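The "beefy workstation" claim comes down to simple arithmetic: weight memory is roughly parameter count times bytes per weight. The sketch below shows that calculation for a 32B model at a few common precisions. The helper name is ours, and the estimate covers weights only; the KV cache, activations, and runtime overhead come on top and depend on context length and serving stack.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Weights only: 16-bit, 8-bit, and 4-bit storage for a 32B-parameter model.
for bits in (16, 8, 4):
    print(f"32B at {bits}-bit: ~{weight_memory_gb(32, bits):.0f} GB")
```

At 16-bit precision the weights alone need about 64 GB, which is multi-GPU territory. At 4-bit quantization they drop to about 16 GB, which is why a single high-memory GPU became a plausible home for a 32B reasoning model.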
The week every enterprise asked about self-hosting
The question we heard constantly that month — from clients who had never once asked about model weights — was some variant of: could we run this ourselves? The motivations were a tangle worth separating, because we spent the spring helping teams separate them:
- Data residency and confidentiality. An open-weight model on your own GPUs means prompts and code never leave your boundary. For regulated industries this changed feasibility, not just preference.
- Provider concentration risk. A year of leapfrogging and deprecations had taught teams not to weld themselves to one API. Weights you possess cannot be deprecated out from under you.
- The DeepSeek-specific confusion. Using DeepSeek's hosted API meant sending data to servers in China — a non-starter for many. Running the MIT-licensed weights yourself, or via a Western hosting provider, carried no such data path. An enormous amount of January's discourse blurred this distinction, and unblurring it became a standard slide in our courses.
The sober engineering answer, then as now: self-hosting trades an API bill for an infrastructure and MLOps commitment, and the hosted frontier stayed ahead on raw capability. But after January 2025, "no" to local models had to be a reasoned no. The option was real.
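The cost side of that trade can be framed as a break-even volume: the monthly token count at which a fixed infrastructure bill equals what the API would have charged. The sketch below is a deliberately crude model. The function name and every number in the example are hypothetical placeholders, not real prices, and it ignores engineering time, utilization, and capability differences.

```python
def break_even_tokens_per_month(
    fixed_monthly_cost: float,
    api_price_per_million: float,
    local_marginal_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    saving_per_million = api_price_per_million - local_marginal_per_million
    if saving_per_million <= 0:
        # The API is never more expensive per token, so there is no break-even.
        return float("inf")
    return fixed_monthly_cost / saving_per_million * 1_000_000


# Hypothetical inputs: 6,000/month for GPUs and ops, 3.00 per million API
# tokens, 0.50 per million in local power and marginal costs.
tokens = break_even_tokens_per_month(6000, 3.00, 0.50)
print(f"Break-even: {tokens / 1e9:.1f}B tokens per month")
```

With those placeholder inputs the break-even sits at 2.4 billion tokens a month. Teams well below their own break-even are paying for control rather than savings, which can still be the right call when data residency is the driver.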
Looking back from June 2026
The shock absorbed, the lesson stuck. Frontier labs kept their capability lead through 2025, but the open-weight ecosystem never receded: distilled and successor reasoning models kept improving, visible chains of thought and RL-on-verifiable-tasks became standard technique industry-wide, and hybrid architectures — hosted frontier models for hard problems, local models for sensitive or high-volume work — went from exotic to ordinary. The Nvidia panic, for what it's worth, reversed; cheaper reasoning meant more demand for inference, not less. January 2025 permanently added a row to every team's model decision matrix.
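The hybrid pattern described above reduces to a small routing decision. Here is a minimal sketch, assuming each request has already been tagged as sensitive, hard, or high-volume. The `Request` fields, the `Target` names, and the default of sending untagged traffic to the hosted model are our assumptions; how requests get tagged is the difficult part and is out of scope here.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"    # self-hosted open-weight model
    HOSTED = "hosted"  # hosted frontier model behind an API


@dataclass(frozen=True)
class Request:
    prompt: str
    sensitive: bool = False  # data that must stay inside our boundary
    hard: bool = False       # needs frontier-level capability
    bulk: bool = False       # part of a high-volume batch job


def route(req: Request) -> Target:
    """Pick a backend; the data boundary outranks capability and cost."""
    if req.sensitive:
        return Target.LOCAL   # never leaves the boundary, however hard
    if req.hard:
        return Target.HOSTED
    if req.bulk:
        return Target.LOCAL   # keep high-volume work off the API bill
    return Target.HOSTED      # assumed default for everything else
```

The ordering is the design choice worth noticing: the sensitivity check runs first, so a hard but confidential problem still goes to the local model and accepts a capability penalty rather than crossing the boundary.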
If your team is weighing local models seriously, the local-models module of Building AI Agents with Python and MCP gets hands-on with running and integrating open-weight models in agent stacks, and LLM Application Development with Python covers the architecture work — routing, fallbacks, data-boundary design — that makes a hosted-plus-local strategy actually operable.