February 2024, Revisited: Python Task Pipelines in the KRaft Era

Eric Greene June 11, 2026

Our Three-Year Retrospective reaches February 2024 and Apache Kafka 3.7, released at the end of the month. On paper it was an incremental release. In practice it marked the moment the KRaft era became the only story worth planning around — and, for the Python data-engineering teams we work with, it sharpened a question that had been blurring for years: now that Kafka is operationally simpler, does it replace your Celery-and-RabbitMQ task infrastructure? (Spoiler from the field, then and now: usually not, and knowing why is the valuable part.)

Kafka 3.7 and the end of the ZooKeeper tax

KRaft, Kafka's built-in Raft-based consensus protocol that replaces ZooKeeper for metadata management, had been production-ready since 3.3, but 3.7 made the trajectory unambiguous. It arrived late in the 3.x line (only 3.8 and 3.9 would follow), with Kafka 4.0, and the complete removal of ZooKeeper mode, clearly on the horizon. The release filled in remaining KRaft gaps, notably early-access JBOD support in KRaft clusters, one of the last missing features keeping large ZooKeeper deployments from migrating. It also shipped the first official Apache Kafka Docker image, a small thing that said a lot about who was now deploying Kafka, and how.

For operators, the KRaft story was an unqualified win: one system instead of two, faster controller failover, metadata scaling to far more partitions, and an entire category of "ZooKeeper session expired" incident reports retired. February 2024 was when we started telling every team standing up new clusters: KRaft only, no exceptions, and put your ZooKeeper migration on this year's calendar, not next year's.

The decision Python teams actually faced

Simpler Kafka operations had a side effect: teams started reaching for Kafka in places it did not belong, on the logic that "we'll need it eventually." The Celery-versus-Kafka conversation came up in nearly every data-engineering engagement we ran that spring, so here is the decision framework as we taught it then — it has barely needed updating since.

Celery with RabbitMQ is a task system. The unit is a job: send this email, generate this report, resize this image. You get per-message acknowledgement, automatic retries with backoff, routing, priorities, scheduled and chained tasks, and a worker model tuned for "do this once, tell me when it's done." Messages are consumed and gone; the broker's job is delivery, not history.

Kafka is an event log. The unit is a fact: this order was placed, this sensor read 41.7. Events persist for a configured retention; consumers track offsets and can replay from any point; multiple independent consumer groups read the same stream without interfering. You get throughput and durable history, but no per-message ack-and-retry semantics, no task routing, no built-in notion of "this specific job failed, redeliver it to someone else."
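The log side can be sketched the same way. This toy, single-partition `EventLog` is a hypothetical class that assumes unlimited retention and commits offsets as soon as a batch is read (roughly like auto-commit). It shows the three properties described above: reading does not delete anything, each consumer group keeps its own offset, and any group can rewind to replay history without disturbing the others.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical single-partition log: events are retained, readers track offsets."""

    def __init__(self) -> None:
        self.events: list = []                # append-only; reads never remove events
        self.offsets: dict = defaultdict(int)  # committed offset per consumer group

    def append(self, event) -> int:
        """Append an event and return its offset."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, group: str, max_records: int = 10) -> list:
        """Return the group's next batch and move its offset past it (commit-on-read)."""
        start = self.offsets[group]
        batch = self.events[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Rewind or skip ahead for one group only; other groups are unaffected."""
        self.offsets[group] = offset


# Usage: two independent consumer groups reading the same stream.
log = EventLog()
log.append({"type": "order_placed", "order_id": 1})
log.append({"type": "sensor_reading", "value": 41.7})

billing = log.poll("billing")      # both events
analytics = log.poll("analytics")  # the same two events, read independently
log.seek("analytics", 0)           # replay from the beginning
replayed = log.poll("analytics")
print(len(billing), len(analytics), len(replayed))  # 2 2 2
print(log.poll("billing"))                          # [] (billing is caught up)
```

Notice what is missing compared with the task-queue sketch: there is no notion of a failed event being redelivered to another worker. A consumer that cannot process an event has to decide for itself what to do before moving its offset.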

The framework, compressed: if you would naturally say "process this", you want a task queue; if you would say "this happened", you want a log. If you need replay, multiple downstream readers, or stream processing, that points to Kafka. If you need retries, priorities, and per-job lifecycle, that points to Celery. And the architectures that aged best used both: Kafka carrying the events and Celery executing the work those events trigger, with a Kafka consumer at the seam turning each relevant event into a Celery task.
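The seam itself is small enough to sketch. Assume `records` stands in for `(offset, event)` pairs polled from a Kafka topic, `enqueue` for handing a job to Celery (for example via `app.send_task(name, kwargs=...)`), and `commit` for the consumer's offset commit with auto-commit disabled. `EVENT_TO_TASK`, `route`, and `bridge` are hypothetical names for illustration only. The one design choice worth copying is the ordering: commit an offset only after its task has been handed to the broker.

```python
from typing import Callable, Iterable, Optional

# Hypothetical routing table: which event types ("this happened") trigger which task.
EVENT_TO_TASK = {
    "order_placed": "send_confirmation_email",
    "report_requested": "generate_report",
}


def route(event: dict) -> Optional[tuple[str, dict]]:
    """Map a fact to a job ("process this"), or return None if it triggers no work."""
    task = EVENT_TO_TASK.get(event.get("type"))
    if task is None:
        return None
    return task, {k: v for k, v in event.items() if k != "type"}


def bridge(records: Iterable[tuple[int, dict]],
           enqueue: Callable[[str, dict], None],
           commit: Callable[[int], None]) -> int:
    """Turn polled (offset, event) records into tasks; return how many were dispatched.

    Each offset is committed only after its task was enqueued, so a crash in
    between causes redelivery (at-least-once), never a silently lost task.
    """
    dispatched = 0
    for offset, event in records:
        job = route(event)
        if job is not None:
            enqueue(*job)  # assumption: in production, a Celery send_task / .delay() call
            dispatched += 1
        commit(offset + 1)  # Kafka convention: commit the offset of the *next* record
    return dispatched


# Usage with in-memory stand-ins for the Kafka consumer and the Celery broker.
sent, committed = [], []
records = [
    (0, {"type": "order_placed", "order_id": 7, "email": "a@example.com"}),
    (1, {"type": "sensor_reading", "value": 41.7}),
    (2, {"type": "report_requested", "report": "q1"}),
]
n = bridge(records, enqueue=lambda name, kw: sent.append((name, kw)), commit=committed.append)
print(n)          # 2
print(committed)  # [1, 2, 3]
print(sent[0])    # ('send_confirmation_email', {'order_id': 7, 'email': 'a@example.com'})
```

Because a crash between `enqueue` and `commit` means the same event is read again on restart, the Celery tasks on the far side of the seam should be idempotent.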

Looking back from June 2026

The KRaft transition completed on schedule: Kafka 4.0 shipped in 2025 without ZooKeeper, and the migration tooling matured enough that the move became routine rather than feared. The decision framework, meanwhile, outlived every tooling change around it. We have watched teams burn quarters rebuilding perfectly good Celery pipelines on Kafka for résumé-driven reasons, and other teams strain RabbitMQ into an event-distribution role it was never meant for. The boundary between "task" and "event" remains the cheapest architectural clarity available in this space.

Both halves of that boundary are courses we teach hands-on: Distributed Task Automation with Python, Celery, and RabbitMQ covers the task-queue side deeply — workers, retries, routing, monitoring — while Distributed Task Automation with Python, Kafka, and Celery builds the combined architecture, including exactly the Kafka-consumer-to-Celery-task seam that February 2024's KRaft milestone made so much easier to operate.