The AWS announcement follows GKE's 65k-node scale announcement (https://cloud.google.com/blog/products/containers-kubernetes/gke-65k-nodes-and-counting), where etcd was replaced by Spanner. While both articles cite replacing etcd/Raft as the reason for reaching higher scale, I personally think it played a smaller role than it might seem. Both achievements primarily show the results of recent work on Kubernetes control plane performance, which has unlocked much higher scale (e.g., https://kubernetes.io/blog/2025/09/09/kubernetes-v1-34-snapshottable-api-server-cache/). The Kubernetes scale target (5k nodes) was set to represent the needs of the large majority of users, leaving it to advanced users like cloud providers to push it further. I've personally seen 30k-node clusters running on etcd, so it shouldn't be abandoned right away if you need higher scale.

The magic piece behind both improvements isn't just replacing etcd; it's that both Amazon's journal and Google Spanner use atomic clocks to resolve conflicts in their consensus algorithms. While Raft needs two network hops to confirm the order of two operations, these systems can just read the real time from an atomic clock to achieve the same result. This makes them much easier to scale horizontally than purely software-based consensus. I don't expect such advanced systems, which require dedicated hardware, to be open-sourced anytime soon.

(Full disclosure: I work on GKE)
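To make the clock-ordering point concrete, here is a rough Python sketch of the general idea of ordering operations by timestamps with bounded clock uncertainty. It is an illustration only, not how Spanner or the AWS journal is actually implemented; `Interval`, `stamp`, `order`, and the uncertainty value `epsilon_us` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A timestamp with bounded uncertainty, in microseconds (hypothetical type)."""
    earliest: int
    latest: int


def stamp(local_time_us: int, epsilon_us: int) -> Interval:
    # Assumption: the true time lies within +/- epsilon_us of the local clock.
    return Interval(local_time_us - epsilon_us, local_time_us + epsilon_us)


def definitely_before(a: Interval, b: Interval) -> bool:
    # a happened before b for certain only if the intervals do not overlap.
    return a.latest < b.earliest


def order(a: Interval, b: Interval) -> str:
    if definitely_before(a, b):
        return "a-before-b"
    if definitely_before(b, a):
        return "b-before-a"
    # Overlapping intervals: clocks alone cannot decide, so a real system
    # would have to wait out the uncertainty or fall back to coordination.
    return "uncertain"


# Two operations stamped 10 ms apart on different machines.
tight = order(stamp(10_000_000, 1_000), stamp(10_010_000, 1_000))
loose = order(stamp(10_000_000, 50_000), stamp(10_010_000, 50_000))
print(tight)  # intervals do not overlap with 1 ms uncertainty
print(loose)  # intervals overlap with 50 ms uncertainty
```

The smaller the clock uncertainty, the more often two operations can be ordered locally without extra network round trips, which is why very accurate clocks help here.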
-
This information is awesome. I never thought of it that way.
-
Hi, I recently came across a surprising article: Amazon EKS can scale up to 100K nodes. In the article, they mention that they offloaded etcd's Raft consensus to their own journal. Is there more detailed information on this? Or does anyone know whether Amazon will open-source the journal?
https://aws.amazon.com/cn/blogs/containers/under-the-hood-amazon-eks-ultra-scale-clusters/