Discussion about this post

Robots and Chips

This is a fantastic deep dive into how S3 actually works under the hood. What really stands out to me is how critical HDDs still are to modern cloud infrastructure, even though so many people have declared them a dead technology. The fact that S3 runs on tens of millions of HDDs is a huge testament to the economics of spinning-disk storage. You mentioned the WD Gold 20TB drives in the footnote, and that's exactly the kind of enterprise HDD that Western Digital has been betting on. While everyone was focused on NAND and SSDs eating the storage market, companies like WDC and Seagate quietly became essential to hyperscale cloud providers because of the dirt-cheap per-TB costs you highlighted. The parallelization strategy built on erasure coding is genius because it turns the HDD's weakness (slow random IO) into a non-issue by spreading requests across thousands of drives. Western Digital's whole business thesis for the past few years has been that AI and cloud storage will drive massive HDD demand for cold storage tiers, and this article basically validates that bet. The workload decorrelation you describe at scale is also fascinating, because it shows why multi-tenant systems like S3 can keep using HDDs economically. Really excellent breakdown of the architecture.

