redm 2 days ago

I've been following this report for many years, but Backblaze, as a backup service (traditionally), has very different IO patterns than many users. They originally started with consumer drives, which we found to be far too unreliable. In my experience, the BER and write cycles have a dramatic impact on overall drive performance. The MTBF declines sharply as write cycles increase, both as a percentage of IO and overall IO.

Backblaze changed IO patterns with B2, but that would be the key data for me to make this more useful: failure rate as a percentage of bytes read/written, etc.
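
For what it's worth, here is a rough sketch of the metric asked for above: failures normalized by bytes written, shown next to the usual annualized failure rate (failures per drive-year, expressed as a percentage). The `ModelStats` fields and all numbers are made up for illustration; as far as I know the published reports do not include per-drive write volume, so `petabytes_written` is an assumed input.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    # Hypothetical per-model aggregates; field names are illustrative only.
    drive_days: int           # total days of service across all drives of the model
    failures: int             # failures observed over those drive days
    petabytes_written: float  # assumed input, not part of the published data


def annualized_failure_rate(stats: ModelStats) -> float:
    """Failures per drive-year, as a percentage."""
    drive_years = stats.drive_days / 365
    return stats.failures / drive_years * 100


def failures_per_pb_written(stats: ModelStats) -> float:
    """Failures normalized by write volume instead of time in service."""
    return stats.failures / stats.petabytes_written


# Two invented fleets with identical AFR but very different write workloads.
cold = ModelStats(drive_days=365_000, failures=10, petabytes_written=500.0)
hot = ModelStats(drive_days=365_000, failures=10, petabytes_written=5_000.0)

for name, stats in (("cold", cold), ("hot", hot)):
    print(name, annualized_failure_rate(stats), failures_per_pb_written(stats))
```

Both fleets show the same AFR, but the write-heavy one has a tenth of the failures per PB written, which is exactly the kind of difference a time-based rate hides.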

tempest_ 2 days ago

While I find this data interesting, it isn't usually very actionable.

The SKUs with the lowest failure numbers immediately get bought out (if they are still available, which they are not always) and will never be available again. You also always run the risk of "getting a bad batch" or just getting some drives that got beat up in shipping.

Usually this data is only useful for keeping an eye on your own stuff and prioritizing replacements when the time comes.

When buying drives I just look at the sizes I need and the performance then get 1/3rd from each of the manufacturers.

  • tracker1 2 days ago

    Yeah, usually by the time you know a specific model is or isn't "good" the mfg has changed production or how things are laid out in the products themselves. Over time, you can glean that some mfg have been better or worse overall than others though, but that's not a promise of future efforts.

    All the same, it's definitely cool and interesting to see. I've had some good and some very bad luck with storage drives over the years. I still think twice about Seagate drives since I had 6 out of 8 of their 3TB enterprise models go bad relatively quickly a decade and a half ago, specifically bought through separate vendors. I also had the first IBM Deskstar drives; the second died before the first could be RMA'd (RAID1 isn't backup).

  • toast0 a day ago

    Any sort of long term testing is like this. You can't know what the long term reliability of something is when you buy it. You can estimate from reliability of similar items made in the past, but even if you bought some of everything and kept it on the shelf for X years and then only used the best, the stuff aged on the shelf.

    Reports like this might help drive planning for failures. It might also help validate your experience if you've had a bunch of failures with some model and they have too.

    IIRC, there have been a couple of models that seemed to hit a big bathtub-curve-style end of life (I think 6TB drives in particular); that could be a pre-failure indicator for you if you have that model.

    Otherwise, yeah, mostly not actionable, but very nice to see the data.

    > When buying drives I just look at the sizes I need and the performance then get 1/3rd from each of the manufacturers.

    This is a good plan, you should avoid most correlated failures from firmware and manufacturing (although there's a lot of shared supply chain, so you might not avoid all correlated failures if some common component was made improperly during a long enough time period that all three drive makers would be using it in your purchase).

  • theanomaly 2 days ago

    While it's tough if you want new drives, I've found I could frequently get used drives on eBay that have significant history on Backblaze's report. Despite the increased risk from used drives, I've found I still end up more reliable than buying random new drives.

  • warmwaffles 2 days ago

    I'm mainly looking at manufacturer and model failure rates in aggregate over a period of time like 6 months to determine my next purchases. As you pointed out, SKUs with the lowest failure rates get slurped up and you always run the risk of bad batches.

basilgohar 2 days ago

What Backblaze is doing here is so underrated. This is large-scale, practical, real in-datacenter data on essential hardware infrastructure that is available almost nowhere else, and they provide it, and their excellent analysis, completely for free.

I miss this culture and I admire leadership that allows it to not only exist, but thrive. I fear the day a stockholder meeting occurs and someone wringing their hands sees the decommissioned pennies they can save by limiting or stopping these reports.

  • bArray 2 days ago

    What it buys is long-term good will. Engineers will see they know their stuff and suggest them as a solution for projects and people.

    That said, all it would take is for the wrong leadership to start cutting corners to undo all of this hard work.

    • lostdog 2 days ago

      Backblaze stuck my email on a list, and now I get daily marketing spam from them. They shattered that good will with me very quickly.

  • holysoles 2 days ago

    This is the main reason I use them for their S3-compatible storage service over their competitors. While it's not enterprise-level revenue, I still like to think it makes a difference.

  • ISL 2 days ago

    For as long as Backblaze has been doing this and at this level of quality, I have no doubt that these reports are good for business.

    (As an anecdotal example -- I first heard about Backblaze from these reports many years ago and have relied on them to an extent in selecting new drives. I'm now a Backblaze customer.)

  • AnonHP 2 days ago

    > I fear the day a stockholder meeting occurs and someone wringing their hands sees the decommissioned pennies they can save by limiting or stopping these reports.

    The Backblaze stock has taken a beating over the years. Recently I saw some news that there were issues with financial reporting (and fraud?). So it’s anybody’s guess as to what may happen or if the company would even be around (as it exists now) in the next decade.

    I’d guess they may already have tools in place to prepare the stats and charts, leaving some amount of writing as manual work (which could or would probably be offloaded to generative AI). But analyzing the reliability of drives and publishing the data could also be seen as a competitive advantage when comparing with newer companies (positive and negative).

  • emailrob a day ago

    > by limiting or stopping these reports

    Hopefully not, given that the performance one was just added!

  • 400thecat 2 days ago

    Is there any danger this data is biased? Everything good gets corrupted eventually (Amazon reviews, Consumer Reports, ...). Is it possible they get some kickbacks for positive reviews?

    • basilgohar 2 days ago

      It's always possible. But I haven't seen anything that would imply this to be the case so far in all the years I've been reading this.

londons_explore 2 days ago

My takeaway... The specific model plays a huge role in the failure rate.

A great model has an MTBF of 250 years.

A bad model might have an MTBF of just 5 years.

I suspect that if you had a need for reliable storage which couldn't be met with the usual RAID approach, buying second-hand drives from eBay of a model and batch proven to be really reliable is probably your best bet.

  • londons_explore 2 days ago

    And to answer the obvious question... One use case where you want reliability and can't use RAID is where you are selling a product that only has physical space or money for one drive, for example a standalone CCTV storage device.

    Every drive failure will lead to an unhappy customer and product return, so you really want the failure rate in the first 10 years of operation to be 1% or below. (Which none of the drives in this study can do).
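
To put rough numbers on the MTBF figures and the 1%-in-10-years target above, here is a back-of-the-envelope sketch. It assumes a constant failure rate (exponential lifetimes), which is a simplification: real drives tend to follow a bathtub curve, so treat the results as ballpark only. The function names are my own.

```python
import math


def failure_probability(mtbf_years: float, years: float) -> float:
    """Chance a single drive fails within `years`, assuming a constant
    failure rate of 1 / MTBF (exponential lifetime model)."""
    return 1.0 - math.exp(-years / mtbf_years)


def required_mtbf(target_probability: float, years: float) -> float:
    """MTBF needed to keep the failure chance over `years` at the target,
    under the same constant-rate assumption."""
    return -years / math.log(1.0 - target_probability)


print(f"{failure_probability(250, 10):.1%}")  # 250-year MTBF over 10 years: about 3.9%
print(f"{failure_probability(5, 10):.1%}")    # 5-year MTBF over 10 years: about 86.5%
print(f"{required_mtbf(0.01, 10):.0f}")       # MTBF for 1% in 10 years: about 995 years
```

So under this model even a 250-year-MTBF drive misses the 1% target by roughly a factor of four; you would need an MTBF close to 1,000 years.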

blindriver 2 days ago

Given the upcoming 2-year enterprise data shortage due to hyperscalers, I'm curious how this will affect Backblaze.

  • tempest_ 2 days ago

    That is SSD/Memory.

    These are HDDs.

    • benjiro 2 days ago

      HDDs have also been under pressure... Barely a month ago there was an article here from somebody who set up a cluster of something like half a million, with several thousand HDDs, just to store data for AI training.

      Not even two days ago there was an article about a backlog on HDDs for AI, because everybody and their grandmother wants to store the entire internet, out of fear that AI scraping will become more difficult. Aka, they are gating data. And yes, you can train AI easily on HDDs even with their lower IOPS. The fact that you have a few thousand in parallel does the trick, and it's often bandwidth issues that hit harder.

      I just stockpiled a few extra 4TB NVMe drives because I learned my lesson. NVMe has not been dropping in price after the manufacturers pushed it up, and AI is going to keep eating NVMe storage for a long time. Let alone HDD storage...

      Welcome to the new normal... Crypto miners killing GPU prices, HDD crypto miners, crypto miners again back with a vengeance, oh, a pandemic, everybody needs hardware... A short time of benefits because of overproduction (on NVMe especially, where manufacturers cut back production) AAAAND... here comes AI.

      It's something every fying year.

      • sersi 2 days ago

        In my neck of the woods (HK), HDD prices have pretty much doubled in the last 2 months. I bought a 22TB Toshiba 1 year ago at 30% less than what they cost now.

  • zhdc1 2 days ago

    Shortage -> Glut