Hardware

Reviews of Hard Drive Reliability?

ewhac asks: "After having three 18G drives go toes-up on me in the last two months, all of them having done so after about 40 days of use, I want the replacement drives to be rock-solid. While Tom's Hardware and AnandTech review individual drives and their performance, I haven't yet been able to locate any comprehensive or cohesive review of drive reliability and longevity. Does such a resource exist?"
  • Never buy IBM Drives (Score:3, Interesting)

    by duffbeer703 ( 177751 ) on Monday January 28, 2002 @03:31PM (#2915119)
    Just had two 18GB IBM SCSI (LZX) drives die after less than a year. Also had 6 bad disks in 5 months on a Shark (IBM's ESS array) at work.

    Never, ever, ever, ever buy IBM storage.
    • I'm sorry to hear about your problems with IBM, but it would seem, even if only anecdotally, that most people I know have found IBM drives to be rock-solid.
    • Unfortunately I don't know of any particular resources specific to storage, but here is an old link about the IBM hard drive problems reported earlier on Slashdot [slashdot.org]

      I had a similar problem, specifically with the 75GB Deskstar.

      1. I went out and picked one up when it came out. The first one I purchased didn't work at all; the system couldn't detect the drive. I sent it back to the company I ordered from and received my second drive.
      2. My second drive installed fine. The drive seemed to go to sleep a bit too often for some reason; I used it as a slave drive. After just under two months of use, the hard drive crashed on me. Lots of cool sounds (click-et-ti-clack click-et-ti-clack). I RMA'ed the product and received the third one.
      3. The third one seems OK so far, but I've only used it for about 40 hours, so I'm expecting it to crash. This time around it's running on a non-critical machine, my gaming box.
      Just my two cents on the IBM 75GB hard drives. Fortunately, my IBM 16GB is still working fine, and it's been in use for about 2 years now.
    • Yeah - unfortunately I have to agree with this.

      I've got an IBM 60GXP in a box at work.

      The first one made lots of "clacking" sounds ... it sounded very painful. Linux occasionally locked up with the hard disk making a painful, repetitive sound.

      The replacement is better ... it still makes some horrible sounds. No problems with Linux locking up, but the logs occasionally show I/O faults.

      I wouldn't get another.

      Nige.
    • I've got an IBM 40GB disk (it calls itself an IBM-DTLA-305040), and I've had no issues with it at all for about a year. Several people I know have also had no issues with their IBM disks.

      My advice with any new disk is to put it into a non-critical box, then thrash it like mad for about a week solid: lots of disk I/O, keep the heads moving a fair bit, read and write data, etc. (something like the quick burn-in sketch at the end of this comment works). If it survives that, you shouldn't have any problems with it for the next two years (based on the normal failure-versus-usage distribution curves). If it does fail, you haven't lost any data when you send for a new one.

      Part of the problem seems to be that most disk manufacturers don't like to advertise exactly how reliable (or not) their disks really are. The best way to tell how reliable they think they are is to look at their returns process. If it is really, really easy and straightforward, they can't be expecting many returns (or else it wouldn't be economic).
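      Something as dumb as the sketch below will do that thrashing for you. It's a rough Python sketch, not a polished tool: the mount point, file size, and pass count are made-up placeholders, and it should only ever be pointed at a scratch filesystem on the new drive, never at data you care about.

      #!/usr/bin/env python
      """Crude disk burn-in: hammer a *non-critical* drive with mixed
      random reads and writes so any infant-mortality failure shows up
      before real data goes on it. All paths and sizes are placeholders."""

      import os
      import random

      SCRATCH = "/mnt/newdisk/burnin.dat"   # hypothetical mount point for the new drive
      FILE_SIZE = 512 * 1024 * 1024         # 512 MB test file; adjust to taste
      BLOCK = 64 * 1024                     # 64 KB per I/O
      PASSES = 1000                         # raise this to keep it busy for days

      def burn_in():
          # Lay down the test file once, full of pseudo-random blocks.
          with open(SCRATCH, "wb") as f:
              for _ in range(FILE_SIZE // BLOCK):
                  f.write(os.urandom(BLOCK))

          for p in range(PASSES):
              with open(SCRATCH, "r+b") as f:
                  for _ in range(FILE_SIZE // BLOCK):
                      # Seek somewhere random to keep the heads moving...
                      f.seek(random.randrange(0, FILE_SIZE - BLOCK))
                      if random.random() < 0.5:
                          f.read(BLOCK)               # ...mixing reads...
                      else:
                          f.write(os.urandom(BLOCK))  # ...with rewrites.
                  f.flush()
                  os.fsync(f.fileno())                # push it out to the platters
              print("pass %d done" % p)

      if __name__ == "__main__":
          burn_in()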

      • These drives are supposed to have 750,000 hours MTBF. They are enterprise-class 10k RPM SCSI drives that cost about $850 each when they were new.

        The fact that so many have broken to the point that there is a 60-day wait to get them replaced under warranty (IBM is out of spares at the moment) is an absolute outrage.
  • Use a RAID array so that you have failure protection. I know Compaq sells a good product that goes with their servers; maybe they sell it standalone too.
    • RAID 5 arrays don't help when two drives fail at the same time; heat or shock can do that.
      Use RAID-51 or RAID-54... most Sun RAID arrays can be configured to do that.
  • by Lepruhkawn ( 199083 ) on Monday January 28, 2002 @03:41PM (#2915188) Homepage
    I commend the poster for asking for real data.

    Anecdotal evidence from people who have had drives of a certain brand fail on them and then say "never use this drive" is basically worthless. Even if you hear 5 or 10 people say that, ignore them.

    What you need to know is if there are enough anecdotes to show that the mfgr's MTBF rate is inaccurate and the real rate is a lot lower than what they report (or a lot lower than other mfgr's). Or maybe if there is a certain batch of drives that are anomalous.

    The question is: is the mfgr's MTBF rate good enough for you, and is it accurate?

    www.storagereview.com has started a reliability database, but I don't know if their data is statistically meaningful yet.
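    To put a rough number on "enough anecdotes": the Python sketch below checks an anecdote against a claimed MTBF using a plain Poisson model. The model deliberately ignores infant mortality and bad batches (exactly the things you may be trying to detect), and the 500,000-hour figure is just a typical spec for drives of that class, not a number from the original question.

    #!/usr/bin/env python
    """How (im)probable is an observed cluster of failures if the claimed
    MTBF were accurate? Treats failures as a Poisson process. The numbers
    in the example mirror the original question: three drives, roughly 40
    days of use each, all dead."""

    import math

    def prob_at_least(k, drives, hours_each, mtbf_hours):
        """P(k or more failures) under the claimed MTBF."""
        lam = drives * hours_each / float(mtbf_hours)   # expected failures
        p_fewer = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                      for i in range(k))
        return 1.0 - p_fewer

    if __name__ == "__main__":
        p = prob_at_least(k=3, drives=3, hours_each=40 * 24, mtbf_hours=500000)
        print("chance of 3/3 failures if the MTBF were honest: %.1e" % p)
        # Comes out around 1e-8: either the spec is fantasy, the batch is
        # bad, or something in the environment (heat, power) is killing them.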
    • I would like to note (as someone else did) that StorageReview [storagereview.com] was attempting to build a reliability database, in addition to reviewing units themselves. Though they apparently intended to make money, they have since followed the dot-com bust and are going to shut the site down when their current funding runs out. It would be a shame to lose the data. Anyone interested should email them and ask them to make the database public domain, and then see if there is enough support for someone to host it. This would be a valuable resource. There is no substitute for good statistical data analysis. The only other thing you can review is the manufacturer's claimed MTBF (Mean Time Between Failures). If your drive bites the dust outside a statistically likely variance from the manufacturer's claim, at least call them up and ask that they give you a new drive.

      That said, you should use HW RAID and SCSI if you want reliability. Otherwise, simply buy a good tape backup device and back up regularly. IDE drives are a commodity item, and are basically least-common-denominator products where whoever can cut a corner to bring down the price will. Given that, use equipment aimed at business/enterprise/professionals and use HW RAID if the data needs to have reliable uptime.
    • I just asked ten of my friends, colleagues, and associates whose opinions I respect. Of those I asked, some are Unix users/experts, some are Windows administrators, some are systems retailers, and some are hardcore gamers. Every one of them recommended Seagate. Most also said that Fujitsu and IBM are reliable.

      I agree that anecdotal evidence should not always be taken at face value or regarded as the only true source of information, but if several people post their experiences here and there is general consensus as to which are best and worst, then surely it should be given some credence (at least more than web polls!). I would also pay particular attention to the experience of retailers, who have to deal with many warranty claims on faulty units -- they know which drives break down more.

      The fact is, however, that all manufacturers are likely to produce the odd bad disk, so the warranty length and the support provided by the manufacturer and/or retailer should also be taken into consideration.

      Of course, it's still important to back up your important data, regardless of how reliable you perceive your disks to be.
      • Every one of them recommended Seagate.

        That's interesting. Seagate drives used to be famous for a sudden-death problem called "stiction", where the heads would fuse to the platters and the drive became good for nothing but landfill.

        Perhaps they solved that problem, and some time ago, but I'm only guessing, because a Google search for "stiction" doesn't turn up Seagate anywhere in the top 10. But "seagate stiction" at least shows me that some people out there remember this. Some pages call it "infamous." When was this solved, if it was?

        • "When was that solved?" with the latter model st251's i think, sometime well before 1990. The problem as i recall was the wax-based lubricant they used on the spindle, which after some age would congeal when the drive cooled. If your drive wouldn't spin up when you powered up the computer, the solution was to smack it a good one, which would break the wax up allow the motor to begin turning.

          Of more importance to the modern customer i think is the phenomenon of bad pallets; where a whole pallet of drives falls of a forklift, but are still sold individually.

  • Point of failure (Score:4, Informative)

    by ArcticChicken ( 172915 ) on Monday January 28, 2002 @03:43PM (#2915204)
    If you've had 3 hard disks die on you in 2 months, the problem may not be with the disks themselves. The first thing to check is whether the area where the hard disks sit is getting adequate ventilation. You might also want to test the voltage your power supply is putting out.

    Questions like this about hard disks are really better answered here [storagereview.com].
    • Re:Point of failure (Score:4, Informative)

      by martyb ( 196687 ) on Monday January 28, 2002 @03:59PM (#2915293)

      You might also want to test the voltage your power supply is putting out.

      Couldn't agree more, and not only under a steady, static load, but especially when you are booting the system.

      Here's a strange but true experience. I was working at a small company which was making custom PBXs. We had a few prototypes which were supposed to be identical. Most of them would boot up fine, but one exhibited strange behavior and would fail to boot cleanly. We saw many different modes of failure. We swapped out boards, power supply, etc. between the "good" and the "strange" PBX, but to no avail.

      Finally, I noticed that the power strips for the "good" systems had 16-gauge wire running to the wall; the "strange" one had 18-gauge (i.e., a thinner wire). I swapped in a new power strip and it worked like a charm.
      The voltage drop over the thinner wire was significant enough at boot time (when the demand for power is greatest) to cause the system to fail!
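      For the curious, the effect is just Ohm's law (V = I x R) applied to the cord. A back-of-the-envelope sketch: the per-metre resistances below are standard copper-wire figures, but the 2 m cord length and the 12 A power-on inrush are guesses, not measurements from the incident above.

      #!/usr/bin/env python
      """Compare the voltage dropped in a 16 AWG vs. an 18 AWG power-strip
      cord at power-on. Cord length and inrush current are assumptions."""

      CORD_LENGTH_M = 2.0     # assumed cord length
      INRUSH_AMPS = 12.0      # assumed combined inrush when everything spins up

      OHMS_PER_METRE = {"16 AWG": 0.013, "18 AWG": 0.021}   # copper, approximate

      for gauge, r_per_m in OHMS_PER_METRE.items():
          # Round trip: current flows out on one conductor and back on the other.
          resistance = r_per_m * CORD_LENGTH_M * 2
          drop = INRUSH_AMPS * resistance
          print("%s cord: about %.2f V lost at power-on" % (gauge, drop))

      The absolute numbers are small, but on a supply already running at the ragged edge, the extra drop arrives at exactly the moment everything is demanding the most power.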

    • I agree. I believe 90% of all problems are related to (not necessarily caused by, but related to) unconditioned power. I use the cheap, effective, yet lesser-known APC LINE-R line conditioners, up to 1250 VA [apcc.com]. They can be had from places like www.pricewatch.com/ [pricewatch.com] and http://www.streetprices.com/ [streetprices.com] for about $115-$130. Well worth it: they offer no battery backup, but *superior* line conditioning, like the integrated line conditioners on APC's very high-end UPSs. I'd rather pay for a superior conditioner than pay for some lead-acid batteries, an inverter, and a "regular" conditioner. The cheap UPSs use crappy relays and a fast clamp time, so they are not "real" - to me, anyway, with exacting standards. Watch the tolerance on the "conditioned" output of cheap UPSs.

      BACK to hard drives: I have had great success with both Maxtor and IBM, and reasonably good success with Seagate SCSI (just not the Medalist drives, or ones with the same nasty fluid-bearing design; some Barracudas, none recent, suffer from this). I have seen many IDE drives fail, usually on low-memory systems where lots of thrashing/swapping occurs and the secretaries need to have every "office" application open along with www.revlon.com.

      Outside of that, since I work in IT, I have seen obscene failure rates with Western Digital products. There have been times when ONTRACK got $3000+ because someone's hard drive had failed and they needed the "CRITICAL" data, blah blah blah (learn to back up - beeeotch; need to be a BOFH.) Dell was putting these garbage 6GB WDs in the Optiplex systems for a while, and they were really good at saying F**k You when you wanted them to do something extra nice after the broken hard drive cost you money and downtime. Cute, Dell.

      Aside from the nasty 75GXP, particularly the ones made in Hungary, the new IBM drives, and especially the 120GXP, are simply superior in performance. I'll get back to you in a few months on MTBF for the 120GXP, but I don't expect any problems; plus I do in fact check the SMART status with the superior IBM support disks to see if any shit is about to hit the fan. The 60GXP was very reliable, but I never got more than 3-4 months in on that one. None of my drives ever spin down or get shut off; I think cycling the power all the time can piss drives off as well - just a superstition. For Win32 victims, there is decent SMART Defender software to give you an early heads-up. I'm sure some *nix variant of SMART polling has appeared or will (a trivial polling loop is sketched at the end of this comment); I just don't care to monitor *nix operations that carefully, because impending hardware failure seems to be easier to see coming... Just a feeling.

      Touching on power once again, I would also suggest a PC Power and Cooling (overpriced) or an ENERMAX power supply, there are many other decent vendors, but these seem to get the job done, have a medusa pile of wires - more than any case needs, and are relatively quiet and reliable.

      Watch the temp on some of the hard drives as well, keeping the airflow good is essential. I kept an 18GB HDD on for almost 3 years straight until I got my 60GXP (soon to be upgraded to a 120GXP =), and I have had several SCSI drives in other machines as well, and thank goodness knock on wood never had any HDD failure.
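      On the *nix SMART polling: assuming you have a smartctl binary around (the smartmontools/smartsuite packages ship one), a dumb loop like the Python sketch below is enough for an early heads-up. The device path, interval, and "notification" are placeholders; check what smartctl -H actually prints for your drive first, since ATA drives tend to say PASSED and SCSI drives tend to say OK.

      #!/usr/bin/env python
      """Crude SMART poller: ask smartctl for the drive's overall health
      verdict every hour and complain if it stops looking healthy.
      Device path and alerting are placeholders for your own setup."""

      import subprocess
      import time

      DEVICE = "/dev/hda"          # example device; point at your real drive
      INTERVAL_SECONDS = 3600      # poll hourly

      def health_report(device):
          # 'smartctl -H' prints the drive's overall SMART health assessment.
          proc = subprocess.Popen(["smartctl", "-H", device],
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT)
          out, _ = proc.communicate()
          return out.decode("utf-8", "replace")

      if __name__ == "__main__":
          while True:
              report = health_report(DEVICE)
              healthy = ("PASSED" in report) or ("OK" in report)
              if not healthy:
                  # Swap this print for mail, syslog, a pager: whatever wakes you up.
                  print("SMART health on %s looks bad:\n%s" % (DEVICE, report))
              time.sleep(INTERVAL_SECONDS)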

    • I bought my new computer about 6 months ago with a new Lian-Li case that has 2 big blowers cooling the drives. The new IBM 40-gigger died in 4 months; I bought a WD 60-gigger and it died 2 months later.

      I just think that new drives have such tight tolerances that the heads are more sensitive. At work we had an old 30MB drive: we installed Win 3.1 on it, removed the HD cover, and had it running in the open for a month. Later we got bored and started throwing things at it, then freezing it with an air can... it still ran. Of course we abused it so much that eventually it died.
  • I know for sure that I have an IBM Deskstar, and when I went to the computer store they said there are a lot of problems with IBM disks (IDE, at least).

    I used to run a store, and back then I mostly had problems with Western Digital Caviar disks.

    Now I'm not sure whether it's just normal for disks to die or develop bad sectors, or whether there is still some hope in using Samsung disks...

    I know Samsung bought a fab from Conner, and I know I had some problems with those old Conner disks.

    I guess somebody should cover which fabs produce which brands and how their reliability tests are performed. At least I think it's a shame that IBM has such a bad name nowadays.

    regards,
    fluor
  • storagereview.com (Score:5, Informative)

    by zoombah ( 447772 ) <anarkkyNO@SPAMcyberwarrior.com> on Monday January 28, 2002 @04:54PM (#2915758)
    The StorageReview reliability index [storagereview.com] should serve you well. Unfortunately, the site itself may be taken down soon (for financial reasons), so get there quickly.
  • by InitZero ( 14837 ) on Monday January 28, 2002 @05:27PM (#2916041) Homepage

    Four of the 16 36-gig drives in our array blew out last week. (Probably heat-related; we had some AC problems in the computer room, but the room never exceeded its rated temperature.) Two weeks before that, two 18-gig drives in separate machines died for unknown reasons. The 36-gig drives were IBM. The 18-gig drives were Seagate (who, at one time, made the IBM drives). In the last two months, we've also lost a few Maxtor drives.

    Except for the batch of drives in one array, the above is fairly typical. We have thousands of drives from many vendors and I can't swear one is any better or worse than the other. Hard drives all pretty much suck.

    Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.

    If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.

    InitZero

    • Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.

      Can't quite work out this sentence. Drives burn out every day? Stars burn out every single day (maybe they don't, but you get the idea), but that doesn't mean stars don't have long lifetimes (hint: they do).
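      For what it's worth, both statements can be true at once: MTBF is a population figure, not a per-drive lifetime, so a big enough fleet sees failures constantly even if the spec is honest, while any single drive is still very likely to outlive its 3-5 year service life. A rough illustration (the fleet sizes are arbitrary, not InitZero's actual drive count):

      #!/usr/bin/env python
      """Expected failures per year for fleets of various sizes, taking a
      500,000-hour MTBF at face value and assuming 24x7 operation."""

      MTBF_HOURS = 500000.0
      HOURS_PER_YEAR = 24 * 365

      def failures_per_year(fleet_size, mtbf_hours=MTBF_HOURS):
          # Expected failures/year if every drive runs around the clock.
          return fleet_size * HOURS_PER_YEAR / mtbf_hours

      if __name__ == "__main__":
          for fleet in (1, 100, 5000):
              print("%5d drives -> about %8.2f failures per year"
                    % (fleet, failures_per_year(fleet)))
          # 1 drive:     ~0.02/year, i.e. one failure every ~57 years on paper.
          # 5000 drives: ~88/year, i.e. a dead drive every few days.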

    • InitZero writes:
      If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.
      Good advice.

      One problem I have is that most of the times I have had drives die early in their lifespan, it has been a "batch" problem, and had I purchased two identical drives from the same vendor, chances are both of them would have died at about the same time.

      Most mirroring solutions depend on using nearly-identical drives for the mirrored pair, right?

      Another issue: I've had very few drives fail in service, where the system had been running for years and then either just went dead or started getting disk errors that increased over time. 99% of the failures I have encountered have been with drives that just would not come back up after a shutdown.

      Sometimes you can hear the bearings going out, other times you shut the system down for just a few minutes, turn the power back on, and the drives just go 'clunk', but cannot spin up.

      In the old days of 'stiction' this could sometimes be overcome by repeated powercycles or the old 'weak karate chop to the side of the drive' method.

      Again, I've had multiple drives of about the same age fail in this manner, which in the case of a mirror, means losing the data...

  • by uslinux.net ( 152591 ) on Monday January 28, 2002 @06:09PM (#2916340) Homepage
    I have several 9 and 18 GB drives in a mid-size desktop, and they've been running for ages. I've used IBM, Maxtor, Seagate, Quantum, Western Digital, etc., and what I generally find is that drives last about 3 years, which is really their useful life anyway. Some go longer, but in general you should be able to count on 3 years.

    So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing. If you're using SCSI, I'd guess they're probably 7200 or 10k RPM drives, which means LOTS of heat, especially if you have several. So, first of all, go buy a few 60 or 80mm fans, and stick them in front of the drives, if you can. Get some air flow across them (remember, air pushed across the drives does much more than air pulled/sucked across them). Heat will quickly kill a drive.

    Barring that, you haven't said how the drives have died (won't spin up, unusual read errors, etc), but a poor power supply, especially one running at capacity could burn out a drive. Finally, any sort of shock (case constantly being moved, bounced around, kicked, etc) could do a drive in, though that is probably less likely.

    As with anything else, it's all IMO, YMMV, etc.

    • So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing.

      If there is, I'd sorely like to know what it is.

      Barring that, you haven't said how the drives have died [ ... ]

      Two drives died by developing an unrecovered read error on exactly two consecutive sectors. The latest one was right in the middle of the directory structure for C:\WINDOWS\SYSTEM. Fortunately, the Linux and BeOS partitions remain bootable. The third drive hasn't malfunctioned yet, but is making a very worrying "squeak" noise regularly every 60 seconds, so I'm unwilling to commit data to it.

      The system is all SCSI, all the time. The internal chain is all Wide SCSI (no 50-pin adapters), with a twisted-pair cable and a separate terminator pack. The controller is a Mylex (nee BusLogic) BT-958 single-ended controller.

      The internal SCSI chain appears as follows (from /proc/scsi/scsi):

      Attached devices:
      Host: scsi0 Channel: 00 Id: 00 Lun: 00
        Vendor: IBM       Model: DDYS-T18350N     Rev: S96H
        Type:   Direct-Access                     ANSI SCSI revision: 03
      Host: scsi0 Channel: 00 Id: 01 Lun: 00
        Vendor: IBM       Model: DDYS-T18350N     Rev: S9YB
        Type:   Direct-Access                     ANSI SCSI revision: 03
      Host: scsi0 Channel: 00 Id: 02 Lun: 00
        Vendor: IBM       Model: DDRS-39130D      Rev: DC1B
        Type:   Direct-Access                     ANSI SCSI revision: 02
      Host: scsi0 Channel: 00 Id: 08 Lun: 00
        Vendor: PLEXTOR   Model: CD-ROM PX-40TW   Rev: 1.03
        Type:   CD-ROM                            ANSI SCSI revision: 02

      The first two drives in the chain are the ones with problems. Drive 0 (boot drive) has the unrecovered read error; Drive 1 is the squeaker. Drive 1 itself is an RMA replacement for an earlier, identical drive that developed an unrecovered read error. Both of these drives have a fan blowing over them.

      Drive 2 has never exhibited any problems.

      The motherboard is an ASUS P2B-D, with two 1GHz Pentium-3s. The RAM is from Crucial, CAS latency 2, ECC. The power supply is 300W and came with the Antec case.

      In short, I've tried to not cheap out on anything. If you can spot something I've missed, I'd be happy to know.

      Schwab

      • I would try a 400W PSU.
      • For starters, upgrade the power supply. Seriously. If you have 3 drives, plus a CD-ROM and a dual P3 system, you're probably sucking mad amounts of power. If the drives aren't getting enough power, you could brown them out. Think about when the lights dim in your house because an appliance kicks on (okay, maybe not in *your* house, but it does in mine): the voltage drops below (usually) 103 volts because of the sudden load. If several of your drives are being accessed simultaneously, you may very well be accomplishing the same thing *inside* the case. You can usually run at about 75% of your max wattage before you start to have problems. I've found drives are generally about 25 watts apiece, and those P3 CPUs are probably 25-30 watts each. When you start counting RAM, motherboard, etc., you'll probably find you're running over 225 watts (rough tally at the end of this comment). 300-watt power supplies were really designed for single-CPU, 2-drive + CD & CD-RW systems. With 3 drives constantly spinning and dual CPUs, you really need 350-400 watts.

        Seriously, if you're getting unrecoverable read errors on adjacent sectors, it really sounds to me like there wasn't enough power, and the data didn't get written cleanly to disk. Remember, bits and bytes are just current.

        In addition, make sure you have a well-shielded setup (cover on your system) and a good, high-quality drive cable. I had a Maxtor 340MB drive back in '95/'96 which occasionally crashed with "unrecoverable read error writing to drive c:" under Windows 95/DOS. Swapping IDE cables fixed it (Maxtor claimed it was RF interference).
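        A quick tally along those lines, in Python. The per-component wattages are the rough figures from the comment above plus an assumed 80 W allowance for motherboard, RAM, and cards; treat it as an estimate, not a measurement of the poster's box.

        #!/usr/bin/env python
        """Back-of-the-envelope power budget for a dual-P3, 3-disk SCSI box
        on a 300 W supply. All wattages are rough assumptions."""

        LOAD_WATTS = {
            "3 x SCSI disks (~25 W each)":  3 * 25,
            "2 x P3 CPUs (~28 W each)":     2 * 28,
            "CD-ROM":                       15,
            "motherboard + RAM + cards":    80,   # assumed allowance
        }

        PSU_RATING_W = 300       # the Antec supply's rating
        SAFE_FRACTION = 0.75     # rule of thumb from the comment above

        if __name__ == "__main__":
            total = sum(LOAD_WATTS.values())
            print("estimated steady load: about %d W" % total)
            print("comfortable limit on a %d W supply: about %d W"
                  % (PSU_RATING_W, int(PSU_RATING_W * SAFE_FRACTION)))
            # Roughly 226 W of load against a ~225 W comfort limit, and
            # spin-up (when drives draw the most current) is worse still.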

  • You know, Consumer Reports has long been known for compiling reliability data for automobiles by surveying its readers.

    If that kind of method is a good one, I wonder if we can get some techy rag to do something similar.
  • Some of this may be redundant - but that's the point! Redundant storage :-)

    Is it you? First, check:
    1. Power supply. Don't run three 10,000 RPM drives off a 230W power supply. It's just not cricket.

    2. Cooling. Blow air onto them, don't suck it away from them - someone smarter than me can say why, but it just works better.

    3. Shock. Did the courier drop them? Did you drop them?

    If it's not you, then:
    1. If it is the slightest bit valuable then it should be redundant.

    2. Did I mention redundancy? :-)

    I have a Seagate SCSI disk in a MicroVAX that has hardly missed a beat since 1987. These disks don't exist any more; they just don't make 'em as good. This is sad because it is a reliable disk, but not so bad because it weighs about 5kg and I can hear it spin up from the other end of the house!

    Having said all that: the newer IDE disks die _way_ before they should. It pisses me off as much as the next guy. What can we do?
  • 1. Death
    2. Taxes
    3. Hard Drive Failure


    Fortunately, there is good news. Though the latter will never be completely eliminated, data loss as a result of hard drive failures can be!!! The secret is actually no secret at all: redundancy!

    You can read a truckload of technical documentation, bore yourself to death with piles of market research, or even consult a psychic [onlinepsychic.com], but nothing will stop the inevitable failure of hard drives. It is an industry-wide problem with (in my experience) little bearing on the hardware manufacturer. Sure, everyone has their favorites, but in the end the simple fact is that hard drives have moving parts, and anything with a moving part can, will, and DOES break...
  • I don't know of any such resource, but there are surely enough users here to form an idea of what to buy and what not to, just from their experiences.

    Only last week I was agreeing with fellow LinuxSA [linuxsa.org.au] members that Seagate [seagate.com], Fujitsu [fujitsu.com], and IBM [ibm.com] drives are reliable [linuxsa.org.au], and Maxtor and Western Digital drives are not. The last-mentioned brands seem far more likely to seize or develop bad clusters after a few years of use.

    It also does not seem coincidental that larger, reputable companies seem to sell the drives perceived to be reliable, while smaller, "iffier" companies (such as those competing only on cost) seem to sell the drives perceived to be unreliable.
    • We need a truly objective survey of hard drive reliability. My personal experience is nearly the exact opposite of yours: I have had two Fujitsu drive failures within 2 years, and one IBM failure in 8 months. My Maxtor and Western Digital drives (even the really old ones) are all still running happily.

      Just goes to show how true YMMV really is, and why anecdotal evidence isn't much help.
  • The few hard drives I have had fail over time (the bad-block black hole) have always failed due to heat issues. This is especially true for 10K and 15K RPM SCSI drives. One particular PC chassis of mine was on an IBM 18GB 10K SCSI drive killing spree until I stuck the latest drive in a well-ventilated 5.25" slot.

    Now that I've given my opinion on hard drives and heat, I'm going to reinforce some advice that has already been posted. If at all possible mirror your data drives. If your data is of a life or death nature create a backup system and don't forget to regularly verify backups.

    --I'm just glad electronic devices work most of the time.

  • I've had some Maxtor DiamondMax Plus drives for a while and they work great. I've got a newer 40-gig and an older 13-gig, both 7200 RPM drives. When I first got the 13-gig drive it started failing within about two weeks, so I brought it back and got a replacement (which was actually a 15-gig).
  • A cool feature of the latest FC-AL-based systems from Sun: the OS includes commands to support hot-swap, including the ability to disconnect and/or power down one drive in a system without affecting the others.

    I've attempted this 'live software disconnect/spin down' with other OS's using standard SCSI, but haven't had much luck. Solaris never supported it before, and now only on FC-AL.

    One trick you can do with this is to have a "warm spare" installed: a drive that contains a mirror of the system as of the last major change, but is not constantly running. By keeping the spare drive updated, installed, and ready, you can recover from a failed disk remotely, without any need for physical intervention. Combine this with the new "RSC" (a battery-backed lights-out-management card with its own Ethernet and modem paging), and you really have something to brag about.

    If the big Sunfires are out of your budget, a subset of the full feature set is in the LOM interface on some(?) Netra models.

    One drawback of spinning down the disk (as I mentioned in another comment here): one of the most common failure modes is a drive that just won't spin up once you turn it off...
