Reviews of Hard Drive Reliability?
ewhac asks: "After having
three 18G drives go toes-up on me in the last two months, each of them
after about 40 days of use, I want the replacement drives to
be rock-solid. While Tom's Hardware and
AnandTech review individual
drives and their performance, I haven't yet been able to locate any
comprehensive or cohesive review of drive reliability and longevity.
Does such a resource exist?"
Never buy IBM Drives (Score:3, Interesting)
Never, ever, ever, ever buy IBM storage.
Re:Never buy IBM Drives (Score:1)
Guess I'm just lucky.
Re:Never buy IBM Drives (Score:2)
Re:Never buy IBM Drives (Score:1)
Unfortunately I don't know of any particular resources specific to storage, but here is an old link about the IBM hard drive problems reported earlier on Slashdot [slashdot.org].
I had a similar problem, specifically with the Deskstar 75GB.
Re:Never buy IBM Drives (Score:1)
I've got an IBM 60GXP in a box at work.
The first one made lots of "clacking" sounds.
The replacement is better, but I wouldn't get another.
Nige.
Re:Never buy IBM Drives (Score:2)
I've got an IBM 40GB disk (it calls itself an IBM-DTLA-305040), and I've had no issues with it at all for about a year. Several people I know have also had no issues with their IBM disks.
My advice with any new disk is to put it in a non-critical box, then thrash it like mad for about a week solid: lots of disk I/O, keep the head moving a fair bit, read and write data, etc. If it survives that, you shouldn't have any problems with it for the next two years (based on the usual failure-versus-usage distribution curves). If it does fail, you haven't lost any data when you send it back for a new one.
Part of the problem seems to be that most disk manufacturers don't like to advertise exactly how reliable (or not) their disks really are. The best way to tell how reliable they think they are is to look at their returns process. If it is really easy and straightforward, they can't be expecting many returns (or else it wouldn't be economic).
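The burn-in approach described above (thrash a new disk hard, then trust it) can be sketched in a few lines. This is a minimal illustration, not a real burn-in tool: the path, pass count, and sizes are placeholders, and reads served from the OS page cache don't truly exercise the platters (a raw-device test opened with O_DIRECT would).

```python
import hashlib
import os

def thrash(path, passes=2, chunk_bytes=1 << 20, chunks=64):
    """Burn-in sketch: write random data, fsync, read it back, verify.

    `path` should be a scratch file on the new disk; a real week-long
    burn-in would loop far longer and cover far more of the drive.
    """
    for _ in range(passes):
        digests = []
        with open(path, "wb") as f:
            for _ in range(chunks):
                block = os.urandom(chunk_bytes)
                digests.append(hashlib.sha256(block).hexdigest())
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the platters
        with open(path, "rb") as f:
            for expected in digests:
                block = f.read(chunk_bytes)
                if hashlib.sha256(block).hexdigest() != expected:
                    return False  # read back something other than we wrote
    os.remove(path)
    return True
```

Running this against a fresh scratch file on the new disk for a few days approximates the "thrash it for a week" regimen without risking real data.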
Re:Never buy IBM Drives (Score:2)
The fact that so many have broken to the point that there is a 60-day wait to get them replaced under warranty (IBM is out of spares at the moment) is an absolute outrage.
All I know is... (Score:1)
Re:All I know is... (Score:1)
Use RAID-51 or RAID-54...most Sun RAID arrays can be configured to do that.
resist anecdotal evidence (Score:5, Insightful)
Anecdotal evidence from people who have had drives of a certain brand fail on them and then say "never use this drive" is basically worthless. Even if you hear 5 or 10 people say that, ignore them.
What you need to know is whether there are enough anecdotes to show that the manufacturer's MTBF figure is inaccurate and the real rate is a lot lower than what they report (or a lot lower than other manufacturers'), or whether a certain batch of drives is anomalous.
The question is: is the manufacturer's MTBF rate good enough for you, and is it accurate?
www.storagereview.com has started a reliability database, but I don't know if their data is statistically significant yet.
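One way to sanity-check whether a quoted MTBF is "good enough" is to convert it to an annual failure rate. A sketch, assuming the constant-hazard (exponential) model under which MTBF figures are conventionally quoted; real drives follow a bathtub curve, so early-life rates run higher than this:

```python
import math

def annual_failure_rate(mtbf_hours):
    """Chance a drive fails within one year (8760 hours), under the
    constant-hazard (exponential) model MTBF figures usually imply."""
    return 1.0 - math.exp(-8760.0 / mtbf_hours)

def expected_failures(mtbf_hours, population):
    """Expected failures per year across a fleet of `population` drives."""
    return population * annual_failure_rate(mtbf_hours)

# A 500,000-hour MTBF claim works out to roughly a 1.7% annual failure
# rate, i.e. about 17 dead drives per year in a fleet of 1000.
```

Even a truthful half-million-hour MTBF therefore predicts steady failures in any sizable fleet, which is why anecdotes from small samples prove so little either way.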
Re:Resist anecdotal evidence: help generate data (Score:1)
That said, you should use HW RAID and SCSI if you want reliability. Otherwise, simply buy a good tape backup device and back up regularly. IDE drives are a commodity item, and are basically least-common-denominator products where whoever can cut a corner to bring down the price will. Given that, use equipment aimed at business/enterprise/professional users, and use HW RAID if the data needs reliable uptime.
Re:resist anecdotal evidence (Score:1)
I agree that anecdotal evidence should not always be taken at face value or regarded as the only true source of information, but if several people post their experiences here and there is general consensus as to which are best and worst, then surely it should be given some credence (at least more than web polls!). I would also pay particular attention to the experience of retailers, who have to deal with many warranty claims on faulty units; they know which drives break down more.
The fact is, however, that all manufacturers are likely to produce an odd bad disk, so the warranty length and support provided by the manufacturer and/or retailer should also be taken into consideration.
Of course, it's still important to back up your important data, regardless of how reliable you perceive your disks to be.
Re:resist anecdotal evidence (Score:2)
That's interesting. Seagate drives used to be famous for having a sudden death problem called "stiction", where the heads would fuse to the platters and the drives would become good only for so much landfill.
Perhaps they solved that problem, and some time ago, but I'm only guessing, because a Google search for "stiction" doesn't turn up Seagate anywhere in the top 10. But "seagate stiction" at least shows me that some people out there remember this. Some pages call it "infamous." When was this solved, if it was?
Re:resist anecdotal evidence (Score:1)
Of more importance to the modern customer, I think, is the phenomenon of bad pallets: a whole pallet of drives falls off a forklift, but the drives are still sold individually.
Point of failure (Score:4, Informative)
Questions like this about hard disks are really better answered here [storagereview.com].
Re:Point of failure (Score:4, Informative)
You might also want to test the voltage your power supply is putting out.
Couldn't agree more; and not only in a static situation, but especially when you are booting the system.
Here's a strange but true experience. I was working at a small company which was making custom PBXs. We had a few prototypes which were supposed to be identical. Most of them would boot up fine, but one exhibited strange behavior and would fail to boot cleanly. We saw many different modes of failure. We swapped out boards, power supply, etc. between the "good" and the "strange" PBX, but to no avail.
Finally, I noticed that the power strips for the "good" systems had a 16-gauge wire to plug into the wall; the "strange" one had 18-gauge (i.e., a thinner wire). Swapped in a new power strip and it worked like a charm.
The voltage drop over the smaller wire was significant enough at boot time (when there was the greatest demand for power) to cause the system to fail!
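The wire-gauge effect in this story can be ballparked with Ohm's law, V = I * R. The ohms-per-1000-ft figures below are standard values for copper at AWG 16 and 18; the cord length and inrush current are assumed for illustration:

```python
# Resistance of copper wire at 20 C, in ohms per 1000 ft (standard AWG table).
OHMS_PER_1000FT = {16: 4.016, 18: 6.385}

def cord_drop(awg, cord_ft, amps):
    """Voltage drop across a power cord: current flows out on one
    conductor and back on the other, so the resistive path is twice
    the cord length. V = I * R."""
    r = OHMS_PER_1000FT[awg] * (2 * cord_ft) / 1000.0
    return amps * r

# Assumed 6 ft cord and a 10 A inrush at boot (illustrative numbers):
drop16 = cord_drop(16, 6, 10)  # roughly 0.48 V
drop18 = cord_drop(18, 6, 10)  # roughly 0.77 V
```

A difference of a few tenths of a volt sounds trivial, but at the moment of peak boot-time draw it can be enough to push a marginal supply out of regulation, which matches the story above.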
Re:Point of failure (Score:2)
Back to hard drives: I have had great success with both Maxtor and IBM, and reasonably good results with Seagate SCSI, just not the Medalist drives or the ones with the nasty Medalist fluid-bearing design; some Barracudas (none recent) suffer from this. I have seen many IDE drives fail, usually on low-memory systems when lots of thrashing/swapping occurs and secretaries need to have every "office" application open along with www.revlon.com.
Outside of that, since I work in IT, I have seen obscene failure rates with Western Digital products. There have been times when ONTRACK got $3000+ because someone's hard drive had failed and they needed the "CRITICAL" data, blah blah blah (learn to back up). Dell was putting these garbage 6GB WDs in the Optiplex systems for a while, and were really good at saying F**k You when you wanted them to do something extra nice after the broken hard drive cost you money and downtime. Cute, Dell.
Aside from the nasty 75GXP, particularly the ones made in Hungary, the new IBM drives, and especially the 120GXP, are simply superior in performance. I'll get back to you in a few months on MTBF for the 120GXP, but I don't suspect any problems; plus I do in fact check the SMART status with the superior IBM support disks to see if any shit is about to hit the fan. The 60GXP was very reliable, but I never got more than 3-4 months on that one. None of my drives ever spin down or get shut off; I think cycling the power all the time can piss drives off as well (just a superstition). For Win32 victims, there is decent SMART Defender software to give you an early heads-up, and I'm sure some *nix variant of SMART polling has appeared or will. I just don't care to monitor *nix operations that carefully, because impending hardware failure seems to be easier to see coming... just a feeling.
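For anyone wanting to script the kind of SMART polling mentioned above, here is a rough sketch that scans `smartctl -A`-style attribute lines for non-zero reallocated/pending sector counts. The sample output below is canned and illustrative, not from a real drive:

```python
def failing_attributes(smart_output, watch=("Reallocated_Sector_Ct",
                                            "Current_Pending_Sector")):
    """Scan `smartctl -A`-style attribute lines and return the watched
    attributes whose raw value is non-zero: an early warning that a
    drive is remapping or struggling with sectors."""
    warnings = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the name is column 2 and the
        # raw value is the last column.
        if len(fields) >= 10 and fields[1] in watch:
            raw = int(fields[9])
            if raw > 0:
                warnings[fields[1]] = raw
    return warnings

# Canned sample in the usual smartctl column layout (values illustrative):
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 095 095 005 Pre-fail Always - 12
194 Temperature_Celsius     0x0022 110 098 000 Old_age  Always - 37
197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
"""
```

In practice you would feed this the output of `smartctl -A /dev/sda` from a cron job and alert on any non-empty result; a rising reallocated-sector count is often the only warning before a drive dies.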
Touching on power once again, I would also suggest a PC Power and Cooling (overpriced) or an Enermax power supply. There are many other decent vendors, but these seem to get the job done, have a Medusa pile of wires (more than any case needs), and are relatively quiet and reliable.
Watch the temperature on some of the hard drives as well; keeping the airflow good is essential. I kept an 18GB HDD on for almost 3 years straight until I got my 60GXP (soon to be upgraded to a 120GXP), and I have had several SCSI drives in other machines as well, and, knock on wood, have never had any HDD failure.
Re:Point of failure (Score:2)
I just think that new drives have such tight head-to-platter tolerances that the heads are more sensitive. At work we had an old 30MB drive and installed Win 3.1 on it, removed the HD cover, and had it running in the open for a month. Later we got bored and started throwing things at it, then freezing it with an air can... it still ran. Of course we abused it so much that eventually it died.
don't buy ibm (Score:1)
i used to run a store before, and then i had mostly problems with western digital caviar disks.
now i'm not sure whether it's just normal for disks to die / develop bad sectors, or if there is still some hope using samsung disks...
i know samsung bought a fab from conner. and i know i had some problems with those old conner disks.
i guess somebody should cover which fabs are producing which brands and how their reliability tests are performed. at least i think it's a shame that ibm just has a bad name nowadays.
regards,
fluor
storagereview.com (Score:5, Informative)
All Drives Suck -- Go Redundant (Score:5, Informative)
Four of the 16 36GB drives in our array blew out last week. (Probably heat-related; we had some AC problems in the computer room, but the room never exceeded its rated temperature.) Two weeks before that, two 18GB drives in separate machines died for unknown reasons. The 36GB drives were IBM. The 18GB drives were Seagate (who, at one time, made the IBM drives). In the last two months, we've also lost a few Maxtor drives.
Except for the batch of drives in one array, the above is fairly typical. We have thousands of drives from many vendors and I can't swear one is any better or worse than the other. Hard drives all pretty much suck.
Sure, we all read about MTBF being 500,000 hours for new drives but that's a pipe dream. Drives burn out every single day.
If you have the money, buy a pair of top quality drives and mirror them. If you can't afford that, buy a couple of cheap drives and mirror them. Don't put important data on a single drive and expect it to be there when you get back from lunch.
InitZero
Re:All Drives Suck -- Go Redundant (Score:2)
Can't quite work out this sentence. Drives burn out every day? Stars burn out every single day too (maybe they don't, but you get the idea), yet that doesn't mean stars don't have long lifetimes (hint: they do).
Problems with mirroring disks (Score:2)
One problem I have is that most of the times I have had drives die early in their lifespan, it has been a 'batch' problem, and had I purchased two identical drives from the same vendor, chances are both of them would have died at about the same time.
Most mirroring solutions depend on using nearly-identical drives for the mirrored pair, right?
Another issue: I've had very few drives fail in service, where the system was running for years and then either just went dead or started getting disk errors that increased over time. 99% of the failures I have encountered have been with drives that just would not come back up after a shutdown.
Sometimes you can hear the bearings going out, other times you shut the system down for just a few minutes, turn the power back on, and the drives just go 'clunk', but cannot spin up.
In the old days of 'stiction' this could sometimes be overcome by repeated powercycles or the old 'weak karate chop to the side of the drive' method.
Again, I've had multiple drives of about the same age fail in this manner, which in the case of a mirror, means losing the data...
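The batch-failure worry above can be put into rough numbers. This is a back-of-the-envelope sketch, not a real reliability model; the failure rate, rebuild window, and correlation figure are all assumptions chosen for illustration:

```python
def mirror_loss_probability(afr, rebuild_days, correlation=0.0):
    """Rough chance of losing a 2-disk mirror in a year: the first disk
    fails (prob = afr), then the second fails during the rebuild window.
    `correlation` inflates the second failure toward certainty, modeling
    same-batch drives that tend to die together."""
    p_second = afr * (rebuild_days / 365.0)
    p_second = p_second + correlation * (1.0 - p_second)
    return afr * p_second

# Illustrative numbers: 2% annual failure rate, 1-day rebuild window.
independent = mirror_loss_probability(0.02, 1)        # roughly 1 in 900,000
same_batch = mirror_loss_probability(0.02, 1, 0.25)   # roughly 1 in 200
```

Even a modest same-batch correlation wipes out most of the benefit of mirroring, which is the quantitative version of the advice to buy the two halves of a mirror from different batches or vendors.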
What's the real problem? (Score:4, Interesting)
So, if you're finding your drives die in 30-60 days, there's likely another problem you're missing. If you're using SCSI, I'd guess they're probably 7200 or 10k RPM drives, which means LOTS of heat, especially if you have several. So, first of all, go buy a few 60 or 80mm fans, and stick them in front of the drives, if you can. Get some air flow across them (remember, air pushed across the drives does much more than air pulled/sucked across them). Heat will quickly kill a drive.
Barring that, you haven't said how the drives have died (won't spin up, unusual read errors, etc), but a poor power supply, especially one running at capacity could burn out a drive. Finally, any sort of shock (case constantly being moved, bounced around, kicked, etc) could do a drive in, though that is probably less likely.
As with anything else, it's all IMO, YMMV, etc.
Re:What's the real problem? (Score:2)
If there is, I'd sorely like to know what it is.
Two drives died by developing an unrecovered read error on exactly two consecutive sectors. The latest one was right in the middle of the directory structure for C:\WINDOWS\SYSTEM. Fortunately, the Linux and BeOS partitions remain bootable. The third drive hasn't malfunctioned yet, but is making a very worrying "squeak" noise regularly every 60 seconds, so I'm unwilling to commit data to it.
The system is all SCSI, all the time. The internal chain is all Wide SCSI (no 50-pin adapters), with a twisted-pair cable and a separate terminator pack. The controller is a Mylex (nee BusLogic) BT-958 single-ended controller.
The internal SCSI chain appears as follows (/proc/scsi/scsi):
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DDYS-T18350N     Rev: S96H
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM      Model: DDYS-T18350N     Rev: S9YB
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: IBM      Model: DDRS-39130D      Rev: DC1B
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: PLEXTOR  Model: CD-ROM PX-40TW   Rev: 1.03
  Type:   CD-ROM                           ANSI SCSI revision: 02
The first two drives in the chain are the ones with problems. Drive 0 (boot drive) has the unrecovered read error; Drive 1 is the squeaker. Drive 1 itself is an RMA replacement for an earlier, identical drive that developed an unrecovered read error. Both of these drives have a fan blowing over them.
Drive 2 has never exhibited any problems.
The motherboard is an ASUS P2B-D, with two 1GHz Pentium-3s. The RAM is from Crucial, CAS latency 2, ECC. The power supply is 300W and came with the Antec case.
In short, I've tried to not cheap out on anything. If you can spot something I've missed, I'd be happy to know.
Schwab
Re:What's the real problem? (Score:1)
Re:What's the real problem? (Score:2)
Seriously, if you're getting unrecoverable read errors on adjacent sectors, it really sounds to me like there wasn't enough power, and the data didn't get written cleanly to disk. Remember, bits and bytes are just current.
In addition, make sure you have a well-shielded setup (cover on your system) and a good, high-quality drive cable. I had a Maxtor 340MB drive back in '95/'96 which occasionally crashed with "unrecoverable read error writing to drive c:" under Windows 95/DOS. Swapping IDE cables fixed it (Maxtor claimed it was RF interference).
consumer reports style (Score:1)
If that kind of method is a good one, I wonder if we can get some techy rag to do something similar.
I hate disks dying as much as the next guy... (Score:1)
Is it you? First, check:
1. Power supply. Don't run three 10,000 RPM drives off a 230W PSU. It's just not cricket.
2. Cooling. Blow air onto them, don't suck it away from them; someone smarter than me can say why, but it just works better.
3. Shock. Did the courier drop them? Did you drop them?
If it's not you, then:
1. If it is the slightest bit valuable then it should be redundant.
2. Did I mention redundancy?
I have a Seagate SCSI disk in a MicroVAX that has hardly missed a beat since 1987. These disks don't exist any more; they just don't make 'em as good. This is sad because it is a reliable disk, but not so bad because it weighs about 5kg and I can hear it spin up from the other end of the house!
Having said all that: the newer IDE disks die _way_ before they should. It pisses me off as much as the next guy. What can we do?
Three sure things in life... (Score:1)
1. Death
2. Taxes
3. Hard Drive Failure
Fortunately, there is good news. Though the latter will never be completely eliminated, data loss as a result of hard drive failures can be! The secret is actually no secret at all: redundancy!
You can read a truckload of technical documentation, bore yourself to death with piles of market research, or even consult a psychic [onlinepsychic.com], but nothing will stop the inevitable failure of hard drives. It is an industry-wide problem with (in my experience) little bearing on the hardware manufacturer. Sure, everyone has their favorites, but in the end the simple fact is that hard drives have moving parts, and anything with a moving part can, will, and DOES break...
Re:Three sure things in life... (Score:1)
1. Death
2. Taxes
3. Hard Drive Failure
4. Hearing this joke.
personal experiences (Score:1)
Only last week I was agreeing with fellow LinuxSA [linuxsa.org.au] members that Seagate [seagate.com], Fujitsu [fujitsu.com], and IBM [ibm.com] drives are reliable [linuxsa.org.au], and Maxtor and Western Digital drives are not. The last-mentioned brands seem far more likely to seize or develop bad clusters after a few years of use.
It also does not seem coincidental that larger, reputable companies seem to sell the drives perceived to be reliable, and smaller, "iffier" companies (such as those marketing only on cost) seem to sell the drives perceived to be unreliable.
YMMV (Score:2)
Just goes to show how true YMMV really is, and why anecdotal evidence isn't much help.
Haven't seen heat concerns given enough emphasis (Score:1)
The few hard drives I have had fail over time (the bad-block black hole) have always failed due to heat issues. This is especially true for 10K and 15K RPM SCSI drives. One particular PC chassis of mine was on an IBM 18GB 10K SCSI drive killing spree, until I stuck the latest drive in a well-ventilated 5.25" slot.
Now that I've given my opinion on hard drives and heat, I'm going to reinforce some advice that has already been posted. If at all possible, mirror your data drives. If your data is of a life-or-death nature, create a backup system, and don't forget to regularly verify your backups.
--I'm just glad electronic devices work most of the time.
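The "regularly verify backups" advice can be sketched as a simple checksum comparison. The directory paths are placeholders, and a real verifier would also handle permissions, symlinks, and files that exist only in the backup:

```python
import hashlib
import os

def sha256_file(path, bufsize=1 << 16):
    """Checksum a file in fixed-size chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()

def verify_backup(source_dir, backup_dir):
    """Compare every file under `source_dir` against its copy under
    `backup_dir` by checksum; returns the list of missing or mismatched
    relative paths (an empty list means the backup verifies)."""
    bad = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if not os.path.exists(dst) or sha256_file(src) != sha256_file(dst):
                bad.append(rel)
    return bad
```

Run against the restore point rather than the live backup medium where possible; a backup that has never been read back is only a hope, not a backup.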
Maxtor DiamondPlus series (Score:1)
SunFire servers and redundant boot disks (Score:2)
I've attempted this 'live software disconnect/spin down' with other OS's using standard SCSI, but haven't had much luck. Solaris never supported it before, and now only on FC-AL.
One trick you can do with this is to have a 'warm spare' installed: a drive that contains a mirror of the system as of the last major change, but is not constantly running. By keeping the spare drive updated, installed, and ready, you can recover from a failed disk remotely, without any need for physical intervention. Combine this with the new "RSC" (a battery-backed lights-out-management card with its own Ethernet and modem paging), and you really have something to brag about.
If the big Sunfires are out of your budget, a subset of the full feature set is in the LOM interface on some(?) Netra models.
One drawback of spinning down the disk (as I mentioned in another comment here) is that one of the most common failure modes is a drive that just won't spin up once you turn it off...