Hardware

Hardware For Bulk IDE Hard Drive Burn-In? (51 comments)

Posted by Cliff
from the making-sure-it-works-correctly dept.
r0gue_ asks: "I work for a mid-size OEM hardware manufacturer. We ship approximately 300 to 500 IDE HDs every month across all our units. Currently we experience about a 4% failure rate (Maxtors and WDs), though in recent months it has been a couple of percent higher. The problem is that our systems are dedicated boxes with a form factor that is not end-user friendly, so virtually every physical HD failure results in an RMA. What we are looking for is a hardware-based IDE HD burn-in platform: something that we could drop a dozen or so drives into at once, stress test them for a day or two, then put them into inventory for builds. I know the HD manufacturers and larger OEMs use them, but I have not been able to track down anywhere we could purchase one. Right now moving to SCSI or a form factor that supports externally removable drives is not an option. I was hoping that the Slashdot community could point me in the right direction."
This discussion has been archived. No new comments can be posted.

  • Here's a solution: (Score:2, Interesting)

    by Swift Guru (168704)
    For the love of God, don't use Western Digital or Maxtor drives. It's like you're asking for that 4%.
    • Who's recommended today? I used to like IBM, but they started sucking, got sued, and sold their HD division. Had 2 of their drives die this month. I have a Western Digital 540MB HD from '94 that still runs fine. Had no problems with Maxtor either.
      • by slaker (53818)
        I'd suggest Samsung. Yes, I'm being serious. Even the best of their drives is slow, but "slow" just means that the 7200rpm 80GB Samsung brings up the tail of the pack of _current_ ATA drives, performing better than current 5400rpm entries from WD and Seagate and just a hair slower than current Seagate and Maxtor drives. Before someone jumps on me about performance, do TRY to keep in mind that any current ATA drive is going to be substantially faster than any two-year-old ATA drive, mainly due to the benefit
    • For the love of God, don't use Western Digital or Maxtor drives. It's like you're asking for that 4%.

      Yep. Use only Maxtor. That way you should be able to get 8%.

      -- MarkusQ

    • that's not a solution; it's a trite comment (not worthy of its present moderation 'interesting' either). a solution would include another brand or two to actually use.

      i'm still using maxtors myself at this point. why? i'm not sure what else to use that's better and the ones that have failed at least made noises and gave warning--something past western digitals did not.

      so again, what do you recommend that would be better?
    • by tomoe27 (315555)
      Western Digital: I've owned several Western Digital drives over the past decade, and none of them have ever failed me. At my workplace, I've found old WD drives in Pentium I PCs that have been in service for 6+ years without a single problem.

      Maxtor: I've been plagued with problems from Maxtor drives over the years. From one original Maxtor I've bought (and its RMA replacements), I had 2 that had spindle motors that became abnormally loud, one catastrophically fail (IDE Auto-detect had problems even de
      • by mmontour (2208) <mail@mmontour.net> on Sunday April 27, 2003 @05:13PM (#5821084)
        You can't go by brand alone - at some point every manufacturer has had a line of bad drives.

        StorageReview [storagereview.com] has a Drive Reliability Survey that lists statistics for many drive families. For example, WD 205Bx drives are near the top of the rankings (99th percentile) while the 600Ax is near the bottom (10th percentile).
        • by afidel (530433)
          That 99th percentile is based on barely enough drives for it to be rated (just over 60), so it doesn't mean much; most of the drives have hundreds of units in the database. Besides, I think the database is in many ways flawed, as most people who list their drives will do so because they have had a failed drive. The best way, from my perspective, is to look at what companies with hundreds or thousands of drives are doing. Rackspace switched out all of their IBM IDEs for Maxtors, Google uses Maxtors, and the rec
      • I have 3 20MB Seagate ST225s ... 20 years old ... still work like the day they were made. that being said, my two current HDs are a Western Digital, and a ... uh, Quantum Fireball. the name inspires such confidence!
        • Same here, I don't think anybody ever made a more reliable drive than the old ST-225; so many people I know never had a single failure with that drive. About a decade ago I used to use an ST-225 and an ST-277R (65MB RLL drive).

          I've had pretty good luck in the past with the seagate IDE drives as well.
  • How-to (Score:4, Informative)

    by m0rph3us0 (549631) on Sunday April 27, 2003 @07:24AM (#5818613)
    Put 8 IDE controllers into a box (more than this maxes out the PCI bus bandwidth).
    Write a bash script that checks dmesg for how many drives are in the system and invokes the following Perl script for each drive.

    Write a Perl script that does this:
    formats and partitions the drive to max size,
    copies a kernel or some other large file onto the disk until it is full,
    monitors syslog for IDE errors,
    md5sums the files to make sure they all match,
    reports an error if the MD5 doesn't match.

    Unless you get hot-swap controllers, you will have to reboot every time you want to test another batch of drives.

    If you don't wish to write this Perl script, I can be hired to do it for you.
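A minimal sketch of the copy-and-verify steps listed above, written in Python rather than Perl. The function names `fill_with_copies` and `verify_copies`, the `copy_NNNNN.bin` naming scheme, and the `max_copies` cap are assumptions made for illustration; partitioning, formatting, and syslog monitoring are left out.

```python
import errno
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fill_with_copies(source, target_dir, max_copies=None):
    """Copy `source` into `target_dir` until the disk is full.

    `max_copies` (hypothetical knob) caps the number of copies for dry runs.
    Returns the list of copies that were written completely.
    """
    copies = []
    while max_copies is None or len(copies) < max_copies:
        dest = os.path.join(target_dir, "copy_%05d.bin" % len(copies))
        try:
            shutil.copyfile(source, dest)
        except OSError as err:
            # Drop the partial copy, then stop only if the disk is simply full.
            if os.path.exists(dest):
                os.remove(dest)
            if err.errno == errno.ENOSPC:
                break
            raise  # any other I/O error is itself a burn-in finding
        copies.append(dest)
    return copies


def verify_copies(source, copies):
    """Return the copies whose MD5 differs from the source file's MD5."""
    expected = md5_of(source)
    return [path for path in copies if md5_of(path) != expected]
```

Usage would look something like `verify_copies(src, fill_with_copies(src, "/mnt/under_test"))`, where the mount point is a placeholder. One caveat: freshly written files may be read back from the operating system's cache rather than from the platters, so a real run would probably unmount and remount the drive (or otherwise flush caches) between the fill and the verify.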
    • Re:How-to (Score:3, Informative)

      by m0rph3us0 (549631)
      One more thing: don't put more than 1 drive on each channel, as it massively slows down the operation.
    • Re:How-to (Score:3, Interesting)

      by torpor (458)
      It's kinda stupid to only do *8* disks at a time, when you can easily do 64 ... using Firewire.

      My advice would be to look into as many Firewire-to-IDE converters as your company can afford, and then use a Firewire-friendly OS to do the burn-in. Something like OS X or Linux would work very well in this case; actually, a cheap Apple machine would be perfect for this application.

      There's no need to start things up in batches with Firewire, either. You can plug in a disk, and your 'stresser' program
      • Re:How-to (Score:3, Informative)

        by vadim_t (324782)
        Yeah, that'd be real fast. The bandwidth of Firewire is less than PCI. But okay, suppose you get several cards. In this case, the bandwidth is still 133MB/s. Assuming that you have all that for your disks, which doesn't include the network card, sound card, overhead, and whatever else you have, that gives 2MB/s per disk. Real fast.

        Now, my motherboard supports PCI 64 at 66MHz, with a bandwidth of 532MB/s; this would give 8.3MB/s per disk. Still not a lot, and you'd have to find a PCI 64 Firewire card with a
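The per-disk figures in the comment above come from dividing the bus's peak bandwidth evenly across 64 disks. A small sketch of that arithmetic, assuming an even split and no overhead from other devices (the function name `per_disk_bandwidth` and the bus labels are illustrative only):

```python
def per_disk_bandwidth(bus_mb_per_s, disk_count):
    """Evenly divide a shared bus's peak bandwidth (MB/s) among the attached disks."""
    return bus_mb_per_s / disk_count


# Bus figures quoted in the comment: 133MB/s for plain PCI, 532MB/s for PCI 64 at 66MHz.
for name, bus in (("PCI", 133), ("PCI 64 at 66MHz", 532)):
    print("%s: %.1f MB/s per disk" % (name, per_disk_bandwidth(bus, 64)))
# PCI: 2.1 MB/s per disk
# PCI 64 at 66MHz: 8.3 MB/s per disk
```

These are best-case numbers; anything else sharing the bus only lowers them.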
        • Re:How-to (Score:2, Informative)

          by torpor (458)
          How do you - at a consumer'ish level - fix the ugliness of the IDE cable, and non-hotswappable capabilities? I would think being able to load disks on and off without having to do a full system re-boot would be pretty advantageous... more important than speed concerns, anyway.

          Can your PC really do sustained writes to >8 drives without getting into performance issues?

          Actually, speed is a good point. I guess I hadn't thought about that so much as part of the setup, but I guess if you're exercising disk
          • >How do you - at a consumer'ish level - fix the ugliness of the IDE cable, and non-hotswappable capabilities?

            You want the dangerous answer?

            I used to do this, and never did blow my IDE interface, as some say I should have (try at your own risk).

            Buy some IDE removable drive bays (one per drive) -- $20 each. Put the drives into the sleeves, and hook up a bay to your computer. Simply remove the drive sleeve and replace it with another when you want to. Obviously this drive can't have any active data on
          • At consumer level I'd get serial ATA, which has a thin cable and is hot swappable, although I think the drive has to support hot swap.

            About speed, I'm not really sure, I only have 2 drives at the moment, and nothing in the PCI 64 slots, but at least the available bandwidth wouldn't be a bottleneck. Of course, it also depends on how fast those drives are. I'm pretty sure there are drives that are noticeably faster than mine.
    • Re:How-to (Score:5, Informative)

      by dubl-u (51156) * <2523987012@@@pota...to> on Sunday April 27, 2003 @10:27AM (#5819229)
      That's a good approach. At 8 drives a day that's 250 a month for a station that you can build for well under $1000. I'd only add that you may need to tune this based on the failure modes that you are seeing.

      For example, if it's just bad spots, then you'll want to do as many reads and writes as possible. For that, the fastest thing would be a little C program that reads and writes different patterns to the raw device linearly.

      On the other hand, if the failures are tied to seeks, you'll want to write to semi-random locations on the device, to force maximum seeks. Or if you see a mix of both, then your best bet might be to follow m0rph3us0's plan, perhaps tweaking it a bit to better simulate normal filesystem efficiency (and you can just do bit compares rather than md5sums if CPU is an issue).

      You should also keep an eye on heat issues. The burn-in should happen at temperatures that are like what they will be in the end systems. If you pack 8 seeking drives into some cases, they'll cook. If you leave them in the open air, they might not trigger the failures you are seeing in the field. Try to match measured operating case temperature.

      Oh, and don't forget to measure whether this burn-in is really helping. Take stats now, and keep tracking causes of return. It could be that the drives are sensitive to noisy power or vibration or something else that your burn-in won't catch.
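A rough sketch of the two access patterns described above, in Python rather than C for brevity: a linear write-then-read pass with a fixed byte pattern, and a pass that writes and re-reads blocks at pseudo-random offsets to force seeks. The function names, block sizes, and the particular set of test bytes in `PATTERNS` are assumptions for illustration. `path` may be a raw device node or, for a dry run, an ordinary file of the right size; running this against a device destroys its contents.

```python
import os
import random

PATTERNS = (0x00, 0xAA, 0x55, 0xFF)  # assumed set of test bytes


def linear_pattern_pass(path, size, pattern, block_size=1 << 20):
    """Write `pattern` over the first `size` bytes of `path`, then read it back.

    Returns the starting offsets of blocks that did not read back as written.
    """
    bad_offsets = []
    with open(path, "r+b") as dev:
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            dev.write(bytes([pattern]) * length)
        dev.flush()
        os.fsync(dev.fileno())  # push the writes out before reading back
        dev.seek(0)
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            if dev.read(length) != bytes([pattern]) * length:
                bad_offsets.append(offset)
    return bad_offsets


def random_seek_pass(path, size, count, block_size=4096, seed=0):
    """Write and re-read `count` blocks at pseudo-random block-aligned offsets."""
    rng = random.Random(seed)
    blocks = size // block_size
    bad_offsets = []
    with open(path, "r+b") as dev:
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            payload = bytes(rng.getrandbits(8) for _ in range(block_size))
            dev.seek(offset)
            dev.write(payload)
            dev.flush()
            dev.seek(offset)
            if dev.read(block_size) != payload:
                bad_offsets.append(offset)
    return bad_offsets
```

As written, reads can be served from the operating system's cache instead of the drive, so this mostly exercises the write path. A real burn-in tool would likely bypass or flush the cache before each read-back, and would log offsets and timings so the results can feed the failure statistics mentioned above.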
    • or maybe 8 hardware raid IDE controllers? Set them to mirror and you could possibly double the number of drives?
  • by dfinster (65564) on Sunday April 27, 2003 @07:36AM (#5818633) Homepage
    In the late '80s I did hard drive repair and we used Wilson and Flexstar equipment for testing and burn-in. I can't find any links to Wilson equipment right now. Flexstar [flexstar.com] had a more extensible architecture and sounds like what you need. I've used the 2550 series RLL and ESDI Flexstar modules (this was the late '80s, we all thought that IDE was a passing fad at the time) and I can verify that the programming language for this equipment was very straightforward. The Flexstar equipment was very reliable. The only trouble we ever had was the cable ends that would naturally wear out from constant plugging and unplugging. We just replaced all the cable ends every two or three months.

  • Slashdotters! If you don't find a story interesting, please don't complain and call Slashdot lame. Just ignore the story. Do you complain to your local newspaper that they should not publish recipes because you don't cook?

    Comment about the Slashdot question: The wording of the question seems to imply that you believe that Maxtor and Western Digital hard drives have an equal failure rate. That has not been my experience. My experience has been that Western Digital are the most reliable hard drives. I'm very interested to know the experience of other readers.

    Western Digital went through a bad stretch in which they experienced a problem that caused high failure rates several years ago, but that was cured.

    It's shocking that you are in the computer business and knowingly shipping products with a 4% failure rate. That's very expensive and annoys the customers.

    However, you are on the right track. Electronic products have what is called "infant failure". Most failures occur in the first week. During the first 168 hours (one week), the failure rate typically falls by a factor of 100 or even 10,000. At the end of one week most failures have already happened.

    It's very easy to write a program that exercises a hard drive. Just copy files back and forth from folder to folder. It is easy to write a program that fills a hard drive with files, then erases them and starts again.

    The Promise Ultra133 TX2 [promise.com] supports adding four more hard drives to the 4 already supported by modern motherboards. Eight is enough for one test computer, usually, because the power supply won't support more. Be careful to use delayed start. Maybe you will need more powerful power supplies than you normally use.

    Make SURE that you are not having troubles with heat. Are your drives cool when they are installed in your product? High heat will cause high failure rate.
    • Addendums to your message:
      With a true 400 watt power supply, you can easily power 16 drives reliably. For reference, 8 drives pull a total of about 5-6 amps on 12V at spin-up, for about 1 second, then together use less than an amp on 12V, and very little 5V. This is based on testing with Maxtor 5400rpm drives; 7200rpm drives probably use a little more, and other brands may vary.

      Power specs given in hard disk spec sheets are mostly boilerplate and do not reflect actual power consumption, the actual consumption is usual
      • Interesting.

        Intel motherboards have a BIOS setting called "Hard Disk Pre-Delay". The system waits for the hard drives to spin before it tries to detect them.
        • What your parents are talking about, however, is not pre-delay detection.

          They are talking about actually delaying the spin-up sequentially to save your system from the initial power draw of the drives all spinning up at once.
          First one drive starts, drawing about an amp. Then, once it is spun-up, the next one starts. This continues for each of your hard drives.
          In this way you do not have a 5-10 amp draw when you turn on your system, as that is a very good way to cook your power supply.
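The difference between simultaneous and sequential spin-up can be put into rough numbers. A minimal sketch, assuming each drive draws a fixed current while spinning up and a lower fixed current once running; the function name and the 0.125A running figure (one amp shared by 8 drives) are assumptions for illustration, not measurements:

```python
def peak_12v_current(drive_count, spinup_amps, running_amps, staggered):
    """Estimate the peak 12V draw (in amps) while a bank of drives powers on."""
    if drive_count == 0:
        return 0.0
    if not staggered:
        # Every drive spins up at the same moment.
        return drive_count * spinup_amps
    # Staggered: the worst moment is the last drive spinning up
    # while all the others are already running.
    return spinup_amps + (drive_count - 1) * running_amps


print(peak_12v_current(8, 1.0, 0.125, staggered=False))  # 8.0
print(peak_12v_current(8, 1.0, 0.125, staggered=True))   # 1.875
```

With about an amp per drive at spin-up, eight drives starting together land in the 5-10 amp range mentioned above, while staggering keeps the peak under 2 amps.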

          • Yes, exactly. However, cooking the power supply is not a problem, since all power supplies have overcurrent protection. The problem is that the BIOS begins its detection process before the power supply has stabilized enough to provide the correct voltage, due to the unusual load. When the detection fails, there is an error message. So the BIOS pre-delay can be helpful.
            • In theory. :) As a recent hardware site review showed, many power supplies burned up at or before their rated power. They didn't review any of the good brands though, so I think you would be OK with one of the big three (Enermax, Antec, PC Power & Cooling).

              In any case, it is more stressful on the components to surge at startup. It's not really much of an issue for servers that stay on all the time, since they probably go through the spin-up stress less than once a year, if that much.
      • I believe there have been some ATA drives which have some support for delayed start, i.e. there's a jumper setting to delay spin-up until they're initialized by the BIOS.
      • Sorry if this sounds really dumb but....

        Why can't you use multiple power supplies for the drives themselves? As far as I know, there is no requirement for the power supply to supply both disks and system. This would eliminate the need to have a "Spin-up" option in BIOS as the drives to be tested would already be powered up.

        For example, you could easily have two additional, external power supplies and plug four drives into each. Simply power the drives up first (count to ten or whatever), then the syste
        • You can do that, just have to make sure the grounds don't float and are tied together.

          As far as powering them up in a sequence, there is no need to do that really; you can just turn on all the power supplies with the same switch. That's a little trickier to do with ATX, but Cyberguys sells an adapter to make an ATX power supply act like an AT one with an external switch and an AT-style motherboard connector. Or, since you aren't using the motherboard connector, you could just send the power supply the on si
  • by Fweeky (41046)
    Over what period? If that's over anything less than five years, I'd perhaps be looking towards the conditions the drives are in; are they well ventilated, or near any hot components? Keeping a drive cool can reduce failure rate by ~30% (based on a study IBM did on their SCSI drives); keeping them too hot can drastically increase it. Don't underestimate the effects a bit of active cooling on a drive could have on reducing early failures too.

    After that, I'd look at maybe trying some different manufacturers
    • by 0x0d0a (568518)
      Over what period? If that's over anything less than five years, I'd perhaps be looking towards the conditions the drives are in; are they well ventilated, or near any hot components?

      You haven't bought a consumer IDE hard drive in the last few years, have you? Quality has gone to the dogs.
  • I used to be involved with a manufacturer, and we used something called an Octet machine for mastering IDE drives for desktops and laptop computers.

    IIRC, there was a feature to test the disks as they were being mastered, but we never ran the machine in this mode due to the time it took to do it.

    You could do 8 disks at a time, hence the name. I did a Google search but couldn't find you a manufacturer.

    It looks like an elongated cash register, with an area covered with padding to site the drives, it can be connect
  • Why not do what this guy [slashdot.org] did?

    Sheesh -- an Ask Slashdot that's already been answered on Slashdot! Not exactly a duplicate post, but apparently the Editors aren't the only ones who don't read /.

  • Case solution (Score:3, Informative)

    by lusername (248616) on Sunday April 27, 2003 @11:20AM (#5819517) Homepage

    We built some disk arrays using a front-loading IDE case with drive trays. This one is pretty pricey but it's _nice_ hardware:

    https://www.rackmountplus.com/spec.asp?ID=RMAC4D-IDE

    That, plus a couple RAID cards (like 3ware's new 12-port cards) in a 64/66 PCI slot and bonnie++ would do a pretty good job of burning in your drives. You could flip drives in and out in a few seconds.

  • Why not (Score:3, Interesting)

    by Froze (398171) on Sunday April 27, 2003 @01:58PM (#5820246) Homepage
    buy a few IDE RAID cards and set them up as RAID 1? This implements a full mirror of data on the RAIDed devices. Then perform burn-in on the RAID device.

    Note: I have never implemented RAID and am not an expert, so this idea would need to be independently verified.
  • Get several IDE adapters and run the cables out the back of the box. Use another power supply to spin the drives. I do not have any ideas about hot swapping. You could do a cheap environmental chamber with a cardboard box and no fans to see how the drives do without any ventilation. Then get Iometer and write some tests. Be sure to do several passes with different byte patterns (00, AA, 55, FF) over the whole media. Also throw in a large block of random accesses of varying both length and locatio
  • We have several IDE fileservers at work. Each box is equipped with two 3Ware 8-port controllers, and 16 removable drive bays. Stick a 17th drive in there as an OS drive, install Linux, and run benchmark of your choice. Once you're happy with the drives, just pull the bays and swap in new drives.
    • Yep and use software from Extreme Protocol Solutions or someone like them. Yes you can put together your own testing software, but why bother when there are others out there who have already gone through all the variables and problems. They explicitly support 3Ware cards for IDE testing see This [extremeprotocol.com] link.
  • A friend of mine interned at the Seagate R&D plant in Longmont, Colorado last year, doing testing runs for harddrive series, all IDE. They had BIG refrigerator looking things that did automatic testing (or actually any other low-level function) based on commands from a terminal.

    If I'm not mistaken, they just upgraded their cabinets, so it is likely that either there are surplus cabinets around from the various manufacturers, or there's somewhere making 'em. They might be a bit expensive, but if you're
  • You can power the drives with old AT power supplies which can be had a lot cheaper than ATX supplies these days.
  • FireWire external cases, Many Many disks can hang off them. FireWire allows you to daisy chain up to 63 external devices

    USB, also a usable plan.

    Sadly you may need to use Windows, as Solaris just isn't right and Linux has horrible spaghetti code for this stuff. Windows, for all its oh-so-many faults, will let you get this up quickest.

  • I was looking for some support tools for my Deskstar yesterday, and ran across this tool for OEMs from Hitachi.

    Hitachi DDD-SI [hgst.com]

    Looking at the User's guide, it looks like you could use its basic features on non-IBM/Hitachi drives. You also might want to check out the other manufacturers' sites and see if they've got something similar.
