
Hardware For Bulk IDE Hard Drive Burn-In?

r0gue_ asks: "I work for a mid-size OEM hardware manufacturer. We ship approximately 300 to 500 IDE HDs every month across all our units. Currently we experience about a 4% failure rate (Maxtor and WDs), though in recent months it has run a couple percent higher. The problem is that our systems are dedicated boxes with a form factor that is not end-user friendly, so virtually every physical HD failure results in an RMA. What we are looking for is a hardware-based IDE HD burn-in platform: something we could drop a dozen or so drives into at once, stress-test for a day or two, then put into inventory for builds. I know the HD manufacturers and larger OEMs use them, but I have not been able to track down anywhere we could purchase one. Right now, moving to SCSI or to a form factor that supports externally removable drives is not an option. I was hoping that the Slashdot community could point me in the right direction."
  • How-to (Score:4, Informative)

    by m0rph3us0 ( 549631 ) on Sunday April 27, 2003 @08:24AM (#5818613)
    Put 8 IDE controllers into a box (more than this maxes out the PCI bus bandwidth).
    Write a bash script that checks dmesg for how many drives are in the system and invokes the following perl script for each drive.

    Write a perl script that does this:
    formats and partitions the drive to max size,
    copies a kernel or some other large file onto the disk until it is full,
    monitors syslog for IDE errors,
    md5sums the files to make sure they all match,
    reports an error if an MD5 doesn't match.
    (A sketch of that per-drive script is below.)

    Unless you get hotswap controllers, you will have to reboot every time you want to test another batch of drives.

    If you don't wish to write this perl script, I can be hired to do it for you.
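
    A minimal bash sketch of that per-drive script (the device argument, mount point, and test file are hypothetical, and everything on the drive is destroyed):

        #!/bin/bash
        # Usage: burnin.sh /dev/hdc -- destroys all data on the drive.
        DRIVE=$1
        MNT=/mnt/burnin-$(basename "$DRIVE")
        SRC=/root/testfile                  # any large file, e.g. a kernel image
        GOOD=$(md5sum "$SRC" | cut -d' ' -f1)

        # One partition spanning the drive, then a filesystem on it.
        echo -e "o\nn\np\n1\n\n\nw" | fdisk "$DRIVE"
        mkfs.ext2 "${DRIVE}1"
        mkdir -p "$MNT" && mount "${DRIVE}1" "$MNT"

        # Fill the disk with copies of the test file until it is full.
        i=0
        while cp "$SRC" "$MNT/copy.$i" 2>/dev/null; do i=$((i + 1)); done
        rm -f "$MNT/copy.$i"                # the last copy may be truncated
        sync

        # Verify every copy against the original's checksum.
        for f in "$MNT"/copy.*; do
            [ "$(md5sum "$f" | cut -d' ' -f1)" = "$GOOD" ] || echo "BAD: $DRIVE $f"
        done

        # Check the kernel log for IDE errors mentioning this drive.
        dmesg | grep "$(basename "$DRIVE")" | grep -i error

        umount "$MNT"
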
  • Re:How-to (Score:3, Informative)

    by m0rph3us0 ( 549631 ) on Sunday April 27, 2003 @08:27AM (#5818617)
    One more thing: don't put more than one drive on each channel, as it massively slows down the operation.
  • by Futurepower(R) ( 558542 ) on Sunday April 27, 2003 @08:52AM (#5818658) Homepage

    Slashdotters! If you don't find a story interesting, please don't complain and call Slashdot lame. Just ignore the story. Do you complain to your local newspaper that they should not publish recipes because you don't cook?

    Comment about the Slashdot question: the wording seems to imply that you believe Maxtor and Western Digital hard drives have an equal failure rate. That has not been my experience; in my experience, Western Digital drives are the most reliable. I'm very interested to know the experience of other readers.

    Several years ago Western Digital went through a bad stretch with a problem that caused high failure rates, but that has since been fixed.

    It's shocking that you are in the computer business and knowingly shipping products with a 4% failure rate. That's very expensive and annoys the customers.

    However, you are on the right track. Electronic products show what is called "infant mortality": most failures occur in the first week of operation. Over that first week (168 hours), the failure rate typically falls by a factor of 100 or even 10,000, so by the end of it most of the failures that were going to happen early already have.

    It's very easy to write a program that exercises a hard drive: just copy files back and forth from folder to folder. It is just as easy to write one that fills the drive with files, erases them, and starts again (a sketch follows).
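
    For instance, a tiny bash sketch along those lines (the mount point and seed directory are assumptions):

        #!/bin/bash
        # Shuttle a tree of files between two folders on the drive under
        # test, comparing after every pass. /mnt/test is the mounted drive.
        SEED=/usr/src/linux                 # any sizeable directory tree
        cp -a "$SEED" /mnt/test/a
        for pass in $(seq 1 1000); do
            cp -a /mnt/test/a /mnt/test/b
            diff -r /mnt/test/a /mnt/test/b || echo "pass $pass FAILED"
            rm -rf /mnt/test/a
            mv /mnt/test/b /mnt/test/a
        done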

    The Promise Ultra133 TX2 [promise.com] adds support for four more hard drives beyond the 4 already supported by modern motherboards. Eight is usually enough for one test computer, because the power supply won't support more. Be careful to use delayed (staggered) spin-up, and you may need more powerful power supplies than you normally use.

    Make SURE that you are not having trouble with heat. Are your drives cool when they are installed in your product? High heat causes a high failure rate.
  • Re:How-to (Score:3, Informative)

    by vadim_t ( 324782 ) on Sunday April 27, 2003 @10:11AM (#5818922) Homepage
    Yeah, that'd be real fast. Firewire's bandwidth is less than PCI's. But okay, suppose you get several cards; the shared PCI bus still caps you at 133 MB/s. Assuming your disks get all of that, which ignores the network card, sound card, protocol overhead, and whatever else is on the bus, splitting it across the 63 devices a Firewire bus can address gives about 2 MB/s per disk. Real fast.

    Now, my motherboard supports 64-bit PCI at 66 MHz, with a bandwidth of about 533 MB/s; that would give about 8.3 MB/s per disk. Still not a lot, and you'd have to find a 64-bit PCI Firewire card with a lot of connectors, because my motherboard has only two such slots.

    My 80 GB disk can do 40 MB/s quite easily according to hdparm, so with my available bandwidth I could support about 13 drives; let's say 10 to compensate for overhead and other traffic on the bus. I think 40 MB/s is quite near the limit of Firewire, so I might need ATA instead. With two Serial ATA cards with at least 5 connectors each, I suppose it'd be possible. Parallel ATA would also work, I guess, but the wiring would get really complicated with so many drives, especially because you want one drive per cable for maximum performance.
  • Re:How-to (Score:5, Informative)

    by dubl-u ( 51156 ) * <2523987012&pota,to> on Sunday April 27, 2003 @11:27AM (#5819229)
    That's a good approach. At 8 drives a day that's 250 a month for a station that you can build for well under $1000. I'd only add that you may need to tune this based on the failure modes that you are seeing.

    For example, if it's just bad spots, then you'll want to do as many reads and writes as possible. For that, the fastest thing would be a little C program that reads and writes different patterns to the raw device linearly.

    On the other hand, if the failures are tied to seeks, you'll want to write to semi-random locations on the device to force maximum head movement. Or if you see a mix of both, then your best bet might be to follow m0rph3us0's plan, perhaps tweaking it a bit to better simulate normal filesystem activity (and you can do plain bit-for-bit compares rather than md5sums if CPU is an issue). A dd-based version of both approaches is sketched below.
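
    For instance, a dd-based stand-in for that little C program, written in bash to match the other scripts in this thread (the device name and sizes are assumptions, and the drive's contents are destroyed):

        #!/bin/bash
        # Raw-device exerciser: a linear pattern pass, then a seek-heavy pass.
        # WARNING: destroys everything on $DRIVE. Names and sizes are examples.
        DRIVE=/dev/hdc
        CHUNK_MB=64
        CHUNKS=100                 # exercises CHUNKS x CHUNK_MB of the drive

        # Build one pattern chunk (0xAA bytes, as an example pattern).
        dd if=/dev/zero bs=1M count=$CHUNK_MB 2>/dev/null \
            | tr '\0' '\252' > /tmp/pattern

        # Linear pass: write the pattern end to end, then read back and compare.
        for i in $(seq 0 $((CHUNKS - 1))); do
            dd if=/tmp/pattern of="$DRIVE" bs=1M seek=$((i * CHUNK_MB)) 2>/dev/null
        done
        sync
        for i in $(seq 0 $((CHUNKS - 1))); do
            dd if="$DRIVE" bs=1M skip=$((i * CHUNK_MB)) count=$CHUNK_MB 2>/dev/null \
                | cmp -s - /tmp/pattern || echo "linear mismatch at chunk $i"
        done

        # Seek-heavy pass: hit pseudo-random chunks to keep the heads moving.
        for n in $(seq 1 500); do
            i=$((RANDOM % CHUNKS))
            dd if=/tmp/pattern of="$DRIVE" bs=1M seek=$((i * CHUNK_MB)) 2>/dev/null
            dd if="$DRIVE" bs=1M skip=$((i * CHUNK_MB)) count=$CHUNK_MB 2>/dev/null \
                | cmp -s - /tmp/pattern || echo "seek mismatch at chunk $i"
        done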

    You should also keep an eye on heat issues. The burn-in should happen at temperatures that are like what they will be in the end systems. If you pack 8 seeking drives into some cases, they'll cook. If you leave them in the open air, they might not trigger the failures you are seeing in the field. Try to match measured operating case temperature.

    Oh, and don't forget to measure whether this burn-in is really helping. Take stats now, and keep tracking causes of return. It could be that the drives are sensitive to noisy power or vibration or something else that your burn-in won't catch.
  • Case solution (Score:3, Informative)

    by lusername ( 248616 ) on Sunday April 27, 2003 @12:20PM (#5819517) Homepage

    We built some disk arrays using a front-loading IDE case with drive trays. This one is pretty pricey but it's _nice_ hardware:

    https://www.rackmountplus.com/spec.asp?ID=RMAC4D-IDE

    That, plus a couple of RAID cards (like 3ware's new 12-port cards) in a 64/66 PCI slot and bonnie++, would do a pretty good job of burning in your drives; one way to drive it is sketched below. You could flip drives in and out in a few seconds.
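
    A hedged example of running bonnie++ across a set of trays in parallel (the mount points, user, and log paths are assumptions):

        #!/bin/bash
        # Run bonnie++ on every mounted tray at once; one log file per tray.
        for mnt in /mnt/tray*; do
            bonnie++ -d "$mnt" -u nobody -x 10 \
                > /var/log/burnin-$(basename "$mnt").log 2>&1 &
        done
        wait    # block until every instance finishes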

  • by mmontour ( 2208 ) <mail@mmontour.net> on Sunday April 27, 2003 @06:13PM (#5821084)
    You can't go by brand alone - at some point every manufacturer has had a line of bad drives.

    StorageReview [storagereview.com] has a Drive Reliability Survey that lists statistics for many drive families. For example, WD 205Bx drives are near the top of the rankings (99th percentile) while the 600Ax is near the bottom (10th percentile).
  • by GigsVT ( 208848 ) * on Sunday April 27, 2003 @08:26PM (#5821729) Journal
    You can do that, just have to make sure the grounds don't float and are tied together.

    As far as powering them up in a sequence, there is really no need; you can just turn on all the power supplies with the same switch. That's a little trickier to do with ATX, but Cyberguys sells an adapter that makes an ATX power supply act like an AT one, with an external switch and an AT-style motherboard connector. Or, since you aren't using the motherboard connector, you could just send the power supply the on signal the same way the motherboard does: short the green PS_ON wire to a black ground wire (see the ATX spec).

    Most redundant power supply systems are really parallel power supply systems, so if you really need something like 800 watts, just look for a case that can take redundant supplies and put two 400s in, or two 480s, or whatever you can afford. :)
  • by afidel ( 530433 ) on Sunday April 27, 2003 @08:44PM (#5821790)
    Yep, and use software from Extreme Protocol Solutions or someone like them. Yes, you can put together your own testing software, but why bother when others have already worked through all the variables and problems? They explicitly support 3ware cards for IDE testing; see this [extremeprotocol.com] link.
  • by slaker ( 53818 ) on Sunday April 27, 2003 @09:32PM (#5821966)
    I'd suggest Samsung. Yes, I'm being serious. Even the best of their drives is slow, but "slow" just means that the 7200rpm 80GB Samsung brings up the tail of the pack of _current_ ATA drives, performing better than current 5400rpm entries from WD and Seagate and just a hair slower than current Seagate and Maxtor drives. Before someone jumps on me about performance, do TRY to keep in mind that any current ATA drive is going to be substantially faster than any two-year-old ATA drive, mainly due to the benefits of increased platter density.

    My main reason for suggesting Samsung, aside from the joys of a real 3-year warranty and the fact that Samsung drives really are value-priced, is that my return rate, and the return rates of several other resellers I know, have been exceedingly low.
  • Re:How-to (Score:2, Informative)

    by torpor ( 458 ) <ibisum AT gmail DOT com> on Monday April 28, 2003 @03:27AM (#5823173) Homepage Journal
    How do you, at a consumer-ish level, get around the ugliness of IDE cabling and the lack of hot-swap capability? I would think being able to load disks on and off without a full system reboot would be pretty advantageous... more important than speed concerns, anyway.

    Can your PC really do sustained writes to >8 drives without getting into performance issues?

    Actually, speed is a good point. I guess I hadn't thought about that so much as part of the setup, but I guess if you're exercising disks to make them fail, you want to do it as fast as possible... even though, in your end-system where the disks will be used, a Firewire stress-test may be more realistic in terms of disk behaviour.

    Either way, there are tradeoffs.
