Ask Slashdot: How Do You Test Storage Media? 297
First time accepted submitter g7a writes "I've been given the task of testing new hardware for use in our servers. For memory, I can run it through something like memtest for a few days to ascertain whether there are any issues. However, I've hit a bit of a brick wall when it comes to hard disks; there seems to be no definitive method for testing them. Aside from the obvious S.M.A.R.T. tests (i.e. the long offline test), are there any tools out there that exercise hard disks to the same level memtest exercises memory? Or any tried and tested methods for qualifying storage media?"
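(For reference, there is no exact memtest analogue for disks; the closest common practice is a destructive write/verify burn-in over the whole device, which is what badblocks -w does. Below is a minimal Python sketch of that idea. The device path is a placeholder and everything on it is destroyed, so point it only at a disposable test disk; short reads and writes are ignored for brevity.)

import hashlib
import os

DEVICE = "/dev/sdX"       # hypothetical test device; all data on it is destroyed
CHUNK = 4 * 1024 * 1024   # 4 MiB per I/O
SEED = b"burn-in-v1"      # fixed seed so the verify pass can regenerate the pattern

def pattern(offset: int, length: int) -> bytes:
    """Deterministic pseudorandom data for the chunk starting at offset."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(SEED + offset.to_bytes(8, "big") +
                              counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])

def burn_in(device: str) -> None:
    fd = os.open(device, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)

        # Write pass: cover the whole device with a reproducible pattern.
        os.lseek(fd, 0, os.SEEK_SET)
        for offset in range(0, size, CHUNK):
            length = min(CHUNK, size - offset)
            os.write(fd, pattern(offset, length))
        os.fsync(fd)

        # Verify pass: re-read everything and compare against the regenerated pattern.
        os.lseek(fd, 0, os.SEEK_SET)
        bad = 0
        for offset in range(0, size, CHUNK):
            length = min(CHUNK, size - offset)
            if os.read(fd, length) != pattern(offset, length):
                bad += 1
                print(f"mismatch at offset {offset}")
        print(f"done: {size} bytes checked, {bad} bad chunk(s)")
    finally:
        os.close(fd)

if __name__ == "__main__":
    burn_in(DEVICE)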
Why? (Score:5, Insightful)
Even if your storage passes the test, it could fail the next day. What you should be doing is designing your storage to gracefully handle failure, like RAID 5 with spares.
Re:Why? (Score:5, Insightful)
No, the point is to design your system so that if it fails 2 weeks down the line... it isn't a problem.
Re:Why? (Score:5, Insightful)
And then what you should test is that it actually notifies you when something does fail, so you know about it and can fix it. You can also test how long it takes to rebuild the array after replacing a disk, and how much performance degradation there is while that is happening.
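A rough sketch of that kind of failure drill on a Linux md array, in Python. The array and member device names are placeholders; it assumes mdadm, a test array you can afford to degrade, and that your monitoring is supposed to fire when the member is failed.

import subprocess
import time

ARRAY = "/dev/md0"     # hypothetical test array
MEMBER = "/dev/sdb1"   # hypothetical member device to sacrifice

def mdadm(*args: str) -> None:
    subprocess.run(["mdadm", *args], check=True)

def rebuilding() -> bool:
    """True while /proc/mdstat still shows a recovery/resync in progress."""
    with open("/proc/mdstat") as f:
        stat = f.read()
    return "recovery" in stat or "resync" in stat

# 1. Simulate a failure: this is the moment your alerting should go off.
mdadm(ARRAY, "--fail", MEMBER)
mdadm(ARRAY, "--remove", MEMBER)

# 2. "Replace" the disk and time the rebuild. Run your normal benchmark
#    against the filesystem while this loop waits to see the degradation.
start = time.time()
mdadm(ARRAY, "--add", MEMBER)
while rebuilding():
    time.sleep(30)
print(f"rebuild finished in {(time.time() - start) / 3600:.1f} hours")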
Hard Disk Sentinel (Score:3, Insightful)
Even with that, using the SMART data in a SMART way still only predicts about 30% of failures. The other 70% come out of nowhere. That is why it is best to assume every drive is suspect and can die at any time, and to never allow a single drive to be the sole copy of anything.
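If you do want to squeeze out that 30%, a nightly SMART sweep is cheap. A minimal Python sketch, assuming smartmontools is installed; the device list and the choice of attributes to watch are illustrative, not a recommendation from any vendor.

import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]          # hypothetical devices to check
WATCH = {"Reallocated_Sector_Ct",
         "Current_Pending_Sector",
         "Offline_Uncorrectable"}

def suspicious(device: str) -> list[str]:
    # smartctl uses bit-flag exit codes, so inspect the output rather than the status.
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    findings = []
    for line in out.splitlines():
        fields = line.split()
        # Attribute rows look like: ID# NAME FLAG VALUE WORST THRESH ... RAW_VALUE
        if len(fields) >= 10 and fields[1] in WATCH:
            raw = fields[-1]
            if raw.isdigit() and int(raw) > 0:
                findings.append(f"{device}: {fields[1]} raw={raw}")
    return findings

for dev in DEVICES:
    for finding in suspicious(dev):
        print("WARNING", finding)   # hook this into whatever pages your team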
Are you testing an array or individual drives? (Score:5, Insightful)
I manage a team that oversees petabytes of disk, both within enterprise arrays and internal to servers. For testing the arrays, since there are gigabytes of cache in front of the disks, I can only rely on the vendor to do the appropriate post-installation testing to make sure there are no DOA disks. For internal disks, as others have mentioned, you could run IOMeter for days without a problem and the very next day the disk is dead. Unlike memory, disks have moving parts that fail much more easily than chips. However, with proper precautions like RAID, a single disk failure isn't fatal.
The bigger problem is a double disk failure, because of the time required to rebuild onto the replacement disk. Back when disks were 100GB this was a "relatively" quick process, but in some of my arrays with 3TB drives it takes far longer. It has reached the point where hot spares are arguably not worth it, since my array vendor will have a new disk in the array within 4 hours; and with what an enterprise disk costs from the array vendor (not Fry's), hot spares start to add up.
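The rebuild window is easy to put numbers on. A quick back-of-the-envelope in Python, with purely illustrative rebuild rates rather than measurements from any particular array:

# How long a full rebuild takes at a given sustained rebuild rate.
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / 3600

for tb in (0.1, 1, 3):
    for rate in (50, 100):   # MB/s of rebuild throughput actually achieved
        print(f"{tb:4} TB at {rate:3} MB/s -> {rebuild_hours(tb, rate):5.1f} h")

# A 100GB disk at 50 MB/s rebuilds in roughly half an hour; a 3TB disk at the
# same rate is roughly 17 hours of degraded, second-failure-exposed operation.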
Re:Why? (Score:5, Insightful)
Point is: You can't 'test'.
You can only tell if it's working, not when it's about to fail.
If people could predict when hard drives were going to fail we wouldn't need RAID or backups.
Re:scsi (Score:4, Insightful)
Perhaps it was an honest mistake, but the link is broken. Second, evidence has shown SATA drives to be more reliable than commercial/enterprise-grade drives. Only buy enterprise drives if you don't like your money, or if there is some clear advantage. That supposed advantage is not reliability, unless some sort of rapid replacement mechanism comes with the drive, and replacement isn't reliability in my book.
http://lwn.net/Articles/237924/ [lwn.net]
Re:Why? (Score:3, Insightful)
A plastic strap won't save you from the drive head failing to move. I've seen this happen when a bunch of unemployed temp workers unload the truck. This is why it seems like "batches" of similar drives fail when you get them from the same source... some asshole was throwing and kicking the boxes around.
If your static strap is made of (all) plastic, then you will have issues beyond shipping and handling woes...