Data Storage

Ask Slashdot: How Do You Test Storage Media? 297

First time accepted submitter g7a writes "I've been given the task of testing new hardware for use in our servers. For memory, I can run it through things such as memtest for a few days to ascertain if there are any issues with the new memory. However, I've hit a bit of a brick wall when it comes to testing hard disks; there seems to be no definitive method for doing so. Aside from the obvious S.M.A.R.T. tests (i.e., long offline), are there any systems out there for testing hard disks to a similar level to that of memtest? Or any tried and tested methods for testing storage media?"

  • This is what I use (Score:4, Interesting)

    by Wolfrider ( 856 ) <kingneutron@NOsPAm.gmail.com> on Tuesday April 03, 2012 @01:43PM (#39562363) Homepage Journal

    root ~/bin # cat scandisk
    #!/bin/bash

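    # Read-write scan of a hard disk; pass the device name without /dev/ (e.g. sdb)
    argg="/dev/$1"

    # Enable 32-bit I/O, DMA, and interrupt unmasking (IDE drives on old kernels only)
    hdparm -c1 -d1 -u1 "$argg"

    # Raise readahead to speed up I/O - also good for USB disks
    blockdev --setra 16384 "$argg"
    blockdev --getra "$argg"

    # Alternative batch sizes (-c = number of blocks tested at a time)
    #time badblocks -f -c 20480 -n -s -v "$argg"
    #time badblocks -f -c 16384 -n -s -v "$argg"
    time badblocks -f -c 10240 -n -s -v "$argg"

    exit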

    ---------

    Note that this reads the existing content on the drive, writes a randomized pattern, reads it back, and then writes the original content back. With modern high-capacity drives (over 500 GB), plan on leaving this running overnight. You can do this from pretty much any Linux live CD, AFAIK. If running your own distro, you can monitor the disk I/O with ' iostat -k 5 '.

    From ' man badblocks '
    -n Use non-destructive read-write mode. By default only a non-destructive read-only test is done. This option must not be combined with the -w option, as they are mutually exclusive.
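The script above passes -f to badblocks, which overrides its refusal to run a read/write test on a mounted device. A pre-flight check could cover that gap. This is a minimal sketch, assuming a Linux-style mounts table such as /proc/mounts; the function name is_mounted is hypothetical, and the check does not cover swap, LVM, or md members.

```shell
#!/bin/bash
# Sketch only: exit 0 if any mount source in the table starts with the
# given device path (so /dev/sda also matches a mounted /dev/sda1).
# The prefix match is deliberately conservative and may over-match.
# The second argument defaults to /proc/mounts (Linux assumption).
is_mounted() {
    local dev=$1 mounts=${2:-/proc/mounts}
    awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$mounts"
}

# Hypothetical usage before starting a scan:
#   if is_mounted "/dev/$1"; then echo "refusing: /dev/$1 is in use" >&2; exit 1; fi
```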

  • old timers look here (Score:2, Interesting)

    by vlm ( 69642 ) on Tuesday April 03, 2012 @01:48PM (#39562427)

    OK so that was the noob version of the question.

    I have a question for the old timers. Has anyone ever implemented something like:
    1) log the time and temp
    2) do a run of bonnie++ or a huge dd command
    3) log the time and temp
    4) Repeat above about ten times
    5) numerical differentiation of time and temp and also any "overtemps"

    In theory, run from a cold or lukewarm start, that could detect a drive drawing "too much" current or otherwise being F'd up, or a cooling fan malfunction.
    I'm specifically looking for rate of temp increase as in watts expended, not just static workload temp.
    In practice it might be a complete waste of time.
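The differentiation in step 5 can be sketched with awk. This is an illustration only: temp_rate and log_temp are hypothetical names, and log_temp assumes smartmontools is installed and that the drive reports SMART attribute 194 (Temperature_Celsius) with the temperature in the raw-value column, which not every drive does.

```shell
#!/bin/bash
# Sketch only: numerical differentiation of a temperature log.
# Input (stdin): one sample per line, "<epoch_seconds> <temp_celsius>".
# Output: for each consecutive pair of samples, the later timestamp and
# the rate of change in degrees C per minute.
temp_rate() {
    awk 'NR > 1 && $1 > prev_t {
             printf "%d %.2f\n", $1, ($2 - prev_c) * 60 / ($1 - prev_t)
         }
         { prev_t = $1; prev_c = $2 }'
}

# Hypothetical logger (assumption: attribute 194 holds the temperature
# and its raw value is the 10th column of "smartctl -A" output).
log_temp() {
    local dev=$1
    printf '%s %s\n' "$(date +%s)" \
        "$(smartctl -A "$dev" | awk '$1 == 194 { print $10 }')"
}

# Hypothetical usage: call log_temp before and after each bonnie++ or dd
# run, append the lines to a file, then run: temp_rate < temps.log
```

A steep rate early in a run from a cold start is the signal described above; what counts as "too steep" would have to be calibrated against known-good drives in the same chassis.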

    Another one might be something like SMART-reported temp vs. iostat-reported usage plotted on a scatterplot.

    So the old timer question is has anyone ever bothered to implement this, and if so, did it do anything useful other than pad your billable hours?

  • badblocks (Score:5, Interesting)

    by Janek Kozicki ( 722688 ) on Tuesday April 03, 2012 @02:56PM (#39563371) Journal

    badblocks -c 10240 -s -w -t random -v /dev/sda1

    that's my standard test for all HDDs

  • Re:Why? (Score:5, Interesting)

    by v1 ( 525388 ) on Tuesday April 03, 2012 @03:18PM (#39563753) Homepage Journal

    "The point is to know whether it's faulty now at the time of arrival rather than 2 weeks down the line where it becomes a problem."

    I would disagree. I believe it's best to be able to identify the first moment a hard drive is starting to have problems, rather than the condition it's in when you get it.

    One reason is that most of your hard drives will eventually develop a problem, and only a small fraction of the drives you buy will arrive defective.

    Another reason is that nothing of value is on the new drive; you are risking only the purchase price. A year from now, it may hold important things that are irreplaceable, or at least inconvenient to replace.

    I run a piece of custom software I wrote that does a slow "disk crawl", reading ~100 MB every 5 minutes. Over the course of a month it has read every block on the drive, and starts over. I get an email if an I/O error OR slow performance is encountered. I store a lot here; I have somewhere around 25 TB of storage under the roof at home. Over the years I've been notified ~8 times of a failing drive. In all cases I was able to replace it before it became inaccessible. One of them failed to spin up ever again the day after I removed it from service. I consider this a very good system, and am surprised not to see a similar commercial offering. (It's a 5,600-line bash script!)
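The crawl idea can be sketched in a few lines of bash. This is not the poster's script, just a rough illustration of the approach described: the function names, the 30-second "slow" threshold, and the stderr messages standing in for email are all assumptions, and it relies on GNU dd and blockdev.

```shell
#!/bin/bash
# Sketch of a slow "disk crawl". Defaults are illustrative assumptions.

# Read chunk_mb MiB starting at offset_mb MiB.
# Prints "ok <seconds>" on success, "error" on a failed read.
read_chunk() {
    local target=$1 offset_mb=$2 chunk_mb=${3:-100}
    local start end
    start=$(date +%s)
    if ! dd if="$target" of=/dev/null bs=1M skip="$offset_mb" \
            count="$chunk_mb" 2>/dev/null; then
        echo "error"
        return 1
    fi
    end=$(date +%s)
    echo "ok $((end - start))"
}

# Advance the offset, wrapping to 0 once the end of the device is reached.
next_offset() {
    local offset_mb=$1 chunk_mb=$2 total_mb=$3
    local next=$((offset_mb + chunk_mb))
    if [ "$next" -ge "$total_mb" ]; then echo 0; else echo "$next"; fi
}

# Hypothetical main loop (defined but not invoked here): crawl /dev/sdX
crawl() {
    local dev=$1 chunk_mb=100 slow_secs=30 pause_secs=300 offset=0
    local total_mb result
    total_mb=$(( $(blockdev --getsize64 "$dev") / 1048576 ))
    while true; do
        if result=$(read_chunk "$dev" "$offset" "$chunk_mb"); then
            # result is "ok <seconds>"; strip the prefix to get the time
            if [ "${result#ok }" -gt "$slow_secs" ]; then
                echo "SLOW: $dev at ${offset} MiB took ${result#ok }s" >&2
            fi
        else
            echo "I/O ERROR: $dev at ${offset} MiB" >&2
        fi
        offset=$(next_offset "$offset" "$chunk_mb" "$total_mb")
        sleep "$pause_secs"
    done
}
```

In a real deployment the two echo lines would be replaced by whatever notification mechanism is available, and the offset would be persisted so a reboot does not restart the crawl from zero.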

    SMART is only useful to possibly confirm that a drive has a problem. Only a fool relies on it to notify them when there's a problem. I've probably replaced somewhere around 750 hard drives here at work, and of those, under a dozen were still accessible and displaying a SMART failure. Many times I've had SMART toggle to failed while I was doing data recovery to a replacement drive, as I was fighting my way through I/O errors. Got some Cpt Obvious going on there I think.
