Data Storage Media IT

Long-Term Storage of Moderately Large Datasets?

hawkeyeMI writes "I have a small scientific services company, and we end up generating fairly large datasets (2-3 TB) for each customer. We don't have to ship all of that, but we do need to keep some compressed archives. The best I can come up with right now is to buy some large hard drives, use software RAID in Linux to make a RAID5 set out of them, and store them in a safe deposit box. I feel like there must be a better way for a small business, but despite some research into Blu-ray, I've not been able to find a good, cost-effective alternative. A tape library would be impractical at the present time. What do you recommend?"

  • by idiot900 ( 166952 ) * on Wednesday March 03, 2010 @06:25PM (#31351410)

    Hard drives are ridiculously cheap these days, especially for how much data you are storing. You may wish to consider buying drives from different manufacturers but of the same size to put in a single mirrored set. This way if there is a problem with a particular batch of drives it won't ruin everything.

  • by jbridges ( 70118 ) on Wednesday March 03, 2010 @06:39PM (#31351578)

    I would use RAID6, not RAID5, since two drive failures mean data loss with RAID5, while it takes three drive failures to lose data on RAID6.

    Linux mdadm has supported RAID6 for years; it's stable.

    I would mix and match drives, not buying all the same model from one maker. One Samsung, one WD, one Hitachi, one Seagate.

    That gets you 4TB in 4 drives, and unlike a RAID1, any 2 drives can fail with no data loss.

    You can further ensure no data loss by making a second copy using different brand drives for each clone.

    Eight 2TB drives is around $1500. Not bad for a very safe 4TB backup.
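
    The capacity and cost figures above can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the simple capacity rules assume equal-sized drives and ignore filesystem and metadata overhead.

```python
def usable_tb(level, drives, size_tb):
    """Usable capacity in TB for a few common RAID levels (equal-sized drives)."""
    if level == "raid1":   # every drive holds a full copy
        return size_tb
    if level == "raid5":   # one drive's worth of parity
        return (drives - 1) * size_tb
    if level == "raid6":   # two drives' worth of parity
        return (drives - 2) * size_tb
    raise ValueError(f"unsupported level: {level}")


def failures_tolerated(level, drives):
    """Number of drives that can fail without losing data."""
    return {"raid1": drives - 1, "raid5": 1, "raid6": 2}[level]


# Four 2TB drives in RAID6, as in the comment above.
capacity = usable_tb("raid6", 4, 2)
print(capacity, "TB usable, survives", failures_tolerated("raid6", 4), "failures")

# Eight drives for about $1500 gives two such arrays holding the same 4TB.
print("cost per protected TB:", 1500 / capacity)
```

    The same four drives in RAID5 would give 6TB but survive only a single failure, which is the trade-off being argued for here.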

  • by hardburn ( 141468 ) <hardburn@wumpus-ca[ ]net ['ve.' in gap]> on Wednesday March 03, 2010 @06:42PM (#31351630)

    That's why you hot-swap them. You treat them just like tapes. In fact, once you start doing that, you realize that RAID mirroring isn't helping you any (striping is another matter).

    The best way to back up a big hard drive these days is with another big hard drive.

  • by mengel ( 13619 ) <mengel@users.sou ... rge.net minus pi> on Wednesday March 03, 2010 @06:43PM (#31351634) Homepage Journal

    There's some code lurking in the amanda backup package I did a while back for "RAIT" (RAID with tape instead of disk) to make a stripe-set of tapes, if you need several tapes worth of data in one set, with redundancy.

    On the other hand, while LTO-4 tapes are about half the price ($40) of cheap 1TB disk drives ($80), the tape drives are about $2k apiece, so depending on how many data sets you want to keep, and for how long, the disk drives may really be cheaper...
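
    To put a rough number on "depending on how many", here is a back-of-the-envelope break-even sketch using the prices quoted above. It assumes one tape and one disk are interchangeable units of storage (their capacities actually differ) and that a single tape drive is enough; the function name is just for illustration.

```python
import math


def break_even_units(tape_drive_cost, tape_price, disk_price):
    """Smallest number of media units at which tape is strictly cheaper than disk.

    Tape total:  tape_drive_cost + n * tape_price
    Disk total:  n * disk_price
    Returns None if tape media is not cheaper per unit, so tape never wins.
    """
    saving_per_unit = disk_price - tape_price
    if saving_per_unit <= 0:
        return None
    return math.floor(tape_drive_cost / saving_per_unit) + 1


# Prices from the comment: $2k tape drive, $40 per tape, $80 per 1TB disk.
print(break_even_units(2000, 40, 80))
```

    With these figures the two options cost the same at 50 units and tape only pulls ahead after that, so a shop archiving a handful of datasets stays well on the disk side.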

  • Re:Exactly. (Score:5, Insightful)

    by Anonymous Coward on Wednesday March 03, 2010 @06:43PM (#31351640)

    Ok, yes, we see you know a lot about this.

    So what's your recommendation?

  • by trentblase ( 717954 ) on Wednesday March 03, 2010 @06:46PM (#31351680)
    Who wants to burn over 100 discs per client? I guess they make automated disc burners, but that's a little nutso. Plus, with that many discs you have a high chance of failure, so you'll need some kind of ECC scheme (parity discs?). I'd also have to vote for hard drives, although I agree with the shortcomings of that solution.
  • Re:Go with Blu-ray (Score:4, Insightful)

    by eldepeche ( 854916 ) on Wednesday March 03, 2010 @06:50PM (#31351738)

    I thought burned optical discs started to degrade after a few years. Have they solved this problem?

  • Re:Blu-Ray (Score:3, Insightful)

    by snowraver1 ( 1052510 ) on Wednesday March 03, 2010 @06:53PM (#31351768)
    Cheaper how exactly? Even if you could get BR discs at $2 each, it would cost $80/TB, and I haven't seen BR discs even close to that cheap. That doesn't include the writer, which I believe is still $400. For the cost of the writer alone, you could purchase 5TB of HDD.
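
    The arithmetic behind those numbers, as a small sketch. It assumes 25GB single-layer discs and 1TB = 1000GB, which is what the $2 per disc and $80/TB figures imply; the helper names are made up.

```python
def discs_needed(dataset_gb, disc_gb=25):
    """Number of discs required to hold a dataset, rounded up."""
    return -(-dataset_gb // disc_gb)  # ceiling division on integers


def cost_per_tb(disc_price, disc_gb=25):
    """Media cost in dollars per terabyte (1 TB taken as 1000 GB)."""
    return disc_price * 1000 / disc_gb


print(cost_per_tb(2))        # dollars per TB at $2 per disc
print(discs_needed(3000))    # discs for a 3TB dataset
print(400 / cost_per_tb(2))  # TB of $80/TB hard drive bought for the writer's price
```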
  • by jbridges ( 70118 ) on Wednesday March 03, 2010 @07:46PM (#31352416)

    Just because you had some problems with Samsung means nothing about their general reliability.

    A few specific models have had problems, such as the IBM "Deathstar" models, or the recent Seagate firmware problems, but there is no evidence that whole brands are less reliable.

    Read the Google report on drive brands; there are no clear winners or losers across brand lines in their exhaustive real-world tests.

  • by Again ( 1351325 ) on Wednesday March 03, 2010 @09:43PM (#31353324)

    (Or btrfs on a Linux distro)

    Are you honestly suggesting using an in-development filesystem for backup purposes?

  • Re:Exactly. (Score:5, Insightful)

    by TooMuchToDo ( 882796 ) on Wednesday March 03, 2010 @09:46PM (#31353348)
    Either MogileFS, Lustre, or possibly Hadoop (depending on the type and size of the data). Any sort of distributed file system where multiple chunks, replicas, etc. (3 is a good number, more is better if you have cheap disk and deduping at the filesystem level) are constantly available.

    Feel free to ask more questions.

  • by Vellmont ( 569020 ) on Wednesday March 03, 2010 @11:33PM (#31354140) Homepage


    if it was medical records i'd be storing 5 copies in 5 geographically distinct locations, each with their own backup for the backup. i'd be checking the MD5's each day on all the backups to ensure they can be accessed when i need them

    I can about guarantee you that nobody stores medical records in this way. And realistically, why should they? 5 different locations is insane for just about any piece of data.

    Geeks tend to go overboard when it comes to data paranoia and worry too much about technology, but then forget about all the human problems that go on. Most data loss doesn't occur from some geographic catastrophe where a super volcano destroys half a continent. More often someone changes some critical path of the backup scheme and the whole she-bang comes crashing down. Super-redundant geographic co-location can't save you from one idiot that didn't understand changing one critical name silently took down the backup scheme.
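
    For what it's worth, the daily MD5 check from the quoted post is only a few lines, and it happens to catch the failure described above if a missing file is reported as loudly as a corrupt one. A minimal sketch, assuming a manifest that maps file paths to expected hex digests (the manifest format and function names are made up for illustration):

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest):
    """Check every file in the manifest; return a list of (path, problem) tuples.

    A file that has been renamed or moved shows up as "missing" instead of
    being silently skipped.
    """
    problems = []
    for path, expected in manifest.items():
        if not Path(path).is_file():
            problems.append((path, "missing"))
        elif md5_of(path) != expected:
            problems.append((path, "checksum mismatch"))
    return problems
```

    An empty list from verify() means every archive is present and matches; anything else should page a human, because the script itself can't tell a dead disk from a renamed directory.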

  • by temojen ( 678985 ) on Thursday March 04, 2010 @11:31AM (#31358276) Journal

    If disks in the safe deposit box are fast enough to access, running to the store to buy a generic power supply is fast enough recovery.
