Data Storage

How Do You Backup 20TB of Data? 983

Sean0michael writes "Recently I had a friend lose their entire electronic collection of music and movies by erasing a RAID array on their home server. He had 20TB of data on his rack at home that had survived a dozen hard drive failures over the years. But he didn't have a good way to backup that much data, so he never took one. Now he wishes he had.

Asking around among our tech-savvy friends though, no one has a good answer to the question, 'how would you backup 20TB of data?'. It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.

So I ask fellow slashdotters: for a home user, how do you backup 20TB of Data?"
Even Amazon Glacier is pretty pricey for that much data.
This discussion has been archived. No new comments can be posted.

  • Hmmm... (Score:5, Funny)

    by Anonymous Coward on Wednesday March 12, 2014 @09:32AM (#46462839)

    I would say use floppies, but I'm kind of old and out of touch now.

  • reduce the amount (Score:4, Interesting)

    by JeffSh ( 71237 ) <jeffslashdot&m0m0,org> on Wednesday March 12, 2014 @09:32AM (#46462841)

    At home, I didn't feel like paying for 2 large arrays to store my data, so if I rip any media, I always rip it to DIVX. 800 MB for a DVD or even bluray rip is a great economy, saves me money on primary storage and also enables me to back it up. I accept the loss of quality as I can always reference the original media if I want.

    Another option in the future may be subscription services which have HD content, thus eliminating my need to roll my own. We'll see what happens there.

    • Re:reduce the amount (Score:4, Informative)

      by buchner.johannes ( 1139593 ) on Wednesday March 12, 2014 @09:40AM (#46462977) Homepage Journal

      20TB is not out of the world. With a RAID of 4TB disks you can cover that at home, and it doesn't need to be on all the time. Maybe you can reduce the amount of disk usage by reducing duplicate content using bup [github.com] or an appropriate FS.
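      As a rough illustration of the deduplication idea, here is a minimal Python sketch that finds files with identical content by hashing them. This is not how bup works (bup deduplicates at the chunk level); whole-file hashing is a simpler stand-in, and the function names `file_digest` and `find_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map digest -> sorted list of paths, keeping only digests seen more than once."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

      Calling `find_duplicates` on the top-level directory of the collection returns the groups of byte-identical files, which gives a first estimate of how much space exact duplicates are wasting before committing to a deduplicating tool or filesystem.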

      • 20TB is not out of the world. With a RAID of 4TB disks you can cover that at home, and it doesn't need to be on all the time.

        Sure, it's easy to have 20TB of usable disk space (I've got forty 2TB drives spread among 5 servers at my house), but 20TB of "must be backed up because that's the only copy" is a little unbelievable for a home user.

        For example, I have 700 Blu-Ray movies that have been ripped and re-encoded to take about 2TB of disk space. If I had 30-40TB available, I might store the raw Blu-Ray images, but then I don't need backup, as the data is easy to re-create. So, I'm a little skeptical that the "friend" in TFS had

    • by cdrudge ( 68377 ) on Wednesday March 12, 2014 @09:49AM (#46463093) Homepage

      I always rip it to DIVX. 800 MB for a DVD or even bluray rip is a great economy,

      I do the exact same thing with high res pictures. I immediately will take the full resolution raw image and convert it down to a 320px gif. Or maybe a 10% quality jpeg. You get great economy that way too. Who wants to keep a 30+MB image around when you can have almost the same thing in 10kB instead!

    • Re: (Score:3, Insightful)

      by edxwelch ( 600979 )

      > I always rip it to DIVX. 800 MB for a DVD or even bluray rip is a great economy
      I do that as well, but I found out to my horror that all my DVDs had become unreadable over time. So it's probably a good idea to test your backups from time to time.

      • I do that as well, but I found out to my horror that all my DVDs had become unreadable over time. So it's probably a good idea to test your backups from time to time.

        That's why you should reserve 10-15% of the disk for parity data. While DVD-R format has built in error-recovery at the sector level, by the time you figure out that the disk is going bad it is too late. By adding even more recovery data at the file level, you can treat the disk errors as an early-warning system, then use the recovery data files
    • Re:reduce the amount (Score:4, Informative)

      by FreonTrip ( 694097 ) <`freontrip' `at' `gmail.com'> on Wednesday March 12, 2014 @10:16AM (#46463471)
      I'd switch to x264 in an MKV container - you can get the same quality in about 3/4ths the file size without even being clever.
      • x264 is great, but why mkv? I've had terrible compatibility with mkv files in portable players or tablets, and proprietary software. Many won't even recognize them. I rarely have issues with mp4. Sometimes a user has no choice on software/hardware.

    • by Artraze ( 600366 ) on Wednesday March 12, 2014 @10:33AM (#46463685)

      Agreed.

      Regardless of whether or not 20TB is hoarding / excessive / inefficient, what it almost certainly is is replaceable. Let's face it, you aren't CERN; most of your data is probably media that you can reacquire with relative ease. It's not being stored because it's irreplaceable; it's being stored because it's convenient. A RAID isn't too bad, but add in managing backups and where has that convenience gone? If it costs $10+/month to back up your ripped/downloaded movies, why not just sign up for Netflix?

      Just make a list of all the replaceable data (e.g. videos you have the original disc for) you have and then buy an external hard disk / Blu-rays to back up the rest. If you lose your RAID, well, it'll be annoying to rebuild, but you built it once... (Besides, I doubt you could restore 20TB over residential internet in less time!)

      • Agreed.

        Regardless of whether or not 20TB is hoarding / excessive / inefficient, what it almost certainly is is replaceable. Let's face it, you aren't CERN; most of your data is probably media that you can reacquire with relative ease. It's not being stored because it's irreplaceable; it's being stored because it's convenient. A RAID isn't too bad, but add in managing backups and where has that convenience gone? If it costs $10+/month to back up your ripped/downloaded movies, why not just sign up for Netflix?

        Just make a list of all the replaceable data (e.g. videos you have the original disc for) you have and then buy an external hard disk / Blu-rays to back up the rest. If you lose your RAID, well, it'll be annoying to rebuild, but you built it once... (Besides, I doubt you could restore 20TB over residential internet in less time!)

        Some people have different use cases. A few years back I was visiting a friend in the boonies in Egypt and brought a TB of American movies and music along explicitly for her (she was putting me up for free while I was on a research project). With my 50Mbps connection and 250GB monthly cap, I could recreate the entire shebang in 4-5 months, but with her iffy ISP, she couldn't hope to download everything from me in her government's lifetime.

      • Regardless of whether or not 20TB is hoarding / excessive / inefficient, what it almost certainly is is replaceable. Let's face it, you aren't CERN; most of your data is probably media that you can reacquire with relative ease. It's not being stored because it's irreplaceable; it's being stored because it's convenient.

        There's a large, flawed, assumption running through this thread that it's "easy" to reacquire all media.

        Between torrents and usenet, trying to find good rips of content that's over 6-12 months old is often impossible. Good luck trying to find any older show that's not sci-fi or super popular. Often studios don't sell DVDs or Blurays of these shows anymore, if they even did to start with.

    • Re: (Score:3, Insightful)

      by jedidiah ( 1196 )

      If you've already got one large array then you are already by definition half way there. If you then decide not to go the rest of the way then you are at the same time being both extravagant and a cheap bastard. It's a wonderfully stupid paradox.

      If you've got one then you should get the 2nd one or not bother with the first one to begin with.

  • Crashplan (Score:5, Informative)

    by rossjudson ( 97786 ) on Wednesday March 12, 2014 @09:34AM (#46462849) Homepage

    Crashplan has unlimited storage. I use their home plan; it's unlimited for up to 10 machines. I think I am backing up about 6TB there now.

    • I agree but... (Score:5, Informative)

      by Matteo De Felice ( 3574477 ) on Wednesday March 12, 2014 @09:53AM (#46463155)
      I agree, I've been using Crashplan for three years and the unlimited space is really great, BUT... I'm not sure about the bandwidth they provide: how long will it take to upload 20 TB? Anyway, I don't see what the problem is with using external drives for backup. Here in my lab I've realized that the best way to back up X terabytes is to have another storage system with X terabytes...
    • Re: Crashplan (Score:5, Interesting)

      by Anonymous Coward on Wednesday March 12, 2014 @10:04AM (#46463303)

      Crashplan offers unlimited storage, yes, but they limit it indirectly by slowing down uploads.

      I recently paid for a crashplan account to back up ~6TB of media, and at the speeds I'm seeing the initial backup is going to take more than a year. I have 100Mbit/s fiber at my home and can max it easily with other services.

      So for 20TB, it's going to take many years to back up. I don't think that's a practical backup solution. There's a decent chance you're going to lose your data before the initial backup completes. And if crashplan goes under, you have to start all over again with the next "unlimited except for rate" provider, and have no backup in the meantime.
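      The arithmetic behind that estimate can be sketched in a few lines of Python. The figures come from the comment above (6 TB taking about a year, a 100 Mbit/s line); decimal terabytes and a perfectly sustained rate are simplifying assumptions, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def upload_days(size_tb: float, mbit_per_s: float) -> float:
    """Days needed to upload size_tb (decimal TB) at a sustained rate in Mbit/s."""
    bits = size_tb * 1e12 * 8
    return bits / (mbit_per_s * 1e6) / SECONDS_PER_DAY


def implied_mbit_per_s(size_tb: float, days: float) -> float:
    """Average rate in Mbit/s implied by uploading size_tb in the given number of days."""
    bits = size_tb * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


# 6 TB in roughly one year implies an effective rate of about 1.5 Mbit/s.
throttled = implied_mbit_per_s(6, 365)

# At that rate, 20 TB takes a bit over three years; at a full 100 Mbit/s, under three weeks.
print(f"effective rate: {throttled:.2f} Mbit/s")
print(f"20 TB throttled: {upload_days(20, throttled) / 365:.1f} years")
print(f"20 TB at line rate: {upload_days(20, 100):.1f} days")
```

      Since the comment says "more than a year" for 6 TB, the three-year figure is a lower bound under these assumptions.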

      • While I've never had the throttling issue that you have, I do want to point out that they accept seed drives if your initial backup is taking a long time.

    • by Skreems ( 598317 )
      Watch out for Crashplan in certain cases. I'm currently trying to restore a 50GB file, and it keeps restarting the download halfway through. Their support is useless, basically leaving it at "sorry dude, nothing we can do". For other files it's been good, but their testing and support of edge cases doesn't seem especially solid.
  • by DogDude ( 805747 ) on Wednesday March 12, 2014 @09:35AM (#46462863)
    I have a 16 TB media collection at home that I just back up on more hard drives.

    External hard drives in USB cases + Robocopy works great for me.
  • but you need real backup software. As you fill up each drive, you replace it and continue the backup until you have a full backup. This way you can take the drives off site. As with any other backup solution, test the drives every few months to make sure your data is not corrupt and that you don't have a failed drive.
  • by polymeris ( 902231 ) on Wednesday March 12, 2014 @09:36AM (#46462881)

    > It's not like you could just plug in an external drive [...]
    Why not? Maybe not one, but 10 or 20 of them.

  • I have a similar situation; 18.6 TB RAID-Z at home (8 3TB drives) using FreeNAS and with the new update it shows it was initially set up using a non-native block size (I was a bit naive regarding the settings when I first set it up) and I'd like to rebuild it but I have no way to backup 14+ TB. Also, I would like to have a backup in case more than one drive dies (1 parity works well but I could still suffer a catastrophic failure). I've looked into tape backup but anything that seems like it'd have enough s
    • by afidel ( 530433 )

      You can get an LTO4 SAS drive for ~$50 on ebay, they do 800GB native per tape, so typically ~1.2TB per tape for mixed content (obviously if it's all compressed media it will be much closer to native). 10-20 tapes doesn't seem that bad (we send that many offsite daily). The tapes will cost you ~$20 each unless you're willing to go used (ewww).
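      For anyone who wants to check the tape count, here is a small Python sketch using only the numbers quoted above (800 GB native, about 1.2 TB with compression, ~$50 for the drive, ~$20 per tape). The 14 TB and 20 TB data sizes are taken from the parent comment and the story; the function names are made up for this example.

```python
def tapes_needed(data_gb: int, tape_gb: int) -> int:
    """Number of tapes required, rounding up (integer ceiling division)."""
    return -(-data_gb // tape_gb)


def tape_backup_cost(data_gb: int, tape_gb: int, drive_usd: int = 50, tape_usd: int = 20) -> int:
    """One-time cost in dollars for one drive plus a single full set of tapes."""
    return drive_usd + tapes_needed(data_gb, tape_gb) * tape_usd


for label, data_gb in (("14 TB", 14_000), ("20 TB", 20_000)):
    for tape_gb in (800, 1_200):  # native vs. typical compressed capacity
        n = tapes_needed(data_gb, tape_gb)
        cost = tape_backup_cost(data_gb, tape_gb)
        print(f"{label} on {tape_gb} GB tapes: {n} tapes, about ${cost}")
```

      For 14 TB this gives 12 to 18 tapes, in line with the "10-20 tapes" estimate; a full 20 TB of already-compressed media would need 25 tapes at native capacity.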

  • If your ISP doesn't have data caps, look at Backblaze ( http://www.backblaze.com/ [backblaze.com] ). $5 / month for unlimited storage for one computer. Only available for Mac and Windows, but I'm sure a virtual instance of Windows if you're using a Linux box would work... These are the folks that opensourced their hardware design for their storage "pods." http://blog.backblaze.com/2011... [backblaze.com]
  • Why not store it all in 20000 github repositories?
  • by Anonymous Coward on Wednesday March 12, 2014 @09:39AM (#46462949)

    "My friend (read I) lost 20TB of pirated content! What should my friend have done different?"

    How about, ask yourself, how much of that content were you intending to ever consume again. Yeah, you can most likely delete 95% of it, that's 1TB of content that you might use again.

    Hoarders! *lol*

    • by ustolemyname ( 1301665 ) on Wednesday March 12, 2014 @09:48AM (#46463077)

      Not all of us have access to the time machine required to know *which* 1TB that is.

      Are you willing to share yours?

      • by LordLimecat ( 1103839 ) on Wednesday March 12, 2014 @10:52AM (#46463895)

        This problem isn't unique; most people have trouble curating their data. That doesn't change the fact that the problem is mostly self-created, and the best solution isn't to find another place to stuff the 20TB. It's to take the time to cull it down to a reasonable size and then back it up.

        • You don't have to cull it down, you just need to organize it into logically distinct groups and assign them priorities. Hoarding isn't the problem, the problem is assigning too high a priority to the hoarded pr0n as compared to the really important stuff.

          • Group #0:

          Contents: Documents, source control repositories, user preferences, email archives
          Maximum Size: 10GB
          Protection: 3-way Mirror + Snapshots + Offsite
          Total Space Required (way upper bound): 150GB
          Total Cost: $3 a month for Crashplan

          • Group #1:

          Contents: Pe

    • How is this insightful? This is the typical useless - "Why would you want to do that, you idiot???" - non-constructive Slashdot answer that drives me insane. In short: Ask Slashdot means "Answer the fucking question", not "Attack the question". /Rant
  • Good luck. (Score:4, Informative)

    by TheCarp ( 96830 ) <sjc.carpanet@net> on Wednesday March 12, 2014 @09:41AM (#46462985) Homepage

    A quick check of one service that lists prices for such large amounts suggests you would be looking at almost $20k/year to keep a single offsite copy of that. That is the posted price, however; I imagine that is enough that you could shop around and find a deal, but a deal is still going to be prohibitive for most people.

    At 20 TB I would start thinking about one of two things: Tape, and/or git-annex.

    Unless prices have changed since I last looked and the scales tipped, tape has the advantage of being cheap. Of course, you will need to test your tapes occasionally and likely want 2 copies just in case, but, at that point you are invested in tape, may as well.

    The other possibility is git-annex and lots of drives, but you can mix types. That way you can keep a catalog of your library and information on where it all is, and how many copies of each thing you have.

    Of course, any way you slice it, each physical piece of media is something that can fail so you have to occasionally test to ensure redundancy.

  • Don't hoard (Score:5, Insightful)

    by rainer_d ( 115765 ) on Wednesday March 12, 2014 @09:43AM (#46463019) Homepage
    Were those 20T of original movies and music or just stuff he downloaded via BitTorrent?

    He could have always bought a sufficiently large tape-library from ebay - but I guess the data wasn't worth that much.
    That's always the first pair of questions to ask: how much is it worth and how much would it cost to recreate?
    If the answer is somewhere between "I don't know" and "Well, it's not that much", then he just should stop hoarding that much stuff.

    He could have built a filer with ZFS and sent daily snapshots to a 2nd filer - but that wouldn't have helped him if the house burnt down...

  • My solution (Score:5, Funny)

    by StripedCow ( 776465 ) on Wednesday March 12, 2014 @09:45AM (#46463049)

    Figure out the theory of everything.
    Then you can always recompute your data from scratch.

  • As you noted, Blu-ray holds a lot of data, but would take some time. Since it's audio/video media, odds are most of it is pretty stagnant. I'd do an initial rsync job to write out to Blu-ray... then once a month or so repeat the job, and rsync will only get what's changed. Depending on the media type and age, you could also look at dedup'ing it, and if the dedup'd copy is significantly smaller than the source you might be able to put that onto, say, one or two 3-4TB drives.
  • You could always just call up the NSA and ask them to restore the data. Odds are good they have a copy of it...
  • by Cmdr-Absurd ( 780125 ) on Wednesday March 12, 2014 @09:51AM (#46463119)
    Same as always.
  • by Ktistec Machine ( 159201 ) on Wednesday March 12, 2014 @09:51AM (#46463125)

    Whenever you buy storage, you should buy the necessary backup capacity at the same time. You should never buy storage without buying backup capacity. Budget for it right from the start. If you can't afford the backup, you can't afford the storage. This may mean getting half as much storage as you'd like, but that's just the way it has to be. You probably wouldn't buy a car without an engine. It wouldn't do its job. So don't buy storage without backup. If you do, you have a storage system that can't do its job.

  • Hilarious (Score:5, Insightful)

    by jeffmeden ( 135043 ) on Wednesday March 12, 2014 @09:56AM (#46463189) Homepage Journal

    It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.

    Let's start from the top: You *can* plug in an external drive; it's called a complete hardware duplicate of your array (or perhaps, for space/cost considerations, a single disk-based copy held offline and synced regularly). Not hard and not terribly expensive (I would go with this solution personally). Cloud? Yep, the bandwidth and storage even on something like Amazon Glacier would be prohibitive to all but the most financially independent geeks. Blu-ray doesn't hold enough (even at 50GB/disc you need 400 of them, groan). So, tapes? You bet your ass tapes are designed to do exactly this task; why do you think they are still in use? You can get individual tapes at 1/1.5TB, but for a one-man operation they are probably going to cost you more than the first solution (offline spinning disks) and they are a pain to manage properly.

    Now what is this doing on ask slashdot? A pencil, some scratch paper, and 15 minutes between amazon.com and newegg.com would tell you the prices of every solution. Oh, right, they need a chance to tee up some targeted ads for Carbonite, Mozy, Crashplan, etc.

  • by jones_supa ( 887896 ) on Wednesday March 12, 2014 @09:57AM (#46463197)

    How about backing up only the crown jewels of the collection?

    Make a directory like /entertainment/premium and put the best stuff there, with a 4 TB limit. Rotate two external 4 TB HDDs and copy the stuff over periodically. Put a little sticker or some other mark on the newest, so you remember which one it is. If your main RAID array fails, build a new one, and restore the premium stuff from the most recent one of the two external disks.

  • Amazon Glacier (Score:3, Interesting)

    by uiucgrad ( 325611 ) on Wednesday March 12, 2014 @09:59AM (#46463231) Homepage

    I use Glacier and it's great. 20 TB is about $200 a month, which to me does not seem like all that much money for backing up that much data. The biggest problem from a home user's perspective is getting all of that data to Amazon. Hopefully he lives somewhere where fiber is available to his house.
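    Taking the $200 a month figure at face value, a short Python sketch shows the per-GB rate it implies and what it adds up to over a year. Only storage is modeled here; retrieval and transfer charges are left out, which is an assumption of this sketch, and the function names are made up for the example.

```python
def implied_rate_cents(size_gb: int, dollars_per_month: float) -> float:
    """Per-GB monthly rate in cents implied by a flat monthly bill."""
    return dollars_per_month * 100 / size_gb


def monthly_storage_cost(size_gb: int, cents_per_gb_month: float) -> float:
    """Monthly storage bill in dollars for a flat per-GB rate in cents."""
    return size_gb * cents_per_gb_month / 100


size_gb = 20_000                              # 20 TB in decimal GB
rate = implied_rate_cents(size_gb, 200)       # rate implied by $200/month
monthly = monthly_storage_cost(size_gb, rate)
yearly = monthly * 12

print(f"implied rate: {rate:.1f} cent per GB per month")
print(f"storage only: ${monthly:.0f}/month, ${yearly:.0f}/year")
```

    Whether that recurring bill beats a one-time purchase of disks or tapes depends on how many years the backup has to be kept.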

  • by Coeurderoy ( 717228 ) on Wednesday March 12, 2014 @09:59AM (#46463239)

    Connect a raspberry pi and configure it as a backup server and let it copy all to /dev/null...
    Then put aside the money you would have invested in a "better" solution, put it in a safe bank (under your mattress)
    and wait until you need to restore something..
    Most probably you'll enjoy the money more ...

  • by obarthelemy ( 160321 ) on Wednesday March 12, 2014 @10:06AM (#46463321)

    1- if you need to backup 20 TB today, you need to budget for 40TB in the medium term.
    2- a backup is off-line, off-site, tested, and multiple. The "multiple" part is pricey, and the other 3 you can get cheapest with a PC filled with HDs. Or two (I'm making do with one). $200 for the BC, $150 per 4TB HD x 5 = $950. Hide that backup in a place safe from theft, floods, fire...

  • by Lawrence_Bird ( 67278 ) on Wednesday March 12, 2014 @10:10AM (#46463385) Homepage

    There are many > 1TB tape back up systems, many with very high speeds, assuming you can feed it data fast enough.

    I have to wonder though.. 20TB for a single person? I'm not gonna do the math but that sounds like so much stuff to be impossible to listen/watch all of it.

    But at least he has proven once again, RAID is not a backup. RAID will merrily do whatever you wish, including copying drive corruption.

  • by cyn1c77 ( 928549 ) on Wednesday March 12, 2014 @10:24AM (#46463571)

    What does it mean that he didn't have "a good way to backup that much data, so he never took one"?

    The concepts behind backing up data have not changed. You need to manage the size of your data to redundantly fit into the storage of your system. So either pony up the cash and time to properly store your files, stop collecting TBs of crap, or stop complaining about losing it when your system crashes.

    It's frustrating to see people continuously complaining about how they have too much data to back up cheaply and conveniently. It's even more frustrating to see them complaining about losing all of their data because they didn't back it up properly.

    I think that the main issue is that most people do not realistically or conservatively plan their actual storage capability. For example, it seems like 90% of computer users believe that having 4 TB of hard drive space means that they can safely store 4 TB of data.

    After a conversation about scratch space, redundant drives, and timestamped backups, they then will grudgingly agree to allocate 25% of their available storage to RAID/Backup space, which obviously does not get the job done! Very few are willing to accept using 66% of their available hard drive space for RAID and Backups, which is really the minimum metric for any sort of storage longevity.

    20 TB is an awkward amount of data for a non-corporate individual to be storing. It's more data than most people actually need for their media and it is getting into a very expensive price range to backup for basic music/movie content. (By expensive, I mean that it would be cheaper to just re-purchase the media rather than back it up.)

  • by advid.net ( 595837 ) <slashdotNO@SPAMadvid.net> on Wednesday March 12, 2014 @10:25AM (#46463599) Journal

    To /.ers saying that 1TB+ tapes would be a good idea to do this backup, please:

    Add some references and price of such hardware and media that would suit best home usage.

  • by mr.dreadful ( 758768 ) on Wednesday March 12, 2014 @11:59AM (#46464733)
    Do you really need to back-up that much data?

    I'm just speaking generally here, there are certainly cases where someone would need to back up this much data, but for your home media library? If we're talking movies, 20 TB is roughly 20,000 movies (for sake of argument, I'm not considering music). At what point is this just digital hoarding? I used to keep a large collection of movies, mostly pirated, and eventually realized that:

    a) I was spending more time and money managing the collection than I wanted to. b) That I rarely watched many of the items in my library. c) That I was placing myself in legal jeopardy by storing so many illegal copies. d) Anything I did want to re-watch I could get from Netflix, the public library, or download.

    Music would be slightly different, as I could see where music is in some kind of constant rotation, but again, how much of it are you actively using? I'm just playing devil's advocate here, but I think this kind of collecting/hoarding is a byproduct of pre-internet scarcity.
  • by anewsome ( 58 ) on Wednesday March 12, 2014 @01:37PM (#46465829)

    I really did get a kick out of some of these responses. I sell data protection products for a living and 20TB is what I would consider an average small/medium customer. Every business these days has tens of terabytes of data. Of course they all need to backup their data, so there is nothing novel here. We have plenty of customers backing up hundreds of petabytes of data. Every dataset just needs a plan for backup, pretty simple.

    The way I see it, this guy has a few options. One option is to just get more disk and make a redundant copy. This would have saved him in this case of the mistakenly erased RAID, depending on how smart his sync script is. But a redundant copy is not a genuine backup plan. So many types of failures will show the holes of the dumb redundant copy.

    The other option for a home user who's not looking to spend a bunch of money, is LTO6. They hold a sufficiently large amount of data, so only a handful of tapes will be needed. LTO6 drives are cheap enough, they won't break the bank. Since the data is on tape, you can shuttle the tapes to an off site location. Seems pretty simple.

  • by cbeaudry ( 706335 ) on Wednesday March 12, 2014 @08:13PM (#46469777)

    http://www.flexraid.com/ [flexraid.com]
    http://lime-technology.com/ [lime-technology.com] (UnRaid)

    Best solution for big media collections.

    All data is stored separately on each drive, and 1 separate parity drive can protect up to 21 drives (as long as it's as big or bigger than any 1 of those 21 drives).
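    To show why a single parity drive can cover many data drives, here is a minimal Python sketch of XOR parity on toy byte blocks. It illustrates the general single-parity technique only; how FlexRAID or UnRaid actually lay out their parity is not something this sketch claims to reproduce, and `xor_blocks` is a made-up helper name.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "drives" holding one equal-sized block each.
data = [b"movie-01", b"music-02", b"photo-03"]

# The parity drive stores the XOR of all data drives.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

    Because the parity block has one byte for every byte position on the largest data drive (shorter drives can be treated as padded with zeros), the parity drive has to be at least as large as the biggest drive it protects. It can only rebuild one failed drive at a time, so it complements an offline backup rather than replacing one.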
