Data Storage Hardware

Bulk Data Storage For The Common Man? 483

Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and most importantly does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage roughly costs about $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."
This discussion has been archived. No new comments can be posted.

  • by kfg ( 145172 ) on Monday July 05, 2004 @06:58PM (#9616525)
    I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."

    Although the good ones don't come cheap. I guess this is another case of "pick any two."

    KFG
  • Cheap solution (Score:4, Insightful)

    by codeguru73 ( 689454 ) on Monday July 05, 2004 @06:59PM (#9616530)
    Buy some inexpensive, high-capacity IDE drives and use a software RAID solution. What kind of budget do you have anyway?
  • age old problem... (Score:5, Insightful)

    by Lumpy ( 12016 ) on Monday July 05, 2004 @07:00PM (#9616538) Homepage
    Ahh the large amount of data that has X value versus a storage solution...

    If your data is worth $20,000.00 then a $2000.00 solution is dirt cheap.

    What is your data worth? That is where you need to start. Then take 10-30% of the data's value as a guide to how much to spend on its storage.

    If one month's data was lost forever, how much money would it cost the company? That is the actual dollar amount you should be shopping at.

    And that is how I got the company to buy a $20,000.00 1000-tape DLT jukebox.

    My data is worth over $100,000 a month and is much smaller than yours in size.

    That is where you need to start. Justify your storage costs by figuring out what the data is worth to begin with.
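The rule of thumb in the comment above (spend roughly 10-30% of the data's value on storage) can be sketched as a small calculation. This is only an illustration: the function name `storage_budget` and its parameters are made up for the example and do not come from the comment.

```python
def storage_budget(data_value, low_pct=10, high_pct=30):
    """Return the (low, high) spending range for storage.

    data_value is what losing the data would cost, in dollars.
    low_pct and high_pct are the 10-30% bounds from the rule of thumb.
    """
    if data_value < 0:
        raise ValueError("data value cannot be negative")
    return data_value * low_pct / 100, data_value * high_pct / 100


# Data worth $20,000, as in the comment's own example
low, high = storage_budget(20_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $2,000 to $6,000
```

By this measure the $2,000 solution mentioned above sits at the bottom of the range for $20,000 worth of data.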
  • 1 TB/Month (Score:2, Insightful)

    by nikonius ( 17796 ) on Monday July 05, 2004 @07:02PM (#9616551) Homepage
    It does not sound like your needs are anywhere near those of the 'common man'. You sound more like a power user to me. Sometimes you have to pay for heavy-duty storage as the cost of doing business.
  • Cheap and Big (Score:3, Insightful)

    by guamman ( 527778 ) on Monday July 05, 2004 @07:02PM (#9616556)
    Tape drives - probably the cheapest way to store large amounts of information. The only drawback is that they aren't fast. However, if your hard drives are large enough to hold the data you are currently working on and tapes are used exclusively for backup, then a speed problem shouldn't be . . . a problem.
  • Re:!RAID (Score:5, Insightful)

    by ecalkin ( 468811 ) on Monday July 05, 2004 @07:04PM (#9616563)
    because it protects against device failure, not *user* error. if you delete a file from a raid array, it's gone. that's part of what offline is all about.

    eric
  • by syberanarchy ( 683968 ) on Monday July 05, 2004 @07:07PM (#9616589) Journal
    The best idea I could give you is to just create a sister system, where you mirror all your data. Not cheap, but cheaper than getting a pro-grade solution.

    The reason you won't find such things on the cheap is because the average person with a PC doesn't even know what a GB is. He simply goes into the store, the sly salesman says "oh, what do you need it for," and then says "well 60-80 GB should be all you ever need."

    Now, contrast that to me - my friends shit when they hear I have a 250 GB drive and a 120 GB drive, as well as an extra 60 GB on a networked machine. They can't fathom ever needing that much space. I know that's probably a pittance by Slashdot standards, but it's true :(

  • by Silent8ob ( 638046 ) on Monday July 05, 2004 @07:08PM (#9616599)
    Look at what the rest of the corporate world uses for large scale storage management. It is still ruled by tape drives.

    I don't know how much an eye goes for at the moment, but if you can spring for a Super DLT drive you'll get up to 320GB (compressed) on each tape.

    It all comes down to the Quality:Cost:Time triangle.
  • by glinden ( 56181 ) * on Monday July 05, 2004 @07:15PM (#9616654) Homepage Journal
    Build yourself a cluster [computer.org] of cheap boxes with cheap IDE disks and replicate your data across them. Because the data is replicated across your cluster, no need for backups or RAID.
  • by dfghjk ( 711126 ) on Monday July 05, 2004 @07:16PM (#9616656)
    How many months at 1TB/month do you require access to online? After you are done with data can you discard it or do you need it archived? What is the cost of losing your data set at any given time? In what manner do you expect to access it (read/write mixture and sizes plus aggregate throughput and number of client connections)? The answers to these questions could cause the cost of a solution to vary by a couple of orders of magnitude.
  • by segfaultcoredump ( 226031 ) on Monday July 05, 2004 @07:17PM (#9616660)
    Let's see... hard drives are running about $0.50 per GB, DVDs are running about $0.06 per GB (100 pack, "house brand", not something I'd put my data on but this is Slashdot, and there are idiots out there who think that it is a good idea), and tapes are also running about $0.20-$0.50 per GB (for the DLT/AIT/LTO type, the ones that have enough capacity to not drive you nuts).

    So, you can put your data on 4-5 HDs, 10 tapes or 232 DVDs per month. The cost of doing so will be about $500 per month for the tapes or HDs and $50 for the DVDs (assuming your time costs $0).

    At work, we had a need to keep a few TB of data online permanently, so we purchased a few NexSAN [nexsan.com] ATABeasts. At $50,000 for 10TB of usable storage ($5/GB), they may be a bit out of your price range. The advantage is that you can hold almost a year's worth of data and it is protected by RAID5. It also makes management a lot easier, since it is very difficult to mount 42 300GB drives in a single chassis yourself (and it takes only 4U of rack space).

    On the low end, NexSAN has the ATABoy2 or ATABaby (2TB or 1TB) in the $8-$15K range. This will let you hold a month's worth of data.

    On the high end, you have EMC disk arrays (think upwards of $20/GB for the 'cheap' stuff from them).

    Overall, if you have 1TB per month, you need to either a) get a grant to fund your work, b) hire somebody to swap DVDs for you or c) seriously rethink your data generation.

    Any of the "cheap" storage methods have serious drawbacks, and the low cost ones are, well, not so low cost if $15,000 sounds like a lot of money to you.

    otherwise, good luck
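A rough sketch of the per-month media arithmetic in the comment above. The per-GB prices are the comment's 2004 figures (tape taken at the upper end of its range). The unit capacities (250 GB drives, 100 GB tapes, about 4.3 GB usable per DVD) and all names in the code are assumptions made for illustration.

```python
import math

MONTHLY_GB = 1000  # roughly 1 TB per month, from the original question

# medium -> (assumed capacity per unit in GB, quoted price per GB in dollars)
MEDIA = {
    "hard drive": (250, 0.50),
    "DLT/AIT/LTO tape": (100, 0.50),
    "DVD": (4.3, 0.06),
}


def monthly_media(gb, capacity_gb, price_per_gb):
    """Return (units needed, media cost in dollars) for one month of data."""
    units = math.ceil(gb / capacity_gb)  # a partly filled unit still counts
    cost = gb * price_per_gb
    return units, cost


for name, (capacity_gb, price_per_gb) in MEDIA.items():
    units, cost = monthly_media(MONTHLY_GB, capacity_gb, price_per_gb)
    print(f"{name}: {units} units, about ${cost:.0f} per month")
```

Under these assumptions the counts come out at 4 drives, 10 tapes and a little over 230 DVDs, close to the figures quoted above. None of this prices in the human time spent swapping media, which is the cost the original question complains about.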

  • harddrives (Score:2, Insightful)

    by Anonymous Coward on Monday July 05, 2004 @07:32PM (#9616747)
    For the cheapest backup possible, just use hard drives. Create a software RAID 5, back up to it, then power down and move the drives to someplace safe. You'll also be able to recover the drives on any machine that can boot Linux.

    Hotswap or removable drive cages can be pricey, and aren't designed for lots of swap-ins and outs, so I'd just buy new IDE or SATA cables every few backups. If you're using the same set of drives multiple times, then leave the cables connected so as not to wear out the drives' pins.

    Eventually you'll wear out the IDE connectors on the motherboard, so use one of those cheap IDE adapter cards and replace as needed. Or use a cheap motherboard.

    It's too labor intensive to be in the same realm of solutions as a nightly tape backup, but not nearly so much as CD or DVD backups. It's easy enough to do once or twice a month.

    If you're cheap, you're not after disaster recovery, you want disaster mitigation.
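Why a shelved RAID 5 set can survive one dead drive comes down to XOR parity. The sketch below is a toy model under loud assumptions: three tiny in-memory "drives" and one dedicated parity block. Real software RAID 5 works on stripes and rotates parity across all the drives; the helper name `xor_blocks` is invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "drives", each holding one small block
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second drive died on the shelf: rebuild it from the survivors
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```

Parity only covers a missing drive. A file deleted before the backup was taken is faithfully absent from every drive in the set.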
  • DVD Autoloader (Score:3, Insightful)

    by SuperJason ( 726019 ) on Monday July 05, 2004 @07:33PM (#9616751) Homepage
    Explain this to me: I can buy a 200-disc CD changer for $100, but the same thing with a burner (CD/DVD) runs thousands of dollars. Isn't there any company out there that can do it cheaper?

    Heck, I remember a Slashdot article about a guy who built one out of WOOD!

    This would be a great solution for short-term recovery storage. Just keep a stack of CDs or DVDs ready, and it will load them in and burn them all automatically.

    On a side note, it would be great for converting a 400-disc CD collection into MP3s.
  • by hattig ( 47930 ) on Monday July 05, 2004 @07:35PM (#9616761) Journal
    So spend some time and money on making sure it is safe!

    Even if you had a Blu-ray burner, that would be 20 discs you'd have to burn to back up 1TB. So that is out of the question.

    Really what I'd set up is:

    1) Local: 1TB of hard drive space on IDE RAID (mirrored). An 8-port SATA controller would do, with 8 250GB SATA drives.

    2) Gigabit Ethernet to somewhere else (got a separate garage?), or something faster if affordable

    3) A file server there with the same config for "off-site" backup. Should your PC catch fire and melt, you'll still have your data. Yeah, backing up 1TB of data over GigE will take around 15000 seconds a go, or 4 hours or so. That's okay overnight, and better than swapping 50 Blu-ray discs or tapes and then carrying them out there.
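The 15000-second estimate above can be checked with a little arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats gigabit Ethernet's nominal line rate as 125 MB/s; the function names are illustrative only.

```python
TB = 10**12                        # bytes in a decimal terabyte (assumption)
GIGE_LINE_RATE = 1_000_000_000 / 8  # nominal gigabit Ethernet, in bytes/s


def transfer_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes at a steady rate of bytes_per_second."""
    return size_bytes / bytes_per_second


def implied_throughput(size_bytes, seconds):
    """Steady rate, in bytes/s, needed to move size_bytes in the given time."""
    return size_bytes / seconds


best_case = transfer_seconds(TB, GIGE_LINE_RATE)
print(f"line rate: {best_case:.0f} s ({best_case / 3600:.1f} h)")

rate = implied_throughput(TB, 15_000)
print(f"15000 s implies about {rate / 1e6:.1f} MB/s sustained")
```

At the nominal line rate the copy would take about 8000 seconds (2.2 hours). The comment's 15000 seconds (a little over 4 hours) corresponds to roughly 67 MB/s sustained, which leaves room for protocol overhead and disks that cannot keep the wire full.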
  • by Anonymous Coward on Monday July 05, 2004 @07:38PM (#9616772)
    I know an amateur digital photographer who generates close to 1TB / month with a Canon 1D MarkII. Storage is a major problem.
  • Re:Give Up Now (Score:5, Insightful)

    by ePhil_One ( 634771 ) on Monday July 05, 2004 @07:39PM (#9616777) Journal
    Now call me crazy, but have folks completely forgotten the age-old solution, TAPE? An SDLT tape goes for about $50 and holds about 320GB, LTO holds even more, and I believe Quantum has just released the latest generation of SDLT. While it's not "cheap", an autoloader can be had for about $15,000 that can back up many TB hands-off. It might be a bit much initially, but it's the best solution long term.
  • by swb ( 14022 ) on Monday July 05, 2004 @07:45PM (#9616811)
    If your "work" (as in food, housing and income) requires this kind of storage, you should be charging the kind of money that can make the economics of such data storage actually viable. I'm assuming that some of the really high-end storage devices from EMC, Hitachi, et al could handle your data generation/replication/backup needs effortlessly.

    If that's too expensive (and it usually is), you can kludge your own system using low-end stuff from Hpaq/IBM/Dell's x86-server-oriented product lines. LTO1 drives are pretty cheap and we've found them to be very reliable over the past 3+ years, as well as offering 100 gig native per tape.

    If even that's too expensive, then I seriously think you need to re-think the economics of your work situation. If your work doesn't cover your capital costs, you're not charging enough. If the work and data are business valuable enough, cutting your storage bill to the bone by building Linux clusters crammed with IDE HDDs is just a bad business decision.

    If this is just your hobby-type work, then you need a cheaper hobby, like heroin addiction or something affordable. Physical space and electricity aren't cheap enough in a metropolitan area to burn through 1TB of storage per month, let alone reliable data storage.
  • by swb ( 14022 ) on Monday July 05, 2004 @07:50PM (#9616846)
    It's a great idea, but one of the problems is what happens when your data goes bad before you realize it and it gets replicated. Then you want what you had yesterday, and that means tape.

    You can solve this by ensuring some kind of in-process backup (like a SQL maintenance schedule, where it replicates itself), but then you're loading your replication process with a bunch of data that doesn't really need to be online, it needs to be in a vault someplace.

    Besides, Sarbanes-Oxley and the IRS want you to keep backups 5+ years anyway, so this replication-only model is only good for data whose internal integrity isn't meaningful to anyone but the owner.
  • by Anonymous Coward on Monday July 05, 2004 @07:59PM (#9616902)
    USB2 is *not* faster than Firewire 400
    USB2 is *burstable* up to 480 Mb/s transfer
    Firewire can *sustain* 400 Mb/s transfer
    In almost all cases, you'll find Firewire much faster.
  • by Anonymous Coward on Monday July 05, 2004 @08:26PM (#9617070)
    I can't imagine his data sets being non-visual data (unless he's CERN doing particle physics). In that case, why doesn't he use some compression that's tuned to the human visual system? What the compression algorithm throws out he wouldn't have seen anyway. There are a bunch of wavelet-based algorithms (much better than blocky MPEG-2, which needs interframe compression anyway) which are extremely good; he should see 30:1 ratios without seeing any "loss" or artifacts.

    This is what the studios are moving to. A typical feature film scanned at 2K takes 1.5 terabytes uncompressed. These files then tie up multi-million dollar telecine machines, so they need to move them off fast. (Imagine backing up 1.5 TB onto DLT every day). With scanning going to 4K this problem will get even worse.
  • by swordgeek ( 112599 ) on Monday July 05, 2004 @08:55PM (#9617230) Journal
    Let me get this straight: You have a four-node cluster, you have 1.6TB of online storage, and you need some sort of permanence; and you're not using RAID of any form?

    This is utter insanity! Without RAID, your only hope of safety is in your backups--which you're only asking about now!

    RAID your data ASAP, and then start looking for backup systems. Take a look at some of the DLT4000 replacements.
  • by Propaganda13 ( 312548 ) on Monday July 05, 2004 @09:15PM (#9617357)
    Not enough feedback or information!

    OK, 1TB/month that doesn't say much.
    Always look at different levels of case scenarios and work from there. I usually start with loss of building by fire and work down through limited hardware failure or data corruption.

    There are several factors that determine how often you should backup. Here's just a couple of questions to answer.

    How much is the data worth?
    How much is your time worth if you lost a day or a week of processing time?
    Is your work time dependent? (deadlines)
    If you lost the data, would you lose it completely, or just lose the processing/analyzing time on data that you can get from your clients again?

    How long do you have to store the data, and have it retrievable? One month compared to several years really changes your options.

    How financially responsible are you for the data?

    Multiple backups (daily, weekly, monthly) (full and incremental) in multiple locations are key to successful backups.
    RAID is for redundancy or performance, not backups.
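One way to picture the daily/weekly/monthly, full/incremental mix described above is a small scheduling rule. The policy here (monthly full on the 1st, weekly full on Sundays, incrementals on every other day) is an assumption chosen for illustration, not something the comment prescribes, and `backup_kind` is a made-up name.

```python
import datetime


def backup_kind(day):
    """Pick a backup type for a calendar date under the assumed policy."""
    if day.day == 1:
        return "monthly full"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "weekly full"
    return "daily incremental"


# The first week of July 2004, when this thread was posted
start = datetime.date(2004, 7, 1)
for offset in range(7):
    day = start + datetime.timedelta(days=offset)
    print(day.isoformat(), backup_kind(day))
```

A rule like this only decides what to run. Where each copy ends up (on site, off site, in a vault) is a separate decision, and per the comment it matters just as much.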

  • Re:!RAID (Score:2, Insightful)

    by MasTRE ( 588396 ) on Monday July 05, 2004 @09:18PM (#9617377)
    > because it protects against device failure, not *user* error. if you delete a file from a raid array, it's gone. that's part of what offline is all about.

    You can add to that getting hacked. They can't hack your off-line data.
  • Xraid (Score:3, Insightful)

    by Phrack ( 9361 ) on Monday July 05, 2004 @09:42PM (#9617500)
    I haven't tried it myself, but Apple's Xraid appears to be gaining in popularity as a reasonably priced bulk data storage solution. It reportedly works with Linux, Windows, Netware and, of course, Macs.

    If that doesn't suit ya, and it's bulk storage without necessarily speed you're looking for, check into the ATABoy line from Nexsan.
  • by wik ( 10258 ) on Monday July 05, 2004 @11:38PM (#9618149) Homepage Journal
    Pray tell, what are you going to do with just a network? Store bits on the wire and network card rx/tx buffers? There's gotta be something big on the other end of the cable, dude.
  • Re:Cheap solution (Score:4, Insightful)

    by Wudbaer ( 48473 ) on Tuesday July 06, 2004 @06:08AM (#9619698) Homepage
    Repeat after me:

    RAID is not backup !
    RAID is not backup !
    RAID is not backup !
    [..]

  • by Anonymous Coward on Tuesday July 06, 2004 @06:48AM (#9619848)
    Ok, a few questions:

    1) How much is the data worth per day, week, month, year? Your final solution should reflect these data points.

    2) How quickly do you need to have access to it? Quicker means more money; longer lowers the price, but adds complexity.

    3) How stable does the data need to be? Is year-old data worth the same as current data? What about 2, 3, 4 years later? Do you need to get that data back?

    4) How much physical room is available for the backup systems and offsite storage? Is it climate controlled, yet convenient? Is it in a different state to avoid disasters?

    5) How secure does the data need to be? Is this your customers' credit data, where a leak means federal fines, or would a leak just be inconvenient?

    A storage engineer would use your answers to help design a total solution. If your data isn't worth very much, then you've also shown that by this study. OTOH, if it is worth millions, don't expect to "get by" with a $20k answer from /. readers.

    I work where there are daily penalties of $400k if we make a mistake or our systems go down. Other systems will cost $5M / hr if they aren't up. What do you think the cost of our backup and recovery system is? We have data stored in multiple locations - near, a little further and on the other side of the Earth. It takes a little longer to get the data back the further away it is. I can imagine insurance and banking where the cost of data is in the $10M per second range.

    How much is your data worth?
