Data Storage Hardware

Bulk Data Storage For The Common Man? 483

Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and most importantly does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage roughly costs about $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."
This discussion has been archived. No new comments can be posted.

  • Hard disks (Score:5, Informative)

    by ConsumedByTV ( 243497 ) on Monday July 05, 2004 @06:56PM (#9616508) Homepage
    You're always going to get a better rate with Hard drives but you're going to be prone to failure.

    If you buy them in bulk you can save.

    Burning DVDs is going to take you forever and drive you nuts.

    Find a hotswappable set of drives and use that for your offline backups. Use a raid for your current backups.
  • !RAID (Score:1, Informative)

    by therandthem ( 602983 ) on Monday July 05, 2004 @06:57PM (#9616515)
    On the subject of RAID please remember, if it's spinning, it ain't a backup!
  • Rev Drives (Score:1, Informative)

    by jtwronski ( 465067 ) on Monday July 05, 2004 @06:59PM (#9616531)
    Iomega has a somewhat new backup solution out called a Rev drive. It's quite a bit like a hard drive, but removable. Mine holds 90GB compressed, and the transfer speed isn't all that bad. I haven't had the opportunity to test it on Linux, so I can't say it will work with your setup. The drive is under 300USD, and cartridges are about 50USD each.
  • by Apparition-X ( 617975 ) on Monday July 05, 2004 @07:01PM (#9616548)
    Look for an LTO gen 1 or SDLT220/320 on ebay, with a SCSI connection (some of them are fibre, and I assume you don't want to go there!). Don't forget to pick up some tapes. In general, this sounds like it would work if you plan on doing this for a while, and can leverage the initial investment over months or years.

    Capacities are (for the cost of a sub $50 tape):
    - LTO1: 100 GB uncompressed
    - LTO2: 200 GB uncompressed
    - SDLT220: 110 GB uncompressed
    - SDLT320: 160 GB uncompressed

    If your data is particularly amenable to compression (e.g. database data) you could easily get 3 or 4 to 1 compression with these drives without sinking your CPU utilization.
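
    As a rough check on how much swapping those capacities imply, here is a minimal sketch that counts whole tapes for a month of data. The native capacities are the ones listed above; treating 1 TB as 1000 GB and the 3:1 ratio in the demo are assumptions, and `tapes_needed` is a hypothetical helper name.

```python
import math

# Native (uncompressed) capacities in GB, as quoted in the comment above.
NATIVE_GB = {"LTO1": 100, "LTO2": 200, "SDLT220": 110, "SDLT320": 160}


def tapes_needed(data_gb: float, native_gb: float, ratio: float = 1.0) -> int:
    """Whole tapes needed to hold data_gb at the given compression ratio."""
    effective_gb = native_gb * ratio
    return math.ceil(data_gb / effective_gb)


if __name__ == "__main__":
    monthly_gb = 1000  # assumption: 1 TB treated as 1000 GB
    for name, cap in NATIVE_GB.items():
        # Columns: format, tapes with no compression, tapes at an assumed 3:1
        print(name, tapes_needed(monthly_gb, cap), tapes_needed(monthly_gb, cap, 3.0))
```

    The real ratio depends entirely on the data, so it is worth measuring on a sample before sizing a tape order.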
  • Re:Wirewire drives? (Score:4, Informative)

    by Rik van Riel ( 4968 ) on Monday July 05, 2004 @07:02PM (#9616557) Homepage
    For long term storage, how do you feel about firewire drives? Maybe not as cheap as you'd like,

    Oh, but they are cheap. Just buy a large IDE disk and a $30 firewire/fast-usb enclosure.

    I'm just not sure about the "long term", though. I have no idea what the shelf life of a hard disk is.

  • compression (Score:3, Informative)

    by Suppafly ( 179830 ) <slashdot@sup p a f l y .net> on Monday July 05, 2004 @07:05PM (#9616575)
    First off, if you aren't already compressing that data, start. You may be able to cut the size down dramatically using compression.

    Then back up using tapes just like every other place that has to do backups. Generally do full backups once a week and incremental ones nightly, or whatever is necessary based on the data you are working with.
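
    The full-plus-incremental idea can be sketched in a few lines: an incremental pass only picks up files modified since the previous backup. This is an illustration under stated assumptions, not what any particular backup tool does; `changed_since` is a hypothetical helper, and it relies on file modification times being trustworthy.

```python
import os
from typing import List


def changed_since(root: str, since_epoch: float) -> List[str]:
    """Return paths under root modified after since_epoch (seconds since 1970)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file belongs in the incremental set if it changed after the last run.
            if os.path.getmtime(path) > since_epoch:
                changed.append(path)
    return sorted(changed)
```

    Passing `since_epoch=0` selects everything, which corresponds to the weekly full backup; passing the timestamp of the last run gives the nightly incremental set.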
  • by Anonymous Coward on Monday July 05, 2004 @07:06PM (#9616581)
    All forms of media/backups have their own drawbacks... but some aren't as bad as others, and the others often are more accessible.

    Tape: Tapes break, they wear, they have dropouts, take a while to back everything up, can't always access files if you just want to restore something (Different methods vary, folks)... but ultimately, it's cheap when you use DAT because they're a common media. Swap the tapes twice as often (and throw old ones out) if you're paranoid about tape related failures.

    Hard Drive: Most common form of backup I see now, mainly for the 1:1 size factor. Yeah, drives fail, too. Sometimes you have a pretty good warning when this is going to happen, sometimes you don't. (My 13GB Maxtor and 40GB IBM Deathstar drives both went *pfft* on reboot.) Get enough of them at once, you could swap out the logic boards if one does fry out. Ultimately, RAID or just simple 1:1 mirroring is probably the most efficient and easy method. Accessing bits and pieces is also easiest under this method. I personally just use an external USB2 case with a 120GB drive in it. Everything I want to back up goes on that drive, and then eventually... DVDRs. I turn off the drive when I don't need it... hopefully prolonging the life of it when I need it most.

    DVDR: Not anymore. If we had these new-fangled DVDR discs (+ or -) say... when 2 to 6GB drives were common.... sure... But in addition to hard drives, recovering selective files is easy under this method too... Unless you use a backup program that crunches everything together on the disc in some spanning format. Burn times can be tedious... but it's not bad if you consider the overall amount of data you're putting on the disc. Cheaper than quality-brand name CDRs, though, in terms of price per mega/gigabyte. Only an idiot would trust $0.01-per-disc spindles for long-term backups. Even the longevity of DVDR has yet to be seen...

    CDR: I'm not going to bother.

    Network: Well, still relies on hard drives and other components... but good if you don't want to saddle one room with a ton of boxes. Simply for space and efficiency... external drive is probably better anyway.

    Old fashioned method: Print everything out and keep it in a filing cabinet somewhere. You could always OCR the stuff later. ;-)
  • by jeffgeno ( 737363 ) on Monday July 05, 2004 @07:07PM (#9616587)
    The drive will run about $4000, but the tapes are only around $0.20/GB assuming a 1.5:1 compression ratio. And keeping that assumption, each 200 GB native tape holds about 300 GB, so 1 TB of data should only take four tapes per month, and swapping wouldn't be so bad with the single tape drive. An autoloading library would be significantly more expensive, but if you really need automation, that's the way to go.
  • Re:Wirewire drives? (Score:5, Informative)

    by littlerubberfeet ( 453565 ) on Monday July 05, 2004 @07:11PM (#9616619)
    Lemme address the firewire thing: I work in a sound studio, and we generate about 5-8 gigs of data a month, mostly music for TV. This isn't a huge amount, but we rely on multiple sets of Firewire drives for backup and then internal hard drives for current projects. This means we have all 400 or so projects at our fingertips. Given how fast we do things, this is important.

    Lacie makes a 1 terabyte firewire drive (943 gigabytes formatted). We get them for $1,080 a drive (Macmall matched Provantage's price). This is more than the article author spends now per gig, but these drives have done quite well in the studio. You can find cheaper firewire though.

    We are at the point where hard drives give the best bang for the buck. The only fault of firewire is that my bosses have burned out several bridge boards, so ground yourself before unplugging the drives. The bridges were cheap though. In any case, hard drives are probably the most failsafe and cost effective solution, with firewire being the easiest interface to use those drives with.
  • by TheUncleBob ( 791234 ) on Monday July 05, 2004 @07:17PM (#9616663) Homepage
    If you are more interested in volume than speed, then the emphasis should be on the 'ID' part of RAID: Inexpensive Disks. Use 160GB drives, which appear to have the best bang for your buck at the moment, and put 6 (yes 6!) in a PC. Just use any old cheap PC (I use a 200-400MHz PII).

    Run the disks as RAID 5 and you will get about 800GB of storage for $600. Now get two cheap ATA100 cards so you have a total of 6 channels, and mount each drive as a master on its own channel. Build a 2GB root partition on the first disk (mirror it if you want) and then set the rest of the space up as a huge RAID 5 array.

    Et Voila cheap, big server. To archive data, turn off pc, and throw into attic :-)
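
    The 800GB figure follows from RAID 5 giving up one drive's worth of space to parity. A minimal sketch of that arithmetic, using the drive count, drive size and $600 price from the comment above (the function names are hypothetical, and the small root partition is ignored):

```python
def raid5_usable_gb(drives: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 5 set: one drive's worth of space goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def cost_per_gb(total_cost: float, usable_gb: float) -> float:
    """Dollars per usable gigabyte."""
    return total_cost / usable_gb


if __name__ == "__main__":
    usable = raid5_usable_gb(6, 160)
    print(usable, cost_per_gb(600, usable))
```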
  • by Servo ( 9177 ) <dstringf@noSPam.tutanota.com> on Monday July 05, 2004 @07:17PM (#9616665) Journal
    First of all, don't bother with DLT. It is slow, and increasingly unreliable as DLT is phased out of production and replacement parts are actually refurbs.
  • Re:Give Up Now (Score:5, Informative)

    by Zone-MR ( 631588 ) * <slashdot@NoSPam.zone-mr.net> on Monday July 05, 2004 @07:17PM (#9616667) Homepage
    No figures, but I think the opposite. I've had several DVD-R disks which I've written backups to only to discover that they are unreadable a year later. My personal experience has been that HD's are unreliable, but less unreliable than writable DVDs.

    Of course higher quality media might be better, but then you can no longer quote the $0.10/GB figure.
  • Re:Hard disks (Score:4, Informative)

    by littlerubberfeet ( 453565 ) on Monday July 05, 2004 @07:18PM (#9616672)
    hard disks are good.

    If you want one of those nifty things with robotic arms and whatnot, plan on spending upwards of $3500. The AIT Automated Tape Library goes for that much and holds only 15 tapes. Plan on spending tens of thousands for something like Ampex's DIS 914 for 30 Terabytes.

    Your friend is right: tapes are cheap. The equipment needed to support them is expensive, slow and error prone. It gets cost effective once you have enough money for a new Porsche though...
  • by da5idnetlimit.com ( 410908 ) on Monday July 05, 2004 @07:23PM (#9616695) Journal
    Depending on the value of your data, you should try having two nice 4*400GB SATA arrays in RAID 5, possibly using a distributed file system for redundancy...

    Not the cheapest, but fast, simple and saves you the unholy pleasure of having 2-3 DLT boxes to archive/cycle each month...

    You already have a Linux cluster, so implementing a distributed file system should be within reach, or even simply a nightly incremental mirror to the target server if you can afford losing one day of work/computation...

    It would help if you told us what sort of data you work with... anything from databases to automated telescope tracking systems needs a large amount of storage, but you won't need the same storage array for each...

    I seem to remember a /. story [slashdot.org] on a rackable Petabyte storage system [archive.org]

    You don't need to go to Petabyte capacity, but you will find some interesting comments on filesystems, disk virtualisation, 1U rack providers and so on... so a 1 Terabyte rack server is definitely possible...

    Good luck...

  • DLT is the way to go (Score:3, Informative)

    by pastpolls ( 585509 ) on Monday July 05, 2004 @07:23PM (#9616696)
    I actually use a DLT with autoloader I got off ebay for under $200. I then bought a lot of used DLT tapes (100) and use them to backup my Video and DVD projects. It is great because when I fill my offline storage (about 1TB) I just fire up the backup software and get the old DLT going overnight. It is done by morning and the shelf life for those tapes is about 20 years.
  • Ultrium (Score:3, Informative)

    by 7vEn_T_7vEn ( 794241 ) on Monday July 05, 2004 @07:24PM (#9616698)
    I'm not sure what your budget is, but if you're like me you want something that complies with standards so it will be around, and is cheap and effective. For this I would have to recommend an Ultrium tape backup drive. The drive is standards based (google it) and the tapes are dirt cheap: a 200/400 GB tape pulls up for $55. If you figure (with hardware compression) 250 GB of storage per tape, then it will cost just $.22/gigabyte. The problem is that the drive itself is listing for about $2600. Not exactly cheap, but it's guaranteed to be backwards compatible with future LTO standards and the media is as cheap as you could possibly ask for. One more thought: look into an LTO Gen 1 solution (100/200) for a cheap drive. Cost per gigabyte is roughly the same, it will just take more swapping.
  • by Doc Ruby ( 173196 ) on Monday July 05, 2004 @07:29PM (#9616729) Homepage Journal
    A DVD-R jukebox [dvdchanger.com] can give you 200 DVDs at once. That's $3600 (drive/changer) [nextag.com] + $268 (1000 DVD-Rs) [pricewatch.com], for (1000*4.7GB) 4.7TB at about $4000, or roughly $0.85/GB. That's above your HD cost, but you'd need at least another host PC, and multiple controllers for the 16HD RAID, which is probably another $1000. And another $268 buys you another 4-5 months storage, so by next April you're down to $0.14/GB; in a year you're at $0.12/GB. A shelf of 200-disc "CD" books will hold your archives, 1 book per carousel for "fast" retrieval. Backup all your DVDs offsite at $0.27/GB. As DVD-R prices fall over time, you're probably looking at something like $0.05/GB, probably less than even plummeting HD prices. And the DVDs (especially with the cheap backups) are much more reliable, especially over 10 years, than the HDs. If you are looking at 10 year archive, at $80/month in DVDs, for 29% more money you can add a second host PC/changer set, left in their boxes, in case the original PC/changers fail.
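
    The per-gigabyte arithmetic in a plan like this is easy to recheck. The sketch below divides dollars by gigabytes, first for the up-front purchase and then amortized over several months. The $3600 changer, $268 per 1000 discs and 4.7 GB per disc come from the comment above; the 1000 GB per month rate, the whole-pack purchasing rule and both function names are assumptions for illustration.

```python
import math


def upfront_cost_per_gb(total_dollars: float, discs: int, gb_per_disc: float = 4.7) -> float:
    """Dollars per GB if every purchased disc is filled."""
    return total_dollars / (discs * gb_per_disc)


def running_cost_per_gb(
    months: int,
    hardware: float = 3600.0,      # changer price quoted above
    pack_cost: float = 268.0,      # price of one 1000-disc pack quoted above
    discs_per_pack: int = 1000,
    gb_per_disc: float = 4.7,
    gb_per_month: float = 1000.0,  # assumption: 1 TB/month treated as 1000 GB
) -> float:
    """Cumulative dollars spent per GB archived after the given number of months."""
    data_gb = months * gb_per_month
    discs = math.ceil(data_gb / gb_per_disc)
    packs = math.ceil(discs / discs_per_pack)  # discs are bought in whole packs
    return (hardware + packs * pack_cost) / data_gb


if __name__ == "__main__":
    print(round(upfront_cost_per_gb(4000, 1000), 3))
    for m in (6, 12, 24, 60):
        print(m, round(running_cost_per_gb(m), 3))
```

    Because the changer is a fixed cost, the running figure keeps falling as more data is archived; how quickly it falls depends on the monthly volume assumed.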
  • by ippearx ( 794245 ) on Monday July 05, 2004 @07:29PM (#9616730)
    http://www.wiebetech.com/products/ComboDock.html [wiebetech.com]

    Makes a cheap, fast way to put lots of data onto lots of hard drives. Using one of these bad boys means no extra money is spent on drive enclosures, cases etc. You only buy raw standard hard drives. Excellent if it's only backup, and you do not need lots of access. This solution is not automated however.

    Hard drives are prone to failure. I was thinking of buying at least 2 drives of different brands to mirror, storing them in separate locations in sealed, air tight containers at just the right humidity/temperature. Also I think a disk check every 6 months or year would be necessary, and if any problems are found, replace the disk with another.
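
    One way to make that periodic disk check concrete is to record a checksum for every file when the drive is written, then recompute and compare at each check. The sketch below is only an illustration of that idea; `build_manifest` and `verify` are hypothetical helper names, and where the manifest itself gets stored (ideally not only on the same drive) is left open.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def verify(root: str, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose contents changed."""
    bad = []
    for rel, digest in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or sha256_of(path) != digest:
            bad.append(rel)
    return sorted(bad)
```

    With two mirrored drives, a file that fails verification on one can be recopied from the other before that disk is replaced.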

    One beauty with this method is you only need to pay for disk space as you need it, and hard drives may still get much bigger. I was going to buy drives at the lowest cost/megabyte which at the moment is 160GB drives.

    I would love to find more information on the physical storage of hard drives, especially how long they would be expected to last without use - months? years?

  • by Chess_the_cat ( 653159 ) on Monday July 05, 2004 @07:29PM (#9616731) Homepage
    Too bad the maximum attachment size is 10MB.
  • by millisa ( 151093 ) on Monday July 05, 2004 @07:32PM (#9616743)
    I would hope that if you are working with a TB of data, the value of that data is pretty high . . .

    Promise SX6000 = $255.95. (6) 200GB IDE drives in a Raid 5 = $624.95

    If you had a separate boot drive from the SX6000, you could just bring the system down for a couple hour maintenance once a month and slam all the drives out and put fresh ones in.

    Just keep buying new 200GB drives and shelve the old ones (or if it's *really* valuable and your home firesafe isn't enough, pay Iron Mountain or someone to keep it).

    There aren't hidden labor costs outside of those two hours it takes to setup a new array every month (DVD's are about 60 bucks a month for a TB, with a hundred or so for a drive (which *will* need to be replaced occasionally if you are burning that much) but you'll spend hours and hours just dealing with the swap outs and breaking up your data . .. )

    If you don't have to keep the TB of data after a month or three, then your price gets even cheaper after you invest in your initial hard drive media sets . . . and you can put all the drives in hot swap chassis to further minimize your time dealing with the issue.

    Of course this is all moot if your 1TB of data isn't valuable enough to invest 600 a month in . . .
  • Exabyte (Score:2, Informative)

    by HonkyLips ( 654494 ) on Monday July 05, 2004 @07:37PM (#9616767)
    Try Exabyte - for hardcore tape storage.

    http://www.exabyte.com/products/prodviews.cfm

    I think you can store about 1.6 TB on a single tape or similar, but check them out. Tape drives have come a long way from old SCSI DATs transferring 20meg a minute. And they're fully automated and although there's an outlay cost for the tape drive, over time the cost per gigabyte for storage will be lower than hard drives.
    If you have a security company do patrols of your office you can get them to take the tape offsite with them after nightly backups for added security... etc etc.
  • Bad idea (Score:5, Informative)

    by Anonymous Coward on Monday July 05, 2004 @07:47PM (#9616825)
    Google stores data for fast access, not for reliable storage. They don't care if they lose a few hundred gigs when a handful of disks die, they'll just re-spider it in a few days when the Googlebot hits the sites which were lost. Their solution is NOT optimized for reliable storage and it's not suited in the slightest to this guy's problem.
  • Re:Wirewire drives? (Score:5, Informative)

    by SlamMan ( 221834 ) on Monday July 05, 2004 @08:00PM (#9616907)
    USB makes the computer actually do work, while firewire ports handle it themselves. For a normal user, not much of an issue, but over a couple drives, you'd notice.
  • by macdaddy ( 38372 ) on Monday July 05, 2004 @08:00PM (#9616909) Homepage Journal
    I have a dual MP2400 with 4 x 120GB WD 1200JB drives. I have a single XP2800 machine with 4 x 120GB WD 1200JB drives, 2 x 200GB Maxtor 6Y200P0, and 2 x maxtor 7Y250P0 drives. I have a dual Xeon 2.8Ghz machine with 4 x 120GB Maxtor 6Y120M0 drives. That accounts for all my regularly used machines. I guess I'm not a common man. :-) Not to brag...

    I have to disagree with the sister system though. For most geeks like you and me a sister system would be fairly adequate. It would be better with an occasional off-site backup. However it really sounds like this guy's data is far too valuable to have only one copy of it and to have all copies be at one physical location. He really needs an off-site backup somewhere. Imagine for a moment if his home (I'm guessing he works from home, but this still applies to a real store-front business) was robbed. The crooks didn't know what they were taking. They saw two shiny computers in an office and figured they could hawk them on the street. There goes all his data, both copies. D'oh! So in short a sister system is a good idea but it probably won't do this guy much good. It would be a good local solution for a short term live mirror (i.e., data is archived that night but the sister machine gives you a backup for that one day's work).

  • Re:Hard disks (Score:3, Informative)

    by silas_moeckel ( 234313 ) <silas.dsminc-corp@com> on Monday July 05, 2004 @08:31PM (#9617095) Homepage
    If you're just going to plug in, back up and swap, try the USB 2.0 to IDE backup boxes. Pretty much it's a power brick and a USB to IDE chipset in a plastic case with a 40-pin IDE connector on it. You plug in the drive and you're good to go. No cases or hot swap caddies to deal with. And 5400 RPM drives don't get hot to the touch sitting on the desk. It's not pretty, but if you're just running backups, keep on buying $100 IDE disks (generally the best cost per GB).
  • Re:Give Up Now (Score:5, Informative)

    by tchuladdiass ( 174342 ) on Monday July 05, 2004 @08:53PM (#9617215) Homepage
    Come on, this is Slashdot. A tape changer doesn't have to cost that much money if it's made of Lego [boeldt.net] (shamelessly pulled from an earlier Slashdot story which I can't find at the moment).
  • by anakin357 ( 69114 ) on Monday July 05, 2004 @09:02PM (#9617267) Homepage
    An invitation has been emailed to your friend.

    Yee. Sent someone else who replied an invite too.

    Mod this up and I'll send you one too. :P
  • by Anonymous Coward on Monday July 05, 2004 @09:09PM (#9617320)
    VXA2 has a much higher storage capacity at a better price point than DLT. For example, a 10 tape VXA2 autoloader is about $2300 USD and holds 1.7 or 1.9TB compressed.
  • Re:Hard disks (Score:4, Informative)

    by eric76 ( 679787 ) on Monday July 05, 2004 @09:14PM (#9617351)
    Tapes can be pretty dependable, but you need a better quality tape system than that typically sold for PC backups. The 20 GB tapes are just not that dependable.

    If I had the money, at a minimum I'd get a tape drive that could handle the 200 GB (uncompressed) tapes. Something like IBM's LTO Gen-2 Tape Library. That should run a bit less than $6,000.

    For that matter, if I won the lottery, my first purchase would probably be a top of the line tape backup system instead of the usual new car.

    Since I can't afford it, I use DVDs and CDs for backups. They are a pain in the neck and are not that dependable, but I keep backups up to a year on DVD+RW so if one fails, hopefully the others will have the data.

    Instead of writing directly to the DVD writer, I write the backups to disk and then copy the backup sets to the DVDs.

    I also keep a complete current backup of nearly everything important on a separate computer.
  • Re:Bad idea (Score:4, Informative)

    by Anonymous Coward on Monday July 05, 2004 @09:21PM (#9617392)
    This is incorrect. GFS (Google File System) has many systems with the same data on each node. These nodes have 3 copies of each data slice. If one server fails then the other two mirrors re-copy the data. If two fail then the remaining server mirrors the data to ensure it is never lost.

    Google does not want ANY data to be lost. They have many mirrors of all data.
  • by TinyManCan ( 580322 ) on Monday July 05, 2004 @09:31PM (#9617440) Homepage
    LTO 2 [qualstar.com] drives are the current trend in large enterprise storage. LTO is the new hotness, DLT is old and busted.
  • by bheerssen ( 534014 ) <bheerssen@gmail.com> on Monday July 05, 2004 @09:31PM (#9617442)
    mods, this is not off-topic.

    KFG meant to say "You can have fast, good, or cheap. Pick two."

    It's an old software design maxim that applies surprisingly well to this subject.

  • by Bananas ( 156733 ) on Monday July 05, 2004 @09:40PM (#9617495) Homepage
    Something I've noticed in all posts, the price is a prime factor, but the original poster also seems concerned about access times (hence DVDs not being an option, due to the time it takes to retrieve data).

    For simplicity, I'm not going to go into RAID tradeoffs, etc. and just stick with "striped data", which gives you maximum bang for the buck. You should draw up a simple spreadsheet with the following headings:

    1. Media Type: Enter the type of media, its manufacturer, etc. here. Example: HDD, Hitachi, Model XYZ, 320 GB.
    2. Size: How much raw data the medium will store. All figures should be in GB.
    3. Speed: The expected data transfer rate of a single unit of storage. IDE drives vary and can range from 5 MB/sec to 30 MB/sec; tape speeds also vary. All figures in MB/sec.
    4. Watts per Unit: How much power does each media unit draw? Tape drives will be difficult here, but HDD units are typically around 20-30 watts. Go conservative and plan on allocations of 30 watts for HDD units.
    5. Cost per Unit: How much does it cost for 1 HDD/Tape/whatever?
    6. Cost per GB: [Cost per Unit] divided by [Size].
    7. Units Needed: Given a target of 1024 GB (1 Terabyte), how many units of storage are needed to reach that size, assuming data striping and no RAID-5? Formula is '1024' divided by [Size], then round any fraction up to the next whole number.
    8. Expected Size: Take [Units Needed] times [Size]. If you have 4 units of 320, you'll end up with 1280 GB (sans redundancy).
    9. Total Cost: Cost of the non-redundant array. [Cost per Unit] times [Units Needed].
    10. Aggregate Speed: Assuming a 1:1 ratio of controllers to units, what kind of speed can we expect? [Units Needed] times [Speed]. All figures in MB/sec. Note: a huge array of 1+ TB can be made unusable if you can only process 10 megabytes a second.
    11. Power Consumed: [Units Needed] times [Watts per Unit]. Important: your power supply should be rated at about 120% of this figure to make everything work reliably. Also, if you're going above 400-500 watts, then plan on some additional cooling, since there will be an increase in BTUs.

    It's not exactly a great spreadsheet layout, but it should be enough to enter everything in and start seeing what is practical and what isn't. I'm sure that someone else would be able to enhance this a little further - any takers?
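
    For anyone who would rather script it than build a spreadsheet, the columns above translate fairly directly into code. This is a minimal sketch: the `Media` fields mirror the headings, cost per GB is computed as unit cost divided by size, and the $200 price and 30 MB/sec speed in the example row are made-up placeholders rather than quotes for any real drive.

```python
import math
from dataclasses import dataclass


@dataclass
class Media:
    name: str          # Media Type
    size_gb: float     # Size
    speed_mb_s: float  # Speed of a single unit
    watts: float       # Watts per Unit
    cost: float        # Cost per Unit, in dollars


def plan(media: Media, target_gb: float = 1024) -> dict:
    """Derive the computed columns for a striped, non-redundant array."""
    units = math.ceil(target_gb / media.size_gb)  # round up to whole units
    power = units * media.watts
    return {
        "cost_per_gb": media.cost / media.size_gb,
        "units_needed": units,
        "expected_size_gb": units * media.size_gb,
        "total_cost": units * media.cost,
        "aggregate_speed_mb_s": units * media.speed_mb_s,
        "power_watts": power,
        "psu_watts": power * 1.2,  # the 120% headroom suggested above
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    example = Media("HDD, Hitachi, Model XYZ", size_gb=320, speed_mb_s=30, watts=30, cost=200)
    for key, value in plan(example).items():
        print(f"{key}: {value}")
```

    Running `plan` over a list of candidate media and sorting by `total_cost` or `aggregate_speed_mb_s` gives the side-by-side comparison the spreadsheet is meant to provide.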

    By the way, you really should think about RAID-5 at the very least. All it would take is just one drive to hose your data completely. Besides, as the array grows in size, the price tradeoff becomes smaller and smaller, to the point where it's really not worth your time to stripe all of your data without redundancy. I believe that the md drivers in linux support up to 32 devices per RAID set. That takes your overhead from 1/5 of your array (in a 5-drive setup) down to 1/32 of your array.
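
    The reason a parity array survives a single dead drive is plain XOR: the parity block is the XOR of the data blocks, so any one missing block can be recomputed from the rest. The sketch below is a toy illustration of that idea and of the overhead fractions mentioned above; it is not how the Linux md driver is implemented (real RAID 5 also rotates parity across the drives), and the function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(drives: int) -> float:
    """Fraction of raw capacity spent on parity in a single-parity set."""
    return 1 / drives


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]   # one stripe across three data drives
    parity = xor_blocks(data)            # what the parity drive would hold
    # Pretend the second drive died: rebuild its block from the survivors.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
    print(parity_overhead(5), parity_overhead(32))
```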

    A SAN-style setup lends itself well to this, but the price is very prohibitive to "the common man", as it requires very expensive hardware. You can emulate something like this via GFS support in Linux, which (theoretically) would allow you to aggregate your data.

    If there is a requirement to keep the data online at all times, you'll need to spend more on some PC cases, as well as some networking to string the units together. Pick a reasonably-priced case that will house all the media units, have adequate power (at least 250 Watts, 300+ would be ideal) and keep them cool. Use a motherboard that is reliable, and can adapt to several different clock speeds for a given CPU; you'll want something that can be thrown out for less than $99.00 if it should go bonkers on you, but if the CPU burns up, you should be able to still get parts off the shelf and get the motherboard running again. Stick with the "commodity" or low-end CPUs, as (a) they tend to be cheaper, and (b) having been through a complete lifecycle, any bugs or issues with the CPUs will be well-known by now. Don't worry about the speed of the board or CPU at this point, as most "modern"

  • Mmmm, hard disks (Score:1, Informative)

    by Anonymous Coward on Monday July 05, 2004 @09:53PM (#9617561)
    Well, not that this is a solution, unless you just have lots of money to blow, but I just put some of this stuff in at work.

    I've got an EMC Clariion CX200 fibered to the servers, which means it's a SAN. It's 3 TB. I think it was $40k. Then I've got a Win2k3 Appliance Server fibered to the EMC Clariion CX300 with 12 TB. It shares out through various protocols (nfs, smb, ftp etc.), so it's my NAS. The CX300 I think was about 100k. The SAN holds the online data, the NAS holds the archived data. Oh, and the NAS head is fibered to a Dell PowerVault 12TB LTO2 tape jukebox, which backs up the NAS. It was pretty cheap, I think it was 18k or so.

    Oh yes, and just for the Slashdot crowd: yes, this equipment will be holding fMRI studies, as well as MRA, CT, DR, CR, CTA, US and a variety of other modalities. Being a PACS engineer is fun, although I do make less than a teacher. Fucking Economy.

  • by Anonymous Coward on Monday July 05, 2004 @10:20PM (#9617684)
    LTO1: 100GB/200GB (native/compressed)
    LTO2: 200GB/400GB
    SAIT1: 500GB/1300GB
  • by rhizome ( 115711 ) on Monday July 05, 2004 @10:27PM (#9617711) Homepage Journal
    > I'd love to see a Firewire hub that could act as a hardware RAID controller.

    Firewire drives can be daisychained, and in fact OS X allows you to set up software RAID on multiple firewire drives attached to the system. You can't move them to another system and get access, but that's about the only limitation that I've found and it's more than decent for local high-density storage..
  • by Anonymous Coward on Monday July 05, 2004 @10:31PM (#9617735)
    >>"Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and most importantly does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage roughly costs about $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."

    First of all, is the 1TB of data that you collect every month mostly different or mostly the same as the data you collected in the past month?

    Secondly, how compressible is the data you are collecting?

    Thirdly, how much random access do you need to the data, or is a serial stream of the data good enough?

    --

    If the data you have is mostly the same from month to month then you only have to preserve the difference between the two months.

    If the data is highly compressible then you can use bzip2 -9 to make the data much smaller, therefore needing a lot less of the media than otherwise.

    A 1TB file that compresses 50 to 1 will only be 20 GB. This will easily fit on 5 DVDs.

    If you collect 1TB of data and diffing it with the previous month's data outputs only 100 GB of differences, and that compresses down 22 to 1, then you can fit it on a single DVD.

    rsync is also good for copying the data on a system to a remote system.
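
    To see where a given data set lands, it helps to measure the compression ratio on a sample and then work out the disc count. The sketch below uses Python's standard `bz2` module at level 9 as a stand-in for `bzip2 -9`; the 4.7 GB disc size is the usual single-layer DVD figure, and `measured_ratio` and `discs_needed` are hypothetical helper names.

```python
import bz2
import math

DVD_GB = 4.7  # single-layer DVD capacity, in decimal GB


def measured_ratio(sample: bytes) -> float:
    """Compression ratio (original size / compressed size) for a data sample."""
    return len(sample) / len(bz2.compress(sample, 9))


def discs_needed(raw_gb: float, ratio: float, disc_gb: float = DVD_GB) -> int:
    """Whole discs needed for raw_gb of data after compressing at the given ratio."""
    return math.ceil(raw_gb / ratio / disc_gb)


if __name__ == "__main__":
    sample = b"timestamp,sensor,value\n" * 10000  # made-up, highly repetitive sample
    print(round(measured_ratio(sample), 1))
    print(discs_needed(1000, 50))
```

    A small sample may not be representative of a full month of data, so the measured ratio is best treated as a rough guide.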
  • more about the app. (Score:1, Informative)

    by Anonymous Coward on Monday July 05, 2004 @10:49PM (#9617832)
    I wish the poster had told us more about the application. Is the data from sensors (someone mentioned fMRI), or is it from some sort of simulation modeling?

    This might sound off topic, but I have used different compression schemes (like wavelets) for sensor data, and regenerative techniques with statistical summaries that gave me about 2000:1 equivalent compression. That still does not address the 1TB/month backup problem, but might help reduce the problem somewhat.

    As for backup: in the past I have chosen to go with a modified cluster solution, where I set up a data server in another building and automatically backed up the new/unique data to it. The reason I chose slow/large HDs instead of tapes is my experience with using tapes over the course of 10 years. I could not even get replacement tapes if I needed them, not to mention the actual tape drives...

    How long do you realistically need to keep the data around for? Can you recycle the media (say after 1 to 2 years)? Does it need to be truly permanent? ...
  • by TBone ( 5692 ) on Monday July 05, 2004 @10:50PM (#9617842) Homepage

    I looked through some of the answers here, and as near as I can tell, you've got a bunch of home hobbyists telling you how to back up your home computers. Perhaps all your needs entail is a computer with an external IDE drive array and 4-10 200G SATA drives in it. But from your initial post, it's not clear what you need your offline storage _for_.

    First of all, you mention that you generate and use 1 TB of data a month. What happens at the end of that month? Does all of the data become useless? Is some of it carried through? Is it useful for historical processing for some time after it's not "live" any more? The disposition of that offline data is important; you can't determine how you can most effectively back up your data until you know what you need to do with that data once it's backed up.

    Since no one cares about backing up old data that they never use any more, I'm going to assume you need this data in some form in the future. I'm also assuming that your data ages out completely every month.

    Realistically, you have two options: Large redundant disk arrays, or tape. Various factors give credence to one or the other.

    First of all, get off of the SATA hacks, and realize you're going to need to go to SCSI, whether you end up with disk or tape. You're backing up data, you're going to want it to be reliably written out, and SCSI is the de facto standard for backup architecture. Yes, you pay more for it, but there's a reason for it: the SCSI equipment I manage at work fails a fraction as often as the various IDE/ATA systems do. While SATA is marketed as a consumer technology, it will never meet the rigors of being a reliable backup methodology.

    • Media Cost: Tape wins over disk here. LTO tape is running, at a quick check, for about $75 retail for 200/100G tapes. Even assuming only reasonable compression, you're looking at 150G for $75. And that is single-cart pricing; tape pricing quickly drops if you're ordering in bulk (typically in packs of 10, then at the 3-packs level, then more, check with your preferred media vendor).
    • Hardware Cost: Disk wins, but it's a double-edged sword - every disk you own has electrical and mechanical failure chances. The more disks you have, the more likely you are to lose one of them. The more you're storing on disk, the more you open yourself to a catastrophic failure of those disks themselves. High-end fast tape drives and libraries are expensive, but they just _work_. You plug them in, load your preferred tape management software (hell, run mtx for that matter), and start backing stuff up. No formatting, setting up arrays, hot-swap schedules, anything like that. But you pay through the nose for it - expect to spend into the $10K range for a large-scale tape storage solution that you could match (in short-term storage duration) for a couple of thousand dollars with a disk-based solution.
    • Hosting Space: Try to store 10TB of disk, and you'll need an air conditioner in that room just to cool down the disk cabinet and controllers. 10TB of tape just sits there though; you can store 4TB of tape online in a small 3U (about 6 inches) tape library - that's 24 tapes, and such libraries typically also support two drives. Go to 5-6U, and you can get 4 drives and over 50 tapes. If those were 200GB LTO tapes, you'd be looking at up to 10TB of storage available online, or easily offline and off-siteable. In addition, tape is easily expandable. Need more storage space? Buy another tape. No new hardware needed, no power concerns, just drop it in the drive or library and go.
    • Speed: Disk definitely has an edge. Set up a decent SCSI RAID5 array (real hardware raid across multiple disks on separate physical controllers, not this playtime software 0+1 homebrew IDE raid crap) and watch your write speeds triple. If you need to back up that 1 TB overnight, you don't have much of a choice but to go to disk in some form. But again, you pay a price for it. The speed you save in the
  • Re:Hard disks (Score:3, Informative)

    by eric76 ( 679787 ) on Monday July 05, 2004 @11:28PM (#9618086)
    Use RAID to increase your on-line availability.

    RAID does not a backup system make. You still need backups.

    For increased on-line availability, how about a good distributed file system with several servers? And, of course, back everything up anyway.
  • by wik ( 10258 ) on Monday July 05, 2004 @11:47PM (#9618193) Homepage Journal
    There's no reason why you couldn't read each of the DVDs in serially and incrementally rebuild the lost DVD. On recovery, you should only need enough space to hold a single DVD to rebuild the lost disc.

    A disadvantage is that the data cannot change while you write all N+1 DVDs and restoring would require lots of DVD swaps (regardless of whether you've lost a DVD or not) and the ability to incrementally write files with gaps in them (not an issue with most filesystems).
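
    A toy sketch of the N+1 idea, assuming plain XOR parity (the parent didn't say which scheme, so that part is my assumption, as are the function names and the tiny 8-byte "discs"). The accumulator is the only buffer that has to hold a full disc's worth of data, and the images are fed in one at a time, which is the incremental rebuild described above.

```python
def xor_into(accumulator, disc_image):
    """XOR one disc image into the running accumulator, in place."""
    if len(disc_image) != len(accumulator):
        raise ValueError("pad every image to the same length first")
    for i, byte in enumerate(disc_image):
        accumulator[i] ^= byte


def xor_all(disc_images, size):
    """Feed images in serially; only one disc-sized buffer is ever held."""
    accumulator = bytearray(size)
    for image in disc_images:
        xor_into(accumulator, image)
    return bytes(accumulator)


# Three tiny stand-in "discs", all padded to the same length.
data_discs = [b"disc-one", b"disc-two", b"disc-3!!"]

# The extra (N+1th) disc is the XOR of all the data discs.
parity_disc = xor_all(data_discs, size=8)

# Pretend the second disc is unreadable: XOR the survivors with the parity.
survivors = [data_discs[0], data_discs[2], parity_disc]
recovered = xor_all(survivors, size=8)
```

    With real DVD images you would stream block by block from the drive instead of holding byte strings in memory, but the arithmetic is the same. Note that this only survives the loss of one disc per set.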
  • by superpulpsicle ( 533373 ) on Monday July 05, 2004 @11:53PM (#9618229)
    People are so hyped up about Gmail. Did you know that www.hotmail.com is NOT effectively backed up as of 2002 (cough, inside source, hmm)?

    Given how long Hotmail and M$ have been around, they have still failed to back up Hotmail, even with their infinite Windows licenses. What makes you think your 1 Gig will be backed up by Google?

  • Re:Wirewire drives? (Score:2, Informative)

    by insert 3 letters ( 768692 ) on Tuesday July 06, 2004 @12:09AM (#9618304)
    Exactly. I've run the exact same drive (WD 120 SE) in a USB2 enclosure and a Firewire one. On benchmarks, the Firewire was generally about 30% faster. More reliable connection too.
  • by anubi ( 640541 ) on Tuesday July 06, 2004 @12:31AM (#9618393) Journal
    I haven't seen any mention of these guys here, but a few years ago, I remember a company, CREO, working on a data recorder which used a spoolable optical tape. I believe this tape was made by ICI over in Europe.

    There were several packaging options for this tape.. including reels of 2" wide tape and cartridges.

    I've lost track of what happened to it. All I remember is that this tape existed at one time and some research was being done to make data recorders of phenomenal storage capacity.

    Back in the early 90's, there was one company in Campbell, California, known as "LaserTape" which was trying to design a tape drive for the PC which used cartridges of this tape. I have lost track of whatever happened to the company.

  • by __aafkqj3628 ( 596165 ) on Tuesday July 06, 2004 @02:02AM (#9618885)
    (cough, inside source, hmm)

    Inside source? Just call them up and ask! It's not hidden knowledge [com.com].
  • Re:Firewire drives? (Score:2, Informative)

    by silentbozo ( 542534 ) on Tuesday July 06, 2004 @02:17AM (#9618947) Journal
    Word of warning. Don't cheap out on Firewire hardware - it's touted as being bulletproof, but in practice I've found SCSI-voodoo-like interactions between cheap cards/cases, and questionable power supplies. I've pretty much given up using Firewire for applications where I need to swap drives a lot, as weird crap happens, just at the worst possible moment.

    In those applications, I've gone with dedicated ATA/133 cards with a nice roomy case with a bunch of removable drive bays. It's a pain to have to shutdown to swap drives, but less of a pain than Windows bluescreening, rebooting, and "fixing" your attached Firewire drive, scrambling all of the data on it, and making it impossible to run a recovery (no, I didn't have a backup of that data...)

    I've also had weird crap happen with my Macs as well - some hardware doesn't show up unless you have it plugged in on startup. In theory it was a great idea (mix Firewire cases with removable drive bays); in practice, you're asking for trouble if you're using cheap parts (i.e., bargain-basement cases with cheap cables).
  • by Cabeiroi ( 701234 ) on Tuesday July 06, 2004 @02:24AM (#9618988)
    I'm not sure what your price range is, but one method I've had success with is a Promise SATA add-in card and removable hard drive enclosures. SATA is hot-swappable; combine that with a cheap hard drive enclosure ($10-$30+) and any SATA hard drive of your choice, and you have a relatively cheap solution.

  • Re:Use those HDDs! (Score:2, Informative)

    by kava_kicks ( 727490 ) on Tuesday July 06, 2004 @09:18AM (#9620537)
    That Paper Disk idea is pretty interesting. I read ages ago that a large Australian government agency (whose data storage goal was LONGEVITY rather than short-term backup) - might have been the Bureau of Statistics - chose to store/archive all of their data onto microfilm ...

    "Microfilm!?" I hear you say, "But this is 2004!". I was surprised too until I heard their reasoning: the only thing you need to read microfilm is a magnifying glass and a light source.

    And that, ladies and gentlemen, makes it virtually future-proof. Try finding a drive for your old 5 1/4" C64 floppy disks (or better yet, those bloody tapes that took 30 minutes only to fail to load).
  • by Vigyaan ( 794425 ) on Tuesday July 06, 2004 @11:11AM (#9621750)
    A few points are emerging (I am still reading the responses, so pardon me if your comment didn't make it here):

    • A hard-drive-based solution seems to be (currently at least) cheap and easy to use for immediate needs.
    • DVDs are cheaper for long-term storage, but automation devices are still not commonly available (plus the capacity per DVD is small).
    • Software RAID is slow for writing but okay for reading data.
    • Tapes may be a viable alternative for long-term data storage, but the tape drives require an initial investment.
    • Some readers have mentioned the "LaCie Bigger Disk": $1199 for 1 TB of disk space ... a price to pay for convenience.

    Since a lot of you have asked me, I will now explain the nature of my data, its storage and analysis.

    I am a scientific researcher working in computational biology. I do atomistic modeling and generate snapshots of protein conformations along MD trajectories. The data is analyzed several times to calculate different quantities. Those of you who do this kind of stuff know that we can collect and store only a fraction of data we want. I generate this data on supercomputers and then compress it (using bzip2 and gzip) to store temporarily and permanently.

    The data generated is usually analyzed within the next month. A good fraction of the data (about 70-80%) needs to be analyzed again several times for different quantities in the next few months. In my experience 80% of the data is usually discarded after a year or so. Therefore about 2.4 TB/year (the remaining 20% of 12 TB) needs permanent storage.

    So 500 GB at least is required for "daily use", 1-2 TB would be nice to have for intermediate use and over 2 TB will need "permanent" storage.
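
    For anyone who wants to check the arithmetic, here is a small sketch. The 1 TB/month, 20% retention, $0.60/GB disk and $0.10-$0.15/GB DVD figures are the ones quoted in this discussion; the 4.7 GB per-disc capacity (single-layer DVD) and the 1 TB = 1000 GB convention are assumptions I am adding for the example.

```python
import math

MONTHLY_GB = 1000              # about 1 TB generated per month
KEEP_PERCENT = 20              # 80% is discarded after a year or so
DISK_COST_PER_GB = 0.60        # current hard disk cost from the original question
DVD_COST_PER_GB = (0.10, 0.15) # DVD media cost range from the original question
DVD_CAPACITY_GB = 4.7          # assumption: single-layer DVD

# Data that must be kept permanently, per year of work.
permanent_gb_per_year = MONTHLY_GB * 12 * KEEP_PERCENT // 100

# Media cost of keeping one year's permanent data on each option.
disk_cost = permanent_gb_per_year * DISK_COST_PER_GB
dvd_cost_low, dvd_cost_high = (permanent_gb_per_year * c for c in DVD_COST_PER_GB)

# Number of discs somebody has to burn and label by hand.
dvds_needed = math.ceil(permanent_gb_per_year / DVD_CAPACITY_GB)

print(f"{permanent_gb_per_year} GB/year permanent")
print(f"disk: ${disk_cost:.0f}, DVD: ${dvd_cost_low:.0f}-${dvd_cost_high:.0f}")
print(f"{dvds_needed} DVDs per year")
```

    That works out to 2400 GB a year, roughly $1440 on disk versus $240-$360 in DVD media, but the DVD route means burning about 511 discs a year, which is the human-time problem.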

  • Re:Give Up Now (Score:3, Informative)

    by WuphonsReach ( 684551 ) on Tuesday July 06, 2004 @03:10PM (#9624479)
    First rule of archiving data on optical media:

    It will get scratched and damaged.

    Which means that unless you're adding recovery data (using QuickPar) or burning 2 copies, you will lose at least some data on the media within a few years. (Cheap media sometimes only lasts a few months if not stored in dark and climate-controlled conditions.)

    QuickPar is nice because you can pick how much redundancy you want on the disc. I find that 5-10% is plenty for most uses and guards against all but catastrophic damage to the disc.

    (The guideline for redundancy is based on how often you check the media vs how fast the media degrades or is damaged. If the media degrades at a rate of 1% per month and you only verify the disc annually, you'll want at least 12% redundancy but more like 18% redundancy.)
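
    The guideline as a quick sketch. The linear degradation model is the one implied above; the function name and the 1.5x safety factor are assumptions, the latter simply inferred from the jump from 12% to 18%.

```python
def recommended_redundancy(degradation_percent_per_month,
                           months_between_checks,
                           safety_factor=1.5):
    """Return (minimum, comfortable) redundancy in percent.

    Assumes damage accumulates linearly between verification passes.
    The safety_factor default is an assumption, not a QuickPar setting.
    """
    minimum = degradation_percent_per_month * months_between_checks
    comfortable = minimum * safety_factor
    return minimum, comfortable


# The example from above: 1% per month, verified once a year.
minimum, comfortable = recommended_redundancy(1.0, 12)

# Verifying quarterly instead lets you get away with far less.
quarterly_min, quarterly_ok = recommended_redundancy(1.0, 3)
```

    The first call gives 12% and 18%; checking every three months drops that to 3% and 4.5%, which is why the 5-10% figure is fine if you actually verify your discs regularly.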
