On Building Massive Data Storage Systems...
datazone asks: "I know there are companies that provide data storage solutions, but they are very expensive. Has anyone built anything in the range of 1TB or more of storage? What sort of hardware & software configuration would be used and was it worth it in the long run after taking the time, money and support into consideration?"
Building a RAID (Score:1)
Go IDE only if money is very tight. If you do, use a lot of controllers, one drive per channel: if a drive dies, it will often freeze the whole channel.
We have an ICP Vortex controller and it works great. We tried an AMI MegaRAID, but its performance was a bit spotty.
An external case, e.g. from CI Designs, will get you nine 1.6" SCSI drives in a 4U 19" case with redundant power etc. Expect to pay $1500 and up for the case.
With Seagate Cheetah 73 drives you can get about 450 GB per case at RAID 5, assuming a hot spare (which is a good idea). With LVD, chaining multiple cases usually isn't a problem.
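For the curious, the per-case math works out roughly like this. This is a sketch with my own assumptions: 73 GB raw per Cheetah drive, one hot spare, and one drive's worth of capacity lost to RAID 5 parity; the ~450 GB figure above presumably also accounts for formatting and filesystem overhead.

```python
# Back-of-envelope RAID 5 capacity for the 9-bay case described above.
# Assumed: 73 GB raw per drive, one hot spare, one drive lost to parity.

def raid5_usable_gb(bays: int, drive_gb: float, hot_spares: int = 1) -> float:
    """Raw usable capacity of a RAID 5 set built from `bays` drives,
    reserving `hot_spares` drives plus one drive's worth of parity."""
    data_drives = bays - hot_spares - 1
    return data_drives * drive_gb

per_case = raid5_usable_gb(bays=9, drive_gb=73)
print(per_case)       # 511.0 GB raw by this arithmetic
print(per_case * 2)   # two chained cases clear 1 TB
```

So two or three chained cases get you past the 1TB mark, depending on how much overhead you lose to formatting.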
Good luck!
Storage (Score:1)
You say you want 1TB of storage. It'd be pretty difficult to fit that inside your PC, so you'll need to look at an external system.
Now, you don't say what this data is, how it changes over time, or how critical it is. Here's a bunch of questions to ask yourself, and some pointers...
Am I backing up data, or will it be frequently used? Tape can store a lot cheaply, but it's not very convenient.
Is this data read-only or read-write? If it's read-only, you may get by on something like a CD caddy.
Am I performing processor-intensive work on the data? 1TB is a *lot* of data; depending on what you want to do with it, one machine won't be enough, so have a look at systems that allow more than one machine to access the storage.
How fast do I need to get at the data? If it's time-critical, you'll probably want a solution with lots of smaller drives and, say, a Fibre Channel interface. More heads = more speed (usually).
What availability do I need on my data? If you can't tolerate embarrassing downtime (one drive blows and you have to take the whole array down to replace it), then you need to look at a mirrored, hot-swappable solution.
How am I going to archive the data? If you write to the disks a lot, then you have to take backup into account. Tape is way slower than hard disk: is a backup of that 1TB table even valid if it took you two hours to make and the data changed 10,000 times during the run?
How am I going to manage the disks? Does the vendor provide management software, and is it available for your OS? Does your OS even handle volumes of 1TB? 10TB? You'll need to look at this, and also have a look at the websites of people like Veritas (hint: it's not cheap).
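The archiving question above is easy to make concrete with a quick calculation. The tape throughput number below is an illustrative assumption of mine (single-drive tape speeds of that era were in the low single-digit MB/s), not a figure from the post:

```python
# Rough backup-window math for the archiving question above.
# The ~5 MB/s tape throughput is an assumed, illustrative figure.

def backup_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream `data_gb` of data at `throughput_mb_s`."""
    seconds = (data_gb * 1024) / throughput_mb_s
    return seconds / 3600

# 1 TB to a single ~5 MB/s tape drive:
print(round(backup_hours(1024, 5.0), 1))   # 58.3 hours
```

Two days per full backup is why people stripe backups across multiple tape drives, back up from a frozen snapshot or mirror, or only do incrementals between full dumps.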
Now onto your final question...
> was it worth it in the long run after taking the time, money and support into consideration?"
That depends. If you need to store 1TB of data, then you don't really have an option, do you?
we did this at work (Score:1)
Think smart about your storage. (Score:1)
Threadedness: How many threads will simultaneously be fetching data? Lots of threads like lots of heads, meaning more smaller disks vs. a few big drives. If you are in prepress or imaging and it's just you, then stick with a bunch of big drives. Seagate's new 73GB drives are FAT data movers. Great sequential access.
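The "more heads" point can be sketched as a toy model: for random I/O, aggregate throughput scales roughly with spindle count, so many small drives beat a few big ones of similar total capacity. The per-drive IOPS figure is an assumption of mine for illustration:

```python
# Toy model of "more heads = more speed" for random I/O: aggregate IOPS
# scales with spindle count. ~100 IOPS per drive is an assumed figure.

def array_iops(drives: int, iops_per_drive: int = 100) -> int:
    """Aggregate random-I/O capability, assuming independent spindles."""
    return drives * iops_per_drive

# Roughly the same raw capacity, two ways:
print(array_iops(8))    # 8 x 73 GB drives  -> 800 IOPS
print(array_iops(32))   # 32 x 18 GB drives -> 3200 IOPS
```

For sequential work (prepress, imaging), this advantage mostly disappears, which is why the big drives are fine there.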
Do the threads look at common data, or is the data very widespread, with the chance of hitting the same data twice rather small? If you aren't going to look at the same data very often, then cache is of little use to you, and furthermore, if your server has any beef to it at all, use software RAID.
Hardware controllers add latency. It's rather humorous that a simple software RAID on JBOD, which costs perhaps half of a hardware solution, is twice as fast as those big refrigerator RAIDs with all the gizmos in them. Cache lookups, XOR operations, and other overhead just get in the way. The guys that make those fridges will never admit to this. Stay away from fridges with cache if you are doing data mining, have a BIG database, or are doing decision support or prepress/imaging. Also, if you are streaming writes, forget about the refrigerators: you end up filling their cache and then you pay heavy latency costs.
Hardware RAID is best when you have junior sysadmins who can't formulate a storage plan and keep it up to date, thus confusing everyone; when the I/O justifies it (smaller databases, home directories, and other shared storage); or for added-value storage like some of the new SAN devices you will see announced this summer, which mitigate the latency with other value-added features. Sometimes performance isn't everything, just mostly! =)
how about asking a question that can be answered? (Score:2)
what is meant by "expensive" (Score:2)
Hmm (Score:2)
But ANYWAYS...
It all depends on what setup you're seeking. IBM offers many products that may suit your needs. I've used a lot of IBM's storage hardware and believe it's top quality. Check out IBM's storage site [ibm.com] and see for yourself. They have many good business solutions. Sun is also a good manufacturer for mass storage. Check out Sun's storage site [sun.com] for more information from them.
I suppose you just want storage space? Or do you want reliability and redundancy too? If you do, the price is going to be jacked up, especially for a tall order weighing in at around 1TB. I don't foresee a big IDE+SCSI chassis full of 528MB drives :-)