Hardware

On Building Massive Data Storage Systems... 7

datazone asks: "I know there are companies that provide data storage solutions, but they are very expensive. Has anyone built anything in the range of 1TB or more of storage? What sort of hardware & software configuration would be used and was it worth it in the long run after taking the time, money and support into consideration?"
  • 1 TB isn't really so much. We have a 250 GB RAID on a Linux server and are installing another 450 GB next week. Some ideas:
    IDE only if money is very tight. Use a lot of controllers, one drive per channel; if a drive dies, it will often freeze the channel.
    We have an ICP Vortex controller and it works great. We tried an AMI MegaRAID, but the performance was a bit spotty.
    An external case, e.g. from CI Designs, will get you nine 1.6" SCSI drives in a 4U 19" case with redundant power etc. Expect to pay $1500 and up for the case.
    With Seagate Cheetah 73 drives you can get about 450 GB per case at RAID 5, assuming a hot spare (which is a good idea); a rough check of that figure is sketched after this comment. With LVD, chaining multiple cases is not usually a problem.
    Good luck!
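
A rough sanity check of the ~450 GB-per-case figure above, assuming nine 73 GB drives with one hot spare and single-parity RAID 5 (the drive count and per-drive size come from the comment; the formatted-capacity factor is an assumption):

    # Rough RAID 5 capacity estimate for one external case (sketch, not vendor math).
    DRIVES_PER_CASE = 9        # nine 1.6" SCSI drives per 4U case, per the comment
    HOT_SPARES = 1             # one drive held back as a hot spare
    RAW_GB_PER_DRIVE = 73      # Seagate Cheetah 73
    FORMATTED_FRACTION = 0.93  # assumed loss to formatting/filesystem overhead

    data_drives = DRIVES_PER_CASE - HOT_SPARES - 1  # RAID 5 spends one drive's worth on parity
    usable_gb = data_drives * RAW_GB_PER_DRIVE * FORMATTED_FRACTION
    print(f"~{usable_gb:.0f} GB usable per case")   # lands in the 450-475 GB ballpark
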
  • I'm just going through this process myself, and we've been talking to EMC and Sun about storage solutions with a view to committing to one supplier in the next week or so.

    You say you want 1TB of storage; it'd be pretty difficult to store that inside your PC, so you'll need to look at an external system.

    Now, you don't say what this data is, how it's changing over time, or how critical it is. Here's a bunch of questions to ask yourself, and some pointers...

    Am I backing up data, or will it be frequently used? Tapes can store a lot cheaply, though they're not very convenient.

    Is this data Read Only or Read Write? You may get by on something like a CD caddy.

    Am I performing processor-intensive work on the data? 1TB is a *lot* of data; depending on what you want to do with it, one machine won't be enough - have a look at systems that allow more than one machine to access the storage.

    How fast do I need to get at the data? If it's time critical, then you'll probably want a solution that uses lots of smaller drives and provides, say, a Fibre Channel interface. More heads = more speed (usually).

    What availability do I need for my data? If you don't want embarrassing downtime (one drive blows and you have to take the whole array down to replace it), then you need to look at getting a mirrored, hot-swappable solution going.

    How am I going to archive the data? If you are writing to the disks a lot, then you have to take backup into account. Tapes are way slower than hard disks - is a backup of that 1TB table even valid, considering it took you two hours to write and the data changed 10,000 times while it was being archived? (A rough estimate of the backup window is sketched after this comment.)

    How am I going to manage the disks? Does the vendor provide management software? Is it available for your OS? Does your OS handle disks of 1TB? 10TB? You'll need to look into this, and also have a look at the websites of people like Veritas (hint: it's not cheap).

    Now onto your final question...

    > was it worth it in the long run after taking the time, money and support into consideration?"

    That depends; if you need to store 1TB of data, then you don't really have an option, do you?
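
To make the backup-window question above concrete, here is a back-of-the-envelope estimate of how long a full dump of 1TB takes at a given sustained rate. The throughput figures are illustrative assumptions, not specs for any particular tape drive:

    # Sketch: how long does a full pass over 1 TB take at a given sustained rate?
    # The MB/s figures below are assumed ballparks, not measured drive specs.
    DATA_GB = 1000

    for label, mb_per_sec in [("single tape drive", 5), ("striped pair of drives", 10)]:
        hours = (DATA_GB * 1024) / mb_per_sec / 3600
        print(f"{label}: ~{hours:.0f} hours for a full pass")

Even at optimistic rates the window is long enough that the data will have changed underneath the dump, which is exactly the consistency problem raised above.
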

  • I work for Acer Australia and we did this for an ISP not so long ago. Essentially it was a RAID 5 system using external storage boxes and multiple drives. Backup was a bastard, but we just used a massive data tape library (Travan, I think). It was all standard off-the-shelf x86-type hardware, nothing special at all. Worked very well...
  • Dovetailing on others' comments: you've given almost no parameters around your 1TB requirement. I'm in the storage industry but will spare you the 20-second commercial. Here are some things to think about.

    Threadedness: How many threads will simultaneously be fetching data? Lots of threads like lots of heads, meaning many smaller disks vs. a few big drives (a rough illustration is sketched after this comment). If you are in prepress or imaging and it's just you, then stick with a bunch of big drives. Seagate's new 73GB drives are FAT data movers. Great sequential access.

    Do the threads look at common data, or is the data very widely spread with the odds of looking at the same data twice rather small? If you aren't going to look at the same data very often, then cache is of little use to you; furthermore, if your server has any beef to it at all, use software RAID.

    Hardware controllers add latency. It's rather humorous that a simple software RAID on JBOD, which costs perhaps half of a hardware solution, is twice as fast as those big refrigerator RAIDs with all the gizmos in them. Cache lookups, XOR operations and other overhead just get in the way. The guys that make those fridges will never admit to this. Stay away from fridges with cache if you are doing data mining, have a BIG database, are doing decision support or prepress/imaging. Also, if you are streaming writes, forget about the refrigerators; you end up filling up their cache and then you pay heavy latency costs.

    Hardware RAID is best when you have junior system admins who can't formulate a storage plan and keep it up to date (thus confusing everyone), or when the I/O justifies it - smaller databases, home directories and other shared storage, etc. - or for added-value storage like some of the new SAN devices that you will see being announced this summer, which mitigate the latency with other value-added features. Sometimes performance isn't everything, just mostly! =)
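
To put a number on the "more heads = more speed" point above, here is a rough sketch comparing two hypothetical ways of buying roughly the same raw capacity. The per-spindle figure is an assumed ballpark, not a measured spec:

    # Sketch: aggregate random-I/O capability scales with spindle count.
    # ~100 random ops/sec per spindle is an assumed ballpark, not a drive spec.
    IOPS_PER_SPINDLE = 100

    # Two hypothetical builds with similar raw capacity: few big drives vs. many small ones.
    for drives, gb_each in [(4, 73), (16, 18)]:
        raw_gb = drives * gb_each
        print(f"{drives} x {gb_each} GB (~{raw_gb} GB raw): ~{drives * IOPS_PER_SPINDLE} random IOPS")

The capacities are about the same, but the sixteen-spindle build can service roughly four times as many concurrent random requests, which is why heavily threaded workloads prefer many smaller disks.
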

  • OK, now I am all for giving people advice; however, I have got to say that I am quite annoyed at the quality of questions being posted on Ask Slashdot. This question is, for all intents and purposes, unanswerable! I mean, how do you build a 1TB storage solution? You get 1TB worth of drives... OK, that was rather worthless, don't you think? Now tell us: do you want striping, do you want mirroring, do you want redundancy? Do you want hot-swappability, do you want the ability to walk up to half your storage, shoot it, and have the database not go down? Come on, we don't ask much; just give us enough to answer your question, and the great hordes of Slashdotters will be happy to assist you! (me included)
  • From a few back-of-the-envelope calculations I can see that a minimal 1TB setup would cost at least $7200 for the drives alone (this is using 18 60GB IDE drives without any kind of redundancy). A more reliable solution would be to use SCSI drives in a RAID configuration, which would need 1.5 times as many drives in order to deliver some redundancy. Now we are looking at 27 drives and 4 SCSI buses, which comes to about $40,000, again just for the drives. (The arithmetic is sketched below.)
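
For what it's worth, the arithmetic behind those figures, with the implied per-drive prices treated as assumptions rather than quotes:

    # Back-of-the-envelope drive costs (sketch; per-drive prices are assumptions
    # implied by the comment's totals, not current quotes).

    # Plain IDE, no redundancy: 18 x 60 GB drives at ~$400 each.
    ide_drives = 18
    print(f"IDE, no redundancy: {ide_drives} drives, ~${ide_drives * 400}")   # ~$7,200

    # SCSI RAID with redundancy: ~1.5x as many drives at ~$1,500 each.
    scsi_drives = int(ide_drives * 1.5)
    print(f"SCSI RAID: {scsi_drives} drives, ~${scsi_drives * 1500}")         # ~$40,500
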
  • Rather strange to see a story posted and no "first posts" after a 3+ hour span...

    But ANYWAYS...

    It all depends on what setup you're seeking. IBM offers many products that may suit your needs. I've used a lot of IBM's storage hardware and believe it's top quality. Check out IBM's storage site [ibm.com] and see for yourself. They have many good business solutions. Sun is also a good manufacturer for mass storage. Check out Sun's storage site [sun.com] for more information from them.

    I suppose you just want storage space? Or do you want reliability and redundancy? If you do, the price is going to be jacked up, especially for a tall order weighing in at around 1TB. I don't foresee a big IDE+SCSI chassis full of 528MB drives :-)

"A car is just a big purse on wheels." -- Johanna Reynolds

Working...