Hardware

RAID Solutions For Terabyte Databases?

gullevek asks: "We are about to implement a huge database. In the first two years it will grow to about 650 GB, and after that it will keep growing and could reach 2-4 TB. I have never implemented such a large DB before. The DB software has been chosen, but now I have to find the right hardware. The basic components are not a problem, but what about storage? I would prefer to use RAID, of course, but what type of RAID? RAID 5 may be the best protection against disk failure, but it can be quite slow. RAID 1 is fastest but also the most expensive, especially at these sizes. And what about the type of drive: SCSI-3? FCA? Fibre Channel? Do you folks at Slashdot have any suggestions?"
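For a rough feel of what "most expensive at these sizes" means, here is a back-of-the-envelope drive-count sketch. The 18 GB drive size and the six-disk RAID 5 group are illustrative assumptions, not figures from the question:

    # Rough raw-drive counts for the sizes mentioned in the question.
    # Assumptions (not from the question): 18 GB drives, RAID 5 in 6-disk
    # groups with one parity disk each. Illustration only.
    import math

    DRIVE_GB = 18
    RAID5_GROUP = 6

    def drives_needed(usable_gb, level):
        data_disks = math.ceil(usable_gb / DRIVE_GB)
        if level == "RAID 1":                         # mirroring doubles the raw disks
            return 2 * data_disks
        if level == "RAID 5":                         # one extra parity disk per group
            return data_disks + math.ceil(data_disks / (RAID5_GROUP - 1))
        raise ValueError(level)

    for usable in (650, 4000):                        # GB targets from the question
        for level in ("RAID 1", "RAID 5"):
            print(f"{usable} GB usable as {level}: ~{drives_needed(usable, level)} x {DRIVE_GB} GB drives")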
This discussion has been archived. No new comments can be posted.

  • Your argument about RAID 1 is half correct. When it is WRITING, it is slow. When it is reading, however, it runs at close to the normal read speed of the device.
  • by scotpurl ( 28825 ) on Monday January 15, 2001 @02:58PM (#506011)
    Folks like EMC make their living selling, servicing, upgrading, and guaranteeing uptime for big systems like this.

    Put the word out that you need some RAID, and a service contract to go with it. Start talking with other folks who have similar-sized solutions, and see who their vendor is.
  • "Stupid idea? No way! Six tape drives backing up the exact same thing at the same time are better than one, d00d!"
  • Massachusetts. They make refrigerator-sized RAID cages that can even keep your drinks cool. Of course, the prices for this stuff start at about eight figures, :CHA-CHING!!!!:
  • Unless you are using MySQL, your DB vendor probably has some handy tips on how to handle disks -- how many to have, how to configure them, what to allocate each one or group to...

    I don't see a question in this unless you're using MySQL or some other relatively "low-end" DB, in which case you probably have larger concerns to deal with.

    -JF
  • Yeah, I didn't say it'd be cheap. But if you're going to do this, you may as well pay someone whose engineers fantasize about data striping and mirroring.
  • In most companies money IS an ISSUE. Getting funding approved now for what the database may need in five years will be hard. Since money is allocated in fiscal years, the accountants will typically talk to the IT staff and ask how much they need for a year's worth of equipment. While you are researching your options, look at the best you can get for the size you'll need in the next couple of years. If you will need to upgrade, look at scalability, and also consider whether the DB has to be up 24/7: if you have to roll out a larger storage solution in two to three years, how long will it be down, and can it be down at all?
  • As usual, rant first, then opinion.

    When I see questions like this on Slashdot, I get chills. This is obviously a big-budget job, and yet the guy responsible for the project seems to be asking some very basic -- too basic -- questions. Honestly, this isn't a flame. I've been in that boat myself from time to time. However, before I'd ever consider holding myself up to public ridicule, I'd do some heavy research. (And hope the boss never finds out. {grin})

    Anyway, don't even consider RAID 5. That'll be double-dog slow. You need to go mirrored. Yes, that's more expensive. However, I can't imagine a database vendor recommending anything but mirrored. Actually, Oracle says folks should use RAID 0+1, which is mirrored stripes. We don't, but our system was set up before that was the recommendation.

    In terms of drive size, the more spindles you have, the better. That means buy nine-gig, not 80-gig, drives. Of course, with your sizing requirements, you may buy 18-gig drives instead. However, my advice still stands: more spindles means more speed.

    Drive technology will be determined by what vendor you choose. We're an IBM shop and use SSA drives exclusively for our RS/6000 systems. It's fast, allows multiple paths (up to eight) to each drive, and allows for easy clustering. Since you will have a large number of drives, it's more important to have multiple paths to each drive than a single ultra-fast backbone. (I.e., sharing 120 Mbps across 40 drives isn't as good as sharing 40 Mbps across groups of 10 drives.)
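    To make the bus-sharing arithmetic above concrete, here is a trivial sketch using the same figures (units as in the comment; only the per-drive ratio matters):

      # Per-drive share of a shared bus, using the numbers from the comment above.
      def per_drive(bus_bandwidth, drives_on_bus):
          return bus_bandwidth / drives_on_bus

      print(f"one 120-unit backbone, 40 drives : {per_drive(120, 40):.1f} per drive")   # 3.0
      print(f"40-unit loops, 10 drives each    : {per_drive(40, 10):.1f} per drive")    # 4.0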

    As a shareholder of EMC, I highly recommend their products. They are the best bar none. If you are a big player (Wal-Mart, Charles Schwab, etc.), their cost isn't much more than a less qualified solution. (Of course, I don't think you'll be a big enough player to get the really good discounts.)

    Overall, the best advice I've seen in this thread is to ask your database vendor what you need to buy. Oracle|Sybase|IBM wants you to have a good database experience and will not give you bad advice on the hardware front. In fact, Oracle (who I most often work with) can sometimes help you get a bigger discount (no one ever pays list price) out of IBM|Sun|DEC.

    You're in over your head. Make sure you follow the George W. game plan and get yourself some fine advisors.

    InitZero

  • We used an EMC box at my previous job. It was great! I don't remember it costing 8 figures though... I remember something around half a mil, but I could be wrong since I didn't write the check for it.

    But anyway... if you're serious about your data, and judging by the amount of data you are planning to store I would guess you are, then EMC is who you should be talking to. They are totally first rate!
  • Check out IBM and their 'Shark' product (along with other HA SAN stuff). I've heard good things about it - might be worth an investigation. Supposed to work with any platform, not just IBM stuff, but you'd have to check it out.
  • Maybe MS is replacing the TerraServer.

    --
  • But for good performance at a low price, try Winchester Systems [winchestersystems.com] and look at the article on our website [missioncriticallinux.com] about using them with our cluster. (When reading the article, keep in mind that the performance numbers are for one node while another node is also accessing the array. In RAID 0+1 these arrays get ~40 MB/sec.)

    For extremely good performance and many features at a high price, the EMC Symmetrix [emc.com] is definitely the way to go.

    RAID 1 is NOT the fastest; RAID 1 is mirroring, and it is slow on writes. RAID 0 is the fastest, but has no redundancy. RAID 0+1 is the way to go for speed and redundancy.
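    For reference, a quick sketch of the capacity and fault-tolerance side of those trade-offs (generic textbook RAID math, nothing vendor-specific; the eight-disk group size is an arbitrary example):

      # Usable fraction of raw capacity and the number of disk failures each
      # level is guaranteed to survive in the worst case. Generic numbers only.
      levels = {
          "RAID 0":   (1.0,  0),   # striping only: all the capacity, no redundancy
          "RAID 1":   (0.5,  1),   # mirroring: half the capacity, any single disk can fail
          "RAID 0+1": (0.5,  1),   # mirrored stripes: half the capacity, one failure guaranteed
          "RAID 5":   (None, 1),   # (n-1)/n of the capacity for an n-disk group, one failure
      }

      n = 8                        # arbitrary example group size
      for level, (frac, failures) in levels.items():
          frac = (n - 1) / n if frac is None else frac
          print(f"{level:<8} usable {frac:.0%} of raw, survives {failures} disk failure(s)")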

  • Is to go as expensive as possible; that way you won't kick yourself five years from now for buying it. Get the highest-quality disks, the fastest channel (Fibre all the way!), and as much storage as possible, and maybe, just maybe, it will still be in use 20 years from now, and the company will be glad they made that initial investment.
  • Well, the real saying has more to do with big iron back in the day... but the concept seems to be the same for storage...

    "No one ever got fired for choosing EMC "

    It's pretty much a given that if you are planning to go that big, you should not even try to do this on your own; that's asking for trouble (given that you admit this is new ground for you). A little while ago, when I was researching hosting services (LogicTier, Loudcloud, etc.), EMC was pretty much the standard...
  • [sarcasm]

    Who needs RAID or a relational database? Just read the Linux clustering HOWTO and apt-get yourself an enterprise database. MySQL is a better and faster database than anything else out there. Who needs online backup or transactions anyway?

    Besides, since it is open source I can just skip lunch and write a perl script to fix all of my corrupted data!

    [/sarcasm]
  • Having worked in a datacenter with several high-end databases, I can't recommend EMC enough. Their disk is fast, reliable, and robust. Their Symmetrix line has onboard cache (gigs of it), direct SCSI connects, fully redundant everything, and the best part of all, EMC support. The arrays dial out when a disk goes bad (or anything else, for that matter), and the next day an EMC tech is at your datacenter with disk under arm ready to replace it. You really never have to worry about disk problems again. You can even do firmware upgrades with the disks online!

    If you are running a medium-to-large database of a few terabytes, I would recommend investing in their TimeFinder solution, which allows you to make exact copies of the database (or data) by splitting off a third mirror, which they call Business Continuity Volumes (BCVs). This makes backups quicker and easier (simply split the third mirror and mount the volumes on your backup host), makes database schema changes less risky (split the mirror before the change and you have a speedy backout), and overall makes your life easier (you will sleep at night).

    The above assumes that you are also using a real database, such as Oracle, that can handle raw devices, online backups, multiple nodes, etc.

    If you can't afford the Cadillac, you can go for the lower end, their Clariion arrays, which are also damn good little units if your budget is tight. You can do essentially the same things, except it's not as slick. Either one can go up to several terabytes of storage per unit.

    ---Hey, it's me.
  • Heh, ask your EMC tech sometime what happens if it loses a cache card.

    They'll give you like a 10 page document describing why this piece of hardware cannot fail.

    If it does, though, BOOM. The whole EMC frame locks up for any transaction whatsoever until you get EMC onsite with a replacement. Hope your data integrity isn't something you're concerned with.

    Bang for the buck, we've done better with Network Appliance installations than I think EMC could ever hope for...
  • If you are planning a large database, you must bring in different vendors. Bring in Hitachi, EMC, IBM, and Sun, and ask to see their proposed solutions to your problem. Let them educate you, compare price per megabyte of data, and do set up performance benchmarks. Make them duke it out. Each of these corporations should have technical pre-sales support that can give you the speeds and feeds of their storage product, and sales support to give you a price quote. Then you can make an educated decision, and not before.

  • I love it when people who have no clue what they are talking about try to address complex technical issues on Slashdot. The whole DBA team in my office almost died laughing at this post.

    Here's a few highlights:

    "Get larger, not necessarily as fast drives for your primary partitions. These can and should be on very large RAID/5 partitions"

    RAID/5 + Databases = Bad data. RAID 5 reduces write performance by about 30% (and uses more CPU), and it does not protect your data from controller failure (or from more than one disk failure per volume).
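    A quick illustration of where that write penalty comes from (this is the standard small-write read-modify-write argument, not anything specific to a particular array):

      # Each small random write on RAID 5 becomes a read-modify-write of the data
      # block plus the parity block (4 physical I/Os); on RAID 0+1 it is just the
      # two mirror writes. Illustrative counts only.
      def physical_ios(logical_writes, level):
          if level == "RAID 5":
              return logical_writes * 4     # read data, read parity, write data, write parity
          if level == "RAID 0+1":
              return logical_writes * 2     # write both halves of the mirror
          raise ValueError(level)

      for level in ("RAID 5", "RAID 0+1"):
          print(level, "->", physical_ios(1000, level), "physical I/Os for 1000 small writes")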

    All data chunks need to be on simple volumes or RAID 0+1. This allows you to have up to 50% of the disks in a volume fail without a loss of data. If you use DMP on the fibre channel array, you'll also get load balancing.

    "Get larger, not necessarily as fast drives for your primary partitions. These can and should be on very large RAID/5 partitions."

    In a perfect world, you would have more, fast disks. In the past two years, disk capacity has increased 10x while speeds have barely doubled. Multi-terabyte databases need to be on multiple, switched fibre channel arrays with the smallest (like 18 GB) disks possible. This is expensive, but if you have 2 TB of data online, you should have the money to buy a real solution.

    "Get a bunch of smaller, but at least 10,000 RPM drives for your index storage. They should be on quite a few different hardware RAID adapters, and you should be using RAID/0 for them. For this, you don't care about losing a drive. The worst that can happen is reduced performance while you rebuild an index, you'll never lose any data."

    This one is great. Any DBA who considers it no big deal to lose a whole dbspace worth of detached indexes needs to go back to Burger King. I'm sure everyone will be REAL happy when the database is in single-user mode while you 'just rebuild' all of your indexes. (All of your indexes were lost, since you have no mirroring, remember?)

  • My last SAN project was 5 TB of raw disk, and the solution EMC pitched me was close to a million dollars more expensive than the competition.

    In addition to being cheaper, the alternatives were mostly faster than EMC disk *AND* they played nicely with fiber channel hardware from other vendors (unlike EMC which likes to lock you in to their hardware only).

    Not to knock EMC (solid product & killer support), but their obscene (no other way to describe it) pricing only makes their stuff worthwhile in situations where you need a 'black box' solution and some other guy is on the hook for hardware failures. In the life sciences, I see EMC disk being used on drug manufacturing process hardware as well as on databases and systems that come under FDA scrutiny (patient outcome and clinical trial data, etc.). Generally, people with more money than sense purchase EMC for anything but the most mission-critical stuff.

    The other thing that annoyed me about EMC was the overly aggressive, frat-boy-style sales force. The internal competition to make sales quotas is killer, I've heard.

    I ended up going with Brocade fiber channel switches (Silkworm II) and Compaq StorageWorks disks. We needed a SAN that could talk to NT, Linux, Tru64, Solaris, HP-UX, and IRIX systems all at once.

    I'm not a Compaq cheerleader, but I like the StorageWorks line because, although they are not always first to market with the latest buzzword technology, when they do come out with a product it is generally really solid and reasonably priced. The other cool thing about their new universal drive form factor is that all of their disks are now plug-and-play, from the lowest-end ProLiant server all the way up to their high-end systems.

    As for RAID levels and such, you really need your database architects to tell you what they need. It may end up being a mix of RAID 1+0 and RAID 5 for some filesystems, and they may ask for solid-state disks to store indices and such. Hardware tuning for high-end databases is a whole field in itself, and there are lots of people out there who can probably tell you exactly what should be needed.

    What you are going to find at the end of this project is that disk capacity is pretty simple and easily handled. The real problem is figuring out how you are going to back up 5 TB worth of DB data :) Not a trivial task by any means...

    just my $.02

    -chris

  • Sometimes there has to be a first time. For our company, this is the first really big thing we are doing, and so I am doing a lot of research: asking around, reading a lot, phoning around, etc.

    So I also asked Slashdot. Hey, and I have to admit I got a lot of advice here, and after reading all the posts I see some common ground.

    If we all started out with the "top" knowledge, well, wouldn't that be very boring? Our goal (from a human point of view) is to learn. If I only ever did the things I already know how to do, I wouldn't evolve.

    Anyway, you didn't flame me, but I realised myself how much I still have to learn about high-end DB things.

    Thanks anyway for your advice. Yours was one of the best here!

    mfg, Gul!
  • ...the next day an EMC tech is at your datacenter with disk under arm ready to replace it...

    Next day?! Try 4 hours. I don't know what support you guys got, but they have 4 hours to be onsite and have it fixed with our Symmetrix.
  • by MemRaven ( 39601 ) <kirk.kirkwylie@com> on Monday January 15, 2001 @10:10PM (#506032)
    In which case, things get a whole lot trickier than just a bunch of files, because you have to consider what your usage pattern is (in terms of what the database is doing) and how that impacts the disk usage (in terms of HOW the database does it).

    The first thing is to talk to your DBA and get his/her input. Competent DBAs have done a lot of this type of work in the past, and they'll have an enormous amount of help to provide. They'll know your usage pattern by heart and will be able to give you a lot of guidance on it.

    The first thing to realize is that for most RDBMS usage patterns, RAID is a Very Very Very Bad Thing. But when I say "most", I mean "most with updates to live data."

    RDBMSs use four main types of storage, and it's important to understand them:

    • Main Table Storage. This is where your data actually "lives", and it is ironically the least important, storage-wise.
    • Temporary Table Storage. This is the storage space for temporary space and temporary tables, which is extremely useful for performance management.
    • Index Storage. This is where data indexing structures live, and is extremely performance critical.
    • Log Storage. This is where the log for your system lives (physical and logical) and is also extremely important for performance.
    The most important thing for performance is to PHYSICALLY segregate ALL four types as much as possible. For example, if you're going into multi-terabyte databases, you might want all four types of data not just on different disk arrays (i.e. RAID controllers), but also on different SCSI channels, and different host adapters (i.e. multiple SCSI or FC-AL cards) altogether.

    You also want to bear in mind that your update speed is limited by the ability to handle log writes. Log writes aren't limited by bandwidth; they're limited by the latency of each disk. Every disk can handle a certain number of operations per second. Even if you add more disks in a RAID configuration, you're never going to be able to handle more transactions per second, because you're not increasing the operations per second of any single disk, and all of them must be touched for every transactional write.
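    A back-of-the-envelope sketch of that per-spindle limit (the 15,000 RPM figure matches the log disks recommended below; the seek time is an illustrative assumption):

      # Rough ceiling on synchronous log writes for a single spindle. Adding more
      # disks to the stripe does not raise this per-commit limit, which is the
      # point the paragraph above is making.
      rpm = 15_000
      half_rotation_ms = (60_000 / rpm) / 2     # 2 ms average rotational latency
      assumed_seek_ms = 3.5                     # assumption for a short-stroked log disk
      service_time_ms = half_rotation_ms + assumed_seek_ms

      print(f"~{1000 / service_time_ms:.0f} log writes/sec per spindle")   # roughly 180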

    So with that in mind, allow me to recommend something:

    • Get a bunch (as many as you can afford) of 15k RPM disks. Each of those should be a separate log device. Spread them throughout your SCSI or FC-AL adapters as evenly as possible (see the round-robin placement sketch after this list). If you are going to use RAID, which you actually should for this, you should have quite a few RAID/1 pairs, each one matching one log to one mirror. If you're using Solaris or another commercial UNIX, software RAID is fine for this as long as you have hot swap. Otherwise, use cheap hardware RAID. Even if you're using FC-AL for everything else, you might want to consider plain old SCSI for this stuff, because latency is your #1 concern, not bandwidth.
    • Get a bunch of smaller, but at least 10,000 RPM, drives for your index storage. They should be on quite a few different hardware RAID adapters, and you should be using RAID/0 for them. For this, you don't care about losing a drive. The worst that can happen is reduced performance while you rebuild an index; you'll never lose any data. Create as many logical units as you can get away with, and again spread them out.
    • Get larger, not necessarily as fast, drives for your primary partitions. These can and should be on very large RAID/5 partitions. Any commercial RDBMS will handle slower drives for these with very little additional overhead. The log and index partitions are your bottlenecks. Each SCSI channel or FC-AL adapter should have the bulk of its bandwidth taken up by these. THIS, coincidentally, is where EMC comes into play, along with the index storage.
    • For temporary space, get some hardware RAID adapters and some reasonably fast drives, and put them on RAID/0, not RAID/5. Again, this is not your core data; who cares if it goes down?
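    A minimal sketch of the "spread them out" idea referenced in the first item above; adapter and device names are hypothetical, and the counts are placeholders:

      # Round-robin placement of log, index, and data devices across host adapters.
      from itertools import cycle

      adapters = ["fcal0", "fcal1", "scsi0", "scsi1"]          # assumed host adapters
      devices = ([f"log{i}" for i in range(8)] +               # mirrored log pairs
                 [f"idx{i}" for i in range(8)] +               # RAID/0 index LUNs
                 [f"data{i}" for i in range(16)])              # RAID/5 table LUNs

      placement = {}
      for dev, adapter in zip(devices, cycle(adapters)):
          placement.setdefault(adapter, []).append(dev)

      for adapter, devs in placement.items():
          print(adapter, "->", ", ".join(devs))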

    The number one piece of advice I can give is to consult with others. If you haven't done this before, there are people (your DBA, your database vendor, your hardware vendor, your systems integrator) who have. This is serious business, and not something to screw around with. Terabyte-level databases are still NOT so common that everyone can and should attempt them. Having terabytes of data spread throughout an enterprise is common, but having it in one application isn't. You'll probably not get it right the first time, so take your time and consult with every one of your vendors on capacity and performance planning.

    Not to be crass or mean, but if you're asking slashdot, you probably shouldn't be doing this all by yourself.

  • Of course, the prices for this stuff start at about eight figures

    No, they don't. Try 10^6 to 10^7 to get started.

  • DataDirect Networks' SAN DataDirector is god. The SDD is what you want: 8 ports of Fibre Channel on the front end, 20 ports of Fibre Channel out the back end, up to gigabytes of cache, and up to 1080 Fibre Channel drives behind it. It's an I/O monster.
  • EMC has one tier of support for Symmetrix, as I understand it. Response times probably depend on severity.
  • EMC, Compaq, IBM, etc. all make systems that will do what you want; get them to make you formal proposals. For a setup of this size you really need one of these systems, and you need someone who can figure out what you really need.

    Good Luck.

    The cure of the ills of Democracy is more Democracy.

  • And how are they going to achieve TBs with 18 GB drives??

    Can you really not do the math?

    1000 GB divided by 18 GB comes to 56 drives (rounded up). Double that since it's mirrored: that's 112 drives. An IBM SSA drawer (7133-040) holds 16 drives, so that comes to seven drawers.
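    The same arithmetic, spelled out (the drive, mirror, and drawer sizes are the ones given above):

      import math

      data_drives = math.ceil(1000 / 18)       # 56 data drives for ~1 TB
      mirrored = 2 * data_drives               # 112 drives once mirrored
      drawers = math.ceil(mirrored / 16)       # 7 x 7133-040 drawers (16 drives each)
      print(data_drives, mirrored, drawers)    # -> 56 112 7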

    We've got 148 drives (mostly 4.5 GB and 9 GB, since we're a smaller shop) online in a similar SSA configuration.

    Of course, if the row size is substantial (binary objects such as images), it may make sense to use larger drives. However, if the data is primarily textual in nature (i.e., small rows), stick with relatively small drives.

    InitZero
