
High Availability Solutions for Databases?

An anonymous reader asks: "What would be the best high availability solution for databases? I don't have enough money to afford Oracle RAC or any architecture that requires an expensive SAN. What about open source solutions? MySQL Cluster seems to be more master/slave, and you can lose data when the master dies. What about this Sequoia project that seems good for PostgreSQL and other databases? Has anyone tried it? What HA solution do you use for your database?"
  • by cravey ( 414235 ) * on Monday November 14, 2005 @11:19PM (#14031861)
    While MySQL supports master/slave replication, MySQL Cluster specifically avoids that model entirely: it's a synchronous database storage engine. If you want master/slave, use Postgres. If you want high availability and can live without a small number of features, MySQL Cluster is the way to go. The only real downside to the architecture Cluster requires is that all of the data is stored in RAM-based tables; transactions are logged to disk at a (configurable) interval. If you're going to try for HA, you might want to RTFM on the available options before you settle on one.
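    A minimal sketch of what using the NDB engine looks like from an application, assuming the PyMySQL driver; the hostname and credentials are placeholders, only the engine name comes from the comment above:

        import pymysql

        # Connect to any MySQL API node in front of the cluster (hostname is a placeholder).
        conn = pymysql.connect(host="mysql-api-node.example.com", user="app",
                               password="secret", database="appdb")
        try:
            with conn.cursor() as cur:
                # ENGINE=NDBCLUSTER puts the table on the in-memory, synchronously
                # replicated data nodes instead of local MyISAM/InnoDB storage.
                cur.execute("""
                    CREATE TABLE IF NOT EXISTS sessions (
                        id INT NOT NULL PRIMARY KEY,
                        payload VARCHAR(255)
                    ) ENGINE=NDBCLUSTER
                """)
            conn.commit()
        finally:
            conn.close()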
  • by Scott ( 1049 ) <stl@ossuary.net> on Tuesday November 15, 2005 @12:08AM (#14032097) Homepage
    The submitter of this question seems to have confused the two: Cluster and the older replication. Cluster does not in any way rely on a master/slave setup. Think of Cluster as RAID for databases, where you can lose a node (or more, depending on your configuration) before you lose your db. The current drawbacks of Cluster are that it is in-memory and doesn't support certain features, such as full-text indexing. Replication isn't going to cause you to lose data either, provided your application is designed to handle the master server (which you kick your writes to) hitting the bricks. Have the app go into a read-only mode served from your slave (see the sketch at the end of this comment).

    Neither option is really "beautiful", though Cluster has a lot of promise for the future, especially in 5.1.
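    A minimal sketch of that fail-to-read-only idea, assuming the PyMySQL driver; the hostnames and credentials are placeholders, and a real application would also have to refuse writes while the flag is set:

        import pymysql

        MASTER = dict(host="db-master.example.com", user="app", password="secret", database="appdb")
        SLAVE = dict(host="db-slave.example.com", user="app", password="secret", database="appdb")

        def get_connection():
            """Prefer the master; if it is down, fall back to the slave in read-only mode."""
            try:
                return pymysql.connect(**MASTER), False
            except pymysql.err.OperationalError:
                return pymysql.connect(**SLAVE), True

        conn, read_only = get_connection()
        if read_only:
            print("Master unreachable: serving reads from the slave, refusing writes.")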
  • DRBD (Score:4, Informative)

    by Bios_Hakr ( 68586 ) <xptical@g3.14mail.com minus pi> on Tuesday November 15, 2005 @01:30AM (#14032462)
    Have you looked into DRBD? It works kinda like RAID 1 over a network. It uses two computers to store the database, and another computer acts as a heartbeat server. You'll need three NICs in each database server: one for the connection to the network, one (gig-E preferably) for the connection between the servers, and one for the connection to the heartbeat server.

    http://www.drbd.org/ [drbd.org]

    If you are smart, you'll play around with this on a test network or in VMware first. Get it all tweaked out, and actually test it by killing a server mid-transaction to see if it works for you.
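    A rough way to script that check before and after pulling the plug, assuming DRBD's /proc/drbd status file; the exact fields vary between DRBD versions, so this only looks at the connection state:

        def drbd_connected(proc_path="/proc/drbd"):
            """True if every configured DRBD resource reports cs:Connected."""
            with open(proc_path) as f:
                states = [line for line in f if "cs:" in line]
            return bool(states) and all("cs:Connected" in line for line in states)

        if not drbd_connected():
            print("DRBD is degraded or syncing -- don't trust a failover test right now")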

  • by Joff_NZ ( 309034 ) on Tuesday November 15, 2005 @02:13AM (#14032646) Homepage Journal
    The other thing to note about MySQL Cluster is that, even in 'stable' releases of MySQL, it is horribly unstable and prone to massive data loss.

    We deployed it ourselves, and it worked OK for a while, but things went very, very wrong when we tried updating one of the configuration parameters, causing us to inexplicably lose quite a bit of data.

    Avoid it. At least for another generation or two, or three.
  • by Anonymous Coward on Tuesday November 15, 2005 @04:24AM (#14033062)
    If you want to stay within open source, Slony + PgPool [google.com] is a viable option. Slony is a very capable master-slave replication system and PgPool is an easy way of handling failover and load balancing (of reads).
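    A minimal sketch of how an application splits traffic in that setup, assuming psycopg2 and placeholder hostnames (pgpool's default port is 9999). Note that Slony replication is asynchronous, so a read routed to a slave can briefly lag a just-committed write:

        import psycopg2

        # Writes go straight to the Slony master; reads go through pgpool,
        # which load-balances SELECTs across the master and the slaves.
        master = psycopg2.connect(host="pg-master.example.com", dbname="appdb",
                                  user="app", password="secret")
        pool = psycopg2.connect(host="pgpool.example.com", port=9999, dbname="appdb",
                                user="app", password="secret")

        cur = master.cursor()
        cur.execute("INSERT INTO events (msg) VALUES (%s)", ("hello",))
        master.commit()

        cur = pool.cursor()
        cur.execute("SELECT count(*) FROM events")
        print(cur.fetchone()[0])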


    For commercial PostgreSQL options, Afilias (who wrote Slony and uses it to power the .org domain registry [google.com]) will happily sell you commercial support for less than comparable Microsoft or Oracle offerings. I think a company called Command Prompt also has a commercial solution for PostgreSQL, but I haven't tried it.

  • by jorela ( 84188 ) on Tuesday November 15, 2005 @08:21AM (#14033681)
    Actually, we have implemented disk storage.
    Hopefully it will make it into version 5.1.

    MySQL Cluster 4.1/5.0 supports:
    * transactions
    * transparent data partitioning and transparent data distribution
    * recovery using logging
    * (multi-)node failure and automatic non-blocking hot sync for crashed/stopped nodes
    * hot backup

    MySQL Cluster 5.1 will support:
    * user-defined partitioning
    * cluster-to-cluster async replication (like "ordinary" MySQL replication)

    The disk implementation supports:
    * putting a column in memory or on disk
        (currently indexed columns have to be in memory)
    * all HA features mentioned above.
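    A small monitoring sketch against that feature set, assuming the ndb_mgm management client is installed and already pointed at the cluster's management node; the "not connected" string is how ndb_mgm typically reports a missing node, but treat that as an assumption:

        import subprocess

        def cluster_degraded():
            """Ask the NDB management server for node status via 'ndb_mgm -e show'."""
            out = subprocess.run(["ndb_mgm", "-e", "show"],
                                 capture_output=True, text=True, check=True).stdout
            return "not connected" in out

        if cluster_degraded():
            print("at least one cluster node is down -- hot sync will catch it up when it returns")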
  • by delta407 ( 518868 ) <slashdot@nosPAm.lerfjhax.com> on Tuesday November 15, 2005 @10:49AM (#14034487) Homepage
    Name a DB from a 'big guy' that doesn't require a shared resource. Oracle? Nope. RAC requires a shared storage solution. Lose your single storage device and you lose access to your db.
    ...
    The only time I will EVER have to take my cluster down is to add more storage space or to replace my switch.
    Sorry, what? You're bashing Oracle for having a system with a single point of failure, and then adding two of your own?

    "Shared storage" doesn't mean "not highly available". If you're serious about building a database that cannot go down, you get multiple servers, each with two Ethernet interfaces and two Fibre Channel cards. Get two Fibre Channel switches, and two Ethernet switches. Get two Fibre Channel disk arrays. Hook together as follows.

    Both disk arrays have one or more uplinks to both FC switches. Each server has one HBA connected to each switch. Then use Linux multipathing to provide automatic failover in case either switch or either HBA dies, and use Oracle ASM or Linux MD to mirror the data across the two arrays so you're fine even if you unplug one of them.
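    Before deliberately failing a switch or HBA, it's worth confirming every path is actually up. A rough sketch, assuming the Linux multipath-tools CLI and that a bad path shows up as "failed" or "faulty" in the output (the format varies by version):

        import subprocess

        def multipath_degraded():
            """True if 'multipath -ll' reports any failed/faulty/offline path."""
            out = subprocess.run(["multipath", "-ll"], capture_output=True, text=True).stdout
            return any(word in out for word in ("failed", "faulty", "offline"))

        if multipath_degraded():
            print("an FC path is already down -- fix it before testing the redundant one")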

    Set up the servers to use 802.1q VLAN tagging. (Remember, it's a good idea to keep your inter-node communication separate from client-to-server communication.) Create two Ethernet bridges, using the Linux bridging driver, binding eth0.2/eth1.2 to one and eth0.3/eth1.3 to the other. Take the two switches, set up 802.1q on the ports going to your database servers, and connect them together (preferably using high-speed uplinks, but channel bonding works fine). Enable spanning tree on the switches and on the bridges. Connect both switches to the rest of your network.
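    A sketch of that host-side network setup using the vconfig/brctl tools of the era; the interface names and VLAN IDs 2 and 3 come from the description above, the bridge names are made up, and on a current distribution the ip(8) equivalents would be used instead:

        import subprocess

        def sh(cmd):
            print("+", cmd)
            subprocess.run(cmd.split(), check=True)

        for vlan, bridge in ((2, "br-cluster"), (3, "br-public")):
            sh(f"brctl addbr {bridge}")
            sh(f"brctl stp {bridge} on")              # spanning tree on the host bridge
            for nic in ("eth0", "eth1"):
                sh(f"vconfig add {nic} {vlan}")       # creates eth0.2, eth1.2, eth0.3, eth1.3
                sh(f"ifconfig {nic}.{vlan} up")
                sh(f"brctl addif {bridge} {nic}.{vlan}")
            sh(f"ifconfig {bridge} up")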

    Now, if a switch starts on fire, spanning tree on the servers will fail over to the other one automatically. If a network cable gets cut or a network card goes out on one node, that one node will fail over all traffic to the remaining interface, and the inter-switch trunk will make everything keep working. Suddenly, you've got a robust network.

    So, we've got network and storage covered. What about the database software? Neither MySQL nor Postgres can use this sort of configuration, but Oracle RAC can. Add a dash of TAF, and suddenly, any component -- network switch, database server, SAN switch, disk array -- can fail in the middle of a SELECT and the application won't notice. That is highly available.

    Yes, this solution costs more. Our data -- more accurately, the cost of it not being accessible -- justifies the expense. But don't tell me that shared storage is a weakness.
  • by Wojtek_at_Avokia ( 931205 ) on Tuesday November 15, 2005 @11:05AM (#14034646) Homepage
    Hey, first, full disclosure: I work at Avokia. But we do have an availability solution that is cheaper than RAC (it doesn't require a SAN) and combines the value props of both RAC and Data Guard. We virtualize the data layer, enabling many identical databases to be in production serving your users, and you can put those databases in geographically distant datacenters. So you get a live-live-live... setup without the need for the manual conflict resolution that others require. Check it out at www.avokia.com. -- Wojtek (I'm the product manager)
  • Radiant Data PeerFS (Score:2, Informative)

    by darkone ( 7979 ) on Tuesday November 15, 2005 @11:24AM (#14034782) Homepage
    http://www.radiantdata.com/ [radiantdata.com]
    Radiant Data has a product called PeerFS, a replicated filesystem (rw/rw, active/active) that also lets you hold MySQL databases on it. You run two separate MySQL servers pointing to the same data folder and have them use POSIX locks for writes (see the toy example below). The data is physically held on each server and synced across the network.
      I am testing it at work ( http://www.concord.org/ [concord.org] ) now for our websites. VERY easy to set up, but it supports MyISAM tables and NOT InnoDB (or the other way around).
      This does not solve IP failover, but there are other solutions out there for that. As a bonus, we're holding our PHP web pages and MySQL database files on the replicated PeerFS filesystem.
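      The toy example mentioned above: two processes on two nodes sharing a PeerFS mount serializing writes with a POSIX advisory lock. This illustrates the locking primitive the setup relies on, not MySQL's actual locking code; the path is made up, and it only works if the filesystem propagates POSIX locks between nodes, as PeerFS claims to.

        import fcntl

        with open("/peerfs/shared/counter.log", "a") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)     # blocks until the peer node releases its lock
            f.write("one safely serialized append\n")
            f.flush()
            fcntl.lockf(f, fcntl.LOCK_UN)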
      -Ben
  • by delta407 ( 518868 ) <slashdot@nosPAm.lerfjhax.com> on Tuesday November 15, 2005 @07:32PM (#14039435) Homepage
    Yep, Oracle RAC requires a quorum disk (CRS calls it a "voting disk"). However, if you use OCFS2 to store the data files (including the voting disk) the solution above cannot produce split-brain. Putting the data files in OCFS2 puts the responsibility of guarding the shared storage in the filesystem, which solves the problem as follows.

    Each node in an OCFS2 cluster updates a special file every two seconds and maintains a TCP connection to its neighbors. If this process doesn't work (i.e., writes are failing, are backlogged, or for whatever reason don't hit the disks in a timely fashion), the node will fence itself. Likewise, if the node starts on fire, it will naturally stop updating the heartbeat data. Either way, other nodes are aware of who is up and who is down, even if the IP network dies. If a node's access to the storage system dies, OCFS2 won't be able to successfully read back its own heartbeat timestamp, and the node fences itself.

    Note that this heartbeat procedure happens on the same block device that the data files are on. There's no way to reach the data files unless a node is in agreement with every other node, thus no way to cause "split-brain".
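    A toy illustration of that disk-heartbeat-and-self-fence idea; this is not OCFS2's actual on-disk format, and the device path, slot layout, and two-second interval are just stand-ins:

        import os, struct, sys, time

        SLOT_SIZE = 16
        MY_SLOT = 3                          # this node's slot in the shared heartbeat area
        HB_DEV = "/dev/shared_heartbeat"     # placeholder for the shared block device

        fd = os.open(HB_DEV, os.O_RDWR)
        while True:
            try:
                os.lseek(fd, MY_SLOT * SLOT_SIZE, os.SEEK_SET)
                os.write(fd, struct.pack("<Q", int(time.time())).ljust(SLOT_SIZE, b"\0"))
                os.fsync(fd)                 # the write must actually reach the shared disk
            except OSError:
                # Lost access to shared storage: fence ourselves rather than risk split-brain.
                os.system("echo b > /proc/sysrq-trigger")   # immediate reboot; requires root
                sys.exit(1)
            time.sleep(2)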

    Say, somehow, a series of failures occurs, all simultaneously, resulting in Array 1 being connected only to nodes 1-4 and Array 2 being connected only to nodes 5-8. Each node keeps heartbeating, and after a little bit, Node 5 notices that Node 1 hasn't updated its timestamp, but that Node 1 just said that it did (over Ethernet). OCFS2 notices this immediately, much inter-node communication happens, and nodes 5-8 decide to fence themselves, leaving 1-4 as the only nodes allowed to touch either disk array. Still working as advertised.

    Let's examine that failure. First, you'd have to terminate the links between both fibre channel switches. Then, you'd have to break uplink 2 on array 1 and uplink 1 on array 2. Then, you'd have to take out the second HBA on nodes 1-4 and the first HBA on nodes 5-8. Simultaneously. And the system handles this gracefully.

    Now, what happens when the network goes out too, you ask? Remember that each node has one connection to both switches. Say we lose a switch -- all the nodes fail over to the other one. Hmm. Say we unplug nodes 1-4 from switch 2 and unplug nodes 5-8 from switch 1 -- both switches are connected together directly, and thanks to VLAN trunking, all the nodes can still reach each other. Darn. Say we also unplug that link. Well, both switches are plugged into the rest of the network, right? Assuming you also set up VLAN trunking there, the switches will be able to reach each other through a third switch. Hmm. What if we unplug those, too?

    Congratulations! Split-brain! Nodes 1-4 proceed on Array 1 and Switch 1, and nodes 5-8 proceed on Array 2 and Switch 2. You only had to unplug 22 different cables -- eight Ethernet cables and eight FC cables from the nodes to the switches, two inter-switch links (one Ethernet and one FC), two disk array uplinks (one from each array to each switch), and two Ethernet uplinks (one from each Ethernet switch) to the rest of the network -- simultaneously.

    As I said, no single point of failure.
  • by msanto ( 81364 ) on Wednesday November 16, 2005 @12:00AM (#14040968)
    For the OCFS2 special file not to be a single point of failure, I assume it's mirrored between both storage arrays. If you lose all connectivity between data centers, assuming each DC has a storage array and a node, then don't you still have both halves thinking they're it, and still have split-brain? That's two simultaneous outages, and if your company is anything like those I've worked with, one outage may suffice.

    In my current company, our network group tends to keep things to themselves, and we'd never have known we didn't have redundant fiber if someone hadn't shot out our cable that ran through a bad neighborhood and it became a topic of conversation (yes, with a gun).

    I also worked at another company that had redundant fiber lines that were both taken out by a single backhoe, apparently they were bundled together.

"Experience has proved that some people indeed know everything." -- Russell Baker

Working...