High Availability Solutions for Databases? 83
An anonymous reader asks: "What would be the best high-availability solution for databases? I don't have enough money to afford Oracle RAC or any architecture that requires an expensive SAN. What about open source solutions? MySQL Cluster seems to be more master/slave, and you can lose data when the master dies. What about this Sequoia project that seems good for PostgreSQL and other databases? Has anyone tried it? What HA solution do you use for your database?"
Whatever else they tell you (Score:2, Insightful)
MySQL Cluster != master/slave (Score:5, Informative)
Re:MySQL Cluster != master/slave (Score:5, Interesting)
Unfortunately, open source hasn't caught up to the big guys yet in the area of replication.
Re:MySQL Cluster != master/slave (Score:3, Interesting)
Yes, the RAM-only tables suck for large DBs. On the other hand, they're REALLY fast, and they can be easily scaled up on commodity hardware rather than requiring faster and bigger hardware.
Re:MySQL Cluster != master/slave (Score:2)
I have yet to find anything oracle can't do. It is a bit of an 8000 pound gorilla though...
Re:MySQL Cluster != master/slave (Score:1)
I really don't think they had HA in mind when they designed multimaster. Actually, I don't know what the hell they were thinking...
Re:MySQL Cluster != master/slave (Score:1)
No, you won't.
Two phase commit, remember?
Re:MySQL Cluster != master/slave (Score:1)
Re:MySQL Cluster != master/slave (Score:1)
Re:MySQL Cluster != master/slave (Score:1)
I should clarify that I don't mean the real master. The scenario goes something like this:
Client commits a change to Node A. Node A queues the change to be sent to node B.
Node A explodes.
That change never gets to node B until you repair and restart node A.
This is why multi-master is really not a good solution for HA.
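The scenario above can be sketched in a few lines. This is a hypothetical toy model (not MySQL code): it contrasts queue-based asynchronous replication, where the client gets an OK before the peer has the data, with a two-phase commit, where no OK is issued unless both nodes can apply the write.

```python
# Toy model of the "Node A explodes" scenario: async replication can
# acknowledge a write that the surviving node never receives, while 2PC
# refuses to acknowledge unless both nodes participate.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

def async_commit(primary, replica, key, value):
    """Primary commits locally and acks at once; replication is queued."""
    primary.data[key] = value
    ack = True                     # client already sees "committed"
    if primary.alive:              # the queue drains only if A survives
        replica.data[key] = value
    return ack

def two_phase_commit(primary, replica, key, value):
    """Phase 1: both nodes must prepare. Phase 2: both apply the write."""
    prepared = primary.alive and replica.alive
    if not prepared:
        return False               # no ack ever reaches the client
    primary.data[key] = value
    replica.data[key] = value
    return True

a, b = Node("A"), Node("B")
a.alive = False                    # Node A explodes mid-replication
acked = async_commit(a, b, "x", 1)
print(acked, b.data.get("x"))      # True None: client was told OK, B never got it
```

The point is exactly the parent's: with the async queue the client holds an acknowledgment for data that no surviving node has.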
Re:MySQL Cluster != master/slave (Score:1)
"Node A explodes."
Then the client either won't receive an OK status for its transaction (since it won't be declared committed until Node A receives confirmation from node B and confirms in turn that it has received that OK -- we are talking about 2PC, remember?) and it will retry, or it will receive that OK status from node B if the system is configured to do so in case of losing a node. It will then receive either a timeout (if still t
Re:MySQL Cluster != master/slave (Score:1)
The problem comes when a client commits to node A, then switches to using node B (because it has detected node A exploding). It expects the data that it committed earlier to be there now, but it isn't.
Re:MySQL Cluster != master/slave (Score:2)
Oracle? ... MS-SQL? ... SAP? No. Postgres? No. Firebird? No. Who's left?
Sybase?
Re:MySQL Cluster != master/slave (Score:2)
It was not trivial to set up back then (probably much better now), but that's what DBAs are for. If you're a highly compensated admin, complexity is not necessarily your enemy.
Re:MySQL Cluster != master/slave (Score:5, Informative)
"Shared storage" doesn't mean "not highly available". If you're serious about building a database that cannot go down, you get multiple servers, each with two Ethernet interfaces and two Fibre Channel cards. Get two Fibre Channel switches, and two Ethernet switches. Get two Fibre Channel disk arrays. Hook together as follows.
Both disk arrays have one or more uplinks to both FC switches. Each server has one HBA connected to each switch. Then, use Linux multipathing to provide automatic failover in case either switch dies or either HBA dies, and use Oracle ASM or Linux MD to mirror the data so you're good even if you unplug the shared storage.
Set up the servers to use 802.1q VLAN tagging. (Remember, it's a good idea to keep your inter-node communication separate from client-to-server communication.) Create two Ethernet bridges, using the Linux bridging driver, binding eth0.2/eth1.2 to one and eth0.3/eth1.3 to the other. Take the two switches, set up 802.1q on the ports going to your database servers, and connect them together (preferably using high-speed uplinks, but channel bonding works fine). Enable spanning tree on the switches and on the bridges. Connect both switches to the rest of your network.
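The tagged-VLAN-plus-bridge layout described above might look something like this on each server, assuming Debian-style ifupdown with the vlan and bridge-utils packages; the interface names and addresses are placeholders, not from the original post:

```
# /etc/network/interfaces (hypothetical fragment, per database server)
# eth0.2/eth1.2 carry the inter-node VLAN; eth0.3/eth1.3 carry client traffic.
# Spanning tree on the bridges lets either physical path die without an outage.

auto br-cluster
iface br-cluster inet static
    address 10.0.2.11
    netmask 255.255.255.0
    bridge_ports eth0.2 eth1.2
    bridge_stp on

auto br-client
iface br-client inet static
    address 10.0.3.11
    netmask 255.255.255.0
    bridge_ports eth0.3 eth1.3
    bridge_stp on
```

The matching switch ports would be configured as 802.1q trunks carrying VLANs 2 and 3, as the post describes.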
Now, if a switch catches fire, spanning tree on the servers will fail over to the other one automatically. If a network cable gets cut or a network card goes out on one node, that one node will fail over all traffic to the remaining interface, and the inter-switch trunk will keep everything working. Suddenly, you've got a robust network.
So, we've got network and storage covered. What about the database software? Neither MySQL nor Postgres can use this sort of configuration, but Oracle RAC can. Add a dash of TAF, and suddenly, any component -- network switch, database server, SAN switch, disk array -- can fail in the middle of a SELECT and the application won't notice. That is highly available.
Yes, this solution costs more. Our data -- more accurately, the cost of it not being accessible -- justifies the expense. But don't tell me that shared storage is a weakness.
Re:MySQL Cluster != master/slave (Score:1)
Re:MySQL Cluster != master/slave (Score:3, Informative)
Each node in an OCFS2 cluster updates a special file every two seconds and maintains a TCP connection to its neighbors. If this process doesn't work (i.e. writes are failing,
Re:MySQL Cluster != master/slave (Score:2, Informative)
In my current company, our network group tends to keep things to themselves and we'd never have kno
Re:MySQL Cluster != master/slave (Score:2)
As for availability on MySQL, the switch limitation is an OS issue. I chose to run an OS that can't handle multiple interfaces per subnet, though that issue has been fixed in the latest version. As for adding storage space, I believe that fix is scheduled
Re:MySQL Cluster != master/slave (Score:2)
When was this? I'm fairly certain that much of what I described won't work on anything older than 10g, and it's really, really painful to attempt this using ASM. This only becomes practical with OCFS2, which (as I recall) is currently supported only by SL
DB2 UDB offers shared nothing and HA (Score:2)
Not quite. Check out the shared nothing architecture of DB2 Universal DB [ibm.com]. You can get DB2 for AIX, Windows, Linux, and other platforms. UDB offers a High Availability Disaster Recovery [ibm.com] solution.
Re:MySQL Cluster != master/slave (Score:2)
That's not true, along with some other things you've mentioned from various marketing materials.
For starters, there's no such thing as "shared nothing" clustering. You have to have a shared resource to have a cluster.
Solutions that claim to be "shared nothing" actually share a network. Once you've gone that far, there's no reason not to do replicated network block devices,
Re:MySQL Cluster != master/slave (Score:2)
Mnesia was built to run non-stop forever because it's supposed to automatically run on a clustered server. This gives it fault tolerance. The best kind, because no matter what PC you buy, some day it's going to break. You can reconfigure it while it's running, and although you're better off using Erlang to interface to the database, SQL is available
Re:MySQL Cluster != master/slave (Score:1)
Re:MySQL Cluster != master/slave (Score:4, Informative)
Hopefully it will make it into version 5.1.
MySQL Cluster 4.1/5.0 supports:
* transactions
* transparent data partitioning and transparent data distribution
* recovery using logging
* (multi) node failure and automatic non blocking hot sync for crashed/stopped nodes
* hot backup
MySQL Cluster 5.1 will support:
* user defined partitioning
* cluster to cluster async. replication (like "ordinary" mysql replication)
The disk implementation supports:
* putting columns in memory or on disk
(currently indexed columns have to be in memory)
* all HA features mentioned above.
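For reference, a minimal MySQL Cluster layout of the 4.1/5.0 era is driven by a config.ini read by the management server. The sketch below is hypothetical (hostnames and sizes are placeholders), but the parameter names are the real ones:

```ini
# config.ini for ndb_mgmd -- hypothetical minimal two-data-node cluster
[ndbd default]
NoOfReplicas=2          # each fragment is stored on two data nodes
DataMemory=512M         # in-memory table data
IndexMemory=64M         # 4.1/5.0: all indexed columns live in RAM

[ndb_mgmd]
hostname=mgm.example.com

[ndbd]
hostname=data1.example.com

[ndbd]
hostname=data2.example.com

[mysqld]
hostname=sql1.example.com
```

With NoOfReplicas=2, either data node can die and the cluster keeps serving, which is the node-failure/hot-sync behavior listed above.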
Re:MySQL Cluster != master/slave (Score:2)
Right now this is available only from source (you'll need to build the MySQL 5.1 server using --with-partitioning), but there should be some alpha -Max binaries released soon.
Re:MySQL Cluster != master/slave (Score:5, Informative)
We deployed it ourselves, and it worked OK for a while, but things went very, very wrong when we tried updating one of the configuration parameters, causing us to inexplicably lose quite a bit of data.
Avoid it. At least for another generation or two, or three.
Re:MySQL Cluster != master/slave (Score:3, Funny)
Re:MySQL Cluster != master/slave (Score:2)
They changed the database parms on part of a large clustered HA system and it is the system's fault you lost data? I would be very interested in hearing what parm(s) were changed. If you mean they changed a parm on part of the cluster, then have them "move away from the keyboard, before someone gets hurt."
Re:MySQL Cluster != master/slave (Score:2)
Believe me, it was MySQL's fault. NDB is seriously, seriously flaky.
Re:MySQL Cluster != master/slave (Score:1)
MySQL - Master / Slave is the only option (Score:4, Interesting)
We chose to go with a Master/Slave option, which basically gave us failover within 3 seconds. Any more fine-grained monitoring and the CPU load on the slave gets pretty high. Not ideal, but probably the best option that uses MySQL when you don't want to be tied so much to one platform.
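The shape of such a monitor is simple. This is a hypothetical sketch, not the poster's actual setup: check the master once a second and promote the slave after three consecutive failures, which matches the roughly 3-second failover described. `check_master` and `promote_slave` stand in for a real health probe and a real promotion step (e.g. taking over a virtual IP).

```python
# Hypothetical heartbeat/failover loop: poll every `interval` seconds,
# promote the slave after `max_failures` consecutive failed checks.
import time

def monitor(check_master, promote_slave, max_failures=3, interval=1.0,
            sleep=time.sleep):
    failures = 0
    while True:
        if check_master():
            failures = 0               # one good check resets the count
        else:
            failures += 1
            if failures >= max_failures:
                promote_slave()        # e.g. STOP SLAVE; grab the VIP
                return
        sleep(interval)
```

Polling faster shortens failover but, as the parent notes, raises the monitoring load; the interval is the knob being traded off.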
Re:MySQL - Master / Slave is the only option (Score:4, Interesting)
Re:MySQL - Master / Slave is the only option (Score:2)
Transactions are what make DBs such great tools - the system is always consistent. No corruption issues if some process dies halfway through updating a pile of records.
Any system that expects to compete on this scale really needs to support transactions, unless
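The consistency argument above is easy to demonstrate. A small illustration using Python's built-in sqlite3 as a stand-in (not MySQL): a process dies halfway through a two-row update, and rollback leaves the data exactly as it was.

```python
# If a process dies halfway through updating a pile of records, the
# transaction is rolled back and the database stays consistent.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 100)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    raise RuntimeError("process dies mid-update")   # second half never runs
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
except RuntimeError:
    conn.rollback()                                 # partial work is undone

balances = [row[0] for row in conn.execute(
    "SELECT balance FROM accounts ORDER BY id")]
print(balances)   # [100, 100] -- no half-applied transfer
```

Without the transaction, the first UPDATE would have stuck and 50 units would simply have vanished.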
Do your own (Score:1, Funny)
How about you have a few Access
Or just write down all of your data, photocop
MOM - Motherly Object Model (Score:3, Funny)
Phone: Ring Ring
Son: BEGIN TRANSACTION!
Mum: Oh, hi dear. How have you been? Did you know that
Son: *cough* Work, remember?
Mum: Yes, dear.
Son: WRITE_DOWN into notepad ADDRESSES. fname='John'. sname='Smith'
Mum: Wait, I have to look up the next identity number thingie. Oooh, that's a big number, I can't even recite it.
Son: Hush! address='12 Rover Roa..
Mum: Oh, dear. My pencil broke.
Son: OK, ROLLBACK!
Mum: That's fine for you to say. I don't have any erasers.
Re:MOM - Motherly Object Model (Score:1)
Re:Do your own (Score:1)
It's debated [wikipedia.org].
Might want to consider Data Guard (Score:1)
Re:Might want to consider Data Guard (Score:1)
Re:Might want to consider Data Guard (Score:1)
Re:Might want to consider Data Guard (Score:1)
That said, you can run a standby database and do manual transfer of archivelog files without using Data Guard. In this case you only need Standard Edition ($15,000/CPU) or even Standard One Edition ($5,000/CPU) if you only have a one- or two-CPU machine.
Standard One Edition on a single CPU is pretty cost effective.
Oh that's easy... (Score:2, Funny)
Re:Oh that's easy... (Score:2)
(Subeeism [google.com] - ar ar)
Thank you; I'll be here all week. Try the fish!
Re:Oh that's easy... (Score:2)
(it's blocked by crap stacked in front of it the other 0.00001%)
Only 0.00001% of the time? What office environment have you been working in? My co-workers tell me that there's a desk and a filing cabinet under all this paperwork -- I still think it's just a rumor.
What are you doing? It's important. (Score:5, Insightful)
For example, we run a site with data from a thousand-odd different data sources, with each source getting updated every hour or so. We do it by parsing the data into static pages. When we receive a datum, we rebuild the pages that depend on it.
We have another site that runs off an Oracle DB. The static-page site runs about 90x faster, and is basically in memory (disk access is nil). Now take into account that we can (and do) replicate the static-page solution with zero load, and we get to a solution that is literally 900x faster.
Now folks are thinking 'oh, the horror!' Well... tough! There is no substitute for thinking about your data and how it flows. A DB is not a given, but a (potentially wrong) answer to a question after you have done some analysis.
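The rebuild-on-datum approach described above amounts to a reverse index from data sources to the pages built from them. A hypothetical sketch (names and structure are illustrative, not the poster's actual system):

```python
# Reverse index from data source -> static pages: when a datum arrives,
# re-render only the pages that depend on that source.

deps = {}        # page -> (set of sources it depends on, render function)
rendered = {}    # page -> current rendered content

def register(page, sources, render):
    """Declare a page, the sources it uses, and how to render it."""
    deps[page] = (set(sources), render)
    rendered[page] = render()

def on_datum(source):
    """A datum arrived for `source`: rebuild every page that uses it."""
    rebuilt = []
    for page, (sources, render) in deps.items():
        if source in sources:
            rendered[page] = render()
            rebuilt.append(page)
    return rebuilt

register("home",   ["weather", "news"], lambda: "home page")
register("sports", ["scores"],          lambda: "sports page")
print(on_datum("weather"))   # ['home'] -- only the dependent page is rebuilt
```

Serving the `rendered` output as flat files is what removes the per-request database hit, which is where the claimed 90x comes from.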
Nice "question" (Score:5, Interesting)
Hello, anonymous Sequoia promoter seeking free advertising. (BTW, you might try picking a product name that normal people can spell without thinking about it.)
Your solution is not database clustering, and should not be advertised as such. It's more along the lines of a database connection proxy which supports multiple simultaneous backends and operates on them in parallel, with some added features to make HA-like solutions relatively easy.
The downside of this style of approach, as opposed to an architecture along the lines of Oracle RAC, is that it doesn't scale up as you add backend nodes (at least not for writes; in any case, for read-only scaling there are simpler solutions for all of the vendors, even the free ones). It must also have limits on how many transactions it can backlog and replay to a temporarily-unreachable/down server before that server has to be re-synced from scratch in order to catch back up. (I have to wonder if there's really any real-world scenario under real transaction load in which the practical net effect wouldn't be a complete resync of a backend server anytime something goes wrong with it, in which case one could throw out any attempt to backlog transactions for a single failed server and just keep things simple: you fall out of sync, you resync.)
The open source world really needs a RAC-like solution for PostgreSQL and MySQL (I'm a fan of friendly open-source competition, so while my personal preference is PostgreSQL, I hope both projects stay current and popular for many years to come). Unfortunately, there is unlikely to be a generic way to do this; it will probably have to be re-invented for each database project.
I took a brief look around PostgreSQL's guts a while back, and it actually seemed like the architecture they use isn't far off from something RAC-capable to begin with; nobody's quite buttoned together the few pieces here and there to make it happen. Basically, on SMP, multiple cooperating backends already serve parallel requests and synchronize on a shared memory cache. There are patches out there for the Linux kernel to support network-synchronized distributed shared memory. Put two and two together, and what do you get? Something not far off from a first-pass hack at a RAC-like network-distributed database caching system.

Most of the other details are easy to solve (startup/shutdown, join/leave cluster, tracking of processes across the cluster, etc.), or belong in another problem domain (implementing shared-storage filesystems; hey, we have GFS, Oracle OCFS, etc. available). One of the biggest issues would be multiple nodes all having pg "writer" processes. The first step would probably be to put the writer on one node and fail over the writer functionality when that node dies, to be quickly replaced by a scheme whereby multiple writers can work by synchronizing through a distributed lock manager (there are already DLM modules available for Linux). Then there's the issue of making the current distributed shared-memory patches do the right thing performance-wise for this kind of usage, and so on. It's not easy, but it's not outside the realm of possibility.
Re:Nice "question" (Score:2, Interesting)
As a SAN has to be shared between all nodes in your cluster, this already limits the availability and scalability of your database to your SAN capabilities (you are shifting the problem to the disk). With a shared-nothing architecture you are also replicating the disks and thus distributing the IO workload
Re:Nice "question" (Score:2)
But you can replicate IO across (high-end) SANs up to several hundred km. Totally OS-independent.
We're talking serious money though...
This looks like a religious war between shared-disk and shared-nothing solutions, but a SAN and its admin cost is usually not compatible with someone seeking an open source solution. If you can afford the SAN, why not just use Oracle RAC?
SAN isn't the only way to do sha
Re:Nice "question" (Score:2)
What, like SCSI? FC?
Re:Nice "question" (Score:2)
What, like SCSI? FC?
Huh? A chain? That's the least-redundant topology.
HA DB's (Score:2)
Any time you add HA to something, you're adding complexity, and usually a fair bit of it. That's a trade off you need to consider (as is the extra price for software/hardware and support for the solution).
MySQL cluster and replication are not the same (Score:4, Informative)
Neither option is really "beautiful", though Cluster has a lot of promise for the future, especially in 5.1.
Re:MySQL cluster and replication are not the same (Score:2)
Master/slave replication is tentatively scheduled for MySQL 5.1 Cluster. IOW, you'll be able to replicate with a Cluster as a master or slave, or between two separate Clusters. (It's already working, it's just not yet been merged into the main 5.1 tree.) Also, a disk-based Cluster implementation - while not 100% guaranteed at this time - is also a
DRBD (Score:4, Informative)
http://www.drbd.org/ [drbd.org]
If you are smart, you'll play around with this on a test network or VMWare first. Get it all tweaked out and actually test it by killing a server while in mid-transaction to see if it works for you.
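A minimal two-node DRBD resource of that era looks roughly like the sketch below; hostnames, devices, and addresses are placeholders, not a tested configuration. Protocol C makes the primary wait for the peer's write to complete before acknowledging, which is what you want underneath a database:

```
# drbd.conf (hypothetical fragment) -- one replicated block device
resource r0 {
  protocol C;               # synchronous: ack only after the peer has written
  on db1 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on db2 {
    device    /dev/drbd0;
    disk      /dev/sda7;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

You'd then put the database's data directory on /dev/drbd0 on the primary and let a failover manager (e.g. Heartbeat) mount it and start the database on the peer when the primary dies.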
Re:DRBD (Score:2)
1. Multicast replication to more than one backup
2. Write support from more than one system.
Coupled with GFS, it would be absolutely astounding. Multiple failures happen; a single backup is not always enough.
speaking from experience (Score:2, Insightful)
*sob*
Re:speaking from experience (Score:3, Insightful)
case in point:
We started off with HA, figured out how to go to a cloned configuration: two servers, two RAIDs, no SPoF, right? We had some LAN issues which caused traffic storms, there was a bug in the controller logic, so both RAIDs crashed simultaneously
Re:speaking from experience (Score:2)
Re:speaking from experience (Score:2)
What often happens is that
Slony + PGPool are one postgresql option. (Score:2, Informative)
For commercial PostgreSQL options, Afilias (who wrote Slony and uses it to power the .ORG domain registry [google.com]) will happily sell you commercial support for less than similar Microsoft-or-Oracle stuff. I think a company called Command Prompt also has a commercial solution for PostgreSQL, but I haven't tried it.
Provide more information (Score:2)
Take a look at Avokia (Score:2, Informative)
Radiant Data PeerFS (Score:2, Informative)
Radiant Data has a product called PeerFS, which is a replicated filesystem (rw/rw, active/active) that also allows you to hold MySQL databases on it. You run 2 separate MySQL servers pointing to the same data folder, and have them use POSIX locks for writes. The data is physically held on each server, and synced across the network.
I am testing it at work ( http://www.concord.org/ [concord.org] ) now for our websites. VERY easy to setup, but it supports MyISAM tables, and NOT InnoDB (or the
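The two-servers-one-datadir setup described above would need MyISAM external locking enabled on both mysqld instances. A hypothetical my.cnf fragment (paths are placeholders; this is a sketch of the idea, not Radiant Data's documented configuration):

```ini
# my.cnf fragment (hypothetical) for two mysqld instances sharing one
# replicated data directory via POSIX locks.
[mysqld]
datadir          = /peerfs/mysql
external-locking             # MyISAM takes filesystem locks per table
delay-key-write  = OFF       # don't buffer index writes the peer can't see
query_cache_size = 0         # a stale query cache would hide the peer's writes
skip-innodb                  # InnoDB can't safely share a datadir this way
```

The delay-key-write and query-cache settings matter because anything one server caches in memory is invisible to the other; only what reaches the shared files is.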
Don't need SAN for RAC (Score:2)
RAC is an absolute ******* to install, however. The heavily bugged Oracle Installer is the cause of most of these issues.
Incidentally for people experiencing blue screens when building a cluster with Windows Server 2003 Enterprise on Dell 2850 servers with AX100 SAN with Oracle's ASM with QLA200 fibre channel cards - flash the QLA card firmware.
And before anyone asks, I was going to use Linux, but it was act
So use Mandriva (Score:2)
Re:So use Mandriva (Score:2)
m/Cluster (Score:2)
Clustering, Master/Slave, Load- balancing (Score:3, Interesting)
You can achieve this in three basic ways. Each has its own pros and cons. I recommend that you weigh them out and come to a decision you think you can live with.
Clustering - You have a group of servers (physical hardware) each running the same software and working to stay synched up with each other. Now clustering comes in two flavors active/active and active/passive. The active/activ
databases (Score:1)
Depends on amount of data and amount of data changing (Score:1)
hardware solution (Score:1)
Most homebuilt computers now have motherboards with built-in RAID-5 support. It's an easy way to provide not just database HA, but system stability as well. Just string a couple of SATA-II drives together and activate RAID support on a motherboard like DFI's LanParty. RAID-5 is more robust and has a higher survival rate in case of hard-drive failures.