Free High-Availability Solutions For Solaris?
prwood asks: "Our company is looking into high availability solutions for our Sparc-based servers running Solaris. Primarily this would be used on our Oracle Database servers, but presumably we'd like to expand this to frontend servers. We've looked at Veritas, as well as Sun and Oracle's HA solutions, but we're not wanting to lay out that much money this year. We also have some quick and dirty in-house scripts and such to perform certain tasks, but we're looking for something a bit more robust. So the question is: Are there any free or low cost solutions for getting HA on Solaris? I know there is some good stuff for Linux, but it seems to depend in part on patches to the Linux kernel, and switching to Linux isn't an option for us at this point. Note that we're not looking for load balancing, but rather failover solutions. Suggestions?"
How much money do you want to spend? (Score:1)
There is always roll your own. (Score:2)
which my company did. In particular me... However we had to write most of that code anyway for failover of our custom hardware, so it was more a matter of porting to Solaris. Course we also designed from the start for failover.
A huge pain, overall. Many tricky problems. Many little things to come back to bite you. There are a couple of special cases that bite me time and time again because I cannot tell the difference between the two (i.e., is the other machine broken, or just the cable between us?).
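For what it's worth, the "broken machine or broken cable" case can at least be narrowed down if you have more than one heartbeat path plus a third machine to check against. A rough sketch, not what the poster's company built: the path names ("lan", "serial") and the idea of using something like the default router as a witness are assumptions for illustration.

```python
from enum import Enum


class PeerState(Enum):
    UP = "peer up"
    LINK_FAULT = "peer up, one heartbeat path broken"
    PROBABLY_DOWN = "peer probably down"
    SELF_ISOLATED = "we are cut off ourselves"


def classify_peer(heartbeats, witness_reachable):
    """Guess the peer's state from several independent heartbeat paths.

    heartbeats: dict mapping a path name (e.g. "lan", "serial") to True if
        the peer answered on that path during the last check.
    witness_reachable: True if a third machine (e.g. the default router)
        answered us, which shows that our own network link works.
    """
    if any(heartbeats.values()):
        # The peer answered somewhere, so it is alive; any silent path is
        # a cable or interface problem, not a dead machine.
        return PeerState.UP if all(heartbeats.values()) else PeerState.LINK_FAULT
    if witness_reachable:
        # Nothing from the peer, but our own link works.
        return PeerState.PROBABLY_DOWN
    # Nothing from anyone: the fault is most likely on our side.
    return PeerState.SELF_ISOLATED
```

Even then "probably down" is only a guess, so a takeover should still be guarded by something stronger than a missed heartbeat.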
We concluded that all a commercial HA package got us was notification of when to fail over (and disk mirroring, I suppose). We still had to write the code to do the failovers. Soon after, we decided that it wasn't worth the expense and wrote it ourselves.
Slightly OT (Score:1)
HA (Score:3)
You can also get a free "Admin Pack" from Sun.
Look into it, Sun's got what it takes in that area.
Yipes! Say it Ain't So (Score:2)
I believe in using free tools when they are better (cost, performance, shape, color, smell, etc.) than commercial tools.
In terms of high availability, the free tools don't come close to the commercial tools. I've used Veritas on Sun for many years and for the last two years, I've been using IBM's HACMP. I have evaluated three Linux tools for HA and only one worked and it didn't even work well.
If you've spent the money to buy Sparc and Oracle, you probably aren't working out of your basement. Don't close the checkbook now. If you need HA, buy HA.
Spending the money won't put your company out of business. Not having a high availability solution just might.
InitZero
Re:Yipes! Say it Ain't So (Score:2)
If your company depends on having that database available, your management needs to ask themselves whether they can afford to be effectively closed while your HA software decides that your master node really is down and starts to recover on the other node.
If you try to roll your own solution based on Free/Open Source tools, be ready to spend a lot of time thinking about what might go wrong. (e.g. What happens when a failed node recovers? Do you leave the currently running node as the master, or do you migrate back? How do you migrate the required filesystems? How do you make sure the recovering node doesn't try to run fsck on a filesystem the other node is actively using?)
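To make those questions concrete, here is one possible answer to the "recovering node" case, written as a small decision function. It is only a sketch of one policy (no automatic failback), and the peer status values are assumed to come from some hypothetical status query between the nodes.

```python
def role_at_boot(peer_role, my_name, peer_name):
    """Decide what a node should do when it (re)boots.

    peer_role: "master", "standby", or "unreachable", as reported by a
        hypothetical status query to the other node.
    Returns "standby", "master", or "wait".
    """
    if peer_role == "master":
        # The peer owns the shared filesystems: do not mount or fsck them,
        # and do not fail back automatically.
        return "standby"
    if peer_role == "standby":
        # Nobody is master (e.g. both nodes just booted); break the tie
        # deterministically so both sides reach the same answer.
        return "master" if my_name < peer_name else "standby"
    # Peer unreachable: we cannot tell whether it holds the disks.
    return "wait"
```

With this policy the shared filesystems would presumably also have to stay out of the automatic boot-time mount and fsck list, so that only the HA logic ever touches them.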
Just ask yourself, or your management, whether you really want to be alpha testing some home-grown HA software on your production servers.
High Availability (Score:1)
http://www.enlightendsm.com/ is the url i think
Also look at the Sun Management Console and its monitoring capabilities.
all you need is a volume manager and dual-ported raid (Score:2)
the key is having a volume manager that allows 'importing' or 'deporting' of the volume sitting on the shared LUN. Both systems, although they physically can, should never try and write to the shared LUN at the same time. you need to be able to control this, so you can logically move ownership of the volume back and forth between systems, depending on the state of the cluster.
veritas volume manager does this - if a volume is deported, the other system can mount it, etc. i suspect DiskSuite (which comes with Solaris) would do the same thing, but not sure.
if you can wrapper up something to move the ownership of the volume around, then the rest is trivial (ifconfig-ing up and down the network interfaces - i.e. moving the primary system's node name between boxes; and starting/stopping Oracle and the sql*net listeners)
as long as both systems will normally obey the rules of the volume manager, you could roll up a basic failover system from scratch pretty easily.
(if node B notices the database on node A is down, it takes ownership of the volume, ups the server's IP address on its net interface, and starts oracle.)
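the takeover sequence described above could be sketched roughly like this. it only builds the list of commands in order and runs nothing. the disk group, volume, interface and address are made-up examples, and every command line (the vxdg/vxvol calls, the mount options, the ifconfig addif form, dbstart and lsnrctl run as an "oracle" user) is an assumption to check against your own volume manager, Solaris release and Oracle install.

```python
def takeover_plan(disk_group, volume, mount_point, interface, service_ip,
                  fstype="ufs"):
    """Return the ordered commands node B would run to take over from node A.

    Nothing is executed here; the caller can print the plan or hand each
    entry to subprocess.run(). All command lines are illustrative guesses.
    """
    device = f"/dev/vx/dsk/{disk_group}/{volume}"
    return [
        # 1. take ownership of the volume on the shared LUN
        ["vxdg", "import", disk_group],
        ["vxvol", "-g", disk_group, "startall"],
        # 2. mount the database filesystem
        ["mount", "-F", fstype, device, mount_point],
        # 3. bring up the server's IP address on our interface
        ["ifconfig", interface, "addif", service_ip, "up"],
        # 4. start Oracle and the listener
        ["su", "-", "oracle", "-c", "dbstart"],
        ["su", "-", "oracle", "-c", "lsnrctl start"],
    ]
```

the order matters more than the exact commands: storage first, then the address, then the database, and the reverse order when giving things back.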
another nice thing to have, which isn't there by default on most Suns, is the ability to 'kill' the node that is 'thought' to be down. i.e. if it appears that node A is dead, and it really isn't (maybe just the network card is hosed) - you need to be able to ensure that node B can take over without node A holding on to stuff (like the IP address) - SGI provides a serial controller to do this on most of their systems, but Sun doesn't.
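in code terms, the 'kill the other node first' rule is just a guard in front of the takeover. a minimal sketch, assuming you have some way to power off the peer (the fence_peer callable here is hypothetical, e.g. a wrapper around a remote power switch):

```python
def guarded_takeover(fence_peer, steps):
    """Run takeover steps only after the peer is confirmed powered off.

    fence_peer: callable returning True once the other node is known to be
        off; how it does that is site-specific and not shown here.
    steps: list of zero-argument callables, run in order.
    Returns the number of steps that ran.
    """
    if not fence_peer():
        # We could not prove node A is dead, so it may still hold the IP
        # address or the disks. Taking over now risks both nodes writing.
        return 0
    for step in steps:
        step()
    return len(steps)
```

if the fence fails, doing nothing and paging a human is probably the safer default.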
anyway, at the very least you need that - the rest you could probably pick up from the usual linux open-source HA packages (heartbeat stuff, process monitoring, notification, etc.)
the only reason HA stuff isn't more popular on linux is due to lack of a world-class volume manager - if such a thing existed, there'd be a lot more HA stuff for linux floating around i bet.
mt
Get informed (Score:1)
Sun will insist that their consultants install their cluster software for you (otherwise they won't support it), which is where the main expense lies. OPS (Oracle Parallel Server) requires resilient, simultaneously accessible shared storage on every node and a certain amount of Oracle genius.
There are plenty of other failover solutions on the margins so look around. (Disclaimer: my company sells one, but there are many others. I do not speak for my employer. Disclaimer 2: you need more than just failover software for HA, see book mentioned above.)
If you really want free/open source, I believe some of the Linux solutions are portable.
Ade_
Re:There is always roll your own. (Score:2)
One example is the Erlang programming language and libraries [erlang.org], which were developed specifically for writing high availability telephone systems. This is open source, available as tar.gz for Solaris or in Debian, Red Hat, BSD...
People have gone on to use these tools (both on Solaris and other OSes) to build high availability web systems (e.g. lodbroker [lodbroker.com]) and robust email systems (e.g. bluetail [bluetail.com]).
Two Options Exist (Score:1)
distribution.
Second, HA Technical Solutions (http://www.tech-sol.com) has a "poor man's Veritas" that runs only $1500.
Third, one can certainly take a public-domain service-level monitor, such as BigBrother, and look at it as a set of inputs to a state machine that would load different DNS files and restart bind/named whatever, thereby redirecting traffic to servers that are known to be up.
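A minimal sketch of that third idea, assuming the monitor can hand you a simple up/down flag per server. The server names, zone file names and the "N failures in a row" threshold are all made up for illustration; actually copying the zone file into place and restarting named is left to the caller.

```python
class DnsFailover:
    """Pick a zone file from monitor results, with simple debouncing."""

    def __init__(self, zone_files, threshold=3):
        # zone_files: ordered list of (server_name, zone_file_path),
        # most preferred first. Names and paths are hypothetical.
        self.zone_files = zone_files
        self.threshold = threshold
        self.failures = {name: 0 for name, _ in zone_files}
        self.active = zone_files[0][1]

    def update(self, status):
        """Feed one round of monitor results (dict: server_name -> is up).

        Returns the zone file to load if the choice changed, else None.
        """
        for name, _ in self.zone_files:
            if status.get(name, False):
                self.failures[name] = 0
            else:
                self.failures[name] += 1
        # Use the most preferred server that has not failed too often.
        for name, path in self.zone_files:
            if self.failures[name] < self.threshold:
                if path != self.active:
                    self.active = path
                    return path
                return None
        return None  # every server looks down: keep the current zone
```

Whenever update() returns a path, the caller would install that file and restart named. Clients that have cached the old record will only follow once its TTL expires, so this is slower than moving an IP address between boxes.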