Sun Microsystems

Free High-Availability Solutions For Solaris?

prwood asks: "Our company is looking into high availability solutions for our Sparc-based servers running Solaris. Primarily this would be used on our Oracle Database servers, but presumably we'd like to expand this to frontend servers. We've looked at Veritas, as well as Sun and Oracle's HA solutions, but we're not wanting to lay out that much money this year. We also have some quick and dirty in-house scripts and such to perform certain tasks, but we're looking for something a bit more robust. So the question is: Are there any free or low cost solutions for getting HA on Solaris? I know there is some good stuff for Linux, but it seems to depend in part on patches to the Linux kernel, and switching to Linux isn't an option for us at this point. Note that we're not looking for load balancing, but rather failover solutions. Suggestions?"
This discussion has been archived. No new comments can be posted.

Free HA Solution for Solaris?

  • by Anonymous Coward
    How much money do you want to spend? That is, what is high availability worth to you?
  • which my company did. In particular me... However, we had to write most of that code anyway for failover of our custom hardware, so it was more a matter of porting to Solaris. Of course, we also designed for failover from the start.

    A huge pain, overall. Many tricky problems. Many little things come back to bite you. There are a couple of special cases that bite me time and time again because I cannot tell the difference between the two (i.e., is the other machine broken, or just the cable between us?).

    We concluded that all HA got us was notification of when to fail over (and disk mirroring, I suppose). We still had to write the code to do the failovers. Soon after, we decided that it wasn't worth the expense and wrote it ourselves.

  • What about personal recommendations for free Linux/FreeBSD High Availability?
  • by Shaman ( 1148 ) <shaman@@@kos...net> on Tuesday July 18, 2000 @09:51AM (#924455) Homepage
    Solaris 8 comes with failover, RAID and other HA utilities free of charge. They're on the CD. Even the $75 "free" version of Solaris.

    You can also get a free "Admin Pack" from Sun.

    Look into it, Sun's got what it takes in that area.
  • I believe in using free tools when they are better (cost, performance, shape, color, smell, etc.) than commercial tools.

    In terms of high availability, the free tools don't come close to the commercial tools. I've used Veritas on Sun for many years and for the last two years, I've been using IBM's HACMP. I have evaluated three Linux tools for HA and only one worked and it didn't even work well.

    If you've spent the money to buy Sparc and Oracle, you probably aren't working out of your basement. Don't close the checkbook now. If you need HA, buy HA.

    Spending the money won't put your company out of business. Not having a high availability solution just might.

    InitZero

  • I have to agree. Use the best tool for the job. If you're using Solaris and Oracle, use tools that are designed to work with them. For instance, (IIRC) Oracle Parallel Server (OPS) can run on all the nodes in the cluster at once. This gives you more bang for the buck while all the nodes are operating, and fault tolerance when one fails.

    If your company depends on having that database available, your management needs to ask themselves whether they can afford to be effectively closed while your HA software decides that your master node really is down and starts to recover on the other node.

    If you try to roll your own solution based on Free/Open Source tools, be ready to spend a lot of time thinking about what might go wrong. (e.g. What happens when a failed node recovers? Do you leave the currently running node as the master, or do you migrate back? How do you migrate the required filesystems? How do you make sure the recovering node doesn't try to run fsck on a filesystem the other node is actively using?)

    Just ask yourself, or your management, if you really want to be alpha testing some homegrown HA software on your production servers.
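The recovery questions raised in the comment above can be written down as an explicit policy. What follows is only a minimal sketch under stated assumptions: the function name, the `auto_failback` flag, and the action names are all hypothetical and do not come from any HA product. It encodes one conservative answer: a returning node never checks or mounts a filesystem its peer currently owns, and it does not take services back unless told to.

```python
def plan_rejoin(peer_owns_volume, auto_failback=False):
    """Return the ordered actions for a node that has just come back up.

    peer_owns_volume: True if the other node currently owns the shared volume.
    auto_failback: hypothetical policy switch; False leaves the peer as master.
    The action names are illustrative labels, not real commands.
    """
    if peer_owns_volume:
        # Never fsck or mount a filesystem the other node is actively using.
        actions = ["skip_fsck_of_shared_volume", "join_as_standby"]
        if auto_failback:
            # Migrating back is a second, planned outage, so it is opt-in.
            actions.append("request_planned_failback")
        return actions
    # Nobody owns the volume, so this node may check it and take over.
    return ["fsck_shared_volume", "import_volume", "start_services"]
```

Writing the policy as data makes it easy to review each case before it runs on a production server.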

  • The most important piece is being able to monitor what's happening at the OS level to avoid trouble to start with. I have had good results on my primarily Solaris network (with some AIX and Linux boxes also) using EnlightenDSM. Regarding failover and availability, some cool stuff is the ability to pool machines and set up rules that will actually move services to another machine if a machine is unavailable.

    http://www.enlightendsm.com/ is the URL, I think.

    Also look at the Sun Management Console and its monitoring capabilities.

  • a basic active-passive failover cluster typically consists of a RAID (makes your disks HA) and at least two server nodes. the RAID is either dual-ported SCSI or fibre-attached, such that both nodes (or more in the case of fibre) can 'see' the LUNs on the raid at the same time.

    the key is having a volume manager that allows 'importing' or 'deporting' of the volume sitting on the shared LUN. Both systems, although they physically can, should never try and write to the shared LUN at the same time. you need to be able to control this, so you can logically move ownership of the volume back and forth between systems, depending on the state of the cluster.

    veritas volume manager does this - if a volume is deported, the other system can mount it, etc. i suspect DiskSuite (which comes with Solaris) would do the same thing, but not sure.

    if you can wrapper up something to move the ownership of the volume around, then the rest is trivial (ifconfig-ing up and down the network interfaces, i.e. moving the primary system's node name between boxes; and starting/stopping Oracle and the sql*net listeners)

    as long as both systems will normally obey the rules of the volume manager, you could roll up a basic failover system from scratch pretty easily (if node B notices the database on node A is down, it takes ownership of the volume, ups the server's IP address on its net interface, and starts oracle).

    another nice thing to have, which isn't there by default on most Suns, is the ability to 'kill' the node that is 'thought' to be down. i.e. if it appears that node A is dead, and it really isn't (maybe just the network card is hosed) - you need to be able to ensure that node B can take over without node A holding on to stuff (like the IP address) - SGI provides a serial controller to do this on most of their systems, but Sun doesn't.

    anyway, at the very least you need that - the rest you could probably pick up from the usual linux open-source HA packages (heartbeat stuff, process monitoring, notification, etc.)

    the only reason HA stuff isn't more popular on linux is due to lack of a world-class volume manager - if such a thing existed, there'd be a lot more HA stuff for linux floating around i bet.

    mt
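The takeover sequence described in the comment above can be sketched as an ordered list of steps that stops at the first failure. This is a rough illustration, not a tested cluster script: the four callables are hypothetical placeholders for whatever the site uses to fence the peer, import the shared volume, bring up the service address, and start Oracle. Fencing is put first on the assumption that some way of stopping node A exists; the comment notes that most Suns do not ship with one.

```python
def take_over(fence_peer, import_volume, bring_up_ip, start_database):
    """Run the takeover steps in order and stop at the first failure.

    Each argument is a zero-argument callable that returns True on success.
    All four are hypothetical hooks supplied by the caller.
    Returns the names of the steps that completed.
    """
    steps = [
        ("fence_peer", fence_peer),          # make sure node A has let go
        ("import_volume", import_volume),    # take ownership of the shared volume
        ("bring_up_ip", bring_up_ip),        # move the service address to this node
        ("start_database", start_database),  # start Oracle and its listener
    ]
    done = []
    for name, step in steps:
        if not step():
            # Do not continue: a later step without an earlier one is unsafe.
            break
        done.append(name)
    return done
```

Stopping on the first failure matters most for the volume: starting the database without owning the disk is exactly the double-writer case the volume manager is there to prevent.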
  • Buy a copy of "Blueprints for High Availability" by Evan Marcus & Hal Stern. Then buy failover software - that's right, buy some. With a support contract, so you can rely on it or have someone to yell at if it doesn't work. Homegrown stuff will let you down, unless you have specific expertise in this area; it's easy to script simple failovers using the volume manager techniques outlined here, but it's also easy to forget a lot of the important details. (To the guy who worried about distinguishing between a dead host and a dead cable: you need redundant monitoring links!)

    Sun will insist that their consultants install their cluster software for you (otherwise they won't support it), which is where the main expense lies. OPS requires resilient, simultaneously accessible shared storage on every node and a certain amount of Oracle genius.

    There are plenty of other failover solutions on the margins so look around. (Disclaimer: my company sells one, but there are many others. I do not speak for my employer. Disclaimer 2: you need more than just failover software for HA, see book mentioned above.)

    If you really want free/open source, I believe some of the Linux solutions are portable.

    Ade_
    /
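The point about redundant monitoring links can be illustrated with a small classifier. This is a sketch under assumptions, not how any particular failover product works: the link names, the timeout, and the three states are invented for the example. The idea is that if heartbeats stop on one link but still arrive on another, the fault is more likely a cable or interface than the peer itself.

```python
from enum import Enum


class PeerState(Enum):
    ALIVE = "alive"                # heartbeats are fresh on every link
    LINK_FAULT = "link_fault"      # fresh on some links only: suspect cable or NIC
    SUSPECT_DEAD = "suspect_dead"  # silent on every link


def classify_peer(last_seen, now, timeout):
    """Classify the peer from per-link heartbeat timestamps.

    last_seen maps a (hypothetical) link name to the time in seconds at
    which the last heartbeat arrived on that link.
    """
    if not last_seen:
        raise ValueError("at least one monitoring link is required")
    fresh = [link for link, seen in last_seen.items() if now - seen <= timeout]
    if len(fresh) == len(last_seen):
        return PeerState.ALIVE
    if fresh:
        return PeerState.LINK_FAULT
    return PeerState.SUSPECT_DEAD
```

With a single link the function can only ever return ALIVE or SUSPECT_DEAD, which is the dead-host-or-dead-cable ambiguity described earlier in the thread.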
  • For many applications, writing your own makes good sense. But even then you can get a flying start by using components made by others with high availability and reliability in mind. By building your own using fine-grained components you keep control over what you're building while also having some comfort in knowing that others have successfully done similar things using the same tools. And been happy with the results.

    One example is the Erlang programming language and libraries [erlang.org], which were developed specifically for writing high availability telephone systems. This is open source, available as tar.gz for Solaris or in Debian, Red Hat, BSD...

    People have gone on to use these tools (both on Solaris and other OSes) to build high availability web systems (e.g. lodbroker [lodbroker.com]) and robust email systems (e.g. bluetail [bluetail.com]).
  • First, Solaris 8 has some HA stuff in its
    distribution.

    Second, HA Technical Solutions
    (http://www.tech-sol.com) has a "poor man's
    Veritas" that runs only $1500.

    Third, one can certainly take a public-domain
    service-level monitor, such as BigBrother,
    and look at it as a set of inputs to a state
    machine that would load different DNS files
    and restart bind/named whatever, thereby
    redirecting traffic to servers that
    are known to be up.
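The monitor-driven DNS idea in the comment above can be sketched as a pure decision function. Everything here is an assumption made for illustration: the server names, the zone file names, and the shape of the status input are hypothetical, and the sketch does not use any real BigBrother or BIND interface. The caller would copy the chosen file into place and restart named only when a reload is needed.

```python
def choose_zone_file(status, zone_files, current):
    """Pick the zone file for the first healthy server in preference order.

    status maps a server name to True if the monitor reports it up.
    zone_files is an ordered list of (server, zone_file) pairs.
    current is the zone file that is loaded now.
    Returns (zone_file, reload_needed).
    """
    for server, zone_file in zone_files:
        if status.get(server, False):
            return zone_file, zone_file != current
    # Nothing is reported up: keep what is loaded rather than flap.
    return current, False
```

A usage example with two hypothetical servers:

```python
zones = [("db1", "db.primary"), ("db2", "db.standby")]
print(choose_zone_file({"db1": False, "db2": True}, zones, "db.primary"))
```

Clients and resolvers cache DNS answers for the record's TTL, so a scheme like this redirects traffic only as fast as those caches expire.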
