Linux Software

Use Of Shared Storage In High Availability Arrays?

urbanjunkie asks: "I want to ensure my web site/database farm/whatever is as available as possible, so I checked out many HA (High Availability) packages for Linux. They all seem to want me to use shared storage. I don't want to use shared storage, since it just moves the point of failure to the disk array. I know the disk array can be RAIDed and so on, but what about a fire, power loss, or any of the other things that can go wrong? I'd prefer something that replicates changes made to one disk to another disk in a separate PC that may well be 100 metres away. Is there anything open source that can do this?"
  • I know that AFS has disk replication abilities. I'm by no means an expert on it, however.
  • You want replication, or high availability, or both. High availability generally means you're running in an environment where you're trying to eliminate any single point of failure, but also that in the event of a failure you want to come back up as seamlessly as possible, even though a very large volume of high-throughput transactions has been occurring; enough that replication to a distant location isn't very feasible (at least not as a primary solution). Filesystem or database replication is quite feasible if, after looking at the throughput on your database, you find you can transfer the data fast enough that you don't fall behind. (In many cases this is accomplished by segmenting the data so that no two sites can update the same data.) Filesystem replication is similar: you can't expect to update a file locally and remotely without dealing with the consequences of simultaneously updating the two copies differently in two different locations. If you can figure out how to partition everything so this isn't a significant issue, then it's easy; see the rough sketch below.
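    A rough sketch of that partitioning idea (Python, untested; the site names and the store/forward/replicate hooks are made up for illustration):

        # Each record key is owned by exactly one site; only the owner writes it,
        # so no two sites ever update the same data concurrently.
        import zlib

        SITES = ["site-a", "site-b"]          # hypothetical site names
        LOCAL_SITE = "site-a"

        def owner_of(key):
            """Deterministically map a key to the one site allowed to write it."""
            return SITES[zlib.crc32(key.encode()) % len(SITES)]

        def apply_update(key, value, store, forward, replicate):
            """store(k, v) writes locally; forward(site, k, v) hands the update to
            the owning site; replicate(site, k, v) pushes our write to the peers
            asynchronously. All three are whatever transport you already have."""
            owner = owner_of(key)
            if owner == LOCAL_SITE:
                store(key, value)
                for site in SITES:
                    if site != LOCAL_SITE:
                        replicate(site, key, value)   # async; may lag safely
            else:
                forward(owner, key, value)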
  • by Zurk ( 37028 )
    RAID-5 the freaking disk space with RAID-5 hardware (ICP Vortex has good controllers) and then RAID-1 mirror the RAID-5 array (you need two) over Ethernet. Not exactly easy to set up, but doable; a rough sketch of the idea is below. BTW, if you're not limited to Linux-only solutions, Sun does exactly this with some of the SAN solutions available for Solaris/SPARC, especially on their E4500 line over FC-AL. I recommend Sun if you can afford it (with anything less than a $1-2 million budget, don't even bother; you need Netra, Veritas, a couple of E4500s, fibre switches, redundant locations, etc.). BTW, Morgan Stanley just finished doing this in NY (I don't work for them, but I saw it done); they have two locations mirrored over fibre using E4500s (10 CPUs, 20 GB RAM, FC-AL, and Netra boxes).
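    Stripped to its bare bones, the "mirror over Ethernet" half is just "every write goes to the local array and to a peer on the network." A toy Python sketch of that idea (untested; the peer name, port, and paths are made up, and a real setup would do this in the block layer rather than in user space):

        import socket
        import struct

        MIRROR_PEER = ("mirror-host", 7070)   # hypothetical peer running mirror_server()

        def mirror_write(local_path, offset, data):
            """Write a block locally, then ship the same block to the mirror."""
            with open(local_path, "r+b") as f:
                f.seek(offset)
                f.write(data)
            with socket.create_connection(MIRROR_PEER) as s:
                # wire format: 8-byte offset, 4-byte length, then the data itself
                s.sendall(struct.pack("!QI", offset, len(data)) + data)

        def mirror_server(local_path, port=7070):
            """Apply blocks received from the primary to the local copy."""
            srv = socket.socket()
            srv.bind(("", port))
            srv.listen(1)
            while True:
                conn, _ = srv.accept()
                with conn:
                    offset, length = struct.unpack("!QI", conn.recv(12))
                    data = b""
                    while len(data) < length:
                        data += conn.recv(length - len(data))
                    with open(local_path, "r+b") as f:
                        f.seek(offset)
                        f.write(data)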
  • If you're worried about any single point of failure, run a Sun T3 storage array. The only single point of failure there is the drive controller, which rarely fails. And since we're into eliminating ALL failure points, buy two: you can link them together so that even if one controller fails, the controller on the other machine kicks in. Zero points of failure!

    For the sake of specs, the thing takes 9 SCSI disks, uses FC-AL for the link, and works with Linux and Solaris (among other systems, I believe). Placing two together DOES NOT make a RAID 5+1 array; the whole thing is straight RAID 5. The systems also have 256 MB of RAM to remove the RAID 5 write penalty. Should a catastrophic failure occur (all power to the box is cut), internal UPSes will dump the contents of the RAM to the disks and power down correctly.

    The investment is definitely worth it, and makes things easier than other systems. As for fire damage, get a Halon system and BACKUP YOUR DATA!

    If you're really paranoid, you can separate the boxes a little...

    My karma's bigger than yours!

  • Not really much of a question to go on. For example, it makes a big difference whether you're running a completely DB-backed web site versus SMB sharing or something. For web sites, you can set up DB replication now with PostgreSQL and MySQL. But the shared single-disk-system approach really is good, especially with GFS 4.0 now out of beta.
  • It certainly isn't open source, but Tivoli's SANergy product, running with FC-AL or fibre-fabric-attached storage, is perfect for this.
  • I'm sure this doesn't relate to your project, but it is somewhat in the same vein: IBM OS/390 Unix services [ibm.com]
  • I don't think that you really get it. It _still_ has a _single_ point of failure, and it's a pretty large one: the whole box. Urbanjunkie is asking for a solution that will allow service to continue even if a whole box (any box) is destroyed.

    Think earthquake. Then design a network layout and set of systems that would withstand the total destruction of any system in the network. That's what urbanjunkie is asking for.

    Database replication can be a nasty business. It all comes down to the failure modes, and things like distributed transactions have some pretty nasty ones. No, I don't have a solution. :) If your updates are infrequent, you can try having one authoritative system: perform all your updates on it, then periodically shut it down and copy the files out to the remote servers (a rough sketch of that idea follows below).

    Jason Pollock
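
    Something like this cron-style loop is all it takes (Python sketch, untested; the start/stop commands, hosts, and paths are placeholders for whatever your own setup uses):

        import subprocess
        import time

        MIRRORS = ["web2.example.com", "web3.example.com"]   # hypothetical standby hosts
        DATA_DIR = "/var/lib/mydb/"                          # hypothetical data directory

        def push_snapshot():
            # Quiesce the authoritative system so the on-disk files are consistent.
            subprocess.run(["/etc/init.d/mydb", "stop"], check=True)       # placeholder
            try:
                for host in MIRRORS:
                    # -a preserves permissions/times; --delete drops files removed on the master
                    subprocess.run(["rsync", "-a", "--delete",
                                    DATA_DIR, host + ":" + DATA_DIR], check=True)
            finally:
                subprocess.run(["/etc/init.d/mydb", "start"], check=True)  # placeholder

        while True:
            push_snapshot()
            time.sleep(30 * 60)   # every 30 minutes; only sane if updates really are infrequent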
  • by weave ( 48069 ) on Sunday January 28, 2001 @05:28AM (#477210) Journal
    EMC has storage that can be mirrored over fibre, or even copper at speeds as low as 100 Mbps, allowing a mirrored HA array to be stashed off-site. It tracks disk writes and duplicates them at the other site, and it can even batch up the sync writes to happen during off-peak hours if desired (the sketch at the end of this comment shows the general idea).

    I guess it's how much availability you want. That last 0.001% drives costs through the roof. Many modern disk arrays have everything redundant and hot swappable, including not just disk modules, but power supplies, fans, and controllers.

    Set up a nice HA disk array and cluster the servers and you too [microsoft.com] can run all of your critical services on one subnet!
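
    Conceptually, the track-the-writes-and-replay-them trick is simple. A toy Python sketch (untested, made-up paths) of journalling writes locally and applying the journal to the remote copy in one off-peak batch:

        import struct

        JOURNAL = "/var/spool/mirror/journal.bin"    # hypothetical journal file

        def log_write(offset, data):
            """Record a write so it can be replayed at the other site later."""
            with open(JOURNAL, "ab") as j:
                j.write(struct.pack("!QI", offset, len(data)))
                j.write(data)

        def replay(journal_path, target_path):
            """Apply every journalled write to the remote copy, in order."""
            with open(journal_path, "rb") as j, open(target_path, "r+b") as t:
                while True:
                    header = j.read(12)
                    if len(header) < 12:
                        break
                    offset, length = struct.unpack("!QI", header)
                    t.seek(offset)
                    t.write(j.read(length))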

  • I'd prefer something that replicates changes made to one disk to another disk in a separate PC that may well be 100 metres away. Is there anything open source that can do this?

    Yes, rsync. http://rsync.samba.org [samba.org]

    -Peter


    "There is no number '1.'"
  • What you want is multipath. You can use Fibre-Channel for the shared storage interconnect, and then keep the shared storage at different locations. Support for this under linux right now leaves a little to be desired though.

    There are of course other options, and <SHAMELESS PLUG> you can have the company I work for work out a solution for you. Check out our website at http://www.missioncriticallinux.com/ and then call our professional services department </SHAMELESS PLUG>

    Good luck.
  • I agree. The Web was a much better place when failure was always a distinct possibility. Failure creates jobs, failure keeps me in business, failure is very good for everybody.
  • I believe Coda and InterMezzo support replication and distribution, don't they? Have a look at the Global File System (GFS) too. (See Freshmeat for all of these.)

    The main problem with this stuff is that it may not be ready for production use yet.

    Ade
  • "RAID-5 the freaking disk space with RAID-5 hardware (ICP Vortex has good controllers) and then RAID-1 mirror the RAID-5 array (you need two) over Ethernet. Not exactly easy to set up, but doable." Instead of using Ethernet, you can also buy SCSI-over-fibre ("glass") converters (roughly $2000 each) for long-distance SCSI and do the whole thing in hardware.
  • by crow ( 16139 )
    If you want the ultimate in high-availability storage, go with EMC. You get the highest possible level of reliability within one box, and you can get remote mirroring to another machine room in another building if need be.

    Disclaimer: I work for EMC.
  • I'm going to mention an NT-based product and let the moderators do their thing if they need to.

    I was working at a place that needed an HA product for NT. Most of the products wanted shared storage, or used file-level replication that was either not 100% reliable or had way too much overhead. We stumbled across Vinca's [vinca.com] Co-Standby Server. It establishes a 2-node cluster with a 100 Mb Ethernet, fibre, or similar link between the nodes. I realize that several packages do this, but I think this one had a couple of things going for it. The data had to be on a separate physical disk or volume(s) and was copied at the block level. You needed two separate disks or volumes if you wanted to be able to fail over in either direction (system, data, and a mirror for the other system); if you only had two disks, you could fail over from one to the other in one direction only. You set up virtual IP addresses, shares, etc., and the other machine assumes that identity if its partner doesn't answer within the time you specify (a rough sketch of that heartbeat-and-takeover idea follows at the end of this comment). In the year I was there after they were set up, we only rebooted our servers once, for SP4.

    I'm not telling the poster that he should look into this non-open $3000-$4000 software on NT. What I am saying is that if a similar type of setup were done on a *nix system, it would be about as bullet-proof as you can get on a commodity x86 box. I'm talking about a product from three years ago for NT; surely someone can create a similar product for Linux/BSD/etc.

    To answer the whiners before they start: I would do this myself if I could, but I have been severely handicapped by starting with NT. I've got a lot of retraining to do before I could help out with something like this.
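
    For what it's worth, the heartbeat-and-takeover half of that isn't much code. A rough Python sketch (untested; the peer address and the takeover script that grabs the virtual IP and starts the services are made up):

        import socket
        import subprocess
        import time

        PEER = ("partner-node", 9999)   # hypothetical heartbeat address of the other box
        TIMEOUT = 2.0                   # seconds to wait for an answer
        MAX_MISSES = 5                  # consecutive misses before we take over

        def peer_alive():
            try:
                with socket.create_connection(PEER, timeout=TIMEOUT):
                    return True
            except OSError:
                return False

        misses = 0
        while True:
            if peer_alive():
                misses = 0
            else:
                misses += 1
                if misses >= MAX_MISSES:
                    # Assume the partner's identity: bring up its virtual IP,
                    # mount the mirrored volume, start its services.
                    subprocess.run(["/usr/local/sbin/takeover.sh"])   # placeholder script
                    break
            time.sleep(TIMEOUT)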

  • I know he asked for a Linux solution, but the truth is that sometimes Linux just isn't the best tool for the job.

    Currently, we're running a pair of Sun E4500s (4 CPUs, 6 GB RAM, 2 disk boards, 2 I/O boards) connected to a pair of Sun StorEdge A5200 disk arrays by FC-AL. Each box has two fibre connections to each array, for a maximum throughput to the box of 2 Gbit. The 4500s run Solaris 7 as the OS, Veritas Volume Manager for managing the disk storage, and Veritas Cluster Server for HA management of Oracle 8.

    Veritas won't let both machines mount the disks at the same time (which would be bad anyway), and it does a rather good job of managing things. Recently when we had a cpu die in the primary machine, the cluster failed over and had Oracle up (and running recovery) in 1 minute 3 seconds. Not bad, considering the other box rebooted itself and didn't shut Oracle down cleanly.

    -j
