No Hassle RAID 5 Implementations?
LambSpam asks: "I had a nightmare week (last week) with two of our servers running Intel's U3-1L RAID controller (RAID 5). Whenever there's a power outage in our building, these controllers randomly mark one or more of the drives in the array offline (even with adequate UPS support), which means I have to manually mark them online and/or rebuild. Intel acknowledged the problem, but their solution involves updating the backplane's firmware, the controller firmware (destructive upgrade!), and even the firmware on our IBM drives in the array because they 'draw too much power' in certain conditions. I've only used one other RAID 5 implementation (MegaRAID), and it NEVER had these kinds of problems, whereas if you sneeze too hard around this U3-1L card it will go offline. Is this common with most hardware RAID implementations? Which RAID 5 implementations work without hassle? What should I stay away from?"
PERC? (Score:2)
Re:PERC? (Score:3, Interesting)
Re:PERC? (Score:3, Interesting)
Just a note on EMC. Whenever I've had the joy of working with a Symmetrix, EMC has done a wonderful job of avoiding any downtime. They would come out at any hour of the day or night to replace a redundant card or a spare disk that wasn't even in use. They always evaluate any changes before making them. I'm sure it's possible for them to make a mistake, but for mass storage they're the ones I would choose.
Re:PERC? (Score:3, Interesting)
Just FYI, Sun doesn't actually make their high-end storage product. I think they call it the StorEdge 9900 or something, but it's actually a rebranded Hitachi Data Systems 9960.
Funny thing about HDS. When you buy one of their 9960 systems (a minimum investment of about $250,000), you get a guarantee. If you ever lose any data at all on that storage system due to a hardware or firmware fault, HDS will give you 30% of your purchase price back.
According to one of the senior HDS VPs that I spoke to last month, they've never had to pay out that penalty clause.
Re:PERC? (Score:2)
Re:PERC? (Score:1)
Re:PERC? (Score:1)
Re:PERC? (Score:2)
Xiotech (Score:1)
Re:ICP Vortex (Score:1)
IBM HDs (Score:1, Informative)
Re:IBM HDs (Score:1, Interesting)
FWIW, I've found the drivers for the PERC in FreeBSD to be far better than those in Linux.
Re:IBM HDs (Score:1)
Eurologic (Score:1)
Due to excessive server room heat, we did lose a drive, but the data was fine. While it has Windows software to monitor it when connected via SCSI, they didn't have anything for Unix, so configuration had to be done via telnet on its serial port.
Tried Adaptec? (Score:5, Informative)
Now, the 3200 RAID controllers in the Compaqs: that's a different story altogether.
We had roughly 2000 servers, operating 24/7 at 67 °F. Twice a year we had a site shutdown. Every single time we brought everything back up, anywhere from 3 to 5 Compaq array controllers would die. But the low-buck Adaptecs never once crapped out on us.
Re:Tried Adaptec? (Score:3, Informative)
Don't take it from me, ask around there. If they worked for you, however, great. Whatever works.
Firmware (Score:4, Informative)
Two possibilities... (Score:4, Interesting)
The other one is something I've heard of (I'm not an electrical expert, but I'll try to explain). Larger sites, particularly older installations, were wired for three-phase electricity. Over time, the phases were split out for normal 110-volt usage. If the PC is powered from one phase but the external unit is powered from a different phase, the differential between the two can cause problems because of the ground connection between them through the cable shielding. I know, it sounds like something from the BOFH daily calendar, but it does make sense. Try making sure both pieces of equipment are on the same true UPS, or at least on switched UPSes on the same circuit.
Good advice above. (Score:2)
Sounds like good advice in the post above.
Some UPSs switch. Some are always online. You want the latter for a RAID array.
The second paragraph is important. Check your input power. Everything attached to your network should be wired to the same power circuit. Otherwise, large spurious signals can be fed into your hardware through the power line.
Re:Good advice above. (Score:3, Insightful)
As others have stated, make sure you have a true "online" UPS, but ALSO make sure you don't run the UPS above 50% of its rated capacity, because of the non-linear nature of switching power supplies.
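To make the 50% rule of thumb concrete, here is a minimal Python sizing check. The 50% ceiling is the commenter's; the device loads and the 3000 VA rating are made-up example figures, and the function names are hypothetical.

```python
# Sketch of the "keep UPS load under 50%" rule of thumb from the comment above.
# The 50% ceiling is the commenter's; the VA figures below are made-up examples.

def ups_utilization(loads_va: list[float], ups_rating_va: float) -> float:
    """Fraction of the UPS's VA rating consumed by the attached loads."""
    if ups_rating_va <= 0:
        raise ValueError("UPS rating must be positive")
    return sum(loads_va) / ups_rating_va


def within_headroom(loads_va: list[float], ups_rating_va: float,
                    ceiling: float = 0.5) -> bool:
    """True if the total load stays at or below the chosen utilization ceiling."""
    return ups_utilization(loads_va, ups_rating_va) <= ceiling


# Hypothetical example: two servers plus an external RAID enclosure on a 3000 VA UPS.
loads = [600.0, 600.0, 250.0]
print(f"{ups_utilization(loads, 3000.0):.0%}")  # 48%
print(within_headroom(loads, 3000.0))           # True
```

Adding a second RAID enclosure of similar size would push this example past the ceiling, which is the point at which the commenter would want a bigger UPS.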
Of course, the BEST power-stability solution is to use all 48 VDC equipment like the telcos do. When was the last time your phone went down due to a telco hardware failure? Note that most major hardware vendors (Sun, Cisco, etc.) offer 48 VDC versions of their equipment.
Clarification (Score:3, Informative)
Everything needs to be on the same Ground circuit. It is necessary to avoid ground loops.
"They draw LARGE spikes of current sporadically."
I don't think this is correct. I have designed power supplies, and I can't immediately think of any reason why the input power of a switching power supply should vary differently from its output power. The only surge is when the hard disks spin up, and SCSI provides a means to stagger the spin-up.
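A rough way to see why staggering the spin-up matters is to compare the worst-case current with all drives starting at once against drives starting in batches. This Python sketch assumes each batch finishes spinning up before the next one starts; the per-drive spin-up and idle currents are hypothetical, and how a given drive or controller actually staggers start-up varies.

```python
# Sketch: peak drive current with and without staggered spin-up.
# Spin-up and idle currents are hypothetical; real values come from the drive's spec sheet.

def peak_spinup_current(num_drives: int, spinup_amps: float,
                        idle_amps: float, group_size: int) -> float:
    """Worst-case current while drives start in batches of `group_size`.

    Assumes each batch finishes spinning up (dropping to idle current)
    before the next batch starts.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    peak = 0.0
    started = 0
    while started < num_drives:
        batch = min(group_size, num_drives - started)
        # Drives already running sit at idle; the current batch draws spin-up current.
        draw = started * idle_amps + batch * spinup_amps
        peak = max(peak, draw)
        started += batch
    return peak


# Hypothetical 8-drive array: 2.0 A per drive at spin-up, 0.5 A once spinning.
print(peak_spinup_current(8, 2.0, 0.5, group_size=8))  # 16.0 (all at once)
print(peak_spinup_current(8, 2.0, 0.5, group_size=1))  # 5.5 (one at a time)
```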
Re:Two possibilities... (Score:3, Informative)
The term you're looking for here is "online UPS". There are two basic varieties of UPS, switched and online. Both share the following common features: the AC (mains) power coming into the UPS is rectified (converted to DC, usually in the range of 24 to 48 VDC). The DC is used to charge the batteries, which are the source of backup power when the mains fails. AC backup power is supplied to your equipment by an inverter (DC-to-AC converter) in the UPS, which takes the battery's DC juice and "builds" a 50 or 60 Hz AC sine or pseudo-sine wave at the right voltage.
Switched UPS: when the AC mains is OK, your equipment is powered directly from it. When the mains fails, the UPS literally switches over to backup power from the inverter. This switchover takes a measurable amount of time and relies on your equipment's electronics to ride through the loss of power until the transfer to inverter power is complete. Advantage? Switched UPSes are generally less expensive.
Online UPS: regardless of whether the mains power is OK or not, the UPS's inverter is already on and already supplying your equipment. When the AC mains does fail (momentary loss, glitch, blackout, or brownout), it takes zero time to switch to UPS power, because your equipment was already on UPS power! Advantages? (1) Zero switching time, and (2) the online UPS feeds a constant, glitch-free sine wave to your equipment at the right frequency and the right RMS voltage all the time.
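The ride-through condition described above reduces to a single comparison: the power supply survives the switchover only if its hold-up time is at least the UPS's transfer time. A minimal Python sketch follows; the 8 ms transfer time and the 16 ms and 5 ms hold-up times are hypothetical figures, not values from any particular UPS or power supply.

```python
# Sketch of the ride-through condition described above. Transfer and hold-up
# times are hypothetical; check the UPS and power-supply data sheets for real ones.

ONLINE_TRANSFER_MS = 0.0  # online UPS: the load is always on the inverter, so no gap


def rides_through(transfer_ms: float, holdup_ms: float) -> bool:
    """True if the power supply can hold its outputs up for at least as long
    as the UPS leaves its input unpowered during the switchover."""
    return holdup_ms >= transfer_ms


# A switched UPS with an 8 ms transfer, against a healthy PSU (16 ms hold-up)
# and a marginal one (5 ms), compared with an online UPS.
for ups, transfer in (("switched", 8.0), ("online", ONLINE_TRANSFER_MS)):
    for holdup in (16.0, 5.0):
        print(ups, holdup, rides_through(transfer, holdup))
```

With these assumed numbers, only the marginal supply behind the switched UPS fails the check, which is the case the online design is meant to eliminate.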
Don't use host based RAID (Score:2)
ICP Vortex + Cheetah (Score:1)
Alternatively, you could try Sun's A3500 FC-AL drive arrays with the 15K Cheetahs for non-PC hardware.
Compaq is good. (Score:3, Interesting)
I suggest you also fix your power problem. The systems should have no idea that power was lost to the building. If you are using a UPS and this is still happening, I'd find a better one.
3ware (Score:1)
Randall.
IBM ServeRaid (Score:2, Informative)
I was expecting a hassle, but it was mind-blowing to see how easy it was. The cross-platform remote management utility is a plus too.
Sun A1000 (Score:1)
http://store.sun.com/catalog/doc/BrowsePage.jht
They range in size from 75 GB to 436 GB. I work for an EDU, so we get almost a 50% discount on them, but they are worth every penny.
Re:Sun A1000 (Score:2)
The A1000s stink. The firmware is awful; the RM6 management software is worse!
Be careful upgrading your firmware (which you need to do from time to time): the controller _can_ deadlock. And of course, if it does, you lose all your data, since the only copy of the LUN configuration is in the controller.
Seriously. They're crap, built on the same crap as the A3000/3500 series. It's all old, rebranded Symbios stuff. Yuck-o.
You'd be better off getting an A5200 tray (or D1000 tray) and using the RAID-5 functions of Veritas Volume Manager instead. That actually has a shot at working.
--NBVB
Re:Sun A1000 (Score:1)
I have about 10 A1000s and 30 D1000s in production, and I'll take the simplicity of the D1000 JBOD configs over Raid Mangler.
Re:Sun A1000 (Score:1)
I hope you're kidding.
Software RAID 5 on arrays with no cache? Heavens no, it sucks. Read performance is pretty bad considering the number of drives involved in the stripes, and write performance is worse than dreadful even on high-end machines. Write performance gets *even* worse the more drives you add, unless you go across arrays, and even then it just sucks. It's better on Veritas than on DiskSuite, but not by much. Mirror instead; don't use RAID 5 on anything other than an A3x00, A1000, or T3. It's especially good on the T3, where the XORs are done on the controller and it's almost as fast as striping.
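The XOR parity mentioned above, and the reason small writes are expensive for software RAID 5 without a cache, can be sketched in Python. This is a generic illustration of RAID 5 parity arithmetic under the assumption of equal-length blocks; it is not how Veritas, DiskSuite, or the T3 controller actually implement it, and the function names and block contents are made up.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Full-stripe parity: P = D0 ^ D1 ^ ... ^ D(n-1)."""
    return xor_blocks(*data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """A single lost data block is the XOR of the parity and every surviving block."""
    return xor_blocks(parity, *surviving)


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write update for one block: P' = P ^ D_old ^ D_new.

    Needs two reads (old data, old parity) and two writes (new data, new parity),
    which is where the small-write penalty comes from.
    """
    return xor_blocks(old_parity, old_data, new_data)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = stripe_parity(stripe)
assert rebuild_missing([stripe[0], stripe[2]], parity) == stripe[1]

new_block = b"\xaa\xbb"
assert small_write_parity(stripe[1], new_block, parity) == stripe_parity(
    [stripe[0], new_block, stripe[2]])
```

In software RAID the host CPU does every one of these XORs and issues the extra reads itself; on hardware such as the T3 described above, the controller does that work instead.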
I agree, though: RM6 is pretty bad, but if managed properly it's deployable. I know of one Sun customer who threw out terabytes of A5x00 storage after the GBIC debacle (as in, deposited it on the pavement outside Sun's City of London office), only to replace it with A1000s, and lots of them.
Re:Sun A1000 (Score:1)
Mylex for linux raid (Score:1)
we had this exact problem! (Score:1)
Intel's site has a technical advisory dated Jan 29th, 2002, regarding drives being "marked offline".
http://support.intel.com/support/motherboards/s
AMI MegaRAID, Mylex eXtremeRAID, ICP Vortex (Score:1)
ICP Vortex has a great reputation, though I don't have any experience with them.
XML is the best place to start (Score:2)
In this situation, I use XML. I invent my own markup language that is self-consistent and describes the API of a system. I then use an XSLT processor, Apache Xalan [apache.org] to be precise, to transform the source to various other formats including: a web site, one big printable web page, PDF, and I've been thinking about writing a stylesheet for man pages as well.
The only issue with a system like this is version control of your source files, which is highly situation specific.
Re:XML is the best place to start (Score:1)
Raidtec.. (Score:1)
I think all of our Raidtecs are kitted out with Seagate drives. Anyway, check out http://www.raidtec.com for more information on what they sell.
One excellent solution.... (Score:1)
I've probably set up over 100 servers over the last 10 years or so, and I wouldn't use anything but Compaq array controllers. I've never lost data because of a drive subsystem problem. I've got over 20 servers that I'm responsible for now, and all of them use Compaq array controllers. They are reliable, easy to configure, well supported, and easy to maintain. The tools under NetWare and Windows work well, and most of the controllers are supported under Linux. They aren't cheap, but they are simply great.
For details look here. [compaq.com]
I have worked for one large regional financial institution and one large entertainment conglomerate, and one thing they have in common is that both use Compaq hardware. There's a good reason: it works.
FWIW, I do not now, nor have I ever worked for Compaq, nor do I have any direct investment in Compaq.