Experiences w/ Software RAID 5 Under Linux?

MagnusDredd asks: "I am trying to build a large home drive array on the cheap. I have 8 Maxtor 250G Hard Drives that I got at Fry's Electronics for $120 apiece. I have an old 500Mhz machine that I can re-purpose to sit in the corner and serve files. I plan on running Slackware on the machine, there will be no X11, or much other than SMB, NFS, etc. I have worked with hardware arrays, but have no experience with software RAIDs. Since I am about to trust a bunch of files to this array (not only mine but I'm storing files for friends as well), I am concerned with reliability. How stable is the current RAID 5 support in Linux? How hard is it to rebuild an array? How well does the hot spare work? Will it rebuild using the spare automatically if it detects a drive has failed?"
  • by Anonymous Coward on Saturday October 30, 2004 @06:44PM (#10675185)
    Writing large files to an eight-drive RAID-5 array will be butt slow unless you have a LOT of RAM.

    The idea is that in order to write data to any sector on one of the drives, the sectors from six of the other drives need to be read, all XOR'd together, and then the result written to the remaining drive.

    In theory, this could be done simultaneously--read from all drives at once. In practice, Software RAID and ATA aren't so good at that kind of thing. (Good hardware RAID is a different story.)

    So the idea is that those six reads will take a reasonable amount of time, every time there is a write. If you have a lot of RAM, and/or don't write really large files, it won't be a problem because all the data can be cached in RAM and the reading/writing involving the disks can be done later, at the OS's leisure. However, if you don't have a lot of RAM, or copy really big files, you'll have performance issues.

    You may not notice this for a little while, until your array starts filling up, because some implementations (not sure about the Linux software one) optimize it so that they assume unused sectors are filled with a known value, so they don't actually read from drives where the sectors haven't been written to yet (they keep a big table in memory). This is a GREAT optimization. But over time, it will get slower and slower.

    So my advice to you is to install a lot of RAM in this system, whatever the motherboard allows. At least one gigabyte, but preferably two or more.
  • Works great (Score:5, Informative)

    by AIX-Hood ( 682681 ) on Saturday October 30, 2004 @06:45PM (#10675195)
    Been doing this with 5 Maxtor Firewire 250gig drives for a good while, and regular IDE drives for years before that. It's always been very stable, and drives going bad have never caused problems as long as you replace them quickly. I moved to firewire though, because it was much easier to see which drive went bad out of the set, and you could hot swap them.
  • by chrispyman ( 710460 ) on Saturday October 30, 2004 @06:46PM (#10675201)
    Generally for situations where you really need to make sure the data stays safe, I'd just stick with hardware. If you can spend that much on some hard drives, I don't see why you can't spend the money on hardware RAID.

    Though from what I hear, software RAID on Linux works decently.
  • by suso ( 153703 ) on Saturday October 30, 2004 @06:52PM (#10675233) Journal
    I used to work at Kiva Networking [kiva.net] and we used hardware raid 5 on some machines and software raid 1 and raid 5 on others. Maybe it was just me, but the software raid 5 disks always seemed to last longer. Never had many problems with it. In fact, we had more problems getting the hardware raid controller to work with Linux, or with bugginess, than anything.
  • by pythas ( 75383 ) on Saturday October 30, 2004 @06:57PM (#10675269)
    I have a smaller array, but it's been largely trouble free.

    However, when a drive did drop offline, it knocked out the entire IDE channel as well, unless each drive is on its own channel. It ended up taking the better part of a day to get everything back online again, without any data loss.

    It even seems like any time there's an IDE hiccup, you can knock your array offline.

    It's definitely cheaper than hardware RAID, and I haven't noticed any performance problems, but sometimes the stability of good old SCSI raid is something I miss. :(
  • Re:hmmm (Score:0, Informative)

    by Anonymous Coward on Saturday October 30, 2004 @06:57PM (#10675273)
    There are a lot of things that can take up gigabytes of space. For instance:

    mp3s
    movies
    MAME Roms
  • Works Great! (Score:2, Informative)

    by ogre7299 ( 229737 ) <jjtobinNO@SPAMumich.edu> on Saturday October 30, 2004 @06:57PM (#10675281)
    I've been using RAID 5 with three 18 GB SCSI drives for about 6 months now; it works very fast and reliably.

    The best advice I can give is to make sure each drive has its own channel if you are on standard ATA, you didn't specify SATA or regular ATA. If you're using SATA then all the drives get their own channel by design. If you have more than one IDE device on a channel in a RAID, performance will suffer because IDE can't write to both devices on the channel simultaneously.
  • by kgasso ( 60204 ) <kgasso@bl o r t.org> on Saturday October 30, 2004 @06:58PM (#10675284) Homepage

    > Generally for situations where you really need to make sure
    > the data stays safe, I'd just stick with hardware. If you can
    > spend that much on some harddrives, I don't see why you can't
    > spend the money on hardware.


    Truer words were never spoken. I don't know the status of the more recent software RAID implementation in Linux, but I do know that bugs in the old one sent 2 arrays in 2 different mission critical servers of ours down in a hailstorm of fire and brimstone.

    We had one drive get booted from the array for having corrupted data, so the load on the other drives shot up a bit. We think that the increased load made the software RAID driver start lagging in writes to the disks, causing more corruption on another drive, until we were down to a steaming pile of rubble.

    Happened 2 separate times on 2 different machines, as well. We're sticking to hardware from now on.
  • Vinum with FreeBSD (Score:3, Informative)

    by Anonymous Coward on Saturday October 30, 2004 @06:58PM (#10675290)
    While it's not Linux, I've been using Vinum with FreeBSD for about 3 years with RAID 5 and have never had any problems. My current box is an old VIA 600MHz C3 with FreeBSD 4.8 and a measly 128MB of RAM. As far as benchmarks go, my RAID seems to blow away all of the cheapy hardware cards performance-wise as well.

    BTW, I switched from Linux to FreeBSD for the server years ago for the stability.
  • by ErikTheRed ( 162431 ) on Saturday October 30, 2004 @07:01PM (#10675309) Homepage
    Software raid is fine for simple configurations, but if you want to "do it right" - especially considering that you just dropped about a kilobuck on HDDs - go hardware. A good, reasonably priced true hardware RAID controller that will fit the bill for you is the 3Ware Escalade 7506-8. It has 8 IDE ports, 1 for each drive - you don't want to run two RAID drives in master/slave mode off of a single IDE port; it will play hell with your I/O performance. It's true hardware raid, so you don't have to worry about big CPU overhead or about being able to boot with a failed drive (a major disadvantage of software RAID if your boot partition is on a RAID volume, certain RAID-1 configurations excepted). You can buy them for under $450. The provantage.com price [provantage.com] is $423.48 (I have no relationship with them other than I've noticed that their prices tend to be decent).
  • by mortonda ( 5175 ) on Saturday October 30, 2004 @07:02PM (#10675314)
    RAID 5 hardware tends to be rather expensive, and most RAID hardware tends to be "pseudo hardware", the drivers for the raid card make the CPU do the actual work anyway. Your 500Mhz CPU is faster than all but the most expensive RAID controllers anyway.

    Stick with Linux RAID. It knows how to do it better.
  • *DO* go with 3ware (Score:3, Informative)

    by Alan Cox ( 27532 ) on Saturday October 30, 2004 @07:04PM (#10675330) Homepage
    Except for the early 7000 series they are good cards and have decent performance too. I'm very very happy with the 3ware I have even though it's one of the quite early designs.
  • by Anonymous Coward on Saturday October 30, 2004 @07:05PM (#10675334)
    Another piece of advice would be, since you have eight identical drives, to use only seven drives in the RAID array, and keep the eighth one out of the array entirely, either outside the computer in an antistatic bag or as a "hot" spare--installed but idle.

    When one of the drives fails--and one of the drives will fail--this will allow you to swap in the replacement drive immediately, before another drive fails. (Remember, if two drives fail in a RAID-5 array, you lose data.) You can then return the defective drive, get a replacement from Maxtor, and when that one arrives FedEx in a few days, that one will be your new "spare."

    You can either keep your spare drive unused, outside the computer, or keep this spare "hot"--in the computer, connected and ready to go, but unused by the array or anything else, and have the array fall over to it automatically when a drive fails.

    Both ways offer advantages. If you keep the drive out of the computer, since you need to shut down to remove the bad drive, you can install the spare drive at that time. If you were to keep the drive "hot" in the meantime, your extra "new" drive has been spinning for months or years, and exposed needlessly to heat, which increases its probability of failure, making it essentially as likely to fail as all your other drives that have been running the whole time.

    However, keeping the spare "hot" means that the array can be rebuilt sooner, in some cases automatically before you know there is a problem. This can reduce the possibility of data loss. You will have to reboot twice--once to remove the defective drive to return to Maxtor, and once when the replacement arrives to install it as the new hot spare.

    Which of those two you choose is a judgement call, but it's absolutely critical to have a spare drive on hand.
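
    If you go the "hot" route, the rough shape of it with mdadm (the device names here are made up; substitute your own) is something like:

    mdadm --create /dev/md0 --level=5 --raid-devices=7 --spare-devices=1 \
        /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1 /dev/hdm1 /dev/hdo1

    The last device listed becomes the hot spare, and md will start rebuilding onto it by itself as soon as one of the active members is marked failed.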
  • by Futurepower(R) ( 558542 ) on Saturday October 30, 2004 @07:06PM (#10675345) Homepage

    This is a VERY big issue. We've found that Promise Technology RAID controllers have problems, and the company doesn't give tech. support when the problems are difficult, in our experience.

  • Re:Please! (Score:5, Informative)

    by Gherald ( 682277 ) on Saturday October 30, 2004 @07:06PM (#10675346) Journal
    > Do yourself a favour and buy some more or less cheap hardware RAID controllers. You won't regret it. Software RAID is nothing more than "showing it's possible".

    There is no such thing as a "cheap" hardware RAID 5 controller. Well there is, but they'll still set you back at least $120 and are crap.

    There are RAID controllers from highpoint and promise, et al that are card-based, but they are still CPU bound (that is where the XOR really takes place). So they're really nothing more than a controller with a driver that does the calculations in the CPU. These cards are good for booting windows to a software RAID (since that is essentially what they are) but not good for anything else.

    Most motherboards, especially those with only 2 RAID ports (whether IDE or SATA), are software-based as well. The nvidia nforce3 250 is one of the few notable exceptions.

    But the bottom line here is: Linux Software RAID 5 is a logical approach if simple redundant mass storage is your main concern, and will save you at least $120. Also note that for RAID 0/1 it doesn't really matter if you go hardware or software since they aren't very processor intensive anyway. Pure software RAID 0/1 seems to be easier to set up in Linux (less mucking around with drivers) so it often makes sense to go with it for that reason alone.
  • Hot spares (Score:1, Informative)

    by lathama ( 639499 ) <lathama&lathama,com> on Saturday October 30, 2004 @07:08PM (#10675351) Homepage Journal
    Declare at least one hot spare. I would declare two for your setup but YMMV.

    nr-spare-disks 1

    device /dev/hdh1
    spare-disk 0
  • Performance Tips (Score:5, Informative)

    by Alan Cox ( 27532 ) on Saturday October 30, 2004 @07:09PM (#10675353) Homepage
    There are a few things that really help in some cases, but RAM isn't always one of them.

    If you've got a lot of data that is read/re-read or written/re-read by clients, then RAM really helps; for streaming stuff which doesn't get many repeat accesses (e.g. running a movie editing suite) it might not help at all.

    For performance it's often worth sacrificing a bit of space and going RAID 1. Again it depends whether you need the space first or performance first.

    Obviously don't put two drives of a raid set on the same IDE controller as master/slave or it'll suck. Also if you can find a mainboard with multiple PCI busses that helps.

    Finally, be aware that if you put more than a couple of add-on IDE controllers on the same PCI bus it'll suck - that's one of the big problems with software raid 5 versus hardware, and it's less of a problem with raid 1 - you are doing a lot of repeated PCI bus copies and that hurts the speed of today's drives.

    I use raid1 everywhere; disks may be cheap but you have to treat them as unreliable nowadays.
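
    A quick way to see how your controllers are spread over the PCI busses (assuming pciutils is installed; this is only a rough check) is:

    lspci -tv

    If every add-on IDE card hangs off the same bus in that tree, they are all fighting for the same ~133MB/sec, and a rebuild will make that painfully obvious.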
  • by Anonymous Coward on Saturday October 30, 2004 @07:09PM (#10675359)
    > The idea is that in order to write data to any sector on one of the drives, the sectors from six of the other drives need to be read, all XOR'd together, and then the result written to the remaining drive.
    Um. No. Not if the RAID5 implementation is reasonably sane. Assuming all the drives are in good working order, all the software has to do is read the original block off the drive; the parity block off the appropriate drive; XOR the two values together, and XOR the new data; and you have the new parity block.

    IOW: Two reads, and two writes. Not six reads and two writes. But yes, a large amount of RAM is a good idea. Of course, if a drive goes south, everything goes out the window and your performance will be shot until you replace the dud drive and everything resyncs.

  • by Dop ( 123 ) on Saturday October 30, 2004 @07:12PM (#10675384)
    We've had two different 3ware hardware RAID cards without any problems in the last 3 years.

    I've done software RAID as well using Promise IDE controllers. Fortunately for us we never had a drive fail in the software RAID so I can't comment on how difficult it is to recover from a failure.

    Interestingly enough, we ran some fairly intense iozone tests on both the hardware and software RAIDs with very little difference in performance (maybe that's why the parent poster doesn't like the 3ware stuff). But... we also ran these same tests with a fibre-channel SAN disk, again with very little performance difference.

    Maybe it was a Bus limitation... I didn't have time to investigate it any further.
  • by brak ( 18623 ) on Saturday October 30, 2004 @07:13PM (#10675389)
    You will get responses from people with good and bad experiences, but they are all colored by their own particular cases. After seeing what can happen with dozens of machines (8 drive and 4 drive) running Linux software RAID5, here is some concrete advice.

    First, ensure that all of the drives are IDE masters. Don't double up slaves and masters.

    Secondly, DON'T create gigantic partitions on each of the 250s and then RAID them together; you will get bitten, and bitten hard.

    Here's the skinny...

    1) Ensure that your motherboard/IDE controllers will return SMART status information. Make sure you install smartmontools, configure them to run weekly self tests, and ensure you have smartd running so that you get alerted to potentially failing drives ahead of time.

    2) Partition your 250GB drives into 40 GB partitions. Then use RAID5 to pull together the partitions across the drives. If you want a giant volume, create a Linear RAID group of all of the RAID5 groups you created and create the filesystem on top of that.

    Here's why, this is the juice.

    To keep it simple, let's say there are 20 sectors per drive. When a drive gets an uncorrectable error on a sector, it will be kicked out of the array. By partitioning the drive into 5 or 6 partitions, let's say hd(a,c,e,g,i,k,l)1 are in one of the RAID5 groups, which contains sectors 1-4 (out of the fake 20 we made up earlier).

    If sector 2 goes bad on /dev/hda1, Linux software RAID5 will kick /dev/hda1 out of the array. Now, it's likely that sector 11 might be bad on /dev/hdc. If you hadn't divided up the partitions, you would lose a second disk out of the array during a rebuild.

    By partitioning the disks you localize the failures a little, thus creating a more likely recovery scenario.

    You wind up with a few RAID5 sets that are more resilient to multiple drive failures.

    If you are using a hot spare, your rebuild time will also be less, at least for the RAID5 set that failed.

    I hope this makes sense.
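
    In mdadm terms, the slice-and-stitch approach looks roughly like this (device names, slice count and sizes are made up, and only the first two slices are shown):

    # One RAID5 set per 40 GB slice, across all eight drives
    mdadm --create /dev/md0 --level=5 --raid-devices=8 /dev/hd[acegikmo]1
    mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/hd[acegikmo]2
    # ...repeat for the remaining slices, then glue them into one big volume:
    mdadm --create /dev/md6 --level=linear --raid-devices=6 \
        /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
    mke2fs -j /dev/md6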

    My advice to you is to bite the bullet and simply mirror the disks. That way, no matter how badly they fail you'll have some chance of getting some of the data off.
  • by kcbrown ( 7426 ) <slashdot@sysexperts.com> on Saturday October 30, 2004 @07:14PM (#10675392)
    > Generally for situations where you really need to make sure the data stays safe, I'd just stick with hardware. If you can spend that much on some hard drives, I don't see why you can't spend the money on hardware.

    I disagree with this. Here's why: the most important thing is your data. Hardware RAID works fine until the controller dies. Once that happens, you must replace it with the same type of controller, or your data is basically gone, because each manufacturer uses its own proprietary way of storing the RAID metadata.

    Software RAID doesn't have that problem. If a controller dies, you can buy a completely different one and it just won't matter: the data on your disk is at this point just blocks that are addressable with a new controller in the same way that they were before.

    Another advantage is that software RAID allows you to use any kind of disk as a RAID element. If you can put a partition on it, you can use it (as long as the partition meets the size constraints). So you can build a RAID set out of, e.g., a standard IDE drive and a serial ATA drive. The kernel doesn't care -- it's just a block device as far as it's concerned. The end result is that you can spread the risk of failure not just across drives but across controllers as well.

    That kind of flexibility simply doesn't exist in hardware RAID. In my opinion, it's worth a lot.

    That said, hardware RAID does have its advantages -- good implementations offload some of the computing burden from the CPU, and really good ones will deal with hotswapping disks automatically. But keep in mind that dynamic configuration of the hardware RAID device (operations such as telling it what to do with the disk you just swapped into it) is something that has to be supported by the operating system driver itself and a set of utilities designed to work specifically with that driver. Otherwise you have to take the entire system down in order to do such reconfiguration (most hardware RAID cards have a BIOS utility for such things).

    Oh, one other advantage in favor of software RAID: it allows you to take advantage of Moore's Law much more easily. Replace the motherboard/CPU in your system and suddenly your RAID can be faster. Whether it is or not depends on whether or not your previous rig was capable of saturating the disks. With hardware RAID, if the controller isn't capable of saturating the disks out of the box, then you'll never get the maximum performance possible out of the disks you connect to it, even if you have the fastest motherboard/CPU combination on the planet.

  • Re:Devil in the... (Score:3, Informative)

    by mortonda ( 5175 ) on Saturday October 30, 2004 @07:17PM (#10675409)
    > the card can tell the OS everything is done being written and then flush it out of cache at its convenience.

    Which is absolutely horrible. This violates protocol - mail MTAs require that data be written to disk before they acknowledge delivery. They get this from the confirmation from the kernel, but if the disk array lies about it, a power failure could lose data even though the kernel assumed it had been synced properly.
  • by k.ellsworth ( 692902 ) on Saturday October 30, 2004 @07:18PM (#10675413)
    True... if you're not getting a serious RAID controller, forget about hardware raid... it's just the same as Linux software raid, but done by the controller BIOS and the controller driver. That's why with many "pseudo hardware" controllers you can make the array, but after booting into Linux, Linux only sees the hard disks and no array... because there is no real array...

    For the trolls: a real raid controller, for me, is an HP/Compaq SmartArray, IBM ServeRAID, Intel RAID, some MegaRAID controller...

    I have a ProLiant DL380 G2 at home, and it has a SmartArray 5i with 32MB of RAM of its own, and a RISC CPU for the array computing...

    Doing any raid configuration on the machine, the system CPUs are not affected.
  • by Futurepower(R) ( 558542 ) on Saturday October 30, 2004 @07:21PM (#10675432) Homepage

    Is this the definitive article about software RAID under Linux?

    Software-RAID HOWTO [unthought.net]. In English and HTML: Software-RAID HOWTO [unthought.net].

  • by Anonymous Coward on Saturday October 30, 2004 @07:22PM (#10675439)

    Your logic eludes me. The blocks do not need to be read, as we are in the process of writing. We already have the data, because we are writing, so why would we re-read the data?

    Unless you write across a whole row in the array, how are you going to compute the new parity without reading in something? This is the "small write problem", and it is why expensive RAID controllers have a non-volatile writeback cache.

    The current kernel does read in the whole row to recompute the parity, for simplicity. Technically, though, you just need to read in the block you are modifying and the parity block, making writes take 4 operations under RAID 5, but unless something has recently changed, Linux doesn't do that. A gig of RAM, however, will allow a degree of volatile write-back cache, to help offset what will otherwise be poor write performance.
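
    To make the arithmetic concrete, here is a toy one-byte version of the read-modify-write update (the values are made up):

    # old data block, new data block, old parity - pretend each is one byte
    OLD_D=0x3A; NEW_D=0x5C; OLD_P=0x77
    NEW_P=$(( OLD_P ^ OLD_D ^ NEW_D ))
    printf 'new parity: %#x\n' "$NEW_P"
    # This matches a full-row recompute, because OLD_P already equals OLD_D
    # XORed with all the other data blocks in the row.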

  • by Anonymous Coward on Saturday October 30, 2004 @07:23PM (#10675441)
    Two pieces of advice: (1) Look into mdadm, it saved my array once when I had to move it from one server to another, (2) look into smartd as a way to monitor the individual disks and detect failures. Okay, well then, _three_ pieces of advice. (3) make sure you look into ext2/3 filesystem parameters like the size of the journal (max it out) and the -R stride= option.

    mdadm will allow a "spare pool" shared between multiple RAID devices and smartd will check the state of the disk controllers at regular intervals. You should put the system _and_ the disks on UPS to avoid losing data in the event of a power failure (the disks need to write their cache to the physical media before it evaporates). Set up something (mdadm or smartd) to email you in the event of a disk failure, or you may be running in degraded mode for quite a while before you discover it (unless you look at /proc/mdstat regularly).

    All in all it seems to work fairly well if you spread the disks across multiple channels, if you have enough RAM for page (buffer) cache, and if you get reliable disks. I have a 4-disk SCSI storage box that I have in RAID 5 mode. It has been running for over two years. The server failed and I had to move it, that is when I discovered mdadm -- A LIFE (DATA) SAVER!
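
    For what it's worth, the rough shape of those suggestions (the device names, stride value, and address are all made up - stride should be your RAID chunk size divided by your filesystem block size):

    mke2fs -j -R stride=16 /dev/md0
    mdadm --monitor --scan --mail you@example.com --daemonise
    # and in /etc/mdadm.conf, a shared spare pool looks something like:
    #   ARRAY /dev/md0 devices=/dev/hda1,/dev/hdc1,/dev/hde1 spare-group=pool0
    #   ARRAY /dev/md1 devices=/dev/hdg1,/dev/hdi1,/dev/hdk1 spare-group=pool0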
  • by mortonda ( 5175 ) on Saturday October 30, 2004 @07:23PM (#10675444)
    Um, no. Since we are writing, we already know what the data is. Just write. No reads.

    RAM is helpful only in the sense that it can cache data and make things appear faster if the data is already available in cache. It won't really help it read/write data any faster.

    If a drive goes out, write performance is unchanged, as the XOR operation must be done no matter what.

    Read performance depends - it's just an XOR operation, which is not very difficult. A 500Mhz CPU will still be mostly idle even in degraded mode. Some implementations *could* do the XOR in regular mode too, to check for data errors, in which case, no performance is lost in degraded mode anyway.
  • by quewhatque ( 806311 ) on Saturday October 30, 2004 @07:26PM (#10675453)
    I have run software raid on Linux and FreeBSD, just in testing. It seems to work without hiccup and it's stable, although I did no extensive testing. As far as hardware or software, I'd suggest hardware if you need hot-swappability and speed; it's a waste of money if speed is not a requirement, as software raid is only a little less safe. I can only see it losing data in the event of the kernel going wrong in a very unusual way or, more likely, the kernel crashing in the middle of a write.

    Increasing the RAM would definitely take a lot of stress off the kernel and increase speed, but then again, I haven't seen many situations where stored files need to be quickly accessed; I can only think of lightly compressed, high-quality movies. As far as swapping drives, if you set up Linux properly, the array will reconstruct correctly (given that only one drive dies). If Linux dies and you have to redo the OS, the configuration is stored in each disk (each partition, more properly) that is in the raid, so nothing is lost except time.

    One of the advantages of software raid is its flexibility: it's partition based instead of disk based. It doesn't seem like much of an advantage (and it isn't). Just make sure you don't make two partitions on the same disk part of the same raid array. My best piece of advice is: make sure every drive gets its own IDE controller. A drive can slow another one down on standard ATA. Also, a drive going wrong could kill the rest of the drives on its channel.
  • by aaronl ( 43811 ) on Saturday October 30, 2004 @07:26PM (#10675454) Homepage
    There are already some issues here because these are eight identical drives bought at the same time. They were very likely manufactured by the same machines in the same factory at the same time. This increases the likelihood of multiple disc failures for a variety of reasons. This is why many admins will replace discs on a regular schedule. For example, you buy your eight shiny new discs. Then you run them for a year, and replace one, a few months later you replace another, and so on. Then every two years from then you replace that disc again.

    Also, any half-decent RAID implementation will have that hotspare in the machine with its spindle off until it is needed. So it won't have been spinning for months/years at all. Not quite as good as having it in a box as far as wear and tear, but very close.
  • Experience (Score:2, Informative)

    by Libor Vanek ( 248963 ) <libor,vanek&gmail,com> on Saturday October 30, 2004 @07:26PM (#10675455) Homepage
    As someone who has built and installed ~400 systems with about 50 TB of storage capacity, ALL on Linux SW RAID5, I can only recommend it. I have had bad experiences with HW RAID - when 2 or more disks fail, you can't get your data. Linux SW RAID can be convinced (OK, not SO easily) to recover most of your data except the really bad parts. Also, performance is really superb (with P4/Xeon/Opteron CPUs it's much higher than any HW IDE RAID can do).
  • by mortonda ( 5175 ) on Saturday October 30, 2004 @07:28PM (#10675463)
    > It's a poor solution for raid, since if the OS goes, there goes your raid. If you use hardware, at least it'll autodetect.

    Um, that's bogus. If your OS goes (probably due to hardware?) then you can simply put the drive in a new computer (same basic master/slave setup) and away it goes. Linux knows how to detect its own RAID arrays!

    OTOH, if you have a hardware RAID, good luck getting tech support, especially if they no longer carry that board, or have gone out of business altogether.

    At least with Software RAID, your data is not stuck in a proprietary format.
  • 500 MHz? (Score:3, Informative)

    by tji ( 74570 ) on Saturday October 30, 2004 @07:32PM (#10675479)
    You may not need much CPU performance for file service.. after all, it's mainly just doing DMA to/from disks. But, I assume it's just your standard PC motherboard, with a single 32bit 33MHz PCI bus.

    If you're spending $960 for the disks at Fry's, why not spend another $80 to $250 at that same Fry's and get a current generation motherboard and CPU (they have package deals that are dirt cheap).

    For $80, you can get a 5x faster processor, and a much newer chipset with ATA133 and Serial ATA.

    For $250, you can get a board with multiple PCI busses, PCI-X and a chipset capable of handling much more throughput than a cheap PC motherboard.

    The I/O bandwidth will be your bottleneck with an 8 drive RAID array. The standard 32bit / 33MHz PCI bus only does about 1Gbps. Serving a gigabit ethernet connection will use all your bandwidth by itself.. when you have 8 ATA drives fighting the NIC for bandwidth, you can see a clear problem.

    If you're spending that much for the drives, don't hamstring it by skimping on the motherboard. And, in any case, once you have a Linux box installed, you inevitably start using it for many tasks (caching proxy, mail server, ftp server, dns server, www server, etc). So, a beefier system will stand up better.
  • by ErikTheRed ( 162431 ) on Saturday October 30, 2004 @07:35PM (#10675485) Homepage
    The Promise controllers are SHRAID, which is my own non-standard acronym for Software w/ Hardware-assist RAID or SHitty RAID in less polite company. And the "promise" of true redundancy is a charade (rim-shot, please). Basically, you have all of the disadvantages of software RAID - the need to manually configure bootability of both drives (assuming you're running RAID 1 or RAID 0+1 - if you're running RAID5 or JBOD it's an even bigger pain), plus the need to have specialized drivers on the OS, etc. These controllers (Promise, Highpoint, etc.) should be avoided like the plague for technical reasons alone.

    Good, relatively inexpensive IDE and SATA RAID can be had with 3Ware Controllers [3ware.com]. 2-drive models start around $140, and they support up to 12 drives on their more expensive controllers. The drives appear as a single physical device to the O/S, whether it's Windoze, Linux, BSD, DOS 3.1, etc.
  • by dbullock ( 32532 ) on Saturday October 30, 2004 @07:41PM (#10675517) Homepage
    > So my advice to you is to install a lot of RAM in this system, whatever the motherboard allows. At least one gigabyte, but preferably two or more.

    There's a reason this is posted anonymously. It's absurdly incorrect. Disregard and move on.
  • by mikej ( 84735 ) on Saturday October 30, 2004 @07:44PM (#10675531) Homepage
    To answer your actual question, whether or not the linux kernel's software RAID implementation is safe... "yes". I used it in production for NFS fileservers as far back as the 2.2 series; it performed wonderfully under high load then and has worked just as well when I've used it off and on since, both in production and on test systems. There are lots of suggestions elsewhere in the thread about things to avoid - multiple devices on the same IDE channel is the big gotcha: don't do it, its performance is particularly horrific during array reconstruction, just when you need it to run as fast as it possibly can. Keep those suggestions in mind when you build the system, but you can categorize the RAID implementation itself as more than sufficiently reliable.

  • by ErikTheRed ( 162431 ) on Saturday October 30, 2004 @07:47PM (#10675539) Homepage
    This is slightly off-topic because it won't take care of the particular solution being sought, but another interesting way to do RAID-1 is using the controllers from Arco Data Protection. They have some that are physically connected between your IDE or SATA controller and the two drives to be mirrored - they just seamlessly mimic a single IDE device. This makes it possible to RAID-1 any IDE or SATA drive under any operating system or device. I've used them in places like phone systems and voice mail systems that have no provisioning whatsoever for RAID. It can take a little bit of case tweaking, and you have to be sure the power supply can handle it, but it's an interesting solution in certain situations where nothing else can do the job.
  • by rimu guy ( 665008 ) on Saturday October 30, 2004 @07:51PM (#10675565) Homepage

    I manage a lot of servers remotely. I started out using the hardware RAID support on my server's mobos. But there were issues with that.

    First, it was hard getting Linux driver support (I think drivers were available, but it was a matter of downloading them. And I don't believe they worked on the 2.6 kernels I used).

    Then the RAID setup required BIOS settings. When you only have remote access to a server (and no KVM-over-IP) that means you need to work through a tech at the DC. Not, umm, ideal.

    And finally, there was the issue of 'what if I need to move these disks to a different server'. One that doesn't have the same raid controller. Well, it wouldn't work.

    Anyway, I ended up using software raid. I've used it now on a few dozen servers. And I'm really happy with it. Performance seems fine, albeit I'm not using it in really IO critical environments like a dedicated database server. In 99% of cases I'd now use software raid in preference to hardware raid.

    What follows are a few tips I'd like to pass along that may be a help with getting a software raid setup...

    If you get the chance, set up RAID on / and /boot via your OS installer (on a new system). Doing it afterwards is a real pain [tldp.org].

    Build RAID support, RAID1, and RAID5 into the kernel (not as modules). You'll need that if you boot from a raid1 boot partition. Note: if you are using RAID5 you'll need RAID1 built in as well (since I believe in the event of a failed disk the raid personality swaps from RAID5 to RAID1).

    With a 2.6 kernel build I've been getting "no raid1 module" errors at the make install phase when building with a RAID-ed / or /boot. The 'fix' is to compile the RAID support you need into the kernel (not as modules) then run: /sbin/mkinitrd -f /boot/initrd-2.6.8.1.img 2.6.8.1 --omit-raid-modules (substituting your kernel image name/version).

    Every now and then I've had the kernel spit a drive out of a raid array. I've found that sometimes the kernel may be being overly cautious. You can often raidhotremove then raidhotadd it back again. And you may never see a problem again. If you do, it probably really is time to replace the disk.

    Rebuilding a RAID array goes smoothly. It happens in the background when the Linux machine is in multi user mode. The md code rebuild guarantees a minimum rebuild rate. From memory it takes about an hour or two to do a 200GB RAID1 array.

    You can see the RAID rebuild status in /proc/mdstat. I run a very simple script [rimuhosting.com] to check the RAID status each day and send out an email if it is broken.

    If you are using a RAID-ed /boot, grab the latest lilo [rr.com] since IIRC it has better RAID support than what is in the distros I use.

    Hard drive-wise I've been happy with Seagate Barracudas. I've had to replace a few failed Western Digital drives. (Just my recommendation from experience, it could just have been good/bad luck on my part).

    One neat trick with Software raid is that your drives don't have to be the same size. You do RAID on partitions. And your raid array sizes itself according to the smallest common denominator in the array.

    Tip: always create a bit of spare space on any device you are RAID-ing. e.g. a 4GB swap partition. Then if you have a drive fail and it needs to be replaced, and your replacement varies in size slightly you'll still be able to use it. Not all 40/120/200GB drives are created with equal sizes :).

    In summary: Software RAID=good. Decent performance. I've had no real kernel bugs with it. No need for BIOS access. Easy to move drives between servers. Easy to monitor failures. Non-intrusive/minimal downtime when recovering a failed device.
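
    For what it's worth, re-adding a kicked drive (once the kernel has already failed it) looks roughly like this - the array and partition names are made up:

    raidhotremove /dev/md0 /dev/hdg1
    raidhotadd /dev/md0 /dev/hdg1
    # or, with mdadm:
    mdadm /dev/md0 --remove /dev/hdg1
    mdadm /dev/md0 --add /dev/hdg1

    The rebuild then shows up in /proc/mdstat.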

  • by Cynbe ( 96657 ) on Saturday October 30, 2004 @07:52PM (#10675571)
    I converted nearly our whole house network to software raid at the start of this year: Big RAID5s for our fileserver, backupserver, netserver and videoserver, and smaller RAID1s for our firewall and workstation boxes.

    Overall, I'm very happy with it -- no more rebuilding from scratch every time a boot disk blows!! :) I'd started converting to SCSI boot disks everywhere, but a pair of software RAID1 IDE drives gives me a much better sense of security. My workstation did in fact blow a boot drive a month ago, and rather than being an emergency, I just ambled into Fry's after a week or two, bought a replacement and rebuilt the raid. No muss, no fuss. Feels like living in the third millennium!

    I did learn various things the hard way that the HOWTOs don't warn of.

    Note that you can't boot off RAID5, only RAID1. The hack they mention of putting /boot on RAID1 and everything else on RAID5 is not worth it with today's drive sizes. Give yourself a 2-16GB RAID1 with a complete bootable system on it, and save yourself mucho grief at very little proportionate cost in disk space.

    As of kernel 2.4.x, at least, the linux software RAID5 autorecovery is workable but less robust than one might like in the face of serious problems: I had one RAID5 setup totally destroyed because the hardware was flaky, leading to constant reboots while RAID5 reconstruction was just underway. After a while the kernel got confused about the order of the disks (which shouldn't matter, but apparently did) and the whole thing went into a Death Spiral. Lesson: If you're sure the problem is just one flaky disk, feel free to just swap in another and reboot. But if you are in any doubt, play it safe: Switch off RAID autodetect first thing (by fdisk'ing the partition type from FD back to 83). Get the hardware stable, rebuild the RAID by hand, then switch everything back to FD.

    RAID5 is a comparative pain in the ass to work with vs RAID1, because under RAID1 any of the partitions can be mounted normally as a non-RAID drive in an emergency, getting you back on the air fast, but not so with RAID5. (You'll want a live Linux CD with a RAID-supporting kernel, likely. Knoppix etc. don't yet ship this way.) So only use RAID5 if the extra space really matters -- the big servers.

    BTW: One of the reasons I like software RAID over hardware: If you have hardware RAID and the controller blows and you can't find a matching model, you may be stuck reverse-engineering their RAID scheme to recover your data. No worry about that under software RAID.

    I tested automatic failover to hot spare disks under the kernel, and it worked perfectly for me in a handful of tests. For whatever that's worth.

    Do keep an eye on /proc/mdstat's readout of your RAID system health. If you're asleep at the wheel and don't notice anything until enough disks fail to bring the whole system down, you haven't gained much. I have a crontab-driven set of Perl scripts which check all sorts of things weekly to minutely and email me if they look wrong: Checking for failed RAID drives is one of the things they do. If you don't have a comprehensive solution like that, the raidtools2 package has an ad-hoc solution specifically to email you on drive failure. USE IT.

    FWIW, here's the system I've evolved for partitioning disks in such systems:

    • First partition: One cylinder (the innermost one): Ext FS containing a THIS_DISK file in which I record when and why I bought the drive and any interesting history it has had. In an emergency when you're suddenly shuffling eight hot drives plus a couple spares plus a dead and replacement motherboard etc., you WILL lose track of which disk was doing what. This little partition will save you a lot of grief.
    • Second partition: Swap, in the outermost (lowest numbered) cylinders -- because these give the fastest transfer rates, up to about a 50% advantage. Putting swap on every disk lets the kernel stripe swap across all the drives.
  • by mprinkey ( 1434 ) on Saturday October 30, 2004 @07:53PM (#10675576)
    I have built at least two dozen software RAID5 boxes over the past few years. Usually Promise controllers, Maxtor drives. Performance is generally pretty good. Here are bonnie numbers for my 1.2 TB media server (five Maxtor 300 GB drives in Software RAID5). These numbers are a little slower than other systems because it uses an Athlon motherboard. I have found that Intel chipset boards generally give read performance ~100-140 MB/sec.

    [root@media root]# more bonnie20.log
    Bonnie 1.2: File '/raid/Bonnie.27772', size: 2097152000, volumes: 10
    Writing with putc()... done: 14517 kB/s 83.2 %CPU
    Rewriting... done: 25060 kB/s 17.1 %CPU
    Writing intelligently... done: 41987 kB/s 29.5 %CPU
    Reading with getc()... done: 18830 kB/s 96.1 %CPU
    Reading intelligently... done: 82754 kB/s 62.2 %CPU

    Using an older processor/motherboard is probably not a huge concern. I've used 300 MHz Celerons before. Of course, your performance might not be as high as this, but if you are using this as network attached storage (NFS or SMB), you will likely be limited to 12 MB/sec due to fast ethernet. If you have (and need) gigabit transfer speeds, you should probably use a better motherboard/CPU.

    Lastly, remember that you shouldn't skimp on power supplies and an UPS that automatically shuts the system down. The *only* data loss I have ever had on raid5 arrays came because of power-related issues. Heed my warning! 8)
  • by Anonymous Coward on Saturday October 30, 2004 @08:03PM (#10675630)
    Consider one row in a 7 drive array. It has 6 data blocks and one parity block:

    D1 D2 D3 D4 D5 D6 P1

    where P1 is the XOR of D1 through D6.

    If I write to D1, but leave any of D2-D6 alone, then it is necessary to read SOMETHING in order to calculate the new parity. Yes, I know what I'm writing, but unless I overwrite the whole row, I must perform extra operations in order to update the parity block correctly. These extra operations degrade performance, and are known as the small write problem. As another AC above said, the update can be done with two reads and two writes; read the old D1 and the old P1, then write the new D1, and write P1 to be (old D1 XOR new D1 XOR old P1). It's a bit of trickery, but it does give the correct parity block. It does, however, take two reads and two writes to update the one block.

    Linux (last I looked) doesn't do this. Instead it takes the simpler approach of reading the blocks in the row that it isn't updating (D2 through D6 in this case), and then computing P1 as the XOR of D1 through D6 again.

    The small write problem is a big deal. Although the IOs can happen in parallel, the latency for the write becomes the maximum of the reads plus the latency of the parity write. The larger number of IOs also keeps the array busy when it could be doing other things, which degrades the performance of those other operations. And it causes this performance degradation for small updates (those under the stripe size), among the most common operations. Even if all of your files are big, and written in a streaming manner, the metadata updates are generally in a different row in the array, and are small, isolated writes. A journaled file system, depending on how it is implemented, can be much worse for generating lots of scattered writes. Here [cmu.edu] is a paper from CMU that gives one possible solution (one that isn't implemented by Linux). The traditional solution is write caching--you delay the write until either you've updated the other entries in the row, you've read the other entries in the row, or it is otherwise convenient to do the update (i.e. the array isn't busy). This is of course dangerous because your data isn't on disk but in RAM. OTOH, with a good UPS, losing the contents of RAM is a relatively rare event. To sidestep the volatility of RAM entirely, nice HW raid controllers have some amount of non-volatile memory (either NVRAM or battery-backed DRAM) for this purpose. Writeback caching can also help performance on non-RAID devices, since it allows you to reorder the writes to minimize head seeks and rotational latency. These two, especially the head seeks, are what make disk IO slow.

    You obviously don't have much background in storage. Try reading Chen's classic paper [psu.edu] on RAID, go and search for a few papers that reference it, and then come back and spout off. Until then, quit giving people bad advice.
  • by mprinkey ( 1434 ) on Saturday October 30, 2004 @08:03PM (#10675637)
    Sorry to reply to my own post. More information... avoid putting master and slave on the same port. Sometimes, if one of the drives goes, it will whack the entire port and drop out the other drive. In raid5, this is bad, though not unrecoverable. It might require you to manually rebuild (mkraid --secret-option) to get the data back after replacing the drive. That is a scary situation that can be easily avoided by only using one drive per IDE port.

    That information may be (and probably is) outdated with regard to SATA. I don't have experience with them yet, though I will be building four 1.75 TB RAID5 (or 1.5 TB RAID6...Linux 2.6 willing) arrays next month that use 250 GB SATA drives.
  • by Anonymous Coward on Saturday October 30, 2004 @08:15PM (#10675725)
    Just to follow up on this a little. I used to work with a lot of 1-2TB 3ware-based systems. Typically the reason why drives would drop is either a firmware issue on the 3ware or, more likely, a firmware problem on the Maxtor (I have heard of a similar issue with WD drives). The firmware on the Maxtor would tell the drive to spin down for some reason, never to return, but this issue only happened with 3ware based systems. While the 3ware said it was a bad drive, it was actually fine and just needed new firmware. While the 3ware is a great solution to get TBs of data on the cheap, they are quite slow when it comes to Raid5, especially when rebuilding the array if you do lose a drive. They are great if you are not running a "mission critical" system, but even then the hassle of IDE raid is not worth the savings when you need to manage 10+TB of data.
  • by barc0001 ( 173002 ) on Saturday October 30, 2004 @08:48PM (#10675908)
    For a little more (well, maybe more than a little) than the amount of coin a lot of RAM will cost you, go get a 3Ware 8 port RAID card instead. I run one, and it kicks ass. I see the one we got (Escalade 7506-8) on Pricewatch for $366. The RAID is fast, fault tolerant, and has a little web interface to let me know its status. I've currently got the drives configured as 7 in a RAID 5 with the 8th as a hot standby. I am very pleased with it.

  • by SealBeater ( 143912 ) on Saturday October 30, 2004 @08:49PM (#10675911) Homepage
    Just to insert my 2 cents into this, I have a 4-disc 750GB raid 5 SATA array on a PIII with 128 megs of ram. LVM sits on top of the array, and I have never run into a problem with serving files via NFS or SMB. More ram is always nice, of course, but again, I have not run into any problems.

    SealBeater
  • by Codename_V ( 813328 ) on Saturday October 30, 2004 @08:52PM (#10675931)
    Fine. Didn't realize you slashdot types were so untrusting. =) At my work we initially went with about 12 systems all with assorted 8 port or 4 port 3ware cards. Out of the 12 systems, only 3 of the RAIDs are currently in working order, and I expect they'll fail shortly. Sure they work great at first, but my idea of a nice RAID 5 setup is that when one drive fails you pop a replacement in and you're good to go. With 3ware cards, a drive fails, you reboot your system, and then you can't even boot up until you pull the 3ware card, or pull all the drives off of the card. I've tried and tried to slowly add the drives back but I always seem to end up with an unbootable system. I'll tell you, I'm 100 percent happier with the ide to scsi RAID boxes we now go with.
  • by AaronW ( 33736 ) on Saturday October 30, 2004 @09:26PM (#10676066) Homepage
    After months of problems with DMA timeouts and lockups caused by using a Highpoint RAID controller and a Promise IDE controller, I finally bit the bullet and bought a 3Ware Escalade controller. All of a sudden, everything is completely stable.

    Do yourself a favor and get a good hardware raid controller and make sure it has good Linux support. Promise sucks. They advertise Linux support on the box - they lie; it only works with specific 2.4 kernels. 3Ware has good driver support for Linux included with the Linux kernel source code.

    -Aaron
  • by BawbBitchen ( 456931 ) on Saturday October 30, 2004 @09:32PM (#10676100) Homepage
    I have been using RAIDFrame under OpenBSD for about 2 years now. Never had any issues. From dmesg:

    wd1 at pciide1 channel 0 drive 0:
    wd1: 16-sector PIO, LBA48, 156334MB, 320173056 sectors
    wd2 at pciide1 channel 0 drive 1:
    wd2: 16-sector PIO, LBA48, 156334MB, 320173056 sectors
    wd1(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
    wd2(pciide1:0:1): using PIO mode 4, Ultra-DMA mode 5
    pciide1: channel 1 configured to native-PCI mode
    wd3 at pciide1 channel 1 drive 0:
    wd3: 16-sector PIO, LBA48, 156334MB, 320173056 sectors
    wd4 at pciide1 channel 1 drive 1:
    wd4: 16-sector PIO, LBA48, 156334MB, 320173056 sectors
    wd3(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 5
    wd4(pciide1:1:1): using PIO mode 4, Ultra-DMA mode 5
    Kernelized RAIDframe activated
    raid5 (root): (RAID Level 5) total number of sectors is 960429504 (468959 MB)

    The setup is very simple.

    # cat /etc/raid5.conf
    START array
    1 4 0

    START disks
    /dev/wd0a
    /dev/wd1a
    /dev/wd2a
    /dev/wd3a

    START layout
    32 1 1 5

    START queue
    fifo 100

    It has been quite stable. The box is a 1Ghz AMD with 256MB of ram.

    I tried running the same setup under Linux (Gentoo & Slackware). The software RAID would crack under load and fail a disk. I really would give OpenBSD a try with RAIDFrame. You have to recompile the default kernel with RAID support, but under BSD it is very simple. CVS the source down and:

    cd /usr/src/sys/arch/$ARCH/conf
    cp GENERIC RAID

    vi RAID and add:

    pseudo-device raid 9 # RAIDframe disk driver (make the number, 9 in this case, one more than the number of discs you are using)

    option RAID_AUTOCONFIG

    save the file and:

    config RAID

    cd ../compile/RAID
    make clean; make depend; make
    cp /bsd /bsd.old
    cp bsd /bsd

    and reboot.

    Then just read the man pages for raidctl to see how to set it up (hint: look at my raid5.conf above).

    Hope this helps.
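
    If I remember the raidctl workflow right, first-time setup goes something like this (the raid device name and serial number here are made up):

    raidctl -C /etc/raid5.conf raid0   # force the initial configuration
    raidctl -I 20041030 raid0          # write the component labels
    raidctl -iv raid0                  # initialize the parity
    # then disklabel raid0 and newfs the partition, e.g.:
    newfs /dev/rraid0a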

  • by tf23 ( 27474 ) <tf23@lottad[ ]com ['ot.' in gap]> on Saturday October 30, 2004 @09:39PM (#10676137) Homepage Journal
    > I had two drives go bad in my RAID 5 and I was screwed

    I take it you didn't have a drive sitting waiting as a hot-spare?

    I got bit by this once. Never again.... now I always have a hotspare waiting to jump into place for an instant rebuild.
  • by itzdandy ( 183397 ) on Saturday October 30, 2004 @09:44PM (#10676164) Homepage
    In Linux, raid5 is a very solid and fast solution. Even with a 500MHz CPU it is faster than all but the most expensive hardware cards, as most cards have a 133MHz chip or even less.

    Also, software raids are hardware independent. They can be modified easily while booted and without rebooting. If hot-swappable drives are used, downtime can be eliminated by a hot-swap and a rebuild of the failed drive.

    Also, I have been in a discussion about the new cachefs patch in recent -mm kernel patches (or maybe nitro?), allowing you to use a cache in RAM with any filesystem, so you could mount your raid array through cachefs with a given amount of RAM for write cache :) It should give a nice performance boost on many systems. This patch is designed to improve transferring files over networks but is shown to work equally well for local devices.

    AND, Linux software raid works on a per-partition basis, so you can mix and match drive sizes without wasting space. 8 250GB drives can mate up with 4 300GB drives, and then the leftover 200GB can be made into another array.

    You can easily add IDE cards and increase the size of your array.

    You can spread your array over a large number of IDE cards for better redundancy; no single card will cripple your array, and IDE cards are much cheaper than hardware raid cards.

    Linux can be booted from a software raid, while it has trouble on some hardware raids! (driver issues)

    I run a software raid5 over 12 Seagate 120GB drives with no problems. I get great transfer speeds across the (gigabit) network, and it's easy to manage drive spindown because the system sees each individual drive, while hardware raid solutions typically only allow the system to see the array as a single device.

    Most hardware arrays are mainly configured at boot time; to build or repair an array, your system will not be working. If you run a Linux fileserver/firewall, your firewall doesn't function during a hardware raid rebuild, while it does with software.

    --

    Though I would go with a faster processor, you should have very good luck, reliability and performance from an 8-device software raid5, and have a nice 1.7TB array.
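
    Spindown on the individual drives is just the usual hdparm knob, e.g. (drive name and timeout made up; a value of 241 means a 30 minute timeout):

    hdparm -S 241 /dev/hda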
  • by patniemeyer ( 444913 ) <pat@pat.net> on Saturday October 30, 2004 @09:47PM (#10676175) Homepage
    I am very happy with my linux / 3Ware 4 port raid card combination. It makes it brain dead simple and takes linux out of the loop of things that could trash the raid. I even forgot to install the *drivers* for the raid in the initial install and it all just worked fine... because the box thinks it's one big magical drive. (The drivers were only necessary for monitoring...)

    Spend the extra $200 on a 4 port card... put a *big* fan on the drives because that's the #1 killer and you'll be happy.

    Pat
  • by Skuld-Chan ( 302449 ) on Saturday October 30, 2004 @09:49PM (#10676191)
    It worked okay until one day I had to reboot my file server (moved locations) and I couldn't get the raid to come back up. I lost all my data :(. The bad part is that when it came to forums, IRC, and generally trying to get help, there really wasn't any; a good amount of the documentation and troubleshooting information out there is for the older tools. I generally believe that when it comes to your data you can only trust tools you can actually support - software raid, for all intents and purposes, seems highly alpha/beta.

    Anyhow, I bought a 3ware 7450 RAID controller and haven't looked back - it's brutally fast (over 20-30 megs a second in a sequential write), fully supported in Linux, and a piece of cake to set up.

    It's not bad at recovering either - I had a power failure and the UPS failed later on - the machine restarted of course when the power came back on, and the 3ware controller automatically rewrote all the parity on the disks - everything was fine. While it wrote the parity the system was up and running instantly (the raid was in a fail state of course).
  • For those looking (Score:3, Informative)

    by phorm ( 591458 ) on Saturday October 30, 2004 @10:02PM (#10676253) Journal
    smartctl often comes as part of the package "smartsuite." For Debian users there is an apt package available under that name as well.
  • by anti-NAT ( 709310 ) on Saturday October 30, 2004 @10:16PM (#10676321) Homepage

    You can get smartd to execute tests automatically, using the -s option.

    In my smartd.conf file, I have :

    -s (L/../../7/03|S/../.././05)

    on the device lines, which means: run a long online self-test at 3 am every Sunday, and a short online self-test at 5 am every day.
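    For anyone who hasn't set this up before, a full device line in smartd.conf might look roughly like this (the device name is just an example), with one such line per member disk:

    # /etc/smartd.conf
    # -a: monitor all SMART attributes; -m root: mail warnings to root;
    # -s: long self-test Sundays at 03:00, short self-test daily at 05:00
    /dev/hde -a -m root -s (L/../../7/03|S/../.././05)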

    Running mdadm as a daemon to watch the md arrays is also a good idea.

  • Fine (Score:3, Informative)

    by captaineo ( 87164 ) on Saturday October 30, 2004 @10:29PM (#10676383)
    I have a 160GB Linux software RAID-5 consisting of three 80GB disks, running 24x7 for years now. (when I built the RAID, 80GB was the largest disk capacity you could buy :).

    No problems at all. I once had an IDE controller fail - I replaced it (had to reboot of course), and Linux rebuilt the array automagically.

    I have not tried using a hot spare.

    Warning: a lot of the documentation out there on the web about Linux software RAID is very out of date. If you go this route, DEFINITELY buy the book "Managing RAID on Linux" (O'Reilly). Also be prepared to compile the "raidtools" package, which you need to set up arrays.

    I have since added an 8-disk system based on 3Ware's 9000 series SATA RAID controller. I recommend 3Ware for higher-performance systems. (I have 8 250GB disks in a single 1.6TB RAID-5, I get about 180MB/sec read, 90MB/sec write.)
  • A few other hints (Score:3, Informative)

    by anti-NAT ( 709310 ) on Saturday October 30, 2004 @10:34PM (#10676407) Homepage

    If you run smartmontools, you can configure smartd to not only monitor the SMART status of the disks, but also execute online tests - have a look at the "-s" option of smartd. For my RAID1 array, for each device, I have -s (L/../../7/03|S/../.././05) entries.

    mdadm also has a daemon mode which can monitor the arrays, and if there are any failures, send an email to a designated email address.
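    A minimal way to run it (assuming mdadm 1.x -- check your man page) is something like:

    # watch all arrays from /etc/mdadm.conf, poll every 5 minutes,
    # mail alerts to root; --test sends a message per array at startup
    # so you know the mail path actually works
    mdadm --monitor --scan --daemonise --delay=300 --mail=root --test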

  • by dspeyer ( 531333 ) <dspeyer&wam,umd,edu> on Saturday October 30, 2004 @10:38PM (#10676424) Homepage Journal
    One other thing that's critical is to monitor status carefully. One of the points of RAID is transparent failure recovery, but the Linux version is too transparent: you can lose a drive and not notice it at all, until a second drive goes (third if you've set up a hot-spare as described) and then you're in trouble.

    Probably the best move is to have a cron job examine /proc/mdstat and e-mail you if it's troubled.
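    Something as dumb as the following does the job (a sketch -- adjust to taste); a failed member shows up as an underscore in the [UUU_]-style status string:

    #!/bin/sh
    # /etc/cron.hourly/raidcheck (hypothetical path)
    if grep -q '_' /proc/mdstat; then
        mail -s "RAID degraded on `hostname`" root < /proc/mdstat
    fi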

  • by jbn-o ( 555068 ) <mail@digitalcitizen.info> on Saturday October 30, 2004 @11:25PM (#10676630) Homepage
    I have used raidweb.com enclosures in the past and they work quite well. They handle all the RAID configuration inside the box and appear as one drive to the host (hence the boxes are totally host independent). The connection between the box and the host is SCSI and I've used off-the-shelf high-end SCSI controllers for this. Their boxes have redundant fans and power supplies. They sound like a jet taking off, but my experience is that they're stable and rock solid. They're rack mountable too.

    The only big disadvantage I experienced at the time was the lack of docs on the serial controller, so I only had the audio buzzer signal to go on when a drive failed. I think the box would have sent a signal over the serial link to the host indicating a failure. Then the host could do something interesting with that signal like send e-mail, call a pager, and so on. It would have been nice to have remote signaling, but in this case I didn't need it. The install site always has someone there to handle taking out the bad drive and plugging in the cold spare.
  • by jdibb ( 762911 ) on Saturday October 30, 2004 @11:31PM (#10676655)
    You don't have to read the data from ALL the other drives to do a RAID 5 write, but you do have to read the data that's being overwritten, and you do have to read the parity that corresponds to it. Unless the data you are writing in a single I/O spans most or all of the data drives -- then it might be cheaper to read the data you are not overwriting and write a whole new stripe. I am (and have been for 13 years) a RAID developer, but not for Linux.
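    The arithmetic behind that "small write" is just XOR: new parity = old parity XOR old data XOR new data, so only the target block and the parity block need to be read. A toy illustration in bash (one-byte "blocks", obviously not how the kernel actually does it):

    d1=0xA5; d2=0x3C; d3=0xF0        # data blocks on three drives
    p=$(( d1 ^ d2 ^ d3 ))            # parity block on the fourth drive
    new_d2=0x55                      # overwriting d2: read only old d2 and old p
    new_p=$(( p ^ d2 ^ new_d2 ))
    # recomputing the whole stripe from scratch gives the same parity:
    printf 'incremental %02x, full-stripe %02x\n' "$new_p" "$(( d1 ^ new_d2 ^ d3 ))"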
  • gigabit fiber (Score:3, Informative)

    by bani ( 467531 ) on Saturday October 30, 2004 @11:34PM (#10676664)
    why? gigabit copper is just as fast and ~50x cheaper.
  • by apdt ( 575306 ) on Saturday October 30, 2004 @11:57PM (#10676745)
    Probably the best move is to have a cron job examine /proc/mdstat and e-mail you if it's troubled.

    Or you can just run mdadmd (part of the mdadm [freshmeat.net] suite; it comes with my distro, SuSE 9.1), and it'll monitor your RAID arrays and email you when there's a problem.
  • Re:Experience (Score:5, Informative)

    by flsquirrel ( 115463 ) on Sunday October 31, 2004 @12:13AM (#10676798)
    I've got a newsflash for you: if you're using RAID 5 and you drop more than one drive at the same time, you're done, period. It doesn't matter whether you're running hardware or software RAID; it's the way RAID 5 works. RAID 5 can recreate the information from any one drive, but if it loses two drives, it can't determine both unknowns. There is no convincing about it. If two drives die, so does your array. Do not pass go. Do not collect $200.

    It scares me that they let people like you play with the sort of computing resources that have 50TB of disk space.
  • by k12linux ( 627320 ) on Sunday October 31, 2004 @12:23AM (#10676841)
    Your logic eludes me. The blocks do not need to be read, as we are in the process of writing.

    Unless you write across a whole row in the array, how are you going to compute the new parity without reading in something?

    Everything you need to write is already in RAM except the checksum block. So if you have a 7-drive RAID 5 array, the RAID subsystem can take 6 blocks of data, compute a parity block from them, then write one block to each drive. It's not as if it's going to write random-sized chunks of data and can't tell what is going to be written without actually writing it to disk.

    Even hardware RAID cards typically don't have a lot of RAM. They also don't write to the drives, re-read what they have just written, and create a parity block from it. Neither does software RAID in Linux.

    The only time the system should need to read in order to generate a block is when it is rebuilding after a drive has been replaced.

  • by tylernt ( 581794 ) on Sunday October 31, 2004 @12:41AM (#10676924)
    I've stuck 4 7,200rpm IDE drives in a case... and promptly killed two of them within days. I had to add a rear exhaust fan to the case, a PCI slot blower, and I removed the blanking panels in front of the drives to get more airflow. The drives now stay merely warm to the touch (instead of HOT), and the drives have been fine ever since.

    Eight of those suckers are going to get toasty without plenty of auxiliary cooling.
  • by k12linux ( 627320 ) on Sunday October 31, 2004 @01:39AM (#10677170)
    Actually, I can see some sense to that. He did mention failing during a rebuild. That's when we are at the greatest risk of another failure, after all, since the drives are working harder than normal.

    If you have one large partition and an impending drive failure wipes out any cylinder on that drive, all the data on it is shot. That drive won't be used at all during the rebuild... a rebuild of 250GB. You are at risk if, at any time during that long rebuild, a second drive fails completely or even coughs up a bad cylinder that can't be redirected.

    If you have 6 partitions, only the "damaged" one has to be recovered immediately. Obviously you would want to recover them all as soon as possible, since that first drive is probably going to bite the dust soon. But even if you do lose the first drive completely and then a second drive during a rebuild, you at least may not lose everything: any of the 40GB blocks that were rebuilt before the second drive died would have been saved.

    Getting a slightly different-sized drive for an RMA can also be a problem. What if your original 250GB drives were actually 250.3GB and the replacement is 250GB even? You aren't going to fit that single 250.3GB partition onto the replacement drive. And are you going to call the drive manufacturer and complain that your original drives were too big?

    I've had issues with this on hardware RAID. I had to back up 600GB over the network, wipe the entire array, rebuild it, and restore the data. If it had been software RAID, I could have backed up the data from the last partition into one of the others just to be safe, resized the last one, reformatted, and copied the data back.

    LVM with multiple partitions would have made it even easier.

  • by photon317 ( 208409 ) on Sunday October 31, 2004 @01:48AM (#10677198)

    Don't forget that a hardware RAID controller is a single point of failure. The best solution for the absolute best redundancy and performance is software RAID set up to be fault-tolerant of controller failures. For example, put two separate SCSI cards in the box, software-mirror your data between them, and then stripe on top of that for added performance if you have the drives. When using striping and mirroring together, always mirror at the lowest level, then stripe on top of that.

    The basic idea is:

    C == controller
    D == disk
    R == virtual raid disk

    C1 --> D1,D2,D3
    C2 --> D4,D5,D6

    R1 = mirror(D1,D4)
    R2 = mirror(D2,D5)
    R3 = mirror(D3,D6)

    R4 = stripe(R1,R2,R3)
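    With Linux software RAID that layout would look roughly like this (a sketch using mdadm and hypothetical device names, sda-sdc on one controller and sdd-sdf on the other):

    # mirrors first, each pair split across the two controllers
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdd1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sde1
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdf1
    # then stripe across the mirrors
    mdadm --create /dev/md4 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3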
  • by XNormal ( 8617 ) on Sunday October 31, 2004 @02:03AM (#10677232) Homepage
    Suddenly, you've got nearly 2 TB of data that is completely unreadable by normal controllers, and you can't replace the broken one! Oops!

    This is also a good reason to use mirroring rather than fancier schemes like striping or RAID-5, if you can afford the capacity hit. You can always mount the drive individually.
  • by rpwoodbu ( 82958 ) on Sunday October 31, 2004 @02:46AM (#10677344)

    This logic doesn't hold. Let's first talk about the performance.

    Also, on any reasonably modern system, the software RAID will be faster. You just have a much faster processor to do the RAID processing for you. The added overhead of the RAID5 processing is nothing compared to a 1-2GHz processor.

    The actual RAID processing is relatively easy, and any RAID solution, be it hardware or software, that is worth anything will have no trouble doing the logic (perhaps the cards mentioned are indeed not worth anything). The processing isn't your limiting factor; data throughput is. This is where hardware shines. A lot of extra data has to be shipped in and out to maintain and validate the RAID, and this can easily saturate buses. A hardware solution allows the computer to send only the "real" data between itself and the hardware device, and then lets that device take on the burden of communicating with the individual drives on their own dedicated buses. Sure, that device can become overwhelmed, but I submit to you that if it does, it was poorly designed.

    I am not saying that one shouldn't consider software RAID solutions. Just don't consider them because you think the performance will be better.

    Now let's talk about data recovery.

    I've lost 4 drives out of a 12-drive system at the same time, and Linux has let me piece the RAID back together and I've lost nothing. Was the machine down? Yes. Did I lose data? No. Compare that with a 3ware hardware RAID system where I lost 2 drives. Even though I probably could have salvaged 99% of the data off that array, the 3ware just would not let me work with the failed array.

    Let us be clear: we are talking about RAID 5. In RAID 5, you simply cannot lose more than one drive without losing data integrity. And it isn't as if you can get back some of your files; the destruction will be evenly distributed over your entire logical volume(s) as a function of the striping. So it is quite impractical to recover from this scenario. I don't know what kind of system was being employed with this 12-drive array that could withstand losing a third of its drives, but it certainly wasn't straight RAID 5. I can come up with some configurations that would survive such a massive failure, but then we aren't comparing apples to apples. I'd be very interested to know what the solution was in this example. It should also be noted that we don't know how many drives were in the system that lost 2 drives, much less what kind of RAID configuration was being used. No conclusion can be drawn from the information provided.

    As an aside, more often than not, when we as individuals want a large cheap array, we are less concerned about performance than reliability. We put what we can into the drives, and we hope to maximize our data/$ investment while minimizing our chances of disaster. A software RAID 5 is a good solution. Some posts have said that if you can spend so much on the drives, what's stopping you from spending on a nice hardware controller? I submit that perhaps he's broke now! And besides, a controller that can do RAID 5 across 8 drives is quite an expensive controller indeed. This has software RAID written all over it.

  • by Futurepower(R) ( 558542 ) on Sunday October 31, 2004 @04:40AM (#10677693) Homepage

    I bought one of these from Newegg. I had a lot of problems with it. I called Silicon Image technical support. They told me that particular chipset did not work correctly, and they would not release working firmware for it.

    I told Newegg about this, but they continue to sell them.

    Fry's sells them also. I told a Fry's manager that Silicon Image told me they know they don't work correctly. Fry's still sells them.

    I would love to find a technically knowledgeable and honest distributor.
  • My experience (Score:3, Informative)

    by Mike Markley ( 9536 ) <.moc.kcahdam. .ta. .kcahdam.> on Sunday October 31, 2004 @04:41AM (#10677697)
    I've got a 4x160GB SATA software RAID-5 array (about 450GB usable) serving up files on my home network right now, running under the 2.6 kernel.

    These drives are all crammed into an old Dell that was my Wintendo a couple of years ago. A few months back, the grilles on the drive-bay coolers I installed got clogged up and I lost one of the drives to overheating. Upon replacing the drive, the rebuild took the better part of an evening (but didn't need to be attended). No lost or corrupt data.

    The only major problem I had was that the RAID was dirty in addition to being degraded (insert "your mom" joke here), because I brought my machine down hard before realizing what was going on. In theory, I could have done a raidhotremove on the bogus drive and brought things down normally.

    I ended up having to do some twiddling to get it to rebuild the dirty+degraded array. I don't remember what that was, but as long as you don't do something boneheaded like ignore kern.log messages about write errors to a specific drive, get annoyed that it's taking so long to cleanly unmount the filesystem, and hard-reset the box, that shouldn't be an issue :).
  • by Admiral Burrito ( 11807 ) on Sunday October 31, 2004 @04:59AM (#10677738)

    No, you're not "done period". You'll lose a lot of data, but may still be able to recover some. Likewise when losing one disk in a RAID-0 setup.

    Any file that resides entirely outside of the gap in the array can be recovered. How likely that is depends on the details of the filesystem, the striping, and the size of the file (the larger the file, the more likely that a part of it fell into the bit bucket).

    Also, not all drive failures are total. You may have a RAID-5 array with one drive that completely failed, and another drive that just has some bad sectors. In that case you should be able to recover most of your data. Or you may have two disks with just a few bad sectors, which is even less bad.

    This all depends on being able to force the array to allow access to the device, so that you can mount the filesystem (in read-only mode) and sift through the remains. Some (many? most?) RAID implementations may just give up if two disks in a RAID-5 array (or one disk in a RAID-0 array) are flagged as bad, in which case you really are screwed, even though your data is still there. From what people have been posting here I would guess that Linux SW RAID will let you force it, though I've never needed to try it myself.
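    For what it's worth, with mdadm the attempt would look something like this (an untested sketch with hypothetical device names) -- force the assembly despite the event-count mismatch, then mount read-only and copy off what you can:

    mdadm --assemble --force /dev/md0 /dev/hd[e-l]1
    mount -o ro /dev/md0 /mnt/rescue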

  • by cowbutt ( 21077 ) on Sunday October 31, 2004 @05:19AM (#10677797) Journal
    Hard drives have spare sectors set aside for sectors that die, and they are automatically remapped. If software RAID is detecting errors, just REPLACE THE DRIVE. The entire drive will die soon anyways.

    Not quite. In my experience, bad sectors are only remapped by the drive firmware on write. Attempts to read bad sectors will return errors. This makes sense if you think about it; you might be trying to recover data, and the sector might be readable once in a hundred tries, but if you're writing to the sector, then obviously, you don't care about the data that's there already, so it's an opportune time to remap it.
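    If you want to keep an eye on that, smartctl will show the relevant counters (the device name is an example); a climbing Reallocated_Sector_Ct, or a non-zero Current_Pending_Sector that never drains, is the cue to swap the drive:

    smartctl -A /dev/hde | egrep 'Reallocated_Sector|Current_Pending_Sector'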

    --

  • by Fallen_Knight ( 635373 ) on Sunday October 31, 2004 @06:04AM (#10677878)
    http://www.tomshardware.com/storage/20040831/sata-raid-controller-12.html
    http://www.tomshardware.com/storage/20040831/sata-raid-controller-16.html

    Seems plenty fast to me; guess they fixed the problems, at least enough to be at the top :)
  • Re:Please! (Score:3, Informative)

    by Gherald ( 682277 ) on Sunday October 31, 2004 @08:11AM (#10678193) Journal
    3ware raid controllers kick ass, they are the best on the market especially for Linux

    frickin expensive, though... if you need that kind of performance it'd probably be speedier and more cost effective to do a software RAID 0+1
  • by Kyril ( 1097 ) on Sunday October 31, 2004 @01:54PM (#10679687)
    Careful with that "always". There was a Compaq box using a RAID-1 controller that I couldn't immediately recover; I think it reserved some space at the front and put the partition table after that, so I couldn't readily mount it or even fdisk -l....
