Which RAID for a Personal Fileserver?
Dredd2Kad asks: "I'm tired of HD failures. I've suffered through a few of them. Even with backups, they are still a pain to recover from. I've got all fairly inexpensive but reliable hardware picked out, but I'm just not sure which RAID level to implement. My goals are to build a file server that can live through a drive failure with no loss of data, and that will be easy to rebuild. Ideally, in the event of a failure, I'd just like to remove the bad hard drive, install a new one, and be done with it. Is this possible? How many drives do I need to get this done: 2, 4, or 5? What size should they be? I know that when you implement RAID, your usable drive space is N% of the total drive space, depending on the RAID level."
RAID 1 (Score:5, Informative)
For simplicity and low expense, even though you lose a full drive's worth of capacity, go with RAID 1.
You might want to read The Tech Report's recent article [techreport.com] mentioned on Slashdot [slashdot.org] if you haven't already.
search the fscking google (Score:1, Informative)
No offense intended, but why didn't you just do a Google search rather than asking 1.5 million Slashdotters? The words "raid type" would have produced a nice table from Adaptec and Ars Technica as the very first result that would have explained what you needed to know:
http://www.ebabble.net/html/types.html [ebabble.net]
RAID -1 (Score:5, Informative)
RAID 5 or RAID 10 (Score:5, Informative)
Quick overview:
RAID 5 - Requires at least 3 HDs (many times implemented with 5 - can be used with up to 24, I believe). Data is not mirrored but can be reconstructed after a drive failure using the remaining disks and the parity data (very similar to how PAR files can reconstruct damaged/missing RAR files for the newsgroup pirates out there). The percentage of total space available depends on the number of drives used.
RAID 10 - High performance, but expensive. You get ~50% of the total HD space, as it is fully mirrored. So 1 TB of total disk space nets you 500 GB of storage. Your data is mirrored, so if one drive fails you do not lose everything. However, if you experience multiple drive failures you can be in big trouble.
raid and ide channels (Score:5, Informative)
Raid 1, 0+1, or 5.. (Score:4, Informative)
Raid 1 is the safest.. just mirroring the drives, but it results in no speed increase..
Raid 0+1 does mirrored stripe sets -- you get the speed advantages of raid 0 with the full protection of raid 1.
Raid 5 is good middle ground. Raid 5 stores 1 drive's worth of parity. When you lose a drive, your system goes down (if you don't have a hot spare), but you throw another disk in and it'll come back up. You also get some speed increase over a normal drive setup. With RAID 5, you only lose a single drive's worth of capacity no matter how many drives are in your array, whereas with raid 1, you lose 50%.
Try netcell raid xl (Score:1, Informative)
You can get a 3-drive or a 5-drive card.
I've got five 200 GB drives in a RAID XL array and it works great.
Old PC + 4 channel raid controller = easy (Score:5, Informative)
Here's what I came up with: Total cost about $1200 (probably less by now).
0) Red Hat Linux, ext3 filesystem.
1) 3Ware Escalade 7506-4LP card (64 bit card, but fits in 32bit slot)
2) 4x 250Gb Western Digital drives
3) Big fan.
At RAID 5 this yields 750 gigs (715 GB after the crappy GB conversion).
The 3Ware software has a nice web monitor interface and does daily or weekly integrity checks. It emails me if there is a problem - I did have one drive die already and replaced it easily.
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
RAID 5 (Score:2, Informative)
Re:Raid is not an option (Score:2, Informative)
Redundant Array of Independent Disks
Whichever damn raid level you want! (Score:5, Informative)
If you have a stack of 6 drives and believe not a single one is ever going to fail, go for level 0.
If you are a government contractor and are required to handle simultaneous failures of 75% of your drives, either mirror them all or go with 5+1 or a raid 10 setup.
All in all, it's a poor question to ask Slashdot. You need to let us know what you consider an acceptable failure, and by the time you have that figured out, determining what RAID level you need is easy.
RAID 5 or 6 (Score:3, Informative)
So, RAID 5 or 6 would be the best (RAID 6 is worth the extra bit of space for the second parity calculation, and really helps when you can test the parity bits against another parity to recreate the lost data).
There will be some slowdown associated with RAID, but it won't be as bad with 5 or 6, and generally you can live through it with the thought of having a relatively robust file server.
What raid to use. (Score:2, Informative)
Put simply: if you don't have a lot of data to store but you want it safe, go for RAID 1 with small drives - you end up with the storage of one drive but it takes 2 drives. If you have a lot of data to store, go for RAID 5 - you get twice the storage of one drive but you use 3 drives.
RAID 5 w/ hot spare (Score:2, Informative)
In theory, some of this is possible in software, but a good RAID controller card is much, much better.
www.google.com (Score:5, Informative)
Re:Raid 1, 0+1, or 5.. (Score:2, Informative)
It is able to work in a degraded state by using the parity information to rebuild the data from the failed drive on the fly.
Re:RAID 1 (Score:5, Informative)
Re:Raid 1, 0+1, or 5.. (Score:3, Informative)
Actually, with any proper implementation of RAID 5 you wouldn't lose functionality during a single drive failure, but you would suffer a performance hit, because every read would require the drive controller to reconstruct the missing data from the parity.
Replace the bad drive very quickly, though, because a second drive failure will result in wiped drives, effectively.
Re:search the fscking google (Score:5, Informative)
You also never touched on the possibility of him having only 2 drives, in which case RAID 1 would be the way to go for data redundancy.
Re:RAID 1 (Score:4, Informative)
Re:Just remember the RAID song (Score:5, Informative)
Raid 1 or Raid 5 (Score:2, Informative)
If you're up to a challenge, install Linux to boot from the RAID 1 config. It was a huge pain in the ass to figure out. When I configured Red Hat 9, I had to use LILO instead of GRUB, as the boot loader wasn't being correctly written for both drives. I had to use "dd" to write the boot sector and LILO to get it working properly.
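For reference, the workaround looked roughly like this (a sketch only - the device names are examples, and dd pointed at the wrong disk is catastrophic, so triple-check yours):

# copy the MBR (boot code + partition table) from the first disk to its mirror
dd if=/dev/hda of=/dev/hdc bs=512 count=1
# then have LILO write its boot sector to the second disk as well
lilo -b /dev/hdc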
The benefits of software RAID: you can swap drives with minimum downtime and recreate the drive in the background. And you save money by not buying a hardware RAID card, which could serve as another possible point of failure. Then you can write scripts that email you the status of the RAID periodically with cron (a sketch follows below).
Remember to test the config by unplugging each drive separately. Of course, it will take a while to sync each drive...
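A minimal sketch of the cron-mail idea (assumes the Linux md driver and a working local "mail" command; the address is a placeholder):

#!/bin/sh
# /etc/cron.daily/raidcheck - mail /proc/mdstat if any array has a dead member
# (a healthy 3-disk array shows [UUU]; a degraded one shows e.g. [UU_])
if grep -q '\[.*_.*\]' /proc/mdstat; then
    mail -s "RAID degraded on `hostname`" admin@example.com < /proc/mdstat
fi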
If you are feeling feisty and have more money to spend try this (a copy of a previous post of mine):
Here are some interesting numbers:
$250 per drive
400GB per drive
4 drives
1.2 TB in Raid 5
Total cost $1,000
or $0.83 per GB.
So there you have it: a terabyte file server for about $1000 will be a reality soon enough. Nice. Serial ATA will lessen cable clutter, and only 4 drives is doable in any decent spare case and power supply.
Hopefully it won't take too long for prices to drop to $250.
Of course Raid of any level is no replacement for a full backup, but it's certainly better than nothing or relying on a single drive no matter how good the quality/warranty.
Which RAID level to run ... (Score:3, Informative)
RAID 0 stripes the data across 2 or more drives and therefore offers no redundancy (in fact, with a two-disk stripe you multiply the danger of data loss by 4 compared to two individual drives -- you not only double the possibility of failure by having two disks instead of one, but stand to lose all of the data on both drives should either fail). In any event, no point in discussing it further, since redundancy is the point.
RAID 1 offers redundancy by exactly duplicating the contents of a drive onto another drive, and needs exactly two drives. This is considered the most "fail-safe" RAID configuration, although it offers no performance benefits whatsoever.
RAID 10 (or 1+0 or 0+1) is a combination of RAID 0 and 1 and is nearly always done with four drives, although technically it can be done with six or eight (if your controller supports them). It offers both performance benefit and redundancy, although the cost of the "wasted" drive space is quite high.
RAID 3 involves using 3 or more drives, one of which contains parity information to rebuild the lost drive should any of the other drives fail. This is one of the least popular RAID formats and has more or less been totally replaced by RAID 5.
RAID 5 involves using 3 or more drives and writes parity information across all drives in the array, allowing one drive to fail with little to no performance loss. The failed drive can be replaced and the RAID rebuilt. Depending on your hardware/software, this can often be done hot, without having to power down the system at all. It is one of the most commonly implemented RAID solutions because of the good mix of drive use (the price goes down the more drives you have in the array, yet you can have as few as three), redundancy, and high availability.
There are others out there like RAID 50 but nothing worth mentioning, especially for a home user.
The only question left to you is whether the RAID will be run by hardware or software (software might be a good choice if you are already running Linux on the server, but you'll have to ask someone else about it because I don't know a thing about it). Personally I chose the hardware route years ago and bought an Adaptec 2400A, which is a four-channel hardware ATA-RAID card capable of RAID 0, 1, 10, and 5 -- guess which I use. I use all four channels, each with a 200GB SATA hard drive. I've lived through a couple of drive failures, a full drive upgrade (when I first bought the card it was 4x60GB drives), and even once when two drives' RAID tables got zapped (I'll NEVER put my drives in removable cages again), and never lost a byte of data -- so the CAD$500 or so invested in the card was worth it.
600GB of storage means not having to worry about all those unlicensed-in-North-America anime torrents running out of space any time soon.
Re:Just remember the RAID song (Score:5, Informative)
RAID 0: This is a striped set; there is no redundancy. One drive goes, everything's gone. Usable space = 100%
RAID 1: This is a mirrored set. Typically this involves 2 drives. One drive is an exact copy of the second. If a drive fails, you replace it and rebuild the set. Life goes on. Usable space = 50%. Most IDE RAID cards only support RAID 0 and 1.
RAID 5: This is a striped set with parity. You get the performance associated with a striped set, particularly on reads. If you have 4 drives, there are 4 stripes. 3 of those stripes are data stripes; the 4th is parity. Lose 1 drive and the parity information is used to rebuild the set. Usable space = (n-1)/n. To do this in hardware is typically fairly expensive.
There are a lot of hardware solutions out there. It can also be done in software. Windows supports creating disk sets in software. Other options include the purchase of a Snap! server or other brand of NAS. If you've got a little $ to throw around, NAS is the way to go. Plug it into your network, minimal setup, and you're off and running. Not very upgradeable, and somewhat problematic if your drive does actually die, but I use them at the office for a zero-maintenance file server.
Re:Raid 1, 0+1, or 5.. (Score:3, Informative)
Example: two SATA drives
RAID-0: Write Speed: 2x, Read Speed: 2x
RAID-1: Write Speed: 1x, Read Speed: 2x
Basically, when doing a write, the driver can use the same buffer and stream the write data to both drives synchronously meaning no slowdown. A proper read driver will read alternate chunks simultaneously from the two drives, resulting in a 2x speed improvement on the whole.
The obvious downside to the Mirroring setup is that you only have half of your total space available for use.
Re:RAID 5 or RAID 10 (Score:4, Informative)
Specifically the setup is as follows
1 == 2
3 == 4
5 == 6
7 == 8
Setting up a RAID in this way will allow you to experience multiple drive failures while still keeping the RAID alive. The most detrimental scenario is losing two drives in the same mirror pair: losing drives 1 & 2 is a real problem, whereas losing drives 1 & 4 is survivable.
Just my 2 cents, poke holes where necessary
Re:Dear Slashdot (Score:2, Informative)
OTOH, Ars Technica has a decent piece [arstechnica.com] on RAID.
Re:Old PC + 4 channel raid controller = easy (Score:1, Informative)
Re:search the fscking google (Score:4, Informative)
Let's see. My server requires half a terabyte of storage.
3 200 GB IDE drives at $100/ea == $300
3 180 GB SCSI drives at $700/ea == $2,100
Yeah... Not likely, pal. And certainly doesn't qualify for "affordable" like this guy is clearly looking for.
Re:raid and ide channels (Score:3, Informative)
Re:Just remember the RAID song (Score:2, Informative)
Re:RAID 1 (Score:4, Informative)
I had heard that the new LVM for Linux supports snapshots, so I will probably be looking into that soon, but I haven't messed with my file server in over 3 years. It Just Works (TM).
Avoid Promise like the plague! (Score:5, Informative)
What I didn't know at the time, but learned the hard way, is that Promise's RAID monitoring program "PAM" is a user-mode-only application. That means that if you don't log in, it doesn't run. Care to guess what happened to me?
At some point while I was gone for the weekend, I can only guess something crashed and rebooted Windows 2000. When it rebooted, I didn't have it set to automatically login (why would I? it's a server). So "PAM" wasn't running when one of the drives in the RAID 5 set failed. Maybe it even had something to do with the crash, I don't know.
Now, the point of PAM is that if a drive fails, an e-mail gets sent - in this case to my mobile phone's text-paging address. Since PAM wasn't running, however, nothing was sent. The drive failed and, I can only guess, put off so much heat that it cooked the drive above it (why do so many cases mount hard drives horizontally above each other, anyway?), and the next thing I know, I can't log in to my server from where I'm staying. I call a family member with a key to come by, and they are unable to restart the server. It wasn't until I came home and read the BIOS messages that I understood why. Everything gone.
I had a lot of stuff on CDR, but let me tell you, I was plenty outraged that Promise could design something so utterly stupid as a monitoring utility that doesn't know how to run as a service. Even to this day, PAM still will only run as a user-mode program, and even worse, you actually have to login to the program now to start it, which can't be scripted.
F Promise. Only a complete and utter fool would be stupid enough to buy any of their products. May they rot in that special place reserved for child molesters. (Yes, I'm still bitter about it)
- JoeShmoe
Re:RAID 1 (Score:5, Informative)
My fileserver has a mirrored pair of drives in front mounted, hot-swap bays. I have a third drive on my workstation and I sync to that every time I add significant amounts of data to my server. The mirroring protects against drive failure and the third drive protects against server failure, operator error, filesystem corruption or other problems that can wipe out a RAID array.
Lastly, the stuff that changes often and is worth the most to me - small documents and other things I create - gets a nightly sync to the server's boot drive and I keep a month's worth of revisions. This lets me "go back in time" to retrieve things if I need to. Considering the relatively small size of this type of material, this doesn't take up a lot of space. I think the whole month's worth of revisions only takes up 10GB or so.
The hot swap bays let me yank a drive out on my way out of the house if the place catches on fire. Yes, I know I should be storing that third drive at a friend's house, but it's too inconvenient to retrieve it every time I want to backup my array. So a fire may destroy everything if I'm not home or can't safely pull a drive on my way out. I'm comfortable with that.
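For the nightly revision sync described above, plain rsync can do the "go back in time" part; a rough sketch (the paths are made up), where each night's overwritten or deleted files land in a dated directory:

# nightly cron job: the current copy stays in sync, old versions pile up by date
rsync -a --delete --backup --backup-dir=/backup/revisions/`date +%Y%m%d` \
    /home/docs/ /backup/docs/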
Re:RAID 1 (Score:3, Informative)
It'll be cheaper than tape, but more work.
By a process of elimination... (Score:5, Informative)
It's for home use
No data loss if a drive dies
Easy to rebuild - remove dead drive, install new one
Budget... Ah. Why is it *every* "Ask Slashdot" never mentions the budget? On the cheap, you could do simple mirroring with RAID1 - most mobos with on-board SATA RAID will do this for you. The overhead is that you pay twice as much per GB, because you obviously need two drives, and the performance gains are negligible.
Personally, I'd take the more expensive route: get a proper hardware RAID controller with proper RAID management software. There are 4-port SATA RAID controllers (who *really* still needs SCSI for home use?) for a few hundred dollars that do full RAID5. You lose one drive for the parity info, but that could be as little as 25% of your total capacity if you get four drives instead of the minimum RAID5 requirement of three.
Also, with a proper hardware RAID controller you should get a performance boost from the RAID and have minimal CPU overhead. Get four of Seagate's new 400GB drives and you'll have over a TB of disk space, which should give you some bragging rights for a month or two before it's old hat. :)
Re:Software raid (Score:3, Informative)
Re:RAID 1 (Score:5, Informative)
At a different site.
KFG
RAID information (Score:5, Informative)
You want a Promise UltraTrak SX8000 [promise.com]. It's the easy, idiot-proof array. We're using several of these.
If a drive fails, it beeps at you until you replace it. You just yank it out and put in a new drive, the same size or larger. It then rebuilds automatically. No shutdown or reboot required.
The Linux crowd will be happy to know the RM series runs linux. I don't know about the SX series, but I suppose it does too. Either one appears to the server to be a single SCSI drive. No drivers required, other than making the SCSI card of your choice work.
There's the Linux method of doing it too, which I like a lot. It saves you a *LOT* of money in extra hardware. You can go with 3 drives without adding any extra cards to your system, or you can put in IDE controllers to add as many drives as your system can support (PCI slots, power, and physical mounting points are the limitation). Read the "Software-RAID-HOWTO", which should come with your system. I've done many of these also, and they work quite nicely. You have to shut down the system to swap a drive, and then run `raidhotadd` with a couple of parameters (the md device, if I remember right), and you can be running while it rebuilds.
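The swap then looks something like this (raidtools-era syntax, device names invented; mdadm users would run "mdadm /dev/md0 --add /dev/hdc1" instead):

# after powering down, replacing the dead drive, and repartitioning it:
raidhotadd /dev/md0 /dev/hdc1
# watch the reconstruction progress while the array keeps serving files
cat /proc/mdstat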
You should have looked it up before you posted.
RAID 5 is the most common for a large redundant array. The array size is (N-1) x size. The more drives you use in a single array, the less you lose to parity:
3 x 100 GB drives = 200 GB
5 x 100 GB drives = 400 GB
10 x 100 GB drives = 900 GB
10 x 200 GB drives = 1.8 TB
RAID 0 is striping. No redundancy, which you won't be happy with. (One failure means losing the array.)
RAID 1 is mirroring. With two drives, you still only have the size of one.
RAID 50 is nice where it does striping across redundant arrays. You lose size, but gain speed.
Most other RAID types aren't very popular for various reasons.
Watch out for going over 2 TB on a single block device. I'm having problems with that right now. I have two Promise VTrak 15100s with 15 250 GB SATA drives in each, and any block device over 2 TB is giving me grief. There are legitimate reasons for this, most of which newer documentation claims to be fixing, but I'm still having problems with a current Linux release. Making logical drives under 2 TB works, but doesn't accomplish what I need.
I hope this helps.
Try this... (Score:5, Informative)
Re:RAID 5 or RAID 10 (Score:2, Informative)
I don't believe the actual standard restricts the number of members a RAID 5 array can have, although generally each additional drive beyond 8 gives you diminishing returns.
Data is XOR'ed across all members minus one, and the resulting "parity" deposited on one of the members; the following is one example...
D0 D1 P0
D2 P1 D3
P2 D4 D5
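The XOR arithmetic is simple enough to check at a shell prompt; a toy example with two data bytes (just the principle, not real RAID code):

$ printf 'P0 = %#x\n' $(( 0xA5 ^ 0x3C ))   # parity of D0 and D1
P0 = 0x99
$ printf 'D1 = %#x\n' $(( 0x99 ^ 0xA5 ))   # "rebuild" D1 from P0 and D0
D1 = 0x3c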
Re:RAID 1 (Score:5, Informative)
DO NOT RELY ON RAID TO PROTECT YOUR DATA. If you do, you will lose it some day. Raid only protects against hardware failure. There are plenty of other ways you can lose data and one of them will catch up to you eventually.
If you can't afford to lose it, back it up to another drive on another computer. If you really can't afford to lose it no matter what, store your backup drive with a friend.
Here's some reading material for you. (Score:2, Informative)
First, you must decide which RAID level meets your needs/wants. To do this, you must educate yourself on the various RAID levels and the pros and cons associated with each so you can make an informed decision. I recommend reading "The Skinny on RAID" [arstechnica.com] if you want to learn the various RAID levels available.
After reading that article, you should learn about hot spares and what they can and cannot do for you. A recent article has been written about setting realistic expectations on what hot spares can do for you. "The Mythical Hot-Spare - Tape/Disk/Optical Storage" [findarticles.com] will be informative on this subject matter.
Lastly, you should read "Kill SCSI II: NetCell's RAID 0 Performance + RAID 5 Security Equals SyncRAID" [tomshardware.com] to look into an innovative IDE RAID card that can give you kick-ass performance and reliability. Be sure to read the benchmarks in the review so you can make an informed decision.
Re:Just remember the RAID song (Score:3, Informative)
Or you'll be down until the replacement arrives
Um, really? Software RAID 5 does require downtime, but hardware implemented RAID 5 allows for hot swapping out of the bad drive, assuming you have a decent controller card.
Re:RAID 5 w/ hot spare (Score:2, Informative)
I was never able to determine whether there was some kind of conflict between netatalk, the kernel, and the driver for the two Promise IDE controllers, if I had problems with a bad batch of drives or if I didn't have enough airflow through my case.
Also during that period, I somehow had a filesystem get corrupted, and lost the RAID, even though no drive had failed.
Given that bad experience with [software][IDE] RAID, I now use four 250 GB SATA drives on two controllers. Every night I do an rsync backup (changes only) from one disk on one controller to a different disk on the other controller. So far, none of the disks have failed (I'm using Maxtor Maxline IIs), but now I'm confident that I can survive either a disk failure or a filesystem failure.
One other interesting point: I recently found an article about a relatively new RAID problem. Apparently a RAID 5 using 250 GB IDE hard drives can take more than a week to rebuild the array when a drive is replaced! Might want to try to find some more details on that before you build your big RAID.
2 drives with a complete file copy at 4 am (Score:5, Informative)
You definitely don't need any type of RAID solution because it doesn't offer you what you really need. You say you want RAID, but what you really want is backup.
All RAID solutions deal with disaster recovery, but they don't deal with the situation where you accidentally rm -rf a directory that you wanted. If you mirror or RAID 5 your drives, you're still hosed, because both drives will delete the files. In the end, this is more important and much more convenient.
Instead, go with a better approach: copy or tar your files every night (or every week) to a backup drive, preferably over the network on a completely different machine. This will prevent a power surge or accidental shutoff from corrupting both drives at the same time.
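As a sketch, the whole scheme fits in one crontab line (the host and paths are placeholders):

# 4 am full sync to another machine over the network
0 4 * * * rsync -a /data/ backupbox:/backup/data/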
Re:Software raid (Score:5, Informative)
So, several hundred users using IMAP and POP3 to collect mail, SMTP to send mail, and the 100k or so incoming messages do add up to a lot of work, and it handles it flawlessly.
$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid5 hdc2[2] hdb2[1] hda2[0]
351100416 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md0 330G 11G 302G 4%
/dev/hda1 122M 8.0M 108M 7%
none 499M 0 499M 0%
Re:Drive cost (Score:2, Informative)
The joys of RAID (Score:5, Informative)
RAID 1 - Drive mirroring.
Pros:
-Excellent read performance, no loss of performance if one drive crashes.
Cons:
-The amount of space you can have on this array is limited to the largest drive you can find. Then you have to buy a second one to mirror the data, which means you are paying double the cost per unit storage on your array.
-Write performance is slower than other RAID levels.
RAID 5 - Striped array with parity. You can stack as many drives as you want on this array (within limits of the controller of course) and lose only one for redundancy.
Pros:
-You can build a very large data array out as many drives as you want, losing only one for the purpose of data reconstruction should a drive in the array fail.
Cons:
-Array performance dies in the event of a failure, as lost data is reconstructed on the fly from parity information stored across the remaining drives. Of course, performance is restored once the bad disk is replaced and the array reconstructed.
-You need at least 3 drives to build a RAID 5 array.
RAID 10 - Drive mirroring with striping. Essentially combines RAID 0 and RAID 1, hence RAID 10.
Pros:
-Redundant and fast. Array can survive multiple drive failures.
Cons:
-Expensive. You need at least 4 drives to get started with RAID 10, and go by 2's as you expand on the array. As with RAID 1, your price per unit storage is doubled.
-The array can survive multiple failures, but that depends on which drives die... if you lose two drives out of the same mirror set, then the array is gone.
Which RAID level you pick depends on your application. If you are interested in having something like a 1 TB data dump, you'll probably want to go RAID 5. If you only want 200GB or less in your array, then RAID 1 is probably the way to go. If you are interested in lots of space, lots of redundancy, and have lots of money, then RAID 10 is probably what you want.
Re:Software RAID? (Score:3, Informative)
The limitations and versatility are not determined by the "software or hardware" ("hardware" being software on a dedicated raid controller) but by the design of the specific software under consideration.
True, the software raid in Linux is quite versatile, but there is no reason why a raid controller could not work with two disks of different sizes and use part of one disk in a mirror and the remainder as a standalone disk.
But as you point out, not all controllers may be able to do that.
Furthermore, a software raid-1 solution in the kernel would theoretically be able to perform better than a hardware raid-1 controller, because read operations can be distributed between the drives and the kernel can know more about operations coming up and do better optimization of locality than a loosely-attached raid controller can.
However, the Linux kernel, when I last looked, does not take real advantage of that. The read operations are distributed over the drives, but there is no separate elevator optimization for the drives making up the array.
Hot-swapping has nothing to do with all this. It can be done when your hardware allows it, and IDE hardware cannot. But SCA drives can be used with software raid, and the brave hacker hotswaps any SCSI drive.
There is a nasty problem with the Linux software raid: when 1 sector fails to read in a raid-1 array, the entire drive containing that sector (better: the entire partition) is marked bad and taken offline (no longer updated).
When another single sector on the other disk fails, you have a real problem!
More reasonable in this case would be to read the failed sector from the other disk, and attempt to write it back on the failed disk. When that succeeds, try to read it again, and it may be OK because the drive may have re-mapped the bad sector or rewriting may have fixed a soft error.
So the soft-failed drive remains online and further errors can be handled.
(of course the failure would still be logged for examination by the administrator)
RAID5 (Score:3, Informative)
IDE is so cheap you might as well just buy two big SATA drives for most usage. Do make sure you buy two drives from two different vendors - it's really embarrassing when you use two identical drives with nearly consecutive serial numbers and they fail the same day.
Also keep external backups. One place I worked, we lost an entire array and the hot spare to a PSU failure. No backups... thankfully it was only the Usenet spool.
Re:Software RAID? (Score:5, Informative)
Many benchmarks show the exact opposite, except when dealing with high-end RAID cards. Why? Because the average CPU on a system with a RAID is going to be much more powerful than anything you're likely to find on a low- to medium-range hardware adapter. I use software RAID on a number of FreeBSD servers and it absolutely flies.
The major downside is that you cannot (at least I don't know how to) hot-swap drives.
That's a function of the hardware and OS. One of the above-mentioned FreeBSD servers is in a nice IBM server case with hot-swappable front-access LVD drives. The swap process is:
There's no reason you can't do hot-swappable software RAID. If there is, then someone forgot to tell my server.
Indeed, no RAID (Score:3, Informative)
Initially I've dd'ed the primary to the other two disks.
Every morning the primary is 'cp -fpRu'ed to the second one. No files are deleted on the secondary, unless I'm running out of diskspace there, at which time I do an 'rsync -aH --delete' after some verifications.
Every few weeks I bring in the third disk, take down the server, swap it with the secondary, and return the swapped-out disk.
I feel pragmatically protected. In the case of a crash I won't lose more than a day of work. In the case of burglary, fire or Gotterdammerung, a few weeks.
Next time I rebuild the file server, I'll make the 2nd and 3rd disks external FireWire or USB 2.0 Hi-Speed.
Fingers crossed.
Re:Software raid (Score:5, Informative)
I've run software RAID-5 on Linux for several years on two of my home fileservers.
The only problems I ever encountered were hardware failures (Promise *ack* *spit* PCI IDE cards) and one drive failure. Performance is not really an issue for home use; I can easily saturate my 100Mbps network card.
My Fileserver: AMD Duron 1300MHz, 768MB RAM
It was built with 4x 160GB 7200rpm drives in SW RAID-5 for online storage (including all of my digital photos and my collection of CDs ripped to MP3).
For backup I have an old Celeron 433, 512MB RAM box with 4x 120GB 5400rpm SW RAID-5
The main fileserver is rsynced to the backup server once a week. CPU on the backup server is a bottleneck; the Celeron is a bit underpowered for rsync, but it works ;)
My $0.02:
- Software RAID is perfectly usable, especially for typical home use. Performance is adequate.
- With RAID-5 you "lose" only one disk to parity so it is quite cheap to build
- Yes, I'd really like a 3Ware Escalade but if the card fails I need to get a new one pronto; software RAID sets can be migrated to most PCs.
Recommended RAID level (Score:5, Informative)
In short, I would probably recommend RAID5 if you have 3+ drives.
RAID5 gives you the most available space while still being redundant. It allows for exactly one hard drive failure.
RAID5's write speed is usually terrible, especially with a small number of drives, but write speed isn't a big deal on my home file server. (Only you know about your needs).
RAID1+0 (NOT RAID 0+1, which is inferior) is great for performance. With 4 drives, you have potentially twice the STR (sustained transfer rate) of one drive when writing and 4 times the STR of one drive when reading. Of course, since STR is not important for most IO, this doesn't really affect your end performance much unless you are dealing with linearly reading/writing very large files.
Writing performance will almost certainly be higher than with RAID5.
You do lose quite a lot of space (especially when you use a large number of drives). If you used a 4-drive 1+0 array, you would have the space in two of those individual drives.
RAID1 is nice, and is very reliable, but is impractical with more than two drives unless you are incredibly paranoid. RAID1 simply makes all drives copies of the others; thus, you always have as much free space as one drive would have, even if you have ten. Of course, you could also handle 9 drive failures and not lose data. RAID1 is fine for 2-drive arrays, though.
DO NOT FORGET that RAID is no substitute for regular backups. RAID will not help if your data loss is caused by FS corruption, a cracker, accidentally typing "rm -rf" in the wrong place, and so on.
For lowest cost, I would use software RAID, such as Linux's md driver, FreeBSD's Vinum, or whatever Windows has. (RAID5 requires Windows Server.) (I would not use Windows as the file server myself.)
For slightly higher cost, try a Promise controller.
I would avoid Highpoint and Silicon Image controllers. Highpoint, especially, is crap. (but it is very cheap, at least).
If you possibly can, I would recommend a nice 3Ware Escalade controller. Escalades are true hardware RAID cards, unlike Highpoint/SI and most of Promise's cards, and are OS independent and very stable (with certain exceptions for some unlikely configurations).
If you have any questions, you might try the StorageReview forums. There are a number of extremely knowledgeable people there, including engineers and executives-level researchers at hard drive companies. They can give far better advice than I can, I am sure.
By the way, all my comments assume that all drives are the same size. If not, treat all drives as if they were the same size as the smallest drive in the array (unless you are using JBOD, which is not redundant).
Re:mod down, incorrect! (Score:2, Informative)
Any halfway decent RAID controller, or software RAID will do large reads by using both drives at once, greatly increasing the performance.
IDE has some limits to this, because they are stupid drives compared to SCSI, but the controller can still take advantage of the 2 copies.
Re:Software raid (Score:2, Informative)
Re:RAID 1 (Score:2, Informative)
Typo, should have been RAID 1+0
RAID 5, but more importantly (Score:5, Informative)
Next up is drives. Not all drives are alike, as I'm sure you already know. Do you want a SCSI or an IDE array? I won't go into this lengthy topic further; I'll assume you will build an IDE array. Some drives do not work well in RAID setups, and the controller companies are more likely to tell you this than the drive manufacturers. I own 6 Western Digital WD12000JB drives (7200 RPM, 8MB cache, 120GB capacity). By all accounts one would expect those drives to work quite well in a RAID setup: they have excellent read/write times individually and a massive amount of cache. Well, one would think that, and one would be wrong. 3ware, Highpoint, and Asus tech support (on the OEM Promise chipset in the A7V333) all recommend against using Western Digital drives. 3ware did, however, say that WD will give you firmware that works significantly better in RAID setups if you ask for it.
Personally I'm a fan of Maxtor, both the drives and the company. I've had very few failures with Maxtor drives, and whenever I did they were always extremely helpful with getting me a replacement fast. I've been very impressed by their service. I have 2 Maxtor 7Y250P0 and 2 6Y200P0 drives in the server sitting next to me. The second is a very high-quality drive from Maxtor's DiamondMax Plus 9 line; it too has 8MB of cache and 200GB to spare, and runs at 7200 RPM. Nice drive. The first pair are from Maxtor's MaXLine Plus II line: they have a high MTTF, 8MB of cache, 250GB of space, and run at 7200 RPM, and they are also a little bit faster than the 6Y200P0. They are excellent drives. My next drives will also be Maxtors, but this time I'll be buying the SATA siblings of the MaXLine Plus II product line.
That brings me to my next point: PATA or SATA. Does your case have an abundance of room? I mean a massive amount of room to route long 80-conductor ribbon cables? Do you have at least 1 if not 2 PCI slots to waste below your RAID controller, with the room needed to route the ribbon cables and make connections? If not, then you need to go with Serial ATA drives. Don't even think twice about it. Go with SATA. The drives cost almost the same nowadays, and you'll find what little price difference there is ($5?) is worth it in the end. SATA drives are so much easier to wire. I have a case full of round cables; the case is an extremely large Codegen case, and even I am having trouble with the cable mess. SATA is a wonderful thing. Along the same lines are hot-swap cages. There are a dozen brands to choose from. You should probably use them even if you don't need hot-swap capabilities; I need them to create 3.5" drive slots from 5.25" bays. If you do want to do hot-swapping, make sure your drive cage and controller support it.
Finally we get to RAID levels. You don't want to increase your risk of losing data, so level 0 is out. Level 1 is extremely redundant, and with the right controller it can actually speed up reads. It's also costly, at twice the cost per GB. Unless the data you're storing is absolutely critical, you won't want to use 1 (in most cases). Forget about level 2. For starters th
Re:RAID 5 (Score:4, Informative)
Your description seems more to fit a RAID 1+0 which is something completely different.
And then, you don't seem to know anything about probabilities:
"If you have 15 drives, and two fail, the chances of them being consecutive are very low."
Correct. But given that two drives have failed, the chance of them being consecutive is still what the arithmetic says: just 2/15 with 15 drives, versus 2/3 with three drives. On the other hand, it is much more probable that two out of 15 drives fail at the same time than two out of three. Still, it could be better to have more drives, just because you get a better "feeling" for how many drives fail before it comes to the fatal crash.
But for RAID 5, this is irrelevant anyways, because any two drives failing will screw your data. And with 15 drives, the probability for that is much higher (and I would even say that 15 drives is a bit too much for RAID 5, use a RAID level where more than one drive can fail without data loss).
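A quick bc check of those figures, taking "consecutive" to mean adjacent in a row of N drives, so the chance that two failed drives are adjacent is (N-1)/C(N,2) = 2/N:

$ echo "scale=3; 2/15" | bc
.133
$ echo "scale=3; 2/3" | bc
.666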
Re:RAID 1 (Score:2, Informative)
You're wrong. Hot-swap ability is a function of the RAID controller and the drive's mounting hardware (or 'cage'). It has nothing to do with the RAID level at all (except that a RAID 0 array can't be hot-swapped, since it has no redundancy and removing a drive would take down the whole array).
We have a bunch of RAID 0+1 systems; some use 80-pin SCSI hot-swap disks and cages, some use ATA hot-swap cages. A wide variety of manufacturers make hot-swap cages for 80-pin SCSI (SCA), Ultra-ATA, and Serial-ATA drives (DataStor, Adaptec, Promise, and SuperMicro to name a few). And of course the bigger server manufacturers make their own hot-swap cages as well to build into their servers.
Re:RAID 1 (Score:2, Informative)
By adding the second physical disk as part of the volume you just double your chances of losing all your data if either drive fails.
Re:Raid 1, 0+1, or 5.. (Score:3, Informative)
Ok, here is the info and prices (Score:4, Informative)
Sigh.
The cheapest OS-independent internal RAID 1 (mirror) is the Duplidisk3 by ARCOIDE.com.
You also get a ton of implementations: standalone, PCI card (for power only), 3 1/2" bay, and 5 1/4" bay. The ones that install in bays are so the user can see the status lights.
If you want an external RAID 5 the cheapest I have found is this - http://www.coolgear.com/productdetails1.cfm?sku=R
If you want 5 disk RAID 5 those are @ $1200. http://www.cooldrives.com/fii13toatade.html
If you want external RAID 0 or 1 relatively cheap then go with one of these - http://www.cooldrives.com/dubayusb20an1.html
You can find a ton of these devices on the web, since they all use the same drive controllers and bays. The nice thing about these is that sometimes you can talk the store into selling you the RAID system without the external case. These things simply require plugging in an IDE cable and power, and can be installed in any PC case that has two 5 1/4" bays open. If you buy just the 2-bay controller, they are @ $230 or so. I have one and I am really happy with it.
Everything I listed above uses IDE drives and is OS independent.
Re:Snap Appliance (Score:3, Informative)
Re:Old PC + 4 channel raid controller = easy (Score:4, Informative)
1) The raid card is well worth the $200 - it will all just work out of the box, looking like one big disk. I forgot to install the 3ware drivers on my first pass and the raid still set up and worked just fine out of firmware - the monitor just wasn't there. Don't mess with software... just buy the card.
2) Put it in the basement! It's always cooler down there... and a little noise won't matter. Temperature is the key to the life span of disks.
3) Also - it's really important that you churn through the data on a regular basis so that the RAID can detect bad sectors and repair them in a timely manner, or warn you of impending disk failure. The 3ware software does this automatically... but regular backups would accomplish the same. I'm a bit torn on how much reading through the data to do, since I don't use my RAID heavily and the read cycle is actually the largest usage... if I overdo it I'm probably shortening the life of the disks.
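Without 3ware-style scheduled verification, one crude substitute is to force a full read pass over each member disk now and then (the device name is just an example):

# read every sector and throw the data away; read errors turn up in the syslog
dd if=/dev/hda of=/dev/null bs=1M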
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates
Re:Software raid (Score:4, Informative)
As for forking $ for RAID cards, I've had really good experiences w/ the MegaRaid cards from LSI Logic [lsilogic.com] - really, really good tech support and exceptionally inexpensive cards.
Re:Software raid (Score:2, Informative)
Expense, mostly: $272 for a 147 GB 10k SCSI, $170 for a 250 GB 7.2k SATA. An extra 100 GB for $100 less? Everybody at work (a large network shop) loves talking about their RAID-0 gaming machines. I shudder each time, as I have had HD failures. Backup, backup, backup!
As far as my home installation goes, I have a single HD for my OS/cache install, and my data goes onto the RAID array, so that even if my OS corrupts its HD, I can rebuild without affecting my data.
As for not spending $30 for a SCSI HD, that would entail getting into the SCSI world. The guy would also need a card. So we're up to $100 for an old HD that, while it has fast seek times, will get smacked down on read rate by a brand-new 250 GB IDE drive - which also has an 8MB cache to help ensure the data is already in cache!
Software or hardware RAID-5 (Score:5, Informative)
Basically, your options are RAID-1 and RAID-5... as hundreds of people here have already pointed out. RAID-1 is just straight mirroring (where all drives in the array contain the same information). Usually, this just involves two drives, but there's no reason why you couldn't have, say, three or four drives all mirrored... and you could lose all but one of them and still be up and running.
RAID-5 is a very cool beast. You basically have an array of drives with some portion of them set aside for redundancy. Most of the posts I've seen here only describe a scenario where you have three drives with one of those drives for redundancy. This only scratches the surface, however.
For example, you could have an array of, say, 5 10GB drives, with 2 drives' worth of redundancy. With this, your RAID implementation would make available to you, what seemed to be, a single 30GB drive (since 20GB of the total 50GB is used for redundancy). This way, you could have any two drives go bad and you're still okay.
Another example, I guess, is that you could have a two-drive RAID-5 with one drive's worth of redundancy. In this case, you'd have the functional equivalent of a RAID-1 mirroring setup. Not very sexy... but you could do it in some implementations, I'm sure.
I'm trying to use the phrase "X drives' worth of redundancy" instead of "X drives set aside for redundancy" because it's important to point out that, in RAID, all of the drives are considered equal. If you have 5 drives with 2-drive redundancy, it's not like you set 3 of them as the "main" drives and 2 as the "backup" ones. There's no preferential treatment like that. All the drives are equivalent and you could lose any of them and the others all move to cover for the one that was lost.
Now, personally, I like RAID-5 because it offers the ability to use more than 50% of the space you paid for. With RAID-1 mirroring, you always only get to use 50% of the space that really exists. This would be necessary if, when you suffered a storage failure, you always lost half of it. But that's not how it happens. Usually, you lose a single drive. So, it would be nice to maximize your space available, while having some insurance against a single drive failure.
This is where RAID-5 really shines, because each successive drive you add, you get all of that space for your usage. You could have, say, four drives, 1 drive of redundancy, and you get 3 drives' worth of space.
Now, there are a few pros and cons for both RAID-1 and RAID-5 regarding recovering/moving data and changing the size of your array, and I'll list them here.
ide raid [Re:Just remember the RAID song] (Score:1, Informative)
if you can.. make sure your NIC and disk controller are on different pci buses.
if you play with hotswap bays make sure they have some quality to them - otherwise you'll be reseating/replacing drives like crazy and that'll bring your raid5 to a crawl.
another prob i've seen - most sata cables tend to become loose; i'm still looking for decent snug-fit cables out there..
lsi(megaraid), adaptec make decent sata-hw-raid cards (~300$); 3ware is another popular choice;
might wanna check out these "software raid cards": a slightly advanced ide controller, might do hotswap but still uses the system cpu, which could be ok for you since the system cpu is idle anyway..
Re:Software raid (Score:5, Informative)
My system, for example:
36 gig 15k rpm (3.6 ms) scsi:
250 gig 5k rpm (9.5 ms) ide:
Who needs 3.6ms access time for their music and videos? What will that gain you? I can tell you what 3.6 ms access time gives you for a root partition, though: blazingly fast startup of the system, of X, of programs, and compilation.
All of my media is in
As I demonstrated, you can get a small 10k rpm SCSI drive with an access time 70% better than that of all but the nicest IDE drives (which cost notably more than SCSI drives), brand new and with shipping, for $30. After re-looking at Pricewatch, I found the same thing for only $20, including shipping. You can get a new SCSI controller for $20 also, including shipping, that will do 40 MB/s (plenty for one drive). A new cable will cost you about $6. That's $50 for a root partition that will give you a 70% speed boost over a 7200 RPM IDE drive.
Why would one *not* do something like that, unless they really don't care about speed at all? And if they don't care about speed, why raid for reasons other than redundancy?
Re:RAID 1 (Score:1, Informative)
If you're going to be mirroring a RAID 5, why would you not just use RAID 10? You'll gain disk space, be more resistant to disk failure, get better performance (hw/sw dependant), and get faster recovery times.
Re:RAID 1 (Score:1, Informative)
Use LVM with RAID (Score:2, Informative)
But regardless of how you answer those questions and what RAID level you finally go for, I would strongly recommend layering LVM (logical volume management) on top of RAID. Sounds bizarre and cumbersome to have two virtual layers between your filesystem and your physical devices, but in most cases it's worth it.
(Now here I'm assuming you're using Linux, but similar solutions are available for other OSs).
If you're not familiar with LVM, it virtualizes partitions. You group together one or more physical volumes (PVs) that provide a pool of physical extents (PEs). From this pool, you create logical volumes (LVs) filled with logical extents (LEs).
Thus, you could have four partitions on three drives serving as PVs, and from that pool (Volume Group or VG) you could create, say, two partitions. From there you have many options:
- You can resize the partitions.
- You can add another drive and add the space on that drive to the VG, then increase the size of the partitions.
- You can migrate data off one of the partitions, then remove that partition from the VG.
- You can migrate to another drive by adding that drive, migrating data away from the previous PVs, then removing the old PVs from the VG. This can be, by far, the easiest way to
To combine LVM with RAID, just use the md device as a PV.
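Concretely, that looks something like the following (a sketch only; the volume group and LV names are invented):

# turn the md RAID device into a physical volume, then carve a logical volume from it
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 100G -n data vg0
mke2fs -j /dev/vg0/data    # ext3, as used elsewhere in this thread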
And here is the top reason to use LVM:
- You can create snapshot backups.
A snapshot backup is a virtual partition, read-only, which contains the same data as another partition, frozen at a certain point in time. Something similar to copy-on-write is used so that the snapshot partition takes only the amount of disk space necessary to store the changes between the time the snapshot was frozen and the current state of the 'snapped' filesystem.
If you 'rm -rf *', you can just cp the files from your latest snapshot. (BTW, this can save a ton of work for sysadmins with forgetful users.)
So RAID can protect you from hardware error, and LVM with snapshots can help protect you from user error.
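Taking a snapshot is then a single command (again with invented names; the snapshot's size only needs to cover the changes made while it exists):

# create a read-only snapshot of the LV, mount it, back it up, release it
lvcreate -s -L 5G -n data-snap /dev/vg0/data
mount -o ro /dev/vg0/data-snap /mnt/snap
# ... copy what you need from /mnt/snap ...
umount /mnt/snap
lvremove /dev/vg0/data-snap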
Re:RAID 1 (Score:2, Informative)
-- indicates an end to options, so touch creates a file named "-i".
"-i" will be at or near the top of the list when you do a listing (or any other alphabetically sorted operation, including rm). This will cause "rm -rf *" to be passed the file "-i" as its first (or nearly first) argument. This will cause rm to prompt for confirmation for each file (no -- in the rm argument list).
This "trick" will fail if you attempt to rm -rf a subdirectory that contains the "-i" file (it won't be passed as an argument to rm).
I would suggest Raid5 (Score:2, Informative)
The performance on this system is outstanding, writing is done at a sustained 50-60MB/s (yes, megabytes) and reading maxes out the PCI bus completely (tops out at about 80-100MB/s depending on other activity on the PCI bus)
The system is powered by a 2.4Ghz Celeron with 512MB memory.
The only drawback is that it will be a pain to add an additional drive to the system, but that's not really a big issue for me anyway.
Btw, the filesystem on this raidset is Ext3. I've had a disk failure (an old drive that should have been retired) since I got it up and running, but as long as no more than one drive fails at once, all is well. Just replace it with a new one, add it to the set, and one hour later (or thereabouts) all data has been restored to it and the raidset is running at full performance again.
A tip for the hardware that will be running the fileserver: make sure to cool your drives; this is of the utmost importance. No, you don't need screaming 7000 rpm fans (I use three 12 dB Papst fans); just make sure that outside air is pulled over the hard drives and expelled at the back of the case. Avoid cases with ventilation holes on the sides. Thermaltake makes (made?) a great case which had air intakes on the front and 6 internal 3.5" bays right behind the intakes (which is the one I use).
Also, you should get a good power supply. I had some really odd problems before I upgraded from 300W to 450W.
Good luck
WikiPedia has a great explanation (Score:2, Informative)
WikiPedia: Redundant array of independent disks [wikipedia.org] - great detailed article summarising RAID with explanation of all the levels.
Anyone can even jump in and improve the article.
Simple. (Score:3, Informative)
L0: no raid: 1x 160Gb or 2x80
L1: raid1: 2x 160Gb.
L1: raid5 (3 disks): 3x80Gb.
L1: raid5 (4 disks): 4x60Gb
L1: raid5 (5 disks): 5x40Gb
and you could even go for a hot spare:
L2: raid5HS (4 disks): 4x80Gb.
L2: raid5HS (5 disks): 5x60Gb
L2: raid5HS (6 disks): 6x40Gb
Now, from 4 disks up you probably need an extra IDE controller in your computer. Factor that into your costs, and you can choose the protection level (L0 means you can tolerate 0 lost disks; L2 means 2 disks - not both at exactly the same time, but you can tolerate two bad disks).
Then simply choose the cheapest solution.
I'd probably go for 3x80 myself.