Fibre Channel Storage?
Dave Robertson asks: "Fibre Channel storage has been filtering down from the rarefied heights of big business and is now beginning to be a sensible option for smaller enterprises and institutions. An illuminating example of this is Apple's Xserve RAID, which has set a new low price point for this type of storage - with some compromises, naturally. Fibre Channel switches and host bus adapters have also fallen in price, but storage arrays such as those from Infortrend or EMC are generally still aimed at the medium- to high-end enterprise market and are priced accordingly. These units are expensive in part because they aim for very high availability, and are therefore well-engineered and provide dual redundant everything." This brings us to the question: is it possible to build your own Fibre Channel storage array?
"In some alternative markets - education, for example - I see a need for server storage systems with very high transaction rates (I/Os per second) and the flexibility of FC, but without the need for very high availability and without the ability to pay enterprise prices. The Xserve RAID comes close to meeting the need, but its major design compromise is the use of ATA drives, which sacrifices the high I/O rate of FC drives.
I'm considering building my own experimental Fibre Channel storage unit. Disks are available from Seagate, and SCA-to-FC T-card adapters are also available. A hardware RAID controller would also be nice.
Before launching into the project, I'd like to cast the net out and solicit the experiences and advice of anyone who has tried this. It should be relatively easy to create a single-drive unit similar to the Apcon TestDrive, or a JBOD, but a RAID array may be more difficult. The design goals are to achieve a high I/O rate (we'll use postmark to measure this) in a Fibre Channel environment at the lowest possible price. We're prepared to compromise on availability and 'enterprise management features'. We'd like to use off-the-shelf components as far as possible.
Seagate has a good fibre channel primer, if you need to refresh your memory."
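For reference, since postmark is mentioned as the yardstick, here is a rough Python sketch of a postmark-style small-file transaction benchmark. The file counts, sizes, and target directory are made-up parameters rather than postmark's defaults, and the real postmark is a standalone C program from Network Appliance; this is only meant to illustrate the kind of workload being measured.

    import os
    import random
    import time

    # Postmark-style workload: a pool of small files, then a mix of
    # read/append/recreate transactions. All parameters are made up --
    # tune them to match your own postmark configuration.
    POOL_DIR = "/mnt/fc_test/postmark_pool"   # hypothetical mount on the array under test
    NUM_FILES = 2000
    NUM_TRANSACTIONS = 20000
    MIN_SIZE, MAX_SIZE = 512, 16 * 1024       # file sizes in bytes

    def make_file(path):
        with open(path, "wb") as f:
            f.write(os.urandom(random.randint(MIN_SIZE, MAX_SIZE)))

    os.makedirs(POOL_DIR, exist_ok=True)
    files = [os.path.join(POOL_DIR, "file%d" % i) for i in range(NUM_FILES)]
    for path in files:
        make_file(path)

    start = time.time()
    for _ in range(NUM_TRANSACTIONS):
        path = random.choice(files)
        op = random.choice(("read", "append", "recreate"))
        if op == "read":
            with open(path, "rb") as f:
                f.read()
        elif op == "append":
            with open(path, "ab") as f:
                f.write(os.urandom(random.randint(MIN_SIZE, MAX_SIZE)))
        else:
            os.remove(path)       # delete + create, like postmark's paired ops
            make_file(path)
    elapsed = time.time() - start

    print("%.0f transactions/second" % (NUM_TRANSACTIONS / elapsed))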
My memory recalls (Score:5, Informative)
Transactions make significant use of CPU resources in ATA-based systems. The only cards I am aware of that move ATA transactions into hardware come from Advansys and Adaptec; both use SCSI chipsets with ATA translators. They're available for about $190.00 each but hard to find.
On a PII 450, CPU usage for software RAID was almost 40% at full throttle with 3 P/ATA drives. The same system dropped a bit when using a PCI card (not sure why, actually; presumably some operations are offloaded to the PCI card, reducing the overhead, but don't quote me on it).
The main problem with all of this is cost vs. redundancy vs. speed... which brings us to the old issue we found with cars: the FAST, GOOD, CHEAP triangle.
     FAST
    /    \
GOOD------CHEAP
CHEAP + FAST != GOOD
GOOD + CHEAP != FAST
FAST + GOOD != CHEAP
Since my own buying experience with FC is limited (read: NONEXISTENT), I can only tell you what I've learned on SCSI vs. PATA/SATA rigs.
SATA systems that do not use an Adaptec chipset are usually mere translators that rely on CPU resources to manage drive activity. 3Ware cards seem to transcend this limitation rather well, and they provide a fine hardware RAID setup for ATA drives.
PATA systems that do not use an Adaptec chipset are likewise mere translators.
SCSI interfaces use a hardware chipset that monitors and controls each drive, relieving the CPU of being abused by drive-intensive operations (think of a high-profile FTP or CVS/Subversion server and you get an idea of what would happen to the CPU if it were forced to act as both server processor AND software RAID monitor).
On to bandwidth. Serial ATA setups suffer from an issue I've found with almost all setups: without separate controllers, SATA and PATA setups split the total bandwidth across the number of active drives. More recent SATA controllers may have fixed this, but in general I've found that my systems slowed down in transfer rate when using drives or arrays on the same card. SCSI seems to bypass this by giving each drive its own pipeline, though I'm not sure what the set amount is, and some of the older chipsets DO have issues handling a full set of 15 drives.
The primer linked from the article at Seagate is on the money in its LVD vs. FC FAQ. You will most likely require a backplane and a full setup, and it probably won't be cheap. The big difference is that FC setups are usually bulletproof and can support MASSIVE numbers of drives (126 on a loop, I think...), with the only thing faster being a VERY high-end solid state drive or array. Those also have very low latency, but as far as raw speed goes, set up a good RAM drive on a dedicated memory bank with a backup battery and EMP shielding... okay, that's way expensive.
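A quick way to put numbers on the shared-bus splitting effect described above; the bus and per-drive figures here are purely illustrative assumptions, not measurements:

    # Shared bus: active drives divide the bus bandwidth between them.
    # Dedicated links (per-channel controllers, FC point-to-point): each
    # drive keeps its own pipe. Figures are illustrative assumptions only.
    bus_bandwidth_mb = 150      # total shared bus/controller bandwidth, MB/s
    per_drive_mb = 60           # what one drive can stream on its own, MB/s

    for active_drives in (1, 2, 4, 8):
        shared = min(per_drive_mb, bus_bandwidth_mb / float(active_drives))
        print("%d active drives: ~%.0f MB/s each shared, ~%d MB/s each dedicated"
              % (active_drives, shared, per_drive_mb))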
~D
PS - I realize I haven't directly answered your question, but hopefully I've simplified things for everyone else out there.
XServe RAID not fast enough? (Score:5, Informative)
Speaking as the owner of two XServe RAID devices (5TB and 7TB models) as well as several other Fibre Channel devices, I can say that the Apple Fibre Channel gear is by no means slow. Each SATA drive has pretty much equal performance to the SCSI drives we use in our Dell head node. Combined, there are times when we can pull several hundred megs a second off the XRAIDs. Plus, our XRAID has been fairly immune to failures thus far; I have yanked drives out of it and it just keeps right on going.
Another little hint: if you are really worried about speed, you can just install large, high-RPM SATA drives yourself. It's not that hard to do at all.
Check out Alien Raid [alienraid.org] for more information.
Re:XServe RAID not fast enough? (Score:3, Informative)
SCSI Drives
1000GB (rounded off to 1TB) / 36GB = 28 drives * $200.00 = $5600.00 USD (excluding shipping, RAID towers, etc). This is presuming a striped RAID 0... if you're doing this via RAID 5 or 0+1, all I can say is "YE GODS!!!"
** You will also have to price out extra controllers; at 15 devices per chain you will need at least two chains' worth of ribbon cables and terminators to install all 28 drives. Don't forget the massive power requirements.
SATA Drives
1000GB (rounded off to 1TB) / 400GB = 3 drives * $200.00 = $600.00 USD, plus 3 SATA cables and ONE host adapter. If you make it a RAID 5 or 0+1 you will need at least 5 drives to make it solid, but you can see that if you were to reach 7TB, the drive count would shoot up significantly for the SCSI setup, requiring several RAID towers.
Pricewatch currently lists 400GB SATA drives at $205.00 at the lowest.
By comparison, the cheapest 36GB SCSI drives are Hitachi Ultrastars at $138.00 per drive at Zip Zoom Fly; the next ones up are $236.00 and $556.00 USD, respectively. Don't forget cables and cooling, since those high-RPM Ultrastars really heat up (I owned several Deskstars and the latest ones died of heatstroke).
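If you want to rerun that arithmetic with your own prices, here's a trivial Python sketch using the figures quoted above (swap in whatever Pricewatch lists today; it counts drives only, not towers, cables, or controllers):

    # Back-of-the-envelope drive count and cost for ~1TB of raw RAID 0
    # capacity, using the per-drive figures quoted above.
    def raid0_drives_and_cost(target_gb, drive_gb, drive_price):
        drives = -(-target_gb // drive_gb)      # ceiling division
        return drives, drives * drive_price

    for label, drive_gb, price in (("36GB SCSI", 36, 200.00),
                                   ("400GB SATA", 400, 200.00)):
        drives, cost = raid0_drives_and_cost(1000, drive_gb, price)
        print("%s: %d drives, $%.2f (drives only)" % (label, drives, cost))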
~D
Re:speedwise (Score:3, Informative)
When I worked at an FC startup, FC was 4Gbps full duplex, and 10Gbps was in the works.
All FC RAID is going to be high-availability (Score:5, Informative)
1) High numbers of transactions per second. Your focus here is going to be on units that can hold a LOT of drives (not necessarily of high capacity); you want as many sets of drive heads as possible going here (a rough spindle-count estimate is sketched after point 2). In addition, SATA drives are not made to handle the high duty cycles of sustained high transaction rates; the voice coils have insufficient heat dissipation. (They are just plain cheaper drives.) High transaction rates require a pretty expensive controller, and you won't be able to avoid redundancy, but that isn't a problem, since you are going to need both controllers to support the IOPS (I/Os per second).
2) High raw bandwidth. If you need raw bandwidth and your data load is non-cacheable, then software RAID + a JBOD may really be able to get the job done, provided you have a multi-CPU box so one CPU can do the RAID-ing. Again, two controllers are usually going to provide you with the best bandwidth. SATA striped across a sufficient number of drives can give you fairly decent I/O, but not as good as FC.
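To put rough numbers behind point 1, here's a crude per-spindle random-IOPS estimate; the seek and rotation figures are generic ballpark assumptions for each drive class, not vendor specs:

    # Rough random IOPS per spindle: 1 / (average seek + half a rotation).
    # Seek times below are ballpark assumptions, not vendor specs.
    def spindle_iops(avg_seek_ms, rpm):
        half_rotation_ms = (60000.0 / rpm) / 2
        return 1000.0 / (avg_seek_ms + half_rotation_ms)

    for label, seek_ms, rpm in (("7200rpm ATA", 8.5, 7200),
                                ("10k FC/SCSI", 4.9, 10000),
                                ("15k FC/SCSI", 3.6, 15000)):
        per_drive = spindle_iops(seek_ms, rpm)
        print("%s: ~%.0f IOPS/drive, ~%.0f IOPS across 20 spindles"
              % (label, per_drive, per_drive * 20))

The point being that for random I/O, spindle count matters far more than per-drive capacity.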
There are "low end" arrays available that will offer reduced redundancy. The IBM DS 400 is an example. This box is OEM'd from Adaptec, and pretty much uses a ServRAID adapter as the "guts" of the box. This unit uses FC on the host side, and SCSI drives on the disk side. It is available in single and dual controller models. (Obligatory note: I do work for IBM, but I am not a sales drone.) This setup has the distinct advantage of being fully supportable by your vendor. A homegrown box will not have that important advantage.
Don't be scared away by the list price, as nobody ever pays it. Contact your local IBM branch office (or IBM business partner/reseller), and they can hook you up.
This unit is also available as part of an "IBM SAN Starter Kit" which gives you a couple of bargain-barrel FC switches, four HBA's, and one of these arrays w/
SirWired
Advice from a SAN lab manager (Score:5, Informative)
0. Get everything off eBay.
1. Stick with 1Gb-speed equipment. It's older, but an order of magnitude less expensive.
2. Avoid optical connections if you can -- for a small configuration, copper is just fine and often a lot less expensive. Fibre is good for long-distance hauls and >1Gb speeds.
3. Pick up a server board with 64-bit PCI slots, preferably at 66MHz.
4. Buy a couple of QLogic 2200/66s. These are solid cards, and are trivial to flash to Sun's native Leadville FCode if you want to use a SPARC system and the native fibre tools. They also work well on Linux/x86 and Linux/sparc64. These should run about $25 each.
5. Don't buy a RAID enclosure; get a fibre JBOD. You can always reconstruct a software RAID set if your host explodes, as long as you write down the configuration. If you blow a RAID controller, you're screwed. Besides, you won't want to pay for a good hardware RAID solution, and I have yet to see a card-based FC RAID; they're always integrated into the array. I recommend a CLARiiON DAE/R. Make sure you get one with the sleds. These have DB-9 copper connections and, empty, should run about $200. Buy 2 or 4 of these, depending on how many HBAs you have. They'll often come with some old Barracuda 9s. Trash those; they're pretty worthless.
6. Fill the enclosures with Seagate FC disks. If you're not after maximum size, the 9GB and 18GB Cheetahs are cheap, usually like $10 a pop on eBay, and are often offered in large lots. They are so inexpensive it's hard to pass them up. Try to get the ST3xxxFC series, but do NOT buy ST3xxxFCV. The V indicates a larger cache, but also a different block format for some EMC controllers; they are a bitch to get normal disk firmware onto.
7. Run a link from each enclosure to each HBA. Say you have 2 enclosures with 10 disks each: simple enough, and 1Gb/s up and down on each link.
8. Use Linux software RAID to make a bigass stripe across all the disks in one enclosure, repeat on the second enclosure, and make a raid10 out of the two (a rough sketch of scripting this with mdadm follows this list). Tuning the stripe size will depend on the application; 32k is a good starting point.
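Here's a rough Python sketch of one way to script step 8 with mdadm; the device names, disk counts, and chunk size are assumptions to replace with your own hardware, and it needs mdadm and root, so treat it as a starting point rather than a tested recipe:

    import subprocess

    # One RAID 0 stripe per enclosure, mirrored together (the "raid10"
    # layering described above). Device names and disk counts are
    # hypothetical -- substitute whatever your FC disks appear as.
    ENCLOSURE_1 = ["/dev/sd%s" % c for c in "bcdefghijk"]   # 10 disks, enclosure 1
    ENCLOSURE_2 = ["/dev/sd%s" % c for c in "lmnopqrstu"]   # 10 disks, enclosure 2
    CHUNK_KB = 32                                           # starting stripe size

    def mdadm_create(md_dev, level, members, chunk=None):
        cmd = ["mdadm", "--create", md_dev, "--level=%s" % level,
               "--raid-devices=%d" % len(members)]
        if chunk:
            cmd.append("--chunk=%d" % chunk)
        subprocess.run(cmd + members, check=True)

    mdadm_create("/dev/md0", "0", ENCLOSURE_1, CHUNK_KB)      # stripe, enclosure 1
    mdadm_create("/dev/md1", "0", ENCLOSURE_2, CHUNK_KB)      # stripe, enclosure 2
    mdadm_create("/dev/md2", "1", ["/dev/md0", "/dev/md1"])   # mirror of the two stripes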
With that setup, you should pretty much max out the bandwidth of a single 1Gb link on each enclosure, and enjoy both performance and redundancy with the software RAID, without having to worry about any RAID controllers crapping out on you.
You should be able to get two enclosures, 20 disks, a couple of copper interconnects and some older hbas for about $750 to $1,000 depending on ebay and shipping costs.
This should net you some pretty damn reasonable disk performance for random-access type I/O. This is NOT the right approach if you're looking for large amounts of storage. You'll get the raid10 redundancy in my example, but if you want real redundancy (and I mean performance-wise, not just availability -- you can drive a fucking truck over a Symmetrix, then saw it in half, and you probably won't notice a blip in the PERFORMANCE of the array -- something Fidelity and the like tend to appreciate), you have to pay big money for it. The huge arrays are more about never quivering in their performance no matter what fails.
Hope this was of some use.
Re:why "build" your own array? (Score:5, Informative)
The Veritas cluster file system (which is the reason I'd imagine someone would go through all this effort) lets multiple systems access a single volume at the same time - the moral equivalent of NFS, but without the NFS server or the speed problems associated with NFS due to the filesystem abstraction (i.e., it's good for databases).
The only Free competitor that I know of for this is GFS.
ZFS is a very powerful filesystem/volume manager, but it's more akin to LVM plus a very smart filesystem.
Re:why "build" your own array? (Score:1, Informative)
40MB/s seems awfully close to a saturated SCSI bus of that era; e.g., Sun Ultra workstations have 40MB/s SCSI busses in them. Probably the way to boost performance is to have multiple arrays, each using a dedicated controller in the workstation/server (ensure the controllers are on separate PCI busses, too), with the mirror set up across the controllers (gee, three controllers, three arrays, say five disks per array... I'll let you cover the power bill this time).
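As a quick sanity check on those aggregate numbers (the 40MB/s bus and the three-controller, five-disk layout are just the figures from this thread; the per-disk streaming rate is a guess):

    # Aggregate bandwidth if you spread arrays across several controllers,
    # each on its own bus. All figures are assumptions from this thread.
    bus_mb_per_s = 40            # saturated SCSI bus of that era
    controllers = 3
    disks_per_array = 5
    per_disk_stream_mb = 25      # guess at one disk's sustained streaming rate

    per_bus = min(bus_mb_per_s, disks_per_array * per_disk_stream_mb)
    print("per bus: ~%d MB/s (bus-limited)" % per_bus)
    print("aggregate across %d controllers: ~%d MB/s" % (controllers, controllers * per_bus))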
Re:XServe RAID not fast enough? (Score:3, Informative)
Just to point out...the XServe RAID uses Ultra ATA drives, *NOT* SATA drives. I spent the last month or two researching RAID arrays, and that was one of the most disappointing things I saw...
"Xserve RAID features a breakthrough Apple-designed architecture that combines affordable, high-capacity Ultra ATA drive technology with an industry standard 2Gb Fibre Channel interface for reliable..." - from Apple [apple.com].
I run a SAN... (Score:5, Informative)
For a single server, a FC system seems like overkill to me. Buy a direct attached SCSI enclosure and be done with it.
For 10 or more servers sharing disk space, a SAN (FC, IMHO, although iSCSI is acceptable if your servers all share the same security requirements - i.e., are all on the same port of your firewall) is the way to go.
Here's what I see as the benefits of an FC SAN (if you don't need these benefits, you'll be wasting your money if you buy one):
1) High availability
2) Good fault monitoring capability (my vendor and I both get paged if anything goes down, even something as simple as a disk reporting errors)
3) Good reporting capability. I can tell you how many transactions a second I process on which spindles, find sources of contention, know my peak disk activity times, etc.
4) Typically good support from the vendor (when one of the two redundant storage processors simply *rebooted* unexpectedly, rather than saying, "Ah, that's a fluke, we're not going to do anything about it unless it comes back again", my vendor had a new storage processor to me within one hour)
5) Can be connected to a large number of servers
6) Good ones have good security systems (so I can allow servers 1 & 2 to access virtual disk 1, server 3 to access virtual disk 2, with no server seeing other servers' disks)
7) Ease of adding disks. I can easily add 10 times the current capacity to the array with no downtime.
8) LAN-free backups. You can block-copy data between the SAN and tape unit without ever touching the network.
9) Multi-site support. You can run fiber channel a very long way, between buildings, sites, etc.
10) Ability to snapshot and copy data. I can copy data from one storage system to another connected over the same FC fabric with a few mouse clicks. I can instantly take a snapshot of the data (for instance, prior to installing a Windows service pack, or when forensic analysis is required) without the hosts even knowing I did it.
Note that "large amounts of space" and "speed" aren't in the 10 things I thought of above. Really, that's secondary for most of my apps, even large databases, as in real use I'm not running into speed issues (nor would I on direct attached disks, I suspect). It's about a whole lot more than speed and space.
Re:XServe RAID not fast enough? (Score:3, Informative)
I would so love to prove you wrong right now. Turns out you are completely correct: both of our XServe RAID devices use 7200RPM Ultra IDE Hitachi drives. This invalidates at least some of our benchmarking, as it was done single-drive on our XServe systems (which were supposed to have like drives, but actually use Serial ATA), so all of our single-drive benchmarks are invalid (or rather, are meaningless to this discussion =) However, the XServe RAID still performed very well in a 5-disk vs. 5-disk RAID setup.
What really sucks is that we just found out that our nifty Cisco switch, purchased before I got here, has a severe bandwidth restriction that makes it nearly useless for MPI communication on our clusters: 6GB/s between 48-port gig switches. Who thought that was a good idea?
Re:why "build" your own array? (Score:3, Informative)