Open Source Highly Available Storage Solutions?
Gunfighter asks: "I run a small data center for one of my customers, but they're constantly filling up different hard drives on different servers and then shuffling the data back and forth. At their current level of business, they can't afford to invest in a Storage Area Network of any sort, so they want to spread the load of their data storage needs across their existing servers, like Google does. The only software packages I've found that do this seamlessly are Lustre and NFS. The problem with Lustre is that it has a single metadata server unless you configure fail-over, and NFS isn't redundant at all and can be a nightmare to manage. The only thing I've found that even comes close is Starfish. While it looks promising, I'm wondering if anyone else has found a reliable solution that is as easy to set up and manage? Eventually, they would like to be able to scale from their current storage usage levels (~2TB) to several hundred terabytes once the operation goes into full production."
Entry level SAN? (Score:5, Insightful)
A wrongly implemented HA system will have less uptime than a $499 Dell with a single ATA drive.
Entry-level SANs using iSCSI are available at quite affordable prices. Look at HP's and IBM's offerings (e.g., the IBM DS300). Even the entry models allow you to use MPIO.
Re: (Score:3, Interesting)
Re: (Score:1)
Re: (Score:2)
Re: (Score:3, Insightful)
They need to accept that this ain't going to happen, and what they need to do is put in a solution for now and plan for a different solution when they go into production and presumably have the money.
However, one has to wonder: if their current storage requirements are a measly 2TB, why the heck do they need more than one server, unless the second is for failover?
Re: (Score:1, Offtopic)
Was I the only one to catch this disturbing Freudian slip? Is this guy dangerous around computers or what?
Re:Entry level SAN? (Score:4, Informative)
Full Disclosure: I'm one of the authors of the Starfish file system.
Simply not true anymore, lukas84. High-availability solutions don't have to cost "big money". Starfish is the perfect example of such a system. In fact, it is THE reason we wrote Starfish: to provide an inexpensive, fault-tolerant, highly available clustered storage platform that works from the smallest website to the largest storage network. We've based the technology on the assumption that having expensive hardware/software is the wrong way to go about solving the problem.
Full HA environments do not need to be incredibly complex. If your HA solution is incredibly complex, you've done something wrong. Take a look at how easy it is to set up a Starfish file system:
Starfish QuickStart Tutorial [digitalbazaar.com]
That solution doesn't cost "big money", nor is it "incredibly complex".
Re: (Score:3, Interesting)
I worked at a place where a $400 million project that spent tons of money on high availability database and server components was crippled by bad switches and application servers.
Re: (Score:2, Funny)
network, app servers, etc aren't highly available, you have a whole new range of equipment and services that needs an HA solution as well
I couldn't agree with you more. I focused on the storage aspect because the article, thread, and Starfish is about HA storage.
I worked at a place where a $400 million project that spent tons of money on high availability database and server components was crippled by bad switches and application servers.
I'm sorry to hear that. What an embarrassingly colossal waste of money. I'm assuming that was US taxpayer dollars at work?
ZFS on Solaris 10, OpenSolaris, FreeBSD... (Score:2)
Re: (Score:2)
That was extremely funny... even if I don't know if I took it the way you meant it.
On normal hardware you can (Score:4, Informative)
On the client nodes you can use OCFS2 or Red Hat's GFS for accessing those iSCSI devices.
You should also use proper fencing/locking methods (read the OCFS2 or GFS manuals for details).
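For illustration, here is roughly what attaching a shared iSCSI device looks like from a client node with open-iscsi, wrapped in a small Python script. The portal address and target IQN are made-up placeholders, so treat this as a sketch rather than a recipe.

    # Sketch: discover and log in to an iSCSI target using open-iscsi.
    # Once logged in, the kernel exposes the LUN as a /dev/sdX block
    # device, which OCFS2 or GFS can then mount.
    import subprocess

    PORTAL = "192.168.10.5"                      # placeholder storage node IP
    TARGET = "iqn.2007-01.com.example:storage1"  # placeholder target IQN

    # Ask the portal which targets it exports.
    subprocess.run(["iscsiadm", "-m", "discovery",
                    "-t", "sendtargets", "-p", PORTAL], check=True)

    # Log in to the target so the block device appears.
    subprocess.run(["iscsiadm", "-m", "node", "-T", TARGET,
                    "-p", PORTAL, "--login"], check=True)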
Re: (Score:2)
Our 'organization' has a similar situation - they will have about 200GB of data coming in per day, yet only have 2.5TB of data storage (RAID 5). It's hilarious when the engineers tell management that the solution isn't workable - they don't listen, and they 'dismiss' anyone that 'can't provide productive feedback'.
At least there's an answer out there; I'll read up on this and see if there isn't a way around it.
Re: (Score:2)
One of their other servers started flaking out after it was dropped two feet to the ground by someone trying to access the rear of the cabinet to plug an extension cord into the outlet they saw. (don't ask me). The solution was to temporarily use the document storage serve
Re: (Score:2, Insightful)
Basically you set up two similar systems (well, they don't have to be, but it helps); they get a direct connection between the two, as well as the normal network connection
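To make that failover idea concrete, here is a toy heartbeat monitor in Python: it watches the peer over the direct link and invokes a takeover script when the peer stops answering. The addresses and the takeover script are hypothetical, and a real deployment would use something like Heartbeat or DRBD, which also handle fencing and split-brain.

    # Toy heartbeat: poll the peer over the crossover link; if it misses
    # several checks in a row, assume it is dead and take over its role.
    import socket
    import subprocess
    import time

    PEER = ("10.0.0.2", 694)   # placeholder: peer's direct-link address
    MISSES_ALLOWED = 3

    misses = 0
    while True:
        try:
            with socket.create_connection(PEER, timeout=2):
                misses = 0     # peer answered, all is well
        except OSError:
            misses += 1
            if misses >= MISSES_ALLOWED:
                # Hypothetical script: grab the service IP, mount storage.
                subprocess.run(["/usr/local/sbin/takeover.sh"], check=False)
                break
        time.sleep(5)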
Re: (Score:2)
Then configure failover ... (Score:3, Informative)
OpenFiler and Apple's Xserve RAID (Score:3, Informative)
Apple's Xserve RAID and OpenFiler (openfiler.org): the Xserve RAID is basically an LSI Logic Engenio RAID at a very cheap price, and you can't beat OpenFiler for free. The Xserve RAID at 10.5TB costs about $1.31 a GB.
I know several people who back up their NetApps to this setup, or who just use it for storage where they don't require what NetApp offers and don't want to spend $25k+.
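As a quick sanity check on that price-per-gigabyte figure, the arithmetic works out as follows (decimal gigabytes assumed, since that is how array vendors count):

    # Back-of-envelope check of the $/GB figure quoted above.
    tb = 10.5
    price_per_gb = 1.31
    gb = tb * 1000   # decimal TB -> GB, as storage vendors count
    print(f"{gb:.0f} GB x ${price_per_gb}/GB = ${gb * price_per_gb:,.0f}")
    # prints: 10500 GB x $1.31/GB = $13,755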
Rethink your drink (Score:5, Insightful)
'Optimal file system' can mean many things to many people, and everyone here is going to have a different viewpoint. You need to decide what features of a file system make it optimal for you. Then you can start looking at solutions.
GlusterFS (Score:5, Informative)
Its design is simple and smart. Every feature is a translator that interconnects with other translators, so you can organize your filesystem the way *you* want it.
Let me give you an example: they have two translators, 'unify' for unifying hard drives into one volume and 'afr' for automatic file replication. Depending on the order you stack them, you get two completely different setups: you can have two clusters replicating each other, or a single cluster of replicating server pairs.
Besides its features and design, its development team is *very* friendly. Yesterday a user asked for a feature on the devel list and got an answer saying: good idea, I'll do it.
Very good software.
Take a look: http://www.gluster.org/glusterfs.php [gluster.org]
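To see why the stacking order matters, here is a toy Python model of the two arrangements. These classes are stand-ins for illustration only; real GlusterFS translators are configured in volume spec files, not Python.

    # Toy model of translator stacking: 'Replicate' mimics afr (write to
    # every child), 'Unify' mimics unify (write to exactly one child).
    class Brick:
        def __init__(self, name):
            self.name = name
            self.files = {}
        def write(self, path, data):
            self.files[path] = data

    class Replicate:
        def __init__(self, *subvolumes):
            self.subvolumes = subvolumes
        def write(self, path, data):
            for sub in self.subvolumes:      # every write hits all children
                sub.write(path, data)

    class Unify:
        def __init__(self, *subvolumes):
            self.subvolumes = subvolumes
        def write(self, path, data):
            # crude scheduler: hash the path onto one child
            self.subvolumes[hash(path) % len(self.subvolumes)].write(path, data)

    bricks = [Brick(f"server{i}") for i in range(4)]

    # Arrangement A: replicated pairs, unified into one namespace.
    fs_a = Unify(Replicate(bricks[0], bricks[1]),
                 Replicate(bricks[2], bricks[3]))

    # Arrangement B: two unified clusters mirroring each other.
    fs_b = Replicate(Unify(bricks[0], bricks[1]),
                     Unify(bricks[2], bricks[3]))

    fs_a.write("/a.txt", b"x")   # one pair stores it, both copies
    fs_b.write("/a.txt", b"x")   # one brick in each cluster stores it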
Just buy bigger servers (Score:4, Informative)
But with 2TB currently and scaling to perhaps a few hundred TB in the future, the obvious simple solution is to just buy bigger servers. With modern gear you can connect a frightening amount of storage to a single server at modest cost: say a rackmount box with space for 12 drives, then SAS card(s) with external connector(s), so you can chain together multiple enclosures. Taking Dell as an example (just what I quickly found with Google (http://www.dell.com/downloads/global/power/ps3q0
When needs grow beyond one server, clever use of automount maps lets you manage the namespace for multiple servers more easily than doing it all by hand.
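As a sketch of that approach, the snippet below generates an autofs indirect map from a table of exports, so every client sees one namespace (/data/projects, /data/scratch, ...) regardless of which server actually holds the files. Hostnames and paths are invented; see autofs(5) for the map format.

    # Generate an autofs indirect map from a table of NFS exports.
    exports = {
        "projects": "server1:/export/projects",
        "scratch":  "server2:/export/scratch",
        "archive":  "server3:/export/archive",
    }

    with open("auto.data", "w") as f:
        for key, target in sorted(exports.items()):
            # autofs map format: key  -options  host:/path
            f.write(f"{key}\t-rw,hard,intr\t{target}\n")

    # /etc/auto.master would then contain:  /data  /etc/auto.data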
As for Lustre, it's really a specialized solution for HPC, made for multiple compute nodes striping to the storage nodes at full speed using a collective IO API like MPI-IO.
Re: (Score:2, Informative)
Full Disclosure: I'm one of the authors of the Starfish file system.
As others have mentioned, HA solutions are complicated and expensive. Unless you really need it, you probably don't want to go down that route.
High-availability solutions don't have to be complicated and expensive. Starfish is the perfect example of such a simple and low-cost system. In fact, it is THE reason we wrote Starfish: to provide an inexpensive, fault-tolerant, highly available clustered storage platform that works from the smallest website to the largest storage network. We've based the technology on the assumption that having expensive hardware/software is the wrong way to go about solving the problem.
Re: (Score:2)
High-availability solutions don't have to be complicated and expensive. Starfish is the perfect example of such a simple and low-cost system. In fact, it is THE reason we wrote Starfish: to provide an inexpensive, fault-tolerant, highly available clustered storage platform that works from the smallest website to the largest storage network. We've based the technology on the assumption that having expensive hardware/software is the wrong way to go about solving the problem.
Oh, I absolutely agree. But I wasn
Re: (Score:2, Informative)
With all of these redundant components, even a single server can be quite reliable.
Hmmm... you seem to be concerned with a completely different class of problem than the one Starfish addresses. HA systems assume that your single server will fail eventually (which it will). There are many single points of failure in the scenario you describe (RAM, motherboard, a glitch in the redundant power supply). What happens when you need to take the machine down for maintenance? What happens when the power strip or the UPS you have the
Try out MogileFS (Score:4, Informative)
We've been using MogileFS [danga.com] on commodity Linux servers for a few months now and it's been working great. The MogileFS community/mailing list is very active, so it's actually been fun to implement.
Right now we have 22.8 TB spread across six 2U servers using a mix of 400 and 500 GB SATA drives. The great thing is that we can lose an entire file server (or two) with no downtime or loss of data.
Another reason to like MogileFS is that it removes the need to maintain RAID arrays. A RAID-5 array made of 750 GB disks is very risky. A high-end controller will still take many hours to rebuild a degraded array, during which time you could lose another disk and be largely screwed. (This actually happened to us very early on and we lost 0.02% of our data after restoring from backup, which still hurt.)
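To put a rough number on that rebuild risk: if you assume the commonly quoted consumer-disk spec of one unrecoverable read error (URE) per 10^14 bits read (an assumption, and real-world rates vary), the odds of a clean RAID-5 rebuild get ugly fast.

    # Back-of-envelope odds of hitting a URE while rebuilding a degraded
    # RAID-5, which must read every surviving disk end to end.
    disks = 8
    disk_bytes = 750e9              # 750 GB drives
    ure_per_bit = 1 / 1e14          # assumed consumer-disk URE spec
    bits_read = (disks - 1) * disk_bytes * 8
    p_clean = (1 - ure_per_bit) ** bits_read
    print(f"P(URE during rebuild) = {1 - p_clean:.1%}")   # ~34%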
Re: (Score:1)
Full Disclosure: I'm one of the authors of the Starfish file system.
We've played around with MogileFS. It does a very good job at archiving files. It is write-once, which is good for certain very specific applications. Unfortunately, it did not solve our problem. We needed a POSIX-compliant file system that looked like just another disk to Linux, but was inexpensive, simple to set up, fault-tolerant, and performed automatic data backup.
Starfish and Lustre are really for people that just want the file sy
Take another look at NFS (Score:4, Informative)
I would not suggest cluster file systems such as Lustre for a small installation; they're generally designed to scale up to hundreds or thousands of servers, but not to scale down to a handful.
Re: (Score:1, Interesting)
I would not suggest cluster file systems such as Lustre for a small installation; they're generally designed to scale up to hundreds or thousands of servers, but not to scale down to a handful.
Our first Lustre cluster was 3 servers - it worked just fine. Starfish effortlessly scales down to 2 servers. Here is an example of it doing so:
Starfish Quickstart Tutorial [digitalbazaar.com]
Just because something scales to thousands of active nodes and disks, doesn't mean it can't scale down gracefully. The Internet is a good example of this concept.
then buy 2 fileservers (Score:2)
Alternatively, buy more drives and put them in one server with a good RAID card. Even cheaper.
If they want true multiple-server redundancy, then you just need two of everything, and rsync them every so often, or make backups of the first onto the second.
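A minimal sketch of that "two of everything" approach, assuming a primary and a standby reachable over SSH (the hostnames and paths are made up):

    # Mirror the primary's export onto the standby with rsync. Run it
    # from cron every so often; --delete keeps the mirror exact, so this
    # is redundancy, not a substitute for real backups.
    import subprocess

    SRC = "/export/data/"            # trailing slash: copy contents
    DEST = "standby:/export/data/"   # placeholder standby host

    subprocess.run(["rsync", "-a", "--delete", SRC, DEST], check=True)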
OpenSolaris? (Score:2)
I don't think your question has enough detail (Score:5, Informative)
"High Availability" can mean a lot of things. The most important part of it, though, is "how highly available do you need?". Do you want to survive the loss of a server? Of a room? An office? A city?
Basically, you've got two options.
1. Homebuilt, possibly based around either Solaris (ZFS looks interesting) or a specialised Linux distribution. OpenFiler [openfiler.com] looks interesting but doesn't appear to get a lot of attention, so community support may be lacking. Unless you've already got the hardware, however, you'll need at least two reasonably large servers.
Depending on how crucial all this is to your employer (I'm assuming it's fairly crucial or you wouldn't be looking at HA systems in the first place), the level of support you have available to fall back on with this may or may not be acceptable.
In any case, if you're going to have to spend the amount of money involved in buying two large servers and paying for support on a linux distro anyway, you may as well look at option 2.
2. An entry-level SAN.
Yes, I know you said you can't afford it. But I don't think the problem you're discussing can be easily tackled for zero cost, and if there's cost involved you'd be remiss in your duties not to cover every possible base.
I was faced with the same problem myself a few months ago. Eventually I concluded that there simply wasn't the business justification for highly-available storage - we could make do with servers with redundant power supplies and disks, and regular backups. However, I was surprised to find that an entry-level SAN from Dell (actually rebranded EMC units) isn't that much dearer than "buy two dirty great servers and run OpenFiler", and has the benefit that if you do need support, you don't run the risk of hardware and software support folks pointing the finger at each other, saying "it's not our problem, it's theirs".
Plus any half-decent SAN vendor will provide a clear upgrade path - if you roll your own, you'll have to figure out how you upgrade on your own when the time comes.
Finally, think of it like this.
Any business which relies on its backend systems to be solid and reliable should take any reasonable suggestion to maintain that reliability seriously. And by definition, this implies that storage must be reliable.
If it's that important to the business that your systems continue to operate in the face of extreme adversity, and you decided to save £1000 by taking the homebrew route, you're going to have a lot of justifying to do if the worst happens and your supposedly-HA system falls over. Particularly if your answer to "what are you doing about it?" is "I've posted a message to a forum and I'm awaiting a reply". Realistically the only way it can work is if you're competent enough to be able to fix even the worst outage yourself with little or no recourse to asking on forums (though reading documentation is OK). Even then, you should keep the system simple enough that it doesn't take several months of familiarising yourself with it before anyone else has a chance of fixing it, otherwise all you've done is moved the point of failure from the hardware to yourself.
The alternative answer "I've placed an emergency support call with our suppliers and they should be ringing me back within the hour" carries a heck of a lot more weight.
Free is not necessarily as in free beer (Score:2)
Re: (Score:1, Informative)
You may just be out of date - Lustre development hasn't used the Ghostscript-like "old versions are open source" model for ages now.
But if you want HA Lustre, you still need HA-grade and doubl
Re: (Score:1)
If you want commercial software, HP will sell you quality-assured Lustre and decent hardware in HA configurations, relabelled "HP SFS".
Also, in the spirit of full disclosure, I am a support engineer for PolyServe, now owned by HP.
If you're looking for an HA storage solution, have you looked at Cluster Gateway [hp.com]? It's essentially a PolyServe file system [polyserve.com] with the NFS or CIFS solution pack, depending on which platform you're implementing. The software-only costs are relatively low (I've been bitching for a while that they're giving it away), and you can use commodity servers and storage.
A scalable, clustered file system, that if properly impl
Re: (Score:2, Interesting)
Full Disclosure: I am one of the authors of the Starfish file system.
Software like Lustre and Starfish only wants you to help test the software.
Both are not OSS in my opinion and not ready for production.
Lustre is open source and has been production-ready for years. The open source notice is on their website - GPL. You don't get much more open source than GPL. Lustre's developers provide support to commercial enterprises.
As for Starfish, we eat our own dog food at our company. The newest version of Starfish will be taking over full-time for all of our HA storage systems in one month's time. The website that runs on top of it is Bitmunk [bitmunk.com], our
Re: (Score:2)
Starfish has a limit of 1TB for the 'free' solution and is not GPL. Lustre is GPL (the limited free edition only), but cannot re-export over NFS, only Samba (not everyone's choice). What about backup? Neither solution provides any means of
Re: (Score:1)
What about backup? Neither solution provides any means of backing up the presumably huge amount of data. As you get into the 50TB+ regime, how would you ever be able to make a backup? Here is where an HSM kicks in: backups are not necessary anymore.
Starfish was designed to automatically back data up - HSM was designed in from the beginning. You never have to backup a Starfish storage network. Take another look at Starfish - it does exactly what you're asking for:
Starfish Introduction (mentions file mirroring) [digitalbazaar.com]
Re: (Score:2)
Re: (Score:1)
You need to be able to recover from something being deleted (intentionally or otherwise), and often the ability to roll back in time to older iterations of a document is useful.
While not asked for by the original post, backing up this data is (hopefully) on his to-do list and is probably an entire post all by
md? (Score:2)
AoE (Score:2)
Project scope and downtime costs (Score:1)
So, how much does an hour of downtime cost you? How big is your IT staff a
Coraid & AoE (Score:1)