How Well Does Windows Cluster?
cascadefx asks: "I work for a mid-sized mid-western university. One of our departments has started up a small Beowulf cluster research project that it hopes to grow over time. At the moment, the thing is incredibly weak... but it is running on old hardware and is basically used for dog and pony shows to get more funding and, hopefully, donations of higher-end systems. It runs Linux and works; it is just not anything to write home about. Here's the problem: my understanding is that an MS rep asked what it would take to get them to switch to a Microsoft cluster. Is this possible? Are there MS clusters that do what Beowulf clusters are capable of? I thought MS clusters were for load balancing, not computation... which is the hoped-for goal of this project. Can the Slashdot crowd offer some advice? If there are MS clusters, comparisons of the capabilities would be welcome." One only has to go as far as Microsoft's site to see its current attempt at clustering, but what is the real story? Have any of you had a chance to pit a Linux Beowulf cluster against one from Microsoft? How did they compare?
Fault tolerance (Score:2, Insightful)
Anyhow, imagine how much you'd be paying in software licensing for a large cluster. For a university project, this just doesn't make sense.
money, for one thing (Score:3, Insightful)
So unless they're willing to give you their OS for free, why would you even consider it? Suddenly your supercomputer cluster would cost as much as a real supercomputer... at which point you could have just bought a real supercomputer!
Re:not a good place to ask (Score:2, Insightful)
hardware currently running the Linux cluster.
Compare results.
"It runs Linux and works" - 'nuff said? (Score:5, Insightful)
Make him convince you that the time and cost of the switch is going to gain you something.
Does your current setup not do what you need it to do?
Re:department title said it all... (Score:2, Insightful)
what?? (Score:5, Insightful)
First, the rep needs to prove that $199.00 per node in software fees provides major benefits over the Linux cluster. How many Windows clusters can he list for you to call and ask about? References: ones you can call so you can talk to the guys running/maintaining them. Show where Microsoft provided increased profits or savings over an open alternative.
If they can't give you a dollar amount that shows increased profits or major savings, then be sure to tell the rep that he shouldn't let the door hit him in the ass on the way out. It isn't MS versus Open anymore in today's economy... it's what can get it done and save me money, or give me more profits. And this is what makes Open solutions win: Microsoft can't give savings, and the performance difference isn't enough to give profits that would more than overcome the added expense of Microsoft.
Get real numbers; talk to real people running real clusters on all platforms. If you have real numbers, then you can make solid decisions.
just think a minute... (Score:2, Insightful)
Re:Licensing (Score:2, Insightful)
Let's just leave BSODs out of it. Maybe an issue, but not always. Some people can get BSODs down to near nil and others can't, yet it's always the OS's fault. Hmmmm.
No command line (Score:4, Insightful)
We're primarily using the Beowulf for computations which are "embarrassingly parallel" - in other words, tasks for which it is trivial to partition the input into 16 equal-sized pieces, give one to each node, and then collect the results and paste them together. For example, multiplying incredibly huge matrices and brute-force keyspace searching are embarrassingly parallel.
For us, the primary advantage of running Linux on the Beowulf is that most of the time we don't need to write custom software to speed up a calculation. We just write a shell script that rsh's into each box and runs a program with slightly different command-line parameters on each one.
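A minimal sketch of that fan-out pattern. To keep it self-contained, each chunk runs locally under `sh -c`; on a real cluster that line would be `rsh node$i '...'`. The chunk count, paths, and the echo stand-in for the real program are all invented for illustration:

```shell
#!/bin/sh
# Fan an embarrassingly parallel job out to N workers, then paste the
# results back together. Each worker gets a different slice of the input
# via its command-line parameter.
NCHUNKS=4
WORKDIR=/tmp/bw_demo
rm -rf "$WORKDIR" && mkdir -p "$WORKDIR"

i=1
while [ "$i" -le "$NCHUNKS" ]; do
    # on a real Beowulf: rsh "node$i" "crunch --chunk $i --of $NCHUNKS ..."
    sh -c "echo \"result for chunk $i\" > $WORKDIR/out.$i" &
    i=$((i + 1))
done
wait                                      # block until every slice is done
cat "$WORKDIR"/out.* > "$WORKDIR/all.txt" # paste the pieces together
```

The whole trick is the trailing `&` plus `wait`: every node works concurrently, and the gather step runs only after all of them finish.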
Obviously for some computational problems it's worth using MPI to have the processes communicate with each other, or load-balancing software so that we can run lots of smaller, but different-sized jobs, and these techniques would probably work equally well whether you're running Windows or Linux.
But for experimentation and prototyping, and quickly distributing easy problems, I think there's an incredible advantage to having a command line. (Of course you could install Cygwin on all of the Windows boxen...but why?)
mod up the parent (Score:3, Insightful)
There are plenty of resources on the net that provide specific details about building clusters and how to optimize their performance. Don't forget that applications need to be rewritten to make them friendly to distributed/parallel processing.
Asking the wrong questions.... (Score:3, Insightful)
The question here that isn't being asked is about the application. Sure, you have a cluster. But just what is it doing? What numbers are you crunching with all those gigaflops? Taking the Beowulf idea out of the realm of geek bragging rights into actual useful production takes an application, and you can bet that most are custom-designed in house.
Very little of the OS itself is involved in the real applications that make Beowulfs useful and money-making. Take a look at your intended application, and see what its requirements are. If you are writing it in house, tell the MS rep to take a leap, since you won't have to worry about 100+ MS licenses, Visual Studio licenses, or whatever else. If your intended application requires an MS OS underneath, hold out on the rep until he agrees to a dramatically reduced price on the software. But worrying about the OS in a cluster before looking at the application is counterproductive.
Re:"It runs Linux and works" - 'nuff said? (Score:2, Insightful)
Ok, you have a solution in place. It works. Some sales guy wants you to change your solution that works.
Because of this, Microsoft may never conquer the existing commercial market for clustered computing. They are using an educational backdrop to essentially get free testing for a cluster of their machines, so that if it looks good, they can sell it to new clients who want to get into this sort of thing before those clients go to a Linux-based solution.
Ask for modifiable code and no injurious NDAs (Score:5, Insightful)
Simple... ask for:
Balancing versus Distributing. (Score:4, Insightful)
If the server holds the data and you have the potential of a lot of clients making requests (thus I/O and bandwidth load, like a P2P crunching system, to name a popular example), I don't see why you'd want to switch to Microsoft if you got it to work on Linux. You'll need very good knowledge of Microsoft Server products (or to hire someone who has it) if you want to move to anything more than a standalone server. Also, last time I checked with M$ on that solution, because I wanted a safer domain and maximum uptime, everything was doubled for 2 machines. I thought it would be a bit cheaper than that, but heck: for the price of Advanced Server vs. the standalone version with 25 users, you can get an extra tape drive and a cheap RAID 1 to mirror your critical drives (on a small business server).
So if you mention that you WISH to get donations, and you want raw computing power, then instead of buying MS licenses, concentrate on the goal you're trying to achieve: distributed crunching power with scalable servers. Basically, you'll need HARDWARE to crunch. (I still don't get why you'd NEED a server to run number crunching; workstations can do the same and transmit to a server, as I was saying before.) Check what you have, check what you need, design around that, and do a cost analysis, since it seems to be very critical in your case.
There are some cases where you'll want MS servers. Here at work I've set up an MS server to have fewer configuration and troubleshooting issues with my Win2k Pro machines (at least I know that when something screws up, it's MS-related for sure).
Clustering (dis)information (Score:2, Insightful)
Here are your issues. Most computational science codes are written for Unix-based systems. Cluster codes are either latency sensitive (so they use Myrinet or similar interconnects with protocol stack bypasses), bandwidth sensitive (need hundreds of MB/s to do their work), or dataflow sensitive (need packets arriving in the right order in a just-in-time mode). Most computing centers have hundreds or thousands of users, who need simultaneous access to the resources.
The pragmatic view is that your cluster needs to support many users simultaneously (Windows cannot), manage large data flows correctly over high-bandwidth, low-latency pipes (Windows cannot), and do so in a manner whereby your costs (porting code, end-user costs) do not rise as the number of users increases.
In short, unless MSFT donates lots and lots of new hardware (256 nodes or more of late model AMD/Pentium gear, with 2 GB ram, and 50->100 GB disk per unit, gigE adaptors, etc), you very likely could not effectively run windows in the first place, and your costs to run would skyrocket without some serious software license donation by MSFT. Not to mention the cost of the programmers you would need to port the Unix codes to the windows compilers. Not to mention the additional support headcount you would need to maintain the beast.
In short, MSFT-based computing clusters are simply not viable: on a cost basis, a time basis, a headcount basis, and so forth. If MSFT is willing to help you offset all the costs you would incur, great, go for it. Otherwise, have a good long look at some of the cluster Linux distros, and stay far away from MSFT products.
Disclosure: I work for a cluster vendor. We will sell what the customers want. Customers do not want MSFT clusters. If they did, our business would be brisk. Never had one inquiry.
What's your app? (Score:5, Insightful)
Beowulf clusters get built to support your application, not the other way around. Your choice of hardware and OS will depend on the parallel nature of your code. Do you need myrinet, or can you get away with fast ethernet? Will your code even compile under win32? Do the supporting libraries (PVM/MPI/BLAS whatever) run under win32? What about the queuing system?
How are you going to manage the cluster? You need automation, even for small clusters. How easy is it to add a new user, apply a patch or change a bios setting on your cluster without having to plug a keyboard and monitor into each node? What about central logging? How about automated OS installs when you add another 100 nodes when you get your funding?
Oh. Benchmark, benchmark, benchmark. That means your code, running your datasets, on your hardware and OS. Not vendor-supplied numbers. If you have a serious hardware vendor, you should be able to wrangle demo machines off them. Try before you buy.
Re:Here's the deal: (Score:3, Insightful)
In case you haven't noticed, the 'Ask Slashdot' sections are for answering the original submitter's question, but they also provide a wealth of information to other readers. My post was intended to be informative, but then again YMMV.
Re:Beowulf (Score:3, Insightful)
If I were going to run a cluster that needed to take advantage of computational power, I would go Linux. However, my choice would be based on the fact that up until this point I still have not seen enough documented proof to support the theory that Microsoft vs. Linux clusters is even a battle. From my current knowledge I would have to deduce that they currently have their different uses, even though the linked article above says that Microsoft clusters are capable of computational collaboration. Again, as many have already stated, cost is always a factor when dealing with Microsoft, and you have to take it into consideration.
I really will need to study the articles in the link above more closely. Many thanks for publishing it; this is the first thing I have read to support Microsoft's capability for computational collaboration within a cluster environment.
Remember, my little Penguins: do not be so quick to judge any OS, even Microsoft's. Microsoft may not be cheap, it may be filled with bugs, and it may not always be the most secure. But it does serve its uses in the world, for now.
Re:Licensing (Score:5, Insightful)
> BSODs down to near nil and others can't but it is always the OS's fault. Hmmmm.
I find this interesting because I have seen it too.
In my experience, a well-run Windows system administered by a person with real clue can last a while and be pretty blue-screen free. The same is true for a system run by an idiot who got it all installed right and hardly does anything with the box, just plays some specific game or uses Word or something.
However, when you start installing software and doing different things, they can get real flaky real fast. Not just in reliability either... users shit all over the box!
I saw someone turn on their computer... it came up... and the desktop was just littered with icons... full. They never manage their stuff; they just keep all that crap that every little software package installs. Is it just me, or are companies that make Windows software extremely arrogant? I'd say MAYBE 1% of the software I use is important enough that I want a special icon for it on my desktop... but every piece of Windows software seems to think it's that special.
My little rant... the unmanageability is why I don't use it. I installed Debian GNU/Linux on this box 2.5 years ago, have installed software and uninstalled it over and over... and it never gets unstable.
In a cluster, where software isn't being installed and uninstalled, Windows will probably be just fine. Though frankly, I'd rather have a bunch of Unix boxen with tools like cfengine to manage such things.
-Steve
Re:Licensing (Score:2, Insightful)
One of the biggest problems with Open Source advocacy is a tendency to argue irrelevant points, then claim relevancy for an equally irrelevant reason (usually the "MS is evil" political kinda thing). This post is a perfect example.
Re:first post - no way (Score:2, Insightful)
Re:Custom (Score:3, Insightful)
When was the last time you used VB for engineering, computationally heavy tasks?
Christ, almost all this shit is still written in Fortran... Fortran 77, I believe, and not even 90.
Try to change the value of 5 in VB. Go ahead... I dare ya.
Re:Here's the deal: (Score:5, Insightful)
Except your post is factually incorrect. MSCS is a POS -- to say it works "well" is true if you mean "well... it works.... kinda."
It basically just enables multi-initiator support for SCSI chains (so a chain can be connected to 2+ hosts), allows more memory for large applications (if the application is written correctly to use it) and (this is the main feature) allows services to fail-over from one host to the other.
This is where MSCS should be good, but it just isn't. Basically, imagine you have 2 NT servers. A is running services, and B isn't running any services except the basics. Do a NET STOP on all the services on A, wait for it to completely finish, and then, and only then, do a NET START on those same services on B. Visualize in your mind how long that would take, and then double it. If anything goes wrong, like a service won't stop (imagine that) or a service can't start due to a dependency, it throws a monkey wrench into the whole works.
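To get a feel for why that serial stop-then-start is slow, here is a toy simulation. The service names and the one-second-per-service delay are invented for illustration; `sleep` stands in for the NET STOP / NET START commands:

```shell
#!/bin/sh
# Simulated failover: every service on node A stops in order, and only
# then does each one start on node B, also in order. Nothing overlaps,
# so the per-service delays simply add up.
SERVICES="web db queue"
start=$(date +%s)
for s in $SERVICES; do sleep 1; done   # NET STOP each service on A
for s in $SERVICES; do sleep 1; done   # then NET START each one on B
elapsed=$(( $(date +%s) - start ))
echo "failover of 3 services took ~${elapsed}s"
```

With three services at one second each, the minimum is six seconds; a real cluster with slow-stopping services multiplies that considerably.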
Also, the cluster's disks can only be used by one node at a time, and while it would have been trivial for Microsoft to expose each disk to both hosts at all times (by automatically mounting the disk on the "other" node over the network), they just didn't bother.
It's also got a lot of setup caveats. Read the entire manual very carefully and take notes before you even purchase hardware. Then go online and read all the addenda and known issues. A good understanding of NT is not enough -- MSCS is a different build (compile) of NT than the Workstation/Server version. She is a woman who has serious issues, some of which can't be fixed by you.
And then there's the blue screens. And the 7 hour installation procedure. And the way you are strongly cautioned from deleting or changing some MSCS settings after being set, with loving MS-style advice that a reinstall is your best bet.
However, for just plain applications, it's OK. Anything you can run properly from the command line can be put in the cluster and will fail over. So if you're one of the majority of Acrobat Distiller users who install it in a manner that violates the EULA, i.e. on NT polling the "In" folder of a network share, MSCS can fail over Distiller VERY FAST (it's not a service, so no delays). However, with a little brains and a little ActiveState Perl (or Cygwin, I suppose) you could hack together a work-a-like using DFS + rsync and save a lot of money.
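A hedged sketch of that work-a-like's core loop. The directory names and the `tr` command (standing in for the real converter) are assumptions; on the real setup the In/ folder would live on a share kept in sync with rsync, and whichever node currently holds it would run the loop:

```shell
#!/bin/sh
# Poll an In/ folder, process each file once, and drop the result in Out/.
# A real daemon would wrap process_once in `while :; do ...; sleep 5; done`.
IN=/tmp/distill_demo/In
OUT=/tmp/distill_demo/Out
rm -rf /tmp/distill_demo && mkdir -p "$IN" "$OUT"

process_once() {
    for f in "$IN"/*; do
        [ -f "$f" ] || continue
        # stand-in for the real converter (Distiller or whatever)
        tr 'a-z' 'A-Z' < "$f" > "$OUT/$(basename "$f")"
        rm "$f"                        # consume the job so it runs only once
    done
}

echo "hello" > "$IN/job1.txt"          # a client drops a job in the folder
process_once                            # one polling pass picks it up
```

Failover then reduces to starting the same loop on the surviving node, since the folder's contents travel with the synced share.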
Kudos to your post for not trying to engender a flame war. But you kinda imply that MSCS is worth the exorbitant price tag, and it just isn't for what little it does and the problems and extra headache it brings with it. I'm not flaming you, just spreading the word:
DON'T BUY MSCS -- IT SUCKS. IF THEY GIVE IT TO YOU FOR FREE, SEND IT BACK OR GIVE IT TO SOMEONE YOU DISLIKE.
Back on topic, what MS may try and sell you is something based on the Microsoft Message Queue and the Microsoft Transaction Server. Those are more BackOffice-variety PHB-entitled products that really don't do much except provide an API for sending guaranteed IPC and doing transactions, even for VB monkeys who don't really understand what that means but think it sounds just plain awesome. Free with the option pack.
This is part of that Microsoft program to divert "wins" from Linux to Microsoft at all costs, especially from IBM. So the sales rep probably doesn't have a clue what your cluster really does, what you want it for, or what MS products it would actually take to build a knockoff. They may have an anti-Beowulf team cooking something up right now, and guess what, pal?! They're hoping your administration will take the bait of free hardware and licenses, and you'll end up beta-testing a 0.1a version of some bizarro-Beowulf for MS. What a deal!!!
Good luck. I'd stick to your guns and insist on using something already proven to work for your goals, like Beowulf or AppleSeed.
Clusters of Macintosh machines (Score:1, Insightful)
http://www.wired.com/news/mac/0,2125,50078,00.h
Re:You're running on old hardware right? (Score:3, Insightful)
IMHO that is a good tradeoff. Running X on a PC with a decent amount of memory and processing power (basically 64MB+, 200MHz+) is not going to put any significant load on the machine. Similarly, the average Windows machine can easily handle both the GUI and server processes. If you are experiencing performance problems with your server processes because of GUI overhead, any responsible sysadmin would upgrade the hardware, because getting that close to the performance limit of your hardware is bound to cause you trouble anyway (a minor increase in server load would be enough).
Don't get me wrong, I love Linux and have used it on old hardware and found it served my needs perfectly. However, you really need to know your stuff to get it up and running. When it comes to configuring things, Windows is easy when it can be and just as hard as Unix when it needs to be. Basically, for simple server stuff you can get IIS up and running relatively easily. The default setup for Apache, on the other hand, is pretty usable out of the box, but as soon as you need to tweak it even slightly you are on your own. For professionals it doesn't matter; they have the time and the need to get familiar with whatever they configure. Basically this type of sysadmin is knowledgeable and expensive. You are unlikely to find one in small organizations. Instead you will find loads of inexperienced script kiddies who terrorize their users with major fuckups. If I sound frustrated, it's because our local sysadmin (Linux) just screwed up our mailserver (a SuSE box and some ancient Solaris machine) and I'm expecting some important mails. It's not the first time, and I'm afraid there's more downtime ahead.
For the casual admin who just needs to get an unfamiliar service up and running with no fuss the windows way of doing things is simply easier. The overhead of a GUI is irrelevant in any business case you can come up with (business cases also include licensing, sysadmin salaries, hw cost, training cost, etc.).
Re:Point? (Score:2, Insightful)
You really don't know pain until you have had to troubleshoot a block of 40+ junior developers who had access to a 140-million-hit-a-month webfarm running IIS 5.0. It is just not as easy as point and click, and trust me, I really wish it was.
Re:Licensing (Score:5, Insightful)
Every man-hour spent reading, auditing, and managing licenses is a man-hour that is not applied to real work (he says, posting to /. from his desk at work ;-). Every hour the compute nodes sit idle while licensing is sorted out is a 4.17% performance hit for that day.
All those licenses cost money, which means fewer CPUs. If a compute node costs $400 and licensing is $100/node, the license adds 25% to each node's cost, so the same budget buys 20% fewer nodes. This is indistinguishable from a free OS that has a 20% performance flaw.
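The arithmetic, using the $400-per-node and $100-per-license figures above (the total budget is an invented example):

```shell
#!/bin/sh
# A $100/node license on a $400 node raises per-node cost by 25%,
# so a fixed hardware budget buys 20% fewer nodes.
BUDGET=100000            # invented example budget
NODE=400                 # cost per compute node
LICENSE=100              # per-node license fee
free_os=$((BUDGET / NODE))                # nodes with a free OS
licensed=$((BUDGET / (NODE + LICENSE)))   # nodes after licensing
echo "free OS: $free_os nodes, licensed: $licensed nodes"
# 100000/400 = 250 nodes vs 100000/500 = 200 nodes: a 20% smaller cluster
```

For an embarrassingly parallel workload, node count translates almost directly into throughput, which is why the license fee behaves like a performance penalty.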
Then there's risk. The software mafia aren't going to audit a Linux cluster, sapping administrative time, and perhaps cease-and-desisting it offline. Linux cluster admins are never going to go to jail because they threw another machine online for the hell of it. Linus Torvalds will never sue a Linux cluster operator into oblivion to make an example of them. These are all possibilities with a proprietary product, and all-too-likely with a notorious lawyer-pit like Microsoft.
Why is a hammer called `an American screwdriver?' (Score:3, Insightful)
Actually, it does loosen the screw. The rellies on the farm use a hammer quite effectively as a screwdriver (both ways) and spanner where appropriate. It just doesn't do the screw in question much good...
I've used a hammer myself to gut a dead hard drive for the magnets, when I didn't have a small enough star driver. I just flattened the top of the bolt out to tinfoil thickness and pulled it straight through the metal cover. The technique with screws is different: some light taps can loosen them in their substrate (typically wood or sheet metal) enough to winkle them out by hand. Using some CRC/WD-40 often helps as well.
GUI overhead - The Functionality Bloat (Score:3, Insightful)
Remember that one of Microsoft's contentions in the anti-trust trial is that they cannot unbundle Internet Explorer from Windows; that the system is so interdependent that no elements can be left out and still function.
So they cannot compete on price, since all other things being equal a Windows machine must have a video graphics card.
They cannot compete on performance, since all other things being equal a Windows machine must spend resources on storing and running the GUI.
Yesterday, I was showing a very happy WindowsXP owner (who also happens to be a somewhat savvy computer consultant with Unix and Linux experience) the beauty of Debian's apt and dselect packages. He was so happy with the granularity of not installing anything that he doesn't want, that I gave him my Debian 2.2r4 CD. (I'm running Woody anyway)
Bob-