Building Your First Cluster?
An anonymous reader asks: "I'm interested in building a DIY cluster using Linux and will be using conventional Linux software. However, the number of possible ways to do this is huge. Aside from Beowulf, there's Mosix, OpenMosix, Kerrighed, Score, OpenSSI and countless others. Therein lies the problem. There are so many ways of clustering, development seems to be in fits and starts, most won't work on recent Linux kernels and there's no obvious way to mix-and-match. What have other people used? How good are the solutions out there?"
Don't sweat it ... (Score:5, Funny)
Try them all. After all, you just KNOW your first one's going to be a clusterf*ck.
Seriously, if you're going to take that route, you really should be prepared to invest the time in test-driving several different solutions.
Re:Don't sweat it ... (Score:2)
Well, uh... (Score:2)
Re:Well, uh... (Score:2)
I tried to set up a 5-node (7-processor) distributed compile farm for a while, which let me build Gentoo packages with blazing speed. Unfortunately, I couldn't get cross compiling to work, nor could I get Xcode integration working, so in the end, I had a 400MHz G3 and an 800MHz G4 doing distributed builds with distcc [samba.org] and a 1GHz and a dual
Re:Well, uh... (Score:5, Insightful)
Also, I'd go with a render farm (if that's the task) if I had to choose between clustering and SMP, because if one node dies (depending on the managing application) the job just continues, whereas if it's on one single monster machine with no fault tolerance and the job dies, you often have to start rendering again from the beginning. Not fun.
So let's back up and ask:
1. What problem are you trying to solve?
2. If it's a learning experience, try them all and take notes on which suit you best for tasks a, b, and c.
3. What are your priorities?
Rocks (Score:5, Informative)
Re:Rocks (Score:5, Informative)
I'm not familiar with other solutions, but Rocks is remarkably easy to install.
Re:Rocks (Score:3, Funny)
Thanks... I'll be here all week!
Find the problem before trying to solve it (Score:5, Informative)
Clusters need special software to take advantage of the disturbed computing. They are built with a specific task in mind. Or do you already have a need and just failed to tell us?
For me, I run my network with distcc (http://distcc.samba.org/) so all of my Gentoo boxes can compile using shared computing power. It cut a typical 33-minute app build down to less than 2 minutes, and it works wonders for my slower laptop.
With distcc, all you need is the same toolchain (glibc, gcc, coreutils, etc.) on every box. You can even specify how many threads per box you want running to fine-tune your network.
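To make the per-box limit concrete, here is a rough sketch that builds a DISTCC_HOSTS value using distcc's host/limit syntax. The host names and job counts are made up for illustration, and the helper functions are hypothetical, not part of distcc.

```python
# Hypothetical helpers: build a DISTCC_HOSTS value with a per-host job limit.
# Host names and limits are invented for illustration only.

def distcc_hosts(limits):
    """Return a DISTCC_HOSTS string such as 'localhost/2 fastbox/4'."""
    parts = []
    for host, jobs in limits.items():
        if jobs < 1:
            raise ValueError(f"job limit for {host} must be at least 1")
        # distcc accepts HOST/LIMIT to cap simultaneous jobs on that host
        parts.append(f"{host}/{jobs}")
    return " ".join(parts)


def total_jobs(limits):
    """Sum of all per-host limits, a reasonable starting point for make -j."""
    return sum(limits.values())


boxes = {"localhost": 2, "fastbox": 4, "laptop": 1}
print(f'export DISTCC_HOSTS="{distcc_hosts(boxes)}"')
print(f"make -j{total_jobs(boxes)}")
```

Giving the slow laptop a limit of 1 keeps it from being swamped while the faster boxes take most of the jobs.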
On the other hand, if you just want to learn, then you should try them all. They all suit different needs.
Re:Find the problem before trying to solve it (Score:3, Funny)
Well, indeed, clusterf*cks might turn your distributed computing project into a disturbed one.
Re:Find the problem before trying to solve it (Score:2)
And don't forget, ccache [samba.org] can work with distcc [samba.org], for an even bigger speedup...
Re:Find the problem before trying to solve it (Score:3, Insightful)
And specifically, is this a processing cluster or a failsafe cluster? I kind of assume a processing cluster, since that's what most people on slashdot refer to as a cluster, but in my experience most of the clusters out there are failsafe clusters ("5 9's" of service versus raw horsepower). Two rather different applications of clu
Re:Find the problem before trying to solve it (Score:1)
You will need to answer "why?", before anyone can really help you with "what?" or "how?".
Re:Find the problem before trying to solve it (Score:2, Informative)
Author did fail to say what the purpose was, but here are some good starts.
Apache cluster [howtoforge.com]
MySQL cluster [howtoforge.com] (should also refer to mysql.com resources)
Ultra monkey [ultramonkey.org], heartbeat and the like [linux-ha.org] can make a cluster as well.
Re:Find the problem before trying to solve it (Score:2)
If learning is his goal, he could use QEMU, Xen, or VMware and create a virtual cluster on one really fast Linux box with a good amount of RAM. He could also try out different clusters, all without hooking up five or six old boxes.
Just to add to the confusion, I would like to mention that he could use Plan 9 for his cluster, since it is distributed by nature.
I would love to set up an OpenMosix Cluster
Re:Find the problem before trying to solve it (Score:1)
Nicely put; sometimes I feel like my computing is disturbed, too.
Re:Find the problem before trying to solve it (Score:2)
1. Cluster unpatched Windows 2000
2. Install spyware
3. Install SQL Server (unpatched)
There's your disturbed cluster.
Or, for another form of disturbed:
1. Move to Sv^Hweden
2. Start legal torrent site based in Sweden
3. Wait for our government (Bush Administration) to coerce the Swedish government into breaking the law by illegally seizing your servers. That'll be a disturbed cluster!
Xboxen (Score:1, Funny)
LinuxClusters.com book is a good reference (Score:2, Informative)
http://linuxclusters.com/compute_clusters.html [linuxclusters.com]
At least get to know various approaches at a high level before proceeding...
Whatever happened to... (Score:4, Interesting)
Two computers make a 1-dimensional "cube." Four, in a round-robin, make a square. Eight, properly connected, make a regular cube, and so on. Does anybody out there know if they still connect clusters this way and if not, why?
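For anyone who wants to play with the idea, here is a small sketch of the usual way to wire a hypercube: number the nodes in binary and link any two nodes whose IDs differ in exactly one bit. The function names are my own, and this only models the wiring, not any particular product.

```python
# Sketch of hypercube wiring: nodes are numbered 0 .. 2**d - 1 and two nodes
# are linked when their IDs differ in exactly one bit.

def hypercube_size(dimensions):
    """Number of nodes in a d-dimensional hypercube."""
    return 1 << dimensions


def hypercube_neighbors(node, dimensions):
    """IDs of the nodes directly wired to `node` (one per dimension)."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hops(a, b):
    """Minimum number of links between two nodes: count of differing bits."""
    return bin(a ^ b).count("1")


for d in (1, 2, 3):
    print(f"{d}-cube: {hypercube_size(d)} nodes, node 0 links to {hypercube_neighbors(0, d)}")
print("hops from 0 to 7 in a 3-cube:", hops(0, 7))
```

Each node needs only d links, and the worst-case distance between any two nodes is d hops, which is why the layout was attractive when every link was a dedicated cable.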
Re:Whatever happened to... (Score:1)
Plenty, eh?
Re:Whatever happened to... (Score:5, Informative)
Re:Whatever happened to... (Score:3, Informative)
Re:Whatever happened to... (Score:2)
Re:Whatever happened to... (Score:2)
Re:Whatever happened to... (Score:3, Informative)
For instance:
http://krone.physik.unizh.ch/~stadel/zbox/start [unizh.ch]
Re:Whatever happened to... (Score:1)
But you're right, toroids constitute the most popular k-ary n-cube architecture today. Folded 3D toroids (e.g. by Avici) are especially sweet (uniform, short interconnects, low diameter, huge path diversity).
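A quick sketch of what "k-ary n-cube" means in practice: every node has n coordinates, each in the range 0 to k-1, and links to the nodes one step away along each axis, with wraparound at the edges. The helper names are mine, and I am assuming k is at least 3 so the two directions along an axis reach different nodes.

```python
# Sketch of a k-ary n-cube (torus): each node has n coordinates in 0..k-1 and
# links to the nodes one step away on every axis, wrapping around at the edges.
# Assumes k >= 3; with k == 2 both directions land on the same neighbor.

def torus_neighbors(coord, k):
    """Return the 2*n neighbors of `coord` in a k-ary n-cube."""
    neighbors = []
    for axis in range(len(coord)):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (shifted[axis] + step) % k
            neighbors.append(tuple(shifted))
    return neighbors


def torus_diameter(k, n):
    """Worst-case hop count: at most k // 2 steps on each of the n axes."""
    return n * (k // 2)


print(torus_neighbors((0, 0, 0), 4))
print("diameter of a 4-ary 3-cube:", torus_diameter(4, 3))
```

A 4-ary 3-cube has 64 nodes, six links per node, and a worst case of six hops, which gives a feel for the "low diameter" point.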
Re:Whatever happened to... (Score:2)
Poor choice of word. I mean a bi-directional, linear arrangement of nodes.
It doesn't have to be p2p, I believe. One could implement a hypercube using ethernet switches.
Re:Whatever happened to... (Score:2, Funny)
build a beowulf ... (Score:1)
I'd recommend looking at the ease of building diskless clusters with Warewulf.
Re:build a beowulf ... (Score:1)
Re:build a beowulf ... (Score:2)
Just underneath the article at the top of the page is a line where you can change the threshold you read the articles at. There is a "change" button to the right of that line that submits the changed threshold. To the right of that is a "reply" button, which is how you post a new subject in the article thread. It took me a while to find it too.
My Experience (Score:4, Informative)
I am in the exact same position. (Score:2, Interesting)
sweating it out.. (Score:5, Interesting)
1. I do Beowulf; other clusters aren't my thing.
2. I can handle C and C++, but I'm not a guru.
3. I can fumble my way through Unix/Linux, but I get cranky with new versions (command/flag changes in utilities).
4. I have 6 lazyish years as a Unix sysadmin.
getting prepped,
You want to make sure that the boxes that are talking to each other are very secure from the rest of the world. Most of the concepts on a cluster are about trashing the security of the machines in question. There are ways to make a secure-ish cluster, but a good firewall is a better way to go. Let the firewall talk to your "head node", and preferably lock the "body nodes" out from seeing the outside world. There are a ton of ways to get this done. On the cheap: give all the body nodes a nonexistent gateway (i.e. 192.168.0.1), set the firewall as 192.168.0.129 (forget DHCP), point the head node's gateway at 192.168.0.129, and have the firewall route services (ssh, ftp, telnet (OK, not ftp or telnet)) to the head node.
getting started
1. Load all the boxes with the same OS, the same way. (DON'T select SELinux or you will cry.)
2. Build a hosts file (names for the machines): /etc/hosts.
3. Build a hosts.allow and a hosts.equiv (still in /etc).
4. Add some entries to securetty for your rlogin, rcp, rsh...
5. You'll probably have Kerberos (weakly secured) rlogin, rcp, rsh... You want to rename those and replace them with the non-secure versions. There are other ways, but this saves a bunch of hassle.
6. Pop into /etc/pam.d and adjust the rlogin, rcp, rsh... entries (this may not be needed in some cases).
7. Add a "+ +" line to the .rhosts file of each cluster user.
After you have pulled your hair out deciphering my glossed-over instructions, you should be able to type: rsh node002 and be at the prompt for node002 with no password asked, and no silly Kerberos failed: trying /usr/bin/rsh message given.
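Since the hosts file has to be identical on every box, it can help to generate it rather than type it. Below is a rough sketch; the starting address (.10), the node count, and the helper name are my own assumptions for illustration, and only the 192.168.0.x range and the node002-style names come from the steps above.

```python
# Hypothetical generator for the /etc/hosts lines shared by every node.
# The 192.168.0.0/24 range and nodeNNN naming follow the post above; the
# starting offset and node count are assumptions for illustration.
import ipaddress


def hosts_entries(network, first_host, count, prefix="node"):
    """Return 'address<TAB>name' lines for `count` consecutive nodes."""
    net = ipaddress.ip_network(network)
    base = net.network_address + first_host
    lines = []
    for i in range(count):
        addr = base + i
        if addr not in net:
            raise ValueError(f"{addr} falls outside {net}")
        lines.append(f"{addr}\t{prefix}{i + 1:03d}")
    return lines


for line in hosts_entries("192.168.0.0/24", 10, 4):
    print(line)
```

Paste the output into /etc/hosts on the head node and copy the same file to each body node so every machine resolves the same names.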
At this point you can configure LAM (you may need to download it and get it installed on your boxes).
Basically it needs an arbitrary file, Mynodes.txt, that contains the list of nodes you wish to launch. You type in lamboot Mynodes.txt and then it will kick back some silly error 99% of the time because something small was forgotten. You fight through those errors until it finally gives you a success.
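A small sketch of producing that node list, assuming the simplest layout of one host name per line and the nodeNNN naming used earlier. The helper names and the node count are hypothetical, so check the LAM documentation for anything beyond plain host names.

```python
# Hypothetical helper that writes a plain node list (one host name per line)
# for lamboot. Node names and count are assumptions for illustration.
from pathlib import Path


def node_names(count, prefix="node"):
    """Return names like node001, node002, ... for `count` machines."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def write_node_file(path, names):
    """Write one host name per line and return the text that was written."""
    text = "\n".join(names) + "\n"
    Path(path).write_text(text)
    return text


# Preview the file contents; call write_node_file("Mynodes.txt", ...) to save.
print("\n".join(node_names(3)))
```

Every name in the file has to resolve through /etc/hosts and accept a passwordless rsh, otherwise lamboot is where the "silly error" shows up.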
Now you're golden; then it's just a matter of figuring out how to compile and run MPI programs with mpiCC and mpirun. But if you got through the first gloss-over, then the rest is a snap.
Remember if these machines see the outside world they are naked, defenseless, and totally exploitable..
Be aware that these instructions can cause all sorts of havoc and any reasonable person would just hire an expert.
Honestly I hope that this gives you a starting point. You'll still need a bunch of time with google.
GOOD LUCK!!
Storm
oops glossovers.. (Score:2)
Set up an NFS share on the head node; it will simplify a whole bunch of data collection. Oh, and don't use the home directory as your share, as it will cause weird issues. Make the directory something like "clustershare"; it will make sure that you launch the exact same executable on all the boxes.
The LAMRSH variable needs to be set to "rsh" otherwise you get an error message.
________________
Most of this guide is if
Re:sweating it out.. (Score:2)
Why?
Because it doesn't work like Mac OS. Mac OS let you share out an entire volume and drop anything, anywhere, any time. Learning is hard!
Rocks Rocks! (Score:5, Informative)
They're also really great guys.
On the other hand, Oscar is supposed to be good, and if you're not into the whole batch-mode thing, you can get OpenMosix up and running using ClusterKnoppix (http://clusterknoppix.sw.be// [clusterknoppix.sw.be]), and just fire jobs off into space and let them find their own unburdened node.
But still, Rocks is really an elegant and clean way to go, plus it will scale up in case you're going to deploy a huge one of these for real after you get your feet wet.
Re:Rocks Rocks! (Score:1)
Re:Rocks Rocks! (Score:2)
I'll admit, I've never tried this because I couldn't get anyone to spring for the displays, or the space to stack several monitors, but just follow their instructions, and you're off and running.
"cluster" means lot of different things (Score:2, Insightful)
This is a good reference:
http://linuxclusters.com/compute_clusters.html [linuxclusters.com]
java (Score:1, Informative)
Depends on what you're doing (Score:4, Informative)
I run ClamAV and SpamAssassin, both very slow programs, with cexec [internetconnection.net], which simply lets me farm regular Unix tools across multiple (lots of) CPU servers. This lets me replace the clamscan and spamc programs with "wrappers" that use my farm. I like cexec because it doesn't make me create lists of clients and servers, but automatically load balances and fails out very nicely.
For my frontend web servers, I use fake/heartbeat and some custom proxy software for routing frontend requests to backend farms.
I haven't found a real reliable replicated directory- with one, I could use cexec as a filesystem... Maybe some day...
Re:Depends on what you're doing (Score:3, Insightful)
In general, these are the two things that should be decided last. Other posters have addressed the 'why do you actually think you need a cluster' issue, so I will take a look at the 'why do you want to run Linux' bit.
If what you want is reliability, then nothing beats OpenVMS. You have to pay a premium for hardware that can run it (VAX, Alpha or Itanium only), but if you really need that much reliabili
Diskless OpenMosix (Score:4, Informative)
So I'm a Debian fan, so that involves just creating one large computer (or two with redundancy using linux-ha) with a good RAID as a shared home directory. Then just install the "diskless" package. This will allow you to spawn off as many hosts that mount root off of NFS as you like. All you have to do is get the compute nodes to boot a kernel that supports nfs root (I used a single floppy, but you can do bootcds or net-boot if you're more sophisticated).
I used a Mosix kernel at the time, but I suppose OpenMosix is a better bet today. Mosix pretty much makes the entire system look like a massive SMP, so you just launch a whole lot of batch scripts on one computer, and it automatically distributes the load out to idle machines, and ships the results back to the one you started on. You just choose a node to become the master diskless-image, and then use the diskless scripts to clone it as many times as you like.
The compute nodes could have a local drive, but I just used them for swap and maybe local
The other nice thing about OpenMosix is that it's architecture agnostic, so you could conceivably join and remove nodes that were all different speeds, AMDs or Intels or maybe even 64-bit platforms in any combination. The faster processors would get the heaviest loads first, etc.
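To get a feel for why mixed-speed nodes still help, here is a toy greedy scheduler: it hands the biggest jobs out first, each to whichever node would finish it soonest. This is only an illustration of the idea; it is not how OpenMosix actually migrates processes, and the node names, speeds, and job sizes are invented.

```python
# Toy illustration only: a greedy scheduler for nodes of different speeds.
# This is NOT OpenMosix's algorithm; names, speeds, and job sizes are made up.

def assign(jobs, speeds):
    """Give each job (largest first) to the node that would finish it soonest.

    jobs:   list of work amounts (arbitrary units)
    speeds: dict of node name -> relative speed (work units per second)
    Returns (plan, makespan): jobs per node and the overall finish time.
    """
    free_at = {node: 0.0 for node in speeds}
    plan = {node: [] for node in speeds}
    for work in sorted(jobs, reverse=True):
        best = min(speeds, key=lambda n: free_at[n] + work / speeds[n])
        free_at[best] += work / speeds[best]
        plan[best].append(work)
    return plan, max(free_at.values())


plan, makespan = assign([8, 4, 4, 2], {"fast": 2.0, "slow": 1.0})
print(plan, makespan)
```

Even the slow box shortens the total run here, since it soaks up the small jobs while the fast one chews through the big ones.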
After you have this system up and running, you might start playing with more sophisticated stuff, like parallel distributed global filesystems and the like. But before that you could certainly make your NFS root server scale up by splitting it up across multiple machines (for
Anyway, it's the systems management that will get you... so I recommend using Debian, getting real cozy with aptitude, and searching the apt repository for all of the little command and monitoring thingies that will help you, like clusterssh and cfengine and nagios2 and stuff.
Burning a bunch of ClusterKnoppix CDs will pretty much get you on track with most of this, I'd imagine. Also check out the "KNOPPIX Remastering" howto so you can customize your own livecds, should you choose that path instead of diskless nfsroot.
So that's a software approach, the hard part is really selecting, testing, and optimizing all of the hardware. The slowest component is always going to be storage (invest in lots of separate SATA cards using the Linux software RAID5 or RAID10 - reconfigure and test lots with hdparm -t and bonnie++ and format reiserfs), followed by network (gigabit NICs are cheap - you could afford separate ones for the NFS and the "real" network, though gigabit switches are still up there - Linksys and D-Link make some good 16-port ones for ~$300).
Um, if you're looking for parallel applications, povray is fun. And for a time we could sort of measure how many nodes I had up and running by monitoring my stats at distributed.net. But with OpenMosix, just about anything with lots of CPU-intensive parallel batch processing is fair game and works out of the box.
Have fun!
OSCAR (Score:3, Informative)
I haven't tried any others, but OSCAR installs pretty easily. Just run the installer on the head node, and when it is done it feeds an image to each of the other computers that are part of the cluster. It includes the ganglia monitoring tools and the apache server so you can view them.
better reason (Score:2, Insightful)
Imagine a... (Score:1, Funny)
warewulf-cluster (Score:2, Informative)
OpenMosix LiveCD (Score:4, Interesting)
This being said, for an instant trial, there are some OpenMosix LiveCDs, such as Quantian or other variants of Knoppix. Put the Quantian DVD in the 1st PC, boot, enable the remote boot option, boot the other computers over the network. Here: you have an operational cluster.
I think there may also be Rocks LiveCDs but I haven't tried them. And remember your electricity bill when playing with clusters!
Beowulf! (Score:2, Insightful)
OpenMosix (Score:2)
Re:OpenMosix (Score:1)
I would say that if you're not smart enough to do either of the following:
a) Run Memtest.
b) Remove one node at a time and test.
Then you don't need a cluster anyway. So it sounds like it turned out alright in the end.
Re:OpenMosix (Score:1)
Perfect example of when to use VMware (Score:2)
Warewulf (Score:1, Insightful)
So far, no one has mentioned Warewulf [warewulf-cluster.org].
I have built three Warewulf clusters in the past year. I like how lightweight and customizable WW is. It consists of a bunch of scripts that netboot/etherboot/PXE boot a custom RAM disk as your root file system from a tftp server (in my case the head node). (The smallest RAM disk we have built is around 10 MB. Everything else can be NFS mounted so each of the nodes has the capabilities of a standalone workstation.) From there you can configure it to do whatever you
my first cluster setup (Score:4, Interesting)
Once you get that going, you might look at PVFS2 [pvfs.org] Parallel Virtual File System. "PVFS2 stripes file data across multiple disks in different nodes in a cluster. By spreading out file data in this manner, larger files can be created, potential bandwidth is increased, and network bottlenecks are minimized."
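If the striping idea in that quote is fuzzy, this sketch shows generic round-robin striping: the file is cut into fixed-size stripes that are dealt out to the servers in turn. It is not PVFS2's actual code, and the 64 KiB stripe size and four servers are assumptions picked for the example.

```python
# Generic round-robin striping sketch. NOT PVFS2's implementation; the stripe
# size and server count below are assumptions chosen for illustration.

def locate(offset, stripe_size, num_servers):
    """Map a byte offset in the file to (server index, offset on that server)."""
    stripe_index = offset // stripe_size          # which stripe of the file
    server = stripe_index % num_servers           # stripes are dealt out in turn
    local_stripe = stripe_index // num_servers    # stripes already on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


STRIPE = 64 * 1024
SERVERS = 4
for offset in (0, STRIPE, 4 * STRIPE, 5 * STRIPE + 10):
    print(offset, "->", locate(offset, STRIPE, SERVERS))
```

A large sequential read touches all four servers at once, which is where the extra bandwidth comes from.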
Good Luck!!
Bullet Proof Linux is the way to go (Score:1)
We needed a cluster for load balancing a typical web application, with OpenSSI http://www.openssi.org/ [openssi.org] being the chosen SSI system. Sadly, OpenSSI is far from a "working" solution, and needs quite a bit of massaging - especially if you have newer equipment.
The guys at Bullet Proof Linux have been professional, helpful and incredibly patient in their efforts to get us going. Worth every cent.
And they did it all
sol or xgrid (Score:2)