Ask Slashdot: Best Linux Distro For Computational Cluster?

Please create an account to participate in the Slashdot moderation system

Ask Slashdot: Best Linux Distro For Computational Cluster? 264

Posted by timothy on Thursday May 26, 2011 @03:39PM from the what-will-you-be-computationalizing? dept.

DrKnark writes "I am not an IT professional, even so I am one of the more knowledgeable in such matters at my department. We are now planning to build a new cluster (smallish, ~128 cores). The old cluster (built before my time) used Redhat Fedora, and this is also used in the larger centralized clusters around here. As such, most people here have some experience using that. My question is, are there better choices? Why are they better? What would be recommended if we need it to fairly user friendly? It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."

This discussion has been archived. No new comments can be posted.

Ask Slashdot: Best Linux Distro For Computational Cluster?

Load All Comments

Search 264 Comments Log In/Create an Account

Comments Filter:

YDL (Score:2)

by metalmaster ( 1005171 ) writes:

Yellow Dog Linux ftw!
RHEL (Score:3)

by morcego ( 260031 ) writes: on Thursday May 26, 2011 @03:44PM (#36254872)

Redhat Enterprise Linux.
If you need something cheaper (no licenses), you can always go CentOS. Or you can mix both, having some RHEL and some CentOS machines.

Share
twitter facebook
- Re:RHEL (Score:5, Insightful)
  
  by pavon ( 30274 ) writes: on Thursday May 26, 2011 @03:54PM (#36255046)
  
  If you need something cheaper (no licenses), you can always go CentOS.
  If you want something compatible with Red Hat but cheaper, you should go with Scientific Linux, which is the same sort of idea as CentOS, but has more timely releases, and is used by other major clusters, like the ones at Fermilab and CERN.
  
  Parent Share
  twitter facebook
  - Re:RHEL (Score:5, Informative)
    
    by Anonymous Coward writes: on Thursday May 26, 2011 @05:14PM (#36256320)
    
    I used to be on the CMS/LHC team at Fermilab. We used Scientific Linux on the 5500 Linux workers used for collider event reconstruction. SL is built with computing clusters in mind. I highly recommend it.
    
    Parent Share
    twitter facebook
    - Re:RHEL (Score:4, Funny)
      
      by Have Brain Will Rent ( 1031664 ) writes: on Thursday May 26, 2011 @06:10PM (#36256876)
      
      Just for a moment there I saw 5500 students/programmers/supportfolk/etc. all sitting in little cubicals slaving away at some problem after having had "Scientific Linux", aka the whip, "used" on them....
      
      Then I finished parsing the sentence.
      
      Parent Share
      twitter facebook
  - Re: (Score:2)
    
    by futurekill ( 745161 ) writes:
    
    I'd go with SL as well...CentOS is currently experiencing some organizational turmoil that really makes me doubt its future.
    - Re:RHEL (Score:5, Informative)
      
      by greg1104 ( 461138 ) writes: <gsmith@gregsmith.com> on Thursday May 26, 2011 @07:31PM (#36257810) Homepage
      
      Not just currently. Today's organizational turmoil within CentOS is nothing compared to when they lost access to much of the infrastructure a few years ago [theregister.co.uk]. I just wrote a blog entry on the rise of and fall of CentOS [2ndquadrant.com]; the theme is why it's important to build an open community, not a tight clique, if you want an open-source project to scale.
      
      Parent Share
      twitter facebook
- Re:RHEL (Score:5, Informative)
  
  by b30w0lf ( 256235 ) writes: on Thursday May 26, 2011 @04:32PM (#36255670) Journal
  
  Agreed.
  A primary component of my job is the design and maintenance of high performance compute clusters, previously in computational physics, presently in biomedical computing. Over the last few years I have had the privilege of working with multiple Top500 clusters. Almost every cluster I have ever touched has run some RHEL-like platform, and every cluster I deploy does as well (usually CentOS).
  Why? Unfortunately, the real reasons are not terribly exciting. While it's entirely true that many distro's will give you a lot more up-to-date software with many more bells and whistles, at the end of the day what you really want is a stable system that works. Now, I'm not going to jump into a holy war by claiming RedHat is more stable than much of anything, but what it is is tried and true in the HPC sector. The vast majority of compute clusters in existence run some RHEL variant. Chances are, if any distro is going to have hit and resolved a bug that surfaces when you have thousands of compute cores talking to each other, or manipulating large amounts of data, or running CPU/RAM intensive jobs, or making zillions of NFS (or whatever you choose) network filesystem calls at once, or using that latest QDR InfiniBand fabric with OpenMPI version 1.5.whatever, it's going to be RHEL. That kind of exposure tends to pay off.
  Additionally, you're probably going to be running some software on this cluster, and there's a good chance that software is going to be supplied by someone else. That kind of software tends to fall into one of two camps: 1) commercial (and commercially supported) software, and; 2) open source, small community research software. Both of these benefit from the prevalence of RHEL (though, #1 more than #2). If you're going to be running a lot of #1, you probably just don't have an option. There's a very good chance that the vendor is just not going to support anything other than RHEL, and when it comes down to it, if your analysis isn't getting run and you call the vendor for support the last thing you want to hear is "sorry, we don't support that platform ." If you run a lot of #2, you'll generally benefit from the fact that there's a very high probability that the systems that the open community software have primarily been tested on are RHEL-like systems.
  Finally, since so many compute clusters have been deployed with RHEL-like distros, there is oodles of documentation out there on how to do it. This can be a pretty big help, especially if you're not used to the process. Chances are your deployment will be complicated enough without trying to reinvent the wheel.
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by cloudmaster ( 10662 ) writes:
    
    I don't build or support HPC clusters in my current job, but in my previous job working for the people who probably made the CPU in your cluster, I did. :) We specifically did performance testing on the hardware before it was released to the public. We did the testing with RHEL and SLES because that's what pretty much everyone who built clusters does. Now, "everyone does it" doesn't mean it's the best, but just like nukem996 said below me, it does mean it'll be best supported. If you have a problem and
  - Re: (Score:2)
    
    by dbIII ( 701233 ) writes:
    
    I think the main reason for that is that some of the expensive commercial software that runs on clusters will only run properly on RHEL or similar. One maddening example required an old version of RHEL only because the horrible Macromedia flexlm licence manager (designed to punish only the honest) would not run on anything newer - the actual software suite that we had paid for would run on any version of linux on x86 or x86_64. RHEL has plenty of stuff built in to be compatible with old binaries which is
  - Re: (Score:3)
    
    by RogerWilco ( 99615 ) writes:
    
    If you run a lot of #2, you'll generally benefit from the fact that there's a very high probability that the systems that the open community software have primarily been tested on are RHEL-like systems.
    We used to run SLES and OpenSuse on our old cluster, and switched to Ubuntu for our new one. We had several reasons for that:
    1) We found that most developers are on a flavour of Ubuntu. It's really become the nr1 desktop distro.
    2) We made an inventory of what was available though rpm and though apt, and a lot of packages that our users needed were available though apt but not rpm.
    3) CentOS and Scientific Linux were also considered, but seemed to be lagging behind Ubuntu in what versions of packages were sup
- Re: (Score:2)
  
  by nukem996 ( 624036 ) writes:
  
  I second this mainly because it sounds like you don't have much experience in setting up a cluster. By using RHEL you get tech support which may help when your stuck. If your company doesn't want to pay for it CentOS is good because I beleive you can just pay for RHEL support and Redhat will support it.
- - Re: (Score:2)
    
    by morgan_greywolf ( 835522 ) writes:
    
    RHEL is fine, CentOS is just awful, and anytime someone offers up CentOS as a substitute for RHEL, I wonder of they've ever used CentOS. Watch for circular dependencies and lots of unavailable packages.
    I've never seen that problem with CentOS.
    Everything you could need is an apt-get away, rather then google-the-wget away with CentOS and dag. I know my situation isn't a cluster, but we're running 20 Ubuntu servers in 15 colos currently, and our experience has been by far the best with Ubuntu.
    The problem with
  - Re: (Score:2)
    
    by WuphonsReach ( 684551 ) writes:
    
    RHEL is fine, CentOS is just awful, and anytime someone offers up CentOS as a substitute for RHEL, I wonder of they've ever used CentOS. Watch for circular dependencies and lots of unavailable packages.
    
    The only way you get into dependency hell in CentOS over RHEL is if you don't know what you're doing and how to control pulling packages from non-standard repositories.
    
    And from the sounds of things, you're the type who adds a 3rd party repository and pulls everything in, instead of using the "includepkg
- - Re:Gentoo (Score:3, Insightful)
    
    by miknix ( 1047580 ) writes:
    
    With some chance of being modded down, I suggest Gentoo Linux. With Gentoo you can compile your kernel and everything else which might give you some arguable performance increase. Because Gentoo is a source-based distribution, it might help you with scientific development because all the library (boost, itpp, lapack, etc) headers (and source) are immediately available. There is support for scientific libraries like atlas, ACML, etc.. and you can easily change the default library for blas/laplack using a sim
    - Re: (Score:3)
      
      by cloudmaster ( 10662 ) writes:
      
      The problem with that suggestion is that the people maintaining the code don't have a clue what QA means. And before people whine - I used Gentoo as my primary distro for around three years. The emerge system is great - but the data inside is crap.
      If you want to build your stuff from source and actually have a working system, look at the Debian-based distros. There's this nifty "apt-build" thing that lets you build software with whatever compile options you want (so you can still do -O3 -funroll-loops on
  - Re: (Score:2)
    
    by morcego ( 260031 ) writes:
    
    However, if performance is your number TWO priority, and stability is your number ONE, you should use the stock kernel. If you are using an enterprise distribution, that is.
Scientific Linux (Score:5, Informative)

by stox ( 131684 ) writes: on Thursday May 26, 2011 @03:45PM (#36254884) Homepage

Built for that very purpose.

Share
twitter facebook
- Re:Scientific Linux (Score:5, Informative)
  
  by boristhespider ( 1678416 ) writes: on Thursday May 26, 2011 @04:55PM (#36256004)
  
  Being in academia and spending time in a lot of departments I can at least confirm that a large number of departments are running Scientific. I've worked in Britain, the USA, Canada, Norway and Germany and while Germany (predictably enough) has a hankering for SuSE, the others have a tendency to run Scientific.
  I did type in a long and boring anecdote about my experiences administering things running SGI Irix and Solaris back in the day, but wiped it when it began to look a bit incriminating and for all I know my ex-boss reads Slashdot. So I'll summarise as "don't administer SGI Irix or Solaris if you can avoid it". I'm no computer scientist, so maybe people who are better at it have no problems, but as a vaguely-competent scientist with an interest in computers but little more (like the original poster) I didn't get on with either of them. Red Hat was fine, and we hung Fedora machines off our central network and that was OK even though it was Fedora Core 1 with all its teething problems. And Scientific is very widely used in academia on big networks.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by afidel ( 530433 ) writes:
    
    Hmm, Solaris 9 and Redhat 6 were so close that I could often take config files from one and make minimal changes and run them on the other. Of course our Solaris install was running the optional gnu utilities so that helped =)
  - Re: (Score:3)
    
    by jd ( 1658 ) writes:
    
    Rocks is another good distro for this. It's designed specifically for cluster use, with packages pre-built with that in mind.
    It also depends some on what clustering system you're using. If you're wanting to use MOSIX or Kerrighed, then use a distro the one you want to use is well-tested on. Kernel patch conflicts can otherwise make things very difficult.
    - - Re: (Score:2)
        
        by jd ( 1658 ) writes:
        
        Yes but that doesn't mean everything was compiled with RHEL's standard compile flags or that different patches weren't included/excluded. I haven't looked at the specifics of Rocks' packages for a while, but I can trust they're tried-and-tested in cluster environments. RHEL may be 100% identical in all respects, but then it might not be. All I'm really sure of is that Red Hat's QA won't have put clustering as high on their list of things to test against.
        That's no criticism of Red Hat. They've a finite numbe
- - Re: (Score:2)
    
    by stox ( 131684 ) writes:
    
    Actually, I run a 500+ seat callcenter on OpenAFS. Makes my life much easier than the Samba implementation it replaced.
NPACI Rocks (Score:5, Informative)

by rmassa ( 529444 ) writes: on Thursday May 26, 2011 @03:46PM (#36254894)

NPACI Rocks is probably your best bet. http://rocksclusters.org/ [rocksclusters.org]

Share
twitter facebook
- Re:NPACI Rocks (Score:5, Interesting)
  
  by daemonc ( 145175 ) writes: on Thursday May 26, 2011 @04:43PM (#36255830)
  
  Seconded. I used Rocks to build clusters for the university for which I worked, and it made my life much, much easier.
  If you are already familiar with Redhat administration, you'll be happy to know Rocks can use either Redhat or CentOS as its base OS.
  It uses meta-packages called "rolls", which completely automate the installation and configuration of your computing nodes. There are rolls that include most of the commonly used commercial and Open Source HPC software out there, or you can "roll" your own. Basically you just configure your head node, and then adding a compute node is as simple as setting the BIOS to boot over PXE, plug it in, and done.
  Rocks, well, rocks.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by sirsnork ( 530512 ) writes:
  
  This, ignore any other suggestion and use Rocks. You're doing yourself a disservice if you don't at least look at it.
- Re: (Score:2)
  
  by TooMuchToDo ( 882796 ) writes:
  
  Rocks only goes so far though. If your cluster is small, it works like a champ. You want any heavy level of customization and it quickly becomes unwieldy. Once you go above 1000 nodes, want to have your nodes on public network addresses instead of an RFC1878 block, etc, you might as well put TFTP/Kickstart/etc together yourself.
Scientific Linux (Score:5, Informative)

by Skapare ( 16644 ) writes: on Thursday May 26, 2011 @03:47PM (#36254914) Homepage

How about Scientific Linux [wikipedia.org]?

Share
twitter facebook
Scientific Linux (Score:3)

by Ether ( 4235 ) writes: on Thursday May 26, 2011 @03:49PM (#36254940)

Scientific Linux. http://www.scientificlinux.org/ [scientificlinux.org] Has the benefit of RHEL: a stable OS environment without some of the headaches of CentOS. If you have money (you probably don't) RHEL is good.

Share
twitter facebook
- - Re: (Score:2)
    
    by swv3752 ( 187722 ) writes:
    
    Get a damn clue. The Grandparent posted at almost the same time as the one earlier post for Scientific Linux, and much earlier then anyone else. Perhaps idiots like yourself should first learn how to read threaded discussions before posting.
Fedora (Score:2)

by tanawts ( 786512 ) writes:

Fedora has components to help manage large deployments. https://fedorahosted.org/spacewalk/ [fedorahosted.org] It also has FreeIPA to help with a secure and scalable means of managing authentication/authorization/resources within the cluster. http://freeipa.org/page/Main_Page [freeipa.org]
- Re: (Score:2)
  
  by Fjandr ( 66656 ) writes:
  
  Fedora goes to they other extreme from CentOS. The update cycle is too short, which means you have increased worry about instability. Stuff just breaks sometimes, even though it's a good distro on the the whole for many purposes. I'd assume stability is a top priority for someone putting together a cluster.
Which editor should he use? (Score:2, Funny)

by Albanach ( 527650 ) writes:

Now we have such a clear winner on the choice of distro, perhaps we can discuss which would be the best editor on the cluster?
- Re: (Score:2)
  
  by Kamiza Ikioi ( 893310 ) writes:
  
  vi
  - Re:Which editor should he use? (Score:5, Funny)
    
    by ebuck ( 585470 ) writes: on Thursday May 26, 2011 @04:58PM (#36256056)
    
    vi
    Clearly the best choice. It is so heavily optimized that even its name takes up only 40% of the required character of the second best contender, emacs.
    
    Parent Share
    twitter facebook
    - Re: (Score:2)
      
      by greg1104 ( 461138 ) writes:
      
      ed is also two characters, and they are adjacent on the keyboard. If you are a touch-typist, it takes 50% of the hands and 50% of the fingers necessary to start that inefficient vi editor.
    - Re: (Score:2)
      
      by cloudmaster ( 10662 ) writes:
      
      No, the second contender would be vim. The emacs option doesn't come up until you've dismissed both vi and vim. :)
- Re: (Score:2)
  
  by gilleain ( 1310105 ) writes:
  
  Now we have such a clear winner on the choice of distro, perhaps we can discuss which would be the best editor on the cluster?
  Sounds good - and finish up with a reasoned, polite exchange of views on which programming languages to use on the new cluster?
  - Re: (Score:2)
    
    by rgbatduke ( 1231380 ) writes:
    
    Now you're just teasing. After the distro wars, the editor wars, you want to start the compiler wars? Oh, wait, half of the respondents are replying with interpreted languages (demonstrating a pretty profound ignorance of 99% of scientific computing) so clearly the question was serious...;-)
    
    Personally I vote for slackware for a distro, jove as an editor, and do languages other than C (and somewhat regrettably, fortran) exist? They do not. Maybe perl or one of them new-fangled interpreted languages to
- Re: (Score:2)
  
  by betterunixthanunix ( 980855 ) writes:
  
  The POSIX standard editor of course.
- Re: (Score:3)
  
  by afidel ( 530433 ) writes:
  
  None because you don't run interactive processes on a cluster but instead submit a package to the job scheduler.
Rocks Cluster uses a modified Centos (Score:2)

by w3rdna ( 253598 ) writes:

Centos is modified to be the base OS for the ROCKS Cluster.
http://www.rocksclusters.org/wordpress/
- Re: (Score:2)
  
  by erikscott ( 1360245 ) writes:
  
  The Rocks approach is nice for quickly regenerating a failed node. And it's Centos under the covers, as noted, so it's RHEL in disguise. If you're running 16 boxes with dual quad-cores, you'll lose the occasional disk drive. If you run 64 cheap desktops with single-socket dual-cores, you'll lose a disk drive every week or two.
  - Re: (Score:2)
    
    by KainX ( 13349 ) writes:
    
    The Rocks approach is nice for quickly regenerating a failed node. And it's Centos under the covers, as noted, so it's RHEL in disguise. If you're running 16 boxes with dual quad-cores, you'll lose the occasional disk drive. If you run 64 cheap desktops with single-socket dual-cores, you'll lose a disk drive every week or two.
    Of course, if you're using a modern (read: stateless) provisioning system, "regenerating a failed node" simply requires a power-cycle. And you lose far fewer disk drives since they're not used for the OS. And replacing a dead node with a new one is a single command and a power button.
    Systems like ROCKS only seem great if you haven't used anything else. :-)
- - Re: (Score:3)
    
    by 0racle ( 667029 ) writes:
    
    Ok, none of that is true. Even as a troll, that's pretty pathetic.
Distro isn't the biggie, it's the scheduler (Score:5, Interesting)

by javanree ( 962432 ) writes: on Thursday May 26, 2011 @03:51PM (#36254990)

I've worked with various clusters over the past year.
The distro doesn't really matter, mostly it's what you feel most comfortable with. I'd slightly favor RedHat Enterprise or a respin of it, since it's easiest in terms of drivers for commercial cluster hardware and commercial software support, but Debian would be just as fine. I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable) You don't want to have to update every week since this usually requires quite some work (making new images and rebooting all nodes)
What I found out matters a lot more is the scheduler you will use; Sun Grid Engine, PBS, Torque or slurm to name a few. Every scheduler comes with it strong and weak points, be sure to look at what matters most to you.
If you are unfamiliar with all of these things, pick a complete bundle like Rocks (it's based on RedHat Enterprise Linux), which makes setting up a cluster quite easy and still allows you to choose which components you want. That'll greatly improve your chance of success. But be warned; it's still a steep learning curve building and specially configuring a cluster. The most time is spent tuning queuing parameters to maximize the performance of your cluster.

Share
twitter facebook
- Re: (Score:2)
  
  by RichM ( 754883 ) writes:
  
  I would choose a 'stable' distro though, so no Fedora, no Ubuntu (even their LTS isn't exactly enterprise grade compared to RedHat / Suse or even Debian stable
  That would depend on if you mean stable = "ancient" or stable = "modern but secure".
  Ubuntu LTS is in the latter category from my experience.
- The distro matters in many ways (Score:3)
  
  by Skapare ( 16644 ) writes:
  
  The distro does matter, often in ways not particular to being a cluster, but perhaps in ways making it easy to manage in general. For example, I'm moving away from Ubuntu (server) because it is too hard to selectively upgrade a single package or group of packages without imposing an upgrade on other packages. This is where "hand holding" has turned into "wrist crushing". So I'm moving to Slackware (which is getting a lot more capability through the SlackBuilds community).
This Question (Score:3)

by SleazyRidr ( 1563649 ) writes: on Thursday May 26, 2011 @03:52PM (#36254996)

My comprehension of this question is roughly 'please have a flamewar about the different flavours of Linux.'

Share
twitter facebook
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  ..."and whoever is left standing and doesn't have too much shit on him in the end shall be king!"
  Pretty much how all reviews go, minus the fun for the spectators.
X window (Score:2)

by turbidostato ( 878842 ) writes:

"It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations."
So what? One one hand in order to run Linux graphic apps on Windows you need an X-Window server... on the Windows machine, not the Linux one. On the other hand, how is it that you *must* use GUI-based apps? There's *really* no operational alternatives? (I've been administrating Linux and Unix systems for almost two decades and I never needed -as in "must", GUI-based apps for that).
- Re: (Score:2)
  
  by grimsweep ( 578372 ) writes:
  
  Good chance that the GUI request deals primarily with user-friendly aspects of using the cluster. There are always alternatives to GUI-based apps, but there are plenty of times where using one will save you time and effort. Have you ever tried substituting Gimp with Image Magick? You can't beat the latter for batch image processing, but I wouldn't ask anyone to design a logo with it.
- Re: (Score:2)
  
  by bugi ( 8479 ) writes:
  
  He was asking for X libraries, not an X server. He's covered. No non-embedded distro will ship without such.
  In X-land, the server is what talks to your display device on behalf of your other programs. The server manages the scarce resource. The clients bribe the server for access.
- Re: (Score:2)
  
  by Eponymous Bastard ( 1143615 ) writes:
  
  I've been against this wall before. There are a few things to consider:
  - In a university environment the "compute cluster" is not going to be in a data center far away, but rather in "lab" (read office) with 16 8-core machines, so the machines might actually be used locally either with a monitor for each grad student or a KVM switch for the single student/admin. For newbie admins it's easier to flip the KVM switch and click their way through the admin guis.
  - In a mixed Win/Linux environment, you're right, a
  - Re: (Score:2)
    
    by Dogers ( 446369 ) writes:
    
    For a Windows XServer, try http://sourceforge.net/projects/xming/ [sourceforge.net]
    Works great when I've needed it!
  - Re: (Score:2)
    
    by Doug Neal ( 195160 ) writes:
    
    NX [nomachine.com] is also an option.
  - Re: (Score:2)
    
    by cloudmaster ( 10662 ) writes:
    
    FYI:
    You can launch vnc via [x]inetd and have it connect to localhost via XDMCP, using PAM for authentication at the chooser you ultimately get. Don't bother with VNC passwords, and that frees you from having to give people specific port assignments. You (generally, depending on how you set it up) lose the disconnected support - but a setup like that is a real nice way to get around the difficulty (and cost) in setting up Windows X Servers; the VNC clients are all comparatively simple to set up and secure.
SUSE Linux or SUSE Linux Enterprise (Score:2)

by hotfireball ( 948064 ) writes:

If you are OK to go with RHEL, you also can look for SLES: SUSE Enterprise Linux Server. They also have SUSE Studio where you can make your own appliances. If you are large enterprise, they will even give you SUSE Studio appliance to be hosted in-house in your company for your own needs. They also have SUSE Manager — same as Spacewalk, but has more features in it (and is backward compatible with a Spacewalk).
Red Hat for support (Score:4, Insightful)

by guruevi ( 827432 ) writes: on Thursday May 26, 2011 @03:56PM (#36255082)

RH support is phenomenal and that's why a lot of businesses use it. If you want it on the cheap, go with what you're comfortable and have your specific calculation packages built in (Debian if you like apt and open source packages, RPM if you use a lot of commercial packages). If you're looking for performance and specific hardware enhancements, go Gentoo or one of it's brethren. Go with something that you can easily re-image if you're looking for lots of changes in software lineups or conflicts.

Share
twitter facebook
Scientific Linux 6.0 or RedHat Enterprise 6.1 (Score:2)

by Billly Gates ( 198444 ) writes:

Scientific Linux 6.0 is built on Redhat Enterprise Edition 6 which is highly tested and tuned for server throughout put, power management, and stability compared to a stock vinalla kernel. The performance will be much better than a stock debian stable kernel or Ubuntu for example. Redhat has a bunch of hackers. Scientific Linux includes apps used for scientists which maybe your target market if you are a university too. If your old cluster has scripts and tools optimzied for Redhat and RPMs then makes sense
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  Scientific Linux 6.0 is built on Redhat Enterprise Edition 6
  So is Scientific Linux 6.0 free?
  - Re: (Score:2)
    
    by blair1q ( 305137 ) writes:
    
    eh, nemmind. my brain wandered while my eyes worked over your other two paragraphs.
NPACI Rocks (Score:2)

by jfp51 ( 64421 ) writes:

NPACI Rocks without a doubt. Red Hat centric, you need to put in some work to understand how it ticks, once you so and set up your cluster properly, it is very solid and reliable.
Funny how 128 cores used to seem like a lot (Score:2)

by Quila ( 201335 ) writes:

I was just pricing 2U database servers that had 32 cores each. A 128 core cluster is now just four small off-the-shelf servers in a rack for less than a hundred grand.
- Re: (Score:2)
  
  by Nite_Hawk ( 1304 ) writes:
  
  I know, it's insane. Once the 16-core Interlagos chips are out you could do the entire thing in a fully populated Dell 6145 2U enclosure (2 nodes).
- Re: (Score:2)
  
  by Bill_the_Engineer ( 772575 ) writes:
  
  Out of curiosity, what did you pick for environmental control (ie. heat) ?
  - Re: (Score:2)
    
    by Quila ( 201335 ) writes:
    
    They're going into one of the dozens of racks we have. We have more of an issue of power right now than heat.
x-window server? (Score:2)

by tokul ( 682258 ) writes:

It has to have an X-windows server since we use that remotely from our Windows (yeah, yeah, I know) workstations.
Are you sure that you know? You run local x window server on your windows machine when you use x window programs.
CentOS, Scientific Linux, Ubuntu, Debian (Score:5, Informative)

by MetricT ( 128876 ) writes: on Thursday May 26, 2011 @04:06PM (#36255254)

I've got 10+ years experience managing a large (2000 core, 1+ PB storage) compute cluster. If you're using one of those annoying commercial apps that assume Linux = Red Hat Linux (Matlab, Oracle, GPFS,etc.), then CentOS or Scientific Linux are the way to go.
If you don't have that constraint, consider Ubuntu or Debian. apt-get is my single favorite feature in the history of Unix-dom. Plus, there are often pre-built packages for several common cluster programs (Torque, Globus, Atlas, Lapack, FFTW, etc.) which can get you up and running a lot faster than if you had to build them yourselves.

Share
twitter facebook
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  I run matlab instances here on my debian vms - no problems. All in all, we have about 800 machines here over several clusters, and everything runs on debian.
Debian (Score:2)

by Alex Belits ( 437 ) * writes:

Debian -- easy to manage, easy to create new packages for, least amount of nonstandard, distribution-specific stuff (except configuration files management, but that is a result of having to keep individual packages' configuration tied to packages).
Response & questions (Score:2)

by multimediavt ( 965608 ) writes:

1. What types of computation is the cluster going to be used for? MD, CFD, ???
2. What software will be used on the nodes? CHARMM, GAMESS, LAMMPS, NWChem, etc.
3. Do you have a preference for a Linux distro? If not, it really doesn't matter that much if you are rolling your own cluster and software stack. It will just determine what things are used for package management and what services in the distro you might want to turn off in order to get the most memory for apps and not the base OS.
4. You should be u
debian squeeze (Score:3)

by dermond ( 33903 ) writes: on Thursday May 26, 2011 @04:08PM (#36255280)

we run our 320 core cluster on debian squeeze. infiniband support out of the box. the gridengine is a mater of apt.-get install. comes with tons of scientific sofware.

Share
twitter facebook
Swing the license for RHEL (Score:3)

by Zemplar ( 764598 ) writes: on Thursday May 26, 2011 @04:08PM (#36255284) Journal

Scientific Linux is totally awesome, but a project of this size, especially with the IT knowledge on hand, needs the support and first-rate product which RedHat provides.

Share
twitter facebook
Building Clusters (Score:5, Informative)

by Nite_Hawk ( 1304 ) writes: on Thursday May 26, 2011 @04:15PM (#36255412) Homepage

Hi,
I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Redhat are fine, and I'm using Ubuntu successfully for an Openstack cloud. Rocks is popular though ties you to certain ways of doing things which may or may not be your cup of tea. Certainly it offers you a lot of common cluster software prepackaged which may be what you are looking for.
More important than the OS are the things that surround it. What does your network look like? How you are going to install nodes, and how you are going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with kickstart to network boot the nodes and launch installs, then network boot with a pass-through to the local disk when they run. Once the initial install is done I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their mac address and booting them into an install. This is just one way of doing it though. You could use cobbler to tie everything together, or use FAI. XCAT is popular on big systems, or you could use system imager, or replace puppet with chef or cfengine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
Or if you are only talking about about like 8-16 nodes, you could just manually install ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on google calendar. ;) For the size of cluster you are talking about and what I assume is probably a very limited administration budget, that might be the best way to go. Even with someting like Rocks you are going to need to know what's going on when things break and it can get really complicated really fast.

Share
twitter facebook
- Re: (Score:3)
  
  by clutch110 ( 528473 ) writes:
  
  This post is full of good information. I have been managing HPC for seismic companies for the past 8 years now. I regularly use xCAT as I find that after a few nodes automation is the way to go.
  You will find that most clusters run RedHat or a variant of the OS. Most places run CentOS on the nodes and have a machine with RedHat stashed around somewhere in case a problem occurs and they need to reproduce it on a "supported" OS.
  Why is there a requirement for a full blown X install? Are these machines deskt
- Re: (Score:2)
  
  by Junta ( 36770 ) writes:
  
  Incidentally, I'm an xCAT developer and am always interested in ways to make it scale down a bit better as well as it scales up. Historically, it has been worth it at large scale, but a bit too much configuration for small systems. A lot of settings support autodetect now and if you don't care much about the nitty-gritty, applying the default templates provides a serviceable set of groups that can drive configuration instead of micromanaging all sorts of details.
  In terms of DHCP, it generally allows and u
Scientific Linux (Score:3)

by scheme ( 19778 ) writes: on Thursday May 26, 2011 @04:30PM (#36255628)

A lot of this depends on what you're doing with your cluster and what apps you're running. However, Scientific Linux is used by quite a few large clusters and all of the US ATLAS and CMS clusters run on. As others have mentioned, you probably want to be more interested in how the cluster is managed and nodes setup and kept up to date. I'd recommend something like cobbler and puppet or some other change management system so that you can setup profiles and automatically have that propagated to the various nodes automatically. This is preferable and easier than going through and making the same configuration changes on 5-10 machines.

Share
twitter facebook
Debian/Ubuntu (Score:2)

by dogmatixpsych ( 786818 ) writes:

I'd have to agree with the Debian/Ubuntu route if you want user friendliness. I've always found Debianesque systems much more manageable than other distros. If I have to provide most of the IT myself, I prefer Debian/Ubuntu. There are some science Debian distros as well (and repositories).

Scientific Linux would likely be faster overall for computationally heavy tasks but it really depends on what you are planning on doing. Debian wouldn't be slow, just not quite as fast as Scientific Linux; but again, t
Scyld Beowulf From Penguincomputing (Score:2)

by adamy ( 78406 ) writes:

Disclaimer, I worked on the produce for a number of years. I now work at a different Linux company...
Scyld is built on top of Red Hat EL, can also run with CentOS, but uses a custom Kernel. It has a lightweight provisioning mechanism that makes maintenance of compute nodes very easy, and the single system image approach makes job management significantly easier than a traditional Beowulf cluster. I don't know if they test it out with Scientific Linux these days.
RHEL/Scientific Linux & Perceus (Score:2)

by Kludge ( 13653 ) writes:

We use RHEL/Scientific Linux & Perceus (http://www.perceus.org/). It is solid and easy to add new nodes.
X11 ...server? (Score:3)

by Fishbulb ( 32296 ) writes: on Thursday May 26, 2011 @05:06PM (#36256180)

Correction: the X11 server runs on your glass; eg: your Windows system. All you need then are X11 clients on the Linux cluster nodes.
So yeah, you'll need the libs and other support files for X11, but not the server itself. You'll save a bit on disk space by not installing the server. If it's just a single X11 client you need to run, then you can figure out exactly what it needs and not have a bunch of other crap (fonts, *GL, window managers, libs you're not using...) installed. Plus, you won't have a daemon running that takes resources despite being idle, and is an attack vector since it manages user logins.

Share
twitter facebook
- Re: (Score:2)
  
  by WuphonsReach ( 684551 ) writes:
  
  Correction: the X11 server runs on your glass; eg: your Windows system. All you need then are X11 clients on the Linux cluster nodes.
  
  Ah, the joys of X11 terminology.
  
  I can understand why they flip-flop the server/client locations and understand that it's technically correct, but it confuses the hell out of folks more often then not.
a vote for Gentoo (Score:2)

by smoothnorman ( 1670542 ) writes:

I built a small (32 node) Beowulf cluster for an informatics group at the University of Bonn. We started off with a SuSE, discovered that it was hard to get some drivers compiled, then went to Debian, discovered that some of the boot up scripts were a bit troublesome to keep up high availability, then went to Gentoo ahref=http://www.gentoo.org/rel=url2html-17894 [slashdot.org]http://www.gentoo.org/ /> and were quite pleased how *everything*, including rebuilding a node up from the boot loader, could be scripted. Of
Who will administrate it ? (Score:2)

by godrik ( 1287354 ) writes:

I believe you can successfully build a computational cluster from any linux distribution. I am sure you could go wild and use slackware if you want.
But I guess the quesiton is who will administrate the cluster ? from what you say, I feel like you will and you say yourself you don't know much about that. Then I would recommend to keep the distribution installed by the vendor because they will probably give you software support. But if you change it, they probably won't.
Important things have already been told
Ubuntu 10.04 LTS with Sun Gridengine (Score:2)

by Falcdragon ( 732699 ) writes:

I'm not really a Ubuntu fan, but with the cluster I manage (120 physical cores, 960GB RAM) we've ended up going with Ubuntu 10.04 running Sungrid Engine for a couple of reasons. - The LTS support, by the time the support period ends we should be replacing the hardware any way. - It provides a Grid engine package by default (might not be the latest but it's good enough) for distributing the workloads - A lot of people are already familiar with Ubuntu - Most third party apps provide support for it - It's very
No X servers required on servers... (Score:2)

by ianezz ( 31449 ) writes:

It has to have an X-windows server since we use that remotely from our Windows

Just to clear out a misconception that arises from time to time: you do not need an X server on a server exactly in the same way you don't need a web browser on your HTTP server. To understand that, you can think of an X server as a "browser" for the X protocol. On the server you just need some support libraries (which help applications in talking the X protocol).
X server (Score:2)

by iceaxe ( 18903 ) writes:

I'll leave the clustering distro advice to others, but if I understand your needs regarding X-windows, what you need is an X server running on your windows (or other ) client machine so that the program running on the cluster can display on your desktop/laptop. The X programs may need appropriate libraries, but you don't need an X server running on the cluster.
See Xming [straightrunning.com] for a good, free, open source X server for windows. There are other options available, but that's what I use, and find it to be stable and
Asking in the wrong place... (Score:2)

by rgbatduke ( 1231380 ) writes:

It's a FAQ there, but you really should be asking this on the beowulf list, after skimming the list archives for any of the eight and a half million answers (in gory detail) that have been posted there in response over the years. Slashdot has plenty of nerds and I'm sure a lot of cluster geeks (who are likely on the beowulf list) but the beowulf list is sort of distilled cluster geekery/wisdom.

http://www.beowulf.org/mailman/listinfo/beowulf [beowulf.org]

rgb (Google "rgb duke beowulf" if you like -- I used to hel
What Will You Run, and Who Will Run It? (Score:2)

by KainX ( 13349 ) writes:

I'll preface this by saying that I'm an HPC admin for a major national lab, and I've also contributed to and been part of numerous HPC-related software development projects. I've even created and managed a distribution a time or two.
There are two important questions that should determine what you run. The first is: What software applications/programs are you expecting the cluster to run? While some software is written to be portable to any particular platform or distribution, scientists tend to want to
use what you're comfortable with. (Score:2)

by markhahn ( 122033 ) writes:

128 cores isn't enough to worry about - just install a distro you like and feel comfortable maintaining. although 128 cores isn't many, you should probably think about the style of install you want. lots of people seem to like diskful installs - afaikt purely because it's familiar. most significant clustering sites use diskless (NFS root) though, because it's so much easier to maintain. there's never any question of nodes getting out of sync. traffic due to NFS root is trivial. another best-practice is to c
- Re: (Score:3)
  
  by MikeDirnt69 ( 1105185 ) writes:
  
  Wrong too. Use the distro you work better with.
- Re:None of them (Score:4, Interesting)
  
  by cashman73 ( 855518 ) writes: on Thursday May 26, 2011 @11:11PM (#36259292) Journal
  
  Actually, in the realm of biomedical supercomputing, "none of the above" has already been done. Check out the Anton supercomputer [psc.edu] designed and built by D.E. Shaw Research. The entire supercomputer, right down to all of the processor cores themselves, were specially designed and built specifically for molecular dynamics research. The system has no operating system and, as such, no overhead. Every processor cycle goes straight into the calculations. It is capable of churning out simulations of 150,000+ atom protein complexes on the order of several microseconds long, using wallclock CPU time of a few days.
  
  Parent Share
  twitter facebook
- - Re: (Score:3)
    
    by blair1q ( 305137 ) writes:
    
    Imagine a Beowulf cluster of BeOS, beotch.
  - Re: (Score:2)
    
    by darkpixel2k ( 623900 ) writes:
    
    Just imagine a Beowulf cluster of bullcrap!
    Can you imagine the licensing costs of Windows bullcrap(tm)? At least with Linux it's free...
- Re:Ubuntu 10.04 LTS - Why? (Score:2)
  
  by Andy Dodd ( 701 ) writes:
  
  Simple question. The OP asked WHY you feel that is a solution for large-cluster HPC.
  It looks like so far your only reason is "i liek it!" - I personally have no opinion or experience with HPC clusters, but so far nearly all of those who do are recommending something that is either RHEL or RHEL-based (Rocks or Scientific Linux), if only because it allows you to leverage commonality with the big cluster operators with installations in the Top500.
  Disclaimer: I'm an Ubuntu user, and I greatly enjoy it, but I
- Re:Ubuntu 10.04 LTS (Score:4, Insightful)
  
  by Hatta ( 162192 ) writes: on Thursday May 26, 2011 @04:54PM (#36255986) Journal
  
  Why? I know Ubuntu is the standard recommendation for grandma these days, but what makes you think it's particularly appropriate for a computational cluster? For instance, do you really need GNOME on a high performance cluster?
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by cloudmaster ( 10662 ) writes:
    
    While I'd probably still recommend RHEL/CentOS/Rocks/whatever, to answer this specific question...
    Ubuntu is an easy-to-use polished layer on top of Debian's unbeatable history of Doing Shit Right. Yes, there are some mistakes in their history like everyone else, so skip the "but in 1996 Debian did some obscure thing wrong" and "one time some boob screwed up the random number generator in ssh" - but overall, Debian is an incredible base for just about everything. Ubuntu takes Debian's inherent coolness and
- Re: (Score:2)
  
  by sirlark ( 1676276 ) writes:
  
  I have to disagree. Ubuntu has a nasty habit of letting non-mainstream, non-desktop related bugs pass through several release cycles. We've just this last week spent 3 full days trying to figure out why my perfectly working NFS boot over PXE cluster broke when we did a safe upgrade. Turns out there's been a bug in portmap since lucid, which still exists in natty which causes the NFS rootfs mount to fail. We had to to recreate the filesystem from scratch and install lucid without updates, then hold portmap b
  - Re: (Score:2)
    
    by cloudmaster ( 10662 ) writes:
    
    So, there's been a bug for years, but you just hit it recently? Sounds like a new bug. ;)
    (I'll pretend I haven't seen all sorts of problems with NFS root on different Ubuntu releases for the last several years; the bug seems to relate to the way the mounting and detecting-of-mounting works; my name's probably in a few of the bugtracker threads)

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

YDL (Score:2)

RHEL (Score:3)

Re:RHEL (Score:5, Insightful)

Re:RHEL (Score:5, Informative)

Re:RHEL (Score:4, Funny)

Re: (Score:2)

Re:RHEL (Score:5, Informative)

Re:RHEL (Score:5, Informative)

Re: (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Gentoo (Score:3, Insightful)

Re: (Score:3)

Re: (Score:2)

Scientific Linux (Score:5, Informative)

Re:Scientific Linux (Score:5, Informative)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

NPACI Rocks (Score:5, Informative)

Re:NPACI Rocks (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Scientific Linux (Score:5, Informative)

Scientific Linux (Score:3)

Re: (Score:2)

Fedora (Score:2)

Re: (Score:2)

Which editor should he use? (Score:2, Funny)

Re: (Score:2)

Re:Which editor should he use? (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Rocks Cluster uses a modified Centos (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Distro isn't the biggie, it's the scheduler (Score:5, Interesting)

Re: (Score:2)

The distro matters in many ways (Score:3)

This Question (Score:3)

Re: (Score:2)

X window (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

SUSE Linux or SUSE Linux Enterprise (Score:2)

Red Hat for support (Score:4, Insightful)

Scientific Linux 6.0 or RedHat Enterprise 6.1 (Score:2)

Re: (Score:2)

Re: (Score:2)

NPACI Rocks (Score:2)

Funny how 128 cores used to seem like a lot (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

x-window server? (Score:2)

CentOS, Scientific Linux, Ubuntu, Debian (Score:5, Informative)

Re: (Score:2, Informative)

Debian (Score:2)

Response & questions (Score:2)

debian squeeze (Score:3)

Swing the license for RHEL (Score:3)

Building Clusters (Score:5, Informative)

Re: (Score:3)

Re: (Score:2)

Scientific Linux (Score:3)

Debian/Ubuntu (Score:2)

Scyld Beowulf From Penguincomputing (Score:2)