Recommendations On Supercomputing Hardware?
dameon asks: "I have been asked by my supervisor to select a replacement for our current SGI Onyx2 space heater. The current setup contains 24 x 195 MHz IP27 processors, 12 GB main memory, and around 140 GB of total storage space. We use it to run a bunch of CFD (computational fluid dynamics) code. Currently the demand on our system is so high that the jobs are backing up. So, they came to me with two quotes and said: 'Which one is better?' I have had limited experience in the field of powerhouse number-crunchers. The two quotes I have received are from HP and SGI. SGI's quote is for an Origin 3400 with 12 GB memory, 24 x 400 MHz/8 MB R12Ks, and 1/2 TB of storage space. HP is offering three 9000-series N-4000s adding up to about the same specs in total, with the exception of the processors. HP is offering 550 MHz PA8600 (1.5 MB) processors in their setup (it also has more storage space, set up with a HyperFabric configuration). All of the software we use will run on both platforms. So, I would like to put this to the Slashdot community: Which one is better?"
"The HP system is freaky expensive, but is the extra 150 MHz/processor worth the extra money? What else do I need to take into consideration? SGI's processors (while slower) have more cache. Overall, what do I need to look out for when spending this much money? What is the best deal? Am I missing another possible solution altogether? And yes, I already suggested a cluster of linux boxes similar to the one at Los Alamos, but the apps we use have no Linux support."
Have both vendors benchmark one of your apps... (Score:1)
In my experience, SGI has a better HPC software development environment (read: Fortran compiler) than HP, but then again I haven't touched an HP system since we decommissioned our Exemplar 2.5 years ago.
--Troy
If not borrow... (Score:1)
Definitely pursue competitive bids from IBM and Compaq.
Good judgement comes from experience, and experience comes from bad judgement.
Re:I'd go with the Origin (Score:1)
Secondly, and straying off-topic, I have absolutely NO experience in Irix, but I currently admin about 40 *nix machines (mostly HP and Sun), and I have found HP-UX to be quite reasonable in the admin department. I would be very interested to know what it is that makes Irix nicer to admin. Seriously. I have heard the same said about most of the major Unix variants at one time or another, but never with any justification. Is it just that some people are more familiar with one (as I am with HP-UX) or is there really a big difference?
Re:We like SGI's (and Suns) (Score:1)
IBM and HP have very fast processors but are also very expensive. Generally not worth it unless you have a good reason for choosing them.
Especially if you are using that kind of memory I would recommend the SGI. It is cheaper. The processors may not be the fastest but they aren't slouches either. If they want to upgrade the machine there is a very clear upgrade path (just plug in more components).
I'd go with the Origin (Score:1)
Back-plane architecture is super fast. It's the I/O champ of big computers.
Nicer admin tools
Your site people already know IRIX
I'd consider offerings from Sun, but if the only choices are SGI or HP - it's SGI hands down.
Fast RAIDs and SANs: http://www.datadirectnet.com/ (though they don't have any SANs for SGI yet)
Re:Linux and Intel (Score:1)
Heck, if you want shared memory supercomputing power, get the real stuff, either a NEC, a Fujitsu or a Hitachi (Crays are now way behind in performance for their shared memory offerings, they only have NUMA T3s).
For many (real) applications, a NEC SX-5 with 16 CPUs will kick an AlphaServer GS 32 CPU box right out of the water. Don't look at the Top 500 (they're near the top anyway), their benchmark is so parallel that a distributed.net-style effort could get the top spot...
http://www.hstc.necsyl.com/ [necsyl.com]
--
Pierre Phaneuf
Re:NEC SX serie (Score:1)
About some talks about memory bandwidth: the data bus is 256 BYTES wide. I repeat: this is not "bits", but "BYTES"!
There is also a crossbar interconnect device that can connect a number of SX-5s together at 8 or 16 gigabytes per second (I think it's 16, but I'm not sure).
E-mail me if you want a sales contact.
--
Pierre Phaneuf
port your app!! (Score:1)
Hence get your CFD vendor to port their code to Linux/x86 and buy yourself a stack of Athlon (thunderbird) 950's. These are nearly as quick as the fastest Alphas (ev67/667/8M)... For the money you'll get about 4 times the performance out of your purchase if you can go with Athlons... If your CFD vendor won't port then find another vendor!
If you have to go with the expensive hardware then look on the SPEC website (www.spec.org) to see how they compare - SPEC is essentially biased towards the scientific and CFD apps you are running. Look for details on each part of SPEC to see which is closest. Also look at Compaq gs320 hardware - it might be the fastest of the 'big iron'
If Athlons came in duals or quads they'd be the only choice. We're still forced to look at Intels (which are about 30% slower per MHz and cost more than Athlons) just 'cos they come in duals
And once again, try your app on the real hardware before handing over that sort of cash.
Re:Linux and Intel (Score:1)
This assumes a) fast Athlons (not necessarily duals) b) distributed memory (aka MPI) apps are ok.
Linux and gcc aren't superior to proprietary OS's (such as IRIX, Tru64 on SGI and Compaq/Alpha hardware) or their excellent C compilers, but it doesn't matter 'cos MIPS, Alpha, SPARC and HP MHz just haven't kept up with the x86 crowd.
Of course if you need large shared mem boxes then SGI and Compaq et al are your only choice, and boy do you pay for it...
Re:port your app!! (Score:1)
... and before anyone says that a stack'o'smaller boxes won't work - the guy said that his CFD app wouldn't run on linux, NOT that it required a large shared mem machine. Hence distributed mem systems may be an option. Even a pile of es40 Alphas would give you better price/performance than a single big shared mem box... not anywhere near the price/performance of linux/athlon, but hey.
Re:I'd go with the Origin (Score:1)
If you are curious about the Sun, you can spec & price one yourself on their website. You would want to look at maybe a 32 processor E10000. The apps *may* run on it, but they are usually used as database servers. The E10000 architecture is from Cray. Sun bought the E10000 product line when SGI bought Cray Research Inc since the E10000 competed in the same market as Origins. E10000's sell better than Origins though.
Re:There ARE no others to quote! (Score:1)
Compaq has several Alpha [compaq.com] options.
(I don't work for Compaq, but I use a big pile of DS10L [compaq.com]'s.)
Alpha (Score:1)
If you need shared-memory, you'll have to pay the bucks and get one of Compaq's larger systems and run Tru64Unix.
If you can do smaller or distributed-memory (MPI) jobs, get a number of the 1-space DS10L [compaq.com] and run either Tru64Unix or Linux [compaq.com]. The other great thing about going Alpha/Linux is that Compaq has ported their excellent compilers [compaq.com], so you don't have to give up performance by going with Linux.
(Yes, I know you can run Tru64U-compiled executables on Alpha/Linux by copying the appropriate libs, but strictly-speaking it's a violation of the Tru64U license. Please let's not get into a discussion of how this is a great arg for open source solutions, etc. I agree. Run Linux on your Alpha, use gcc if that's good enough, buy Compaq's compilers if you need performance.)
Shared Memory, etc. (Score:1)
Most importantly don't get a machine to which you can't port your code reasonably. BTW, why the hell can't you run your codes under Linux? Is this a commercial app like Fluent, FIDAP, Inca, etc?
Re:We like SGI's (and Suns) (Score:1)
Not to be too snide, but your employer does heavy-duty CFD in MATLAB?!?!?!?! I hope these scripts are calling C or Fortran somewhere.
Seriously though, what kind of performance are you (they) getting out of MATLAB?
Re:I'd go with the Origin (Score:1)
Re:Shared Memory, etc. (Score:1)
Make the salesmen work for their commission! (Score:1)
Re:I'd go with the Origin (Score:1)
However, the champion system for ease of administration is IBM's AIX. With the "smitty" tool, you can add a few new disks and set up a couple of mirrored file systems (optimized for database I/O performance) in about 10 minutes, compared with about 45 minutes on HP and days of manual reading on Solaris.
Re:We like SGI's (and Suns) (Score:1)
PSX (Score:1)
go with the trend: beowulfs (Score:1)
This needs a little more looking into: why won't it run on linux? I assume it's Fortran (or C, in which case you're super lucky). What code compiles on IRIX and HP-UX, of all things, but doesn't compile under linux? Have you tried proprietary (eg PGI) compilers? Is this a library issue? If I were you I'd look into that angle a little more.
Why would I pick a beowulf?
- transparency: no proprietary clustering, SMP or other code. You will need to use either home-cooked or open source tools to administer it (ANL's chiba tools being an example of the type), but that is much more transparent than either HP's or IBM's ways.
- Linux: if you have a problem, you have a huge base of users and admins to talk to (ExtremeLinux folks, Beowulf underground folks etc etc). Open source makes problems ultimately the admin's responsibility, and you might not like that accountability, of course. But if you've dealt with large company support, you'd know it's not the best thing to depend on.
- Cost: reasonable beowulfing hardware costs a small fraction of what O2k's, SP's or that class of machine cost.
- Reusability/Extensibility: PCs are easy to downgrade to desktops after a while, just by adding a good video and sound card.
- Scalability: first off, I know from personally having administered beowulfs larger than 24 boxes, that there are next to no scalability issues with that order of number of machines. On the other hand, you can add on as you go. Beowulfs typically perform well in inhomogeneous environments.
I'd go with Alphas, since CFD + a lot of other scientific computing applications in physics and chem are fp intensive.
Hope this is somewhat comprehensive.
Btw, everyone's doing this: it's not risky anymore. On the other hand the beowulf approach has its merits, it's not a fad. That is to say, I think linux clusters, as opposed to SPs are here to stay.
--dubido
...cant you borrow them? (Score:1)
Are you going to keep the present system running in parallel? if not what do you plan to do with it?
If your system is only just starting to back up, and the new [sgi] system represents at least a doubling of power, then with both SGI systems running you'd have 3+ times the power. Is that enough for the foreseeable future?
Surely IBM and Compaq (ie DEC) are worth getting a quote from before you commit?
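The "3+ times the power" estimate above can be sanity-checked against the clock figures in the original question. This is only a rough sketch: it assumes throughput scales with CPU count times clock speed, which ignores cache, memory bandwidth and I/O, so real CFD jobs may behave quite differently. The function name is made up for illustration.

```python
# Rough capacity estimate using only the clock figures quoted in the question.
# Assumption: throughput scales with CPU count x clock speed. This ignores
# cache, memory bandwidth and I/O, so treat the result as a crude estimate.

def aggregate_mhz(cpus: int, mhz: float) -> float:
    """Total clock capacity of a machine in MHz (hypothetical helper)."""
    return cpus * mhz

onyx2 = aggregate_mhz(24, 195)        # current Onyx2: 24 x 195 MHz
origin3400 = aggregate_mhz(24, 400)   # quoted Origin 3400: 24 x 400 MHz

new_vs_old = origin3400 / onyx2               # new box relative to the old one
both_vs_old = (onyx2 + origin3400) / onyx2    # both machines kept in service

print(f"Origin 3400 alone: {new_vs_old:.2f}x the Onyx2")
print(f"Both together:     {both_vs_old:.2f}x the Onyx2")
```

By this crude measure the new machine is a little over twice the old one, and keeping both running gives a little over three times today's capacity, which lines up with the estimate in the comment.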
Re:I'd go with the Origin (Score:2)
Yes, SAM is nice, and yes, most things on solaris do take days of manual deciphering to accomplish. However, I have heard (totally unsubstantiated, mind you) that, with its reliance on database configs rather than text files, AIX is quite difficult to admin w/o the aid of smitty or Xsmit(smitX?). That's my main beef. Working in a highly heterogeneous environment, I cannot afford to get too comfortable with "flavour specific" tools, lest I forget how to do it all at the command line.
That's why I like to install the GNU utils on every machine I admin.
Unix ain't Unix, but GNU *IS* Unix... sorry, RMS!
(Hey, I like that. Meet my new
NEC SX serie (Score:2)
NEC is involved in some Linux strategy, even at the level of its supercomputers. For example, the SX-5 can use an Intel-based Linux machine as a support system.
--
Pierre Phaneuf
End Result NOT Clock Speed Or Cache (Score:2)
All of those things factor into how many instructions per second it can handle but you can't judge a device by that either. Little things like getting the data in and out need to be considered too.
The software you run might run better with one hardware setup than another too.
I don't think that you're going to know which one is better until you try each computer for yourself. Crunch the same numbers on each and see for yourself which one you're more happy with.
You wouldn't buy a car without a test drive, and you wouldn't buy a house without a walkthrough. Don't buy a computer until you've crunched some data. With the kind of money your company will be spending for the darn thing, you're going to have to live with what you get for a while, so be damn sure you're happy with its performance.
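For what it's worth, "crunch the same numbers on each" can be as simple as timing an identical job on each vendor's loaner box and comparing wall-clock times. Below is a minimal sketch of such a harness. The names best_wall_time and stand_in_job are invented for this example, and the stand-in workload is only a placeholder, not a CFD code; swap in whatever launches one of your real cases.

```python
import time

def best_wall_time(job, repeats=3):
    """Run job() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best

def stand_in_job():
    # Hypothetical placeholder workload: replace with a call that runs
    # one of your real CFD cases on the machine under test.
    total = 0.0
    for i in range(1, 200_000):
        total += 1.0 / (i * i)
    return total

if __name__ == "__main__":
    seconds = best_wall_time(stand_in_job)
    print(f"best of 3 runs: {seconds:.4f} s")
```

Taking the best of several runs cuts down on noise from other users on the box. Run the same input on both machines, and time a loaded machine too if your jobs normally share it.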
Other quotes! (Score:2)
Re:Other quotes! (Score:2)
Re:...cant you borrow them? (Score:2)
#include<std_disclaim.h> (Big Blue)
--
There ARE no others to quote! (Score:2)
I think you'll find that the 400MHz R12k's (of which we have 8 in an Origin 2000) kick serious ass. The huge cache and fat pipelines beat 800MHz Pentiums by a factor of at least four in our cursory benchmarks on some of our non-parallel codes.
Bingo Foo
---
HP thoughts... (Score:2)
I am working for a small company with needs for reasonable computing power and we just purchased a top-of-the-line HP workstation.
I am more of a hardware guy. This machine is very well built; not like the cheap 712/60's I've used in the past. The electrical and mechanical construction of this computer reminds me of the big iron machines of the past.
Plus, it's faster than hell.
So, from a hardware standpoint I love HP; it is sweet. But as others have remarked, HPUX is a pain to admin--at least for a HPUX newbie. I have admin'd Linux and BSD boxen for seven years, but HPUX is a new animal. If it ever gets even a tiny bit confused, it throws its hands up in the air and vomits. This annoys me.
As far as SGI systems go, they have never struck me as very well built, even though I like them. A few years ago the University I was attending (getting a MS in EE) purchased what was then the fastest graphics computer in the world, a many (128?) processor SGI/Cray in nine racks. The dust level in our machine room (which was pretty high by computer room standards--it was installed by Univac for the 1108) caused it to fail within a month. Even with the dust problems fixed (the FS guys took the entire thing apart and vacuumed each board--plus replaced a lot) the machine has been a constant battle, and Irix issues cause it to reboot frequently.
A 1986 vintage HP9000 series 800 mini (the fastest box in '86) has been running nonstop in the same room for going on fifteen years with almost no downtime. Yeah, it's old and is more dust-tolerant, but it also seems better built.
Clearly I'm not an expert, but I would vote for the HP hardware over the SGI. Too bad you can't run your apps on the HP box with Linux installed...
We like SGI's (and Suns) (Score:3)