Which Processor Is Best For Real-Time Computations? 232
NoWhere Man asks: "For the longest time my friends and I have been arguing over which processor is better (Intel or AMD). I know this is an ongoing battle everywhere as well, but it took an interesting turn the other day. Which processor would be better for realtime, high end mathematical computations? AMD's Athlon? P3 Xeon? or Dual Processors? If anyone could recommend system specs, keeping it cost effective at the same time, it would help."
Re:Couldn't you just analyze the program? (Score:1)
Unfortunately, you CAN'T do that. It's not just the instructions that get sent to processors, but the things inside the processor's registers (which can get modified after an instruction) and the cache. So while you can send instructions to two processors, they can't share registers (or if they can, you can't guarantee that they'll be accessed/modified in the right order).
For those of you who aren't programmers/engineers, registers are what hold your immediate data inside the processor. Data which changes or is accessed frequently is stored in a CPU register. Access to a register is *much* faster than to memory. Each CPU has its own registers. At the lowest level, the entire program (thread/process) is controlled with these registers. Since registers aren't shared between processors, this sort of thing isn't possible.
This is why a single thread can go only to one processor. If your program runs using multiple threads or processes, that's a different story.
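The multi-process story above can be sketched in a few lines (a Python illustration of my own, not anything from the original post): each worker is a separate process with its own registers and stack, the work is split into independent chunks, and the results are just added up afterward.

```python
# A single thread runs on one CPU; to use two CPUs, split the work
# into independent processes, each with its own registers and stack.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    with Pool(2) as pool:  # one worker per CPU
        chunks = [(0, 500_000), (500_000, 1_000_000)]
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same answer as a single-threaded sum(range(1_000_000))
```

The point is that the chunks share nothing, so no register state ever has to cross between the CPUs.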
Re:Couldn't you just analyze the program? (Score:1)

Yeah. And with 9 women, you can make a baby in 1 month. (Something like: one does a leg, another does the internal organs, and so on...)
Cheers
--fred
Consider timer interrupt frequency. (Score:1)
The x86 processors have a timer interrupt every 10 milliseconds, the Alpha every millisecond. If one is using a non-real time OS, the greater interrupt frequency will give nearer real time performance (correct?).
PAC
A Beowulf cluster of 65C02 emulators? (Score:1)
Oh wait, today's the second...
Mark Edwards [mailto]
Proof of Sanity Forged Upon Request
Re:Alpha Alpha Alpha! (not $$$ SGI) (Score:1)
If SMP did not work, then why clustering? (Score:1)
Non-x86 Architectures (Score:1)
Re:Alpha Alpha Alpha! (not $$$ SGI) (Score:1)
Re:the 68000 BABY!! (Score:1)
the best system ever (Score:1)
VMS 5.2
8 megs of RAM
you get one of these puppies, you be encoding mp3s at the rate of 1 every
few days! i have used one of these for years and i must say it is simply the
best.
ps: VMS RULES YR WORLD
pps: eat me.
------------
a funny comment: 1 karma
an insightful comment: 1 karma
a good old-fashioned flame: priceless
Re:I don't think you understand what you are askin (Score:1)
Re:the 68000 BABY!! (Score:1)
Assuming it's the latter, he'd have to plug those 10000 processors into some other hardware to make them work. Even if it were possible, it would be very difficult and expensive. Plus the fact that the overhead of the communication between all those nodes would seriously diminish performance.
Re: complaints from Tyan and Abit (Score:1)
this just in-sorta related (Score:1)
Athlon or dual Celeron's (Score:1)
Re:Intel (Score:1)
Re:R.A.I.P. (Score:1)
Check out the new toys at Microway [microway.com] -- 750MHz 21264s, 48-node Beowulf clusters, woo hoo!
Re:Broader View (Score:1)
As far as I know, the Athlon chips are SMP capable. The problem is nobody has yet produced a chipset which supports multiprocessing Athlons.
I think AMD is currently concentrating on the desktop market (where few people go for SMP), which is why they haven't been as aggressive as I'd like in getting an SMP motherboard out.
Initially (last summer), AMD's FAQ said to expect an SMP board in Q1 of this year. Obviously that hasn't shown up yet, but the last I've read is that the AMD SMP chipset (and motherboard) is due sometime in Q3.
I hope they hurry; not being able to buy an SMP Athlon is the only reason I haven't upgraded my system yet.
Re:I don't think you understand what you are askin (Score:1)
It's kind of like "The Hitchhiker's Guide". We all know that the answer to the ultimate question of Life, the Universe, and Everything is '42'. The problem is that none of us seem to know what the actual question is...
On the other hand, the poster may be asking *exactly* the right question. In which case, I'm rather curious as to just what it is he *is* working on.
Sincerely,
Steve Bergman
Re:multi-threading (Score:1)
Problem domain is what matters (Score:1)
Re:But dosn't having one CPU keep latency down? (Score:1)
Re:to ALL my brothers and sisters in latency (Score:1)
You're the first one I have seen in this post that understands what's involved. Who cares how fast your CPU is if your A/D converters can scan only 10 times a second? Trying to model processes that run at a rate faster than your hard drive's seek + write times is senseless to do in a real-time system... because it can't be done. Unless you want to throw away all data except the final calculation. But if you do that... how do you verify your model is correct??? Obviously, from this discussion most
Re:Couldn't you just analyze the program? (Score:1)
As I understand it, the forthcoming IA-64 architecture is supposed to implement this very scheme in hardware. It is one of the reasons that the compilers for the IA-64 chips are going to have to be 'smarter'.
Re:Architecture makes the difference (Score:1)
Perhaps I should be a bit more specific. When I hear "real-time" I think "hard real-time." There is also "soft real-time," such as that used in multimedia apps, which basically means "as fast as possible."
Neither; pick a DSP (Score:1)
You'll find them doing things such as encryption, compression, audio/video processing, yada, yada. All in real time.
See this [ti.com] for more. (a bit dated but still relevant).
Re:Discrete Event Simulation PIII -v- SPARC (Score:1)
It's hard to predict the speed difference of machines unless you take everything into account. A faster CPU is just one part of the equation.
'real time' mathematics (Score:1)
The question you need to be asking yourself is what real-time means for your application, and what calculations you need to perform in 'real time.' In the real-time data analysis example, if your data arrives every 10 ms, real time means cooking everything in 10 ms. Fine. In comparison to a wimpy old 386, an RS6000 performs many more ops per 10 ms cycle, and e.g. the FFT you need to take can be so much more accurate than it was before. If you intend on doing real-time mathematics, you should really be writing optimized assembly. Yep - instruction sets matter. And how your code is laid out matters. This will make much more of a difference than Athlon vs. PIII.
In the end the choice of which x86 processor will be nearly irrelevant. The goal of real-time is to get some realistically accurate calculation fitting in a bounded time interval. Once the processor is 'fast enough' or your code is 'fast enough' it doesn't matter.
ps. I am about to flame you : )
Re:The stupidest question I've ever heard (Score:1)
I might add, why are they even asking about Intel or AMD? If you want to do high level computational maths then I would NOT recommend either of these.
Maybe A G4 (with altivec) could be considered, or perhaps Sparc or MIPS or Alpha.
Oh, but that's right. Nobody on Slashdot would have heard of these processors, because we are all x86 users, right? And we overclock our celeries to 600MHz!!! And we run Redhat Linux cos it's da best and we can compile open-source from rootshell.com
Get a life.
Does what used to be a *decent* forum for geeks have to be turned into a place of cluelessness and trolls by a bunch of teenage kiddies?
pfft.
anything but x86 (Score:1)
Re:Thinking for difficult operations (Score:1)
Re:How much are we talking about? (Score:1)
Re:Athlon (Score:2)
I am talking *nix here. The flavour is not really important, but for number crunching your *nix machine you don't want X windows (except for Mathematica), and you definitely don't want it running your web/mail server.
Any *nix really - it is not overly important with the exception that it must not be a "leaky" implementation, and it must have a good, optimised gcc and fortran90 implementation. Another useful language (among others) is perl (yes, perl).
A single-CPU process on a Cray J90 will run at approximately the same rate as it will on an x86 (PIII-550), provided various operations are made on the same chunk of data (i.e. it is operating out of the P3's cache).
A celeron will not perform as well as a PIII if there is not the same data having continued operations performed on it - this can mean that trivial programming decisions such as putting a do-while outside of your for-next instead of INSIDE can mean minutes in program runtime. What I am saying here is that CPU cache is of great importance if you are dealing with anything but small quantities of data.
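The loop-placement point can be made concrete with a toy sketch (Python stand-in of my own; it illustrates the access pattern, it is not a benchmark): walking a matrix in the order it is stored keeps repeated operations on the same chunk of data, which is exactly the "operating out of the cache" condition described above.

```python
# Two traversals of the same row-major matrix. The row-order walk touches
# memory sequentially (cache-friendly); the column-order walk strides
# across rows (cache-hostile on large data). The results are identical --
# only the access pattern, and hence cache behavior, differs.
N = 200
matrix = [[i * N + j for j in range(N)] for i in range(N)]

row_order = sum(matrix[i][j] for i in range(N) for j in range(N))
col_order = sum(matrix[i][j] for j in range(N) for i in range(N))

assert row_order == col_order  # same answer either way
print(row_order)
```

In a compiled language on a real data set, the first form stays in cache and the second thrashes it, which is how a "trivial" loop-ordering decision turns into minutes of runtime.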
Sun (Sparc), SGI (MIPS) and the other traditional number crunching processors work well, but a lot of this is the systems they are plugged into. A 4-CPU Ultra Enterprise 450 whips butt over something else that clocks at the same frequency - but remember the UltraSparc processors can do more than many others; and they are all-SCSI.
The Motorola-family processors: 68K, PPC, etc. are seriously useful for number crunching. We have tested clustering 200MHz PPC processor-based (603e) machines (actually Mac clones) and the results were impressive. Even single threads worked surprisingly well for some complex tasks. From people I have been speaking to, they have had similar results with G3s.
Morons! (Score:2)
Athlon 600 : specInt: 28, specFP: 22
Pentium III 600: specInt: 24 specFP: 15.9
Add to this that floating point calculations are the more important of the two, and the Athlon is the clear winner. Now, there is another contender: the IBM 7400 (G4). Its spec scores are:
7400 (G4) 450MHz: specInt: 21.4, specFP: 20.4
pretty close to the Athlon 600MHz. But the important part is the Altivec unit (Velocity Engine), which is a monster 128-bit wide SIMD math destroyer. The only thing is the software has to be optimized for it. With an Altivec-enhanced RC5 decryption client, a G4 450 outperforms a 1GHz (700MHz overclocked) Athlon. With properly coded programs this thing absolutely screams. So, if you are writing your own program, and are proficient enough to include Altivec, a G4 may give you the most bang for your buck. The only way I know how to get one is to buy an Apple, though I hear IBM may be releasing reference boards for Linux systems. As for multiprocessor systems: I wouldn't suggest them unless you know specifically that your math calculations won't be done in a series of steps - i.e. that one calculation can be performed without knowing the results of a previous calculation. Which, by the way, is rather unlikely. If you want some more information, check out these articles.
G4 vs AMD Athlon:
http://www.arstechnica.com
Comparison of Altivec, and some other SIMD's:
http://www.arstechnica.com/cpu
Pentium III vs. Athlon:
http://www7.tomshardware.com/cpu/99q3/990809/index.html [tomshardware.com]
http://www7.tomshardware.com/cpu/99q3/990823/index.html [tomshardware.com]
Spec scores taken from http://www.ugeek.com [ugeek.com] If you have any more questions, you can email me at guso@geek.com
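What "128-bit wide SIMD" buys you can be sketched in plain Python (my own illustration of the concept; this mimics the data layout, it is not actual Altivec code): one Altivec register holds four 32-bit floats, so one instruction does four multiply-adds at once.

```python
# Scalar code does one multiply-add per step; a 4-wide SIMD unit like
# Altivec does four per instruction. We mimic that by processing the
# arrays in 4-element groups -- each group stands in for one vector op.
def scalar_madd(a, b, c):
    return [a[i] * b[i] + c[i] for i in range(len(a))]

def simd4_madd(a, b, c):
    out = []
    for i in range(0, len(a), 4):  # one "instruction" per group of 4
        out.extend(x * y + z
                   for x, y, z in zip(a[i:i+4], b[i:i+4], c[i:i+4]))
    return out

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0] * 8
c = [0.5] * 8
assert scalar_madd(a, b, c) == simd4_madd(a, b, c)
```

This is also why the software "has to be optimized for it": unless your code lays the data out in those 4-element groups, the vector unit sits idle.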
Real-time and high-end math don't mix well. (Score:2)
Most math problems take a variable amount of time to do. And if you don't want to always assume the worst case, you can't do it in a guaranteed amount of time.
The best solution, in my opinion, for a system that, say, collects data in real time and does analysis on it is to have a machine (or part of the machine -- wait a minute) running on a guaranteed real-time operating system, with the math stuff queued up and done later on another machine. For instance, have one machine do measurements and spit out data over a serial port, and another machine that reads from the serial port and does the Fourier transforms or whatnot.
You don't necessarily need another machine, however. Real Time Linux allows you to have guaranteed processor time / timer interrupts... all the things you need for real-time tasks... and you run all the rest of the Linux stuff after all the real-time stuff finishes. This means you could use the same machine for reading as for analysis. Have the data collection as a real-time thread, and do the analysis and other stuff in normal mode. I bet other OSs have this too, but I'm only familiar with RTLinux.
If you really need to do the heavy math in real time, I'd test how fast the math stuff runs on it, and make sure that it runs in roughly 1/4 the time you need. That should leave you with enough leeway so that you don't have to worry about caching, etc. as much, but can still leave the caching on (because caching really helps). Unless it's something where you can never fail, ever, like where human lives are at stake. But then you shouldn't be using plain-vanilla PC stuff anyway. In any case, you'll have to run real-time software (like Real Time Linux, or no OS and do interrupt stuff).
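The collect-now, analyze-later split can be sketched with ordinary threads and a queue (a plain-Python illustration of the idea, not RTLinux's actual API): the collection side does only cheap, bounded work per sample, and the expensive math drains the queue at its own pace.

```python
# Decouple time-critical collection from heavy analysis: the collector
# just timestamps and enqueues; the analyzer drains the queue at leisure.
import queue
import threading

samples = queue.Queue()

def collect():
    for t in range(100):           # stand-in for reading an A/D converter
        samples.put((t, t * 0.5))  # (timestamp, reading) -- cheap, bounded work
    samples.put(None)              # sentinel: collection finished

def analyze():
    total = 0.0
    while (item := samples.get()) is not None:
        total += item[1] ** 2      # stand-in for the expensive math
    return total

threading.Thread(target=collect).start()
result = analyze()
print(result)
```

In a real system the collector would be the guaranteed real-time thread and the analyzer an ordinary process; the queue is what lets the deadline side stay deterministic.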
I still find that the best solution for doing real-time stuff is some nice microcontroller code, if it doesn't involve too much heavy processing power. Stick a PIC in there; you can count up exactly how long it will take really easily.
Broader View (Score:2)
PIIs pretty much smoked the hell out of the K6-2. At the time, the K6-2 was mainly just a low-cost alternative. Along came the K6-3, however, and that all changed. (Unfortunately, the K6-3 seems to have slipped between the cracks, and is somewhat hard to find these days.) On an identical system I had both a PII-450 and a K6-3 400 (of which, I might add, the second cost about 1/3rd of the first). For floating point, the PII was certainly more impressive - Quake ran at a significantly higher framerate. But for most everything else, from running Netscape to compiling the kernel, the K6-3 pretty much rocked the PII!
So the K6-3 is now my server processor. My website [dusk.org], my mud [dusk.org], and in fact any non-FP duties I delegate to those nice-n-cheap K6-3's. (You can get a K6-3/400 for $80 now, and there are 475 and 500 MHz versions on the way.)
If you have significant floating point operations, then the PII smokes anything in the K6 series.
On the higher end, the PIII is not much more than the PII - just higher clockrates and some FP enhancements. Coppermine gives it a nice speedy bus throughput, so I would say that a PIII/Coppermine/SMP system would certainly make a very nice server - not cheap, but still cheaper than the equivalent in, say, Alpha processors.
The Athlon, on the other hand, destroys the PIII when it comes to floating point performance. Anything that relies on raw FP performance, such as ray-tracing or other 3D rendering, will show the vast superiority of the Athlon. For other tasks I believe that the Athlon and the PIII (w/ Coppermine, anyhow) are more or less equivalent.
However - and here's my big complaint - there's still no SMP Athlon! That really, really sucks. Considering that the Athlon is down to $1 a MHz for the mid-range speeds (eg, 700MHz or so), it's almost a crime that there's no SMP motherboard available. A two or four processor Athlon system costing less than $2000 could probably do the same amount of rendering as a $10,000+ Alpha system. It's a REAL shame.
(d) None of the above (Score:2)
Do you know what real time means? (Score:2)
The two processors pitted against each other rely heavily on caching to achieve their performance. Caching makes it difficult to make real time predictions, unless you stick to the worst case analysis: i.e. ensure that the deadlines can be met even with all caching disabled.
As for the software, you need a real time operating system. Not a workstation OS like Linux that can disable interrupts on a processor for eons of processor time, and cannot be preempted while running kernel code.
I think you may be talking here about quasi-real time or soft real time: which means ``fast enough to draw pretty pictures on my screen at a decent refresh rate when my system is not too bogged down''.
Re:I don't think you understand what you are askin (Score:2)
His question, as stated, can not really be answered. Thus, we have to second-guess the guy in order to provide *any* answer. "Real-time, high-end mathematical computations" doesn't make much sense. About the only systems that really match that description are military grade custom hardware systems. If he is truly trying to implement such a system on commodity x86, then he doesn't understand what he is getting into. This, in turn, means that he hasn't done his homework, and is wanting the Slashdot community to do it for him. To use the Unix lexicon, he should RTFM.
Now, if he *had* done some homework, and was instead asking about people's particular experiences with certain systems or configurations to gauge how the theory works out in practice, then I would have sympathy for his cause. But, asking about *just* the microprocessor implies no understanding of the situation. Actually, your processor has very little to do with the real-time behavior of a system. Real time characteristics are more influenced by choice of OS and memory system.
However, I doubt that he was really asking about anything hard real time. He may be trying to build a shoutcast server, on-the-fly mp3 encoder, or somesuch. This is a fairly interesting project, and fits the description of "soft" real-time. It does not fit the description of "mathematical", however.
Given the scenario presented doesn't much pertain to processors, it seems to come down to "Which is better: Athlon or PIII". Without background, we can only answer this in the general case. In the general case, this question has been answered *many* times over and does not need to be repeated in Ask Slashdot.
I'll apologize now if I came off as being abrasive, but it is irksome when people ask questions that they don't even understand enough to communicate properly. My impression is that he threw "mathematical" in there just to sound more interesting, but I could be wrong.
--Lenny
Re:Consumable Processor Units (Score:2)
>you see a 50% increase in overall
>system performance
This simply isn't true if you are running a single instance of a single main application, and there are only background OS tasks competing for the cycles. If your OS supports thread/CPU affinity, you will see one CPU go to 100% utilization, and the other sit at around 2-3% servicing the OS tasks. If your OS does not use affinity and tries to spread the thread between CPUs each timeslice, you will see both CPUs at around 52% utilization.
If your machine was heavily loaded and your app was slowed because it was already competing for cycles with other processes, then what you said is true, but I don't think that's what the original question was looking at.
-Andy
Consumable Processor Units (Score:2)
The question is a little broken, though, because in this day and age the trailing-edge processors (eg, Celeron-400) are so cheap that you would be better staging your machine to use, say, three successive rounds of trailing-edge processor(s) on a BX PC100 motherboard, upgrading as Intel and AMD update their price lists. You should bear in mind that the fastest CPUs are only double the speed of the modern trailing-edge ones (I mean here Celerons, not AMD K6-2s), yet cost five, six times the price or more.
I am sure plenty of people will disagree, but nowadays the CPU is more or less a consumable (especially if you have other Slot 1 motherboards that can get the hand-me-downs).
-Andy
Not an x86 (Score:2)
Re:DO you know what Super computers are for? (Score:2)
Seriously - a couple of points
(1) - It's pounds, not dollars - yep, almost 400,000 dollars worth
(2) - The DS20 has an even faster memory bus -- I mean Waaayyyy faster - I've seen benchmarks for 8-processor Alpha-based Beowulfs and they're *still* faster than the SGI hardware.....
Of course... I've always thought of SGI as being nice graphics platforms... so God knows why we got a big cabinet with no graphics card...
Re:Alpha Alpha Alpha! (not $$$ SGI) (Score:2)
Re:New category (Score:2)
If the OS supports threads, there is a good chance that if your applications are designed for threads then the OS will spread the threads across processors. Having come from UNIX in the 80's on x86, to OS/2 (very threaded) in the 90's (migrating to Linux now), I have to say there is nothing like a well-threaded application and an OS that really supports this (OS/2 and BeOS come to mind today). Nothing like it. In the 90's I was emailing all the x86 clone manufacturers in hopes they would attack Intel with multiprocessor systems, but they kept playing the MSFT/Windows game and there haven't been too many survivors. Windows didn't and doesn't thread that well (NT is only OK).
The gist of it is: if your current apps aren't multithreaded, or you don't run more than just a few applications at one time, SMP won't do you much good.
depends on type of computations... (Score:2)
--
well, actually (Score:2)
why? altivec.
Speed improvements are always arbitrary. Yes, there are times when a G3 350 will be twice as fast as a Pentium 350. There are times the Pentium 350 will be faster. Benchmarks are not something you should be listening to, and different processors will be better at different tasks.
However, the question is not "which processor is better overall"; the question is "which processor is better for real-time heavy computational math". In which case you really kinda do probably want to go with the G4. "Real time" implies you are going to be taking one specialized [difficult] task and doing it over and over and over with different data, which is what Altivec is designed for (SIMD) and what it excels at. As long as you are willing to go ahead and specially code for Altivec, in this case you will get a speed jolt virtually unparalleled.
Unfortunately, due to manufacturing problems, Motorola and IBM [appleinsider.com] are for the moment having trouble [appleinsider.com] making G4s that run at over 500 MHz, and there are _still_ no multiprocessing G4 mobos available as far as I am aware.
So as soon as the third parties actually get around to shipping an SMP G4 mobo for use with linux/bsd (Apple is a bit tied up in their own problems..), that's what you'd want. As of now the G4 may not be the best choice. A good choice to be sure, but I'm a bit dubious as to how well a single 500 MHz G4 would do against, say, four 800 MHz Athlons.
Re:Flamebait?? (Score:2)
"Um... yeah, sure... but first, I have a question to ask you. You all have embedded microprocessors in your bodies, right? So my question is, which type of microprocessor is best?"
"AM--Int--MIP--Alp---ARRRGGGGGGGGGHHHHH!!!!!"
*cube blows up*
Re:PPC 7400 (Score:2)
But as other posters have pointed out, "realtime" math processing requires much greater performance than any chip designed for PCs and workstations. This is where we get into the supercomputer realm.
Now for a Beowulf cluster, which still really isn't designed for "realtime" processing, but it might be good enough for a particular application, for math my money's still on Alpha... some of the fastest Beowulf clusters have been based on Alpha.
I will always go back to this, though: it depends on the application. 99.99% of the time, you don't really need or want real-time processing. It's just too expensive and requires really, really sophisticated hardware. In many cases, when you THINK you need real-time processing, what you REALLY need is real-enough-time processing. Which brings me back to Beowulf and Alpha.
PPC 7400 may be good for DSP-related stuff, but an all-around math chip it ain't. The Alpha is it.
Re:well, actually (Score:2)
--JRZ
Re:Couldn't you just analyze the program? (Score:2)
That's the wrong question. By far the fastest methods to calculate primes from 1 to N, for some N, are algorithms based on sieves. Simple calculations, calculations that can easily be parallelized, but sieves take memory. You're accessing memory all the time, while doing trivial calculations. Large amounts of RAM, a fast and large cache, fast memory banks, and a fast disk (for swap) are more important than processor speed. Even better is a tailored algorithm dividing the work into chunks to minimize swapping.
Processor speed might be interesting for some, but it's utterly pointless without context. A slow processor with a large cache, can do many things faster than a fast processor with a small or slow cache.
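The "chunks" idea is the classic segmented sieve. A small sketch (my own illustration, not Abigail's code): keep only the base primes up to sqrt(N) plus one fixed-size window of the range in memory, so the working set can be sized to the cache instead of holding all N flags.

```python
# Segmented Sieve of Eratosthenes: only the base primes up to sqrt(n) and
# one window of the range are resident at once, so the working set can be
# tuned to fit the cache instead of swapping.
from math import isqrt

def count_primes(n, window=4096):
    limit = isqrt(n)
    base = bytearray([1]) * (limit + 1)   # simple sieve for the base primes
    base[:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if base[p]:
            base[p * p :: p] = bytes(len(base[p * p :: p]))
    base_primes = [p for p in range(2, limit + 1) if base[p]]

    count = 0
    for lo in range(2, n + 1, window):    # sieve one cache-sized chunk at a time
        hi = min(lo + window - 1, n)
        seg = bytearray([1]) * (hi - lo + 1)
        for p in base_primes:
            start = max(p * p, (lo + p - 1) // p * p)  # first multiple in window
            seg[start - lo :: p] = bytes(len(seg[start - lo :: p]))
        count += sum(seg)
    return count

print(count_primes(100_000))  # 9592 primes up to 100,000
```

The arithmetic per cell is trivial, exactly as said above; all the performance lives in how the memory is walked.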
-- Abigail
Real time description (Score:2)
To summarize, I think there is no best solution. Determination of response time versus throughput requirements, floating point versus logic computation, hardware-acceleratable versus pure SW execution, which OSes are allowed, custom versus off-the-shelf are all huge factors and completely change the rules.
First of all, no VM.. None.. period. I believe you can accomplish this by setting swap size to zero, but that's not generally enough.
You're probably not going to want a multi-app optimized OS like windows ( don't flame please ) or UNIX.
You're going to want one of the embedded OS's or just hard-wired drivers. You could even get away with DOS. Where I work, we use QNX, which is a bit aged, but labels itself as a real-time OS, so I assume that's its primary focus. The best part is that you get all your UNIX functionality, plus several really cool features and network-centric operations ( even more so than UNIX ).
If you seriously need response time ( and we're talking micro-second response ), then you're probably going to want an embedded processor. I remember back in the days of the 486, there were embedded variants ( I can't even remember the names anymore ). You don't want interrupts to be a part of your basic operation, since you're stealing cycles in an unpredictable way. IO and polling all the way, baby; can't get much more deterministic than that.
If you're doing much of a custom job, then you might do well with a co-processor type CPU, which gives you added flexibility in your design. MIPS still takes this approach, I believe. Plus there are plenty of high-perf off-the-shelf co-processor designs. DSP and geometry processors are readily available as separate chips ( Glint comes to mind ).
The author seems to speak about name brands, so I assume they're not dealing with anything so intricate.. Most likely, they're thinking of MS and real-time apps like video etc. If this is the case, then I'd have to say: the CPU with the quickest response time, or the CPU with the greatest ability to handle your type of data.. Obviously floating point is going to point towards Intel ( unless you're dealing with the Athlon ). But if you don't use FP, then AMD's K6 line had the shortest pipeline for the bang. A K6-3 is probably your best bet. It has nearly the caching capability of a P-II, comparable integer performance, but lower latency ( especially with branch misses ).
If these calculations are graphics-based, again, separate specialized components are going to be your best bet ( high-end video cards are now parallelized ).
If this doesn't suit your needs, and general computation is your requirement: the more cache the better. It doesn't suit real-time in that it's not deterministic, but you will achieve more noticeable throughput. When the Athlon gets its 2 and 4 Meg caches out the door, I'd vote for that.. But a maxed-out Xeon is probably your best bet ( no numbers to back me up.. sorry ).
On the topic of SMP: deterministic time can not be guaranteed. But if we're talking about throughput ( such as live video ) and not response time ( such as a missile tracking system ), AND your application is algorithmically threadable, then 2 or more CPUs will be worth your while. For example, I get nearly double my MP3 encoding performance with dual Celerons ( but only when encoding multiple wave files ). If, for example, you were dealing with some non-hardware-accelerated video and audio stream, then dual CPUs could probably work to your benefit. ( There's simply no excuse for not having hardware support, though. )
Assuming multi-CPU configurations, the Athlon has a better setup than the Xeon. And if AMD could ever get the L2 cache size up to 4 or 8 meg, then you'd have a high-throughput device.
Now, when we introduce price, I'd probably have to say that the Athlon is going to win out here. The Xeon just falls out of the picture, value-wise.
The Coppermine's optimal cache configuration applied to the Athlon would seriously make it the best all-around ( minus the deep pipelining/latency ).
The Alpha seems to be a good contender everywhere except price. Small, simple, in-order pipeline. Decent cache size, good bus and SMP features. Problem, of course, is price and application availability.
Re:well, actually (Score:2)
XLR8 [xlr8.com] is working on a multiprocessing G4 upgrade card which should be out by the end of 2000.
Thinking for difficult operations (Score:2)
What really strikes me as a problem is that to actually get to the point where mathematical calculations are a real problem, you have to get quite far in your education. Largely, CS majors don't have to (and I argue shouldn't have to) know anything above maybe, say, trig or so.
Unless you are going to work designing something that actually does said math (writing Maple/Mathematica/Mathcad), it is almost never used. I have looked over the majority of code for most of the OSS applications that come with a modern Linux system, and there isn't one shred of calculus-level math. Also, considering how many people have historically failed calculus (same with Latin and such), I think this is really hard to measure. The most complex thing I ever did was calculating something like upper levels of digits of pi with a simple C program, and that worked fine for everything up to about 300,000 on a 386 I had handy. Plus, isn't getting a quad-CPU computer a rather large hit in the wallet? I would almost bet they don't even sell them anywhere locally. Also, judging from the almost complete lack of advanced premade software to do such calculations, I am almost at a loss to determine how such software gets produced. I think that almost all of it is produced by people who have multiple PhDs and such.
Again, the most complex piece of software that I ever saw sold was Mathematica (can't even buy that in any local stores).
Is there any, say, quite easy kind of text that can teach a person - with nice graphics, tables, figures, and example problems - extremely advanced math which would allow someone to get a bearing on what kind of purchase to make? Is there an incredibly advanced software package (for some PC-type system or similar Mac/Windows/Linux machines)? Or is this all just pie-in-the-sky stuff?
The reason I say this is because most of the books beyond, say, standard calculus books (because people such as I find them dry, irritating, hard, and a general nightmare) are just dry tomes that present information in a difficult-to-comprehend way and provide few if any explanations or actual implementation details.
Is there a general algorithm for, say, breaking up calculations into steps or pieces that can be done on a machine one step at a time? I mean, maybe people would feel better if, when calculating some problem that could take a while to solve, the steps (in say machine code/asm) could be displayed and interpreted?
Couldn't you just analyze the program? (Score:2)
How much more effort would you have to put into, say, a standard C++ program to get it to use the 2 processors fully and equally in doing something like calculating all of the primes between 1 and 9,000,000,000,000? Are there good examples of this? What is the absolutely cheapest dual-processor system that one could get? Where is this sold?
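For the shape of the thing being asked about, here is a minimal sketch (Python rather than C++, a much smaller N, and a deliberately naive primality test - purely to show the range-splitting idea): give each processor its own half of the range, and the counts just add.

```python
# Split the range [2, N] into one half-open chunk per CPU; each process
# counts primes in its chunk independently, and the results simply add up.
from multiprocessing import Pool

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:  # trial division -- naive, but embarrassingly parallel
        if n % f == 0:
            return False
        f += 1
    return True

def count_in_range(bounds):
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    N = 10_000
    mid = N // 2
    with Pool(2) as pool:  # one chunk per processor
        counts = pool.map(count_in_range, [(2, mid), (mid, N + 1)])
    print(sum(counts))  # 1229 primes up to 10,000
```

The chunks share no state, so this is the easy case; as another reply in this thread notes, a serious sieve is memory-bound and the split has to be chosen to keep each chunk cache-sized.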
Re:Architecture makes the difference (Score:2)
Well, FPU performance is good, but OTOH I happen to like really good integer performance (probably because I code silly things like crypto that doesn't use floating point at all). For that I like Alphas or Athlons. I'll bet PPCs and G4s are good for that too, but I'm not much of a Mac person.
Only problem in this case is cost, since the average Alpha system, IIRC, costs more than most x86 systems. That might not be true, so do your research.
A low end Alpha will cost more than a high end Athlon (I'm generalizing here). OTOH, the "low end" Alpha will kick the Athlon's butt (much as I like Athlons, the x86 architecture limits what it can do). However, since this guy seems to really be asking about games, he is probably running Windows 9x, so he wouldn't have much fun on an Alpha running Linux.
For actual real-time stuff (ie not games), I'd go with something ARMish. Ah, those cool little ARMs.
Re:the question is ill-posed (Score:2)
I'm not too much of a gamer (except Freespace2 -- oh, I love that game...) but I think your 3D card matters much more than your CPU. For instance, I have a PII-350. I used to have an i740 video card with 4MB of RAM. Games ran slow and at low res. Then I got a Voodoo3 3000 with 16MB of memory (good drivers probably also had an effect). Games ran fast. I was happy. Etc., etc.
Render farms, other highly parallel, low internode communication applications: commodity x86 systems, the more the better.
Or really big SGIs. Or both.
RT control of other systems/experimental setups: I personally prefer the StrongARM series of processors for this role, since the price/performance is practically unmatched, the documentation is thorough, and programming in assembly for the SA is truly a joy compared to the hideous mess that is the x86.
Do you know where I could find info about StrongARMs? I know DEC designed them and Intel bought it (it being the ARM design, not DEC!), but that's about it. I'm just thinking it would be really fun to buy an ARM (or more probably something that has an ARM in it, plus memory and control boards and etc) and mess around programming it (over the serial port of a PC?). It would become my robot slave! Wuahahahahahahahahah!!!
Data mining, data warehousing: I don't have any personal experience with these applications, but I have heard good things about Suns
I've also heard that in reference to file servers, etc. However, I can say with some experience (ie, the CS dept here is a Sun shop and so is where I work) that anything smaller than an Ultra2 is fscking slow. The nice big servers rock (of course), but I'm not totally convinced that an Athlon (or dual PIII or whatever) with a lot of RAM and big SCSI disks wouldn't do better (and it would certainly be cheaper).
Re:the best system ever (Score:2)
Discrete Event Simulation PIII -v- SPARC (Score:2)
Recently I did a project involving discrete event simulation.
A certain task took 11.827 seconds on a PIII 500MHz system running Linux, and 16.588 seconds on a SPARC II at 300MHz. The equivalent time for a 500MHz SPARC would be 9.953 seconds (I think -- I just scribbled out the calculation now!).
This gives the SPARC the edge, but there are other factors like the OS to consider. Both machines had 256MB of memory and used gcc to compile.
However if cost is an issue, this will give the x86 chips an advantage. Depends on who is paying really!
Then you have to look at whether you are writing your own code, say in MPI to run over a cluster of machines, or something to run on a highly SMP machine like an SGI or high-end Sun box. It very much depends on what you plan on using it for: for high-end maths/physics applications, get a nice Sun or SGI box if you have the budget; if you don't have the budget, or don't need 'that' kind of high end, then a PC will do. And I'm sure there are plenty of people willing to carry on the which-processor-is-better argument here!
Cheers
~Al
Re:Flamebait?? (Score:2)
My limited knowlege of SMP (Score:2)
The software DOES need SMP support to make the fullest use of multiple processors; however, that does not mean an unsupported program will not run, or will not run at all under SMP. The drawback is that a program written without SMP in mind can actually run slower than on a single processor, because the software does not know how to divide its data sensibly between processors.
For example, if you are running a program that does heavy mathematical work and was written for SMP, the software will understand that it is quicker to give all of the data having to do with X to one processor instead of splitting it up. Without SMP support, the OS (or whatever does the dividing) will split the work as best it can, which may not be the most efficient method -- it may even be the worst.
that's just what I understand, anyone who can correct me is more than welcome.
How long is a piece of string? (Score:2)
Re:Couldn't you just analyze the program? (Score:2)
Many non-technical people assume that a system with two 500MHz processors equals a 1000MHz machine, because 500 + 500 = 1000 (unless you're using those old Intel chips, eh? ;-). This is not true; processing power is not cumulative. Two 500MHz Xeons are not one 1000MHz Xeon -- they're two 500MHz Xeons! Simple, ne?
Your idea of "dividing the program into two halves and having one processor work on each" shows a definite lack of understanding about how computers and parallel processing work. Even if the instructions being processed were completely irrelevant, which is what your idea of processor utilization would require, you would gain no performance advantages.
Parallel processing depends, most basically, on having a multi-process OS such as Unix. Simple operating systems such as Windows 95 are mechanically unable to use more than one processor at once. This is perhaps the most fundamental difference between 95 and NT.
Something that trivial would be rather simple -- one processor computes the primes from 0 to 4.5e12, and the other from (4.5e12 + 1) to 9e12. However, any real application is better off being designed from the ground up to use multiple processes. That way no speed is lost on single-processor systems, and multiple processors can be used on SMP systems. Heh... yes, dear boy, parallel processing has been common practice for some time now. Anything designed to run on a server, from httpd to Oracle, uses multiple processes, even when it doesn't use multiple processors.
Two Celeron 350s and an Abit BP-6 mainboard. The board will run you between $115 and $135 (in USD), and the processors about $35 each. You may be able to find better prices. About the Celerons: spend the extra fifteen bucks each and get 400s or 500s. If you get the 350s, make sure you get the versions WITH 128k L2!!
Something tells me, perhaps the fact that the UID seems awfully familiar, that this fellow was trolling. Oh well.
Re:Here's what I just did (Score:2)
Of course, it has NOTHING WHATSOEVER to do with realtime computing.
Processor Benchmarks (Score:2)
Here are some SPEC CPU95 Benchmark results. Sorry they are not the latest greatest processors, but this is the most recent I could find.
Processor (Floating Point (SPECfp95), Integer (SPECint95))
Digital Alpha 750 MHz (75, 38)
Intel Pentium III 800 MHz (32.4, 38.9)
AMD K7 Athlon 750 MHz (33.0, 26.5)
As you can see, the Alpha really whips the P3 and Athlon, but the pricing for Alphas is also ridiculous. The motherboard and processor for an Alpha 750 will likely run two or three thousand dollars. A P3 800 will run about $750 for just the processor, and an Athlon 750 will be approximately $320. For price/performance, Athlon is definitely the winner.
Re:the question is ill-posed (Score:2)
Re:What calculation? (Score:2)
Bullshit. On most applications, good compiler generated code is not usually more than 60-70% slower than hand-tuned assembly. I've seen some exceptions to this general rule (ie. naive dot product vs. scheduled dot product on the x87 FPU stack), but the worst I've ever seen was about 8-10 times slower. Show me an application which gets a "50x" performance increase from writing the assembly yourself. Hell, just show me a code fragment from the kernel of the function, and I'll either show you why the code is either miserably written or I'll submit a patch to GCC to optimize for that case. That's a promise and you can hold me to it.
Re:What calculation? (Score:2)
I agree with your point about specialized hardware; that's why I didn't bring it up.
Re:yo! compilers! (Score:2)
BTW, O(a^n) in standard notation is ill-defined, but would probably be interpreted as an exponential growth function with some arbitrary, but fixed, constant a. That places the problem squarely in the category of intractable. Although there are some very clever algorithms that can help in some cases (lattice basis reduction comes to mind).
It depends on the problem (Score:2)
Another problem exists: not all mathematical problems are easily parallelized (or not without a significant amount of overhead). Even worse, some problems can only be solved sequentially. So, for example, a 4-CPU 400MHz box will not always outperform a faster 500MHz uniprocessor.
Generally said, it completely depends on the problem.
You can't emulate for real-time (Score:2)
Re:New category (Score:2)
Usually you get the best results if a program can be divided into several processes; the OS can then spread these over all the processors. I believe Mosix works this way; for Beowulf, the programs have to be specially written to take advantage of it.
Grtz, Jeroen
Re:Couldn't you just analyze the program? (Score:2)
This could be done, but then you would get problems if instruction 2 depends on data that instruction 1 creates. Or instruction 1 might be a jump to instruction 10, in which case instruction 2 should never have been executed at all. You see, there are a lot of situations in which this would not be possible.
How much more effort would you have to do say in a standard C++ program to get it to fully equally use the 2 processors in doing something like calculating all of the primes between 1 and 9,000,000,000,000? Are there good examples of this?
This depends on how splittable your calculation method is; again, if one calculation depends on data from another, you may get one processor waiting for the other. Effective parallel computing depends very much on how much, and what kind of, traffic you have between the processes.
Grtz, jeroen
Don't need an editor. (Score:2)
R.A.I.P. (Score:2)
I'm not sure at what point the overhead of coordinating the processors would overcome the advantage in numbers.
Now, if you offloaded some of the processing to special-purpose processors instead of just adding redundant ones, you might get a performance boost there.
This would be limited by how smart the compiler is about dividing the tasks into good chunks.
Re:New category (Score:2)
Re:Depends on your software... (Score:2)
Everything I hear indicates Intel didn't bond the SMP pin to the die -- I can't find the original statement, but here's [2cpu.com] a quotation of an example. That's what I was referring to; AFAIK the Celerons up through 533MHz are fine with SMP, but the "Coppermine 128" Celerons are crippled.
Intel appears to have done something similar with the early FC-PGA Coppermines, too -- I've not heard reliable reports of anyone managing to get good SMP out of them. The SECC-2 versions are fully SMP-enabled, though.
I've heard the same -- people tell me they're fine for gaming, but not so good for workstation use. I have no direct experience... but I do believe that you get what you pay for.
---
Depends on your software... (Score:2)
OTOH, if you can't use (or don't need) SMP, go with the fastest Athlon you can get; the FPU (as others have pointed out) is much better than the present Intel PIII FPU's.
If I was building the machine right now, I'd probably go SMP: dual motherboards from ASUS and Tyan both have good reputations for stability, although they aren't the cheapest -- but uptime is more important than upfront cost, too. The BX or GX chipset solutions are much cheaper than the newer Intel 820 and 840 boards, because of the cost of Rambus memory -- and if you run SDRAM on an 820 (and probably the 840 also), you'll be slower than a BX solution anyway.
Then I'd stick a couple of reasonably-fast (600 or 700MHz) Coppermine PIII's on the board, and lots of SDRAM -- enough so the OS never has to swap to the hard drive. Only you know what that amount is, and memory is still pretty cheap.
It doesn't sound like you'll need SCSI (since drive access times and multitasking probably won't be much of an issue), so stick with EIDE for the hard drive -- the cost is much lower. Your graphics card won't be horribly expensive, either, as long as you aren't worrying about high-end screen output (if you are, it's a whole different ballgame).
A representative system (prices are midrange online values, not the cheapest by far; buying it as a package will save you quite a bit):
Tyan 1832 dual motherboard [make sure it's latest-revision, to handle Cumines] ($180)
2 PIII 600E processors [OEM, without heatsink and fan] ($290 each)
Heatsinks and fans for those processors ($30)
IBM or Maxtor 20GB 7200RPM EIDE hard drive ($180)
256MB PC100 SDRAM ($220)
Mid-range graphics card ($100)
Generic floppy drive, case, CD-ROM and keyboard ($160)
Total price, around $1450 (without a monitor)
This is actually quite a nice machine -- good quality parts where it matters, for what I interpret your requirements as being. If you can manage with Celerons, though, get an ABIT BP-6 motherboard and a pair of the fastest Celerons which still support SMP (don't get burned, here!), and you might be under a grand, total...
Ask me again in a couple of months, of course, and everything will have changed. Remember that the newest, fastest PIII's don't support SMP (yet, anyway); neither do the new Celerons; Athlons will, next year, but they'll be replaced by other CPU's by then anyway; and YMMV.
Have fun...
---
Raw number crunching on distributed.net (Score:2)
Re:Depends on your software... (Score:3)
I have that board and I think it is excellent. You'll need a revision F for coppermine support. It comes at a good price as it doesn't have onboard SCSI like many dual systems. When I bought mine a few months ago, the best price/performance seemed to be P2 450's, two of which were costing less than a P3 500. It was $105 for an OEM P2 450 and $75 + $15 for a Celeron 466 + converter card (at that time, P3 500's were going for $280).
Quake 3 with my system is faster under NT than it is under 98. It's nice, and the framerate is more stable.
"Remember that the newest, fastest PIII's don't support SMP (yet, anyway); neither do the new Celerons; "
Celerons are SMP capable -- either through a socket 370-to-slot 1 converter, through some resoldering, or through a dual Celeron board such as the Abit BP-6. I don't know whether Intel has completely disabled this in the new Celerons they just announced. I have heard that the Abit BP-6 has some stability problems that can take some effort to iron out.
Alpha Alpha Alpha! (Score:3)
We recently got a $250,000 8-processor SGI number cruncher for someone who wants to do MHD calculations -- scary big calculations. We ran a benchmark on some of my code and found that one SGI CPU was a third of the speed of our 500MHz DS20 CPUs... we have a dual-processor DS20, which we acquired a year ago for $20K, and it is 75% of the speed of our brand new 'supercomputer'...
Re:real time, high end?? (Score:3)
What's meant in the posting is "during data acquisition" with soft real-time constraints, instead of a calculation done after the experiment.
In an industrial environment it might also be important to have an upgrade path and a processor family which will be supported for a long enough time (it is really expensive to switch from transputers to something else).
Real-Time does not mean Real-Fast (Score:3)
My vote (Score:3)
Cache size (and speed) matters a lot more in these calculation-intensive benchmarks than it does in other uses, so it's really something you should look out for. It's also one of many small minuses that make the dual Celeron suggestion a less-than-optimal configuration for real scientific use. Today's best high-performance compilers do a reasonably good job of exploiting special P-III instructions and optimizing to squeeze the right data into the cache. Although the Celeron will soon have SSE, it will still be configured with 128k of L2 cache, I believe (though I could be wrong. Anybody?). The real killer for the Celeron idea is that you still have to hit your RAM pretty often, and Rambus or DDR running on a fast bus can really, really help you here.
So, if it comes down to the Athlon or the P-III (non-Xeon), I'd still have to go with the P-III. The biggest advantage is the ability to use multiple processors, as number-crunching code can REALLY benefit from SMP. AMD has been promising SMP Athlons for ages, but they're still basically vaporware. Another factor is the availability of (extremely pricey) Rambus RAM, while DDR is just starting to be accepted. Finally, Intel puts out some pretty fast compilers, while a lot of compiler developers fail to optimize for the Athlon as much as they could.
Within the next year, however, we should see faster RAM for AMD chips, SMP Athlon boards, better compiler support (now that people no longer think of AMD as just a low-end provider), and a full-speed L2 cache on the Athlon. Then the chip's FPU can really shine.
This is all assuming that we're talking about scientific crunching on x86 PCs. If you can go for an Alpha, you really should. Check out www.spec.org for benchmarks. Yes, we're all wary of benchmarks, but when a chip routinely beats its competition by a factor of 2 or more in a very respected, industry-standard benchmark, you have to assume that there really is a difference. I have a lot of hope for Intel's second-generation of IA-64 chips, though. They're doing some really interesting things with compiler/architecture design that could blow away the competition.
--JRZ
One word: Alpha (Score:3)
Note that there is a new 1U rack version of the DS10, called the DS10L (code-named "Slate"), that is very attractive for highly compute-intensive tasks. There's a picture of a rack full of these [compaq.com] in the Linux section of Compaq's web site.
Re:Thinking for difficult operations (Score:3)
Not x86 (Score:3)
For straight number-crunching, a fast x86 CPU is probably a decent choice. You will probably want to go SMP for maximum performance. (Are there SMP boards for a G4? That might be a good choice, as it has one of the best FPUs out there right now.) You might also want to look into an Alpha, they have very good floating point, and some really good SMP solutions exist for them.
Another issue to keep in mind is the software side of things. You may want to get a processor that you can write, or can learn to write, assembly code for. You can write much more efficient code in assembly, and if you do things this way you can go further and not run an OS at all (which isn't always a bad idea). Even a 12MHz 386 screams when your code runs without an OS underneath it.
The thing to keep in mind when approaching this problem is that there are a lot more solutions than Intel and AMD, and many of these are well worth investigating further. I am just trying to give you a few ideas here, you will probably want to do your own research and choose the solution that works best for you.
Re:Intel (Score:3)
You are 100% wrong. Intel has NEVER had a respectable FPU. Ever. AMD's Athlon FPU destroys the Intel chips. The only thing you might be thinking of is how the early Quake games required a Pentium CPU, because the id boys hand-tuned for the Pentium FPU; 486-class AMD/Cyrix chips obviously ran horribly there (and were legitimately outclassed by the Pentium FPU at the time).
Another thing: what exactly is a "real-time floating point operation"? RT operation seems like it should have a lot less to do with a CPU than with the system and, more importantly, the software side of that system. RT computing has a lot more to do with enforceable upper bounds on wall-clock time for a given operation than it does with a 'fast FPU'. How a CPU deals with interrupts and cache misses, under what conditions, and how this interacts with FP ops _might_ make this "topic" relevant... but I can't believe some people are just saying "AMD" and others "Intel" without expanding any further.
Here's what I just did (Score:3)
Our previous boxes were 600MHz DEC Alpha stations running DEC's Unix, OSF/1 or whatever it is. We find that the Athlon boxes, which are of course 32-bit as opposed to the 64-bit Alphas, are about as much faster as the clock speed would indicate, i.e. about 30%. As a result we increased our computation resources by a factor of four for less than $20K. We are very happy.
I'm not sure SMP can be justified in this kind of case, as boxen that support it are typically way more expensive than our cheapo Gateways, and SMP generally does not increase speed by the factor you would think. However I'd be interested to hear any results to the contrary. When problems can be split between processors, performance per buck is what matters.
The best solution really has to be tailored to the specific problem being solved; 384MB of PC100 RAM was ample in our case, for example, but in a big 3D finite element case memory gets to be a problem. We also have an electromagnetic package that runs on NT; because of the software licensing we can't run more than one case at once, so that has to go on the huskiest system I can get the budget for -- currently dual Pentium III 733s with 2GB of Rambus memory. It is far less cost-effective than our cluster.
Hope this helps.
John
the question is ill-posed (Score:3)
Standard office applications:Go for a cheap Celeron and lots of RAM. Most applications will be very responsive in any case.
3D games: an Athlon would probably be your best choice. Decent FPU performance, good integer performance, won't cost you a bundle. Most games don't really benefit from SMP anyway.
Render farms, other highly parallel, low internode communction applications: commodity x86 systems, the more the better.
RT control of other systems/experimental setups: I personally prefer the StrongARM series of processors for this role, since the price/performance is practically unmatched, the documentation is thorough, and programming in assembly for the SA is truly a joy compared to the hideous mess that is the x86. The only problem is that there's no FPU (it does have an integer multiply, though).
Data mining, data warehousing: I don't have any personal experience with these applications, but I have heard good things about Suns and the RS/6000's from IBM.
Single-threaded or low parallelism scientific computations: Definitely Alphas. They blow any other processor away on floating point intensive operations. The only real drawback is lack of CCNUMA/massively parallel shared-memory systems. IIRC, they top off at 8 or 16 processors.
Really big simulations, computational hydrodynamics, etc.: Keeping in mind my previous disclaimer, I would still have to suggest SGI Origin 2000 systems for this type of task. The out-of-box performance on a fully populated Origin 2000 is awe-inspiring. Another option might be linked AS/400s or RS/6000s, or even one of the Cray T3Es for vector-oriented codes. A bit pricey, but if it's not your money...
It's not only the processor, it's the OS (Score:3)
In fact I believe that if you simply need a calculator that is completely devoted to you, you should build your own very task specific operating system that is optimized for the task at hand. Half a year ago when I was implementing different parts of an OS, I remember that the most difficult challenges that stood in front of my group were the system optimizations for memory accessing, multitasking (paging), security and multitasking the IO devices.
If all I wanted was a simple system for a single user, I wouldn't need all those complicated algorithms; I could simply write a memory manager plus simple IO and interrupt support, and that would be the fastest setup for a single user. In a sense even DOS was too sophisticated for what you are asking -- DOS had some TSR support and paging. (However, its memory management was awful.)
So there you go, it's back to C and the Assembler time!
Flamebait?? (Score:3)
This is one of those questions in the category of "Which is the best editor?", "Which is the best operating system?", or "How many angels can dance on the head of a pin?".
Geeks have argued and fought over these (or at least the first two) for years, and will argue and fight over them until we are Borg.
what "real time" means (Score:4)
Hence, the phrase "real-time mathematical computations" is almost an oxymoron.
Basically, I'm attempting to point out that "real time" and "fast" are not synonymous.
Re:I don't think you understand what you are askin (Score:4)
Now, if the poster is referring to mathematical problems (not 'real time' so much as 'solved in a reasonable time'), the above post is right on the money: do not buy an SMP system -- at least not an x86 SMP system (I don't have any experience with Alphas). The problem with x86 SMP is bus speed, i.e. communication between CPUs on the same board. Using fast network interconnects (gigabit speeds) you can usually get better performance between boxes than between CPUs on some SMP systems (particularly quads and 8-ways). Duals are not as problematic, so for compactness' sake they might be worth the $$$...
Keep in mind though, this is not the way things *should* be, it's just the state of the art --which sucks right now in x86-land. With better motherboards coming up (not to mention better SMP support by all the different OSes --the new Linux kernels seem to have solved context switching problems that were killing SMP machines, for example), eventually SMP machines will be the way to go...
engineers never lie; we just approximate the truth.
Architecture makes the difference (Score:4)
real time, high end?? (Score:4)
High end mathematical computations are unlikely to run in real time on any processor. Do you really mean games?
Athlon Has a Superior FPU (Score:4)
I don't think you understand what you are asking.. (Score:5)
In general, real-time apps (even soft real time, like video or audio decoding) care more about low latency than high throughput. As a result, you aren't going to want an SMP system. The complex caching in SMP systems makes performance even *less* predictable, which is precisely *not* what you want for real time.
If you just want a really fast media-cruncher, then you don't want to be running x86. If you're serious, you'll go for something like an Alpha, that will smoke any x86 in FPU. Besides, if you really want to get into real time media processing, you are going to need a great deal of bandwidth, and commodity x86 hardware isn't going to get you where you need to go.
If what you really want is a budget box to run games on, then get an Athlon. A quick review of any games site in existence will tell you that Athlons beat Intel's offering in every regard these days. There's no reason to bother Slashdot with such common questions. Any of the DIY gamer sites will have a host of articles with benchmarks running Quake or Unreal or whatever it is kids play these days.
I get the impression that this post is from someone who doesn't understand real-time computation, and just threw that phrase in there to make the question sound more sophisticated.
--Lenny
PPC 7400 (Score:5)
As always, it all depends on what numbers you're crunching, and for what purpose. The vector processing in the 7400 is pretty sweet if done right, and one of the Linux PPC variants has full support now.
Just a thought, to get away from x86.
Pope
The stupidest question I've ever heard (Score:5)