Why Faster CPUs? What About SMP? 35
Codeine asks: "As we press harder and harder against the physical limitations of speed, why do CPU manufacturers continue with the costly faster single processor model, instead of focussing on multi-processor designs? The new IBM Blue Gene seems to be acknowledging that more/simpler processors is the way to go (very like non-AI, millions of neurons). Why aren't we seeing commoditisation of SMP?"
Re:Not always compatable (Score:2)
What is ACPI? I'm thinking about adding another processor to my machine, but if this is something important, I may just get a faster cpu.
Re:Not always compatable (Score:1)
If I recall correctly, APM support was basically undefined under the SMP specification (MPS) so it's a crapshoot anyway.
Re:Not always compatable (Score:2)
True, mostly, but not always. Some high-end compilers can perform optimizations that allow certain types of algorithms to run faster on a multiprocessor machine. For example, consider this loop:
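A loop of the kind being described might look like this (an illustrative sketch, not the poster's original code):

```c
#include <stddef.h>

/* Each iteration reads only a[i] and b[i] and writes only c[i],
 * so no iteration depends on any other: with n processors, all n
 * iterations could in principle run at once. */
void vector_add(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```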
Successive iterations of this loop aren't dependent on previous iterations, so in theory, if you had n processors, you could do all n iterations in parallel at once. But of course, most algorithms aren't this parallelizable. For example:
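A loop with the kind of dependency being described might look like this (again an illustrative sketch, not the poster's original code):

```c
#include <stddef.h>

/* Each iteration reads the value the previous iteration wrote, so
 * step i cannot start until step i-1 finishes. Unlike a plain sum
 * (which can be regrouped into a tree reduction), this nonlinear
 * recurrence has no such regrouping: more CPUs buy you nothing. */
void recurrence(double *x, const double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] = x[i - 1] * x[i - 1] + a[i];
}
```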
And of course this is the answer to the poster's question. While some problems may be highly parallelizable, others aren't - each step of the algorithm may depend on the results of the previous step. In this case, throwing more processors at it does you zero good. It's like if you're driving across the country and you decide to take 100 of your friends, all in their own cars, to try to get there faster. It's still going to take the same amount of time.
I agree! (Score:1)
Alright, I'll stop being silly now.
Moo!
Re:Not always compatable (Score:2)
The only reason I compile in APM anyway is to get the machine to turn off by itself. Since that's not very often, it's not a big deal if it doesn't work in SMP mode.
Thanks.
Re:Apple and Microsoft (Score:1)
On the other hand, up until quite recently (and arguably not even now), why should Joe Sixpack's home PC do SMP? All he is going to do is write e-mail, surf the web, and maybe crunch the occasional spreadsheet. SMP doesn't make much sense for the great unwashed masses. It's expensive, and the software to support it is more complex, and thus more expensive itself (OSS stuff aside). It's the high-power /. crowd who wonders why SMP isn't more common, but we unfortunately represent a minority in the market.
All good reasons, but they miss (Score:2)
When Joe Sixpack tunes in QVC and sees a "700 MHz computer" that's easy to compare to a "350 MHz computer"--it's twice as fast. But what do you make of a "dual 400 MHz"? Is that 800?
Once Moore's Law starts pooping out on us we'll see many more multi-processor machines and then Joe will start to understand.
--
Re:Apple and Microsoft (Score:1)
For most desktops, I don't see SMP being all that beneficial, because people don't truly multitask that much. Efficient multibranch execution in a single CPU, I'm sure, yields much more apparent performance gains for the average user than multiple CPUs would. In applications where increased parallelization is helpful--for example, 3D rendering--it's being added due to market demand for speed increases that would be difficult to meet otherwise.
I'm not sure continuing increases in MHz will be all that apparent to users, except for demanding apps, since a 400-500MHz Celeron is plenty for office apps and the like. Intel will continue to up its clock rate, though, as long as it can command a higher margin on its faster processors. Since Apple cannot compete on clock speed with the G4 chips Motorola is producing, it must differentiate itself in other ways, including SMP. I've read (on Ars Technica, I think) that Apple is considering moving to the AltiVec-less G5s from IBM, which have higher clock rates.
As for the use of SMP on high-end systems, yes, it's true it's being used here. When performance matters most, you must parallelize. But in servers, you get to amortize the higher cost of the boxes across multiple users--that's the point--so having multiple processors makes sense. This still doesn't mean that this will inevitably lead to SMP on the desktop.
I'm curious what Intel will do if the market for the biggest MHz number chips starts to slow down. Of course it can sell other products, but how will it maintain its margins? Will it be successful integrating things like 3D chipsets into a single, high-margin CPU?
Re:Apple and Microsoft (Score:2)
Yeah, I know. If you'd checked other replies to my post before you posted, you'd note that I corrected myself. /. doesn't support post-post editing, unfortunately.
Hey, I'm not saying Joe needs SMP -- I'm also not saying he doesn't. But considering how cheap previous-generation chips get, buying an SMP motherboard with just a single processor, and then upgrading later, allows for almost twice the upgradability (processor speed wise) later.
Although really, IMO, most people would be served just fine by a NeXTStation (33MHz 68040 and 64MB RAM, say).
As for few people needing SMP, consider this: if chip speeds were still in the 100-200MHz range, and CPU vendors had put more effort into improving the ISA rather than pumping up clock speed (while still providing incremental improvements in CPU technology), we could be running 4-CPU systems. And since 100-200MHz processors would still be in abundance, it would probably mean more cheap computing for everyone. The stratification of clock speeds means that no single CPU is produced in quantities large enough to make computing spread faster.
Re:Hmmm... (Score:1)
Re:Apple and Microsoft (Score:2)
Ummm... Apple switched to USB at least as much because Macs were losing the peripheral war -- there wasn't ADB, DB-9 serial, or SCSI on most PCs, and that constituted 95% of Apple peripherals. However, for quite some time PCs had been coming with USB, whether manufacturers were supporting it or not. So, suddenly, they only had one mostly incompatible interface, but unlike ADB, SCSI, or DB-9 serial, it was intended to go on all PCs. Now, any new PC comes with USB.
Now Macs also have Firewire/IEEE 1394 and AirPort/IEEE 802.11, two Apple technologies making their way into PCs. Just a few days ago, I saw a pretty new Compaq system at Radio Shack with Firewire, and now Carnegie Mellon is installing 11Mbps wireless networking (you know, 802.11?) on campus. So, to recap: standard technology on Powermacs is: USB (on PCs too), Firewire (on PCs too), Airport (coming close to standard on laptops), and, now, SMP.
Take a look at your average Windows PC (yes, just pretend Windows9x could benefit from SMP for the sake of my argument). Now, take a look at all the little icons in the tray, and the desktop, and the taskbar itself, and then, finally, the one application our Hero, Joe Sixpack, is running. Suddenly, Joe doesn't need much CPU for any particular reason, but keeping all the little processes happy while he loads some Microsoft bloatware, and having a snappy system requires a good bit of CPU. Voila! Joe Sixpack could benefit from 2 200MHz CPUs, rather than 1 400-500MHz CPU.
I'll say it again: Apple seems to be leading the PC pack. Now that Apple has put out SMP machines, labelled them "fit for general consumption," and then gone off on how cool they are, I am quite confident that PC manufacturers will follow suit.
Re:Apple and Microsoft (Score:1)
4,500 processors in that baby.
http://developer.intel.com/technology/itj/q1199
(don't get too excited, it's really just a hyped up ultra cluster)
Re:Hmmm... (Score:1)
1. Drivers - There are quite a few drivers under Linux (I don't know about NT - I don't use it myself) which have had trouble with SMP. They include the 2940 SCSI driver, many of the sound drivers (I still have the occasional problem with my AWE64) and the USB drivers (which are, admittedly, not meant to be stable yet).
2. Cost - Well, as I said, SMP requires better quality of components. That's the cause of the higher cost, not the effect.
5. I'm not saying you'd have to go to a full NUMA architecture, but the Alpha's bus isn't that much more difficult to produce, and economies of scale would quickly bring the price down.
Re:Hmmm... (Score:2)
I wouldn't be so quick to blame everything on "badly written" drivers if I were you. Applications are fairly simple, locking-wise. They have one entry point and full control over when new threads enter or exit. Apps that benefit at all from SMP usually do so trivially; anything that's a pain in the ass to handle in parallel just gets a huge mutex slapped around it, and apps rarely need to hold two locks at once. For drivers, it's very different. Drivers have multiple entry points, any of which can generally be invoked at any time even when something else is already going on. Single-threading requests is generally not an option for performance reasons. Drivers tend to develop deeper locking hierarchies and more complex locking behaviors than almost any app, so it's no surprise that locking errors - race conditions, deadlocks, etc. - are so common. Yes, a driver that has such errors in it is still broken, but it may still be "better written" than the trivial SMP code app writers can get away with.
It may not be the driver's fault, anyway. The OS itself may have SMP problems that get triggered by specific perfectly-legal driver behavior. For example:
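One recurring pattern of this kind is the lost wakeup (a hypothetical sketch; POSIX threads stand in here for the kernel's own sleep/wakeup primitives): if the OS-provided wait path tests the flag outside the lock, a wakeup issued from the other CPU in that window simply vanishes.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int io_done = 0;

/* The buggy shape (sketched in comments only): testing the flag
 * outside the lock leaves a window for a lost wakeup.
 *
 *     if (!io_done)              <-- the other CPU sets io_done and
 *         wait_on(cond);             signals right here: wakeup lost
 */

/* The correct shape: test and wait under one lock, re-test in a loop. */
void wait_for_io(void)
{
    pthread_mutex_lock(&lock);
    while (!io_done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

void io_complete(void)  /* the "interrupt handler" side */
{
    pthread_mutex_lock(&lock);
    io_done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* helper so a second thread can play the interrupt handler */
void *io_complete_thread(void *arg)
{
    (void)arg;
    io_complete();
    return 0;
}
```

When the buggy shape is hidden inside an OS service, the driver can do everything right and still hang.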
I've seen this kind of crap happen on a dozen OSes, in cases where the driver had every right to call that OS function under those conditions but the OS screwed up. I've seen cases where the OS-provided synchronization facilities had subtle bugs (usually SMP-specific bugs) that caused starvation or missed wakeups under some conditions. Drivers are hard to write under the best of conditions, and when they have to be written while avoiding all of the OS bugs it's sometimes amazing that they ever work at all.
My answer (Score:3)
There are two main reasons SMP isn't more pervasive:
The first of these is pretty self-explanatory. I'll try to expand a little on the second.
Multiprocessor (MP) hardware is a lot more complex than uniprocessor (UP) hardware, with extra latency in the memory subsystem to deal with potential cache issues - even if no sharing is occurring at that particular moment. Code running on multiple processors needs to do locking, and the locking itself can be pretty costly (especially since it uses bus-saturating interlocked memory instructions). This is why running an MP kernel on a single processor is slower than running a UP kernel. Lastly, not all code parallelizes well; much of it contains major sequential dependencies. In the end, all of the extra work that's done to make MP behave correctly may end up costing more than it's worth, even for small numbers of processors.
As the number of processors increases, all of these effects increase exponentially. The memory system starts to get pretty hideously expensive, cache warming and memory locality issues become more complex as efforts are made to reduce the strain on the memory system, and all the while it becomes harder and harder to keep all of the CPUs busy enough to make the whole thing worthwhile...and this is even for a mere couple of dozen processors.
When you're looking at something like Blue Gene, look not at the amount of CPU power involved but at the incredible memory/communications bandwidth - multiple communicating processors on a single chip, multiple chips on a board, boards arranged into modules, etc. The key to Blue Gene is that they have this phenomenal bandwidth coupled with a specialized application which is almost uniquely able to take advantage of how the memory/communications system is structured.
x86 SMP (Score:1)
http://www.serverworks.com
They make a very reliable SMP-capable chipset, used by some big brands and now by some Taiwanese makers like ASUS.
It's cheap too, and it sidesteps the troubles and low capabilities of those "original" Intel chipshits.
Hmmm... (Score:4)
1. Drivers
SMP causes a lot of badly-written drivers to fail, although they might work reasonably well under a single CPU.
2. Cost
SMP on x86 requires more expensive motherboards, a larger-capacity power supply, and overall better quality of components, all of which costs more (not to mention the cost of the second CPU itself).
3. Competition
x86 vendors have to keep their prices down in order to be competitive, and with the current "MHz = Better speed" idea firmly implanted in the minds of most people, it's going to be harder selling a dual-CPU 700MHz system (for example) if there are 800MHz single-CPU systems available.
4. Lack of OS support
Like it or not, the majority of users are still stuck on Win95/98, neither of which support SMP. WinNT/2000 does, but how many computers for home use are sold with those installed?
5. Bad architecture
The x86 platform's SMP, quite frankly, sucks. A lousy bus/cache architecture means that you won't get 2x the performance you would from a single CPU for any application which hits main memory a lot.
6. Difficulty of programming for SMP
If you want to get the benefits of SMP from within a single application, you basically have to use threads, which are a real pain to debug properly.
That's all I can think of off the top of my head...
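The debugging pain in point 6 can be made concrete with a minimal sketch (hypothetical example): even `counter++` is a read-modify-write, so two unlocked threads can silently lose updates, and whether you ever see the bug depends on timing.

```c
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Without the lock, two threads running this loop can interleave
 * their read-modify-write steps and lose increments -- sometimes.
 * That "sometimes" is exactly what makes threads a pain to debug.
 * With the lock, the count always comes out right. */
void *bump(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return 0;
}
```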
Re:stupid slashdot, you make work! (Score:1)
Re:Not always compatable (Score:1)
ACPI is not SMP-safe under Linux...
Recently, I've heard several people say Linux SMP is superior to BSD's SMP yet my FreeBSD box is running fine with ACPI. Why isn't ACPI working with Linux?
Re:Not always compatable (Score:3)
FreeBSD can run ACPI because its SMP support is primitive. FreeBSD (note that 5.x will probably change this) uses the big-giant-lock method of getting at the hardware. Thus when you access hardware on one CPU, the other CPU is stopped. Generally this is bad, but it means that ACPI works - the system looks like a single processor to ACPI.
I love FreeBSD, and have run it in SMP since the pre-3.x days.
Not always compatable (Score:1)
Re:Not always compatable (Score:1)
Apple and Microsoft (Score:5)
First of all, to some extent SMP is being commoditized -- Apple, for instance, is now selling SMP as a simple one-step upgrade from UP in their PowerMac G4s. Apple is also the computer vendor that brought us widespread use of USB, the focus on industrial design as a consideration when buying computers, etc. Expect other vendors to follow that lead, insofar as they can load operating systems that can take advantage of SMP.
Microsoft should probably be credited with holding systems back to single processors with Win9x/ME, and yes, even WinNT. With NT, IIRC, processes, not threads, were spread across processors -- so you saw very little benefit running a single, multi-threaded app on an SMP system. I would hope W2K does something more reasonable -- namely what virtually every other SMP implementation does (except, notably, MacOS pre-X) and spreads threads across processors.
Finally, in the x86 arena, only Intel currently supports SMP -- and AMD has been providing a much better price/performance ratio for some time, and is even generally ahead in performance right now. That makes it harder to justify going with lower-performing, more expensive processors to increase performance, although of course the difference between dual 800MHz P3's and a single 1.1GHz Athlon should be quite noticeable if you're running a well-threaded application (or lots and lots of processes).
All that is for PC systems (including Macs as Personal Computers, if not Wintel PeeCees :). For other architectures (Alpha, SPARC/UltraSPARC, MIPS, PA-RISC, for instance), SMP is alive and well. SGI's highest-end workstations-that-could-be-servers, Octanes and Octane2s, support two processors, and their servers support a lot of processors. Sun has SMP workstations and ridiculously SMP servers as well; I've seen a lot of SMP Alpha motherboards, but since Alphas are almost as commodity as PCs I haven't checked out what sorts of systems [c|o|m|p|a|q] sells. Hewlett-Packard also sells SMP workstations and servers, but my experience with them is with the old HP 9000/7xx series, which are largely, if not completely, uniprocessor.
Algorithms. (Score:4)
SMP is not always faster. If you are running two completely independent CPU-bound programs, then SMP is faster, but then why not have two computers? As soon as your threads need to interact, SMP slows down. Depending on your algorithm, this might or might not be a big deal.
Or to put it another way: in the general case, the best SMP code on a two-CPU system will be slower than the same program written for one processor that is twice as fast (i.e., an SMP program on two P3-500s will run slower than a single-processor-only program on one P3-1000 -- cache coherency issues and the like). Of course, two P3-500s might be cheaper by enough to make it worthwhile.
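One concrete form of that cache coherency cost is "false sharing" (a sketch; the 64-byte line size is an assumption, it varies by CPU): if two CPUs keep writing counters that live in the same cache line, the line ping-pongs between their caches even though the counters are logically independent.

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed line size; varies by CPU */

/* Both counters share one cache line: every write on one CPU
 * invalidates the line in the other CPU's cache. */
struct shared_counters {
    long cpu0_count;
    long cpu1_count;
};

/* Padding gives each counter its own line, trading a little
 * memory for no coherency traffic between the two writers. */
struct padded_counter {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
};
```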
Re:Apple and Microsoft (Score:1)
Re:Apple and Microsoft (Score:1)
Hmmm. According to anecdotal evidence posted elsewhere in this article, single multi-threaded apps do see improvement in NT. I stand corrected.
SMP != dual processor (Score:1)
Two things: first, why limit yourself to two processors? Compare the performance (of an easily threaded algorithm) on a single 500MHz Athlon against 8 200MHz PPro's (I would use a more updated example, but I can't remember if the PII or PIII support more than 2-4 processors). Second, why compare slower processors in an SMP box with faster uniprocessor systems? Compare the performance of a dual 800MHz PIII system against a single 800MHz PIII, and then take a look at when comparable uniprocessor performance will be possible. The point of SMP, quite simply, is what to do once you can't get a "processor twice [or four times, or...] as fast."
Of course, there are some operations that we don't have parallel algorithms for -- yet. Then again, there are some operations we don't have recursive algorithms for yet, either -- this is an area of research (and in the current age of cross-disciplinary research, it's being done as much by people who need the parallel algorithms as the people who know a lot about parallelism).
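The trade-off in both paragraphs can be put in numbers with Amdahl's law (assumed here as the standard model; the post itself doesn't cite it): if a fraction p of the work parallelizes and the rest is sequential, n processors give a speedup of 1 / ((1 - p) + p/n), which caps out at 1/(1 - p) no matter how many CPUs you add.

```c
/* Amdahl's law: speedup from n processors when only the
 * fraction p of the program can run in parallel. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

Even a 90%-parallel program tops out below 5x on 8 CPUs, which is why finding more parallel algorithms matters as much as adding processors.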
wrong assumptions (Score:3)
--
Re:wrong assumptions (Score:1)
For example, the Riken/Columbia/BNL supercomputer at Brookhaven National Laboratory uses a bazillion cards with RAM, a TI DSP, and a custom gate connecting it to its neighbors in the x, y, z, and t directions of the calculation. (It was custom-built for doing quantum field theory.) Best described as "massively parallel processing" (MPP).
--
LoonXTall
SMP shouldn't have to be coded for (Score:1)
Use a language that supports parallelism (Score:1)
A functional language or something like Parlog (parallel Prolog) might do the trick. Parallelism should happen automagically, but with today's most-used languages (C/C++/Java) it ain't happenin'. It's all manual and quite error-prone (at least if you want it fine-grained).
Furthermore, existing software won't benefit from multiple processors, and existing compiler technology doesn't know how to optimize for multiple CPUs. Making applications multi-threaded can help, but a far better solution IMO would be to have the compiler optimize for multiple processors and provide fine-grained parallelism.
Also note that today's processors already have some built-in parallelism in the guise of MMX/SSE/3DNow!-instructions...
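One existing middle ground between fully automatic parallelism and hand-rolled threads is compiler-directed parallelism (a sketch, assuming an OpenMP-capable C compiler): a single directive asks the compiler to split the loop across CPUs, and a compiler without OpenMP simply ignores the pragma, leaving correct serial code.

```c
/* With OpenMP enabled, the compiler runs iterations of this loop
 * on multiple CPUs; without it, the pragma is ignored and the
 * loop runs serially -- same results either way, since the
 * iterations are independent. */
void scale(double *a, long n, double k)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] *= k;
}
```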
In addition to all above... (Score:2)
An alternative that is being explored is finding a different, parallelizable algorithm that solves the same problem, but that research also requires work and resources. If you spent those resources on building a faster CPU instead, then you would have not only a faster solution to your particular problem, but a faster solution to other problems as well.
So...in most cases the best choice as to where to spend those resources is to spend them building a faster cpu.
Re:Hmmm... (Score:2)
- Mike
Re:Hmmm... (Score:1)
Why, yes I do, if by "hardware driver" you mean "anything that runs in kernel space". Specifically, I write filesystems nowadays, though I've also written network and physical-device drivers. In the past I've worked on the kernel itself, on middleware, and - a long time ago - on applications. I even wrote a tiny bit of embedded code once. Of all of these types of programming, writing drivers seems to leave one at the mercy of others' design or interface decisions the most. The tools also tend to be the worst. It can be very difficult and frustrating.
So why do I, and others, put up with it? Why don't we do something else? Well, I admit that there's a certain satisfaction in seeking out and overcoming the greatest challenges available, proving oneself, etc. It's a motivation similar to the one that drives some percentage of soldiers to become SEALs or Rangers or Airborne. Mostly, though, I think people do this stuff because they realize it's necessary or it's The Right Way to do something they consider interesting - in my case, distributed and cluster filesystems.
Nobody likes to hear bitching from people who don't understand what they do. What programmer, of any stripe, would not resent a PHB badgering them about adding some stupid feature that conflicts with a product's original design? Well, to a kernel guy (and it is unfortunately the most male-dominated programming specialty) an app programmer is almost indistinguishable from a non-programmer. Practically every kernel programmer started out doing apps at some point, and understands what's involved, but 99% of app writers have absolutely no idea what's involved in writing kernels or drivers. Therefore, any app writer who resents having non-programmers critique their own work should likewise refrain from critiquing kernel folks' work.
Re:Apple and Microsoft (Score:1)
Look, that's why I wrote "currently." As in, "for now -- but not so in the future [when SMP Athlon motherboards are available]." As for "14 (!)," that's baby stakes. The Pentium Pro could, as I recall, support 64. Which is nothing to the 512 that an Onyx3000 can support.