Linux on an Intel PIII vs. G4?
An anonymous submitter sent in: "I'm currently looking into purchasing a new laptop. This machine will run SuSE Linux and I will be developing some pretty processor-intensive applications (genetic algorithms, mathematical simulations, etc.), so raw speed is the major factor. I've been searching for information on the relative speeds of an 850MHz P3 vs. a 500MHz G4, but all the tests I've seen are on the 'native' OS (OS 9/X vs. WinMe/2000). Has anyone out there done some tests running the same OS (Linux/OpenBSD)?"
NASA study on G4 processor power (Score:1)
Re:Why laptop if raw speed matters? (Score:1)
right you are (Score:1)
very amazing thing, lower latency and higher data throughput.
too much of that was from memory to be all that accurate...
Re:Processor features (Score:2)
in this discourse by alpha i mean 21264 and will make distinctions between
p3/p4 and k7 where applicable, i am uncertain on most of the numbers for
sparc (UltraSPARC III) chips.
processor frequency:
x86's strongest point, followed by alpha, then sparc, then g4
(well, that might be a little out of order, and don't put too much stock in
just the frequency anyway, it's simply one component of the system speed)
system bus width:
most processors share this bus with the memory bus but not with the cache
bus. It is usually 64bit wide but at differing frequencies on different archs.
The p3 and the g4 have a 100Mhz bus, the K7 has a 133Mhz DDR(266 effective),
the alpha has a 333Mhz bus, and i can't find relevant literature of the
UltraSPARC III.
to the best of my knowledge all of these chips have a 64bit system bus
the system bus is where disk drive controllers and pci/agp etc reside.
memory bus width:
P3/P4/K7, G4, and alpha share this bus with the system bus, the sparc chips,
i believe, do not. one thing of note about the alpha, it has 4 separate
memory controllers that talk down the same bus, so even though it uses
100 MHz SDRAM, it can completely fill the 333Mhz bus.
a lot of crazy stuff comes in to play in the memory bus if you have an
excessively SMP machine, sparcs have on chip memory controllers and can
access the memory easily, and the chips with bigger cache size don't need
to read as often from the main memory. the cache size makes a staggering
difference since it is often at the same frequency as the CPU.
cache bus width:
Everything but the alpha has a 64bit cache bus, the alpha's is 128bit and
error checking to boot!
cache frequency:
Most chips have 2 separate chip caches, most pc cpus have them on the same
die as the CPU and running at full speed. The 'L1' cache is usually only
about 8k-64k and is always at full speed. The 'L2' cache is usually much
bigger, although the P4 has a very small (64k) one. The speed of the L2
is as follows:
P3/P4/K7(thunderbird) full speed, G4 200Mhz-350Mhz, Alpha 333Mhz
dunno on the sparc.
the frequency is not only a contributor to the cache bandwidth, but also the
cache latency. if your cache is half speed you'll have to wait another cycle
to pull data from it.
cache size:
k7 512kb, p3 256kb, p4 64kb, p3 xeon 512-2048kb, alpha up to 8MB, g4 512kb
memory latency:
memory subsystems are another level of wait on the data you're after in the
cpu. it usually takes a few cycles to get data from memory, how long
is determined by CAS and RAS latencies, usually between 2 and 3 on each.
memory frequency:
RDRAM (some p3 and all p4) has 400-800Mhz.
athlon has 133Mhz ddr (266Mhz effective)
g4 has 100Mhz
alpha has 100Mhz but 4 controllers
memory bandwidth:
64bits, the alphas have 4 simultaneous memory controllers, the HeSL P3 chipset
has 2. i think sparcs have it controlled on a per chip basis. all others
have 1 64bit path.
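For what it's worth, the peak figures behind those bus numbers are simple arithmetic: width in bytes times effective clock times number of parallel channels. Here is a minimal C sketch of that, assuming the widths and clocks quoted above are right (they are from memory and unverified). `peak_mb_per_s` is a made-up helper name, and sustained bandwidth on real hardware sits below this ceiling.

```c
#include <assert.h>

/* Theoretical peak in MB/s (1 MB = 10^6 bytes):
 * bus width in bytes x effective clock in MHz x parallel channels.
 * Hypothetical helper for illustration only. */
unsigned long peak_mb_per_s(unsigned bus_bits, unsigned effective_mhz,
                            unsigned channels)
{
    assert(bus_bits % 8 == 0);  /* whole bytes only */
    return (unsigned long)(bus_bits / 8) * effective_mhz * channels;
}
```

Plugging in the numbers from this post: one 64-bit path at 100 MHz gives 800 MB/s, the 266 MHz effective DDR bus gives 2128 MB/s, and four 100 MHz controllers give 3200 MB/s, which is more than the 2664 MB/s a 64-bit bus at 333 MHz can carry. That at least agrees with the claim that the four controllers can keep that bus full.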
well folks, those are some numbers that have nothing to do with the way the
cpu works or the benefits of multiple instructions per clock, but the system
architecture surrounding the chip is just as important to the system's
performance as the operation of the chip itself, if not more so.
CPU architecture:
ok, here's where my (half-hearted) research breaks down,
branch prediction, pipeline length, concurrent instructions/instructions per
cycle, fetches per cycle, and a bunch of other factors come in to play with
assessing the CPU architecture efficiency.
The g4 really stands out because of its super short pipeline on the 500Mhz
and lower models at like 5(?) stages, the p4 on the other hand is at a
staggeringly high 20+ stages. the shorter the pipeline the shorter cache
and memory delays are, and the smaller the misprediction penalty is. on the
down side, it's usually hard to reach high clock speeds with a short
pipeline. most chips are in the 9-15 stage range for cpu pipeline.
concurrent instruction is the realm of MMX, 3dnow, SSE2, and altivec.
the g4's altivec unit gives the largest improvement, but the use of
concurrent instructions is mostly useful in the context of 3d graphics, and
much of the work is now being offloaded to the graphics chips.
but back to the question, for a laptop, p3 is your only real option. even
though its only real strong point is its clock frequency, its clock *is* twice as
high as any of your other options, which is certainly enough to make it the notebook
cpu champ. maybe, just maybe, if your specific applications lend themselves
to optimization for the altivec unit the g4 500 would dethrone the p3.
if i were you, i would lie to myself and say the g4 was my best bet and then
i would have a great excuse to pick up a titanium powerbook.
Laptop screamers (Score:2)
So, is your data int or float, 8, 16, 32, or 64 bit, and can you work on several chunks at a time? If it is in 32-bit or smaller chunks, and you can do several at once, the Mac is likely to rule supreme. It has 32 vs. 8 128-bit registers, and can do 2 instructions per clock tick vs. 1 every other tick for the P3, for 4 times the speed, and better ops to boot.
Once again, what exactly are you doing?
Hey
Re:For raw speed, ditch gcc. (Score:2)
Now I find myself wondering about a few things.
That SGI and Compaq (Digital) have such good compilers may be explained by their machines being used in scientific establishments where CPU performance is key, while Sun's machines are the favourites of dotcom farmers requiring massive amounts of I/O (databases, etc.). When a uni needs a new supercomputer they'll look to SGI, Compaq (Alpha), Intel (they've got very good compilers) and maybe even IBM (SP2). But I've never heard of a uni using a Sun for a supercomputer (cluster of UE10000s, anyone?).
SARA, a Dutch institution that maintains and houses several of Holland's supercomputers, runs mostly SGI/Cray, Alpha and IBM hardware (and even some Beowulf clusters). They do have a lot of Sun hardware, but most of it is used as web or database servers.
My point? Well, maybe compiler (gcc and vendor) performance is influenced by heritage. In a scientific setting people will use the vendor-supplied compiler, demanding and paying for premium performance. They don't really feel the need to contribute a very good code optimizer to the gcc project. However, in the dotcom world everything must be done as cheaply as possible with maximum (ahem) performance. Hence, there are a lot of people tinkering with gcc for Intel (and maybe even SPARC).
Whatever the case may be, the day gcc generates working 64-bit code I'll drink a few beers for the guys working on gcc. As it stands now, gcc can't generate a decent (maybe I should say working) 64-bit binary for either the SGI or the SPARC platform :( (I haven't tried it on an Alpha yet.)
And yes, I'm one of those CS drop-outs (web farmer) being forced to accept a fairly large amount of cash for trivial work while I would prefer doing research work for a minimum wage. Oh well, we can't all be brilliant.
Memory speeds (Score:3)
We've got a couple of Dell Inspiron laptops that do about 280 MB/sec (according to SiSoft Sandra 2001se), while we've also got some noname laptops that only do about 160-170 MB/sec. The Dells have a 500 MHz Pentium III (100 MHz bus), the noname laptops a 500 MHz Celeron (66 MHz bus). rc5des runs at about the same speed on both types of laptop, but seti@home is quite a bit faster on the Dell (seti@home is much more memory intensive than rc5des). This speed difference can be explained by the fact that the Dell uses a 100 MHz bus and faster RAM.
My noname desktop (Athlon 650 MHz) does about 420 MB/sec and runs rc5des and seti@home about 60-80% faster.
Just some useless numbers...
Re:Processor features (Score:1)
Re:For raw speed, ditch gcc. (Score:2)
Aye. I know you want SuSE, but I'd recommend at least benchmarking your code with the Watcom C/C++ compiler on Windows NT or 2000. Great numerical code generation, and this really can make a big difference.
Ask SuSE Folks (Score:4)
FWIW, OS X server on a PPC outperformed Linux on an Intel 450 PII by 23%, according to osOpinion [osopinion.com]. (YMMV, read the fine print, etc., etc.)
-Waldo
For raw speed, ditch gcc. (Score:4)
For the algorithm:
One word. Cache.
Main memory is up to an order of magnitude slower than the cache. Make your algorithms cache-friendly. This means optimizing row vs. column accesses and doing checkerboarding for things like matrices, and other optimizations for vectors. For things like linked lists and trees, try to keep nodes contiguous with other nodes in memory where possible (or even just the key and linkage pointers, since that's all you'll be accessing most of the time when doing a search).
It takes a while to fully zen into this, but it will pay off in spades.
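To make the row vs. column point concrete, here is a small C sketch using a matrix transpose. This is my own illustration, not code from the project mentioned here: the function names and the `TILE` size are assumptions, and the right tile size depends on the cache of the machine you run on.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* assumed tile edge; tune it to the target cache */

/* Naive transpose: reads src row by row but writes dst column by
 * column, so consecutive writes land n doubles apart in memory. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked ("checkerboarded") transpose: same result, but it works
 * on TILE x TILE squares so the cache lines touched in src and dst
 * are reused before they get evicted. */
void transpose_blocked(const double *src, double *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t i_end = (ii + TILE < n) ? ii + TILE : n;
            size_t j_end = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions take a square n x n matrix stored row-major in a flat array. Whether the blocked version actually wins, and by how much, is something to measure on your own box with matrices larger than the cache.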
For the compiler:
The following applies to the gcc C/C++ compiler. I'm assuming that you'll get similar performance results for the g77 Fortran compiler. You're on your own for hand-optimizing Fortran (I don't know the language).
Gcc is a nice tool; it's free, and it works well. Unfortunately, even with -O3 -funroll-loops, it can't optimize for beans. I had to study this in detail as a project for one of my grad courses, and I was appalled when I found out just how many potential optimizations it wouldn't catch.
If you're at the point where you're ready to optimize core algorithm code without worrying about it staying simple, then either replace it with inline assembly or (for better portability) write "pseudo-assembly" C code, with temp variables with the "register" keyword instead of registers, and statements only performing operations that can be easily mapped to machine code. Hand-unrolling and hand-software-pipelining worked wonders. Gcc will do the unrolling for you, but not the pipelining (I think) and it won't move even obvious candidate variables to registers.
Using a chip with a large register set (like the PPC) makes this a bit more scalable, but it still works well on x86 chips (to a point). I tested on x86 and Sparc architectures.
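As a rough illustration of the "pseudo-assembly" style, here is a dot product written both ways in C. It is a sketch under my own assumptions rather than the code I benchmarked: it shows hand-unrolling with independent accumulators, not software pipelining, and I left out the `register` keyword because as far as I know current compilers treat it as a hint at best.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: a single accumulator, so every add has
 * to wait for the previous one to finish. */
double dot_simple(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-unrolled by four. Each statement maps onto a load, a
 * multiply and an add, and the four accumulators are independent,
 * so the CPU can overlap them. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that the unrolled version adds the terms in a different order, so with floating point data the result can differ from the simple loop in the last few bits. Check that this is acceptable for your simulation before swapping it in.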
Lastly, bear in mind that you might, if you're lucky, get a factor of 10 out of all of this. Make sure that your algorithm is of a well-behaved order, and consider using a cluster of PCs for anything really power-hungry (though that involves optimizing communications, too).
Re:For raw speed, ditch gcc. (Score:1)
SPARC has always been the slowest family of processors, and there's no sign that this'll change.
(jfb)
Re:Processor features (Score:1)
A task doesn't need to be 'embarrassingly parallel' to make use of Altivec - many tasks (particularly computation bound ones) lend themselves to use of Altivec (sometimes with some clever coding required). The obvious exception is that Altivec doesn't support double precision floating point.
Sometimes it's just good to be able to sling around 128 bits at once.
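To show what "128 bits at once" means for your data layout, here is a plain C sketch that handles four 32-bit floats per step. To be clear, this is a portable stand-in and does not use Altivec instructions at all; `vec4f` and `saxpy4` are names I made up, and whether a compiler turns the inner loops into vector instructions is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Four 32-bit floats = 128 bits, the width of one Altivec register.
 * Hypothetical type used only to make the grouping explicit. */
typedef struct { float lane[4]; } vec4f;

/* y = a*x + y, processed in groups of four lanes. On a vector unit
 * each of the three inner loops would be one instruction. */
void saxpy4(float a, const float *x, float *y, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4f vx, vy;
        for (int k = 0; k < 4; k++) {          /* "vector load" */
            vx.lane[k] = x[i + k];
            vy.lane[k] = y[i + k];
        }
        for (int k = 0; k < 4; k++)            /* "vector multiply-add" */
            vy.lane[k] = a * vx.lane[k] + vy.lane[k];
        for (int k = 0; k < 4; k++)            /* "vector store" */
            y[i + k] = vy.lane[k];
    }
    for (; i < n; i++)                         /* scalar tail */
        y[i] = a * x[i] + y[i];
}
```

The point is the shape of the work: same operation, several independent 32-bit values, contiguous in memory. Code that already looks like this is the kind that benefits from a 128-bit unit, and it only works for single precision since doubles are not supported.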
Roy Ward.
Re:For raw speed, ditch gcc. (Score:2)
I think you overlook a more obvious answer. 'Exotic' architectures have fewer users, and therefore fewer developers who are knowledgeable enough to contribute to gcc so that it optimizes better. It's commonly accepted that gcc is best on x86, which is unsurprising considering how widely used the platform is.
Cost is certainly a factor in this too - after all, it was because x86 hardware was cheap and Minix expensive that Torvalds created Linux.
Re:Processor features (Score:1)
Correction on the cache sizes:
The P3 (all mobile, all FC-PGA versions, and most slot-1's) has only 256 kB of Level 2 cache, but it runs at the CPU speed (albeit with a relatively high latency). It also has only 16 kB each of data and instruction L1 cache.
The PowerPC G4 (as used in the PowerBook G4) has 1024 kB of L2 cache at 1/2 the CPU speed. They have 32 kB each data and instruction for L1 cache.
Besides the superiority in cache, the G4 has the advantage of Altivec, which, if taken advantage of, makes the chip a real screamer. Intel's SSE doesn't come close. A Photoshop filter optimised for the G4 and Altivec often finishes twice as fast as the same filter optimised for the P3 and SSE. And that's on a G4 with half the clock speed of the P3. Obviously, clock speed means little these days.
Check out MacOS X first... (Score:1)
But as the subject says, check out MacOS X first if you choose to buy a PowerBook, it might very well be all you need since it comes with a BSD layer that as far as I understand is compatible with FreeBSD, the GNU toolkit etc.
Hope this helps a bit. I for one am looking to try MacOS X first before I install a Linux distribution on my iMac.
cya
bBob
Re:For raw speed, ditch gcc. (Score:2)
As to Alphas and 64-bit code: like another poster, I have had great success with gcc/g++ on my Alpha DP264 system. gcc has NO problem with 64-bit code. One of the problems that is probably confounded with perceptions of gcc performance is that the Linux kernel has only recently (e.g., 2.2.18 and beyond) been reasonably bug-free with 2GB+ of memory. Again, it's a byproduct of more developers (and end-users/testers) having access to 64-bit CPUs on large-memory machines.
Re:For raw speed, ditch gcc. (Score:1)
I haven't tried 64-bit code with GCC on an SGI, but it works OK on a Sun [assuming you have 64-bit Solaris or Linux on it]. And *no* worries on the Alpha, I've been building code on those guys for a while and it works fine. It's not as fast as Compaq's compiler, but I guess that's not really a surprise.
Re:For raw speed, ditch gcc. (Score:1)
In my experience it's pretty touchy. Enabling -march=i686 on my machine (with some gcc 2.95.3 snapshot that I should probably upgrade) slows down some parts of my code while speeding up other parts.
Also, AFAIK, there isn't any real support for P-III, just PPro/P-II and I think K7 in the latest snapshots.
I suspect the main reason there is a lot of explicit code for the Pentium (besides its popularity, obviously) is that it is the only CPU in common use today that isn't a RISC machine with a ton of registers. Optimizing for an Alpha is a lot like optimizing for a MIPS, which is a lot like optimizing for a SPARC, PowerPC, and so on (yes, there are often fairly significant differences, but nothing like the diff between any of those and x86).
Re:For raw speed, ditch gcc. (Score:2)
So true. I've benchmarked crypto code [code that can take great advantage of pipelining and good register allocation] I've written with gcc and a few commercial compilers (all running on Linux on the same system), and in some cases I would see a 3x-4x speed increase. And if you have gcc dump the asm, you'll see many silly things, even with full optimizations. This is totally from memory, but gives you an idea of what I'm talking about:
add esp,-4
[some instructions that just use registers, don't read or write to esp]
add esp,-8
I hand optimized the code (removing the second instruction and changing the first to add esp,-12), and it worked fine. This is of course a trivial example (yeah, I saved one cycle!), but in a large program things like this could mean tens of millions of cycles (think inner loops of long running programs).
If your algorithm is already in pipeline-friendly form, you'll generally be OK, but AFAIK, you're right about GCC not rearranging instructions to handle pipelining (but I haven't looked into this too carefully).
Of course I figure for heavy numerical work G4 will kick an x86's butt, just on the basis that a G4 has a reasonable number of registers. I'm amazed that Intel hasn't added a new extension like MMX or SSE that gives programs a few more GPRs; it would really be useful to a wide variety of programs.
some lateral thoughts (Score:1)
Reason is I'm wondering if a "luggable" is an option. Presumably you can get those with absurd power levels in them...hell, check out what the overclocking freaks do in making PC cases with handles for gamers...
Or, how about some kind of thinner client? Can you use some kind of cheapo laptop with an editor for the portable part of it, and remote control a proper machine when you need to do some heavy processing?
In essence, you will take a hideous speed hit in moving to a laptop, so if you can think of a way out of it....
Compilers matter most for G4's (Score:2)
ultrasparc laptops Re:Processor features (Score:1)
I think there are ultrasparc laptops being made, probably by the RDI folks that made the old Sparcbooks. But a) I don't know for sure and b) even if they are being made they'll be stunningly expensive.
--
News for geeks in Austin: www.geekaustin.org [geekaustin.org]
Re:Processor features (Score:1)
It seems that the poster wanted to run (genetic algorithms, mathematical simulations, etc.) but neglected to mention whether this is off-the-shelf software or whether he can write/optimize the code for a G4. Obviously a G4 has the technical superiority with the larger cache and Altivec, but that won't mean anything if you can't get optimized code for it. It's like driving a Ferrari but only being able to drive in first gear (good, but way below what it could do). Is anyone aware of any Altivec support in any compilers outside of Mac OS? Maybe you could cross-compile with decent results.
If you can get the software and it is reasonably optimized, then I would go for the G4 laptop. I would also go with the G4 if the apps were extremely RAM intensive, because you can get the G4 laptop with a gig of RAM but x86 laptops max out at half a gig. I didn't see any mention of the 1GHz laptops, but one would be an acceptable alternative if that is the platform on which you can get the software you want.
Re:Answer (Score:1)
Re:check your numbers (Score:1)
1Ghz CPU
15.0" @ 1400x1050 or 15.4" @ 1280x1024
128MB SDRAM memory
10GB Ultra ATA drive
DVD-ROM
32MB ATI Rage Mobility 128
10/100BASE-T Ethernet
56K internal modem
Two USB ports
One FireWire port
The most avid user would have to concede that a 1GHz PIII would outperform a 400MHz G4.
What about price? (Score:1)
If you're lucky, then the sky's the limit and you might as well buy two of those new G4 titanium laptops.
Re:littlebrain, check your numbers (Score:1)
Actually, I got an even better deal on my laptop. Considering:
Considering that, the results are quite similar. I only wanted to run Unix, and I did look into the new PowerBook, as well as some small Sonys and others. My homegrown was a better choice and my only sacrifice is that this machine is not ultra light or ultra small ... but as long as it fits in my backpack, it's fine.
I wasn't bringing up the price issue as FUD, I find it very relevant when you consider how much bang you get for the buck. Keep in mind that you rarely can put together the perfect system.
Re:Processor features (Score:2)
As for the majority of apps, how many of them actually use the massive caching abilities of an Alpha (or UltraSparc, which you neglected to mention)? That's why they are used in database servers, development machines (code compilers), and video systems (UltraSparcs + Sun graphics cards = playing several videos on several screens with real-time decoding of compressed and uncompressed video).
Anything further that you'd like to add?
I can't be karma whoring - I've already hit 50!
Re:Processor features (Score:2)
The UltraSparc III info can of course be found somewhere on Sun's website (www.sun.com). Keep in mind, however, that the UltraSparc II, IIe, and others are still in full force. Also, the key area that makes a G4 be considered a 64 bit chip and a Pentium a 32 bit one is that while both access the PCI bus and RAM at 64 bits, only the G4 does internal calculations at 64 bits.
Also, for DNA modeling, etc., you'll be able to use larger data sets on the G4 than on the other laptop available chips. And most important: the Titanium laptops look pretty damned cool!
Re:For raw speed, ditch gcc. (Score:2)
Processor features (Score:5)
Overall, go for an Alpha first, then the UltraSparc (interchangeable). Obviously you can't really use these in a laptop, but they are there. Next shoot for a G4. You get more for your money at the lower speeds. Athlons are next. They ARE hard to find in laptops, but worth it (I think). Else, get a PIII.
I can almost bet that any benchmarks you do will follow my suggestions.
littlebrain, check your numbers (Score:1)
$2,599
1" thick
5.3 pounds
Slot load DVD
5 hour battery (let's figure that is overstated and call it 4 hours)
15.2" screen
802.11b (Airport) ready, and cheaper 802.11b cards
400MHz PowerPC G4
1MB L2 cache
128MB SDRAM memory
10GB Ultra ATA drive
DVD-ROM w/DVD-Video
ATI Rage Mobility 128
10/100BASE-T Ethernet
56K internal modem
Two USB ports
One FireWire port
That said, I can come up with many PC laptops that beat the pants off that price-wise. However, if the original poster is looking for speed at your home-grown number of $2400, he is better off springing for the extra $200 for the speed, the extra battery life and the big screen.
Personally, I can see several reasons to go with the PC, but the release of OSX might change that.
The OSX debate aside, the price issue is tired FUD, and not nearly as relevant as it used to be.
Figure you are getting a really cool case for an extra $150 and that would be about right.
Re:check your numbers (Score:1)
"I get a lot more bang for the buck by buying a $2400 PIII system instead of buying a pricier G4 laptop."
Then you go on to present an arguably inferior system; to wit:
Ethernet: You had to buy an external PCMCIA card, you say $35 for a good 10/100 card??
Screen: You do not have the 15.2" screen, the g4 is the only beast with it (that said, I think I like the taller ones better than the wide aspect screens, but that doesn't change a thing)
Battery: If you get 8 hours on a single battery, then the g4 is likely to get the same or better. No matter how you slice it, the G4 is a lower power chip than the PIII, which is why the g4 and especially the g3 are used in embedded systems
Wireless: You would have to add a more expensive card if you wanted wireless 802.11b
Modem: The mac modem is not a winmodem. (lord help me if it has been changed and I am wrong, but AFAIK the only win-like- modems on macs were the old internal geoport things).
FireWire: You have none, to add via PCMCIA would be expensive, bringing the price of your laptop up.
Right now you are probably saying " but I don't NEED firewire or a modem or cheaper wireless", but that is based on _your_ desired featureset, and doesn't necessarily represent the best value for the money for everyone.
This is the essence of FUD, you say the pc "is better bang for the buck" even though it is not necessarily true. It may be better bang for YOUR buck, given your needs, but not in all cases. Now that you have itemized your system, the differences are comparable, and assessable.
By all accounts Apple makes really nice hardware. You pay a premium for their superior cases and motherboards. As I said in my first post, I figure you add an extra 150-250 depending on the model for the case and hardware. If you read my entire post, you will note that I specifically said that I could beat the price/performance with a pc vs. a G4; what may not have been clear was "at a cost of features"
Finally the 800 MHz PIII vs. the G4: this is basically the topic of this whole question; does RISC outperform quasi-CISC? I figure that the original question lacked enough detail to accurately assess the questioner's needs, so I don't have any kind of answer :)!
Re:Laptop screamers (Score:1)
Compilers (Score:1)
Re:littlebrain, check your numbers (Score:1)
Battery life, Speedstep and work/mhz (Score:1)
Furthermore the Pentium III uses SpeedStep technology. It runs 100 to 200 MHz slower in battery mode. So the advertised MHz figures are quite deceptive; G4s always run at full speed.
Lastly you should realise the G4 does more per MHz. A 700 MHz G4 is certainly faster than a 1 GHz P3. And if you run Altivec apps, they will usually run circles around P3s.
And the new powerbook is damn pretty
Re:Answer (Score:1)
G4 vector processing info (Score:1)
Use Altivec And You Go Fast! (Score:1)
http://www.jdkoftinoff.com/eqtest.tar.gz
works on the G4 with OS-X and Linux-PPC. Can compile and run on non-altivec processors, but not optimally since the algorithm is still focussed on altivec.
Run it on all your boxes and see for yourself
--jeff
Re:Use Altivec And You Go Fast! (Score:1)
That's why I said it wasn't optimized for non-altivec, silly!
Download the code and add the non-altivec optimized version then. It's GPL.
BTW you can compare the non-altivec tests on the G4 as well. A 450 MHz G4 without using altivec performs faster than a 667 MHz P-III.
See for yourself.
Re:Answer (Score:1)
I wouldn't worry about it (Score:1)
One consideration is that GNU C is probably not as good for the PPC as it is for the Pentium, so published benchmarks may not quite apply. Also, you can't get the Apple without an OS, so you pay a little extra for that. Furthermore, there is quite a bit of s/w that doesn't exist for Linux PPC and would be difficult to port.
In short, for running Linux, I'd go with a Pentium. I'd consider buying the Mac with MacOSX as an alternative, though.
Re:Mouse buttons (Score:1)
Re:Answer (Score:1)
Re:Check out MacOS X first... (Score:1)