The Sacrifices of Portability?
hackwrench asks: "There is lots of talk about writing portable programs, but this pursuit has left a lot of processor features unused. One example is being able to write a program that purposely uses a combination of 16-bit and 32-bit code. I know there are arguments that writing solely in one or the other is a performance advantage, but what are the factors involved? Is the slowness of such a combination inherent in its design, or is it a result of current hardware? We are beginning to replace systems and programs designed primarily to run in pure 32-bit mode with systems designed to run in pure 64-bit mode, so I ask: Is such purity really worth it?"
Think memory usage, not size... (Score:1)
Re:Think memory usage, not size... (Score:3, Insightful)
Both 1 and 2 registers at the same time (Score:1)
The problem with writing portable code as things now stand is that it is oblivious to fitting things into cache, as it must remain cache-size independent. Since current tools are built with that sort of attitude about portable code, the designers refuse to implement features to allow the coder to code to cache sizes.
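For what it's worth, you can code to a cache size in plain C today; the catch is exactly what the parent says: the tile size is a hand-picked guess, not something the tools hand you. A minimal sketch (the BLOCK of 64 is an assumed figure, and `transpose_blocked` is an invented example name):

```c
#define N     512
#define BLOCK 64   /* assumed tile size; C gives no portable way to query the cache */

/* Transpose src into dst one BLOCK x BLOCK tile at a time, so each tile
   stays cache-resident instead of striding across the whole matrix. */
void transpose_blocked(double dst[N][N], const double src[N][N])
{
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int jj = 0; jj < N; jj += BLOCK)
            for (int i = ii; i < ii + BLOCK; i++)
                for (int j = jj; j < jj + BLOCK; j++)
                    dst[j][i] = src[i][j];
}
```

Pick the wrong BLOCK for your cache and the tiling buys you little, which is the parent's complaint in a nutshell.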
Re:Both 1 and 2 registers at the same time (Score:2)
There are a number of problems with partial registers. At the CPU level, it comes when trying to figure out instruction dependencies. Supporting half registers makes things a lot more complicated when you see that instruction 1 writes to EAX, while instruction 2 reads from AX. Second, it makes register allocation a lot more complicated.
The problem wit
Re:Cache-friendly (Score:1)
Re:Cache-friendly (Score:2)
Re:Cache-friendly (Score:2)
I don't think you should hard-code for a cache size unless your target is very specific. One place where you would code to particular cache sizes and layouts is the embedded space. (I should know, I wo
EAX, AX, AH and AL. (Score:1)
Re:Think memory usage, not size... (Score:2)
Yes. Most compilers let you optimize for size, or speed--that is, they are mutually exclusive. What you are suggesting is hand optimizing for size. This isn't necessarily bad, but is pointless for structures that exist in 1 or 2 places (what, you're saving 2 bytes total?). In a huge multi-megabyte array it can make a dramatic size difference. But it can also slow the crap out of your code in certain situations, so
Ever heard of playing just to see what will happen (Score:1)
That's not true at all. There is nothing inherently wrong with hand-optimizing just because you feel like it.
You also say that size and speed are mutually exclusive. While that is generally the case on current x86 architectures, that doesn't always have to be the case. I don't know what causes the penalty for unaligned reads, but Intel could redo its architecture to grab 32 or 64 bits at a time from
Re:Ever heard of playing just to see what will hap (Score:2)
Have you ever heard the saying, "premature optimization is the root of all evil?"
The problem is that when you profile a piece of code, quite often the slowest routine is something you never would have expected. Hand-optimizing a routine that runs 1% of the time is pointless. This sort of hand optimization also tends to make code ugly, and why make your code ugly for no reason?
Another argument is
Re:Ever heard of playing just to see what will hap (Score:1)
Yes, and the first hit for [cookcomputing.com] "premature optimization is the root of all evil" [google.com] demonstrates my point exactly. To paraphrase, a good software developer will have developed a feel for where performance issues will cause problems. Making it easy to hand optimize can only help one to develop the feel.
You say, "The point of C is to let the compiler do the stupid little architecture optimizations for you." and you also say "Quite o
Re:Ever heard of playing just to see what will hap (Score:3, Insightful)
Yes, I totally agree with you and the linked essay on this.
I disagree here. Read the page you linked to again. The point is that you have to have a feel for the overall design of the program you are making and how
Re:Ever heard of playing just to see what will hap (Score:1)
Here is a classic example of changing algorithms vs. optimizing your existing algorithms.
Re:Think memory usage, not size... (Score:2)
In 16-bit "mode", the x86 lets you access the upper and lower halves of the [abcd]x registers as [abcd]l and [abcd]h.
While the registers were extended to the 32-bit e[abcd]x registers, and the lower 16 bits are still accessible via [abcd]x, there is not, TTBOMK, any way to access the upper 16 bits.
The same goes for the 64-bit r[abcd]x registers.
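The overlay can be pictured in C with a union; this is a sketch of the register layout, not real register access, and it assumes a little-endian machine (as x86 is). The field names mirror the ISA's register names:

```c
#include <stdint.h>

/* How x86 overlays its registers, drawn as a C union: on a little-endian
   machine the low 16 bits of the 32-bit value alias "ax", which in turn
   splits into "al" (bits 0-7) and "ah" (bits 8-15).  The upper 16 bits
   are reachable here in C, but have no named alias in the ISA itself. */
union reg32 {
    uint32_t eax;
    struct {
        uint16_t ax;        /* low 16 bits                        */
        uint16_t upper16;   /* nameless in the ISA, as noted above */
    } w;
    struct {
        uint8_t al;         /* bits 0-7  */
        uint8_t ah;         /* bits 8-15 */
        uint8_t pad[2];
    } b;
};
```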
The registry situation (Score:1)
Re:The registry situation (Score:2)
Ok, yeah, you're right.
Still, there's no way to access the high 16-bits.
Compiler Optimizations (Score:3, Informative)
The industry (Score:2, Interesting)
Re:The industry (Score:3, Insightful)
-David
64 / 32 bit: it depends for what use (Score:1, Informative)
16 bit is often slower than 32 bit (Score:5, Informative)
How Protected Mode works. (Score:1)
Re:16 bit is often slower than 32 bit (Score:1)
you are right that mixing 32 bit and 16 bit variables is a recipe for slowness.
Re:16 bit is often slower than 32 bit (Score:2)
Re:Wth? (Score:1)
That's my point. They're rare only because the tools to make code are designed to make them rare.
Converting a 32-bit application to 64-bit will mean nothing, unless it's a special-purpose program that can take advantage of the expanded address space.
Accesses to hard drives make 64-bit addressing more useful. It's too early for exploration of 64-bit architecture to have yielded applications that run best in 64-bit mode.
Re:Wth? (Score:1)
Re:Wth? (Score:1)
It has to do with the fact that the 64 bit instructions clean up a lot of the mess (including removing the 16 bit ones) and add extra pipelines.
Ideally, your code is clean enough (Score:5, Interesting)
My personal experience with this was Linux on Alpha, where certain programs assumed a 32-bit environment, rather than querying the system they were built on for the size of int, pointer, etc. As a result many programs were funky on the Alpha, and the 'pc-isms' (what we once would have called Vaxocentrisms) caused great waste of time as they had to be tracked down and eliminated.
Your code, if you've been worrying about anything other than 32-bit PCs, should already be 64-bit clean, as you've had 15 years of Alpha, SGI, Power, Itanium, and Sun 64-bit systems to support. If it isn't, hopefully it's something such as user interface which will still run in the 32-bit environment, though not necessarily optimally.
Personally, I think that writing robust, portable, code is worth the effort. Unless you're talking about running on an embedded system where every byte counts, it doesn't hurt you at all to design clean algorithms and data structures, and put in checks to actually determine the size of ints, longs, pointers, etc, rather than just assuming that everyone will run x86 (or MIPS-64 or whatever) from now until the end of time. I have research programs that were written in the 70s (in their original form), on Cyber 205 and similar long-gone architectures, which still work because they were written in a mostly portable manner, with only the most critical nasty bits tied specifically to that machine. Your code is going to be in use longer than you think; be nice to your successors and make it portable now.
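The "put in checks" advice costs a few lines. A minimal sketch of what such startup checks might look like (the type and function names here are invented for illustration):

```c
#include <stdint.h>

/* Verify size assumptions at startup rather than baking them in;
   these are the checks the Alpha porters wished the original authors
   had written.  Both typedef names are invented examples. */
typedef uint32_t disk_offset_t;   /* on-disk format: exactly 32 bits   */
typedef uintptr_t addr_t;         /* always wide enough for a pointer  */

int check_build_assumptions(void)
{
    if (sizeof(disk_offset_t) != 4)       return 0;  /* file format breaks  */
    if (sizeof(addr_t) < sizeof(void *))  return 0;  /* can't hold pointers */
    return 1;
}
```

On an LP64 Alpha and an ILP32 PC these checks pass for different underlying sizes, which is the whole point: the code queries instead of assuming.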
I'm looking at it for experiment and isolation, (Score:1)
Re:I'm looking at it for experiment and isolation, (Score:2)
I think that on RISCs, memory access is word-aligned, so if you do a 16-bit load, what the HW will do is fetch a 32-bit word and then put 16 bits into your register.
I'm not sure how writes are handled though.
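The fetch-and-extract the parent describes can be sketched in C; this models a little-endian halfword numbering within each word, and `load16` is an invented name, not any real ISA's semantics:

```c
#include <stdint.h>

/* Model of a halfword load on word-aligned hardware: fetch the whole
   aligned 32-bit word, then shift and truncate to extract the 16 bits
   that were actually asked for. */
uint16_t load16(const uint32_t *mem, uintptr_t byte_addr)
{
    uint32_t word = mem[byte_addr / 4];       /* aligned 32-bit fetch */
    unsigned shift = (byte_addr % 4) * 8;     /* position within word */
    return (uint16_t)(word >> shift);         /* extract the halfword */
}
```

A store would go the other way: read the word, merge in the 16 bits, write the whole word back, which hints at why sub-word writes are the messier case.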
Re:I'm looking at it for experiment and isolation, (Score:2)
-David
Re:I'm looking at it for experiment and isolation, (Score:1)
If your instructions don't fit in RAM completely, then you're screwed. Buy more RAM.
Attacking problems from a "every byte counts" perspective can help you decide what you want to do when every byte doesn't count.
I don't see how.
Besides, all things being equal, why not go for the smaller code size?
Because, all things are generally not equal. Worrying about this stuff makes sense if you're
be nice to your successors (Score:2)
Re:Ideally, your code is clean enough (Score:2)
I hope your manager does, too. What does he say if you're already late with the project, but you tell him you'd like to test it on another architecture?
Re:Ideally, your code is clean enough (Score:4, Insightful)
Re:Ideally, your code is clean enough (Score:2, Insightful)
I understand the fact that you can at least prepare for portability. However, I would always want to run it through an alpha, beta (and maybe acceptance) environment before saying it'll work.
Detailed Response to Cliff and HackWrench (Score:2, Interesting)
What is the inherent "slowness" of "16-bit code"? WTF is "16-bit code" anyway? Sounds like he has been duped by the marketing droids...
So-called "32-bit" processors are typically designed to perform (up to) 32-bit arithmetic efficiently. For integer operations, 8bit, 16bit and 3
Re:Detailed Response to Cliff and HackWrench (Score:1)
Answers to your question. (Score:1)
Re:Answers to your question. (Score:1)
Ah, NO. The Intel x86 ISA allows non-aligned memory accesses... (It is probably one of the few commonly used ISAs that do this).
That may be, but are you still referring to the Intel x86 ISA? It uses variable-length instructions. These are a nightmare to decode (for hardware) but ar
Re:Answers to your question. (Score:2)
Re:Answers to your question. (Score:2)
Yes, it does, but they're significantly slower than optimally aligned accesses. Why do you think good C and C++ compilers on Intel boxes still add padding to structures, even where it's not strictly required to access the members concerned?
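The padding is easy to see for yourself; on a typical x86/x86-64 ABI (the exact numbers are ABI-dependent) member order decides how much slack the compiler inserts. Both struct names are invented examples:

```c
#include <stdint.h>
#include <stddef.h>

/* The compiler aligns the int32_t member to a 4-byte boundary even
   though x86 would tolerate a misaligned load, because aligned
   accesses are faster. */
struct padded {
    uint8_t tag;       /* 1 byte                  */
                       /* 3 bytes of padding here */
    int32_t value;     /* wants 4-byte alignment  */
};

struct reordered {
    int32_t value;     /* largest member first... */
    uint8_t tag;       /* ...no interior padding  */
};                     /* (tail padding still rounds the size up for arrays) */
```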
Re:Detailed Response to Cliff and HackWrench (Score:1)
As for "anyone", there's this bunch of meteorologists, biologists and astrophysicists I'd like you to
meet...
Re:Detailed Response to Cliff and HackWrench (Score:2)
Re:Detailed Response to Cliff and HackWrench (Score:2)
Agreed. There's still room for improvement when it comes to floating point formats. While 2^63-1 is a ridiculously large number for integer calculations, with floating point, you will still see the benefit going from 64-bit to 256-bit. And then, there's also funkier th
Re:Detailed Response to Cliff and HackWrench (Score:1)
Also, most processors have a carry/extend flag (there are exceptions), so a 64-bit add with 32-bit registers can be done with two adds.
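That two-instruction pattern (add, then add-with-carry) written out in portable C, for a 64-bit value held as two 32-bit halves; the `u64_pair` type is invented for the sketch:

```c
#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64_pair;

/* 64-bit add built from 32-bit adds: the low halves may wrap, and the
   wrap is exactly the carry that ADC would fold into the high halves. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;               /* ADD: may wrap around         */
    uint32_t carry = r.lo < a.lo;     /* wrap means a carry occurred  */
    r.hi = a.hi + b.hi + carry;       /* ADC: fold the carry in       */
    return r;
}
```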
>>In a 32-bit RISC processor, most of the instruction bits are reserved to allow large immediate operands for memory offsets, jump targets, and arithmetic/logic operations
Re:Detailed Response to Cliff and HackWrench (Score:4, Insightful)
I don't know. Then again, ten years ago, if you'd told me that an e-mail client or web browser would require tens of megabytes of memory just to load, or it would require over 100MB just to store the quick start-up code for an office application, I'd have laughed. Right now, that's exactly what Firefox, Thunderbird and OpenOffice 2.0 are claiming on the PC where I'm writing this.
Actually, I'm still laughing, because that says more than words about the design of those applications and the tools used to compile them. But the applications have expanded to fill the space nevertheless.
Re:Detailed Response to Cliff and HackWrench (Score:1)
However, think about how large 2^64 is... Isn't it on the same order as the number of atoms in the universe or something like that?
Let's guess that in the near future, there will be 10 billion people in the world (~10^10). Let's say we wanted a single computer that could store something about ev
Re:Detailed Response to Cliff and HackWrench (Score:2)
It depends (Score:5, Insightful)
For instance, consider a video game. The faster it is, the more likely it is that players will like it. But there are many more important factors, including whether the game is just plain fun. So in video games, there is really a basic threshold of speed that needs to be met, and after that is met, other factors are more important.
Next consider a real-time system for trading stocks. This system is all about speed and reliability. You can control the deployment hardware, and it is economically worthwhile to spend a lot in development if it makes more money in the long run. So coding your own memory pooler that uses the size of the pointer and a specific struct to make the code allocate and deallocate memory in constant time (it is very possible) is worthwhile because it can save a lot of time per transaction.
But all of these issues come down to what exactly you are writing and both the technical and business requirements of your project. Without knowing those in advance, we can't really answer your question.
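A minimal sketch of the constant-time pool mentioned above: free blocks are threaded into a singly linked list, so both alloc and free are O(1). All names and sizes here are invented for illustration, and a real trading-system pool would add thread safety and bounds checks:

```c
#include <stddef.h>

#define POOL_BLOCKS 128
#define BLOCK_SIZE  64             /* must hold at least a pointer */

static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
static void *free_list;

void pool_init(void)
{
    free_list = NULL;
    for (int i = 0; i < POOL_BLOCKS; i++) {
        *(void **)pool[i] = free_list;   /* link block onto the list */
        free_list = pool[i];
    }
}

void *pool_alloc(void)
{
    void *block = free_list;
    if (block)
        free_list = *(void **)block;     /* pop head: constant time  */
    return block;
}

void pool_free(void *block)
{
    *(void **)block = free_list;         /* push head: constant time */
    free_list = block;
}
```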
Re:It depends (Score:5, Interesting)
Basically, the book covers the major vector instruction sets: Altivec, PS2, SSE, etc. Naturally, a program written with hand optimised SSE assembly won't run very well on a PowerMac G4. So, the approach the author used was to start by coding a vector math function in plain C. He only calls this function by a function pointer. So, instead of calling sw_vector_foo directly, he calls vector_foo. He then goes on to write altivec_foo, and sse_foo, and gamecube_foo. With some simple #ifdefs at compile time, the function pointer is assigned to the most optimal code path for the platform.
So, the result is that by thinking about portability going in, he hardly has to do any work to get fairly optimal hand-tuned vector routines on a new architecture.
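The dispatch scheme described above, sketched in C with an invented "add four floats" routine (the function and macro names are made up; only the plain-C fallback is filled in here):

```c
/* Portable fallback: always exists, always correct. */
static void sw_vec_add(float *dst, const float *a, const float *b)
{
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}

/* An altivec_vec_add / sse_vec_add written with intrinsics would go in
   the branches below.  Call sites only ever use vec_add; the selection
   happens once, at compile time. */
static void (*vec_add)(float *, const float *, const float *) =
#if defined(HAVE_ALTIVEC)       /* hypothetical config macro */
    altivec_vec_add;
#else
    sw_vec_add;
#endif
```

The same pointer could instead be assigned at startup after a runtime CPU-feature probe, which is handy when one binary has to serve several CPU generations.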
In general, code written to be portable is also much cleaner, and better commented, and whatnot, just because the author was forced to spend an extra few minutes thinking about how things ought to be put together. I really can't think of any normal case where portability shouldn't be a consideration. On some obscure embedded systems, you might really want to optimise for a super-specific piece of hardware, but it is seldom worth it.
Think about writing GUI apps for a Palm Pilot before the switch to ARM CPUs. A programmer could have said, "hey, I'm using the Palm OS APIs, and they only run on DragonBall (68K) CPUs, so I have no reason to make anything portable." Then, a little while later, Palm OS starts running on ARM. If he had invested a smidgen of extra effort to write his code in a portable way, he could easily have started taking advantage of the ARM stuff right away. Since most of the issues of portability are in the planning phase, and get handled at compile time, the difference in memory footprint need not be appreciably larger. (Like a bunch of hand-coded ASM for a different platform, which gets #ifdef'd away, or sizeof() operators...)
The problem I have with portability: (Score:1)
Re:It depends (Score:2)
Everything that's done in the book is perfectly understandable to somebody who knows C, but it's not something I usually see done in that way, all together. I've written plenty of software that works on MacOS/Linux/Windows/PPC/x86/SPARC
Re:It depends (Score:2)
Does it matter? (Score:5, Insightful)
So, unless these systems have performance critical portions, like high-speed digital signal processing where every FLOP counts, it really isn't worth the extra effort to optimize your code for the platform - you'll just end up having to hand-tweak (or even worse, un-tweak) it again on the next hardware upgrade.
Re:Does it matter? (Score:5, Insightful)
For most applications, the potential performance gains from hand optimization for a specific platform aren't enough to matter. (And, as I think Brian Kernighan said, trying to outsmart the compiler defeats the purpose of using one.) Big performance gains come, in most cases, from figuring out a better way (~algorithm) to solve the problem, not from tweaks.
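The "better algorithm beats tweaks" point in miniature, as a C sketch (the function names are invented): a hand-tuned linear scan is still O(n) per lookup, while sorting once and binary-searching is O(log n) per lookup, a change of algorithm rather than a tweak.

```c
#include <stdlib.h>
#include <stddef.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* One-time O(n log n) cost. */
void prepare(int *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_int);
}

/* O(log n) per query afterwards, with no hand-tuning at all. */
int contains_sorted(const int *v, size_t n, int key)
{
    return bsearch(&key, v, n, sizeof *v, cmp_int) != NULL;
}
```

No amount of micro-optimizing the linear scan catches up with this once n gets large, which is exactly why profiling should point you at algorithms first.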
There's another aspect of portability that doesn't get mentioned too much: the portability of the programmer. If you are in the habit of writing portable code, it's much easier to shift to working on a different platform. (I'd also say, from my own experience, that it makes your work less error-prone.) That versatility is potentially of significant value to your employer, and of course is of value to you personally.
I don't really want my compiler to be very smart. (Score:1)
preserveargs funct1(arg1, arg2, arg3)
preserveargs funct2(arg1, arg2, arg3)
preserveargs funct3(arg1, arg2, arg3)
flushargs funct4(arg1, arg2, arg3)
and be able to call any combination of funct1, 2, 3 in any order and finalize with 4, instead of depending on whether or not the compiler will figure out that doing this will result in faster code.
It doesn't hurt for the compiler to pass speculations up to me, or even to generate potentially more efficient sample source code,
Re:I don't really want my compiler to be very smar (Score:3, Insightful)
Source code annotations, pencil and paper... (Score:1)
Re:Source code annotations, pencil and paper... (Score:2)
Re:I don't really want my compiler to be very smar (Score:1)
Once again, from the real world, I have moved a quarter of a million line parallel Fortran program to a new 64-bit architecture
This is the compiler's job. (Score:4, Insightful)
This is the compiler's job. If your compiler targets a particular processor poorly, get a better compiler.
There is no such thing as portable code:
When most developers talk about portability they are talking about OS portability. The portable-to-other-processors debate has long since left the building largely due to incredible speed increases in processors. There's no reason, apart from esoteric algorithm tweaking, to code something in a processor specific manner.
Code porting to another OS is only an issue because operating systems and the hardware they run on are still changing at a dramatic pace. There is no standardized language that covers all the common aspects of a modern operating system, because they are aiming at a moving target. Even the ultra-portable Java has to be extended outside of the official specification to cover serial ports, complex sound, complex graphics, etc.
Portability hasn't been about processor speed for a very long time, and at this point it shouldn't be - a better compiler or a faster processor is a *ton* cheaper (time, money) than writing processor specific code in all but a few extraordinary cases.
-Adam
Re:This is the compiler's job. (Score:2)
Re:This is the compiler's job. (Score:2)
Hear hear. You are 100% correct here.
Agreed, but it sometimes takes some thought to even realize you are coding in a processor specific manner. For instance, if you've ever programmed for a Mac or written networking code on a PC you realize that all binary data formats were created with a cert
But a compiler is only as good as its language (Score:2)
That's true, of course, but the compiler can only be as good as the language it's compiling permits.
In higher level languages, you can express design intent more completely than you can in lower level languages. C isn't a high level language, it's a portable assembly language. That's a role it plays very well, but as long as programmers are writing in C, the compiler will have to deal with aliasin
What Makes An Operating System "Portable"? (Score:2)
``As an introduction, the properties of a "hardware platform" are described, and it is shown that getting the same behaviour of software on different hardware platforms isn't "portability". After repeating the tasks of an operating system, it is explained what an operating system needs to provide in the lower
That's a close issue to my point (Score:1)
Re:That's a close issue to my point (Score:2)
Trying to remain binary compatible between all those platforms is just too much of a PITA.
The performance question (Score:5, Insightful)
1) Premature optimization is evil. Everybody says this, but so many people do not take it to heart. I'd rather have software that works than software that is fast but crashes. As a programmer, it's nice to work on non-buggy software, even if it's not as fast as it could be.
2) Target-specific optimization is generally evil, unless you're sure your code will not live very long (eg: a game). The thing is that micro-optimizations generally tune for a particular processor, and actually pessimizes the code in the long run. In comparison, if you write good general code, it'll still be fast ten years from now when processors look very different.
3) The bottlenecks that people, especially C/C++ programmers, worry about are usually not the bottlenecks that matter. If you worry that your code could be faster or more memory-efficient with a 16-bit field here or there instead of a 32-bit one, your algorithms had better be absolutely perfect. Most code does not use perfect algorithms. That's why so much software is still so slow. Most programmers just don't get the time to use the best algorithms, much less get down to the level of micro-optimizations.
That's why I always find language performance debates entertaining. C/C++ programmers will freak out if you tell them language X is very productive, but is maybe two-thirds as fast as C (something that is true of a number of high-level, but compiled, languages). Meanwhile, they will write code that runs at maybe 1/3 of what the machine is capable of, because they spend so much time writing the code they have little time to optimize it.
Sure, but premature pessimization is evil, too (Score:2)
Not quite everyone says that. While I agree with the general principle, premature pessimization is the root of naff code, particularly when insufficient allowance is made for fixing it up once the code is working correctly but slowly.
Consider, for example, passing a large bit of data as a parameter to a function. In languages that use pass-by-reference semantics, this will typically be cheap. In languages t
Re:Sure, but premature pessimization is evil, too (Score:3, Insightful)
I would tell a C++ programmer that worrying about a bit of extra
Re:Sure, but premature pessimization is evil, too (Score:2)
The thing is, that's not true. In isolation, it might be, but the overhead of passing a data structure that is a few words of memory by value
Re:Sure, but premature pessimization is evil, too (Score:2)
If it's in a critical area, then the profiler will point it out. If it's not in a critical area, then it doesn't matter. Plus, do you have any idea what the overhead really is? It's tiny. I just tried a benchmark calling a very simpl
Re:Sure, but premature pessimization is evil, too (Score:2)
I understand what you're saying, but I think you're still missing my point. This is just one trivial but routine efficient coding practice, and you've just demonstrated that if a lot of your functions are simple things working on moderately complex data, the overhead of not doing it can be as high as 10%, all because of a stubborn insistence that no optimisation should
Huh? (Score:2)
Name one such processor feature. What on earth are you talking about?
One example is being able to write a program that purposely uses a combination of 16-bit and 32 bit.
Huh? You are not making sense. What does this have to do with portability? Are you talking about memory models or sizes of variables holding data? In either case it doesn't make any sense. Nobody "purposely
I don't know, ask the Itanium team... (Score:2)
Gibberish (Score:1)
If performance still is not adequate (don't guess, ask the profiler), isolate the
You don't have a choice in the mainstream (Score:2)
Odds are you do not have a choice. x86-64 is coming fast. Microsoft has Windows running on it, and is likely to make it mainstream sometime soon. They have promised they will anyway.
Apple is moving from PPC to x86 (no word that I know of on 32 or 64, but I would assume both).
Linux runs on so many systems that anything other than portable code will get you flames if you are open source. If it runs on Linux, it had better run on at least all 4 BSDs and Solaris, if not more.
This is good. In my experienc
Portability (Score:1)