Optimizations - Programmer vs. Compiler?
Saravana Kannan asks: "I have been coding in C for a while (10 yrs or so) and tend to use short code snippets. As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'. The reason someone might use the former snippet is that they believe it results in smaller machine code if the compiler does not optimize, or is not smart enough to optimize that particular snippet. IMHO the latter snippet is clearer than the former, and I would use it in my code if I knew for sure that the compiler would optimize it and produce machine code equivalent to the former. The previous example was easy. What about code that is more complex? Now that compilers have matured over the years and seen many improvements, I ask the Slashdot crowd: what do you believe the compiler can be trusted to optimize, and what must be hand-optimized?"
"How would your answer differ (in terms of the level of trust in the compiler) if I'm talking about compilers for desktops vs. embedded systems? Compilers for which of these platforms do you think are more optimized at present - desktops (because they are more commonly used) or embedded systems (because of the need for maximum optimization)? It would be better if you could stick to free (as in beer) and Open Source compilers. Give examples of code optimizations that you think the compiler can/can't be trusted to do."
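For what it's worth, the opening example is settled: the standard defines the two spellings to be equivalent tests, and any modern compiler emits identical code for both even at low optimization levels. A minimal sketch to compare (function names are mine):

```c
#include <stddef.h>

/* Both forms below test the same condition by definition; compiling
 * with gcc -O1 or higher and diffing the assembly shows identical
 * code for the two functions. */
int is_null_short(const char *ptr) { return !ptr; }
int is_null_long(const char *ptr)  { return ptr == NULL; }
```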
Security (Score:0, Interesting)
You MUST trust the compiler more and more to protect the code from buffer overflows and other trivial, but hard-for-humans-to-detect mistakes.
Clear & Concise Code (Score:5, Interesting)
Also the clear code will result in fewer misinterpretations, which will mean fewer bugs (especially when the original author is not the one doing maintenance years later), further reducing costs in dollars, man hours, and frustration.
Optimize at the interpreter/compiler level... (Score:3, Interesting)
Beware of habits. (Score:5, Interesting)
Ended up writing my programs in assembler...
Tight complex recursive loops (Score:2, Interesting)
I could write sloppy code which appears to be sufficient, but then realise that by holding this here and keeping that register there, I can do such a thing just a fraction quicker than before.
It's not so much about going down to the assembler level anymore, but a tightly coded loop tuned by human intuition will almost always still be faster than anything an optimiser can give.
I recently had an issue with sorting collections containing thousands of non-trivial objects. Every time I adjusted the already fast quicksort, I gained a little more speed.
It's always been this way, and until genetic compilation and optimisation comes along, trying every combination, it will continue to be the case.
Check out the LLVM demo page (Score:5, Interesting)
One of the nice things about this is that the code is printed in a simple abstract assembly language that is easy to read and understand.
The compiler itself is very cool too btw, check it out.
-Chris
"if (0 != variable)" is for wimps (Score:2, Interesting)
Those that advocate the constant first in if boolean comparisons are the feeble-minded who believe every bit of propaganda and PC reporting they read.
It's especially annoying to see it done in programming languages like Java where assignments are not even allowed in if statements. When I see java code like this:
if (0 == x)
I'm thinking "loser".
Back to C. Besides, gcc has for years issued warnings for assignments in an "if" expression when they are not wrapped in an extra set of parentheses.
For example:
if (x = 3)      /* warning: suggest parentheses around assignment used as truth value */
if ((x = 3))    /* no warning: the extra parentheses make the intent explicit */
Re:Clear Code (Score:3, Interesting)
In fact, working at Research In Motion has shown me how to write better code... sometimes, the most efficient or clever solutions are the worst. I would rather see longer variable names, descriptive control structures and a total lack of goto statements. If you write sensible code and use an optimization setting of 4 or 5, then you will have better programs in the long run. Also, the more complex and "optimized" your code is, the less chance the compiler will be able to optimize it, and it may even slow down a little.
But the biggest thing is to make it readable... I think writing code that executes SO FAST would be useful only in real time systems and large servers.
Now... back to my realtime system... gotta make those blade servers smoke!
Re:Neither are correct. (Score:3, Interesting)
Also, your statement that "relying on NULL to be compatible with all pointer types is risky" is just plain wrong.
Re:Clear Code (Score:1, Interesting)
'Since' implies a temporal relationship while 'because' more clearly shows a causal relationship.
Re:You should always... (Score:2, Interesting)
Re:Algorithms, Not Stupid Processor Tricks (Score:5, Interesting)
One of the finest moments in my programming career was when my boss asked me to see if I could gain a speed improvement in a program that surveyed a huge datastore and generated volumes of text from it. This program had to run once a month, and deliver its result in the same month. The program as originally written, unfortunately, took three months to run (it started out OK, but the datastore had grown considerably). They had asked one of our "best programmers" to create a faster version of the program. He did that by reprogramming the entire thing in assembly (you may now understand why managers thought he was one of the best programmers). It took him six whole months to finish the new version. The resulting program completed the task in just about one month. However, my boss was afraid that when the datastore grew a bit more, we would again be in trouble. That's when he asked me to look it over. I started by investigating the problem, which at first glance looked like a network traversal problem. I soon realised it could be solved by a nested matrix multiplication (which is, of course, a standard way to discover paths in a network). It was a matrix with about a million rows and columns, but since it contained only zeroes and ones (with a couple of thousand times more zeroes than ones), the multiplication was easy to implement in a fast way. Within half a day, I had built a prototype program in a high-level language which did the whole job in a few hours.
While I am still pleased with this result, I really think it came off so well not because I was so smart, but because the assembly programmer was not really worthy of the name. Still, I often use this as an illustration for students who are writing illegible code and argue that it is so very fast.
Re:Clear Code (Score:5, Interesting)
I wish I could put my hands on an article I read a couple years back on the code in the Space Shuttle. They go at that code base with an attitude that makes the average paranoid look happy-go-lucky. In fact, they approach software engineering kind of like other engineers do -- as if lives depended on it. It's old, it's slow, and it works. (Oh, wait, here it is. [fastcompany.com]) That's how code is maintained.
Re:Write C for C programmers (Score:4, Interesting)
But please, oh, please, don't write this:
if (!strcmp(x,y))
Intuitively, that looks exactly backwards from what it's testing (equality).
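A sketch of the explicit spelling this comment argues for, which reads in the same direction as the test (the wrapper name is mine):

```c
#include <string.h>

/* strcmp returns 0 on equality, so comparing against 0 explicitly
 * reads as "x equals y" instead of the backwards-looking !strcmp. */
int same_string(const char *x, const char *y) {
    return strcmp(x, y) == 0;
}
```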
On the "optimize for the compiler" issue, I think what's been said already is right: don't do it unless it's in a critical spot, write code for readability (and big-picture efficiency) first, then worry about local optimizations only if there's a problem.
Re:Algorithms, Not Stupid Processor Tricks (Score:3, Interesting)
Every once in a while it really is worth careful instruction level optimization when you have a loop that must execute more than 100 million times, or some such. But the biggest gains in computational speed are always about figuring out the best approach to a problem.
Get your priorities straight (Score:2, Interesting)
If the program must be fast your first concern should be getting the right answer. I can make any program lightning fast as long as it doesn't have to return the right answer. This may seem trivially obvious but you'd be surprised by how many times optimization attempts end up optimizing away the right answer.
Pick the right algorithm and implement it clearly.
If it's too slow, break out the profiler and optimize.
If it's still too slow, you screwed yourself by not including performance requirements at the very beginning. Maximum performance must be designed in from the start (e.g., look at high performance matrix multiply libraries)
The question was full of bad examples too (Score:3, Interesting)
If you are going to do this, it is better to do if(NULL == ptr) so that if you omit one of the equal signs, it won't compile.
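A small sketch of the defense being described; the function name is mine:

```c
#include <stddef.h>

/* With the constant on the left, dropping one '=' cannot compile:
 *   if (ptr = NULL)   -- legal C: assigns, is always false, no error
 *   if (NULL = ptr)   -- error: lvalue required as left operand
 */
int is_unset(const char *ptr) {
    return NULL == ptr;
}
```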
In the end we can argue these things forever. People see code differently; one person may find an idiom elegant and easy while another finds it unmaintainable and unreadable. Sort of like the thread vs. process debate (ongoing for what, 30 years now?).
Re:Wrong, wrong, wrong (Score:3, Interesting)
Re:Clear Code (Score:4, Interesting)
Re:Clear Code (Score:3, Interesting)
Well, in Java this may speed things up: instance variable references aren't automatically cached in locals. It's been a while since I checked this, but I believe it's still true. Just something to think about if you're looping a few million times.
Sometimes it helps, sometimes it doesn't. The only way to really know what your compiler does is to look at the output of the compiler, and understand what all those switches do.
Now there are some things you can do to optimize your code while you're writing it, but a lot of that is heavily dependent on your platform, runtime, and environment. In general, you should write your code in a straightforward way, then write an optimized version of it later (if you determine it's necessary).
Of course, the best way to help your optimizer is to write things in the simplest manner possible.
Re:Clear Code (Score:3, Interesting)
It has to make sure it uses the copy in memory everywhere if anyone else gets a copy of the pointer. That function might keep a copy or pass it around; there's no way to know.
If you use "const" in the function declaration, it can assume no one elsewhere will be changing it.
Of course, in my always humble opinion using a compiled language at all is a premature optimization most of the time.
Re:Tight complex recursive loops (Score:1, Interesting)
Beware of trying to "tune" quicksort. The stock quicksort that comes with your stdlib is already tuned for maximum *average* performance.
Keep in mind that quicksort will always have a worst-case performance of O(n^2); however, on average it runs about 10% faster than the fastest heapsort and mergesort implementations, which are both O(n log n) even in the worst case. The main reason for using quicksort is the fact that it's 10% faster *on average* than the guaranteed O(n log n) algorithms.
Any 'optimizations' you make to quicksort may boost performance for degenerate data sets that hit the 'worst case' on several recursions, but they might hurt average performance. If you end up doing more than 10% damage to the average, you might as well be using a strictly O(n log n) sort.
If you're not willing to get your modifications to quicksort peer reviewed, don't bother touching it.
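Sticking with the stock library sort, as suggested, might look like this (the comparator avoids the classic x - y overflow bug):

```c
#include <stdlib.h>

/* The stdlib quicksort is already tuned for average-case performance;
 * all the caller supplies is a comparator. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   /* -1, 0, or 1, with no overflow */
}

void sort_ints(int *v, size_t n) {
    qsort(v, n, sizeof v[0], cmp_int);
}
```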
-my 2 cents
Re:That's not as funny as you think (Score:3, Interesting)
That would then be because you commented the wrong thing. Comments should have nothing to say about what is being done; that should be obvious from the code. Comments have much more value, become obsolete less often, and make code cleaner when they explain why things are done: comments give the big picture, the code the details. Steve McConnell's "Code Complete" is a wonderful treatise on this and other good things.
As a trivial example:
int a = b + 1;  /* "add one to b" restates the code; "skip the sentinel slot" explains why */
Re:Clear Code (Score:5, Interesting)
There is a whole area of study involved in correctness checking, which is related to the SAT (satisfiability) problem.
The operating system choice is also interesting. Linux doesn't even come close to what they need. Having device drivers in the kernel is just not a good idea. It needs to have a separation kernel, at least that is the goal. I presently think they use the INTEGRITY operating system by Green Hills, but I could be wrong.
Just optimize for the "big picture" (Score:4, Interesting)
As for simple code optimizations, here's what a modern compiler (Microsoft Visual C++, for instance) will already do for you.
Re:Clear Code (Score:3, Interesting)
> no compiler I've seen can optimize an algorithm
Although I definitely agree with your conclusions, the above sentence is not correct. Most implementations of Scheme, for instance, do a lot of algorithmic optimisations, basically by turning tail recursion into (functional) iteration when possible.
These kinds of optimizations are extremely effective in some scenarios because they significantly reduce memory use and memory traffic, and the effect can be illustrated quite easily by turning optimization off in any good Scheme implementation (like DrScheme).
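The same transformation applies in C: gcc at -O2 will compile a tail-recursive function like this sketch into a plain loop, because the recursive call is the last thing the function does.

```c
/* Tail-recursive factorial: there is no pending work after the
 * recursive call, so an optimizer can reuse the stack frame and
 * turn the recursion into iteration. */
unsigned long fact_tail(unsigned long n, unsigned long acc) {
    if (n <= 1)
        return acc;
    return fact_tail(n - 1, n * acc);   /* tail position */
}
```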
Re:Not always. (Score:3, Interesting)
In C, this is because 0 != NULL; NULL is (void *)0. This makes a difference in function calls, where calling myfoo(int a) as myfoo(NULL) is a compiler error/warning, but myfoo(0) is legal.
In C++, NULL = 0. Really. Here's the header from gcc (well, really the kernel):
This is because C++ has pointer-to-member types, I believe (not sure; I don't use void* pointers). Instead I use this struct, which IMHO should be included in the STL (slightly shortened in the interest of brevity, and written from memory, so it may not compile):
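The struct itself did not survive in this comment; below is a reconstruction based on the classic pre-C++11 nullptr emulation, so treat it as an assumption rather than the author's exact code:

```cpp
// Reconstructed sketch (the original code was elided): a class that
// converts to any pointer type, and separately to any pointer-to-member
// type, but never to an integer.
const class my_null_t {
public:
    template <class T>
    operator T*() const { return 0; }       // any plain pointer type
    template <class C, class T>
    operator T C::*() const { return 0; }   // any pointer-to-member type
} my_null = {};

struct Probe { int m; };   // hypothetical helper type for demonstration
```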
The first member ensures that my_null can be converted to any pointer other than pointer-to-member, and the last one takes care of those. This method makes the C case above work as it should, i.e. give an error.
As for clarity, it separates two concepts: The NULL pointer (a pointer pointing to nothing) and 0 (an integer).
As an aside, I actually prefer !p, since it says to me "if p is not a valid pointer...". But I can handle both :)
Trusting the compiler will become very important (Score:3, Interesting)
Further, consider the Parrot system to be used by Perl, Python and Ruby. There's a strong similarity here to the cell system. All three languages are to be compiled to a common register-based representation. That representation is to be optimized by the compiler before execution. They chose this model because we have decades of research on optimizing code for register-based machines (as opposed to Java's stack-based machine).
In short, some very large, very important projects already have a lot of faith in these optimizers. They are not going away. I suggest the best approach is to work with them.
So how do you cooperate with your optimizer? Write cleanly and clearly. Don't try to outsmart the optimizer, because if you do so, your code will most likely be slower, not faster. And don't do any work until you need to. Write the project correctly and clearly, then profile. If you need to modify things, then you have a working baseline to compare your optimizations with.
Finally, when you get the faster version, check it in, then refactor the design to something reasonable, rechecking the speed as you do so. Ideally, for a small performance hit, you can end up with fast, efficient and easy to maintain code.
Example: do _not_ optimize yourself! (Score:1, Interesting)
unsigned char *origimg = malloc (width * height * 3);
unsigned char *imgdest = malloc (horpix * vertpix * 3);
unsigned char **mapping =
malloc (horpix * vertpix * sizeof (unsigned char *));
with some formula to morph the image, getting a pointer to the input pixel for every output pixel:
mapping[y * horpix + x] =
origimg + somey() * width * 3 + somex() * 3;
Now the interesting part. To do the mapping itself (repeatedly, think movie), this looks about the worst you can come up with:
for (y = 0; y < vertpix; y++)
for (x = 0; x < horpix; x++)
{
imgdest[(y * horpix + x) * 3] = *mapping[y * horpix + x];
imgdest[(y * horpix + x) * 3 + 1] = *(mapping[y * horpix + x] + 1);
imgdest[(y * horpix + x) * 3 + 2] = *(mapping[y * horpix + x] + 2);
}
YET: Try ANY hand-optimization that performs the exact same function, and gcc-2.95.2 -O9 -funroll-loops on i386 will produce SLOWER code! Even having a single var for (y*horpix+x) is SLOWER! If you don't believe me, try it. I did, and I was _very_ surprised.
The fastest version I ended up with actually optimizes differently by not copying 3, but 4 bytes at once:
for (y = 0; y < vertpix; y++)
for (x = 0; x < horpix; x++)
{
*((unsigned int*) (imgdest+((y * horpix + x) * 3))) =
*((unsigned int *) mapping[y * horpix + x]);
}
But this is only a few percent faster than the original, and again, do NOT optimize further!
-- JAB
(Code donated to the public domain)
Re:Clear Code (Score:3, Interesting)
> correctness checking, which is related to
> the SAT (Satisfiability) problem.
Knuth:
"Beware of bugs in the above code; I have only proved it correct, not tried it."
Re:Wrong, wrong, wrong (Score:2, Interesting)
No, "implementation defined" means that it is up to the compiler, OS (if present), and possibly the hardware to determine what will be used to represent the NULL address.
This is addressed in the C-FAQ [eskimo.com] where systems from Prime, Data General, and Honeywell-Bull are noted for having non-zero null pointers (at least for their C-compilers).
This issue brings up one of my pet peeves with C++. The designers (I don't think Stroustrup deserves all the blame) went all out in adding weak type safety to the language by eliminating automatic casts, but then they decided that "((void*)0)" violated their safety model and invoked special "spooky" behavior when comparing pointers against the literal zero. This of course leads to the annoying practice of leaving out the Boolean operator altogether and using "if(foo)" and "if(!foo)".
I imagine the C++ architects were at least partially motivated by their "macros are evil" mantra (why they didn't do anything about the need for header guard macros is beyond me). They should have had the guts to introduce a new "null" keyword. This would not conflict with existing ANSI C code using NULL, and code could easily be ported with a simple substitution:
#define NULL null
I can't believe that this wasn't proposed at some time. It must have been shot down because someone complained that it would break their legacy C code. Only a fool would put case variants of standard macros and keywords into their C code, and porting older pre-standard-NULL K&R code would require revisions anyway. They deserve any breakage if they try to port such crap to C++.
Re:If you're not willing to TIME it... (Score:3, Interesting)
As such, I'd say: if something is too slow, benchmark it, then try to write a more simple version and benchmark that. Many times the result will be faster (and rest of the time you probably need a better algorithm).
As funny as it sounds, because compilers do fancy optimizations, writing code that "explains the logic to the compiler" may well enable optimizations that a compiler couldn't apply to "more optimized" code. One obvious (and simple) example is the "const" modifier in C, which clarifies the extent of usage to both other programmers AND the compiler; the performance result can be quite dramatic in truly CPU-bound code.
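A minimal sketch of the idea (names are mine): const and restrict tell the compiler, not just the reader, what a function will never do.

```c
/* "const" promises this function never writes through src, and
 * "restrict" promises dst and src don't overlap; together they let
 * the optimizer vectorize and hoist loads that plain pointer
 * parameters would block. */
void scale(float *restrict dst, const float *restrict src,
           float k, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```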
Once we have dealt with #1... (Score:1, Interesting)
As an example, I had to optimize a terrain rasterizing program that starts from isoheight lines. Some computer scientists had already optimized the searches, cutting the running time in half for jobs with on the order of 100,000 line segments. From every raster point, searches were done in eight directions to find the next isoheight line, then a distance-weighted average was taken.
The code had become unreadable with the optimizations and still ran for a day. I decided to rewrite it, matching the data structures the other way round. Instead of going through the raster points and looking for the lines, I went through the lines and registered each of their crossings with the straight and diagonal grid lines (Bresenham). Then I sorted the grid crossings and worked out the stuff.
It took me two hours of debugging after 10 days of implementation to find out why my program would exit unexpectedly after less than a minute. I finally got it. It was finished.
So here is rule #1: design your algorithm well. Here is #2: when fitting regularly spaced data with irregularly spaced data, walk through the irregular stuff and find the corresponding regular stuff: that can be addressed immediately by indexing without requiring a costly search.
This is macro optimization. There comes a point of time when you actually need to optimize at the small level. Don't go there until your profiler tells you.
And here is the number one rule for optimizing C code: don't use pointers if you can avoid them.
There is a clever trick to implement two-dimensional arrays as pointer arrays pointing to the actual rows of the array. This saves you a multiplication in the pointer arithmetic.
Making use of this clever trick with modern processors and compilers will cause your performance to drop by about a factor of 5. I am not kidding you. The compiler's aliasing and parallelism detection passes are made useless by this technique, strength reduction fails, the out-of-cache-line accesses cause thrashing in memory accesses, and the additional memory accesses are much slower than address calculation, which happens at full processor speed.
Use C99's variably dimensioned arrays instead: those are not half as confusing to the compiler, and much more efficient anyway. And, sad to say: think of using numeric subroutines in Fortran when having to deal with multidimensional arrays. C99 has not been around for long, and almost no code making use of its features exists.
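A sketch of the C99 style being recommended: the row stride becomes part of the parameter type, so indexing is plain arithmetic the optimizer can see through.

```c
/* C99 variably modified parameter: a[i][j] compiles to one
 * multiply-and-add off a single base pointer, with no row-pointer
 * loads to defeat aliasing analysis. */
double sum2d(int rows, int cols, double a[rows][cols]) {
    double s = 0.0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            s += a[i][j];
    return s;
}
```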
Finally, a story that my signature works well with (Score:4, Interesting)
Re:Not always. (Score:2, Interesting)
> well written it doesn't NEED comments"
They are lazy, and lack attention to detail. This is a mental escape on their part that assuages their small-frog egos.
Re:Ask the compiler... (Score:2, Interesting)
Now a processor that zero-checked a pointer on memory dereference, automatically skipping the dereference, combined with a language that supported it, would have the check in hardware...
Re:Write C for C programmers (Score:1, Interesting)
> if (!strcmp(x,y))
#define strings_match(str1, str2) (!strcmp(str1, str2))
Much clearer.
If that's true.... (Score:3, Interesting)
Because IMO nowadays writing in C is usually a premature optimization.
With all the GHz CPUs, higher level languages are often more than fast enough. It's usually the design and algorithms that you have to get right.
The advantage of writing in a higher level language first is you write far fewer lines of code - esp if the Customers keep changing their minds every month.
Optimizing at low levels only gives you linear improvements in speed. Doesn't help you if the system slows exponentially.
What I mean by linear is:
Say you use the same algorithm.
#1: fast low level language, optimized.
#2: in a fast low level language.
#3: high level language.
If #3 is 4 times slower than #1, in most cases it'll always be 4 times slower. Same if #2 is 10% slower than #1.
Whereas a high level optimization can gain you lots more.
On a fair size complex system it's probably best to write stuff/modules in a high level language first and then replace them with C or hand assembly later if necessary.
Maybe it is wasteful. But at least you can say to the people doing it that the informal spec is: "it's got to work exactly like what you are replacing - only faster". After all, that module is already working.
Call the stuff in the high level language your pseudocode. The equivalent of the plastic/clay model or prototype.
Re:Readibility vs null values (Score:3, Interesting)
The concept of NULL is an extension of the domain of a variable. It means "no value". In C/C++, we often think of the pointer as the object itself, and so we do a little trick using two different operations to get that valuable extension to the domain: we either read the value of the pointer (a memory address) and compare it to zero, or we load its contents (the -> operator). However, why doesn't one usually compare pointers to constants other than zero (e.g., ptr != 3)? By not doing so, we bring to light the fact that we're only using part of the domain of the pointer variable.
To make this clear, think of what you'd have to do to have a null integer. In C you'd likely do:
int * val = malloc( sizeof(int) );
And then you'd use "val" for the null comparison and "*val" for the integer. Funny how in doing that you're spending more memory than if you did:
typedef struct { int value; bool isnull; } NullableInt;
NullableInt a;
With this you'd spend 5 bytes per variable (before padding; typically 8 once aligned). With the other approach you spend 4 bytes for the integer plus 4 bytes for the pointer, for a total of 8 bytes (on 32-bit architectures), plus malloc's bookkeeping overhead.
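A quick sketch of how the struct variant might be used (the helper names are mine; C99 compound literals assumed):

```c
#include <stdbool.h>

typedef struct { int value; bool isnull; } NullableInt;

/* The flag extends the int's domain with "no value" -- no heap
 * allocation and no pointer indirection. */
NullableInt nullable_none(void) { return (NullableInt){ 0, true }; }
NullableInt nullable_of(int v)  { return (NullableInt){ v, false }; }

/* Read the value, falling back to a default when it is null. */
int nullable_or(NullableInt a, int dflt) {
    return a.isnull ? dflt : a.value;
}
```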
Obviously, these considerations are a bit pointless, and one benefits little from them in the context of C/C++ programming.
On the other hand, some languages have specific support for nullable variables, without having to resort to pointers (at least syntactically). NULLs are famous in SQL, although they are also infamous for being misused much of the time.