Is Source-Code Optimization Worthwhile?
nwetters asks: "I'd like my programs to run as fast as possible, and there are some great books (e.g. High Performance Computing) that give examples of loop optimizations and unit-stride memory use. However, I've heard that modern compilers will automatically optimize even the worst of my kludge. Should I bother with manual tweaking, or leave it up to the compiler?"
What type of code is it? (Score:1)
The type of problem being solved plays an important role in how the code should be optimized.
"It Depends" (Score:3)
Modern compilers really will clean up a lot of this automatically. That doesn't mean you should ignore stuff like memory stride and loop invariants, etc., while coding, just because the compiler can fix it. For one thing, who knows what compiler/machine you're going to be using tomorrow? Code optimizers are stupid and conservative, and it doesn't take much to make one "give up" and go with "slow but correct". Another thing is that a lot of the code that does confuse the optimizer is likely to confuse a future maintainer as well.
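A minimal sketch of what that means in practice (the function and names here are illustrative, not from the post): the invariant is computed once outside the loops, and the inner index walks memory in unit stride.

    /* Loop invariant hoisted, unit-stride traversal of a row-major array. */
    #include <stddef.h>

    void scale_rows(double *a, size_t rows, size_t cols, double base)
    {
        double scale = base * base + 1.0;      /* invariant: computed once, not per iteration */
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                a[i * cols + j] *= scale;      /* j varies fastest, matching C's row-major layout */
    }

A good compiler can hoist the invariant itself, but the traversal order is usually up to the source.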
depends on the optimization (Score:2)
As for some of the more esoteric optimizations, like loop unrolling, you may or may not get a compiler that can do good loop unrolling, so it may pay to unroll loops manually. Still, for the first attempt, you should probably write the loops without unrolling them, so that you can think about the rest of the program rather than getting caught up in minutiae.
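As a hedged illustration (not from the comment itself), a manually four-way unrolled sum might look like this; whether it beats what the compiler produces from the plain loop depends entirely on the compiler and CPU.

    #include <stddef.h>

    /* Four-way unrolled summation.  Assumes n is a multiple of 4 for brevity;
       a real version would also handle the leftover elements. */
    double sum4(const double *x, size_t n)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (size_t i = 0; i < n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }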
I routinely perform some small optimizations that rely on very large scale knowledge of what the code does and how it will be used. For example, I have an AVL tree library that caches the result of the last search and checks the cache variable before each lookup. That kind of optimization couldn't easily be done by the compiler (since it works across calls to different functions and relies on what the caller is using the code for), but it can have a big impact on code that performs multiple lookups on the same key in sequence. (Of course, I would be better served by a splay tree in this circumstance, but that's beside the point here.)
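A rough sketch of that caching idea; the AVL library itself isn't shown, and avl_tree/avl_find stand in for whatever the real API looks like.

    struct avl_tree;                                   /* opaque handle from the (hypothetical) AVL library */
    void *avl_find(struct avl_tree *t, int key);       /* assumed lookup call */

    struct cached_tree {
        struct avl_tree *tree;
        int   last_key;      /* key of the most recent successful search */
        void *last_value;    /* value found for that key */
        int   valid;         /* 0 until first hit; should be cleared on insert/delete */
    };

    void *cached_find(struct cached_tree *c, int key)
    {
        if (c->valid && key == c->last_key)
            return c->last_value;                      /* repeat lookup: skip the tree walk */
        void *v = avl_find(c->tree, key);
        if (v) {
            c->last_key = key;
            c->last_value = v;
            c->valid = 1;
        }
        return v;
    }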
The kind of optimizations that it really pays to do are more along the lines of smart coding and good design than they are like traditional code optimization. In general, if you make your code readable and don't use stupid algorithms you can rely on the compiler to fix the small stuff.
Probably not. (Score:1)
So second for second, you're better off writing VB programs on contract and using the money to buy a faster processor than you are trying to figure out how to reimplement printf in assembler...
Depends (Score:2)
But for the most part, it's best to just write clean, readable code. It doesn't matter that the code is fast if it can't be read, since someone will just have to redo it when it needs changes. Also, it usually helps the compiler optimize more if it can figure out what's going on. For instance, the KAI C++ compiler has a very, very good optimizer (my code runs 2-3 times as fast as it does under gcc 2.95.2 on the same machine), and the docs recommend you write plain, readable code so the optimizer can see what's going on. One time I replaced some code with inline asm, and it got significantly slower. I took it back out.
Just my 2% of a $.
Only if you do a few things first... (Score:3)
Sure, go ahead! But before you do, you should do a few other things:
Re:Depends (Score:2)
Optimisation (Score:2)
Inline asm (Score:1)
I remember Watcom C++, a Windows/DOS compiler, supporting pure inline assembly by letting you write assembly routines as inline functions. Thus, instead of using an inline C++ function, you could write an inline assembly function callable from C. The programmer was also responsible for saving and restoring the proper register state himself. So nothing was lost and everything was gained, although you had to know what you were doing.
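Watcom's actual syntax isn't reproduced here; as a hedged modern analogue, GCC's extended asm expresses the same idea of a tiny function whose body is really assembly (x86 only in this sketch).

    /* Inline "function" whose body is a single x86 instruction.
       The constraints tell the compiler which registers to use,
       so no explicit save/restore is needed in this style. */
    static inline unsigned int add_asm(unsigned int a, unsigned int b)
    {
        unsigned int r;
        __asm__ ("addl %2, %0"          /* r = a + b */
                 : "=r" (r)
                 : "0" (a), "r" (b));
        return r;
    }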
Despite Watcom C++ being superior in all areas, it was still beaten by MS Visual C++ in "fair and square" marketing.
- Steeltoe
Re: Source Code Optimization? (Score:1)
In fact, some famous tweaks can make things worse: the (in)famous "Duff's Device" creates what's called an "irreducible flow graph" (a pathological sort of loop) that makes a lot of optimizations impossible--the best a compiler can do is to replicate some of the code to make the flow graph reducible, thus undoing the supposed benefits. (In Duff's defense, the device was the Right Thing to do if he had to use a compiler that didn't do data flow analysis.)
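For reference, here is the device itself (as commonly quoted; treat the details as a from-memory transcription): a switch jumps into the middle of an eight-way unrolled do-while to absorb the leftover iterations.

    /* Duff's Device: copy `count` shorts (count > 0 assumed) to a single
       memory-mapped output register `*to`, eight per loop iteration. */
    void send(short *to, short *from, int count)
    {
        int n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                    } while (--n > 0);
        }
    }

It is exactly this switch-into-a-loop structure that produces the irreducible flow graph mentioned above.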
Profile!!! (Score:2)
Optimization (Score:1)
Machine Dependencies (Score:1)
I used to do source code optimization when C compilers were not good at optimization and the code could run much faster if the source code was tweaked to effectively use registers and avoid slow address computations.
Modern compilers have much better optimizers and it has become more difficult to optimize code for the CPUs of today. Counting cycles in a program is much more difficult than it used to be. When you have families of processors, the optimal code for one processor may be slow on another.
I used to use pointers whenever possible since experience had taught me that array references were always slower than pointer references. Then I discovered that array references were substantially faster than pointer references on some code that I was writing for a Pentium. The code was cleaner, easier to understand, smaller and faster when I used arrays.
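The two styles in question look roughly like this; which one wins is compiler- and CPU-dependent, which is the whole point.

    #include <stddef.h>

    /* Pointer-walking style: the idiom that used to be "obviously" faster. */
    long sum_ptr(const int *p, size_t n)
    {
        const int *end = p + n;
        long s = 0;
        while (p < end)
            s += *p++;
        return s;
    }

    /* Array-indexing style: often as fast or faster once the compiler can
       see the induction variable clearly. */
    long sum_idx(const int *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }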
If you look at the output of a good compiler, you will often be surprised at the optimization tricks that you would have never thought of.
Source code optimization is useful in some cases (Score:1)
My codes run for weeks/months at a time. 5% can mean a few extra days of run time. Basically, the rules of thumb to decide whether to optimize are:
a) Is it worth the effort? (i.e., is what I get out of the optimization worth the time I am going to invest)
b) Can the code be optimized? This is difficult, as many codes (far too many IMO) use poor choices of algorithms.
c) Does the code need to scale to a particular problem size without a blow-up in required CPU time? This can be a simple case of localizing the processing to a subset of the data, parallelizing the code to work on chunks at a time, or watching out for some variable growing large enough that quantities depending on it cause O(n**3) or worse factors to crop up.
d) Is the code using too much memory? I know, there is never "too much memory". If you are concerned at all about performance, you want to avoid using swap/paging memory. If your algorithm is so expensive on memory that it forces you to swap, you can clearly demonstrate to dumbfounded CS grad students/profs everywhere that an O(n**2) algorithm can indeed scale/run faster than an O(n log n) algorithm for reasonable values of n. This bothers many people, but that might be because they don't understand a hierarchical memory system.
etc.
There are many reasons to optimize. Good design of code tends to be one of the best optimizations. You could do things like unroll loops by hand, but that gets to be a little painful if you have many loops. It is better to be cognizant of the hardware you will be running on, such as a cache-based RISC/CISC machine with three different memory latencies and bandwidths, and write your code with that in mind. Eventually the compilers will be smarter, but today you still need to adapt to the vagaries of the machine if you need real performance.
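As one hedged example of writing with the cache in mind (the tile size is illustrative and would need tuning per machine), blocking a matrix transpose keeps both arrays in cache instead of striding a full row apart on every access.

    #include <stddef.h>

    #define BLOCK 32    /* illustrative tile size; tune to the target's cache */

    /* Transpose an n-by-n matrix in BLOCK-by-BLOCK tiles for locality. */
    void transpose_blocked(double *b, const double *a, size_t n)
    {
        for (size_t ii = 0; ii < n; ii += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                    for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                        b[j * n + i] = a[i * n + j];
    }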
5 years ago no. Today yes. (Score:2)
In general, optimization is a waste (Score:1)
Better design == orders of magnitude improvement
Optimization == incremental improvement
This is merely a rule of thumb, and there are significant exceptions to this rule.
Re:Source code optimization is useful in some case (Score:1)
Optimizing for memory usage will buy you much more than instruction optimization. Don't use too much memory. To quote Seymour Cray, "Memory is like orgasms, it's much better when you don't have to fake it."
Remember there is a memory speed hierarchy: registers -> cache -> physical memory -> virtual (paged) memory. A lot of instructions can be executed while waiting for a page of memory to be swapped in. Also, in SMP environments, do not discount the effects of cache invalidation. The cache can be your friend or your worst enemy. Be smart about memory layout and usage.
In laying out structures, place the most heavily used field first. This way accesses to this field are direct, i.e. *ptr instead of *(ptr+24). This can help tremendously in some usage models.
Place fields that are used together in the same cacheline. Also try to group fields that are read-only together.
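A hypothetical layout following both pieces of advice: the hottest field at offset 0 (so an access is just *ptr), read-mostly fields grouped together, and cold bookkeeping pushed toward the end.

    struct connection {
        unsigned long hits;       /* hottest field: sits at offset 0 */
        unsigned int  flags;      /* read-mostly, used together with proto */
        unsigned int  proto;
        /* --- colder, rarely touched data below --- */
        char          name[64];
        void         *debug_info;
    };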
Also, when writing code to handle errors/special cases, design the code to test for the cases together. For example, if COND1 and COND2 are rare, it is better to guard both with a single combined test than to check each one separately, something like the sketch below. In the normal case, only one test is performed to handle the exceptions, not two.
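A reconstructed sketch of that pattern (the comment's original example did not survive): the two rare conditions are folded into one guard so the fast path pays for a single test.

    /* cond1 and cond2 are the rare error conditions; value is the normal input. */
    static int process(int cond1, int cond2, int value)
    {
        if (cond1 || cond2) {        /* one test in the common case */
            if (cond1)
                return -1;           /* handle rare case 1 */
            if (cond2)
                return -2;           /* handle rare case 2 */
        }
        return value * 2;            /* normal, fast path */
    }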
In C code, all too often people don't take advantage of the return value of sprintf(). I've seen several consecutive strdup()'s used to build up a string; this is something that could be coded a lot more efficiently with sprintf().
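A sketch of the sprintf() idiom being suggested (names are made up; the buffer is assumed large enough): each call returns the number of characters written, so the next piece is appended without rescanning or copying the string.

    #include <stdio.h>

    int format_record(char *buf, const char *name, int id, double score)
    {
        int n = 0;
        n += sprintf(buf + n, "name=%s", name);
        n += sprintf(buf + n, " id=%d", id);
        n += sprintf(buf + n, " score=%.2f", score);
        return n;                    /* total length comes for free */
    }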
Get to know your compiler. Some compilers code the if clause of an if-then-else statement inline and jump out of line for the else portion; some do the opposite. Look at the assembly that your compiler puts out and code around that. Inline execution will help.
Using const can be a winner. Most compilers are smart enough to not save const variables on the stack. It also helps in interprocedural optimization, if your compiler supports it.
Make sure that your code is lint clean. This too can help with register allocation.
Get to know the pragmas your compiler supports. A lot of times you can provide hints to the compiler to help it optimize the code better.
There is the old rule of thumb that states that functions should be no more than one page of source code long. This is especially helpful when all else fails and you need to hand code things in assembly. Hand coding a function 500 lines long is a lot more difficult than one 50 lines long. Usually only a small portion of the function really needs the optimization anyway.
Measure. Measure. Measure. The minimum number of instructions does not always mean the faster execution time. Measure to make sure that your changes are actually improving the execution speed.
Also, when you find something that you feel needs hand coding, leave the high-level code in place behind ifdef's. This makes porting to another architecture easier, and another platform may behave differently enough that the high-level code runs reasonably there without any hand coding at all.
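A small sketch of that ifdef structure; HAVE_X86_BSWAP_ASM is a made-up configuration macro, and the asm path is x86-specific.

    /* Byte-swap a 32-bit value: hand-coded path behind an #ifdef, with the
       portable high-level version kept in place as the fallback. */
    static unsigned int byte_swap32(unsigned int x)
    {
    #ifdef HAVE_X86_BSWAP_ASM
        __asm__ ("bswap %0" : "+r" (x));     /* hand-coded x86 path */
        return x;
    #else
        return  (x >> 24)
             | ((x >> 8)  & 0x0000ff00u)
             | ((x << 8)  & 0x00ff0000u)
             |  (x << 24);                   /* portable fallback */
    #endif
    }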
All too often people jump in head first and start tweaking pieces of code that could be solved at a higher level. Hand coding an O(n**2) function in assembly is not going to change the fact that it is O(n**2).
The best piece of advice is to optimize the problem, not the solution. I was once asked to look at optimizing a program that was taking too long to run. Simplified, it sorted a ten-million-line file and then selected the lines that matched certain criteria. The output was on the order of ten thousand lines. Reversing the order of the operations (select, then sort) sped the program up far more than any change of sorting algorithm ever could have.
For some codes it makes a major difference (Score:1)
For most applications however source level optimization is not necessary, either because processing speed is not the bottleneck, or because the code does not require a lot of time to run.
Re:Optimizing for size (Score:1)
Look at what resourceful programmers have achieved on the Amiga: Quake springs to mind straight away. Playable on a CPU which is substantially slower than the current generation of Wintel CPUs? You better believe it!
Cache reused values (Score:2)
Two experiences with a client this spring show just how much this can help. The client needed to do some basic GIS functions, so there were a *lot* of trig calls on the same values related to the lat-lon-to-display mapping. The original code did not cache the values which were constant for each mapping; this was costly, since they accounted for about half of all trig calculations in the code! Caching those values cut the running time in half, and no compiler would know to extend a data structure to include those cached values.
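A sketch of that kind of caching, with hypothetical names; the projection math is simplified to a single cosine term just to show the shape of it.

    #include <math.h>

    struct mapping {
        double ref_lat;          /* reference latitude of the projection, in radians */
        double cos_ref_lat;      /* cached: cos(ref_lat), constant for this mapping */
    };

    void mapping_init(struct mapping *m, double ref_lat)
    {
        m->ref_lat = ref_lat;
        m->cos_ref_lat = cos(ref_lat);           /* computed once per mapping... */
    }

    double project_x(const struct mapping *m, double lon)
    {
        return lon * m->cos_ref_lat;             /* ...and reused for every point */
    }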
The second optimization involved a modest change to the code so that the data was organized in a rough hierarchy - the 360*180 1-degree square cells were clustered in 3-degree supercells, 15-degree super(2)-cells, and 60-degree super(3)-cells. If the area of interest wasn't in the super(3) cell, I didn't bother to check any of the 1-degree cells within it. Again, no compiler would have caught this, and few algorithm books discuss n-ary trees of spatial data. If you squint, this is also a "cache" approach, although the cache is hardcoded via the nesting factors.
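One level of that nesting, sketched with made-up names and layout (lat_deg in [0,180), lon_deg in [0,360), a 3-by-6 grid of 60-degree super-cells): a per-super-cell flag lets whole regions be skipped without visiting their 1-degree cells.

    struct supercell {
        int has_data;            /* 0 => nothing of interest anywhere inside */
        int counts[60][60];      /* the 1-degree cells this super-cell covers */
    };

    int cell_count(const struct supercell *grid, int lat_deg, int lon_deg)
    {
        const struct supercell *sc = &grid[(lat_deg / 60) * 6 + (lon_deg / 60)];
        if (!sc->has_data)
            return 0;                            /* pruned: no descent into fine cells */
        return sc->counts[lat_deg % 60][lon_deg % 60];
    }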
Put the two modifications together and the program ran about 2 orders of magnitude faster! That was so much faster that the client started to compute the values on the fly - it was "cheaper" than precomputing the information and maintaining *that* cache!
Re: dumbfounded grad students & profs (Score:2)
However, when all things are equal there's no doubt that many of us will recommend the more efficient algorithm *even when it is modestly slower for the task at hand*, but that's because experience shows that problems inevitably get larger and the more efficient algorithm *will* be faster in the near future. That's why a new 500 MHz P-III running Windows 2000 is often *slower* than a 75 MHz 486 running Windows 3.1: your system may be approx. 10x faster, but the problems you're trying to solve are also 10x larger, so the O(n^2) algorithms are much slower. (Even O(n lg n) is slightly slower, but by a smaller amount.)
Finally, never overlook the raw stupidity of most people who blindly dismiss the more efficient algorithms. Just last month I saw a bubble sort that had a comment stating that a "bubble sort was fast enough since the cost was in the swap, not the comparison." The author obviously never considered the fact that you would never do a comparison and *not* do the required swap. I replaced that code with a call to qsort() and cut the number of comparisons from 1,000,000 to 30k. The performance improvement came to something like 20 seconds.
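For completeness, the replacement amounts to little more than this (a generic qsort() call on ints, not the poster's actual code):

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);    /* avoids the overflow that `x - y` can cause */
    }

    void sort_ints(int *v, size_t n)
    {
        qsort(v, n, sizeof v[0], cmp_int);
    }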
I'm sure that there are some clueless profs who don't know how costly it is to hit the swap, but in my experience there are *far* more programmers who do a half-assed implementation of the faster algorithm, generally bitching the entire time about how their clueless boss doesn't understand the "real world," and they then complain that the faster algorithm is actually slower. Obviously I don't know if that happened to you, but I've seen it happen enough times that I always take such criticism with a grain of salt.
yep. please optimise by hand. (Score:1)
this worked ok until the db grew to 50,000 records and several hundred tables for each group of records. on a sun sparcserver with four 400MHz ultrasparc2 cpu's it took *5* *minutes* to return *1* request.
needless to say i had to write a 4 level 250 slot cache routine to return it to a sane time (under 1/2 second)...in other words kiddies - optimise manually dammit.
hmmm.. oh yeah. (Score:1)
squirt.
-r.
Yes and no... (Score:1)
I work in an environment where the emphasis is on writing READABLE code so it can be debugged by someone with less than perfect knowledge. On the other hand, we'd been having problems with our scripts taking down the ISP's server before we optimized them. I have to say that optimization is situation specific. Sometimes you will need it, sometimes you won't.
yes it is always a good idea (Score:1)
send flames > /dev/null
there is more (Score:1)
How easy is it to replace your software (is it embedded vs. 'notepad')?
Are other factors (power usage, lower hardware costs, possible extra features) of importance?
I've heard that commercial compilers (vs. university ones) are slow to incorporate new techniques. I've been told by my lecturer that, for example, global variables are not kept in registers. Well, that sounds pretty stupid to me (but who am I to judge). Maybe it is possible to compare these two sorts of compilers to figure out the hard parts of compiling. Have phun,
Of course (Score:1)