Is Source-Code Optimization Worthwhile? 33

nwetters asks: "I'd like my programs to run as fast as possible, and there are some great books (e.g. High Performance Computing) that give examples of loop optimizations and unit-stride memory use. However, I've heard that modern compilers will automatically optimize even the worst of my kludge. Should I bother with manual tweaking, or leave it up to the compiler?"
This discussion has been archived. No new comments can be posted.

  • What type of code is it? Is it a huge computation-intensive program that will benefit from minor optimizations, or is it, say, some program to display a calendar?

    The type of problem to be solved plays an important role in optimization of the code.
  • by V. Mole ( 9567 ) on Tuesday November 23, 1999 @06:18PM (#1509153) Homepage
    I'm sure there are innumerable ways to screw the compiler. As far as I know, no optimizer yet written converts bubble sorts into quicksorts. That said, you shouldn't worry about optimizing your loops before you've designed your program and picked appropriate algorithms. My theory is to design and code primarily for correctness, reliability, flexibility and maintenance. If the result is fast enough/small enough, then I'm done. If it's too slow (or too big, or whatever), then put it under a profiler and start picking. But definitely don't waste time hand-optimizing code until you've profiled it to make sure you're optimizing the *right* code!

    That doesn't mean you should ignore stuff like memory stride and loop invariants while coding, just because the compiler can fix it. For one thing, who knows what compiler/machine you're going to be using tomorrow? Code optimizers are stupid and conservative, and it doesn't take much to make one "give up" and go with "slow but correct". Another thing is that a lot of code that confuses the optimizer is likely to confuse a future maintainer as well.
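
    A minimal sketch of the stride point (array name and size are made up): in a row-major C array, keeping the last index in the innermost loop walks memory sequentially, and no compiler is guaranteed to swap the loops for you if you get it backwards.

    #define N 1024
    double a[N][N];

    void scale(void)
    {
        int i, j;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)   /* unit stride: a[i][0..N-1] is contiguous */
                a[i][j] *= 2.0;
    }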
  • As V.Mole said, you can't expect a compiler to convert bubble sorts to quick sorts for you, so you certainly want to use appropriate algorithms, but you should not worry about things like precomputing common expressions or specifying what variables should be placed in registers: the compiler is, invariably, going to do a better job of sorting out these minor optimizations than you will, and it will be more portable. (the kind of optimizations that work well on a PowerPC or Alpha won't be so good on an x86 and vice versa)

    As for some of the more esoteric optimizations, like loop unrolling, you may or may not get a compiler that can do good loop unrolling, so it may pay to unroll loops manually. Still, for the first attempt, you should probably write the loops without unrolling them so that you can think about the rest of the program, rather than getting caught up in minutiae.

    I routinely perform some small optimizations that rely on very large scale knowledge of what the code does and how it will be used. For example, I have an AVL tree library that caches the result of the last search and checks the cache variable before each lookup (see the sketch at the end of this comment). That kind of optimization couldn't easily be done by the compiler (since it optimizes across calls to different functions and it relies on what the code is being used for by the caller) but could have a big impact on code that might perform multiple lookups on the same key in sequence. (Of course I would be better served by a splay tree in this circumstance, but that's beside the point here.)

    The kind of optimizations that it really pays to do are more along the lines of smart coding and good design than they are like traditional code optimization. In general, if you make your code readable and don't use stupid algorithms you can rely on the compiler to fix the small stuff.
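
    Here's roughly what the cached lookup looks like. This is a hypothetical sketch; struct avl_node and avl_search_full stand in for the real tree internals.

    struct avl_node;
    void *avl_search_full(struct avl_node *root, int key);  /* the full tree walk */

    struct avl_tree {
        struct avl_node *root;
        int   last_key;       /* key of the previous search */
        void *last_result;    /* and what it returned */
        int   cache_valid;    /* inserts and deletes must clear this */
    };

    void *avl_search(struct avl_tree *t, int key)
    {
        if (t->cache_valid && t->last_key == key)
            return t->last_result;            /* repeat lookup: no tree walk at all */
        t->last_result = avl_search_full(t->root, key);
        t->last_key = key;
        t->cache_valid = 1;
        return t->last_result;
    }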
  • Unless you're John Carmack, the compiler is substantially smarter than you are, and is trained, tweaked, and optimised to deal with GOOD coding practice.

    So second for second, you're better off writing VB programs on contract and using the money to buy a faster processor than you are trying to figure out how to reimplement printf in assembler...
  • For most stuff, I've really got to say it just doesn't matter. For instance, anyone who tries to optimize GUI code for speed should be repeatedly hit on the head. OTOH, some areas still require good optimization: graphics (I mean the low-level libs here), real time stuff, crypto, numerical work, things like that. However, most of that is not about getting fast code, but fast algorithms, and finding "tricks" for doing those algorithms even faster. That is to say, worry about high-level, first-order efficiency, and let the compiler handle the hard stuff (like pipelining).

    But for the most part, it's best to just write clean, readable code. It doesn't matter that the code is fast if it can't be read, since someone will just have to redo it when it needs changes. Also, it usually helps the compiler optimize more if it can figure out what's going on. For instance, the KAI C++ compiler has a very, very good optimizer (my code runs 2-3 times as fast as it does under gcc 2.95.2 on the same machine), and the docs recommend you write plain, readable code so the optimizer can see what's going on. One time I replaced some code with inline asm, and it got significantly slower. I took it back out. :)

    Just my 2% of a $.

  • by blahedo ( 24332 ) on Tuesday November 23, 1999 @09:42PM (#1509157) Homepage

    Sure, go ahead! But before you do, you should do a few other things:

    • First, tidy up your code. Fix the indentation, elaborate the variable names, make it generally more readable.
    • Then, comment it well.
    • That done, write out your maintainer-level documentation, which includes all of your high-level algorithms.
    • Verify that these algorithms are correct.
    • Check the complexity of these algorithms. The real place to shave off time is if you can lower your running time by an O(n) or two.
    • Once you've fixed your algorithms, update your documentation.
    • And now, after you've done every other improvement you can think of... save a copy of your code.
    • Run a profiler on your code.
    • If you really still feel like it's too slow, then go ahead and optimise it in the places the profiler picks out as taking a lot of time.
    The upshot is, optimising is never a bad thing. It's just an extremely low priority thing. I can't tell you how many of my students I've seen performing low-level optimisations on O(n^3) loops that could have been reduced to O(n lg n) or less. Don't waste your time optimising in the wrong places, and really, make sure you've done everything else first....
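
    For the profiler step, one concrete workflow (assuming gcc and gprof) looks like this; then spend your effort only on the functions at the top of the profile:

    gcc -pg -O2 prog.c -o prog    # build with profiling hooks
    ./prog                        # run; writes gmon.out
    gprof prog gmon.out | less    # flat profile and call graph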
  • Actually, GUI code is something that deserves optimization... consider how annoying a slow GUI can be, for example doing 'make menuconfig' on an older PC. GUI responsiveness is a major factor in user perception of the speed of a computer, and a lot of GUIs I've had to use wouldn't have been hurt by a bit of performance tuning. That said, other things still come first, of course....
  • I can think of at least one optimisation that an optimising compiler isn't likely to be able to make - namely, processing a large data set split into self-contained chunks that are small enough to fit into the cache memory of whichever system you're using. I.e., if you have 512KB of L2, and you can stick to working on a total code + data size of less than 512KB in one go, you should get a performance win, especially with SMP.
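
    A rough sketch of the idea, assuming 512KB of L2 (the chunk size and the two passes are made up; tune for your cache):

    #include <stddef.h>

    #define CHUNK_DOUBLES (64 * 1024 / sizeof(double))  /* well under 512KB of L2 */

    void process(double *data, size_t n)
    {
        size_t i, j, end;
        /* run both passes over one cache-resident chunk before moving on,
           instead of streaming the whole array through the cache twice */
        for (i = 0; i < n; i += CHUNK_DOUBLES) {
            end = i + CHUNK_DOUBLES;
            if (end > n)
                end = n;
            for (j = i; j < end; j++) data[j] *= 2.0;   /* pass 1 */
            for (j = i; j < end; j++) data[j] += 1.0;   /* pass 2 */
        }
    }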
  • It all depends on the compiler. Most compilers push and pop all registers to/from the stack when executing inline assembler code. So you get a HUGE performance hit if you don't do most of the looping inside the assembler code.

    I remember Watcom C++, a Windows/DOS compiler, supporting pure inline assembly by letting you provide assembly routines as inline functions. Thus instead of using an inline C++ function, you could write an inline assembly function, callable from C. The programmer was also responsible for saving and restoring the proper state of the registers himself. So nothing was lost, everything was gained. Although you had to know what you were doing.

    Despite Watcom C++ being superior in all areas, it was still beaten by MS Visual C++ in "fair and square" marketing.

    - Steeltoe
  • In general, I'd say no, it's not worth trying to tweak source code to optimize it.

    In fact, some famous tweaks can make things worse: the (in)famous "Duff's Device" creates what's called an "irreducible flow graph" (a pathological sort of loop) that makes a lot of optimizations impossible--the best a compiler can do is to replicate some of the code to make the flow graph reducible, thus undoing the supposed benefits. (In Duff's defense, the device was the Right Thing to do if he had to use a compiler that didn't do data flow analysis.)
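
    For reference, the device interleaves a switch with a do-while to unroll a copy loop eight ways. This is essentially Duff's original (to is a memory-mapped output register, which is why it isn't incremented, and count must be positive):

    void send(short *to, short *from, int count)
    {
        int n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while (--n > 0);
        }
    }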

  • I would have to agree with most people here that optimization should be a secondary concern after correctness, readability, etc. The thing I wanted to stress is how important it is to profile before you optimize. I've read in numerous places that 80% of execution time is spent in 20% of the code, so it REALLY doesn't make sense to spend lots of time optimizing all of your code. Find out what the important 20% is, and concentrate on that. Also, something that compilers can't do is decide on container sizes. For example, if you have a hash table that you know is going to contain 6000 items, don't leave it with only 64 hash buckets. (I had a real problem with this once: increasing the hash table size to 2000 made the algorithm run 200% faster. Stupid RogueWave with its default hash table size of 64.) Similarly, if you are using vectors, strings, etc. that you know will have many items, don't construct them with a size of one, because you will spend all of your time allocating memory and copying the contents of your container.
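
    The same point in plain C (names are made up; error handling omitted for brevity): growing a buffer one element at a time reallocates and copies over and over, while reserving the known final size allocates once.

    #include <stdlib.h>
    #include <string.h>

    double *build_slow(const double *src, size_t n)
    {
        double *buf = NULL;
        size_t i;
        for (i = 0; i < n; i++) {
            buf = realloc(buf, (i + 1) * sizeof *buf);  /* may copy every time */
            buf[i] = src[i];
        }
        return buf;
    }

    double *build_fast(const double *src, size_t n)
    {
        double *buf = malloc(n * sizeof *buf);          /* one allocation */
        memcpy(buf, src, n * sizeof *buf);
        return buf;
    }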
  • by Anonymous Coward
    I have written a lot of code for embedded systems, and I know that you can do a lot to make your code better than the compiler might. The best training is to compare your source code to what the compiler generates. You can get answers to questions like these: When is "switch" better than "if"? How much difference does the type make for a switch? What effect does the order of math have on the code size? What's the real difference between while and until? Lots of useful stuff. Compulsive? Yes. But then, that's what makes good coders.
  • I wouldn't bother, make the code readable and obvious.

    I used to do source code optimization when C compilers were not good at optimization and the code could run much faster if the source code was tweaked to effectively use registers and avoid slow address computations.

    Modern compilers have much better optimizers and it has become more difficult to optimize code for the CPUs of today. Counting cycles in a program is much more difficult than it used to be. When you have families of processors, the optimal code for one processor may be slow on another.

    I used to use pointers whenever possible since experience had taught me that array references were always slower than pointer references. Then I discovered that array references were substantially faster than pointer references on some code that I was writing for a Pentium. The code was cleaner, easier to understand, smaller and faster when I used arrays.

    If you look at the output of a good compiler, you will often be surprised at the optimization tricks that you would have never thought of.

  • Obviously there are groups of people for whom setting -O3 is all the optimization they will ever need. That is fine for them. But not for me.

    My codes run for weeks/months at a time. 5% can mean a few extra days of run time. Basically the rules of thumb to decide whether to optimize are

    a) is it worth the effort (i.e. is what I get out of the optimization worth the time I am going to invest)?

    b) can the code be optimized? This is difficult, as many codes (far too many IMO) use poor choices of algorithms.

    c) does the code need to scale to a particular problem size without an exponential increase in required CPU time? This is difficult to judge. Addressing it can be a simple case of localizing the processing to a subset of the data, or parallelizing the code to work on chunks at a time; the danger case is when increasing some input variable far enough makes O(n**3) or worse factors crop up in everything that depends on it.

    d) is the code using too much memory? I know, there is never "too much memory". But if you are concerned at all about performance, you want to avoid using swap/paging memory. If your algorithm is so expensive on memory that it forces you to swap, you can clearly demonstrate to dumbfounded CS grad students/profs everywhere that an O(n**2) algorithm can indeed scale/run faster than an O(n log n) algorithm for reasonable values of n. This bothers many people, but that might be because they don't understand a hierarchical memory system.

    etc.

    There are many reasons to optimize. Good design of code tends to be one of the best optimizations. You could do things like unroll loops by hand, but that gets to be a little painful if you have many loops. It is better to be cognizant of the hardware you will be running on, such as a cache based RISC/CISC machine, with 3 different memory latencies and bandwidths, and write your code with that in mind. Eventually the compilers will be smarter, but today you still need to adapt to the vagaries of the machine if you need real performance.
  • Five years ago the standard procedure was to recompile your program for the new Pentium architecture and instantly get a 10x speed boost. Today the standard procedure is assembly language, so it depends on where we are in the punctuated evolution. After the next recession they'll work their asses off to speed up processors and we'll once again benefit from recompiling more than optimizing. If you're doing database and web applications, don't bother optimizing - write a game instead.
  • In my experience, any optimizations that made real differences (i.e. orders of magnitude of improvement) were really attributable to improvements in the basic design of the algorithm. In other words, these sorts of "optimizations" are actually improvements in design. Ordinary optimizations tend to offer no more than a 20% or 30% improvement - at most, perhaps a 100% improvement. The time invested during the optimization process would be better spent in the design stage; the money invested during the process would be better spent on additional hardware.

    Better design == orders of magnitude improvement
    Optimization == incremental improvement

    This is merely a rule of thumb, and there are significant exceptions to this rule.
  • Optimizing for memory usage will buy you much more than instruction optimization. Don't use too much memory. To quote Seymour Cray, "Memory is like orgasms, it's much better when you don't have to fake it."

    Remember there is a memory speed hierarchy: registers -> cache -> physical memory -> virtual (paged) memory. A lot of instructions can be executed while waiting for a page of memory to be swapped in. Also, in SMP environments do not discount the effects of cache invalidation. The cache can be your friend or your worst enemy. Be smart about memory layout and usage.

    In laying out structures, place the most heavily used field first. This way accesses to this field are direct, i.e. *ptr instead of *(ptr+24). This can help tremendously in some usage models.

    Place fields that are used together in the same cacheline. Also try to group fields that are read-only together.
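
    A sketch of that layout advice (field names invented for illustration): hot fields first, so the common access is *ptr rather than *(ptr+24), and read-mostly fields grouped away from frequently written ones.

    struct conn {
        /* hot, written on every packet: keep at offset 0 */
        unsigned long packets;
        unsigned long bytes;

        /* read-mostly configuration, grouped so it can share a cache
           line that stays clean */
        int timeout;
        int max_retries;

        /* rarely touched diagnostic state last */
        int  last_errno;
        char name[32];
    };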

    Also, when writing code to handle errors/special cases, design the code to test for cases together. For example, if COND1 and COND2 are rare, it is better to write:

    if (flag & COND1) {
        handle_cond1();
    }
    if (flag & COND2) {
        handle_cond2();
    }

    as:

    if (flag & (COND1|COND2)) {
        if (flag & COND1) {
            handle_cond1();
        }
        if (flag & COND2) {
            handle_cond2();
        }
    }

    In the normal case, only one test is performed to handle exceptions. Not two.

    In C code, all too often people don't take advantage of the return value of sprintf(). I've seen several consecutive strcat()'s used to build up a string; that is something that could be coded a lot more efficiently with sprintf():

    char buf[SIZE];
    char *pc;

    pc = buf;
    pc += sprintf(pc, "%s", str1);
    /* other stuff here */
    pc += sprintf(pc, "%s", str2);

    Get to know your compiler. Some compilers code the if clause of an if-then-else statement inline and jump out of line for the else portion; some do the opposite. Look at the assembly that your compiler puts out and code around that. Inline execution will help.

    Using const can be a winner. Most compilers are smart enough to not save const variables on the stack. It also helps in interprocedural optimization, if your compiler supports it.

    Make sure that your code is lint clean. This too can help with register allocation.

    Get to know the pragmas your compiler supports. A lot of times you can provide hints to the compiler to help it optimize the code better.

    There is the old rule of thumb that states that functions should be no more than one page of source code long. This is especially helpful when all else fails and you need to hand code things in assembly. Hand coding a function 500 lines long is a lot more difficult than one 50 lines long. Usually only a small portion of the function really needs the optimization anyway.

    Measure. Measure. Measure. The minimum number of instructions does not always mean the fastest execution time. Measure to make sure that your changes are actually improving the execution speed.

    Also, when you find something that you feel needs hand coding, leave the high-level code in place behind ifdef's. This will allow porting to another architecture to be easier. Also, one platform may have different semantics that allow the code to run reasonably without hand coding.
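
    Something like this (FAST_COPY_ASM and fast_copy_asm are made-up names):

    #include <stddef.h>

    #ifdef FAST_COPY_ASM
    void fast_copy_asm(char *dst, const char *src, size_t n);  /* hand-coded elsewhere */
    #endif

    void copy_block(char *dst, const char *src, size_t n)
    {
    #ifdef FAST_COPY_ASM
        fast_copy_asm(dst, src, n);   /* hand-tuned assembly version */
    #else
        size_t i;                     /* portable fallback; also the reference
                                         when testing the assembly version */
        for (i = 0; i < n; i++)
            dst[i] = src[i];
    #endif
    }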

    All too often people jump in head first and start tweaking pieces of code that could be fixed at a higher level. Hand coding an O(n**2) function in assembly is not going to change the fact that it is O(n**2).

    The best piece of advice is to optimize the problem, not the solution. I was once asked to look at optimizing a program that was taking too long to run. Simplified, it sorted a ten-million-line file and then selected the lines that matched certain criteria. The output was on the order of ten thousand lines. Reversing the order of the operations - select, then sort - sped the program up more than any improvement to the sorting algorithm ever could have.
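
    In shell terms (file and pattern names made up), the difference is just:

    sort bigfile | grep 'criteria'    # sorts all ~10,000,000 lines
    grep 'criteria' bigfile | sort    # sort only ever sees ~10,000 lines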

  • For most programs source level optimization does not make a major difference, but for some problems it is an absolute must. I have hand-tweaked numerical codes and observed improvements of 500+ percent, just by interchanging loops, intelligent dereferencing, linear memory access and other minor tweaks. Yes, that's more than 5 times faster - and that's a lot. Think of reducing the running time from 1h to 10min.

    For most applications however source level optimization is not necessary, either because processing speed is not the bottleneck, or because the code does not require a lot of time to run.

  • So just because 90% of people have more horsepower in their boxes than they know what to do with, programmers can be lax about optimization? I don't think so.
    Look at what resourceful programmers have achieved on the Amiga: Quake springs to mind straight away. Playable on a CPU which is substantially slower than the current generation of Wintel CPUs? You'd better believe it!
  • Besides the general rules (use better algorithms, use profilers to find slow code and test-Test-TEST!), something I've found very useful is caching reused values in low-level functions.

    Two experiences with a client this spring show just how much this can help. The client needed to do some basic GIS functions, so there were a *lot* of trig calls on the same values related to the lat-lon to display mapping. The original code did not cache values which were constant for each mapping; this was costly, since those accounted for about half of all trig calculations in the code! Caching those values cut the running time in half, and no compiler would know to extend a data structure to include those cached values.
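
    The shape of the fix, roughly (hypothetical names; the real code cached more values than this):

    #include <math.h>

    struct mapping {
        double ref_lat;             /* reference latitude, fixed per mapping */
        double cos_ref, sin_ref;    /* its trig, computed once per mapping */
    };

    void mapping_init(struct mapping *m, double ref_lat)
    {
        m->ref_lat = ref_lat;
        m->cos_ref = cos(ref_lat);
        m->sin_ref = sin(ref_lat);
    }

    double project_x(const struct mapping *m, double lon)
    {
        return lon * m->cos_ref;    /* no cos() call per point */
    }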

    The second optimization involved a modest change to the code so that the data was organized in a rough hierarchy - the 360*180 1-degree square cells were clustered in 3-degree supercells, 15-degree super(2)-cells, and 60-degree super(3)-cells. If the area of interest wasn't in the super(3) cell, I didn't bother to check any of the 1-degree cells within it. Again, no compiler would have caught this, and few algorithm books discuss n-ary trees of spatial data. If you squint, this is also a "cache" approach, although the cache is hardcoded via the nesting factors.

    Put the two modifications together and the program ran about 2 orders of magnitude faster! That was so much faster that the client started to compute the values on the fly - it was "cheaper" than precomputing the information and maintaining *that* cache!

  • No competent grad student or professor will be "dumbfounded" that an O(n lg(n)) algorithm that hits the swap will be slower than an O(n^2) algorithm that does not. They *will* ask you why the first algorithm is so much more memory expensive, though, since most such algorithms only require a modest amount of additional memory. A fast algorithm, poorly implemented, will still run slowly.

    However, when all things are equal there's no doubt that many of us will recommend the more efficient algorithm *even when it is modestly slower for the task at hand*, but that's because experience shows that problems inevitably get larger and the more efficient algorithm *will* be faster in the near future. That's why a new 500 MHz P-III running Windows 2000 is often *slower* than a 75 MHz 486 running Windows 3.1: your system may be approx. 10x faster, but the problems you're trying to solve are also 10x larger, so the O(n^2) algorithms are much slower. (Even O(n lg(n)) is slightly slower, but by a smaller amount.)

    Finally, never overlook the raw stupidity of most people who blindly dismiss the more efficient algorithms. Just last month I saw a bubble sort that had a comment stating that a "bubble sort was fast enough since the cost was in the swap, not the comparison." The author obviously never considered the fact that you would never do a comparison and *not* do the required swap. I replaced that code with a call to qsort() and cut the number of comparisons from 1,000,000 to 30k. The performance improvement came to something like 20 seconds.
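
    For anyone who hasn't used it, the replacement is tiny (comparison function sketched here for ints):

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);   /* negative/zero/positive, per qsort's contract */
    }

    void sort_ints(int *v, size_t n)
    {
        qsort(v, n, sizeof *v, cmp_int);
    }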

    I'm sure that there are some clueless profs who don't know how costly it is to hit the swap, but in my experience there are *far* more programmers who do a half-assed implementation of the faster algorithm, generally bitching the entire time about how their clueless boss doesn't understand the "real world," and they then complain that the faster algorithm is actually slower. Obviously I don't know if that happened to you, but I've seen it happen enough times that I always take such criticism with a grain of salt.
  • i wrote a small search routine for an oracle db that ran pretty quick on a small database (400 records). because i wanted to abstract the db, the routine was: read request -> select table_name from all_tables -> search for correct table thru the whole list -> generate select statement -> return read info.
    this worked ok until the db grew to 50,000 records and several hundred tables for each group of records. on a sun sparcserver with 4 400MHz ultrasparc2 cpu's it took *5* *minutes* to return *1* request.
    needless to say i had to write a 4-level, 250-slot cache routine to return it to a sane time (under 1/2 second)... in other words, kiddies - optimise manually, dammit.
  • all this big O talk is making me randy...
    squirt.
    -r.
  • I work as a Perl programmer for a small company, where the emphasis is on writing READABLE code so it can be debugged by someone with less than perfect knowledge. On the other hand, we'd been having problems with our scripts taking down the ISP's server before we optimized them :) So I'd have to say that optimization is situation specific. Sometimes you will need it, sometimes you won't.
  • As one person said, clean up your code first. Make it readable and well documented. Then go ahead and, if you have the time, optimize it. The reason there are so many slow, crappy applications out there today is that people say "let the compiler do the job of optimizing" and then don't bother optimizing at all. A kludge here and a kludge there add up, especially once you get past 5000 lines of code. If optimizing can shave off 20 lines each time and you optimize 50 different spots, you have shaved off 1000 lines of code. You can then use those 1000 lines to add features. A good design is usually better than optimizing, though. If you write your detailed design with pseudocode in it, you can check the pseudocode before even coding to make sure the algorithms are good. Then you will not even need to optimize. I have seen people write the crappiest of code that, after being optimized, went from 72 days of runtime on the dataset to 2 hours. Besides, how much do you know about your compiler and its optimizations?

    send flames > /dev/null

  • The above post mentions cleaning up code first, then deciding whether or not to optimize. The considerations I would make are:

    How easy is it to replace your software (is it embedded vs. 'notepad').

    Are other factors (power usage, lower hardware costs, possible extra features) of importance.

    I've heard that commercial compilers (vs. university ones) are slow to incorporate new techniques. I've been told by my lecturer that, for example, global variables are not stored in registers. Well, that sounds pretty stupid to me (but who am I to judge). Maybe it is possible to compare these two sorts of compiler to figure out which parts of compiling are the hard ones. Have phun,
  • With CPUs and systems getting faster and faster, I don't think it's very smart to start wasting power on bugs and sloppy loops. The presumption that faster computers are always coming, so you don't have to optimize, isn't entirely true: a slow program is never a good program. Easy math also tells us that the more code there is, the more likely there will be bugs. Look at Windows: if it were really tweaked, I can't even imagine how it could be this slow and buggy. Then again, it depends very much on what compiler and what level of language you use. If you use Java or VB it's probably a good thing to really check the code a couple of extra times. A cycle is a terrible thing to waste!
