
Optimizations - Programmer vs. Compiler?

Saravana Kannan asks: "I have been coding in C for a while (10 yrs or so) and tend to use short code snippets. As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'. The reason someone might use the former code snippet is because they believe it would result in smaller machine code if the compiler does not do optimizations or is not smart enough to optimize the particular code snippet. IMHO the latter code snippet is clearer than the former, and I would use it in my code if I know for sure that the compiler will optimize it and produce machine code equivalent to the former code snippet. The previous example was easy. What about code that is more complex? Now that compilers have matured over years and have had many improvements, I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized?"
"How would your answer differ (in terms of the level of trust on the compiler) if I'm talking about compilers for Desktops vs. Embedded systems? Compilers for which of the following platforms do you think is more optimized at present - Desktops (because is more commonly used) or Embedded systems (because of need for maximum optimization)? Would be better if you could stick to free (as in beer) and Open Source compilers. Give examples of code optimizations that you think the compiler can/can't be trusted to do."
This discussion has been archived. No new comments can be posted.


  • by ( 156602 ) * on Friday February 25, 2005 @04:48PM (#11781199) Homepage Journal
    Programmer: Hey, compiler. How do you like optimizing?
    Compiler: Optimizing? Optimizing? Don't talk to me about optimizing. Here I am, brain the size of a planet, and they've got me optimizing inane snippets of code. Just when you think code couldn't possibly get any worse, it suddenly does. Oh look, a null pointer. I suppose you'll want to see the assembly now. Do you want me to go into an infinite loop or throw an exception right where I'm standing?
    Programmer: Yeah, just show me the stack trace, won't you compiler?
  • Clear Code (Score:5, Insightful)

    by elysian1 ( 533581 ) on Friday February 25, 2005 @04:48PM (#11781211)
    I think writing clear and easy to understand code is more important in the long run, especially if other people will have to look at it.
    • Re:Clear Code (Score:5, Insightful)

      by normal_guy ( 676813 ) on Friday February 25, 2005 @04:50PM (#11781228)
      That should be "especially _since_ other people will have to look at it."
    • Re:Clear Code (Score:5, Insightful)

      by daveho ( 235543 ) on Friday February 25, 2005 @04:52PM (#11781262)
      I agree 100%. Write code that is easy to understand and modify, then optimize it, but only after you have profiled it to find out where optimization will actually matter .
      • by smittyoneeach ( 243267 ) * on Friday February 25, 2005 @05:18PM (#11781693) Homepage Journal
        You mean that we should not strain after a redundant temporary object gnat and swallow a network socket camel?
        If you are caught thinking out of the box again, you will get no dessert!
      • Re:Clear Code (Score:5, Insightful)

        by Rei ( 128717 ) on Friday February 25, 2005 @05:56PM (#11782241) Homepage
        An important lesson that I wish I had learned when I was younger ;) It is crazy to start optimizing before you know where your bottlenecks are. Don't guess - run a profiler. It's not hard, and you'll likely get some big surprises.

        Another thing to remember is this: the compiler isn't stupid; don't pretend that it is. I had senior developers at an earlier job mad at me because I wasn't creating temporary variables for the limits of my loop indices (on unprofiled code, no less!). It took actually digging up an article on the net to show that all modern compilers automatically dereference any const references (be they arrays, linked lists, const object functions, etc.) before starting the loop.

        Another example: function calls. I've heard some people be insistent that the way to speed up an inner loop is to pull the code out of function calls so that you don't pay function-call overhead. No! Again, compilers will do this for you. As compilers were evolving, they added the "inline" keyword, which does this for you. Eventually, compilers got smart enough that they started inlining code on their own when not asked, and declining to inline when coders requested it but it would be inefficient. Due to coder pressure, at least one compiler that I read about had an "inlinedamnit" (or something to that effect) keyword to force inlining when you're positive that you know better than the compiler ;)

        Once again, the compiler isn't stupid. If an optimization seems "obvious" to you, odds are pretty good that the compiler will take care of it. Go for the non-obvious optimizations. Can you remove a loop from a nested set of loops by changing how you're representing your data? Can you replace a hack that you made with standard library code (which tends to be optimized like crazy)? Etc. Don't start dereferencing variables, removing the code from function calls, or things like this. The compiler will do this for you.

        If possible, work with the compiler to help it. Use "restrict". Use "const". Give it whatever clues you can.
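To illustrate the "give the compiler clues" point, here is a minimal sketch (names are mine): restrict promises the compiler that the two pointers never alias, which licenses exactly the kind of hoisting and register-caching discussed above.

```c
/* 'restrict' asserts dst and src never overlap, so the compiler may
 * keep src values in registers across stores to dst instead of
 * reloading them on every iteration. Requires C99. */
void scale(float *restrict dst, const float *restrict src, int n, float k)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```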
        • by r00t ( 33219 ) on Friday February 25, 2005 @06:38PM (#11782697) Journal
          If by "const references" you really mean a C++ reference, OK. If you mean a pointer though, and you use C, the compiler is prohibited from performing this optimization unless you also use the "restrict" keyword.

          Since you did mention "restrict", it appears that you are working with C. "restrict" is not a C++ keyword.

          BTW, inlinedamnit is __attribute__((always_inline)) for gcc and __forceinline for Microsoft.
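The GCC spelling looks like this in practice (a sketch assuming GCC or a GCC-compatible compiler such as Clang; MSVC spells the same request __forceinline):

```c
/* always_inline asks the compiler to inline this call even when it
 * would otherwise decline, e.g. at -O0. */
static inline __attribute__((always_inline)) int sq(int x)
{
    return x * x;
}
```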

        • Re:Clear Code (Score:5, Insightful)

          by lubricated ( 49106 ) on Friday February 25, 2005 @06:48PM (#11782780)
          Well, the first thing to optimize is the algorithm. Use an O(n^2) algorithm that does the same job as an O(e^n) algorithm if you can. Algorithmic optimization makes the most difference. I am working on a program whose speed is directly proportional to how often a particular function is called, so I try to reduce calls to this function by various means; no compiler I've seen can optimize an algorithm, only the implementation of it. With that I'm happy to have the compiler do the work.
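A toy illustration of the point above, with hypothetical function names: no optimizer turns the exponential recursion into the linear loop; that rewrite is the programmer's job.

```c
/* Exponential-time recursion vs. a linear-time rewrite. No compiler
 * flag closes this gap; only a better algorithm does. */
long fib_naive(int n)      /* O(2^n) calls */
{
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

long fib_linear(int n)     /* O(n), same answer */
{
    long a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        long t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```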
    • Re:Clear Code (Score:3, Interesting)

      by Tr0mBoNe- ( 708581 )
      I agree. I write my code under the assumption (rightly so) that my code will be reviewed by many people in the near future, and then added onto after a bit.

      In fact, working at Research In Motion has shown me how to write better code... sometimes, the most efficient or clever solution is the worst. I would rather see longer variable names, descriptive control structures, and a total lack of goto statements. If you write sensible code, and use an optimization setting of 4 or 5, then you will have better prog
    • Re:Clear Code (Score:3, Insightful)

      by pz ( 113803 )
      This is absolutely true. Even if you are the only programmer who will ever look at the code. 10 years from now, when you're called on to fix a bug in something you wrote, you'll be extra glad that you took the time to write clearly, and comment liberally. Anyone else who comes across your code will thank you as well. And I'm speaking with nearly 30 years' experience as a programmer (and two CS degrees from MIT).

      In particular, unless you have very specific efficiency needs, modern CPUs are more than up
    • Re:Clear Code (Score:3, Insightful)

      by Anonymous Coward
      Naturally. However, the example is a poor one. I use the simpler form precisely because it's clearer and more expressive.

      "if (!ptr)" translates perfectly clearly into English as "if no (valid) pointer", while "if (ptr==NULL)" involves some spurious special-case value that I have to spend extra time tinkering with.

      It's like comparing booleans with "if (foo==true)" instead of "if (foo)". If that's better why not go all the way and write "if (((...((foo==true)==true)==true)...==true)==true) " ? For extra clarity you s
      • by GunFodder ( 208805 ) on Friday February 25, 2005 @07:18PM (#11783079)
        I think the example is fine; you just displayed an assumption that highlights one of the quirks of C.

        ! means "not" or "inverse of"; it is a boolean function. The variable ptr is a pointer; it is a reference to data, which means it isn't really data itself. !ptr shouldn't compute; a boolean operator should only work on boolean data. But C logical comparators are designed to work on everything. You are just supposed to know that 0 == NULL == false. This supposition is totally arbitrary and doesn't hold up in any language with strong typing.

        This is what makes C difficult for beginners. Bad code compiles even though it has logical flaws, and ends up failing in mysterious ways.

        The second case makes more sense. Equality is an operator that should work on all types of data. NULL is necessary if you are going to abstract data through the use of pointers or objects. Doing away with NULL would be equivalent to eliminating true and false and using 1 and 0 instead. Or eliminating strings and using sequences of ASCII codes. These substitutions are technically correct but in reality they make code unreadable.
    • Re:Clear Code (Score:5, Insightful)

      by DJStealth ( 103231 ) on Friday February 25, 2005 @06:05PM (#11782359)
      Take the following example: both versions are clear, but only one is considered optimized.

      Lets say you're traversing a 2D array of data (e.g., an image).

      for(x=0; x < width; x++)
          for(y=0; y < height; y++)
              process(image[y][x]);

      versus:

      for(y=0; y < height; y++)
          for(x=0; x < width; x++)
              process(image[y][x]);

      The latter piece of code is just as clear as the first; however, it will likely run about 50 times faster than the first, due to caching issues.

      Will the compiler optimize the first piece of code to look like the second? Probably not (tell me if I'm wrong), as there may be a reason to process things in a particular order.

      In addition, the latter piece of code may actually be less clear, as in some cases, it may not read well to do height before width in the for loop.

      As a result, you'll still need to write code thinking about optimization.
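The two traversals above can be sketched concretely (sizes and names are mine). Both orders compute the same result; the actual speed gap depends on the cache and the row width, so the ~50x figure should be read as one data point rather than a rule.

```c
#define W 64
#define H 64

/* Same sum, two traversal orders over a row-major array. Both are
 * equally clear at the source level, but the y-outer/x-inner version
 * walks memory sequentially. */
long sum_x_outer(int img[H][W])
{
    long s = 0;
    for (int x = 0; x < W; x++)
        for (int y = 0; y < H; y++)
            s += img[y][x];   /* strided: jumps W ints per step */
    return s;
}

long sum_y_outer(int img[H][W])
{
    long s = 0;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            s += img[y][x];   /* sequential: cache-friendly */
    return s;
}
```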
  • by Anonymous Coward on Friday February 25, 2005 @04:49PM (#11781218)
    Optimize. Using cryptic, short variable names also shaves valuable microseconds off compile time and run time.
    • by FyRE666 ( 263011 ) * on Friday February 25, 2005 @04:58PM (#11781366) Homepage
      ... and by god don't let me see anyone using comments - comments are the devil's alphabet soup! Every programmer worth his/her salt knows that source code is self documenting...
    Found on The Daily WTF

      $result = mysql_query("SELECT a, b, c, d, e, f, g, h, i, j, k, l,
      m, n, o, p, q, r, s, t, u, v, w, x, y, z, a2, b2, c2, d2, e2, f2, g2,
      h2, i2, j2, k2, l2, m2, n2, o2, p2, q2, r2, s2, t2, u2, v2, w2, x2, y2, z2,
      a3, b3, c3, d3, e3, f3, g3, h3, i3, j3, k3, l3, m3, n3, o3, p3, q3, r3, s3,
      t3, u3, v3, w3, x3, y3, z3, a4, b4, c4, d4, e4, f4, g4, h4, i4, j4, k4, l4,
      m4, n4, o4, p4, q4, r4, s4, t4, u4, v4, w4, x4, y4,

    by ron_ivi ( 607351 ) on Friday February 25, 2005 @05:21PM (#11781738)
      Using cryptic, short variable names also shaves valuable microseconds off compile time and run time.

      There's nothing wrong with short variable names.

      Thus quoth Linus in the Linux Coding Style guide.

      Chapter 3: Naming

      C is a Spartan language, and so should your naming be. Unlike Modula-2
      and Pascal programmers, C programmers do not use cute names like
      ThisVariableIsATemporaryCounter. A C programmer would call that
      variable "tmp", which is much easier to write, and not the least more
      difficult to understand.

      HOWEVER, while mixed-case names are frowned upon, descriptive names for
      global variables are a must. To call a global function "foo" is a
      shooting offense.

      GLOBAL variables (to be used only if you _really_ need them) need to
      have descriptive names, as do global functions. If you have a function
      that counts the number of active users, you should call that
      "count_active_users()" or similar, you should _not_ call it "cntusr()".

      Encoding the type of a function into the name (so-called Hungarian
      notation) is brain damaged - the compiler knows the types anyway and can
      check those, and it only confuses the programmer. No wonder MicroSoft
      makes buggy programs.

      LOCAL variable names should be short, and to the point. If you have
      some random integer loop counter, it should probably be called "i".
      Calling it "loop_counter" is non-productive, if there is no chance of it
      being mis-understood. Similarly, "tmp" can be just about any type of
      variable that is used to hold a temporary value.

      If you are afraid to mix up your local variable names, you have another
      problem, which is called the function-growth-hormone-imbalance syndrome.
      See next chapter.
      • by rjstanford ( 69735 ) on Friday February 25, 2005 @05:56PM (#11782249) Homepage Journal
        LOCAL variable names should be short, and to the point. If you have
        some random integer loop counter, it should probably be called "i".
        Calling it "loop_counter" is non-productive, if there is no chance of it
        being mis-understood.

        That last clause is an important one that often gets neglected. In fact, you should never, ever call a variable loop_counter. That's as bad as pure reverse Hungarian - it tells you how it's used, not what it means.

        I suggest that, for all non-trivial cases (and I'd prefer to see people err on the side of verbosity), you should use descriptive names. Not loop_counter, but maybe something like curRow? It doesn't have to be long, but at least then, as the loop grows over time, someone can understand a piece of code more easily than having to scroll back up to check that you are indeed in the "i" loop. It's even more critical when someone comes along and adds a nested (or containing) loop. Or whatever.

        Same with "tmp". If it's truly temporary, such as:

        int tmp = getFooCount();

        then it should be removed and rewritten to use getFooCount() directly at the point of use.

        If it's not that temporary, give it a real name. If you insist that it is temporary then you may have a scoping issue - having variables useful only in part of your function could indicate that your function is doing too much work. If you insist it's truly temporary, scope it down:

        {
            int foo = getFoo();
            moreCode();
        }

        At least now you've guaranteed that it is temporary. Better yet, just name it usefully.
      • by Trillan ( 597339 ) on Friday February 25, 2005 @06:10PM (#11782409) Homepage Journal

        With the greatest respect to Linus, writing a kernel does not make you the authority on programming. It does make you the authority on what particular style you allow in your CVS tree, but that's it.

        I certainly agree that loop_counter is a bad name, though. But rather than use i, I prefer to at least make a note of what sort of objects I'm looping through.

        For instance:

        int taskI;
        int taskCount = GetTaskCount();
        for (taskI=0; taskI<taskCount; taskI++)

        Code can never be 100% self-documenting, but that's no reason to settle for 0%. Whether you use CamelCase or words_broken_with_underscores is a matter of style, and you should stick with the style of the code base you're working on.

        Anyone who can't or won't work with multiple languages or adopt the necessary style for an existing project is a poor programmer. When you create a project, you create the rules. When you work on someone else's project, you follow the rules.

  • by xlv ( 125699 ) on Friday February 25, 2005 @04:50PM (#11781223)
    Donald Knuth wrote "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

  • by American AC in Paris ( 230456 ) * on Friday February 25, 2005 @04:50PM (#11781229) Homepage
    This is marginally away from the submitter's question, but it warrants attention:

    The sad truth is that, as far as optimization goes, this isn't where attention is most needed.

    Before we start worrying about things like saving two cycles here and there, we need to start teaching people how to select the proper algorithm for the task at hand.

    There are too many programmers who spend hours turning their code into unreadable mush for the sake of squeezing a few milliseconds out of a loop that runs on the order of O(n!) or O(2^n).

    For 99% of the coders out there, all that needs to be known about code optimization is: pick the right algorithms! Couple this with readable code, and you'll have a program that runs several thousand times faster than it'll ever need to and is easy to maintain--and that's probably all you'll ever need.

    • by flynt ( 248848 ) on Friday February 25, 2005 @05:01PM (#11781414)
      But this would require people to actually get computer science degrees, or have enough self-motivation to read books on algorithms and do the exercises. For most, that's too much to ask, since they cannot see how to apply the theory they learn in school to practice. The ones that can apply the theory are the good programmers. The ones that can't, or never learned the theory in the first place, probably aren't.
    • I agree with you to a point. Do not try to squeeze a couple cycles here and there. Pick the right algorithm, but realize that n vs. n^2 vs. 2^n is not a big deal in 99% of applications. Code it so it works first. Make it go. Then, if it sucks performance-wise, replace those slow algorithms. I've seen too many people, myself included, who try to ensure that they've picked the perfect algorithm for the job, but take 4 times longer to design and write the code. Write the damn thing first. Keep it reada
      • by beelsebob ( 529313 ) on Friday February 25, 2005 @06:50PM (#11782797)
        That's not really the point being made though - the question really is "is it worth spending the next week trying to make optimisations, or coding a better algorithm to do this."

        And while we're at it - n^2 vs 2^n *is* a big deal when working on 1 GHz+ systems... If we have 1000 objects and an operation that takes 10 cycles, an n^2 algorithm gives a runtime of 0.01 seconds (10 x 1000 x 1000 cycles); a 2^n algorithm gives a runtime that is astronomically long (2^1000 cycles dwarfs the age of the universe), and no amount of optimizing the code in the loop (even down to one instruction) is going to get us anywhere near the n^2 algorithm.
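Plugging those numbers into code makes the gap vivid (a sketch with hypothetical helper names): at 10 cycles per operation on a 1 GHz machine, n^2 with n = 1000 finishes in about a hundredth of a second, while 2^n at the same n is astronomically out of reach.

```c
/* Back-of-envelope from the comment above: seconds for an operation
 * costing 10 cycles, repeated n^2 or 2^n times, on a 1 GHz machine. */
double seconds_quadratic(double n)
{
    return 10.0 * n * n / 1e9;
}

double seconds_exponential(int n)
{
    double cycles = 10.0;
    for (int i = 0; i < n; i++)
        cycles *= 2.0;        /* 10 * 2^n cycles, tracked in a double */
    return cycles / 1e9;
}
```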

    • by Flyboy Connor ( 741764 ) on Friday February 25, 2005 @05:16PM (#11781664)
      Quite agree, it's about the algorithm, not about the code.

      One of the finest moments in my programming career was when my boss asked me to see if I could gain a speed improvement in a program that surveyed a huge datastore and generated volumes of text from it. This program had to run once a month, and deliver its result in the same month. The program that was originally written, unfortunately, took three months to run (it started out OK, but the data store had grown considerably).

      They had asked one of our "best programmers" to create a faster version of the program. He did that by reprogramming the entire thing in assembly (you may now understand why managers thought he was one of the best programmers). It took him six whole months to finish the new version. The resulting program completed the task in just about one month. However, my boss was afraid that when the datastore would grow a bit more, we would again be in trouble. That's when he asked me to look it over.

      I started by investigating the problem, which at first glance looked like a network traversing problem. I soon realised it could be solved by a nested matrix multiplication (which is, of course, a standard way to discover paths in a network). It was a matrix with about a million rows and columns, but since it contained only zeroes and ones (with a couple of thousand times more zeroes than ones), the multiplication was easy to implement in a fast way. Within half a day, I had built a prototype program in a high-level language which did the whole job in a few hours.

      While I am still pleased with this result, I really think it came off so well not because I was so smart, but because the assembly programmer was not really worthy of the name. Still, I often use this as an illustration for students who are writing illegible code and argue that it is so very fast.
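The network-reachability trick described above can be sketched at toy scale. This uses a Warshall-style closure rather than literal repeated matrix multiplication, and all names and sizes are mine, not from the original program:

```c
#define N 4

/* Treat the adjacency matrix as boolean: after the closure,
 * m[i][j] is 1 exactly when some path connects i to j.
 * O(N^3), but each step is a cheap AND/OR on 0/1 entries. */
void transitive_closure(unsigned char m[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i][j] |= m[i][k] & m[k][j];
}
```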

  • Clear & Concise Code (Score:5, Interesting)

    by kwiqsilver ( 585008 ) on Friday February 25, 2005 @04:52PM (#11781263)
    It's better to write clear, legible code that saves a human minutes of reading, than complex code that might save a computer a few milliseconds of processing time per year, because human time costs more than machine time.
    Also the clear code will result in fewer misinterpretations, which will mean fewer bugs (especially when the original author is not the one doing maintenance years later), further reducing costs in dollars, man hours, and frustration.
  • NULL not always 0 (Score:3, Insightful)

    by leomekenkamp ( 566309 ) on Friday February 25, 2005 @04:53PM (#11781266)

    Aren't there machines out there where the C compiler specifically defines NULL as a value that is not equal to 0? I recall reading that somewhere, and that was my reason for using ==NULL instead of !. My C days are long gone though...

    • Re:NULL not always 0 (Score:4, Informative)

      by HeghmoH ( 13204 ) on Friday February 25, 2005 @05:04PM (#11781468) Homepage Journal
      NULL isn't necessarily equal to 0 at the machine level. However, at the language level, 0 is always interpreted exactly the same as NULL if it's being converted to a pointer, and vice versa. This is explicitly spelled out in the C standard, so any compiler that doesn't obey this is breaking the standard. Saying !ptr and ptr == NULL is always identical on a conforming compiler (which they pretty much all are).
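That guarantee can be stated as executable code (a sketch; the function name is mine): for every pointer value, all three spellings must agree on a conforming compiler, regardless of the machine-level bit pattern of a null pointer.

```c
#include <stddef.h>

/* The standard's guarantee, as code: a pointer in a boolean context,
 * a comparison against NULL, and a comparison against the constant 0
 * all give the same answer. */
int null_checks_agree(const void *p)
{
    return (!p) == (p == NULL) && (!p) == (p == 0);
}
```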
    • Re:NULL not always 0 (Score:3, Informative)

      by Malc ( 1751 )
      Maybe you've been listening to C++ people? Stroustrup certainly has a chip on his shoulder about NULL:

      5.1.1 Zero

      Zero (0) is an int. Because of standard conversions, 0 can be used as a constant of an integral, floating-point, pointer, or pointer-to-member type. The type of zero will be determined by context. Zero will typically (but not necessarily) be represented by the bit pattern all-zeros of the appropriate size.

      No object is allocated with the address 0. Consequently, 0 acts as a pointer litera
  • Code Twiddling (Score:3, Informative)

    by bsd4me ( 759597 ) on Friday February 25, 2005 @04:53PM (#11781269)

    There really is no one answer to this, as it depends on the compiler itself, and the target architecture. The only real way to be sure is to profile the code, and to study assembler output. Even then, modern CPUs are really complicated due to pipelining, multilevel cache, multiple execution units, etc. I try not to worry about micro-twiddling, and work on optimizations at a higher-level.

  • by slipnslidemaster ( 516759 ) on Friday February 25, 2005 @04:53PM (#11781270)
    I just checked the U.S. Patent office and sure enough, just minutes after your post, Microsoft patented "if (!ptr)" as a shorthand for "if (ptr==NULL)".

    Prepare to be sued.
  • Tradeoffs (Score:5, Insightful)

    by Black Parrot ( 19622 ) on Friday February 25, 2005 @04:53PM (#11781273)

    Hard to measure, but what is the tradeoff between increased speed and increased readability (which is a prerequisite for correctness and maintainability)? And if you can estimate that tradeoff, which is more important to the goals of your application?

    As a side note, it is far more important to make sure you are using efficient algorithms and data structures than to make minor local optimizations. I've seen programmers use bizarre local optimization tricks in a module that ran in exponential time rather than log time.

  • by sporty ( 27564 ) on Friday February 25, 2005 @04:53PM (#11781276) Homepage
    Common idioms should be compiled away, like !x or x!=0. Uncommon idioms can't be, and probably shouldn't be attempted, e.g. if(!(x-x)) (which is always true). Ask your compiler maker and see if patches can be made for these types of things, 'cause if you think to do it one way, chances are many others will try it too. It would be to their benefit to make a better compiler.
  • by El Cubano ( 631386 ) on Friday February 25, 2005 @04:53PM (#11781277)

    What about code that is more complex? Now that compilers have matured over years and have had many improvements, I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized?

    Programmers cost lots more per hour than computer time. Let the compiler optimize and let the programmers concentrate on developing solid, maintainable code.

    If you make code too clever in an effort to pre-optimize, you end up with code that other people have difficulty understanding. This leads to lower-quality code as it evolves, if the people who follow you are not as savvy.

    Not only that, but the vast majority of code written today is UI-centric or I/O bound. If you want real optimization, design a harddrive/controller combo that gets you 1 GBps off the physical platter (and at a price that consumers can afford).

  • by slavemowgli ( 585321 ) * on Friday February 25, 2005 @04:53PM (#11781278) Homepage
    The most important optimization is still the optimization of the algorithms you use. Unless under the most extreme circumstances, it doesn't really matter anymore whether the compiler might generate code that takes two cycles more than the optimal solution on today's CPUs; instead of attempting to work around the compiler's perceived (or maybe real) weaknesses, it's probably much better to review your code on a semantic level and see if you can speed things up by doing them differently.

    The only exception I can think of is when you're doing standard stuff where the best (general) solution is well-known, like sorting; however, in those cases, you shouldn't reinvent the wheel, anyway, but instead use a (presumably already highly-optimized) library.
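The "use the library" advice in concrete form - a sketch using C's standard qsort rather than a hand-rolled sort (helper names are mine):

```c
#include <stdlib.h>

/* The inner loop you'd be tempted to tune by hand is already written,
 * tuned, and debugged in libc. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_int);
}
```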
  • Beware of habits. (Score:5, Interesting)

    by SharpFang ( 651121 ) on Friday February 25, 2005 @04:54PM (#11781289) Homepage Journal
    I got in the habit of writing "readable but inefficient" code, taking care that my constructs didn't get too sophisticated for the optimizer but otherwise depending on gcc -O3 thoroughly. And then it happened that I had to program an 8051 clone. I learned there are no optimizing compilers for the '51, that I was really tight on CPU cycles, and that I simply didn't know HOW to write really efficient C code.
    Ended up writing my programs in assembler...
  • Huh (Score:5, Informative)

    by NullProg ( 70833 ) on Friday February 25, 2005 @04:54PM (#11781290) Homepage Journal

    As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'.

    Both forms resolve to the same opcode. Even under my 6502 compiler.

    CMP register,val

    • Re:Huh (Score:5, Insightful)

      by DunbarTheInept ( 764 ) on Friday February 25, 2005 @05:11PM (#11781584) Homepage
      Not true. Many CPUs have a unary jump-if-zero or jump-if-nonzero operation, so the comparison step can be bypassed when you know you're comparing to zero.

      However, any compiler worth anything should find that and optimize it very easily in the case where you're comparing to a constant that evaluates to zero.

  • $.02 (Score:5, Insightful)

    by MagicM ( 85041 ) on Friday February 25, 2005 @04:54PM (#11781296)
    1) Code for maintainability
    2) Profile your code
    3) Optimize the bottlenecks

    That said, (!ptr) should be just as maintainable as (ptr == NULL) simply because it is a frequently used 'dialect'. As long as these 'shortcuts' are used throughout the entire codebase, they should be familiar enough that they don't get in the way of maintainability.
  • micro optimization (Score:5, Insightful)

    by fred fleenblat ( 463628 ) on Friday February 25, 2005 @04:54PM (#11781298) Homepage
    What you're talking about is micro-optimization.
    Compilers are pretty good at that, and you should let them do their job.

    Programmers should optimize at a higher level: by their choice of algorithms, organizing the program so that memory access is cache-friendly, making sure various objects don't get destroyed and re-created unnecessarily, that sort of thing.
  • by smug_lisp_weenie ( 824771 ) * on Friday February 25, 2005 @04:55PM (#11781303) Homepage
    ...are doomed to repeat the biggest trap in computer programming over and over again:

    "Premature optimization is the root of all evil"

    If there's only one rule in computer programming a person ever learns, "Hoare's dictum" is the one I would choose.

    Almost all modern languages have extensive libraries available to handle common programming tasks and can handle the vast majority of optimizations you speak of automatically. This means that 99.99% of the time you shouldn't be thinking about optimizations at all. Unless you're John Carmack, or you're writing a new compiler from scratch (and perhaps you are), or involved in a handful of other activities, you're making a big mistake if you're spending any time worrying about these things. There are far more important things to worry about, such as writing code that can be understood by others, can easily be unit tested, etc.

    A few years ago I used to write C/C++/asm code extensively and used to be obsessed with performance and optimization. Then, one day, I had an epiphany and started writing code that is about 10 times slower than my old code (different in computer language and style) and infinitely easier to understand and expand. The only time I optimize now is at the very end of development, when I have solid profiler results from the final product that show noticeable delays for the end user - and this happens only rarely.

    Of course, this is just my own personal experience and others may see things differently.
    • by neonstz ( 79215 ) * on Friday February 25, 2005 @05:47PM (#11782123) Homepage
      A few years ago I used to write C/C++/asm code extensively and used to be obsessed with performance and optimization. Then, one day, I had an epiphany and started writing code that is about 10 times slower than my old code (different in computer language and style) and infinitely easier to understand and expand. The only time I optimize now is at the very end of development, when I have solid profiler results from the final product that show noticeable delays for the end user - and this happens only rarely.

      It is important to be aware that there are different types of optimization. Optimizing code where the compiler probably does a good job is just stupid, unless the code turns out to be a major bottleneck.

      However, not thinking about optimization/speed early can IMHO be very dangerous. If the project is a bit large and complex, a nice design on the whiteboard may very well turn out to be dead slow, with no chance in hell of making it run significantly faster without redesigning/rewriting the entire thing (this doesn't really have anything to do with compiler optimization though).

      I've been working in a project (I wasn't in it from the beginning) where the design probably looked good to some people in the design document (although I don't really agree with that either), but the performance aspect was neglected until the application turned out to be quite slow. Adding mechanisms to make it run faster has been quite "challenging". (My personal opinion on this piece of software is that performance issues were ignored even in the early design because the wrong people were making the decisions. Basically they didn't focus on where performance was really needed.)

      So after a few years my experience boils down to: "If you have a performance requirement, make sure your code keeps up the entire time" and "You can't get both high performance and general-purpose stuff in the same piece of code".

  • by swillden ( 191260 ) * <> on Friday February 25, 2005 @04:55PM (#11781306) Homepage Journal

    With regard to your example, I can't imagine any modern compiler wouldn't treat the two as equivalent.

    However, in your example, I actually prefer "if (!ptr)" to "if (ptr == NULL)", for two reasons. First, the latter is more error-prone, because you can accidentally end up with "if (ptr = NULL)". One common solution to avoid that problem is to write "if (NULL == ptr)", but that just doesn't read well to me. Another is to turn on warnings and let your compiler point out code like that -- but that assumes a decent compiler.

    The second, and more important, reason is that to anyone who's been writing C for a while, the compact representation is actually clearer because it's an instantly-recognizable idiom. To me, parsing the "ptr == NULL" format requires a few microseconds of thought to figure out what you're doing. "!ptr" requires none. There are a number of common idioms in C that are strange-looking at first, but soon become just another part of your programming vocabulary. IMO, if you're writing code in a given language, you should write it in the style that is most comfortable to other programmers in that language. I think proper use of idiomatic expressions *enhances* maintainability. Don't try to write Pascal in C, or Java in C++, or COBOL in, well, anything, but that's a separate issue :-)

    Oh, and my answer to your more general question about whether or not you should try to write code that is easy for the compiler... no. Don't do that. Write code that is clear and readable to programmers and let the compiler do what it does. If profiling shows that a particular piece of code is too slow, then figure out how to optimize it, whether by tailoring the code, dropping down to assembler, or whatever. But not before.

    • by 14erCleaner ( 745600 ) <> on Friday February 25, 2005 @05:25PM (#11781799) Homepage Journal
      I disagree with you about the readability of (!ptr), but at least it is a very common idiom in C, and I can accept that.

      But please, oh, please, don't write this:

      if (!strcmp(x,y))

      Intuitively, that looks exactly backwards from what it's testing (equality).

      On the "optimize for the compiler" issue, I think what's been said already is right: don't do it unless it's in a critical spot, write code for readability (and big-picture efficiency) first, then worry about local optimizations only if there's a problem.

  • by SamSeaborn ( 724276 ) on Friday February 25, 2005 @04:56PM (#11781323)
    "Programs should be written for people to read, and only incidentally for machines to execute."
    - Structure and Interpretation of Computer Programs []
  • by Sycraft-fu ( 314770 ) on Friday February 25, 2005 @04:56PM (#11781333)
    Each compiler is different. Some will optimise things other won't.

    In general, however, systems are now fast enough that when in doubt, you should write the clearest code possible. For most apps speed is not critical, but for all apps stability and lack of bugs is important, and obscure code leads to problems.

    Also, for things that are time critical, it's generally just one or two little parts that make all the difference. You only need to worry about optimizing those inner loops where all the time is spent. Use a profiler, since programmers generally suck at identifying what needs optimizing.

    Keep it easy to read and maintain, unless speed is critical in a certain part. Then you can go nuts on hand optimization, but document it well.
  • by sabre ( 79070 ) on Friday February 25, 2005 @04:58PM (#11781354) Homepage
    LLVM is an aggressive compiler that is able to do many cool things. Best yet, it has a demo page here: [], where you can try two different things and see how they compile.

    One of the nice things about this is that the code is printed in a simple abstract assembly language that is easy to read and understand.

    The compiler itself is very cool too btw, check it out. :)

  • Language paradigms (Score:3, Insightful)

    by alexo ( 9335 ) on Friday February 25, 2005 @05:16PM (#11781662) Journal
    > I have been coding in C for a while (10 yrs or so) and tend to use short code snippets.
    > As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'.
    > The reason someone might use the former code snippet is because they believe it would result
    > in smaller machine code if the compiler does not do optimizations or is not smart enough
    > to optimize the particular code snippet.

    No programmer believes that.
    In C, NULL is a null pointer constant (typically #define-ed to 0 or ((void*)0)), and the "!" operator is likewise a comparison against zero, so every compiler should generate exactly the same code for both.

    > IMHO the latter code snippet is clearer than the former, and I would use it in my code

    Actually I prefer to write (and read) the former and I do find it clearer, mostly because it is idiomatic in C et al.

    Another good reason is that the former works better in C++, because it enables you to substitute "smart" objects for plain pointers and use them in a more natural way (especially in templates).

    (Aside: most platforms that have C compilers also have decent C++ compilers.)

    > if I know for sure that the compiler will optimize it and produce machine code equivalent to the former code snippet.

    See above. There is nothing to optimize.
  • by dpbsmith ( 263124 ) on Friday February 25, 2005 @05:30PM (#11781872) Homepage
    ...then the code isn't important enough to optimize. Plain and simple.

    Never try to optimize anything unless you have measured the speed of the code before optimizing and have measured it again after optimizing.

    Optimized code is almost always harder to understand, contains more possible code paths, and more likely to contain bugs than the most straightforward code. It's only worth it if it's really faster...

    And you simply cannot tell whether it's faster unless you actually time it. It's absolutely mindboggling how often a change you are certain will speed up the code has no effect, or a truly negligible effect, or slows it down.

    This has always been true. In these days of heavily optimized compilers and complex CPUs that are doing branch prediction and God knows what all, it is truer than ever. You cannot tell whether code is fast just by glancing at it. Well, maybe there are processor gurus who can accurately visualize the exact flow of all the bits through the pipeline, but I'm certainly not one of them.

    A corollary is that since the optimized code is almost always trickier, harder to understand, and often contains more logic paths than the most straightforward code, you shouldn't optimize unless you are committed to spending the time to write a careful unit-test fixture that exercises everything tricky you've done, and write good comments in the code.
  • My experience (Score:4, Insightful)

    by pclminion ( 145572 ) on Friday February 25, 2005 @05:36PM (#11781950)
    First, let me say what sort of code I write. I work almost exclusively with high-performance 2D graphics code. Most of what I do involves manipulating bits, worrying about cache utilization, and squeezing the last bits of performance out of a three-line inner loop. I'm just going to rattle off what I know from my experience with gcc and VC++:

    The compiler will perform strength reduction in all reasonable instances.
    The compiler will raise invariant computations from inner loops in almost all cases that do not involve pointers.
    The compiler knows how to optimize integer division in ways I wouldn't have even thought of.
    The compiler sometimes "forgets" about a register and produces sub-optimal code for inner loops.
    The compiler can't always tell what variable is most important to keep in a register in an inner loop.

    Other stuff:

    x^=y; y^=x; x^=y; optimizes to an XCHG instruction with gcc on x86. I was amazed that it could do that. (Yes, that piece of code exchanges x and y). On the other hand, tmp=x; x=y; y=tmp; doesn't get optimized to an XCHG. Obviously, the compiler is using a Boolean simplifier or identity-prover.

    The compiler always assumes a branch will be taken (unless you use certain compiler switches to change this behavior). Thus you should always arrange your conditional tests so that the less-often executed code is within the braces.

    Don't be afraid to write complex expressions. Subexpression elimination is almost foolproof in all instances where pointers are NOT involved. It's better to leave your code clear, and let the compiler optimize it.

    And ABOVE ALL:

    No matter how much the compiler optimizes your code, you can throw it all down the toilet with bad design by screwing the cache utilization. This is EXTREMELY important especially in graphical applications which process huge raster buffers. Row-wise processing is always more efficient than column-wise. Random access will kill your performance. Do not trust the memory allocator to keep your allocations together. Write your own allocator if you are dealing with thousands or millions of small, related chunks of information.

    I could go on... But I must also second what others have said, which is to perform algorithmic optimizations FIRST and do not bother with constant-factor optimizations until you are CERTAIN that you are using the best algorithm. If you ignore this advice you might waste a week optimizing a three-line inner loop and then come up with a better algorithm the next week which makes all your hard work redundant.

  • by fizban ( 58094 ) <> on Friday February 25, 2005 @06:06PM (#11782376) Homepage
    Premature Optimization is the DEVIL! I repeat, it is the gosh darn DEVIL! Don't do it. Write clear code so that I don't have to spend days trying to figure out what you are trying to do.

    The biggest mistake I see in my professional (and unprofessional) life is programmers who try to optimize their code in all sorts of "733+" ways, trying to "trick" the compiler into removing 1 or 2 lines of assembly, yet completely disregard that they are using a map instead of a hash_map, or doing a linear search when they could do a binary search, or doing the same lookup multiple times when they could do it just once. It's just silly, and goes to show that lots of programmers don't know how to optimize effectively.

    Compilers are good. They optimize code well. Don't try to help them out unless you know your code has a definite bottleneck in a tight loop that needs hand tuning. Focus on using correct algorithms and designing your code from a high level to process data efficiently. Write your code in a clear and easy to read manner, so that you or some other programmer can easily figure out what's going on a few months down the line when you need to add fixes or new functionality. These are the ways to build efficient and maintainable systems, not by writing stuff that you could enter in an obfuscated code contest.
  • valgrind (Score:4, Informative)

    by cyco/mico ( 178541 ) on Friday February 25, 2005 @06:12PM (#11782431)
    If in doubt, use valgrind and kcachegrind []. One run with callgrind gives you all the information you want:
    • How often are functions called (and branches taken)
    • Which functions take most of the time
    • See the assembler code for each line with a mouse click (no need to guess anymore)
    callgrind/kcachegrind is by far the easiest profiling solution I ever tried, and it seems to answer more or less all of your questions.
  • Algorithm (Score:4, Insightful)

    by mugnyte ( 203225 ) on Friday February 25, 2005 @06:29PM (#11782615) Journal
    For cheap, fast batch lookups I once wrote a hashed matrix using the STL. Loaded all the cells, dynamically typed, added indexes on the data for that run, and then passed around this collection of in-memory tables to our routines. It ran fast and was simple to debug, since all the traversing was O(log n) based (or a variant thereof). Adding serialization, we could distribute to machines overnight dynamically and cut the run to a few minutes - from almost 8 hours.

    Until it came time to dispose of the memory. The STL slowly crawled tons of our objects, and the C++ dispose pattern was just too inefficient for all the stack hits. So we pointed the library at a custom heap and never disposed of the dictionary - we just disposed of the heap in bulk.

    All written without hesitation in "longhand" syntax. (And btw, it's "if ( NULL == var )" to those that care.) The code optimized fine, with just a few choice inlines we got to stick. No reg vars, no assembly piles littering the code.

    But this was an in-house business app, and the lifecycles / requirements are different than other products. However, because of the nice algorithms, optimization wasn't difficult, and didn't rely on code tricks. If you're squabbling over code tricks for optimization, you're choosing the wrong algorithm, to me.

  • by MSBob ( 307239 ) on Friday February 25, 2005 @06:56PM (#11782860)
    First: Avoid doing what you don't have to do. Sounds obvious but I rarely see code that does the absolute minimum it needs to. Most of the code I've seen to date seems to precalculate too much stuff, read too much data from external storage, redraw too much stuff on screen etc...

    Second: Do it later. There are thousands of situations where you can postpone the actual computations. Imagine writing a Matrix class with an invert() method. You can postpone calculating the inverse of the matrix until there is a call to access one of the fields in the matrix. Also, you can calculate only the field being accessed. Or, at some sensible threshold, you may assume that the user code will read the entire inverted matrix and just calculate the remaining inverted fields... the options are endless.

    Most string class implementations already make good use of this rule by copying their buffers only when the "copied" buffer changes.

    Third: Apply minimum algorithmic complexity. If you can use a hashmap instead of a treemap, use the hash version - it's O(1) vs O(log n). Use quicksort for just about any kind of sorting you need to do.

    Fourth: Cache your data. Download or buy a good caching class or use some facilities your language provides (eg. Java SoftReference class) for basic caching. There are some enormous performance gains that can be realized with smart caching strategies.

    Fifth: Optimize using your language constructs. Use the register keyword, use language idioms that you know compile into faster code, etc... Scratch this rule! If you're applying rules one to four you can forget about this one and still have fast AND readable code.

  • by roman_mir ( 125474 ) on Friday February 25, 2005 @06:56PM (#11782867) Homepage Journal
    I got this job as a contractor 4 years ago now, where the project was developed by over 30 junior developers and one crazy overpaid lady (hey, Julia) who wouldn't let people touch her code, so fragile it was (and it was the main action executor); she would rather fight you for hours than make one change in the code (she left 2 months before the project release). Now, I have never witnessed such a monstrosity of a code base before - the business rules were redefined about once every 2 weeks for 1.5 years straight. You can imagine.

    So, the client decided not to pay the last million of dollars because the performance was total shit. On a weblogic cluster of 2 Sun E45s they could only achieve 12 concurrent transactions per second. So the client decided they really did not want to pay and asked us to make it at least 200 concurrent transactions per second on the same hardware. If I may make a wild guess, I would say the client really did not want to pay the last million, no matter what, so they upped the numbers a bit from what they needed. But anyway.

    Myself and another developer (hi, Paul) spent 1.5 months - removing unnecessary db calls (the app was incremental, every page would ask you more questions that needed to be stored, but the app would store all questions from all pages every time,) cached XML DOM trees instead of reparsing them on every request, removed most of the session object, reduced it from 1Mb to about 8Kb, removed some totally unnecessary and bizarre code (the app still worked,) desynchronized some of the calls with a message queue etc.

    At the end the app was doing 320 and over concurrent transactions per second. The company got their last million.

    The lesson? Build software that is really unoptimized first and then save everyone's ass by optimizing this piece of shit and earn total admiration of the management - you are a miracle worker now.

    The reality? Don't bother trying to optimize code when the business requirements are constantly changing, the management has no idea how to manage an IT dep't, the coders are so nube - there is a scent of freshness in the air and there is a crazy deadline right in front of you. Don't optimize, if the performance becomes an issue, optimize then.

  • by Jugalator ( 259273 ) on Friday February 25, 2005 @07:19PM (#11783084) Journal
    There's often no point in fiddling around with details, and you should instead focus on optimizing for "the big picture", e.g. whether you should really re-establish a database connection each time a user selects OK in a dialog box.

    As for simple code optimizations, here's what a modern compiler (Microsoft Visual C++ .NET 2003 in this case) optimize code as: (yes, I've checked these myself as I've been writing this post)
    for (int i = 0; i < 5; i++) {
        int j = 42;
        if (j == 13)
            printf("j is %d", j);
        j = 42;
    }
    return 0;
    ... optimized to ...
    xor eax, eax
    ret 0
    In other words, it basically threw away all that code since the program wouldn't make use of any values that would be calculated there. And this:
    for (int i = 0; i < 100; i++) {
        int j = 42;
        i += j;
    }
    printf("i is %d", i);
    return 0;
    ... optimized to ...
    push 129
    push OFFSET FLAT:??_C@_07JEGIODIB@i?5is?5?$CFd?$AA@
    call _printf
    add esp, 8
    xor eax, eax
    ret 0
    Do you see what that means -- yep, the compiler understood that the resulting value would be 129 by analyzing the for loop, and simply pushed that value onto the stack as the parameter to printf. It basically optimizes to:
    printf("i is %d", 129);
    return 0;
    That's probably considered some simplistic analyzing too, by today's compilers. ;-)
  • Dear Lord (Score:5, Insightful)

    by sholden ( 12227 ) on Friday February 25, 2005 @07:35PM (#11783221) Homepage
    Ten years of programming in the language and you:

    1) Don't know when two things are obviously equivalent to any non-brain dead compiler.

    2) Think something other than readability matters.

    3) Think the non-idiomatic way of doing something is more readable.

    But I'm sure I'm just repeating the comments I can't be bothered reading.
  • by Ninja Programmer ( 145252 ) on Friday February 25, 2005 @07:42PM (#11783290) Homepage
    Saravana Kannan asks: "I have been coding in C for a while (10 yrs or so) and tend to use short code snippets. As a simple example, take 'if (!ptr)' instead of 'if (ptr==NULL)'. The reason someone might use the former code snippet is because they believe it would result in smaller machine code if the compiler does not do optimizations or is not smart enough to optimize the particular code snippet. IMHO the latter code snippet is clearer than the former, and I would use it in my code if I know for sure that the compiler will optimize it and produce machine code equivalent to the former code snippet. The previous example was easy. What about code that is more complex? Now that compilers have matured over years and have had many improvements, I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized?"
    Most compilers come with something called a disassembler. Or better yet, you can trace the code with an assembly level debugger. If you want to know whether or not your compiler produces good code, why don't you just look at your code and find out? I'll bet dollars to donuts that you have one of these tools sitting on your hard drive that will tell you what your compiler did. Seriously, if you don't know how to get the answer to the question for yourself, then you don't deserve to know the answer.

    Most compilers today will get all the simple stuff like the if (!ptr) vs if (NULL == ptr) optimization. It's the more complex things, the ones the compiler cannot "prove", where it has trouble. For example:

    void h(int x, int y) {
        for (i = 0; i < N; i++) {
            if (0 != (x & (1 << y))) {
                f(i);
            } else {
                g(i);
            }
        }
    }

    Very few compilers will dare simplify this to:

    void h(int x, int y) {
        if (0 != (x & (1 << y))) {
            for (i = 0; i < N; i++) f(i);
        } else {
            for (i = 0; i < N; i++) g(i);
        }
    }
    Because compilers have a hard time realizing that the conditional is loop-invariant and can be hoisted outside the for loop. The compiler also has the opportunity to perform loop unrolling in the second form that it may not try in the first.

    You can learn these things from experience, or you can simply figure them out for yourself with the aforementioned disassembly tools.
  • by LoveMe2Times ( 416048 ) on Friday February 25, 2005 @09:21PM (#11784038) Homepage Journal
    I'm going to presume that you've *already* picked a reasonably efficient algorithm, 'cause otherwise there's no point. Second, I'm going to presume that you've already run the profiler, so you know which lines of code are important.

    Here's my "guide to optimizing":

    1) Are you disk I/O bound? You might need to switch to memory mapped files, or you might need to tweak the settings on the ones you have. You might need to use a lower level library to do your I/O. Many C++ iostreams implementations are slow, and many similar libraries involve lots of copying.

    2) Are you socket I/O (or similar) bound? If so, you may need to rewrite with asynchronous I/O. This can be a PITA. Suck it up.

    3) Are your threads spending all their time sitting in locks waiting for other threads? One, make sure you're using an appropriate number of worker threads, sized to the number of CPUs the host has. If you've already got the right number of threads, this can be a really tough problem. Presumably the threads are helping your program's readability, and trying to rework things into fewer threads is often a *bad idea*.

    4) Are you spending all your time in malloc/new/constructor and free/delete/destructor? Maybe you need to keep things on the stack, use a garbage collector, use reference-counted objects, use pooled memory techniques, etc. In the right places, switching from some "string" library to const char* and stack buffers can give a huge benefit. Make sure, of course, that you use the "n" version of all standard string functions (the ones that take the size of the buffer as an argument) to avoid buffer overruns.

    5) Are you spending all of your time in some system call? Like maybe some kind of WriteTextToScreen or FillRectangleWithPattern type of thing? For drawing code in general, try buffering things that are algorithmically generated in bitmaps, and only regenerate the parts that change. Then just blit together the pieces for your final output. Perhaps you need to rely on hardware transparency support for fast layer compositing. You might need fewer system level windows so you draw more in one function. Maybe you need to reduce your frame rate.

    6) Are you using memcpy as appropriate?

    If any of the previous items are true, you have no business worrying about the compiler. However, once you've gotten this far, you can start worrying about optimizing your code line by line.

    7) Since you've gotten this far, the line(s) of code you're worried about are all inside some loop that gets run. A lot. They may be inside a function that's called from a loop too, of course. So, a few things to consider. A) You may need to use templates to get code that is optimized for the appropriate data type. B) You may need to split off a more focused version of the function from the general-purpose function if it's also used in non-critical areas. This has negative maintenance ramifications. C) Do the bonehead-obvious stuff, like moving everything out of the loop that you can. D) Look at the assembly actually generated by your compiler. If you're not comfortable with this, you have no business doing further optimization.

    After looking at the assembler, then you'll know if the following are important. In my experience, they are.

    1) Change array indexing logic to pointer logic:

    MyType stuff[100];
    for (int i = 0; i < sizeof(stuff)/sizeof(stuff[0]) - 1; i++) {
        stuff[i] = abs(stuff[i+1]/PI);
        if (stuff[i] < 0)
            stuff[i] = 0;
        else if (stuff[i] > maxval)
            stuff[i] = maxval;
    }

    can change to:

    MyType stuff[100];
    for (MyType* ptr = stuff; ptr < &stuff[99]; ptr++) {
        *ptr = abs(*(ptr+1)/PI);
        if (*ptr < 0)
            *ptr = 0;
        else if (*ptr > maxval)
            *ptr = maxval;
    }

    This eliminates lots of redundant addition: every one of those stuff[i] accesses has to compute base + i * sizeof(MyType), while the pointer version just bumps a single pointer each iteration.


  • by smcdow ( 114828 ) on Friday February 25, 2005 @10:45PM (#11784575) Homepage
    I've been using it for a while...
  • Optimization (Score:4, Insightful)

    by AaronW ( 33736 ) on Friday February 25, 2005 @11:03PM (#11784685) Homepage
    In terms of optimizing, generally compilers do a pretty good job, however there are several areas that no compiler I know of can help.

    1. Choose the right algorithm. For example, in an embedded project I worked on an engineer used a linked list to store thousands of fields that must be added and deleted. While adding is fast, it didn't scale for deleting. Changed it to a hash table and it sped it up significantly.

    2. Know your data and how it is used. Knowing how to organize your data and access it can make a huge difference. As a previous poster pointed out, sequential memory accesses are much faster than random accesses. I had to write some 90-degree image rotation code. The simple solution just used a couple of for loops when copying the pixels from one buffer to another. In the other, I took into account the processor cache and how memory is accessed, and broke the work down into tiles. The first algorithm, while simple and elegant, ran at 30 frames per second. The other ran at over 200 frames per second. Looking at the code, you would think the first algorithm should be faster, since the code is simpler. Both algorithms operate in O(N) time, where N = width * height.

    Further optimization attempts to hint to the CPU cache about memory made no difference (Athlon XP 1700+). The only possible way I see to speed it up further would be to write it in hand-coded assembler.

    3. Reduce the number of system calls if possible. Some operating systems can be very painful when calling the kernel. Group reads and writes together so fewer calls are made.

    4. Profile your code to find bottlenecks.

    5. Try and keep a tradeoff between memory usage and performance. A smaller tightly packed data set will execute faster with CPU caches and will reduce page faults when loading and starting up.

    6. Try debugging your code at the assembler level, stepping through it. It will help you better understand your compiler.

    7. Don't bother trying to optimize things like getting every ounce of performance when the next function you call will be very slow. E.g. in one section of MS-DOS's source code, which was hand-coded assembly language, it was calculating the cluster or sector of the disk to access. First the code checked if it was running on a 16-bit or 32-bit CPU. Then it took the 16-bit or 32-bit path for the multiplication, then it read from the disk. Why the hell write all this code to check whether the CPU is 16 or 32 bit for the multiply when the frigging disk is going to be slow anyway? They should have just stuck with the 16-bit multiply rather than be clever.

    In general applications with GCC, I rarely see much difference between -O2 and -O3. For that matter, I often don't see a noticeable difference between -O0 and -O3 for a lot of code.

    I only see improvements in some very CPU intensive multimedia code. I also saw a significant improvement in some multimedia code when I told the compiler to generate code for an Ultrasparc rather than the default, but that's because the pre-ultrasparc code didn't use a multiply instruction.

  • /. posters (Score:5, Insightful)

    by Saville ( 734690 ) on Saturday February 26, 2005 @01:21AM (#11785383)
    "I ask the Slashdot crowd, what they believe the compiler can be trusted to optimize and what must be hand optimized? Give examples of code optimizations that you think the compiler can/can't be trusted to do."

    Somehow 99% of the readers took this to mean "What is the difference between NULL and the zero bit pattern and do you think it is a good idea to write clear code and do the profile/algorithm change cycle until there is nothing left to optimize or should I write low level optimized code from the start?"

    sigh.. I've only found two comments with code so far after going through hundreds of posts. This is possibly the worst signal to noise ratio I've witnessed on /.
  • HLSL (Score:4, Informative)

    by Saville ( 734690 ) on Saturday February 26, 2005 @01:56AM (#11785490)
    As far as I'm concerned, compilers are better than 99% of the programmers out there. Just write clear code and let the compiler do its trick. However, there are a couple of cases I can think of where things aren't automatically optimized.

    It's not really a coding trick like an XOR swap, but most compilers don't yet seem to fully unroll parallel loops into good SIMD instructions or multiple threads.

    The only time I've needed to bother to look at assembly output in recent years (other than debugging a release mode program) is when writing HLSL shaders. HLSL is the high level shading languge (C like) for shaders that is part of Direct3D 9. HLSL can be compiled to SM1, SM2, or SM3 assembly.

    With pixel shader 2.0 you've only got 64 instruction slots, and some important instructions like POW (power), NRM (normalize), and LRP (interpolate) take multiple slots. 64 slots is not enough for a modern shader. I curse ATI for setting the bar so low. ;)

    There are two flaws I've found with the Dec 04 HLSL compiler in DirectX. Sometimes it will not automatically detect a dot-product opportunity. I had some colour code to convert to black and white in a shader and wrote it as y = colour.r*0.299 + colour.g*0.587 + colour.b*0.114; as I thought that was the clearest way to write it. Under certain circumstances the compiler didn't want to convert it to a single dot instruction, so I had to write it as y = dot( colour.rgb, half3(.299h, .587h, .114h)); I'm not sure if that bug still exists in the current compiler release or not.

    Another is that often a value in the range of -1 to +1 is passed in as a colour, which means it must be packed into the 0-1 range. To get it back you've got to double it and subtract 1.
    a = p*2-1; gets converted into a single MAD instruction, which takes one slot.
    a = (p-0.5)*2; gets converted into an ADD and then a MUL.

    Also conventional wisdom says you've got to write assembly to get maximum performance out of pixel shader 1.1 as it is basically just eight instruction slots. I don't have any snippets to verify this though.

    I think this thread demonstrates that either compilers are mature enough that you don't need any code tricks to help them do their job, or /. posters just aren't aware of the shortcomings of compilers (see first sentence of this post) and would rather post obvious advice than not post at all. :)

"I'll rob that rich person and give it to some poor deserving slob. That will *prove* I'm Robin Hood." -- Daffy Duck, Looney Tunes, _Robin Hood Daffy_