Forgot your password?
typodupeerror
Programming Technology

Arrays vs Pointers in C? 308

Posted by Cliff
from the optimizations-may-not-be-portable dept.
UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"
"Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:
void reversestring_array(char *str)
{
int head, tail;
char temp;
if (!str) return;
tail = strlen(str) - 1;
for (head = 0; head < tail; ++head, --tail)
{
temp = str[tail];
str[tail] = str[head];
str[head] = temp;
}
}
The second function uses pointers:
void reversestring_pointer(char *str)
{
char *phead, *ptail, temp;
if (!str) return;
ptail = str + strlen(str) - 1;
for (phead = str; phead < ptail; ++phead,--ptail)
{
temp = *ptail;
*ptail = *phead;
*phead = temp;
}
}
While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:
forloop:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al
inc esi
dec ecx
cmp esi,ecx
jl forloop
The pointer function:
forloop:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl
inc ecx
dec eax
cmp ecx,eax
jb forloop
While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."
This discussion has been archived. No new comments can be posted.

Arrays vs Pointers in C?

Comments Filter:
  • Why do you care? (Score:4, Insightful)

    by Julian Morrison (5575) on Wednesday October 12, 2005 @05:48PM (#13777402)
    For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

    If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.
    • I agree. Both of those pieces of code happen so fast that it doesn't matter. Unless all your program does is reverse lists over and over and over...
      • by aled (228417) on Wednesday October 12, 2005 @07:30PM (#13778122)
        Both of those pieces of code happen so fast that it doesn't matter.

        Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.
    • Re:Why do you care? (Score:5, Interesting)

      by thegrassyknowl (762218) on Wednesday October 12, 2005 @07:25PM (#13778087)

      then don't wank around optimizing

      Dude, best use of the word wank. Ever!

      for single cycles on a machine that probably bleeds off a million cycles every time you raise a window

      Computers have become more powerful and programmers have become more lazy. It's not strictly true because instead of focussing a lot of time writing efficient code programmers are now focussing a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap and probably half of doesn't need to be done anyway...

      I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.

      Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.

      • You are officially my hero now.

        I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away. Well not if you waste it on bloatware. And at any rate, it doesn't improve the hardware I own. Efficiency will cease to matter as soon as they start buying my hardware and electricity.
        • I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away.

          Not a dig at the OP rather than an observation:

          Faster hardware will not make the inefficiency go away - it's still the same amount inefficient. If something wastes 30% of its time wanking on, it doesn't matter how fast you make it go, it's still wasting 30% of its time wanking on.

          Faster hardware just makes minor inefficiencies less noticibl

          • by psavo (162634)

            Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!

            None of the discussion I've seen so far has touched the real problem, not the wanking on the micro-optimisation (these compiler optimizations are all O(n) in gain) but that so few have any sort of a clue on what algorithms to use and when.

            I'm not saying that it's bad to optimize those sort of thing

      • by NonSequor (230139) on Thursday October 13, 2005 @02:10AM (#13779879) Journal
        You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.

        For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.

        There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.
    • *sigh* And some people wonder why in the age of 3GHz processors, they're still only as responsive as 33Mhz processors.
    • by JanneM (7445) on Wednesday October 12, 2005 @09:37PM (#13778761) Homepage
      For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

      When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.

      If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.
      • For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

        When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for;

        Hence his question. In

    • Photoshop plug-ins (Score:4, Informative)

      by mwvdlee (775178) on Thursday October 13, 2005 @09:15AM (#13781406) Homepage
      I've written quite a lot of code for Photoshop plug-ins.

      Since this type of code typically iterates over a few hunderd million pixels you'd think that changing such details as array vs. pointer or some other common optimalization technique would have an impact.

      It does; it typically shaves about a few tenths of a second off of a 5 minute calculation.

      Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticable way.

      Nowadays I don't bother optimizing code (usually the compiler does a better job at it anyway) but rather optimize the algorithms. Instead of opening the topic and waiting for a definitive answer on your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance you'd ever get this way.
  • by topham (32406)

    This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.

    Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

    I never really saw the point in arguing over array and pointers in C. When I programmed i
    • There are still tons of smaller, slower, "antiquated" processors embedded in all sorts of things, for which code gets written. The Z80 is still alive and well, believe it or not, along with lots of other, old standards (and newer, but still small and slow, CPUS).
      • very true.

        but reading the submission it really looked to me like someone who simply wanted to justify their opinion on something. he wanted a generic answer to a question which requires a specific answer.

        if someone here said the pointer method was always faster he would run back to his friends and declare he was right all a long.

        the fact is the answer requires knowledge of all the target platforms as well as all of the compilers involved.

        pointer method is generally faster and more efficient; but it makes fo
  • WOW! (Score:2, Insightful)

    by Anonymous Coward
    Now THIS is the kind of discussion that should be going on at Slashdot!
  • When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    One can always check the resulting assembly code, if one is so concerned about this.

    Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)

  • by Anonymous Coward on Wednesday October 12, 2005 @06:09PM (#13777553)
    ...and they want their arrays, pointers and C back.
  • In my experience ... (Score:2, Interesting)

    by Keeper (56691)
    In my experience, which is somewhat dated (by about 5 years), what you state is generally true for simple loops.

    However, for more complicated loops the compiler fails to opimize the loop very well. My 'experience' was in writing an alogirthm which took a raw camera sensor ccd data and ordered it into a proper bayer pattern. When moving from indexer based access to pointer based access I saw a significant increase in performance.

    Even basing one pointer off of another was slower than maintaining separate po
  • It optimizes out (Score:5, Interesting)

    by klossner (733867) on Wednesday October 12, 2005 @06:12PM (#13777571)
    An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction [wikipedia.org].

    Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.

    • Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
      Kudos. :) My hat's off to you.
    • by LSD-OBS (183415) on Wednesday October 12, 2005 @07:23PM (#13778077)
      Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree

      Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".

      Guess that about sums it up ;-)
    • In the "real world", not all compilers are that intelligent.

      Yes, it does make a substantial difference in speed with some systems/compilers.

    • Re:It optimizes out (Score:2, Informative)

      by Anonymous Coward
      Exactly, strength reduction changes the indexing operation to straight pointer arithmetic. this is done at least when i looked a few years ago in gcc/g++, in the later phases of the compilation, so that that the compiler is rearranging the RTL to eliminate the indexing variable. you can verify strength reduction by just setting the optimisation in gcc and looking at the assembler output.

      The point is now a bit moot since for many loops you do not want either indexing or pointer arithmetic, you want SIMD ins
    • Re:It optimizes out (Score:3, Interesting)

      by metamatic (202216)
      That was my first thought too. "They're equivalent, so why is anyone even asking? Any optimizing compiler will handle it."

      Then I remembered that this is Slashdot, where the groupthink is that a CS degree is useless and doesn't teach anything you need to know in the real world.
  • by 21chrisp (757902) on Wednesday October 12, 2005 @06:13PM (#13777573)
    These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

    IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
    • by NullProg (70833) on Wednesday October 12, 2005 @08:11PM (#13778357) Homepage Journal
      These types of opimizations are virtually pointless on modern machines.
      I call bullshit. Optimizations are important regardless of the language or CPU.

      My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

      It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.

      My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.

      Time is money, especially when your trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems whatever, before we went into production. Theres a big difference between a programmer who knows a language vs a programmer who understands it.

      IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
      Thats your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU? Don't you? I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.

      No flame intended,
      Enjoy.
      • There are optimizations and then there are optimizations. If you run this particular loop once, or twice, it won't matter. Run it ten times or even a hundred, it won't matter. In these cases it would be a pointless optimization. Profile your software before you optimize. That way you can optimize where you need it and not waste time where you don't.

        I remember one code review where someone told me to use prefix instead of postfix notation, for optimization reasons. Yet it occured in an initialization routine
      • My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.

        Heh, a little offtopic but - This is why I hate XML. It's so bloated. You take 1 to 6 hours parsing a 30 megabyte XML file in C? I was just tasked with parsing out some select data from a 37 gigabyte XML file (870 million lines). I tried all sorts of optimizations and parsers upon finding that it might take days to parse. My solution - 50 lines of perl usin
        • One of the more interesting things I've seen is to see how different software developers write a program for the following problem:

          Jack bought a bag of 100 pieces of candy at the store. It has 90 cherry candies and 10 lemon candies. He prefers the cherry candies over the lemon candies.

          Every day Jack randomly picks a candy from the bag. If the candy is a cherry candy, he eats it. If the candy is a lemon candy, he puts it back and randomly draws again. He eats the second randomly chosen candy no matte

          • Interesting. I'm at work now, so I shouldn't take the time to actually solve it until I get home. But it sounds like a problem you could solve by:
            1. Statistics. It seems like if you are One with the Statistics, you could do it with pencil, paper, and a calculator just powerful enough to do factorials. (Unfortunately, I am not.)
            2. Dynamic programming [wikipedia.org] or recursion with memoization. (There are overlapping subproblems.)
            3. Recursion. This would be slow, and I bet it's the way most people use.
    • Thinking like this (Score:3, Insightful)

      by phorm (591458)
      Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.

      Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster
    • lower likelihood of programming errors on the array option. .... IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

      You realise that pointer arithmetic and array indices are the same thing, don't you? Ie given:

      int b[100];
      int i;

      The following are equivalent:

      b[i++]

      *(b + i++)

      Arrays *are* /nearly/ equivalent to pointers, indeed an array variable without a subscript degrades to a pointer. Further, note
  • Different compilers will optimize abstractions in different ways, so you are not guaranteed to get the same code when using arrays. Pointer operations should(!) compile to essentially the same code on ANY compiler, as the code is already about as reduced as it can get.

    In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.

    If you want something of uniform reliability, rather than uniform

    • How the code gets treated depends on how "clear" it is to the compiler. The loop in the example is simple and any decent optimizing compiler worth it's 1's and 0's should optimize it.

      But if the code is unclear, let's say multiple nested loops with a good number of branches, local variables, etc. then an optimizing compiler will probably say WTF and just leave it alone.

      As has been said many times, profile your code. If you find the bottle-neck, fix it. If it comes down to pointer vs. array, get rid of th arr
  • Strength Reduction (Score:4, Informative)

    by The boojum (70419) on Wednesday October 12, 2005 @06:18PM (#13777607)
    My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.
    Yes, semantically array[index] has to have the same effect as *(array+index). But the compiler is free to generate conceptually equivalent code in any way that it pleases. Any decent C/C++ compiler optimizer that can perform strength reduction ought to be able to see how the index changes the memory location and turn it into simple pointer incrementation accordingly. And strength reduction is a well known optimization that's been around for ages -- if memory serves, even the old Red Dragon book talks about how it works in this context. If your compiler can't handle this you need to find a better compiler.
  • Do This instead (Score:3, Interesting)

    by Leimy (6717) on Wednesday October 12, 2005 @06:19PM (#13777616)
    void reversestring_array(char *str)
    {
    int head, tail;
    char temp;
    if (!str) return;
    tail = strlen(str) - 1;
    for (head = 0; head < tail; ++head, --tail)
    {
    temp = tail[str];
    tail[str] = head[str];
    head[str] = temp;
    }
    }

    That'll teach ya.
    • Bah, the original requirement was to do it with as little memory allocation as possible. You still use a temp variable. Let bitwise operations be your friend.
      • As was discussed in the original thread one point that this excercise miss is that a compiler will naturally put the temporary variables into registers. So the entire XOR thing is cute but will produce slower and less understandable code.

        Naturally neither version will actually use any added memory (as in RAM) but I doubt an optimising compiler will be able to perform strength reduction on the XOR operations.

        Don't try to out-think your compiler. The compiler is your friend.
  • The array function:
    mov bl,byte ptr [esi+edx]
    mov al,byte ptr [ecx+edx]
    mov byte ptr [ecx+edx],bl
    mov byte ptr [esi+edx],al

    The pointer function:
    mov bl,byte ptr [ecx]
    mov dl,byte ptr [eax]
    mov byte ptr [eax],bl
    mov byte ptr [ecx],dl

    if you look closely at these two portions of the assembl

    • Got to think of the 6502 and its indirect indexed addressing modes..

      Basicly, you'd store a pointer somewhere in memory page 0, and then use the x or y register as an index into whatever it pointed at (ie LDA ($a0),x would take the pointer stored in address a0 and a1, use the x register as an index, and read the contents of the resulting address into the acumulator) . Here it would be faster to manipulate the index and not the pointer.

      Pointers were 16 bit (64k address space) and x and y were 8 bit registers.
    • by Waffle Iron (339739) on Wednesday October 12, 2005 @07:44PM (#13778211)
      what you in fact have is in the array 4 additional additions (between registers),

      Actually, most x86s have a dedicated address generation units which handle those index additions in parallel on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.

  • by Anonymous Coward
    The whole point is dumb.
    Every C programmer should know the answer to the following question:

    What is 6["abcdefghijkl"]?

    Answer: 'g'.

    How is this determined?
    By definition x[y]=*(x+y)=y[x].

    Don't believe me, check the standard.

    ( Yeah this is a degenerate case, like Duff's device. Still a compiler has to support it. )
  • I've been reading some really good arguments in these posts about how it'll work. And the bulk of them end up saying, "any good compiler will/should"...

    In other words, "No, you cannot."

    Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then the part that you neglected, you need to count the clock ticks per instruction. To reinforce what one poster said - there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matter
    • If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.

      Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.

      One of these days, someone will invent a program that can take

      • Yes, necessarily.

        "When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" is the quesiton.

        "You're ignoring the fact that modern processors use pipelining architectures, " you propose.

        As I said, the bulk of the posts end up saying, "any good compiler will/should"...

        In other words, "No, you cannot."

        But even in the case with chip-level optimizations, your point is still moot; for those processors, then it doesn't make any difference for you - call them
  • int array[8];
    int *int_ptr = malloc(8 * sizeof(int));
    char string[] = "Hello world!";
    char *char_ptr = "Hello world!";

    sizeof(array) == 8 * sizeof(int);
    sizeof(int_ptr) == sizeof(int *);

    You can get int_ptr to point to another location in memory (such as doing ++int_ptr); the similar statement (++array) is illegal.

    I think that altering individual characters of char_ptr is undefined (as it would result in changing the value of the literal string), while (as above) string cannot be pointed to a new literal string.
  • by blackcoot (124938) on Wednesday October 12, 2005 @07:19PM (#13778055)
    ... my experience has been that this matters more in the multidimensional array case than in the single dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz], any time i can make array operations faster my entire program benefits significantly). when dealing with two dimensional arrays, you can either do the addressing yourself (location [i,j] in a r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

    have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.
    • if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

      Not true. Not true at all. Multi-D arrays are merely syntactic sugar for single dimension. It doesn't involve two pointer dereferences unless it's a jagged array, aka an array of pointers. A built in C multi-D array
    • if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

      That's true for jagged arrays, but IIRC, statically allocated multidimensional arrays in C are not jagged. A[i][j] is equivalent to A[i*c+j] (or maybe A[i+j*c], whatever).
  • ... and then you will no longer have to worry that it might be slow.
  • by RML (135014) on Wednesday October 12, 2005 @07:25PM (#13778091)
    Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version:

    L13:
    movzbl  -1(%ebx), %edx
    movl    %esi, %ecx
    decl    %edi
    movl    8(%ebp), %eax
    movb    %dl, -13(%ebp)
    movzbl  -1(%esi,%eax), %edx
    movb    %dl, -1(%ebx)
    decl    %ebx
    movzbl  -13(%ebp), %edx
    movb    %dl, -1(%esi,%eax)
    incl    %esi
    cmpl    %ecx, %edi
    jg      L13

    The loop from the pointer version:

    L5:
    movzbl  1(%esi), %edx
    movl    %esi, %ecx
    movzbl  (%ebx), %eax
    movb    %al, 1(%esi)
    decl    %esi
    movb    %dl, (%ebx)
    incl    %ebx
    cmpl    %ecx, %ebx
    jb      L5

    Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s
    Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s

    So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.
    • GCC is designed to be portable, not fast, so the code is generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates because no-one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!

    • Use an actual release of the compiler and report this regression to the gcc mailing list. It's an error.
    • Unfortunately, the article didn't say what compiler he was using. But since we're giving data points:

      gcc 3.4.2 3.4.2 [FreeBSD] 20040728, x86, -O3 -march=pentium-m. Generated essentially the same code as the article's.

      Array version:

      .L12:
      movzbl (%esi,%ecx), %edx
      movzbl (%esi,%ebx), %eax
      movb %al, (%esi,%ecx)
      decl %ecx
      movb %dl, (%esi,%ebx)
      incl %ebx
      .L10:
      cmpl %ecx, %ebx
      jl .L12

      Pointer code:

      .L23:
      movzbl (%ebx), %edx
      movzbl (%ecx), %eax
      movb %al, (%ebx)
      decl

  • And C++/STL? (Score:5, Informative)

    by XenonOfArcticus (53312) on Wednesday October 12, 2005 @07:27PM (#13778103) Homepage
    And what of STL container classes under C++?

    Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

    If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs go for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) have bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out of order execution and other black juju might render it meaningless.

    Finally, you have to consider the cost/benefit of the situtation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

    Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

    I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

    That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

    When performance really counts, a whole new set of tools and processes come into play.
    • by Arandir (19206)
      I'm currently writing a piece of software to compete with a commercial proprietary offering. I'm using "bloated" STL and "bloated" C++ to manipulate "bloated" XML. Frankly, I'm shocked at its poor performance, and I might have to do some optimizations. It takes about a half of a second to load and process 200k worth of XML.

      ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think i
    • Re:And C++/STL? (Score:3, Informative)

      by WARM3CH (662028)
      OK I tried it with Visual Studio 2003 and also tried STL. Of course with STL you only need one line of code to reverse a character string:

      reverse(str, str+strlen(str));

      Now, the interesting part is what the optimizing compiler outputs for each of the three vairants (I only include the inner loop):

      With pointers we have:

      $L13583:
      mov bl, BYTE PTR [ecx]
      mov dl, BYTE PTR [eax]
      mov BYTE PTR [eax], bl
      mov BYTE PTR [ecx], dl
      inc ecx
      dec eax
      cmp ecx, eax
      jb SHORT $L13583

      With array we have:

      $L1252

  • If you use arrays the compiler these days will ocnvert to pointer arithmetic. However it also now knows that it is an array and can make assumptions not available to it before allowing more code order optimizations in loops.

    These days if its something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know the old thigns that make sense, take stuff out of loops that dont need to be inside, compiler

  • If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).

    and frankly, the TFA depends on architecture, where most modern compilers make both snippets equal.
  • by jschmerge (228731) on Wednesday October 12, 2005 @08:28PM (#13778441)

    Premature optimization is the root of all evil.

    I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...

    In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

    (ptr + index)->dataMember

    is less readable than this:

    ptr[index].dataMember
  • Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.
  • by bigsteve@dstc (140392) on Wednesday October 12, 2005 @11:08PM (#13779181)
    I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?

    Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)

    In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:

    • finding better C compilers, or
    • using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.

    As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.

  • Vectorisation (Score:3, Insightful)

    by Heretik (93983) on Wednesday October 12, 2005 @11:13PM (#13779224)
    Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.

    I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.

    Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.
  • ASSume (Score:3, Insightful)

    by larry bagina (561269) on Wednesday October 12, 2005 @11:16PM (#13779236) Journal
    My argument is that one cannot assume

    You're right, however, you're also assuming that your pointer arithmetic is faster.

    Consider a 16-bit architecture with 32-bit pointers.

    Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.

    So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.

  • by Anonymous Coward on Thursday October 13, 2005 @12:38AM (#13779569)
    I develop highly optimizing compilers for a living, so use that to put in context what I'm going to say. Things look very different from the other side of the source file.

    I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disiniterest after I realize that a lot of people are burning a lot of cycles arguing about it.

    Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.

    We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."

    I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.

    If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.

    That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.

    An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:

    • Interchange
    • Unroll
    • Fusion
    • Fission
    • Unroll-and-Jam
    • Invariant hoisting (both data and control)
    • Vectorization
    • Cache Blocking

    When the compiler is done with these, one hardly recognizes the code anymore. :)

    In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.

    If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro

  • Benchmarking (Score:2, Insightful)

    by Threni (635302)
    If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.
  • Pointer aliasing (Score:3, Interesting)

    by igomaniac (409731) on Thursday October 13, 2005 @04:27AM (#13780174)
    With modern compilers you should always use arrays. The reason is that for all but the most pathological cases of indexing, the compiler will do strength reduction and induction to turn the p[i*4] into *p,p+=4. The problem with using pointers is that if you have more than one of them, the compiler has no easy way of knowing if they point to the same place (this is known as pointer aliasing) so a bunch of useful optimizations have to be turned off. If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.

    If you want to learn more about these kind of source level optimisations, look the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.
  • by microTodd (240390) on Thursday October 13, 2005 @07:11AM (#13780545) Homepage Journal
    Wow. Just wow.

    No one will probably read this comment because its been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.

    For starters, if the submitter is a CompSci student then he definately gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.

    Secondly, what happened to just plain geekiness of research and studying things because its fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of a real nerd.

    Who knows? Its unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.

The use of anthropomorphic terminology when dealing with computing systems is a symptom of professional immaturity. -- Edsger Dijkstra

Working...