Arrays vs Pointers in C?

Follow Slashdot stories on Twitter

Arrays vs Pointers in C? 308

Posted by Cliff on Wednesday October 12, 2005 @06:42PM from the optimizations-may-not-be-portable dept.

UOZaphod asks: "A recent sub-discussion on Slashdot (in which, I confess, I was involved) piqued my curiosity because of several comments made about C compiler optimizations. I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?"

"Here are my own thoughts on the issue:

For discussion purposes, I present the following two equivalent functions which reverse the contents of a string. Note that these code fragments are straight C, and do not account for MBCS or Unicode.

The first function uses array indexing:

void reversestring_array(char *str) {

int head, tail; char temp; if (!str) return; tail = strlen(str) - 1; for (head = 0; head < tail; ++head, --tail) {

temp = str[tail]; str[tail] = str[head]; str[head] = temp;

}

}

The second function uses pointers:

void reversestring_pointer(char *str) {

char *phead, *ptail, temp; if (!str) return; ptail = str + strlen(str) - 1; for (phead = str; phead < ptail; ++phead,--ptail) {

temp = *ptail; *ptail = *phead; *phead = temp;

}

}

While there are obvious optimizations that could be done for both functions, I wanted to keep them as simple and semantically similar as possible.

Arguments have been made that the compiler will optimize the first example using register indexing built into the CPU instruction set, so that it runs just as fast as the pointer version.

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Granted, when I compiled the above examples on an x86 machine, the resulting assembly for each of the two functions ended up looking very similar. In both cases, I enabled full compiler optimization (Pentium Pro). I will present just the inner loop for each function...

The array function:

forloop: mov bl,byte ptr [esi+edx] mov al,byte ptr [ecx+edx] mov byte ptr [ecx+edx],bl mov byte ptr [esi+edx],al inc esi dec ecx cmp esi,ecx jl forloop

The pointer function:

forloop: mov bl,byte ptr [ecx] mov dl,byte ptr [eax] mov byte ptr [eax],bl mov byte ptr [ecx],dl inc ecx dec eax cmp ecx,eax jb forloop

While this example appears to prove the claim that compiler optimizations eliminate the differences between array and pointer usage, I wonder if it would still be true with more complicated code, or when indexing larger structures.

I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references."

This discussion has been archived. No new comments can be posted.

Arrays vs Pointers in C?

Load All Comments

Search 308 Comments Log In/Create an Account

Comments Filter:

Why do you care? (Score:4, Insightful)

by Julian Morrison ( 5575 ) writes: on Wednesday October 12, 2005 @06:48PM (#13777402)

For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?

If it isn't, then don't wank around optimizing for single cycles on a machine that probably bleeds off a million cycles every time you raise a window.

Share
twitter facebook
- Re:Why do you care? (Score:2)
  
  by Blakey Rat ( 99501 ) writes:
  
  I agree. Both of those pieces of code happen so fast that it doesn't matter. Unless all your program does is reverse lists over and over and over...
  - Re:Why do you care? (Score:5, Funny)
    
    by aled ( 228417 ) writes: on Wednesday October 12, 2005 @08:30PM (#13778122)
    
    Both of those pieces of code happen so fast that it doesn't matter.
    
    Unless of course that someone writes a compiler so optimizing that the code ends before it begins, causing a paradox that will end the universe. To prevent that imminent danger all programmers must start programming in TI-99/4a Basic right now.
    
    Parent Share
    twitter facebook
  - - - Re:Why do you care? (Score:3, Funny)
        
        by glavenoid ( 636808 ) writes:
        
        Incedentally, I find some shades yellow to sound rather coarse, with harmonic triangles increasing with each beat inverse to the fundamental. Not as vulgar a sound as say, Taupe, but not as auarally pleasing to me as drips of purple. Now, Brown, on the other hand, has a rather discordant sound, much like a Dom. 7th, even if the actual interval is not such...
- Re:Why do you care? (Score:5, Interesting)
  
  by thegrassyknowl ( 762218 ) writes: on Wednesday October 12, 2005 @08:25PM (#13778087)
  
  then don't wank around optimizing
  Dude, best use of the word wank. Ever!
  for single cycles on a machine that probably bleeds off a million cycles every time you raise a window
  Computers have become more powerful and programmers have become more lazy. It's not strictly true because instead of focussing a lot of time writing efficient code programmers are now focussing a lot of time writing a lot of code to fill bigger machines. That million cycles is wasted doing crap and probably half of doesn't need to be done anyway...
  I can still remember the days when machines ran in the sub 10MHz range (yes, 10MHz is 400x slower than today's 4GHz). Software was generally responsive, functional and minimal. Adding a zillion features and eye candy was not considered necessary. Programs were easy to use and intuitive, and did I mention functional and minimal? In the days where "nobody will ever need more than 640k" software was designed and optimised to be small and chew up few cycles.
  Now, with RAM and Gigahertz available for next to nix software has just bloated out. It's nice to see a programmer thinking about efficiency/size even if it is purely academic. We should be encouraging that; I know I'd like my applications to work faster and carry less crap than they do currently.
  
  Parent Share
  twitter facebook
  - Re:Why do you care? (Score:2)
    
    by AuMatar ( 183847 ) writes:
    
    You are officially my hero now.
    
    I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away. Well not if you waste it on bloatware. And at any rate, it doesn't improve the hardware I own. Efficiency will cease to matter as soon as they start buying my hardware and electricity.
    - Re:Why do you care? (Score:2)
      
      by thegrassyknowl ( 762218 ) writes:
      
      I'd like to point out another related issue. Inefficiency adherents such as Java lovers always talk about how hardware improvements make the inefficiency go away.
      Not a dig at the OP rather than an observation:
      Faster hardware will not make the inefficiency go away - it's still the same amount inefficient. If something wastes 30% of its time wanking on, it doesn't matter how fast you make it go, it's still wasting 30% of its time wanking on.
      Faster hardware just makes minor inefficiencies less noticibl
      - Re:Why do you care? (Score:3, Insightful)
        
        by psavo ( 162634 ) writes:
        
        Faster hardware just makes minor inefficiencies less noticible, so programmers add more minor inefficiencies and apply the same "but faster hardware will make it ok" attitude instead of fixing the real problem!
        None of the discussion I've seen so far has touched the real problem, not the wanking on the micro-optimisation (these compiler optimizations are all O(n) in gain) but that so few have any sort of a clue on what algorithms to use and when.
        I'm not saying that it's bad to optimize those sort of thing
    - - Re:Why do you care? (Score:3)
        
        by elfkicker ( 162256 ) writes:
        
        If only the theories were right. [azcentral.com] You're right and all, but please stop speaking like that.
  - Re:Why do you care? (Score:5, Insightful)
    
    by NonSequor ( 230139 ) writes: on Thursday October 13, 2005 @03:10AM (#13779879) Journal
    
    You don't understand the problem. A chemical reaction is only as fast as its slowest step. Catalyzing the other steps will not yield an improvement in reaction rate.
    
    For computer programs, you won't gain anything worthwhile by optimizing code that the computer only spends 1% of its time executing. That's not to say you should do a sloppy job, but you should focus on what matters. Microoptimization techniques (those techniques that involve choices of instructions and their orders rather than changing the algorithm that is used) typically yield very small gains. Microoptimization can yield substantial benefits when used properly in heavily used sections of code, but the time involved in trying to microoptimize everything could be better used to work on macrooptimizations or organizing the code to make it more amenable to later modification.
    
    There's no sense in trying to make your program 0.3% faster when you could be finding a way to make it 20% faster instead.
    
    Parent Share
    twitter facebook
  - - Re:Why do you care? (Score:4, Interesting)
      
      by thegrassyknowl ( 762218 ) writes: on Thursday October 13, 2005 @01:22AM (#13779510)
      
      You forgot to mention that keeping an 80x25 column text display updated only required moving around a maximum of 4000 bytes at a time (80x25 = 2000 bytes for the monochrome text buffer + another 80x25 colour buffer for 16 background and 16 foreground colours per text block), and that a sub 10MHz CPU would certainly struggle to animate that at 30 frames per second or higher.
      I grew up in the days if Amiga - and they certainly didn't have an 80x25 text console... so your analogy is fundamentally flawed. All of my favourite Amiga software was lightweight, efficient and responsive (except for the rey tracing engines I used, but that did _actually_ have to do some serious calculation). In fact, my Amiga ticked along on a 800x600 screen quite nicely. Your analogy is also flawed becase not all of the screen is updated at any one time; only the parts that have changed. Oh, and 4000 bytes x 30 FPS is 120kb;
      My Commodore 64 has a 700kHz processor in it and it can certainly animate the full screen at 25 frames per second albeit at a slightly reduced resolution. My Atari 2600 has a CPU of about the same speed and it was capable of keeping up with 25 frames per second (again, lower resolution) and still running the game engines just fine. Your argument doesn't hold water pal!
      My 1920x1200 colour display requires 55296000 bytes (1920x1200*24), which is 13,824 times as much as that 80x25 text display. Now despite my 2GHz CPU only being 200 times faster than the hypothetical 10Mhz CPU, it doesn't struggle at all - not only can it animate the whole screen at 60FPS or more, but it can also calculate positions for thousands of objects at the same time.
      What exactly do you do on your 1900x1200 colour display? First, 1900x1200x24 bits is actually 1900x1200x3 bytes (6,840,000 bytes). That is a far cry from your piss-ant 55,296,000 bytes (55MB).
      Given your eagerness to quote numbers that are practically meaningless to the point and blatantly inaccurate, and your "calculation of positions of thousands of objects" I suggest you are playing games on Windows.
      Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 in 16 glorious colours. Now, by your argument, 640x480x2 (your 2 image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about but I really dont' care about those numbers because most of the work is done in the GPU now; not the main CPU.
      There are certainly some areas in which the software doesn't seem to have sped up in proportion to the processor speed - but my guess is thats mostly because they ceased being CPU bound years ago, not because the CPU is flat out wasting cycles trying to do the job. That doesn't mean it's not the programmer's fault - but if it is, it's because they decided to block the interface while they waited for a DNS query (or similar bad design decision), not because they used a pointer offset instead of an array dereference, or vice versa.
      Software bloat is because programmers can get lazy when the CPU gets fast (the old "oh there'll be better CPUs by the time we release it" excuse). Looking at the power consumption figures for a modern Pentium4 CPU and figuring out that it wastes billions and billions of cycles a day doing work that it should never have needed to do is scary. If you average it out over all the PCs running in the world the amount of energy turned into heat because of sloppy programming alone would be enormous.
      What about all the wasted hours waiting for the computer to do something because of some sloppy programmer being willing to waste a few million CPU cycles here and there? It doesn't seem like much at the micro level, but think about it across all the processes that run on your machine of a day and the few
      Read the rest of this comment...
      
      Parent Share
      twitter facebook
      - 80x25 (Score:3, Interesting)
        
        by dolmen.fr ( 583400 ) writes:
        
        Again, your numbers are flawed, because my old 80x25 text mode display is still drawn from individual pixels. Those pixels still have to be updated individually by the CPU (in the 10MHz days it was often the main CPU that drew every little dot). 80x25 is drawn on a 640x480 in 16 glorious colours. Now, by your argument, 640x480x2 (your 2 image argument from above) = 614,400 bytes. Therefore, your 1900x1200 screen only requires 12x as many bytes to move about but I really dont' care about those numbers becau
- Re:Why do you care? (Score:2)
  
  by Reality Master 101 ( 179095 ) writes:
  
  *sigh* And some people wonder why in the age of 3GHz processors, they're still only as responsive as 33Mhz processors.
- Re:Why do you care? (Score:5, Insightful)
  
  by JanneM ( 7445 ) writes: on Wednesday October 12, 2005 @10:37PM (#13778761) Homepage
  
  For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
  
  When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for; or how exhaustively you can search for features during one frame; or what resolution image you are able to work on.
  
  If your code does something nice and graphical where 99% of the time is spent waiting for the user, sure, you're absolutely right. And if your system is doing something inherently bounded - it works until it's done, then it stops and waits until it's time again - then all you need is to make it fast enough and no faster. But there are real-world systems that today, and for the foreseeable future, are bounded by the available processing power and that can always benefit from any improvement in execution time.
  
  Parent Share
  twitter facebook
  - Re:Why do you care? (Score:2)
    
    by Scarblac ( 122480 ) writes:
    
    For any real programming task, the question has to be: why do you care baout that? Is it, specifically, a bottleneck in your code as detected with profiling tools?
    
    When the programming task is something like real-time image processing (computer vision), this kind of thing can make a serious difference. If 90% of your time is spent running these kinds of loops over and over again, an improvement in time will make a real difference on what combination of methods you have time for;
    Hence his question. In
    - - Re: Why do you care? (Score:2)
        
        by Black Parrot ( 19622 ) writes:
        
        > But as other people have pointed out, the answer is not trivially one or the other. It depends on the architecture you run and the compiler you use. And if you don't actually have access to the architectures and compilers where your code may be running, it makes sense to ask if there is a general rule of thumb to follow if you don't know.
        
        Here are the big-picture rules of thumb:
        
        don't use an exponential time algorithm when you could use a polynomial time algorithm (etc)
        make the common case fast
        It only p
- Photoshop plug-ins (Score:4, Informative)
  
  by mwvdlee ( 775178 ) writes: on Thursday October 13, 2005 @10:15AM (#13781406) Homepage
  
  I've written quite a lot of code for Photoshop plug-ins.
  
  Since this type of code typically iterates over a few hunderd million pixels you'd think that changing such details as array vs. pointer or some other common optimalization technique would have an impact.
  
  It does; it typically shaves about a few tenths of a second off of a 5 minute calculation.
  
  Then again, spending that same amount of time altering the algorithm will usually increase performance in a noticable way.
  
  Nowadays I don't bother optimizing code (usually the compiler does a better job at it anyway) but rather optimize the algorithms. Instead of opening the topic and waiting for a definitive answer on your quest for ultimate performance, you could probably have rewritten the algorithm and gained much more performance you'd ever get this way.
  
  Parent Share
  twitter facebook
Hundreds of possibilities (Score:2, Insightful)

by topham ( 32406 ) writes:

This question sounds to me like discussions I had ages ago with other programmers, and it was always 1 programmer trying to justify his method of coding a routine over some other, equally valid method.

Pointer arithmetic and array's in C really have the same issues. You can access beyond the array, and you can direct a pointer beyond the correct memory space. If you want to discuss good programming practices, C isn't it.

I never really saw the point in arguing over array and pointers in C. When I programmed i
- But not everything is a "modern PC". (Score:2)
  
  by Roadkills-R-Us ( 122219 ) writes:
  
  There are still tons of smaller, slower, "antiquated" processors embedded in all sorts of things, for which code gets written. The Z80 is still alive and well, believe it or not, along with lots of other, old standards (and newer, but still small and slow, CPUS).
  - Re:But not everything is a "modern PC". (Score:2)
    
    by topham ( 32406 ) writes:
    
    very true.
    
    but reading the submission it really looked to me like someone who simply wanted to justify their opinion on something. he wanted a generic answer to a question which requires a specific answer.
    
    if someone here said the pointer method was always faster he would run back to his friends and declare he was right all a long.
    
    the fact is the answer requires knowledge of all the target platforms as well as all of the compilers involved.
    
    pointer method is generally faster and more efficient; but it makes fo
WOW! (Score:2, Insightful)

by Anonymous Coward writes:

Now THIS is the kind of discussion that should be going on at Slashdot!
is this really your bottleneck? (Score:2, Insightful)

by eyal ( 774028 ) writes:

When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?
One can always check the resulting assembly code, if one is so concerned about this.
Though I'm pretty sure this isn't the performance bottleneck in your code (just remember - profile, profile, profile)
The eighties called... (Score:3, Funny)

by Anonymous Coward writes: on Wednesday October 12, 2005 @07:09PM (#13777553)

...and they want their arrays, pointers and C back.

Share
twitter facebook
- Unfortunately, the 2000s answered... (Score:2)
  
  by Anonymous Brave Guy ( 457657 ) writes:
  
  ...and said they can't have them because all these new-fangled alternatives suck when you've got serious work to do.
In my experience ... (Score:2, Interesting)

by Keeper ( 56691 ) writes:

In my experience, which is somewhat dated (by about 5 years), what you state is generally true for simple loops.

However, for more complicated loops the compiler fails to opimize the loop very well. My 'experience' was in writing an alogirthm which took a raw camera sensor ccd data and ordered it into a proper bayer pattern. When moving from indexer based access to pointer based access I saw a significant increase in performance.

Even basing one pointer off of another was slower than maintaining separate po
It optimizes out (Score:5, Interesting)

by klossner ( 733867 ) writes: on Wednesday October 12, 2005 @07:12PM (#13777571)

An optimizing compiler, such as gcc -O, will rearrange the array code into the pointer code -- it doesn't require a base-index address mechanism. This is called strength reduction [wikipedia.org].
Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.

Share
twitter facebook
- Re:It optimizes out (Score:2)
  
  by SilverspurG ( 844751 ) * writes:
  
  Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree.
  
  Kudos. :) My hat's off to you.
- Re:It optimizes out (Score:5, Funny)
  
  by LSD-OBS ( 183415 ) writes: on Wednesday October 12, 2005 @08:23PM (#13778077)
  
  Back in the day, we all learned about this because a compiler construction course was required for a comp sci degree
  
  Hah, yeah. Last week in a public toilet in London, someone drew an arrow pointing to the toilet roll, and it said "Computer degrees - please take one".
  
  Guess that about sums it up ;-)
  
  Parent Share
  twitter facebook
- Re:It optimizes out (Score:2)
  
  by Detritus ( 11846 ) writes:
  
  In the "real world", not all compilers are that intelligent.
  Yes, it does make a substantial difference in speed with some systems/compilers.
- Re:It optimizes out (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Exactly, strength reduction changes the indexing operation to straight pointer arithmetic. this is done at least when i looked a few years ago in gcc/g++, in the later phases of the compilation, so that that the compiler is rearranging the RTL to eliminate the indexing variable. you can verify strength reduction by just setting the optimisation in gcc and looking at the assembler output.
  
  The point is now a bit moot since for many loops you do not want either indexing or pointer arithmetic, you want SIMD ins
- Re:It optimizes out (Score:3, Interesting)
  
  by metamatic ( 202216 ) writes:
  
  That was my first thought too. "They're equivalent, so why is anyone even asking? Any optimizing compiler will handle it."
  
  Then I remembered that this is Slashdot, where the groupthink is that a CS degree is useless and doesn't teach anything you need to know in the real world.
I echo the above statements (Score:5, Insightful)

by 21chrisp ( 757902 ) writes: on Wednesday October 12, 2005 @07:13PM (#13777573)

These types of opimizations are virtually pointless on modern machines. The increased readability and lower likelihood of programming errors on the array option far outway any speed increase for the pointer option. Plus, as you noticed, the resulting assembly is basically the same. Most likely, both will run at virtually the same speed with modern compilers. Not only would the speed difference be unnoticeable, it would be utterly inconsequential.

IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.

Share
twitter facebook
- Re:I echo the above statements (Score:5, Interesting)
  
  by NullProg ( 70833 ) writes: on Wednesday October 12, 2005 @09:11PM (#13778357) Homepage Journal
  
  These types of opimizations are virtually pointless on modern machines.
  I call bullshit. Optimizations are important regardless of the language or CPU.
  
  My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.
  
  It took over six hours on the dual processor system with the native .Net and Java XML parsers. And yes, the original programmers tried several different methods/libraries to tweak the code (Sax, Xerces, whatever). They got it down to a best of four and a half hours.
  
  My C program parsed it in a hour and half. And yes, I used pointers. Why? Because its more efficient.
  
  Time is money, especially when your trying to push down 20,000 price changes from the Mainframe to 2000+ POS units during the off hours. The solution? We put my C routine into a shared library callable via C# or Java. Bonus, the 'C' code gets it done under an hour on the dual CPU machine. And yes, I tested the inputs for overflows, security problems whatever, before we went into production. Theres a big difference between a programmer who knows a language vs a programmer who understands it.
  
  IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
  Thats your opinion and I'm glad you shared it. You do know that your C#/Java/VB/Python etc. VM calls all wind up as pointer arithmetic to the CPU? Don't you? I wouldn't want to work for you though. Your competitors will write a faster program that uses less memory and you will loose the contract/job.
  
  No flame intended,
  Enjoy.
  
  Parent Share
  twitter facebook
  - Re:I echo the above statements (Score:3, Interesting)
    
    by Arandir ( 19206 ) writes:
    
    There are optimizations and then there are optimizations. If you run this particular loop once, or twice, it won't matter. Run it ten times or even a hundred, it won't matter. In these cases it would be a pointless optimization. Profile your software before you optimize. That way you can optimize where you need it and not waste time where you don't.
    
    I remember one code review where someone told me to use prefix instead of postfix notation, for optimization reasons. Yet it occured in an initialization routine
  - Re:I echo the above statements (Score:3, Interesting)
    
    by Neil Blender ( 555885 ) writes:
    
    My Pentium III test machine with 256Meg of Ram blew away a dual processor Intel system with 1Gig of Ram while parsing a 30Meg XML import/export file.
    
    Heh, a little offtopic but - This is why I hate XML. It's so bloated. You take 1 to 6 hours parsing a 30 megabyte XML file in C? I was just tasked with parsing out some select data from a 37 gigabyte XML file (870 million lines). I tried all sorts of optimizations and parsers upon finding that it might take days to parse. My solution - 50 lines of perl usin
    - Re:I echo the above statements (Score:3, Interesting)
      
      by eric76 ( 679787 ) writes:
      
      One of the more interesting things I've seen is to see how different software developers write a program for the following problem:
      
      Jack bought a bag of 100 pieces of candy at the store. It has 90 cherry candies and 10 lemon candies. He prefers the cherry candies over the lemon candies.
      Every day Jack randomly picks a candy from the bag. If the candy is a cherry candy, he eats it. If the candy is a lemon candy, he puts it back and randomly draws again. He eats the second randomly chosen candy no matte
      - Re:I echo the above statements (Score:3, Interesting)
        
        by slamb ( 119285 ) * writes:
        
        Interesting. I'm at work now, so I shouldn't take the time to actually solve it until I get home. But it sounds like a problem you could solve by:
        
        Statistics. It seems like if you are One with the Statistics, you could do it with pencil, paper, and a calculator just powerful enough to do factorials. (Unfortunately, I am not.)
        
        Dynamic programming [wikipedia.org] or recursion with memoization. (There are overlapping subproblems.)
        Recursion. This would be slow, and I bet it's the way most people use.
- Thinking like this (Score:3, Insightful)
  
  by phorm ( 591458 ) writes:
  
  Is why we have big, ugly, bloated programs that require overpowered CPUs. First of all, it depends on what the application is being coded for. Perhaps it's actually intended for slower machines as well as faster... not everything is made to run on a P4-4Ghz machine you know.
  
  Secondly, these operations can add up. If, for example, this scenario is used throughout the code and called several times per second, on an operation perhaps requiring ready output, the output might even be visible delayed on a faster
- Re:I echo the above statements (Score:3, Informative)
  
  by Paul Jakma ( 2677 ) writes:
  
  lower likelihood of programming errors on the array option. .... IMHO there is no place for pointer arithmetic in modern software. If someone working for me wrote something like the second option, I would ask them to rewrite it.
  
  You realise that pointer arithmetic and array indices are the same thing, don't you? Ie given:
  
  int b[100];
  int i;
  
  The following are equivalent:
  
  b[i++]
  
  *(b + i++)
  
  Arrays *are* /nearly/ equivalent to pointers, indeed an array variable without a subscript degrades to a pointer. Further, note
  - Re:I echo the above statements (Score:3, Funny)
    
    by sleepingsquirrel ( 587025 ) writes:
    
    The following are equivalent:
    
    b[i++] *(b + i++)
    
    Yeah, but the fun doesn't really begin until you start writing code like...
    
    i++[b]
Different methods (Score:2)

by jd ( 1658 ) writes:

Different compilers will optimize abstractions in different ways, so you are not guaranteed to get the same code when using arrays. Pointer operations should(!) compile to essentially the same code on ANY compiler, as the code is already about as reduced as it can get.
In consequence, if you want a predictable program on ANY compiler, pointer operations are probably a better bet than arrays, regardless of how well compilers handle those arrays.
If you want something of uniform reliability, rather than uniform
- Re:Different methods (Score:2)
  
  by Xyrus ( 755017 ) writes:
  
  How the code gets treated depends on how "clear" it is to the compiler. The loop in the example is simple and any decent optimizing compiler worth it's 1's and 0's should optimize it.
  
  But if the code is unclear, let's say multiple nested loops with a good number of branches, local variables, etc. then an optimizing compiler will probably say WTF and just leave it alone.
  
  As has been said many times, profile your code. If you find the bottle-neck, fix it. If it comes down to pointer vs. array, get rid of th arr
Strength Reduction (Score:4, Informative)

by The boojum ( 70419 ) writes: on Wednesday October 12, 2005 @07:18PM (#13777607)

My argument is that one cannot assume, in a multi-architecture environment, that such optimizations will always be available. Semantically, the expression array[index] must always be expanded to *(array + index) when the index is variable. In other words, the expression cannot be reduced further, because the value of the index is unknown at run time.

Yes, semantically array[index] has to have the same effect as *(array+index). But the compiler is free to generate conceptually equivalent code in any way that it pleases. Any decent C/C++ compiler optimizer that can perform strength reduction ought to be able to see how the index changes the memory location and turn it into simple pointer incrementation accordingly. And strength reduction is a well known optimization that's been around for ages -- if memory serves, even the old Red Dragon book talks about how it works in this context. If your compiler can't handle this you need to find a better compiler.

Share
twitter facebook
Do This instead (Score:3, Interesting)

by Leimy ( 6717 ) writes: on Wednesday October 12, 2005 @07:19PM (#13777616)

void reversestring_array(char *str) {

int head, tail; char temp; if (!str) return; tail = strlen(str) - 1; for (head = 0; head < tail; ++head, --tail) {

temp = tail[str]; tail[str] = head[str]; head[str] = temp;

}

}

That'll teach ya.

Share
twitter facebook
- Re:Do This instead (Score:2)
  
  by forkazoo ( 138186 ) writes:
  
  Bah, the original requirement was to do it with as little memory allocation as possible. You still use a temp variable. Let bitwise operations be your friend.
  - Re:Do This instead (Score:2)
    
    by Hast ( 24833 ) writes:
    
    As was discussed in the original thread one point that this excercise miss is that a compiler will naturally put the temporary variables into registers. So the entire XOR thing is cute but will produce slower and less understandable code.
    
    Naturally neither version will actually use any added memory (as in RAM) but I doubt an optimising compiler will be able to perform strength reduction on the XOR operations.
    
    Don't try to out-think your compiler. The compiler is your friend.
there is a difference... (Score:2)

by simcop2387 ( 703011 ) writes:

The array function:
mov bl,byte ptr [esi+edx]
mov al,byte ptr [ecx+edx]
mov byte ptr [ecx+edx],bl
mov byte ptr [esi+edx],al

The pointer function:
mov bl,byte ptr [ecx]
mov dl,byte ptr [eax]
mov byte ptr [eax],bl
mov byte ptr [ecx],dl

if you look closely at these two portions of the assembl
- Re:there is a difference... (Score:2)
  
  by SillyNickName4me ( 760022 ) writes:
  
  Got to think of the 6502 and its indirect indexed addressing modes..
  
  Basicly, you'd store a pointer somewhere in memory page 0, and then use the x or y register as an index into whatever it pointed at (ie LDA ($a0),x would take the pointer stored in address a0 and a1, use the x register as an index, and read the contents of the resulting address into the acumulator) . Here it would be faster to manipulate the index and not the pointer.
  
  Pointers were 16 bit (64k address space) and x and y were 8 bit registers.
- Re:there is a difference... (Score:4, Informative)
  
  by Waffle Iron ( 339739 ) writes: on Wednesday October 12, 2005 @08:44PM (#13778211)
  
  what you in fact have is in the array 4 additional additions (between registers),
  Actually, most x86s have a dedicated address generation units which handle those index additions in parallel on separate logic from the main ALU. So both cases would actually run at the same speed on a modern x86.
  
  Parent Share
  twitter facebook
- - Re:there is a difference... (Score:2)
    
    by simcop2387 ( 703011 ) writes:
    
    well personally i'd suggest this
    <blockquote>*tail^=*str; *str^=*tail; *tail^=*str;</blockquote>
    for the swap, as this will be faster on many more architecures. and its more memory effecient.
There cannot be any difference. (Score:2, Informative)

by Anonymous Coward writes:

The whole point is dumb.
Every C programmer should know the answer to the following question:

What is 6["abcdefghijkl"]?

Answer: 'g'.

How is this determined?
By definition x[y]=*(x+y)=y[x].

Don't believe me, check the standard.

( Yeah this is a degenerate case, like Duff's device. Still a compiler has to support it. )
In summary... (Score:2)

by SmurfButcher Bob ( 313810 ) writes:

I've been reading some really good arguments in these posts about how it'll work. And the bulk of them end up saying, "any good compiler will/should"...

In other words, "No, you cannot."

Your best bet is to do (almost) what you did - compile a test case per platform, check the code - and then the part that you neglected, you need to count the clock ticks per instruction. To reinforce what one poster said - there can be a large difference between [ecx] and [ecx+edx], conceptually speaking. If it truly matter
- Re:In summary... (Score:3, Funny)
  
  by Anonymous Brave Guy ( 457657 ) writes:
  
  If it truly matters then you count the ticks for each platform, and decide if it is reasonable. 3 ticks versus 2... that's 33%. 2 ticks versus 1... that's 50%.
  
  Not necessarily. You're ignoring the fact that modern processors use pipelining architectures, branch prediction, extensive caching often at several levels, and a whole host of other things that mean the total time required is not the sum of the individual times for each instruction.
  One of these days, someone will invent a program that can take
  - Re:In summary... (Score:2)
    
    by SmurfButcher Bob ( 313810 ) writes:
    
    Yes, necessarily.
    
    "When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?" is the quesiton.
    
    "You're ignoring the fact that modern processors use pipelining architectures, " you propose.
    
    As I said, the bulk of the posts end up saying, "any good compiler will/should"...
    
    In other words, "No, you cannot."
    
    But even in the case with chip-level optimizations, your point is still moot; for those processors, then it doesn't make any difference for you - call them
differences not having to do with optimization (Score:2)

by bersl2 ( 689221 ) writes:

int array[8]; int *int_ptr = malloc(8 * sizeof(int)); char string[] = "Hello world!"; char *char_ptr = "Hello world!"; sizeof(array) == 8 * sizeof(int); sizeof(int_ptr) == sizeof(int *);
You can get int_ptr to point to another location in memory (such as doing ++int_ptr); the similar statement (++array) is illegal.

I think that altering individual characters of char_ptr is undefined (as it would result in changing the value of the literal string), while (as above) string cannot be pointed to a new literal string.
- - Re:differences not having to do with optimization (Score:2)
    
    by Anonymous Brave Guy ( 457657 ) writes:
    
    You can alter the chars of char_ptr all you want, it's just a pointer to some bytes (that happen to be ascii chars) on the stack.
    
    Nope, sorry, he was right and you're wrong. In the code given, char_ptr points to a string literal, which is automatically constant data and can be stored somewhere completely different to either the stack or the heap. Attempting to change that data results in undefined behaviour.
    (For completeness: in C++, you can only even point a (non-constant) char * to a string literal b
your mileage will vary, but... (Score:4, Informative)

by blackcoot ( 124938 ) writes: on Wednesday October 12, 2005 @08:19PM (#13778055)

... my experience has been that this matters more in the multidimensional array case than in the single dimensional array case (for those who are curious: my goal is to write algorithms which do non-trivial amounts of processing on VGA or larger video at full frame rates [>= 15Hz], any time i can make array operations faster my entire program benefits significantly). when dealing with two dimensional arrays, you can either do the addressing yourself (location [i,j] in a r x c matrix maps to [i*c+j] in a flat array). if you are clever about how you explain your indexing to the compiler, it should realize that you're passing through consecutive addresses in memory and generate code accordingly. if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.

have you tried this with intel's c/c++ compiler or other compilers? i'd be curious to see if what you're seeing is a result of how gcc is limited in the number of optimizations it can apply directly to the parse tree because it can't assume (at that stage) a particular target machine.

Share
twitter facebook
- Re:your mileage will vary, but... (Score:2)
  
  by PsychicX ( 866028 ) writes:
  
  if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.
  
  Not true. Not true at all. Multi-D arrays are merely syntactic sugar for single dimension. It doesn't involve two pointer dereferences unless it's a jagged array, aka an array of pointers. A built in C multi-D array
  - Re:your mileage will vary, but... (Score:2)
    
    by blackcoot ( 124938 ) writes:
    
    it depends on how you allocate your arrays. if everything is statically allocated with sizes known at compile time, you are correct. the dynamic case, however, is a very different story.
- Re:your mileage will vary, but... (Score:2)
  
  by Mr2001 ( 90979 ) writes:
  
  if, on the other hand, you're doing something like A[i][j], the compiler has to generate two deref ops plus pay the cost of whatever cache misses result from using the two levels of indirection --- in this case, working with pointer / index arithmetic relative to the base address is a big win.
  
  That's true for jagged arrays, but IIRC, statically allocated multidimensional arrays in C are not jagged. A[i][j] is equivalent to A[i*c+j] (or maybe A[i+j*c], whatever).
Use Java instead .... (Score:2, Funny)

by icepick72 ( 834363 ) writes:

... and then you will no longer have to worry that it might be slow.
GCC experimental results (Score:5, Interesting)

by RML ( 135014 ) writes: on Wednesday October 12, 2005 @08:25PM (#13778091)

Just for fun, I tried the sample code on gcc (GCC) 4.1.0 20050723 (experimental), with -O3 -march=pentium-m. The loop from the array version: L13: movzbl -1(%ebx), %edx movl %esi, %ecx decl %edi movl 8(%ebp), %eax movb %dl, -13(%ebp) movzbl -1(%esi,%eax), %edx movb %dl, -1(%ebx) decl %ebx movzbl -13(%ebp), %edx movb %dl, -1(%esi,%eax) incl %esi cmpl %ecx, %edi jg L13 The loop from the pointer version: L5: movzbl 1(%esi), %edx movl %esi, %ecx movzbl (%ebx), %eax movb %al, 1(%esi) decl %esi movb %dl, (%ebx) incl %ebx cmpl %ecx, %ebx jb L5 Time to execute the array version 100,000 times on a 10,000 character string: 0m4.515s Time to execute the pointer version 100,000 times on a 10,000 character string: 0m3.936s So the pointer version actually generates somewhat faster code with the compiler I used on this example, which surprises me. But there's no substitute for actually testing.

Share
twitter facebook
- Re:GCC experimental results (Score:3, Interesting)
  
  by Anonymous Brave Guy ( 457657 ) writes:
  
  GCC is designed to be portable, not fast, so the code is generates is often pretty bad compared to specialised, platform-specific compilers. Obviously your test is relevant if GCC is the compiler you'll be using, but for serious performance work it's pretty much irrelevant what GCC generates because no-one uses it when native alternatives are available anyway. In fact, your example code here is a great demonstration of this!
  - Re:GCC experimental results (Score:2, Insightful)
    
    by UOZaphod ( 31190 ) writes:
    
    What is used to compile the Linux Kernel?
    - Re:GCC experimental results (Score:2)
      
      by yarbo ( 626329 ) writes:
      
      gcc, icc, or tcc. The majority of people probably use gcc, but tcc and icc are definitely possibilities.
  - Re:GCC experimental results (Score:2)
    
    by dtfinch ( 661405 ) * writes:
    
    No one who wants performance uses gcc?
  - Re:GCC experimental results (Score:2, Interesting)
    
    by tajribah ( 523654 ) writes:
    
    Actually, that's not true in many cases -- GCC 3.x generates very good code, but the 4.x versions still haven't caught up with the 3.x line.
  - Re:GCC experimental results (Score:3, Insightful)
    
    by Deorus ( 811828 ) writes:
    
    >> no-one uses it when native alternatives are available anyway
    
    Aren't hundreds of Linux distributions out there enough to prove that assumtion wrong? Gcc is used regardless of its speed because it's free.
- Re:GCC experimental results (Score:2)
  
  by Nelson ( 1275 ) writes:
  
  Use an actual release of the compiler and report this regression to the gcc mailing list. It's an error.
- Re:GCC experimental results (Score:3, Informative)
  
  by Piquan ( 49943 ) writes:
  
  Unfortunately, the article didn't say what compiler he was using. But since we're giving data points:
  
  gcc 3.4.2 3.4.2 [FreeBSD] 20040728, x86, -O3 -march=pentium-m. Generated essentially the same code as the article's.
  Array version:
  .L12: movzbl (%esi,%ecx), %edx movzbl (%esi,%ebx), %eax movb %al, (%esi,%ecx) decl %ecx movb %dl, (%esi,%ebx) incl %ebx .L10: cmpl %ecx, %ebx jl .L12
  
  Pointer code:
  .L23: movzbl (%ebx), %edx movzbl (%ecx), %eax movb %al, (%ebx) decl
And C++/STL? (Score:5, Informative)

by XenonOfArcticus ( 53312 ) writes: on Wednesday October 12, 2005 @08:27PM (#13778103) Homepage

And what of STL container classes under C++?

Seriously though, there is no generalized answer. Good compilers will do what you want. Bad compilers (and there are more than you realize out there) will make lousy code.

If your target involves an environment where you might be using a more primitive compiler, or you can't predict the environment and compiler, it might be an issue. This is why code like the PNG and JPEG libs go for tight/cryptic. As well, the performance of the runtime platform (CPU, memory, bus) have bearing. In some cases, though one piece of assembly might look less efficient than the other, the sheer brute force of CPU parallelism, out of order execution and other black juju might render it meaningless.

Finally, you have to consider the cost/benefit of the situtation. Making cryptic fast code is worthwhile if you're writing some wicked FFT code or image processor main loop, where it'll run a few quadrillion times. Other places in the same codebase though, it's probably worthwhile to trade absolute performance for a bit better code readability and maintainability.

Remember, profile before optimizing. Only optimize things that will really make a significant performance difference. Rewriting the UI display loop to use pointers instead of lists is probably pointless. Heh. Pointless.

I'm a big fan of C++ STL containers now. If I _know_ a block of code is going to be a critical bottleneck, I'll use something else from the start. I've known people who coded UIs in assembly, for no good reason, and others who wrote image processing code in interpreted RAD scripting environments. I've written and optimized code (C++/C and assembly) on systems all the way back to the 6502 (yay! two 8-bit index registers!) and it's not hard, as long as you proceed sensibly and logically.

That being said, the Microsoft VC++6 compiler (still in common use today) has a terrible code generator. It fails to perform simple loop invariant hoisting operations that my old SAS/C++ compiler (Amiga, yah, I'm one of _them_...) did in 1990. VC++7/2003 and Whidbey/2005 are showing signs of being MUCH more caught-up, and the Intel and Codeplay compilers (despite Intel's AMD-phobia) are much better too.

When performance really counts, a whole new set of tools and processes come into play.

Share
twitter facebook
- Re:And C++/STL? (Score:2, Funny)
  
  by Arandir ( 19206 ) writes:
  
  I'm currently writing a piece of software to compete with a commercial proprietary offering. I'm using "bloated" STL and "bloated" C++ to manipulate "bloated" XML. Frankly, I'm shocked at its poor performance, and I might have to do some optimizations. It takes about a half of a second to load and process 200k worth of XML.
  
  ON THE OTHER HAND, the commercial proprietary alternative I'm competing against loads the equivalent data. But that data consumes 25Megs, and takes five seconds to load! You would think i
- Re:And C++/STL? (Score:3, Informative)
  
  by WARM3CH ( 662028 ) writes:
  
  OK I tried it with Visual Studio 2003 and also tried STL. Of course with STL you only need one line of code to reverse a character string:
  
  reverse(str, str+strlen(str));
  
  Now, the interesting part is what the optimizing compiler outputs for each of the three vairants (I only include the inner loop):
  With pointers we have: $L13583: mov bl, BYTE PTR [ecx] mov dl, BYTE PTR [eax] mov BYTE PTR [eax], bl mov BYTE PTR [ecx], dl inc ecx dec eax cmp ecx, eax jb SHORT $L13583 With array we have: $L1252
use array (Score:2)

by dave1g ( 680091 ) writes:

If you use arrays the compiler these days will ocnvert to pointer arithmetic. However it also now knows that it is an array and can make assumptions not available to it before allowing more code order optimizations in loops.

These days if its something stupid like using one syntax over another, that optimization has been built into the compiler. The only real speed enhancements are going to be algorithmic. You know the old thigns that make sense, take stuff out of loops that dont need to be inside, compiler
TFA is a Troll (Score:2)

by mugnyte ( 203225 ) writes:

If you answer this question, you fuel a discussion that should be constrained to a very very small domain of C development. Otherwise, you're going to see needless flag waving about coding standards in domains where it doesn't need to exist. Don't feed the dirty C programmer (we like other info, really, we do).

and frankly, the TFA depends on architecture, where most modern compilers make both snippets equal.
As Donald Knuth said... (Score:4, Insightful)

by jschmerge ( 228731 ) writes: on Wednesday October 12, 2005 @09:28PM (#13778441)

Premature optimization is the root of all evil.

I personally let the compiler writers worry about this type of thing. I'd rather have my code be more readable than fast. That being said, there are many times that I switch back and forth between pointer arithmetic and array indexing within the same program. I'll primarily use the pointer arithmetic for things like simple string processing where it just leads to more compact code, and I'll use the array indexing when I have an actual bonafide array...

In any event, my point is that you should be programming in a way that is maintainable and readable; you shouldn't be worried about shaving tens of clock cycle off of such simple loops. In more complex loops, you probably will not be able to shave nearly as much time off, because your indexes won't always be sitting in a register and the data that the index points to will most likely not even fit into a register. In this case, I don't think anyone will argue with the assertion that this:

(ptr + index)->dataMember

is less readable than this:

ptr[index].dataMember

Share
twitter facebook
- Re:As Donald Knuth said... (Score:2)
  
  by illuminatedwax ( 537131 ) writes:
  
  I personally let the compiler writers worry about this type of thing.
  
  This is Slashdot; don't you think some of us are compiler writers?
  - - Re:As Donald Knuth said... (Score:2)
      
      by illuminatedwax ( 537131 ) writes:
      
      And goddammit, we're gonna make that as CPU efficient as we possbily can!
Pointers are for programmers silly rabbit!! (Score:2, Insightful)

by Da_Weasel ( 458921 ) writes:

Pointers are for programmers, not for speed. A particular compilers implementation of the C specification may produce better or more efficient code when using pointers, but i seriously doubt that every implementation would work the same or even across different architectures.
You shouldn't make any assumptions ... (Score:3, Insightful)

by bigsteve@dstc ( 140392 ) writes: on Thursday October 13, 2005 @12:08AM (#13779181)
I was informed that said optimizations have made it so that indexing an array with the [] operator is just as fast as using an incremented pointer. When the goal is maximum performance across multiple CPU architectures, can one always assume that this is true?
Honestly, there is no real justification for making any assumptions. It depends on how good a particular C compiler is at generating code for a particular ISA. Indeed, for some ISA's I'd expect a niave C compiler to generate FASTER code for indexing an array than for incrementing a pointer. (I'm thinking of word addressed ISAs, where a 'char*' has to be represented as a word pointer and a character offset.)
In the big picture, it probably doesn't make a great deal of difference to performance which style you use. It might make 5-10% difference in a tight loop, but probably much less across a large application. If the difference performance is significant, you will get more benefit for your effort by:
- finding better C compilers, or
- using a profiler to find the real hotspots in the code and hand-optimizing them for the hardware platforms that you really care about.
As other people have said, other issues than this are likely to have a greater bearing on the overall performance of a typical application; e.g. data structures, algorithms, database design, etc.
Share
twitter facebook
Vectorisation (Score:3, Insightful)

by Heretik ( 93983 ) writes: on Thursday October 13, 2005 @12:13AM (#13779224)

Maybe not relevant in this case since you're working with strings, but with vectorization being so important to performance on most modern architechtures, if you were dealing with floats the pointer one might actually be slower because it's much harder for the compiler to figure out if (and how) it can vectorize it.

I'm not sure about various compilers and what they do in this case, but following the progress of GCC4's vectorisation, it looks much more likely that the pointer case is passed over by the vectorizer and ends up being (way) slower than the easy to vectorise array index version.

Like I said, not sure what the actual situation in practise is, but it's worth looking into. The difference between vectorized SSE code and plain old x86 code (for example) is WAY greater than the trivial insignificant difference between the two examples you posted.

Share
twitter facebook
ASSume (Score:3, Insightful)

by larry bagina ( 561269 ) writes: on Thursday October 13, 2005 @12:16AM (#13779236) Journal

My argument is that one cannot assume
You're right, however, you're also assuming that your pointer arithmetic is faster.
Consider a 16-bit architecture with 32-bit pointers.
Using pointer arithmetic (32-bit) is slower than using an index register (16-bit) as the array index.
So stop assuming and stick with what you're comfortable with. If you prefer pointers, fine. if you prefer arrays, fine. But if you're so concerned with the speed, you'd be doing it in assembly.

Share
twitter facebook
From the Compiler Trenches (Score:5, Informative)

by Anonymous Coward writes: on Thursday October 13, 2005 @01:38AM (#13779569)
I develop highly optimizing compilers for a living, so use that to put in context what I'm going to say. Things look very different from the other side of the source file.
I'll admit that I'm always slightly bemused by these sorts of discussions. That bemusement quickly turns to disiniterest after I realize that a lot of people are burning a lot of cycles arguing about it.
Quite frankly, your example has little relevance in the real world. The optimization you are talking about is covered by strength reduction, as others have pointed out. But that's not the point of my message. This sort of piddly optimization means almost nothing when one looks at the whole application. If this piece of code is in an inner loop that takes 90% of the application time and it proves to be the bottleneck, then one can think about taking a closer look.
We have customers that come to us all the time with just such examples. They literally tell us, "You missed an opportunity to use a register here," and we know it's important because we know the customers are serious about profiling code and finding bottlenecks. So when they come to us we are happy to look at the "piddly stuff."
I've seen all kinds of different code. They basically fall into two categories: code that the compiler can do something significant with and code the compiler has no hope of automatically optimizing in any truly meaningful way. When I say "significant" and "meaningful," I explicitly mean "not Dragon Book stuff, except for scheduling (which can provide a significant performance win)" Simple optimizations like common subexpression elimination and copy propagation are more useful at enabling other optimizations than in any cycles gained in their own right. They are important, but not to make the code run significantly faster in and of themselves.
If one writes an application that does a lot of branching and pointer chasing (say, like, a traditional compiler*) then there's not much an optimizer can do with it. The aliasing difficulties alone will kill most optimizations. It's more important to write these kinds of applications in an understandable way because that is where the programmer time is most costly.
That said, judicious use of directives for compilers that support them can go a long way toward making these kinds of codes run really fast. Think of threading a tree search, for example. But the compiler is not going to have much hope of converting such a low-level piece of code without help.
An example of code that a compiler can do really, really well on are the traditional scientific applications. Here parallelism is everything, be it data-level, thread-level or at the distributed memory level (for really big machines). In these cases the loop optimizations are orders of magnitude more important speed-wise than sequential optimization (though, as I said, sequential optimization can enable some of these loop transformations). Some of the more important loop restructuring transformations are:
- Interchange
- Unroll
- Fusion
- Fission
- Unroll-and-Jam
- Invariant hoisting (both data and control)
- Vectorization
- Cache Blocking
When the compiler is done with these, one hardly recognizes the code anymore. :)
In my experience, the fundamental problem is that compilers are really hard to understand. People argue about what a compiler can and cannot do because they are enormously complex systems that require arcane knowledge of language standards and hardware architecture to really dig into. It's slightly less difficult to understand the broad strokes, for example, simple cases of vectorizable loops. It's a lot more difficult to understand how to parallelize a loop that compresses a sparse array into a dense one.
If there's one lesson that I like to convey to programmers, it's to not sweat the optimization details. Don't hand-optimize the code. I can tell you fro
Read the rest of this comment...
Share
twitter facebook
Benchmarking (Score:2, Insightful)

by Threni ( 635302 ) writes:

If you're talking about optimisation then the only thing to do is to check different strategies. You can't just assume type a is a better coding method than type b. Even if you could prove it were true for every single compiler on the market, one could come along tomorrow and do things differently.
Pointer aliasing (Score:3, Interesting)

by igomaniac ( 409731 ) writes: on Thursday October 13, 2005 @05:27AM (#13780174)

With modern compilers you should always use arrays. The reason is that for all but the most pathological cases of indexing, the compiler will do strength reduction and induction to turn the p[i*4] into *p,p+=4. The problem with using pointers is that if you have more than one of them, the compiler has no easy way of knowing if they point to the same place (this is known as pointer aliasing) so a bunch of useful optimizations have to be turned off. If you are doing a[i] = b[i] the compiler can much more easily find out that a and b are distinct memory locations than if you are doing *p = *q.

If you want to learn more about these kind of source level optimisations, look the AMD Athlon(TM) Processor x86 Code Optimization Guide is a good reference -- it includes a section on why you want to use array accesses instead of pointers, and also has a lot more up-to-date and useful advice on what compilers do to your code. It is available from AMD's website.

Share
twitter facebook
Where have all the geeks gone? (Score:4, Interesting)

by microTodd ( 240390 ) writes: on Thursday October 13, 2005 @08:11AM (#13780545) Homepage Journal

Wow. Just wow.

No one will probably read this comment because its been a day since the OP, but I'm amazed at the quantity of people who are slamming this guy for wanting to research something that's admittedly interesting.

For starters, if the submitter is a CompSci student then he definately gets my kudos. Too many CS students are just focused on "I wanna learn C# so I can go make money!" as opposed to actually LEARNING.

Secondly, what happened to just plain geekiness of research and studying things because its fun and interesting? Does everything we do have to have some specific applicable purpose? If you say yes, you are thinking like the MBAs that always get bashed around here instead of a real nerd.

Who knows? Its unlikely, but possible that thinking about this problem somehow leads to a train of thought that solves P=NP or something.

Share
twitter facebook
- Re:Hmm.. (Score:3, Informative)
  
  by Surye ( 580125 ) writes:
  
  Oops, messed up the tag at the top, and it ate the quote portion:
  
  I'd certainly be interested in hearing more discussion on the matter, accompanied by examples and references
  
  Preview! Gah.
  - Re:Hmm.. (Score:2)
    
    by Strike ( 220532 ) writes:
    
    Preview! Gah.
    
    You must be new here.
  - Re:Hmm.. (Score:2)
    
    by Alomex ( 148003 ) writes:
    
    Of course, because a feature such as "edit" is only for wimps...
    
    The unix philosophy lives on: if the system is cumbersome and user unfriendly, it's the user's fault!
    - Re:Hmm.. (Score:2)
      
      by cortana ( 588495 ) writes:
      
      I think that people making idiots out of themsevles is much more preferable to trolls editing their +5 moderated posts into Goatse ascii art.
    - Re:Hmm.. (Score:2)
      
      by Hast ( 24833 ) writes:
      
      There is no edit function on Slashdot because down that road lies edit wars and other problems. (Specifically there are issues with editing + moderation as the mod should reflect the original article and not the edit.)
      
      Sure it would be nice with a "allow edit withing 3 minutes" function which can be found on WWW boards. OTOH Slashdot generates several times the traffic as other sites with webboards.
      - Re:Hmm.. (Score:2)
        
        by Alomex ( 148003 ) writes:
        
        or how about an "editing undoes the moderation" feature, just as posting to a thread one has moderated undoes any moderation one might have done therein....
- Xnews? Bah! (Score:2)
  
  by jd ( 1658 ) writes:
  
  There is nothing to beat xvnews - particularly the icon that shows the economy collapsing whenever there is anything new to read.
- Re:Here's the answer (Score:2)
  
  by cant_get_a_good_nick ( 172131 ) writes:
  
  forgive my inference, can anyone tell me the reference for this?
- Re:Professor Cormen said... (Score:4, Informative)
  
  by RAMMS+EIN ( 578166 ) writes: on Wednesday October 12, 2005 @07:41PM (#13777779) Homepage Journal
  
  You're not completely right.
  
  First:
  
  ``int i[20] followed by int *k = i, then i[4] is the same as *(k + (4 * 4))''
  
  You're trying to get the 5th element of the array by using an offset of 4 times 4, assuming sizeof(int) == 4. First off, don't make that assumption; always write sizeof(int) when that's what you mean. Secondly, the C compiler automatically multiplies your offset by the size of the elements, so you would have to write *(k + 4) instead.
  
  Secondly, you're not getting the point (probably, you were misled by the headline, as I was). The question is not whether variables holding arrays are really holding pointers to the arrays (they are), but whether, say, iterating through an array by updating a pointer is faster than iterating by using an index variable as an offset. In other words, it's not whether a[0] is the same as *a, but whether while(*a) a++; is faster than while(a[i]) i++;.
  
  Parent Share
  twitter facebook
  - - Re:Professor Cormen said... (Score:2)
      
      by RAMMS+EIN ( 578166 ) writes:
      
      ``As for your second point, I don't see how that's at all different from the first one.''
      
      Ok, I'll elaborate.
      
      When you say while(a[i]) i++;, you do two adds each cycle; one to increment i, and one to add i to the address of a. When you say while(a) a++;, you do only one add per cycle. The OP's question is whether optimizing compilers get rid of one of the adds, or not.
      
      This is different from asking which of a[4] and *(a + 4) is faster. In that case, the loops would have looked like while(a[i]) i+=; and while(*
- Spelling and grammar and punctuation, oh my! (Score:5, Funny)
  
  by Roadkills-R-Us ( 122219 ) writes: on Wednesday October 12, 2005 @07:42PM (#13777789) Homepage
  
  I didn't see any errors in punctuation or grammar, either. I don't recall the last time I saw a post of that length which didn't confuse plural's (sic) with possessives.
  
  Parent Share
  twitter facebook
- Re:Effective cache use will be a better optimisati (Score:2)
  
  by TheLink ( 130905 ) writes:
  
  Yeah that's what I figured, use a good algorithm (e.g. N vs N^2 ) and as long as the processing loop fits in cache you'll be fine.
  
  And if you can't find/figure out a good algorithm yet, don't do really strange stuff. That way someone later can more easily figure out how to fix your code.
- Re:Effective cache use will be a better optimisati (Score:2, Funny)
  
  by functor0 ( 89014 ) writes:
  
  I have good story for this one. In my second co-op term, I was asked to improve the speed of the general 2D convolution image filter for a well-known commercial paint package. Nothing complicated, just run the given 2D kernel through the image.
  
  The previous writer of the function had carefully removed all the array accesses, only using pointers that were incremented by fixed constants as it proceeded through the code. He had also carefully maintained an array of the sums of the results as we moved from pixel
- Re:Register indexing. (Score:2)
  
  by clem.dickey ( 102292 ) writes:
  
  > I can't think of any modern architectures that don't support register indexing
  
  I don't see any register indexing in IA64. No immediate offset, either. If you want to access an address, you'd better have that address - exactly - in a register. At least you get enough registers. :-).
  
  Incidentally, the IBM S/360 was an early machine (perhaps the first) with base+index+offset addressing. I've heard that the three-way adder required was a performance hit. I know that some models had instructions which were fa

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Why do you care? (Score:4, Insightful)

Re:Why do you care? (Score:2)

Re:Why do you care? (Score:5, Funny)

Re:Why do you care? (Score:3, Funny)

Re:Why do you care? (Score:5, Interesting)

Re:Why do you care? (Score:2)

Re:Why do you care? (Score:2)

Re:Why do you care? (Score:3, Insightful)

Re:Why do you care? (Score:3)

Re:Why do you care? (Score:5, Insightful)

Re:Why do you care? (Score:4, Interesting)

80x25 (Score:3, Interesting)

Re:Why do you care? (Score:2)

Re:Why do you care? (Score:5, Insightful)

Re:Why do you care? (Score:2)

Re: Why do you care? (Score:2)

Photoshop plug-ins (Score:4, Informative)

Hundreds of possibilities (Score:2, Insightful)

But not everything is a "modern PC". (Score:2)

Re:But not everything is a "modern PC". (Score:2)

WOW! (Score:2, Insightful)

is this really your bottleneck? (Score:2, Insightful)

The eighties called... (Score:3, Funny)

Unfortunately, the 2000s answered... (Score:2)

In my experience ... (Score:2, Interesting)

It optimizes out (Score:5, Interesting)

Re:It optimizes out (Score:2)

Re:It optimizes out (Score:5, Funny)

Re:It optimizes out (Score:2)

Re:It optimizes out (Score:2, Informative)

Re:It optimizes out (Score:3, Interesting)

I echo the above statements (Score:5, Insightful)

Re:I echo the above statements (Score:5, Interesting)

Re:I echo the above statements (Score:3, Interesting)

Re:I echo the above statements (Score:3, Interesting)

Re:I echo the above statements (Score:3, Interesting)

Re:I echo the above statements (Score:3, Interesting)

Thinking like this (Score:3, Insightful)

Re:I echo the above statements (Score:3, Informative)

Re:I echo the above statements (Score:3, Funny)

Different methods (Score:2)

Re:Different methods (Score:2)

Strength Reduction (Score:4, Informative)

Do This instead (Score:3, Interesting)

Re:Do This instead (Score:2)

Re:Do This instead (Score:2)

there is a difference... (Score:2)

Re:there is a difference... (Score:2)

Re:there is a difference... (Score:4, Informative)

Re:there is a difference... (Score:2)

There cannot be any difference. (Score:2, Informative)

In summary... (Score:2)

Re:In summary... (Score:3, Funny)

Re:In summary... (Score:2)

differences not having to do with optimization (Score:2)

Re:differences not having to do with optimization (Score:2)

your mileage will vary, but... (Score:4, Informative)

Re:your mileage will vary, but... (Score:2)

Re:your mileage will vary, but... (Score:2)

Re:your mileage will vary, but... (Score:2)

Use Java instead .... (Score:2, Funny)

GCC experimental results (Score:5, Interesting)

Re:GCC experimental results (Score:3, Interesting)

Re:GCC experimental results (Score:2, Insightful)

Re:GCC experimental results (Score:2)

Re:GCC experimental results (Score:2)

Re:GCC experimental results (Score:2, Interesting)

Re:GCC experimental results (Score:3, Insightful)

Re:GCC experimental results (Score:2)

Re:GCC experimental results (Score:3, Informative)

And C++/STL? (Score:5, Informative)

Re:And C++/STL? (Score:2, Funny)

Re:And C++/STL? (Score:3, Informative)

use array (Score:2)

TFA is a Troll (Score:2)

As Donald Knuth said... (Score:4, Insightful)

Re:As Donald Knuth said... (Score:2)

Re:As Donald Knuth said... (Score:2)

Pointers are for programmers silly rabbit!! (Score:2, Insightful)

You shouldn't make any assumptions ... (Score:3, Insightful)