
Abandoning Header Files?

garethw asks: "I'm working on a project where the lead developer, following a suggestion by our tool vendor, wants to get rid of the header files and directly #include source code. The language is a somewhat specialized language, but for all intents and purposes, you can assume it's Java or C. The conventional argument I recall for using header files, and incremental compilation, is that it's faster to use a makefile and conditionally build only those files that have changed. However, it turns out that the brute force of invoking the compiler once on the top level does actually compile much faster. Something about #include'ing source files directly and compiling only the top-level file just doesn't 'feel' right to me, and I'm at a loss to really give a solid argument as to why. Has anyone actually used this approach? Does anyone have any thoughts on any advantages or drawbacks?"
  • Need more info... (Score:5, Insightful)

    by sfjoe ( 470510 ) on Friday January 14, 2005 @06:00PM (#11367800)
    ...following a suggestion by our tool vendor,...

    How much money will your tool vendor make if you implement this suggestion and what, if any, product does she sell that neatly solves any problems this might bring up?

    • i.e.: Will they pay for Dennis Ritchie's cardiac medication?
    • by Tim Browse ( 9263 ) on Friday January 14, 2005 @11:04PM (#11370699)
      If they're anything like some tool vendors I've come across, it's because they either don't have decent compilation performance, or don't support the features that would help, such as pre-compiled headers, etc.

      So rather than fixing the problem by investing in their product, they're telling their customers to use ugly hacks to get around the product's shortcomings, and hope they won't switch to another system (I suspect).

      I've certainly been on the receiving end of such tactics.

      The dead giveaway is when they start saying things like "pre-compiled headers wouldn't help you anyway" :-)
  • Not useful for C (Score:5, Informative)

    by david.given ( 6740 ) <dg@cowlark.com> on Friday January 14, 2005 @06:00PM (#11367804) Homepage Journal
    ...or, to a lesser extent C++, because of the way C scoping works:

    static global variables have scope within the module they're defined in. Which means that two static globals in different source files don't collide, because they're in different modules.

    Including everything into one big source file will mean that they're both in the same module, and so will collide. Not good.

    Can't say about other languages, though.
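    A minimal sketch of the collision (file names hypothetical): each source file below is fine on its own, but pulling both into one translation unit via #include breaks the build.

    /* a.c -- compiles cleanly by itself */
    static int counter = 0;          /* file scope: private to this translation unit */
    void a_tick(void) { counter++; }

    /* b.c -- also fine by itself; its 'counter' is a different object */
    static int counter = 100;
    void b_tick(void) { counter++; }

    /* all.c -- the "include the sources" approach */
    #include "a.c"
    #include "b.c"                   /* error: redefinition of 'counter' */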

    • You sure about that?

      $ cat inc1.h
      static int foo=5;
      $ cat inc2.h
      static int foo=10;
      $ cat test.c
      #include "inc1.h"
      #include "inc2.h"
      #include <stdio.h>

      int main() {
      printf("foo is %d\n", foo);
      return 0;
      }
      $ make test
      cc -O2 -pipe -fno-strict-aliasing -march=athlon-tbird test.c -o test
      In file included from test.c:2:
      inc2.h:1: error: redefinition of 'foo'
      inc1.h:1: error: previous definition of 'foo' was here
      *** Error code 1

      Stop in /tmp.

      Sticking the variables in separate files doesn't automatically get you name

      • Lol,

        reread your parent!!

        Exactly what you show is what he says. But he was talking about *.c files, not *.h files. So while separate *.c files would scope the foo variables, leading to two distinct ones, the *.h files pull them both into the same .c file.

        So what worked in the beginning, while it was scoped, no longer works if everything is pulled into one single source file via #include.

        So your example exactly shows the conflict your parent wanted to point out.

        angel'o'sphere
        • Re:Not useful for C (Score:3, Informative)

          by Bloater ( 12932 )
          The term used in C is the "Translation Unit". When you compile a .c file you are compiling a translation unit. If the C source file #includes the contents of another file, then those contents replace the #include line in what the compiler considers to be the code to be translated.

          It doesn't matter what the file name is from the point of view of C, but a given compiler may use the last dot of the filename and the characters after it to determine which language it is, and whether it is a source file to be co
    • Frankly, if you're using file-scoped variables, you should know better than to #include source. That said, file-scoped variables haven't been a good idea since 1980; even C programmers know better these days.
      • file-scoped variables haven't been a good idea since 1980

        Yeah, now we use class static variables WHICH ARE THE SAME GOD-DAMNED THING, only with a pretty set of brace brackets around them.
        • They certainly are not. File scoped variables aren't introduced into the external namespace. Besides, class statics should almost never exist; they are useful for shared buffering, singleton behavior, and little else.

          You shouldn't be abusing naming mechanisms, scoping mechanisms or generative mechanisms in order to isolate a variable from the outside world. Make a private member or a singleton instead.

          Also, please stop swearing when someone gives you constructive criticism.
  • by Dimwit ( 36756 ) * on Friday January 14, 2005 @06:00PM (#11367811)
    Well, there's the obvious separation of interface from definition. And the problem of duplicate definitions - there's a reason why "extern" is a keyword. :)

    Plus, header files define an interface, which is useful if you don't actually have the code (i.e. binary shared library). Moot point in your case, I think, but...

    Plus it's just good programming style to have separate definitions and implementations. Easier to track down bugs.
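    A minimal sketch of that split (names hypothetical): the header carries only declarations, exactly one .c file carries the definitions, and every other file sees just the interface.

    /* counter.h -- the interface */
    extern int hit_count;              /* declaration: the object lives elsewhere */
    void record_hit(void);

    /* counter.c -- the one and only definition */
    #include "counter.h"
    int hit_count = 0;
    void record_hit(void) { hit_count++; }

    /* main.c -- a client sees only the interface */
    #include <stdio.h>
    #include "counter.h"
    int main(void) { record_hit(); printf("%d\n", hit_count); return 0; }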
    • > it's just good programming style to have separate definitions and implementations. Easier to track down bugs.

      You can separate definitions and implementations within 1 source file by using the following complex formula:

      1. Put the definitions at the top of the source file.
      2. Put the implementations at the bottom of the source file (i.e. after the definitions)

      This may be difficult to get used to at first, but once you learn how to use the Page Up and Page Down keys it's not so bad...
    • Sadly, that doesn't work so well for template classes. I understand why, but it's still unfortunate that you can't separate template declarations from their definitions -- as it is, the code winds up being somewhat messy and hard to read, and if you care about such things (I don't for our in-house software) you can't "hide" the implementation of your code.

      I know for our newer programmers, it's hard to read through a messy .h file to see what's "available" -- and that's always the problem when learning a ne
    • Header files aren't a good way to separate interface from definition. For one, the programmer can still put the entire program code in the header file. And in C++ the private parts of classes end in there too.
      Java and Delphi use just one file, but the interface can be easily obtained from the binary object. That approach prevents duplicate definitions and I like it better.
      • Nothing prevents you from doing so in C++ --- try man nm.

        However, the interface you get this way is not very well documented --- in Java or C++. For this reason, most people doing C++ (and Java) uses comments in the source files, which are then transformed into documentation by doxygen and similar programs (like javadoc).

        • It isn't exactly the same. nm works with just some compilers (or is it a gcc-only utility?), and can't really extract all the type information, because it isn't really there. Except maybe in C++ decorated object code, but I don't remember it now.
          Also, most C/C++ code doesn't make clear in an nm listing which artifacts are public; a programmer has to take extra steps to prevent this. nm is at least confusing when you want to get the public API of an object, compared to other language tools that just do it.
          I'm not an exper
          • Well, nm is from binutils. It's certainly not gcc only --- it can read various object file formats. Anyway, surely you are only interested in those that your linker, ld, can actually read.

            It is true that nm is faithful to the object code, in that "private" artifacts and type info in C (C++ has them, as you suspect, encoded into the name, much like Java) are listed. This reflects the fact that nothing prevents you from modifying a header file to use a private artifact --- it's a compiler thing, not a linker thing.

            • I agree with you; my point is that having separate definition/declaration files is useless and redundant. You can have the docs generated or, at least in Java, have the exact prototypes generated from the binaries. There's no technical need for includes, nor are they needed for documentation. That's why I like the Java/Delphi way better than C/C++ #includes.
              • This is my favorite gripe with C++, so I agree with you that it seems silly and redundant.

                However, you could get close to the Java way by just writing the class out like in Java in a C++ header file. We have to ask ourselves the question: Why don't C++ programmers do this? Force of habit? Compile times? Group pressure? Or is it, after all, better this way?

                Another thought: Nothing would prevent header files from being automatically generated from source files, though a few extra comments would be necessary l

      • The programmer can easily subvert the type, scope, and privacy level of any source or binary interface you can come up with; the requirements of C compatibility require that it work this way.

        One of the better ways to discourage client programmers from subverting your interface is to use as the only private member a pointer to an implementation struct which is forward declared in the header and defined within the implementation file. This can seriously cut down dependencies in large projects and remove one
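        A rough sketch of that idiom in plain C (names hypothetical): the header exposes only an incomplete type, so the struct layout can change without forcing clients to recompile.

        /* widget.h -- clients only ever see an opaque handle */
        struct widget;                               /* forward declaration, layout hidden */
        struct widget *widget_create(void);
        void widget_poke(struct widget *w);
        void widget_destroy(struct widget *w);

        /* widget.c -- the layout lives here; change it and only this file rebuilds */
        #include <stdlib.h>
        #include "widget.h"
        struct widget { int state; };
        struct widget *widget_create(void) { return calloc(1, sizeof(struct widget)); }
        void widget_poke(struct widget *w) { w->state++; }
        void widget_destroy(struct widget *w) { free(w); }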

  • by SunFan ( 845761 ) on Friday January 14, 2005 @06:06PM (#11367923)

    They are just about the only way to centrally organize declarations for data structures and function signatures. Doing so will save your ass eventually, because having function prototypes available allows the compiler and lint tools to catch stupid programmer errors. You do use lint-like tools, right? They _will_ catch bugs that testers and visual scanning won't.

    The only drawback to headers in C is that if you forget to 'make clean' after changing a header, you can end up with object files using old definitions. Just make a habit of doing a full build after changing the headers. If you designed your software properly, changing header files won't be all that common (adding functions, new data structures, etc.).
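    A minimal sketch of the kind of mistake a visible prototype catches (names hypothetical): without the declaration in scope, old C would silently assume whatever argument list the call happens to use.

    /* mathutil.h */
    double scale(double value, double factor);

    /* caller.c */
    #include "mathutil.h"
    double shrink(double v)
    {
        return scale(v);    /* compile error: too few arguments -- caught only
                               because the prototype from the header is in scope */
    }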
  • Speed (Score:4, Insightful)

    by jbrandon ( 603700 ) on Friday January 14, 2005 @06:14PM (#11368056)
    Have you tested the speed difference when you change only one non-header file? I bet incremental compilation will make that quite a bit faster. In addition, if you want to compile that changed source file to check for syntax or type errors, you don't have to check for collision between it and the whole rest of the project, only collisions between it and the header defining it.
    • Re:Speed (Score:3, Insightful)

      by stonecypher ( 118140 ) *
      I bet incremental compilation will make that quite a bit faster.

      Chances are he's got massive coupling problems, which can totally throw away any benefit of incremental linking. And by the way, incremental compilation is something totally different; whereas I realize that the error is that of the original speaker, not yours, it should nonetheless be pointed out.

      C++ does not support incremental compilation, though ICC and MSVC both have extensions to support it. MSVC refers to it as runtime code generat
  • GCC (Score:3, Interesting)

    by lexarius ( 560925 ) on Friday January 14, 2005 @06:17PM (#11368086)
    My OS prof was demonstrating the differences in what errors the C compiler and linker would pick up. However, we found that we could make two source files with no include lines in either that both defined a global variable (sans extern). The main function set the global variable and then called a function that is defined in the other source file, which would then print the gv. Then we compiled and linked them with gcc. No warnings, no errors. The program ran exactly the way we wanted it to, which was unexpected. So yes, you can do away with includes and header files without even performing the includes manually. Depending on the language, your compiler might be smart enough to figure it out.
    But that doesn't make it a good idea. Besides, do you want to be the one who has to go update the library functions that would normally have been included any time you change the code in one file?
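    A rough reconstruction of that pair of files (names hypothetical). Under the traditional "common" model the linker merges the two tentative definitions of gv into one object, and the call to show() is implicitly declared, so pre-C99 compilers accept it without a peep (modern gcc defaults to -fno-common, which rejects the duplicate).

    /* main.c -- no #include lines at all */
    int gv;                                    /* tentative definition, no extern */
    int main(void) { gv = 7; show(); return 0; }   /* show() implicitly declared as returning int */

    /* show.c -- also no #include lines */
    extern int printf(const char *, ...);      /* hand-written declaration instead of stdio.h */
    int gv;                                    /* second tentative definition of the same gv */
    int show(void) { return printf("gv is %d\n", gv); }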
    • Re:GCC (Score:3, Informative)

      That's not an error, and if your OS prof said it was an error that was not picked up, he/she is mistaken.

      The C language definition is clear that you can write the program you did, with a variable defined in "common" form (no initialiser and no extern) in both files, and a function called without a prototype, and it is a valid C program with well defined behaviour.

      -- Jamie

      • Out of curiosity, where can I find this? Was it added to a recent version of the definition? He seemed to be under the impression that this functionality had been added to gcc for programmer convenience and wasn't standard behavior. After all, compilers weren't always smart enough to not need function prototypes or explicit includes.
        • The functionality is much older than GCC, and even pre-dates the first ANSI C specification. Any ancient (i.e. 1970s onwards) unix C compiler will behave like this. In the old days (K&R C, before ANSI/ISO C), there were no function prototypes.

          -- Jamie

  • I think the separate header is simply code duplication and memory limitations of old C compilers.

    Larger programs (compilation units) could be compiled if the preprocessor and compiler were separated and used batch processing; unused parts of the headers were never seen by the compiler.

    The main problem with headers is that preprocessor state is global to the entire operation, not per header or C file.

    This makes conditional state flow from one to the other, which makes separate precompiled headers hard (sin
    • _Avoid_ code duplication of course.

      Also note the speed of e.g. Turbo Pascal or Delphi.
      It stems mostly from not having to cope with recursively included headers.

      If a unit is imported for the second time, they can simply copy the state, since preprocessor state has no effect on the precompiled header. The compiler only has to resolve recursive
      inclusions.

      Header caching _is_ possible for C though; make could keep the compiler alive, and have e.g. a CRC over the conditional state, to see if the state changed.
    • No, it's a limitation of the C language that was imposed by the old compilers' environments.

      Remember that in The Old Days, the C compiler had to run each phase in less than 64K 16-bit words, text and data. Separate compilation allowed the separate segments to be compiled within those limits, and then the .o files could be linked as a last step. This also allowed for such niceties as overlaying loaders etc.

      #include was a mechanism to eliminate code duplication; instead of recoding the interfaces in each

    • Not entirely.

      One thing I've noticed is that the automatic recompilation feature of Java seems broken- changing a file doesn't reliably trigger recompilation of all affected .class files. That's really bad for big projects since you basically have to recompile everything from scratch.

      I've never quite got to the bottom of how and why, but I believe that if Java had automatically generated separate header files and code files and dependency lists from the .java files then it probably would have worked corr

      • I don't remember if Sun's javac compiler does dependency checking, but others like IBM's Jikes do. Using an Ant build script may do it for you for free.
        I think you are wrong that using header files would make it better. Just doing the check on the source .java and the generated .class works without introducing redundancy and creates fewer files to compile. C compilers don't do dependency checking; most of the time you must set it up in makefiles or your IDE, and with makefiles it's usually by-hand work
        • Yeah but Gcc for example spits out the dependency information if you ask it to, and then you can suck that into a makefile automatically. Then the makefile only needs to check the timestamp on the files to know what to rebuild, so everything that needs it gets rebuilt and nothing that doesn't. I think I benchmarked that at 10,000 files in 45 seconds plus compilation time.

          Java's compiling technique seems to me to be quite broken- only recompiling the .java files that have changed often results in runtime problems. If you're l
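          A rough sketch of the gcc side of that (hypothetical files): the -MM switch prints make-style dependency lines, minus system headers, that a makefile can then pull in.

          $ cat main.c
          #include "foo.h"
          #include "bar.h"
          int main(void) { return 0; }
          $ gcc -MM main.c
          main.o: main.c foo.h bar.h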

          • Gcc for example spits out the dependency information if you ask it to, and then you can suck that into a makefile automatically.

            Gcc does that but it is a kludge to me because you are expected to maintain it by yourself or include it on the makefile. This is the kind of thing that should be automatic, without you having to setup the dependency data.

            Java's compiling technique seems to me to be quite broken

            I don't agree with you. First, this is not a problem with Java but a limitation of some tools, nota
            • Just watching the files opened is dangerous because Java can generate multiple .class files per source file because of inner classes (example: a.java can generate a.class, a$1.class, a$2.class, etc). The tool may miss something.

              Actually, no, because in order to write the files out the compiler must have opened them. The clearcase technique is very clever and general. Still, Ant seems pretty good from what you say.

    • I think the separate header is simply code duplication and memory limitations of old C compilers.

      Headers have never had anything to do with memory limitations, and by definition code in a header cannot be repeated in an implementation file. It's worth noting that code should only ever be in a header if it is to be inlined, which didn't exist in C until C99, by which point memory limitations simply don't matter. (It is worth pointing out that placing templates in headers is not placing code in headers,
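      For what it's worth, a minimal sketch of the only kind of code that arguably belongs in a header (names hypothetical): a small function intended to be inlined into every includer, using C99's inline.

      /* minmax.h */
      #ifndef MINMAX_H
      #define MINMAX_H

      static inline int imax(int a, int b)   /* each includer gets its own inlinable copy */
      {
          return a > b ? a : b;
      }

      #endif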
      • I had a whole reply ready, but IMHO it is not worth the trouble replying to. Maybe it is better when you read the original post with some comments.

        Oh, and:

        you>Namedropping doesn't make you seem correct, y'know.

        Neither does getting personal about perceived faults, when it's pretty much your own assumptions that are the problem.

        -------------------

        Reread the original msg with this in mind:
        - I never said _anything_ about C++. Neither does the thread starter. I assume roughly some C like language, non in
        • I had a whole reply ready, but IMHO it is not worth the trouble replying to.

          "Oh, I wrote a reply, but I don't want to paste it because you're not a good person and I don't want to." My eight year old son knows that nobody falls for this sort of passive agressive dismissal; it's disappointing that you do not.

          Namedropping doesn't make you seem correct, y'know.

          Neither does getting personal about perceived faults, when it's pretty much your own assumptions that are the problem.


          Observing that something
  • Why? (Score:5, Insightful)

    by Pacifix ( 465793 ) <zorp&zorpy,com> on Friday January 14, 2005 @06:23PM (#11368180)
    It seems that the onus should be on the vendor to explain very, very convincingly why you should abandon decades of standard practice and good coding habits. This had better be one hell of a good product you're developing to justify such a radical change. You shouldn't need to defend standard practice; they must campaign for a change to that practice. Imagine trying to explain this to all the coders who will work on the product for the next decade - will they think you're crazy or is there really a reason to do this?
    • Re:Why? (Score:5, Funny)

      by CamMac ( 140401 ) <PvtCam@ya[ ].com ['hoo' in gap]> on Friday January 14, 2005 @06:33PM (#11368333)
      Remember, if you remove all the comments from the code, it will compile faster and the executable will be smaller.

      --Cam
      • For starters, it will be even better if you remove all whitespace and tabs. For more advanced examples of better practices of C programming learned over decades of good coding, take a look here [ioccc.org].
      • In your IDE, change the code to be a smaller font - this will use less disk space and ultimately make things run faster...

  • by cookd ( 72933 ) <douglascook&juno,com> on Friday January 14, 2005 @06:26PM (#11368220) Journal
    Advantages:
    1. Faster compile of the full product. You only invoke the compiler process once, and there is much less work for the linker to do.
    2. Much better optimization. Compilers can only optimize within a compilation unit. Intel and Microsoft have "link-time code generation" compilers which perform a final optimization pass during link, but if you aren't using those compilers, there might be a significant amount of additional optimization enabled by putting everything in the same compilation unit (see the sketch below).
    Disadvantages:
    1. You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately. The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here). The compiler and linker might have limits. And you might not hit those limits until late in the project.
    2. For building the whole product, yeah, it will be faster. But for making a small change and rebuilding the results of that change, it might be much slower.
    As with every issue you'll ever run into, there are two (or three) sides to it.
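    A minimal sketch of the optimization point (names hypothetical): compiled as separate translation units, the call below can only be inlined with link-time code generation; pasted into one translation unit, any ordinary compiler can inline it.

    /* helper.c */
    int add_one(int x) { return x + 1; }

    /* main.c -- only the declaration is visible, so no inlining across the boundary */
    int add_one(int x);
    int main(void) { return add_one(41); }

    /* whole.c -- the single-translation-unit approach: the optimizer sees the body */
    #include "helper.c"
    #include "main.c"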

    • You can accomplish (1) by letting make keep the compiler alive, and only call some init.

      I also agree with (2), and a simple example is inlining small calls across compilation units.

      IMHO the best solution would be to allow tighter integration between make - cpp - gcc - as - ld

      4 binaries to run per compilation unit is quite a lot of overhead. (ld runs for the bin only). This would push up memory requirements a bit though.

      Caching headers in-compiler/preprocessor would then also be possible.

      If this was do
    • >>You're not doing it the way everyone expects you to do it.

      Let's extend this: Developer churn. Software lifecycle. Is this lead developer going to write a full document explaining (1) the easy stuff: the compilation steps, AND (2) the minutiae of hacks, fixes and workarounds when you hit compiler limits, newb misunderstandings, etc.?

      Think about how much slower your compile will be when only a single module's implementation changes. Abandon static libs and your make time should be swift.

      Overall I think you c
    • 1. Faster compile of the full product.

      Well, back in the real world, in a properly decoupled project incremental linking is a massive speed win, even when building from the top, as there's far less cross-lexing and as the build tables may be handled a small piece at a time, which is important because their parsing in the compiler itself is generally of O(n^2 log n) time or better. Once you've worked on a large project which fails to make proper decouplings, you will become painfully aware of this trend.

      Whereas in this particular project the complete build is apparently faster, that is almost certainly the result of a very naive code tree and/or build scheme; the importance of incremental linking towards speed of compile cannot be overestimated, even in the case of compiling from clean.

      2. Much better optimization. Compilers can only optimize within a compilation unit.

      This simply isn't true. Whereas only some compilers make cross-TU optimizations, that is not the same as cross-TU optimizations being only able to optimize within a translation unit (why do people keep saying compilation unit? There's no such thing!) Besides, you're dramatically underestimating the commonality of link-time cross-tu counterspecialization, which now exists in ICC, BCC, MSCC, ARM ADS, EDG/Comeau, GHOC, and is in experimental development within GCC.

      You're not doing it the way everyone expects you to do it. Certain components (the compiler, the linker, and pre-existing code) might have been designed under the assumption that individual files would be compiled separately.

      They most certainly have not been. The C and C++ standards do not allow for such ridiculously inappropriate behavior. Where did you get this idea? Compiler writers may not impose arbitrary restrictions on the codebase in any relation to the local filesystem. This is just untrue.

      The pre-existing code might have declared static (per-file) variables or functions in a way that could collide with other code (namespaces might help here).

      This is a well known gigantic red flag indicating an amateur programmer. File-scoped variables are antiquated even within the pure C community; the only time they're acceptable in most professional programmers' eyes is within a library which is built alone. In fact, you might want to read the things Kernighan himself said about when file-scoped variables are appropriate in K&R 2; the primary author of the language himself says that this is a fundamentally bad technique and should not be done.

      Of course, that you're causing problems by misusing the toolchain and allowing bad code to collide when build trees written separately are blindly merged without the help of a linker is just not surprising.

      The compiler and linker might have limits.

      Not if they're standards compliant, they mightn't. Did you know that there's a document out there floating around telling compiler authors in concrete detail what they may and may not do? You should read that before commenting on what a compiler may or may not do; you are simply out in left field, here.

      As with every issue you'll ever run into, there are two (or three) sides to it.

      Not when you know what you're talking about. Whereas many things are issues of pro/con, many simply aren't; you'll be hard pressed to find pros in the distribution of heavy ordnance to delusional sociopaths, you'll be hard pressed to find pros in setting up a "bring a molester to school day," and you'll be hard pressed to find pros in non-decoupled code, once you've actually read the standard and are aware of the real limitations of compiler authors, instead of your guesses about what might maybe happen if someone wasn't paying attention.
  • by Dink Paisy ( 823325 ) on Friday January 14, 2005 @06:27PM (#11368233) Homepage
    You need to clarify exactly what is going on... My best effort at interpretation is that currently you have something like:

    gcc -c f1.c
    gcc -c f2.c
    gcc -c f3.c
    gcc -o f f1.o f2.o f3.o

    Your vendor instead thinks it would be better to do:

    gcc -o f f.c

    Where f.c looks like:

    #include "f1.c"
    #include "f2.c"
    #include "f3.c"

    Am I right, or am I completely off track?

    If I'm right, you'd probably still want to include header files because you want everything to remain modular. According to software engineering type people, that makes maintenance easier. Another problem is symbol scoping. C keeps symbols local to the module they appear in, so you want to make sure you have naming conventions, namespaces, or some other protection against naming clashes. I'm dubious about the benefits, but I work on projects that take significant amounts of time to compile. Not hours, but enough time that if you wait for all the objects to compile you are wasting a lot of time. In general, I'd claim that the larger the project, the worse an idea it is.

    • If I'm right, you'd probably still want to include header files because you want everything to remain modular. According to software engineering type people, that makes maintenance easier.

      Whereas you understand the original petitioner's complaint correctly, including headers in a project which simply mass-merges all the source would have absolutely no effect on modularity. The modularity which developers are probably referring to with regards to headers is the ability to swap out libraries without recomp
  • by StarWynd ( 751816 ) on Friday January 14, 2005 @06:27PM (#11368239)

    While including code directly may speed up the compilation time, you will lose all the time you gain and then some when you get into debugging.

    If you have a complicated #include chain, you can wind up with a lot of duplication. Some compilers will complain, some won't. However, if you have typedefs, structs or the like, most compilers will complain and not compile your code until the duplications are removed. I don't know what compiler you're using or if you are planning on including more than functions or global variables, so I don't know if this is an issue or not.

    The more general issue is that it's much easier to track down bugs and other problems if there is a clean separation between definitions and implementations. I can't characterize that difference in a few sentences, so I'll just say that it has been my experience that projects which are developed in a true modular nature are much easier to debug than projects designed in a monolithic nature. The time saved in debugging more than makes up for a little time lost in compilation.

    • If you have a complicated #include chain, you can wind up with a lot of duplication. Some compilers will complain, some won't.

      So sorry: the ODR rule prevents duplicated code from functioning in any compliant C or C++ compiler, all the way back to day one (and in fact into the parent languages B and BCPL.) This is simply false.

      However, if you have typedefs, structs or the like, most compliers will complain and not compile your code until the duplications are removed.

      If by most you mean all...

      so I'll

      • Right on: preach it, brother. This is one of the least understood principles of modern design: machine time is significantly inferior to programmer time. Herb Brooks would be proud.


        For those who don't know, Herb Brooks is the lesser known brother of famed author Fred Brooks. Herb is best known for writing the obscure tome, _The Legendary Monkey-Hour_. LMH is not as well known as older brother Fred's _Mythical Man Month_, but among primate coders, it is the bible.

        Herb is also the President of the Billy
  • by nadador ( 3747 ) on Friday January 14, 2005 @06:31PM (#11368309)
    Depending on the size of your project, you will get varying returns from each of these:

    1. Separate source files mean that units of code can hide data and functions.
    2. Separate headers, combined with something like GCC's -Wmissing-prototypes, enforce the good coding practice of well defined functional interfaces (sketched at the end of this comment).
    3. Separate headers and source files mean that when you look at a function in a file, you will have some idea of what it touches because you can go and look that it included header X but not Y.
    4. You can tell the compiler to explicitly forbid global data symbols, which is pointless in one single file.
    5. You can use different compiler switches for different files.
    6. Your code will have some hope of portability.

    If your project is small, it doesn't matter anyway. If your project is large, you can get your compiler to enforce some good design rules on you, which doesn't mean you can't still have a good design anyway, but it will make it more likely. I worked on a project that used a compiler that let you get away with everything. Try and port that code to anything UNIX-like, and it was ridiculous.
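    A minimal sketch of point 2 (names hypothetical): with -Wmissing-prototypes, defining an external function that no header has declared gets flagged, which nudges you to keep the interface in a header.

    /* util.c -- compiled with: gcc -c -Wmissing-prototypes util.c */
    int helper(int x)            /* warning: no previous prototype for 'helper' */
    {
        return x * 2;
    }

    /* adding 'int helper(int x);' to util.h and including it silences the warning */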
  • by Lord Kano ( 13027 ) on Friday January 14, 2005 @06:46PM (#11368468) Homepage Journal
    You have issues with scope.

    The easiest one for me is that with #includes, it's so much easier to fix bugs. If you find a bug or an inefficient way to solve a problem, you only have to fix it once. Everything that #includes the suspect file will be fixed on the next compile.

    If customfunctions.h has been changed or optimized, you don't have to edit the 30 projects that you're using those functions in. Just the one file is fixed, each project gets the benefits during the next compile.

    LK
  • Playing devil's advocate to most of the replies I've read so far, there are two benefits you could get from this approach.

    If your language supports "static" in the file-scope sense, you could declare every global object as static and reap the compiler optimizations that come with that declaration.

    If your language supports smart inlining, you could end up with code that has been inlined more effectively, since any code could be an inlining candidate regardless of location.

    I can think of plenty of reas

    • I don't think the "static" advantage is all that big an advantage when you consider a large program. Remember the semantics of static are different from the semantics of auto, and depending on some code generation issues, of extern.
    • If your language supports "static" in the file-scope sense, you could declare every global object as static and reap the compiler optimizations that come with that declaration.

      Er. There are no optimizations which apply to globals but not to globals. No, that's not a typo: file-scoped variables are globals whose names are not imported into the external tokenspace. There is no difference from the perspective of the compiler between the two; file-scoping is a purely lexing issue, and lexing cannot optimiz
  • ccache (Score:4, Informative)

    by yamla ( 136560 ) <chris@@@hypocrite...org> on Friday January 14, 2005 @06:57PM (#11368581)
    It is hard to tell from your statements, but this may stop tools like ccache [samba.org] from working. I use ccache in my projects and it radically cuts down the amount of recompilation required when I do a complete rebuild. Now, an obvious question is why I don't simply rely on makefiles to ensure only changed files ever get rebuilt. This often happens because compilation involves generating new cpp files that are then compiled and I don't want to be grepping through these all the time. I suppose I could move them all to a different directory, but ccache works very well.

    The other problem, of course, is that separating your classes into header and implementation means that if you change the implementation, you only need to recompile that one file and relink, rather than recompiling EVERYTHING. This can be a matter of a few seconds vs. several minutes. And implementation does change, a lot... fix a bug, you fix the implementation. The headers change too, but much much less frequently.
    • The other problem, of course, is that separating your classes into header and implementation means that if you change the implementation, you only need to recompile that one file and relink, rather than recompiling EVERYTHING

      Of course, his original post explicitly deals with this issue, and rejects your opinion out of hand; if you want to tell him about this, you need to help him understand why it isn't working for him. It is frequent for non-mediocre programmers to fail to understand the problems of med
  • by mattgreen ( 701203 ) on Friday January 14, 2005 @07:06PM (#11368671)
    When I write libraries, I try to make them header-only. Generally users don't want to have to modify their makefiles if they don't have to, and I'll resort to compiler specific pragmas if I have to.

    It depends on the size of the system. If you are using a component-based system then only the pieces of the system that are actually being modified should be compiling anyway, which cuts out a lot of compilation. However this implies there is fairly loose coupling involved. In a more conventional application, there has to be a breaking point where the amount of time to parse files is longer than the time to link them normally. Using precompiled headers on any system header will also drastically decrease the time it takes to compile, since the compiler essentially just dumps the parse tree out to disk. So much time is spent inside some system headers! (Especially Windows.h. Ugh!)

    There are some tools that keep the header files in sync with source files automatically, but I don't know of any off-hand. I have seen some for C, but I'm not sure there is one for C++ that supports all the wild and crazy stuff like namespaces and templates. :)
    • I find it incredibly disappointing that this post, which seems to be the only actually clueful post in the discussion, has had no moderator points applied to it. I would do it myself, except that I'm crusading around splitting hairs.

      Please mod parent up: he's not only correct, but absolutely dead-on.

      but I don't know of any off-hand.

      ccache is probably the least sucking one, but with a good build tree, makefile and strong TU separation it's virtually never important anyway.
  • It can work (Score:3, Interesting)

    by Foolhardy ( 664051 ) <`csmith32' `at' `gmail.com'> on Friday January 14, 2005 @07:23PM (#11368823)
    Including all the source code into one main file compiled to one object can work, if the source files cooperate. C can have problems with the namespace, but C++ allows multiple namespaces and you can even put the namespace blocks in the main file around the #includes. The source code has to support this, though. It's best if all the source files to be included are under your control. For libraries that expect to use a declarative header, use it like it was intended.

    I've done this on lots of projects and it works great. Most of the arguments here are either about performance or an appeal to tradition (that's the way we've always done it... must be the only true way). Modern compilers will create pre-compiled headers that can include code, usually used for template and inline definitions; modern compilers don't get the same benefits from the traditional model anymore. Actually, even larger projects seem to take longer to link with iostream and windows.h than the source does to compile.
    The compiler's ability to optimize code may be increased greatly, especially its ability to inline functions. Too much inlining will cause code bloat, but the compiler's options should give you control over the balance.
    Modern compilers also allow you to change the compilation options mid-file.
    Any debugger or source analyzer shouldn't have problems handling inline or same-file implementations, or you're using bad tools.
    It can also be easier to create test code; create a series of test files t01.cpp, t02.cpp (each with a main) but include only one. The others are there for reference but don't interfere. This is also useful for testing a prototype replacement for a component; include the new one and comment out the old include. Going back is trivial.

    It's more a question of coding style than anything. Personally, I hate maintaining redundant information of any kind, and this very much includes the prototypes in the header with the actual functions. Source code redundancy is bad for all the same reasons that database redundancy is bad. Making my C++ member functions inline and including their files frees me from this.

    I don't think this will work too well in Java. A Java source filename = the .class filename = the ONE public class exported by the file. Unless you want a total of 1 public class, it won't work. Java doesn't use header files anyways. Class binaries export everything public automatically.
    • Including all the source code into one main file compiled to one object can work, if the source files cooperate. C can have problems with the namespace, but C++ allows multiple namespaces and you can even put the namespace blocks in the main file around the #includes.

      Surely you're joking? Do you actually fail to understand how broken such a behavior would be? Consider the relatively trivial case of a #included file which contained two namespaces, and attempted to reference one another through uses claus
  • by GoofyBoy ( 44399 ) on Friday January 14, 2005 @07:28PM (#11368877) Journal
    It's an interesting approach and you have no idea why you shouldn't do it.

    So do it.

    In the end, regardless if it works or not, you will have learned something new.
  • by crmartin ( 98227 ) on Friday January 14, 2005 @07:35PM (#11368937)
    "Oh, Bullwinkle, that trick never works."

    One of the really depressing things about having been in the business for nigh on to 40 years now is that, along with the occasional new dumb idea, all the old dumb ideas keep coming back. Among those dumb ideas that keep coming back are "visual programming" --- using graphics instead of programming languages; complicated schematic graphics for software --- UML in its utterly complex form; and, sure enough, using the preprocessor to mess with C-like languages.

    Every time this is tried --- and God knows it's been tried a lot --- you run into some severe problems:

    1. The scoping rules of C-like languages give semantics to file inclusion. If you #include chunks of code, you are defeating the language's (limited) ability to protect you from name space clashes, mis-named variables, and so on.
    2. While it might be that you gain something from only needing to start the compiler once, parsing and compilation are inherently a bit harder than O(n) where n is the number of source characters or tokens. A normal environment with make(1) will generally need to process fewer tokens than compiling everything all the time; the time required for a big file will inevitably dominate the startup time eventually.

      If you've got control of the compiler for this peculiar language, why not explore making the startup time shorter, say, eg., by using shared libraries, DLLs, or by setting the sticky bit?

    3. From sad experience, I can tell you that using the #include scheme will introduce weird-ass order dependencies into the code (i.e., what order do you include files in? See the sketch after this list.) that are very very difficult to debug.
    4. Most tools for C-based languages expect you to do the sources in a normal fashion. You confound the tools' expectations at your peril.
    5. Similarly, most debuggers exploit, or attempt to exploit, scoping rules that you will break through this approach.
    6. When you write lots of smaller modules, each one can create a single, small TEXT and DATA section, or a collection of small code sections. This makes the job of memory mapping in virtual memory systems much easier. Do it all as one big thing, and you're liable to get one big TEXT section.
    7. Optimization is combinatorially fairly hard, quadratic or worse, and global optimizations tend to be managed within section boundaries. One-big-module designs may either make the optimization phase very lengthy, or defeat optimization entirely when table space etc. runs out.
    8. You piss off every experienced C programmer who ever has to deal with the code in the future, especially old farts like me who've seen this trick 20 years ago.
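    A minimal sketch of the order dependency in point 3 (names hypothetical): parser.c quietly relies on a type that lexer.c happens to define, so the build works in one include order and breaks in the other.

    /* lexer.c */
    typedef struct { int kind; int value; } token;
    token next_token(void) { token t = {0, 0}; return t; }

    /* parser.c -- declares nothing itself; it only compiles if lexer.c came first */
    int parse_one(void) { token t = next_token(); return t.kind; }

    /* all.c */
    #include "lexer.c"
    #include "parser.c"    /* fine in this order; swap the two lines and 'token' is undeclared */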
    • Yeah, seriously, you should be designing modules to interact through clean interfaces with known, published contracts. How you partition your compilation units should fall naturally from this. You shouldn't try to do the reverse and somehow "accrete" a design based on low-level preprocessor hacks.
    • When you write lots of smaller modules, each one can be create a single, small TEXT and DATA section, or a collection of small code sections. This makes the job of memory mapping in virtual memory systems much easier. Do it all as one big thing, and you're liable to get one big TEXT section.

      That's the only one I don't buy. The individual text sections-- if they're even separate after ld is done with it-- will get merged into one big text section (along with the data and bss) when they're loaded into memor

      • Uh, which loader and which linker, and for that matter which operating system etc? I suspect you're right about linux, and it'd be lovely to believe that there aren't any systems (other than in hobbyists' basements) that don't do it that way any more, but with the information we've got, we can't exclude the possibility.
    • Awesome. You've just earned seat five on my friends list in seven years of slashdotting.
  • Well, coding style and software engineering aside, you need to do some testing if you think this will increase your speed.

    Quick test to illustrate. 1,000,000 lines of C code, using gcc 3.3.4, default options.

    Time to compile spread of 1000 files (with 8 lines of include and function body per file): ~2 minutes

    Time to compile all in a single file: unknown

    Why is the second time unknown? My computer doesn't have the memory to do it. Now I could pump up the memory of my machine (assuming I've got a 64-bit mach
  • If the amount of code changed or added in your program per unit time is roughly constant, then it will take O(N^2) time to compile, where N is the size of the program, over the life of the program.

    Think about that for a moment. Think back to computer science 101. Now ask yourself if this is a good idea or not.

  • by Chemisor ( 97276 ) on Friday January 14, 2005 @08:50PM (#11369687)
    Speeding up a full build should not be important. The only people who care about it are in your test lab doing daily builds and regression tests, who can start the build overnight and have it ready by morning. Of course, this is the situation in a well-designed application. If you find yourself needing a full rebuild all the time, it means one of two things: 1. you are hacking a core component, or 2. all your components are written with spaghetti code and any change in one forces rebuilds in all the others.

    In the first case, try just testing one or two components during development, and then verify all the others when the API is stabilized. This is, incidentally, the advantage you gain from using header files: once the API is stable, you never need to rebuild that component again except to fix bugs (which require rebuilding only that component).

    In the second case, you need some serious refactoring. Look at the code and break it up. Encapsulate everything you possibly can. Make stuff private and static. Make everything you don't modify const. Keep it up until each component is accessed only through its API and that API is clean. Trust me, this is possible in any project. The enormous decrease in maintenance costs will more than pay for any time you spend on it.
    • Speeding up a full build should not be important.

      Agreed.

      In the second case, you need some serious refactoring.

      Decoupling isn't refactoring. Fowlerian refactoring is about changing the interaction of code, the code paths, and pushing behavior around into conceptually correct normalized areas. Decoupling is just about specifying no more than the bare minimum information required for external linkage in order to reduce cross dependencies.

      Encapsulate everything you possibly can. Make stuff private

      W
      • Proper logical and physical encapsulation can speed up builds by a huge factor. The more extreme encapsulation of pimpl can speed builds even more.

        I think he is using 'static' in the sense of 'class variable' or maybe as in 'compilation unit scoped', not in the sense of 'persistent between invocations of a function' - the third use of 'static' is the one that always causes grief when threads enter the picture. You could conceivably cause thread problems with class variables, but to my mind that nearly alw

        • Proper logical and physical encapsulation can speed up builds by a huge factor.

          True. That doesn't make statics encapsulation at all, let alone proper.

          The more extreme encapsulation of pimpl can speed builds even more.

          It's always a shame when people talk about Sutter's pointer to implementation or fast pointer to implementation idioms without actually understanding them. Pimpl is not encapsulation. Encapsulation is the separation of implementation from interface, the distinction between "inside" an
      • > Decoupling isn't refactoring.

        I would say that it is. Mathematically, if you convert (a+b/c)bc into abc+bb and then to b(ac+b), I call it refactoring, since it involves moving factors around. In programming I say "refactoring" to mean "extract interfaces from your code and use them to pull up modules, which can then be encapsulated into classes".

        >> Encapsulate everything you possibly can. Make stuff private

        > Whereas these are good principles, it will have zero effect on the problem at hand.
        • Decoupling isn't refactoring.

          I would say that it is. Mathematically, if you convert (a+b/c)bc into abc+bb and then to b(ac+b), I call it refactoring, since it involves moving factors around.


          Neat how you avoided the justification I gave, and instead relied on equivocation [wikipedia.org] to make your argument for you. It doesn't really matter that in mathematics you can push things called factors around; that has nothing to do with Fowlerian refactoring. Should you choose to read his book, you may be surprised to lea
  • First, your language is not anything like Java, because Java does not have header files at all.

    Second, your language is not anything like C, because C was carefully designed from the ground up to use header files and compilation units. Running this way will annoy your compiler, your linker, your debugger, and every other link on your tool chain, and muck up many standard C coding practices.

    So, yeah. If you're using a language that's not like C or Java, and your tool vendor is telling you to do thi
    • Running this way will annoy your compiler, your linker, your debugger, and every other link on your tool chain, and muck up many standard C coding practices.

      Conspicuously absent from this list is "other programmers."
  • You don't want to abandon headers; they are the basis of separation into translation units. If you're compiling from the top and it's faster than compiling small pieces, then that means your source needs a hell of a lot more decoupling; by definition, the only way that top-down build could possibly be faster than incremental build is if incremental build is still making everything anyway.

    Consider reading Modern C++ Design [tri-bit.com], specifically the section about generic functors, which should help you with your cou
  • Total red herring... (Score:4, Informative)

    by pla ( 258480 ) on Saturday January 15, 2005 @09:41AM (#11372691) Journal
    First of all, "speed", either compilation-wise or runtime-wise, has nothing to do with why you should use header files.

    I too disliked header files, long ago, in my early days of programming C. It seemed pointless, to have two files (or rarely, as many as four), when one would do just as well.

    For small projects, I'll still use one large monolithic source file. In that aspect, it makes sense to skip breaking out your data and function definitions.

    But when you get to the "real" world... Imagine even a "small" serious project, with perhaps 10k lines of code. Try to find a single function in that file - I hope you feel on good terms with your IDE's search capabilities!

    So, break that out into a dozen files - You have your network code in one file, your UI code in another, your file I/O in another, perhaps some database interaction in another, and so on. Okay, that works well... But wait, your network code, your file I/O, and your database code, all make use of the same checksum algorithm! So, you have the same exact code duplicated three times.

    That would work, because each file will compile to a module with its own namespace (in most languages). But it wastes space, both in the source and in the compiled code. It also wastes time and can very easily introduce bugs. For example, if you decide you need to switch from MD5 to SHA1 as your checksumming algorithm, you now need to change three places instead of one. If you miss one of those, but use them to compare results between the three different uses, you have a very serious bug that may drive you batty trying to track it down.

    So, the obvious solution, break out all your common functions into a toolkit-like source file. Now, you could just #include that in every other file that needs it, but WOW would that cause some serious bloat in the compiled code - In my experience, shared code files frequently end up as the single largest source file in the entire project.

    So, use a header file. That way, you don't end up with massive duplication of code, you have the advantage of a logical breakout of your code into similar-purpose files, and you can still make changes to only one file to modify one function.

    Incidentally, the above chain of thinking more-or-less describes the evolution of standard libraries... Would your professor actually suggest that you shouldn't "#include<stdio.h>", but instead should manually pull the code for each function you use into your source file? Because, in the degenerate case, he has told you exactly that.
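    A minimal sketch of the checksum toolkit described above (names hypothetical): one declaration in a header, one definition in a source file, and every user just includes the header.

    /* checksum.h -- one declaration, shared everywhere */
    unsigned long checksum(const unsigned char *buf, unsigned long len);

    /* checksum.c -- the single implementation; swap the algorithm here and only here */
    #include "checksum.h"
    unsigned long checksum(const unsigned char *buf, unsigned long len)
    {
        unsigned long sum = 0;
        while (len--) sum = sum * 31 + *buf++;
        return sum;
    }

    /* network.c, fileio.c and the database code each just #include "checksum.h"
       and call checksum(); none of them carries its own copy. */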
  • by mqx ( 792882 ) on Saturday January 15, 2005 @12:31PM (#11373355)

    Just remember that a header file defines the interface to the body, which actually duplicates some of the material in the body. Because of this duplication, you can have problems, i.e. faulty build dependencies, mismatch between header/body, etc. Removing this sort of duplication is usually a good thing: so if the technology (i.e. the compiler) is smart enough and performs well enough to get it right, then the change could be a good thing.

    I'd like to point out that many other respondents have argued their case with reference to 'C'; however, the poster clearly said it was not 'C' -- without further information, it's difficult to know whether these 'C' type issues will translate. I'd point out that some languages, e.g. python, java, perl, do not have ideas of separate header/body -- suggesting that the "current trend" in languages is to do away with the duplication.

    The compiler could be intelligent enough to construct a parse tree quickly, and only resolve parts of the parse tree when necessary: so for example, if there was previously a 5K header, and a 30K body, but now only a 30K body, the compiler may read the entire 30K, and only "roughly" parse it (e.g., say for a function, it parses the outer scope of the function, but resolves nothing inside the function until some other code actually uses the function).

    I don't think there's an answer for this guy: there are too many issues that haven't been stated, as we know nothing about the particular toolchain, the build environment, the language, etc. All we have is an abstract concept of splitting files into header/body. That concept by itself isn't good or bad; it depends upon a lot of other issues that change the perspective.

    My answer would be that surely the guy's company has a couple of clueful senior engineers who can sit around a whiteboard and discuss (using their computer science training) what actual impact the change will have on the project, and whether to go with the impact.

  • More from the author (Score:2, Informative)

    by garethw ( 584688 )
    Thanks for all the interesting replies. It's always nice to start a flame war.

    I wish I'd included a few more details, which might have avoided questions like, "Are you stupid?" and "Have you taken a basic Computer Science course?" (the answers are "On occasion" and "Waterloo, Comp Eng '98" respectively :) )

    A few details which might put the question into perspective might be:
    • The project is a chip verification project. There is no final "product" at the end of my work. The name of the game is endles
    • I haven't read the flames yet, but.... asking for specific advice when you won't a) reveal what language the code is in, or b) what compiler is being used sort of invites flames.

      Candidly, how do you expect a serious (not to mention informative) response?

      OK - now to read the flames....
  • There is what you do for release, and what you do for development. For release (which can include the daily build!) this is a good idea, if there is any gain in run speed.

    For development I consider this a bad idea. When I change foo.c I don't want to recompile every other file in the project. It will slow things down in any project with size.

    It will be more work to implement both. Since computer time is cheaper than human time most people don't bother with adding the ability to do both. However I
