Tools For Understanding Code?
ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?"
When I was your age... (Score:2, Interesting)
(and GET OFF MY LAWN).
doxygen - with full source option (Score:3, Interesting)
Creating small demo apps that use the code can also help.
mhack
Etags (Score:3, Interesting)
Also, run the program with a debugger and step through it. Or put some print statements in key places and see what it produces.
I find that's all I ever need.
Wait 'till you get to reading the specs... (Score:3, Interesting)
They'll be out of date, full of inconsistencies and incomplete.
Then you'll be reading the code, only to discover that people's idiosyncrasies and personalities definitely affect their coding styles. (There's even some gender bias: women tend to set a lot of flags [sometimes quite needlessly] and decide what to do later in the execution, while men code as if they knew where they were going all the time, only to find when they get there that they're missing some piece of information or other.)
If you read code developed by a whole team of people, you'll get to know them, intimately.
Good luck. You'll be at the bar in no time... I kept the stool warm for you.
Re:Paper (Score:3, Interesting)
HTML based cross reference (Score:3, Interesting)
ctags *
gtags
htags -Fan
It will create a ~\HTML folder with all the functions/variables cross-referenced. Open the file index.html or mains.html in your browser. If you're not running Linux, I think these utilities are included in Cygwin: http://www.cygwin.com/
Enjoy,
Tests (Score:4, Interesting)
Another great tool is valgrind+KCachegrind - it gives you really nice call trees. VTune can do something similar as well, but IMHO the output is not as good as in KCachegrind. The only problem, of course, is that valgrind makes your program very slow, and it is, AFAIK, not available on MS Windows. VTune, OTOH, runs the program at normal speed, but its call-tree output is ugly, at least on Linux.
If these two options are not for you, then you might add trace output to each function. IMO this is better than using a debugger - especially in C++ with Boost and the STL, where a lot of stepping goes through inline functions. With proper logging levels you can get very useful output to see what's going on. It helps you understand the code, and it also helps if you hit a bug.
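To make the "trace output with logging levels" idea concrete, here is a minimal sketch. It is written in Python rather than C++ purely to keep it short; the same pattern maps onto a trace macro or a scoped tracer object in C++. The names traced, parse_header and handle_request are made up for the illustration and come from no real code base.

```python
import functools
import logging

logger = logging.getLogger("trace")

def traced(level=logging.DEBUG):
    """Decorator factory: log entry, exit and exceptions at the given level."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.log(level, "enter %s args=%r kwargs=%r",
                       func.__name__, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Record the failure with its traceback, then let it propagate.
                logger.log(level, "raise in %s", func.__name__, exc_info=True)
                raise
            logger.log(level, "leave %s -> %r", func.__name__, result)
            return result
        return wrapper
    return decorate

# Hypothetical functions standing in for the code being studied.
@traced(logging.DEBUG)   # low-level helper: only visible at DEBUG
def parse_header(raw):
    return raw.strip().split(":")

@traced(logging.INFO)    # top-level entry point: visible at INFO and below
def handle_request(raw):
    return parse_header(raw)[0]
```

With the "trace" logger set to INFO you only see the top-level calls; drop it to DEBUG and the helper calls show up too, which is roughly what "proper logging levels" buys you when the call volume gets large.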
Re:Wait for cenqua's solution (Score:1, Interesting)
Shameless plug for CodeSurfer (Score:3, Interesting)
You can browse your code, following dependences and definitions. You can also construct queries to isolate what statements can affect a particular variable, and do a bunch of other tricks based on static analysis. There's a programming interface too.
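For readers who haven't met the "what statements can affect this variable" query before: it is usually called a backward slice, and at its core it is reachability over a dependence graph. The sketch below is a toy illustration of that idea only. It is not CodeSurfer's programming interface, and the statement labels and the hand-written DEPENDS_ON graph are assumptions invented for the example (a real tool derives the graph from the source).

```python
from collections import deque

def backward_slice(depends_on, target):
    """Return target plus every statement it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        stmt = queue.popleft()
        for dep in depends_on.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical toy program, one statement per label:
#   s1: a = read()
#   s2: b = read()
#   s3: c = a + 1
#   s4: d = b * 2
#   s5: print(c)
# Each entry maps a statement to the statements it directly depends on.
DEPENDS_ON = {"s3": ["s1"], "s4": ["s2"], "s5": ["s3"]}
```

Asking for the slice of s5 gives s1, s3 and s5, so s2 and s4 can be ignored when you are only trying to understand what reaches the print.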
Other good ways to get your head around code (speaking as a software engineer, rather than a guy promoting his company):
About "new tools" (Score:3, Interesting)
Well ... some good points, and some I'd say are too detailed at this point.
I totally agree with point (1). I forgot to mention it since I assumed (always a bad thing) that the author actually could compile and run the thing. An important point to keep in mind. Thanks for bringing it up.
Points (2)-(5) however all come after you've understood the basic structure of your code base.
Next, I'd say that a fairly junior software engineer trying to tackle a large unknown code-base without proper tools is doomed to failure no matter what. So the step from "If you're in a rush" and "You are in a paid job and expected to deliver predictable results." to "forget about tools you're not familiar with and just dive in" is an exercise in self-delusion and a recipe for disaster. Nothing less. It's like someone rushing out of the house and sprinting for work because they don't know where they put the car keys or their bus ticket and feel they are too much in a rush to search for them.
Besides, producing automated documentation is a good way to communicate. The tool communicates the structure of the code-base to you, and you can use e.g. the call-graphs to (efficiently) communicate the complexity (or otherwise) of the code-base to your supervisor. It also communicates to him how you are approaching the problem, which is likely to be a plus.
Now suppose the codebase is really difficult. A competent software engineer is, like any other kind of engineer, co-responsible for making actual and potential trouble spots *visible* to management. Preferably before they explode. Although it's popular wisdom to despise Management, if you, the hands-on person, don't tell Management of the problems, you ensure that they're driving blind. You rob them of the chance to do anything about it before the problem becomes so acute that even they'll have to notice. They will recognise it if you do and keep it in mind when they have to assess you. Depend on it. Besides, you just happen to be the only one who can tell them, and you fail in your responsibility if you don't. Part of a software engineer's job is to *communicate*. Now you can't give your supervisor any honest estimate of how well you have the new code base under control before you get to know it. And tools really, really help you save time and allow you to get a much better overview.
Communication works both ways. If, with all the tools you use, you are unable to understand the code-base, you lack one or perhaps two elements that distinguish a basic software engineer from a good or even a great one: talent and experience. And you should be honest with yourself and your supervisor about that too. If the job really is too hard for you, have the guts to own up before you mess up and thereby save yourself and your company a lot of trouble. And believe me ... there are lots and lots of good jobs in software development / maintenance that can be done without a surfeit of either. Such is the power of engineering.
Now Doxygen (or similar tools) may be unfamiliar to the author, but such tools really work. Besides, I've seen students download, understand, and use Doxygen in less than 1 hour after they were told about it.
Been there, will never return (Score:3, Interesting)
I had a very wise undergrad EE prof who said on the first day of design class that we needn't worry about the many "complicated" things that we would have to design during the course because we had already completed all of our circuit analysis courses. He said it's much harder to figure out the details of someone's design than to design it yourself. The same applies here in software. I've been there, working with others' undocumented code, and quite frankly it was infrequently that I left the project with more respect for the programmer. Here I'll just say what I learned from the experiences, as useless as it might be.
If the coding style used is appropriate you stand some chance. Lines of code don't matter much when behavior is sufficiently complex that you cannot list the states and events that trigger execution and state change let alone keep track of them in your head long enough to understand their context.
I once had a similar problem with some legacy OS-9 C code that performed a simple communication task and updated a monitor. With no documentation from the writers I was to "simply add some new data to be collected and display it." The problem with this 3000 LOC was that it was written as a state machine with no modularization - next to impossible to follow in a debugger. What I wanted to do was run a performance analyzer along with the code, but I was told that was "out of budget". This would have told me at least which parts of the code were being executed frequently, and I could have started to associate the external events with the code processing.
On very large applications like AT&T's RNS (residential account management for BellSouth) that exceed a million lines of C code, the only thing that made the application workable for new features was the fact that it was created in a CMM III product environment, thus it was well documented in design, development, testing, feature changes, bug fixes, etc. Even with all of this, the number of processes and related data stores still showed a lot of bleed-over and function duplication (there was no simple way to determine if a function was in existence that already did what you needed, and it was even harder to determine if it was state-data dependent and thus unusable in certain other states). Attempts by us (contract coders mainly) to get the company to allow us to build a function-finding-tool/database to eliminate this problem fell on mostly deaf ears.
Because of this we had to depend on the longer-lived of the system architects to get an idea of where functionality existed. There were many times though when no one knew, and weeks had to be spent reverse engineering communication structures and what the heck undocumented stretches of code did, rewriting the documentation correctly, and only then starting to implement the feature or correct the problem that had "been there for years." Management did not like the time taken to repair poor coding, as this was not included as one of our trackable metrics and therefore not in our feature/bug's budget (since it was not considered to be either).
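For what it's worth, when no profiler is available, a poor man's substitute for "which parts run most and on which events" is to count state transitions as they happen. The sketch below shows the idea in Python on an invented three-entry transition table; the states, events and the run driver are all hypothetical and have nothing to do with the actual OS9 code described above.

```python
from collections import Counter

class TransitionRecorder:
    """Count how often each (state, event, next_state) triple occurs."""
    def __init__(self):
        self.counts = Counter()

    def record(self, state, event, next_state):
        self.counts[(state, event, next_state)] += 1

    def report(self):
        # Most frequently taken transitions first.
        return self.counts.most_common()

# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    ("IDLE", "byte_received"): "READING",
    ("READING", "byte_received"): "READING",
    ("READING", "end_of_frame"): "IDLE",
}

def run(events, recorder, state="IDLE"):
    """Feed events through the table, recording every transition taken."""
    for event in events:
        # Unknown (state, event) pairs leave the state unchanged.
        next_state = TRANSITIONS.get((state, event), state)
        recorder.record(state, event, next_state)
        state = next_state
    return state
```

Feeding in a realistic event stream and reading report() from the top gives you the hot transitions first, which is a reasonable place to start matching external events to code.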
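A function-finding tool of the kind described does not have to start out sophisticated. As a rough sketch only, and assuming a crude regular expression is acceptable for a first pass, something like the following builds a name-to-location index from C source text. It is not a parser: it will miss macros, function pointers and K&R-style definitions, and it says nothing about state-data dependence. The names index_functions and find are invented for the example.

```python
import re
from collections import defaultdict

# Crude pattern for a C function definition starting in column 0:
# some return-type text, a name, a parameter list, then an opening brace.
FUNC_DEF = re.compile(
    r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{",
    re.MULTILINE,
)
# Control keywords that can look like a function name to the pattern above.
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

def index_functions(sources):
    """Map function name -> list of (filename, line number) definitions.

    sources is a dict of filename -> file contents.
    """
    index = defaultdict(list)
    for filename, text in sources.items():
        for match in FUNC_DEF.finditer(text):
            name = match.group(1)
            if name in KEYWORDS:
                continue
            line = text.count("\n", 0, match.start(1)) + 1
            index[name].append((filename, line))
    return dict(index)

def find(index, fragment):
    """Case-insensitive substring search over the indexed function names."""
    return sorted(name for name in index if fragment.lower() in name.lower())
```

Even an index this crude answers "does something with 'account' in its name already exist, and where?", and duplicates show up as names with more than one location.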
RNS sounds bad but it was a breeze compared to that tightly optimized state machine code without documentation. So, my recommendations are:
1) If it is stream-of-thought code (kind of like Faulkner's The Sound and the Fury), not modularized, not documented, tell your manager that it most likely will have to be re-designed to understand it fully. That means do an essential model of its processing and data stores, use-cases, objects and events, or whatever rigorous methodology you prefer. Then use that to re-write it. If management doesn't want to do that, then you do not work for a company interested in maintainable code but for one that wants a cheap fix. I would leave as soon as you get from them what they took from you in suckering you into the place.
2) If it is structured and/or developed in a "self-documenting-language" like Ada, Modula, Eiffel, etc. that forces structure (or at least makes it easier to write structured rather than unstructured), finish documenting it properly a