Software

Tools For Understanding Code? 383

ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?"
This discussion has been archived. No new comments can be posted.

  • Stepping Through (Score:5, Insightful)

    by blaster151 ( 874280 ) * on Friday January 18, 2008 @12:38PM (#22094862)
    I've always found that stepping through the debugger at runtime is a decent way to start making sense of a large code base. Easier, anyway, than trying to read static code printouts. Just set a breakpoint at a point of interest, fire up the application, and use it as a starting point. You get a sense for program flow and it's a great way to generate questions--lots of them. (What does class SuchAndSuch do? It looks like the application is handling remoting in such-and-such a fashion; is that right?) You can also choose one aspect of the architecture and selectively ignore or step over other aspects, building up your understanding one aspect at a time. In my case, with Visual Studio as a development environment, I can hover the mouse cursor over variable names to see their current values. In the case of variables of a certain type, like datasets or XML structures, I can use realtime visualizers to browse the contents and get a much better feel for what's going on.

    If there's no one at your company that can help answer your questions and bring you up to speed, I feel for you - your employers ought to know enough to give you some extra margin. It can be very hard to take over a large code base without some human-to-human handover time.

    Also, is it an object-oriented system? I assume that it's not, based on your post, but you don't say either way. If it is, the important aspects of program flow often live in the interactions between classes and objects and the business logic is decentralized. OO is great, but it can be harder to reverse-engineer business logic because it's distributed among various classes. A debugger that lets you step through running code is almost essential in this case.
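When no interactive debugger is at hand, a rough approximation of "stepping through" is to log every function call as the program runs. The sketch below uses Python's `sys.settrace` purely as an illustration; the thread is mostly about C and C++, and the functions `load_config`, `connect` and `main` are hypothetical stand-ins for an inherited application.

```python
import sys

def trace_calls(func, *args):
    """Run func and return (depth, function name) for every Python call made."""
    calls = []
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":
            depth -= 1
        # Returning the tracer keeps 'return' events coming for this frame.
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return calls

# Hypothetical stand-ins for the unfamiliar code base.
def load_config():
    return {"retries": 2}

def connect(config):
    return "connected (retries=%d)" % config["retries"]

def main():
    config = load_config()
    return connect(config)

calls = trace_calls(main)
for depth, name in calls:
    print("  " * depth + name)
```

The indented listing gives a first picture of program flow, which can then guide where to set real breakpoints.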
  • Paper (Score:2, Insightful)

    by raddan ( 519638 ) on Friday January 18, 2008 @12:41PM (#22094918)
    You should really be sitting down and attempting to understand the code, ASAP. Asking Slashdot for fancy tools isn't really going to help you. The real barrier here is your own brain.
  • Ctags (Score:3, Insightful)

    by pahoran ( 893196 ) * on Friday January 18, 2008 @12:42PM (#22094948)
    Google Exuberant Ctags and learn how to use the resulting tags file(s) with vim or your editor of choice.
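To make the tags file less of a black box: each entry is a tab-separated line holding the symbol name, the file it lives in, and a search pattern that locates it. The sketch below parses that layout in Python. The sample entries (`init_device`, `main` and their paths) are made up for illustration, and an editor such as vim reads the file natively, so this is only meant to show what the index contains.

```python
def parse_tags(text):
    """Return {symbol: [(file, search_pattern), ...]} from tags-file text."""
    index = {}
    for line in text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # skip blank lines and the metadata header lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a well-formed entry
        name, path, address = fields[0], fields[1], fields[2]
        if address.endswith(';"'):
            address = address[:-2]  # drop the marker that precedes extension fields
        index.setdefault(name, []).append((path, address))
    return index

# Hypothetical sample in the extended tags format.
SAMPLE = (
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n"
    "init_device\tdrivers/dev.c\t/^int init_device(void)$/;\"\tf\n"
    "main\tsrc/main.c\t/^int main(int argc, char **argv)$/;\"\tf\n"
)

index = parse_tags(SAMPLE)
for symbol, locations in sorted(index.items()):
    print(symbol, locations)
```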
  • by wampus ( 1932 ) on Friday January 18, 2008 @12:43PM (#22094972)
    Sometimes it's hard to follow execution, especially in a large codebase. It's made even more difficult when a smug jackass wrote it to be as terse as possible.
  • by Jeremi ( 14640 ) on Friday January 18, 2008 @12:44PM (#22094992) Homepage
    One might as well ask, why are you posting smarmy retorts when you clearly didn't understand the question? The question was about understanding the program, not the underlying language.
  • by daVinci1980 ( 73174 ) on Friday January 18, 2008 @12:47PM (#22095068) Homepage
    This post is dead on.

    Place a breakpoint somewhere you think will get hit (e.g. main), and then start stepping over and into functions. I usually attack this problem as follows:

    Place breakpoint. Use step-in functionality to drop down a ways into the program, looking at things as I go. What are they doing, how do they work, etc.

    Once I feel like I understand how a section of code works, I step over that code on subsequent visits. If I feel like this isn't taking me fast enough, I let the program run for a bit, then randomly break the program and see where I am.

    Lather, rinse, repeat.

    Also, this should go without saying, but you should ask someone who works with you for a high-level overview of what the code is doing. The two of these combined should get you up to speed as quickly as possible.
  • by Anonymous Coward on Friday January 18, 2008 @12:51PM (#22095144)
    Seriously folks, having spent large chunks of my working life having to decipher the mess of those who came before me I cannot stress enough the importance of clear comments, variable/function names, and consistent and readable syntax. AND WRITE F@#$%ing HUMAN READABLE DOCUMENTS DESCRIBING FUNCTIONAL REQUIREMENTS, ALGORITHMS USED, LESSONS LEARNED, ETC.
    Calling all your variables "pook" or the like may be very cute, but does not help me figure out what the heck the function is supposed to do or why I would ever want to call it. Yes it's a pain. Yes we're all under time deadlines and want to get it working first and go back and document it later. And yes, it WILL bite you in the ass (ever heard of karma? your own memory can go and then you have to decipher your OWN code!).

    That said, if you have inherited a code base from someone who ignored the above, go through and generate the documentation yourself. Write flow charts and software diagrams showing what gets called where and why. Derive the equations and algorithms used in each piece and figure out why the constant values are what they are. Finally, start at the main function or reset vector (I do a lot of microcontroller development) and trace the execution path.
  • Osmosis (Score:3, Insightful)

    by Greyfox ( 87712 ) on Friday January 18, 2008 @12:51PM (#22095150) Homepage Journal
    If the original developer made useful comments that will help immensely. If there's a design document showing how the program fits together that helps a lot. If there's a process document explaining the business logic the application implements, that helps a lot. On average you'll start with a marginal code base with no comments, no design documents and no explanation of what the application is attempting to accomplish.

    Get the guys who use it to explain what they're trying to do, read the code for a couple of days and then have them show you how they use the application. Then plan on six months to a year to get to the point where you can look at buggy output and know immediately where the failure is occurring. In the meantime just work in it as much as you can and don't try to redesign major parts of it until you know what it's doing.

  • by namgge ( 777284 ) on Friday January 18, 2008 @12:52PM (#22095174)

    and an implicit expectation that I'll grok it all Real Soon Now

    It is unlikely that your job is really to 'grok it all'. Most likely there are specific issues that need to be solved - stop panicking and pick the simplest one on the list and start working on it.

    In a similar position to you, I followed Brooks's advice to study the data structures and found it good. Running the application under a debugger, inserting breaks in important-looking code and then having a look at the call stack when that code was used, also proved enlightening. A good debugger also lets you explore the data structures.

    When smart-asses tell you "Bill would have fixed that in ten minutes." I recommend replying "I never met Bill, why do you think he left?"

    Namgge

  • Re:The best tool (Score:2, Insightful)

    by Anonymous Coward on Friday January 18, 2008 @01:02PM (#22095406)
    The best programmers I've ever worked with didn't have degrees. But some of the worst ones did.
  • by gaspyy ( 514539 ) on Friday January 18, 2008 @01:19PM (#22095734)
    I'm appalled by some of the comments that imply that the poster may not be fit for the job.

    A few years back I had to maintain a large module written in C#. I had about 200K lines of code, 50 classes, zero documentation, zero comments, zero error logging support, and I was expected to find and fix bugs and add functionality the day after the module was handed over.

    So if you were never in this position, just STFU. Yeah, the code is there, but what is this flag for? Is this part really used, or is it obsolete? What are the side-effects of using that method? And so on...

    Eventually, I learned it, especially after some intensive debugging sessions, but it was frustrating to say the least. I would have loved to have some aiding tools.
  • Mod parent up (Score:5, Insightful)

    by mccrew ( 62494 ) on Friday January 18, 2008 @01:28PM (#22095944)
    Sorry, no points today to mod you up myself.

    I would suggest a slight variation on the theme. Fire up the application, start it on one of its typical tasks, and then interrupt it in the debugger to catch it. While the process is stopped mid-flight, take note of the call stack to see which classes and methods are being used. Maybe step through a few calls, then let the program run some more.

    By doing this repeatedly, you will quickly get a sense for which parts of the code see the most action, and would provide the most obvious places to start studying the code base, and provide the best bang-for-buck return on your time.
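The interrupt-and-look routine described above can be automated as a crude sampler: a background thread periodically grabs the main thread's call stack and counts which functions appear on it. This is a minimal sketch in Python, assuming CPython (it relies on `sys._current_frames`, which is documented as being for specialized purposes); `hot_loop` is a hypothetical stand-in for the application's typical task.

```python
import collections
import sys
import threading
import time
import traceback

def sample_stacks(target_thread_id, counts, stop_event, interval=0.005):
    """Periodically record which functions are on the target thread's stack."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            for entry in traceback.extract_stack(frame):
                counts[entry.name] += 1
        time.sleep(interval)

def hot_loop(duration):
    """Hypothetical stand-in for the busy part of the application."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total

def run_with_sampling(func, *args):
    """Run func in this thread while a sampler thread watches its stack."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), counts, stop_event),
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts

counts = run_with_sampling(hot_loop, 0.2)
for name, hits in counts.most_common(5):
    print(name, hits)
```

Functions that show up in most samples are the ones seeing the most action, which is the same signal as breaking into the debugger by hand, just gathered more often.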

  • Re:Paper (Score:3, Insightful)

    by cjonslashdot ( 904508 ) on Friday January 18, 2008 @01:30PM (#22095996)
    I agree. I have found that it is fairly easy to uncover program structure. But UNDERSTANDING the intention of each line or function is another matter. This is where one wishes that there were documentation of design decisions. This is why whenever I build something I simultaneously maintain a design document in which I record each decision that I make and each pattern that I devise and use. As I revisit decisions, I do it in the design, and only when I have worked out the design do I try to code it. This is not the traditional "big up front design" - it is an agile approach to design, attacking it incrementally and in a just-what-is-needed manner.
  • by smitth1276 ( 832902 ) on Friday January 18, 2008 @01:39PM (#22096182)
    That doesn't always work for a code base with millions of lines of atrociously written code. I've worked with code where it is absolutely not feasible to step through everything.

    It seems like in those cases I end up working from effects... I note some program behavior and then try to find exactly what causes that behavior, which can be surprisingly difficult if you are dealing with the "right" kind of code. After a while, though, the patterns begin to emerge in the system as a whole.
  • by dupup ( 784652 ) on Friday January 18, 2008 @01:41PM (#22096220)
    The parent post is correct, IMO. In fact, I have found that, for me, the easiest way to start understanding a new code base is to jump in with a bug or two to fix. It's a little painful at first, but a specific goal combined with judicious use of the debugger will help you understand how the system works more quickly.
  • Absolute tosh ! (Score:5, Insightful)

    by golodh ( 893453 ) on Friday January 18, 2008 @01:43PM (#22096258)
    An interesting post, even if it's absolute tosh. No-one in his right mind tackles a new code-base of any size or complexity with nothing but a printout. Not if he's expected to understand how it works and/or maintain it in a responsible way.

    In fact, it nicely highlights the difference between "software engineers" and "code monkeys". Code monkeys just dive in; they never pause to think. In fact ... they tend to avoid thinking. It's not their strong point. After all ... they're paid to code, right? Not to think. Software engineers on the other hand, look before they leap and spot the places where they need to pay attention first. And they're systematic about it.

    In fact, a software engineer will happily spend a day or two putting the right tools in place, *including* a full backup and a proper version management system for when he's going to have to touch anything.

    The first thing you want to know about a new code base (after you find out what it's supposed to be doing) is its structure. Tools like Doxygen (see previous posts) show you that structure *far* quicker and *far* more reliably than any amount of dumb code-browsing can. And besides ... once you do it, you've got that documentation stashed away securely instead of milling around incoherently in your head (you'll have completely forgotten most of what you read by next month) or on disorganised pieces of note paper.

    The second thing is to figure out if it calls any "large" functionalities like subroutine libraries or even stand-alone programs like databases, let alone if it makes operating system calls. The call-tree will give you an excellent view, and the linker files can complete the picture. You wouldn't be the first maintenance programmer who found out after months that his application critically depends on some other application he wasn't told about.
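To illustrate what a call tree buys you, here is a toy static call-graph extractor. It is only a sketch: it works on Python source via the standard `ast` module rather than on C or C++, it sees direct calls by name only (not method calls or function pointers), and the `SOURCE` snippet is invented for the example. Real documentation tools do far more than this.

```python
import ast

# Hypothetical source text standing in for part of an inherited code base.
SOURCE = '''
def read_input(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    words = parse(read_input(path))
    print(len(words))
'''

def build_call_graph(source):
    """Map each top-level function to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only plain-name calls such as parse(...); attribute calls are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = sorted(callees)
    return graph

graph = build_call_graph(SOURCE)
for caller, callees in graph.items():
    print(caller, "->", ", ".join(callees) or "(no direct calls)")
```

Names that appear as callees but are not defined in the code base (here `open`, `print`, `len`) are exactly the external dependencies the paragraph above says to look for.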

    The third thing is to see where your code does dirty things. Let the compiler help you. Just compile your application with warnings on and have a look at what the compiler comes up with. You might be surprised (and horrified). Then compile with the settings used by your predecessor and check that your executable is bit-for-bit identical to what's running (you wouldn't be the first sucker who's given a slightly-off code base).
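The bit-for-bit check can be scripted by comparing cryptographic hashes of the two executables. A minimal sketch, assuming both files are reachable on disk; the file names and contents below are throwaway placeholders. Note that some toolchains embed timestamps or paths in their output, so a mismatch is a prompt to investigate rather than proof of a different code base.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_binary(path_a, path_b):
    """True when both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)

# Demo with throwaway files standing in for the deployed and rebuilt binaries.
with tempfile.TemporaryDirectory() as workdir:
    deployed = os.path.join(workdir, "app_deployed.bin")
    rebuilt = os.path.join(workdir, "app_rebuilt.bin")
    with open(deployed, "wb") as handle:
        handle.write(b"\x7fELF demo bytes")
    with open(rebuilt, "wb") as handle:
        handle.write(b"\x7fELF demo bytes")
    identical = same_binary(deployed, rebuilt)
    with open(rebuilt, "wb") as handle:
        handle.write(b"\x7fELF demo bytez")
    differs = not same_binary(deployed, rebuilt)

print("identical build:", identical)
print("modified build detected:", differs)
```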

    If performance is at all important, then running the whole thing for a night on a standard case under a good profiler will also tell you lots of important things. Starting with where your code spends its time, where it allocates memory and how much, and where the heavily-used bits of code are. All neatly written down in the profiler logs.

    Finally, run your application with a tool to detect memory management errors the first chance you get. Useful tools are Valgrind (in a Linux environment), Purify (expensive, but probably worth it) under Windows, and sundry proprietary utilities under Unix. Just about 90% of the errors made in C programs come from memory management problems, and half of them don't show up except through memory leakage and overwritten variables (or stacks .. or buffers .. or whatever). You'll need all the help you can get here, and as far as these errors are concerned, dumb code browsing is useless. Just keep your head when looking at reports from such tools ... they can throw up false positives. Ask around on a forum with specific questions if you're allowed, or ask your supervisor. After all ... you showed due diligence.

    When you know all that (if you have the tools in place, all of this can be done within 1 day + 1 overnight run + 1 hour reading the profiler output), go ahead and trace through the code in a debugger. You'll be in a *far* better position to judge what you should be reading.

  • by swillden ( 191260 ) <shawn-ds@willden.org> on Friday January 18, 2008 @01:49PM (#22096420) Journal

    ...if you don't understand the language?

    Yes, it's hard to understand questions when you don't understand the language.

    I'm sure you can find some remedial English classes if you look.

  • by JesterXXV ( 680142 ) <jtradke@@@gmail...com> on Friday January 18, 2008 @01:50PM (#22096430)
    I don't think there's any replacement for talking to the real-live developers who wrote it. Failing that, any design documentation they left behind. Failing that, just get a task to do, and try to get it to work. Nothing like learning by doing.
  • by smittyoneeach ( 243267 ) * on Friday January 18, 2008 @01:50PM (#22096442) Homepage Journal
    I think unit tests are actually better, for code that is suited to being driven externally.
    Pick a tool to wrap something, start writing little bits to exercise the code.
    You can comment and version unit tests, giving a sense of history.
    Debuggers, on the other hand, mostly exist in the present tense.
    Sure, you learn something now, but how about some breadcrumbs for later?
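A minimal sketch of such breadcrumbs, using Python's `unittest`: each test pins down one observed behaviour of the inherited code, and the test name records what was learned. `legacy_price` and its discount rule are hypothetical stand-ins for a real function whose rules you are discovering.

```python
import unittest

def legacy_price(quantity, unit_cost):
    """Hypothetical stand-in for an inherited function with undocumented rules."""
    total = quantity * unit_cost
    if quantity >= 10:
        total = total * 0.9
    return round(total, 2)

class LegacyPriceCharacterization(unittest.TestCase):
    """Each test documents behaviour observed while exploring the code."""

    def test_small_order_has_no_discount(self):
        self.assertEqual(legacy_price(2, 5.0), 10.0)

    def test_ten_or_more_items_get_ten_percent_off(self):
        self.assertEqual(legacy_price(10, 5.0), 45.0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacyPriceCharacterization)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("tests run:", result.testsRun, "all passed:", result.wasSuccessful())
```

Checked into version control, tests like these keep the history of what was understood and when, and they fail loudly if a later change alters the behaviour.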
  • by Assmasher ( 456699 ) on Friday January 18, 2008 @01:51PM (#22096462) Journal
    I certainly think that stepping through is by far the most valuable method; however, it can be difficult when dealing with asynchronicity and/or parallelism. In those cases, commenting is the only solution that seems to help me... LOL.
  • by orclevegam ( 940336 ) on Friday January 18, 2008 @02:10PM (#22096858) Journal
    Much as I would love to agree with you, unfortunately the world isn't always so accommodating. Sometimes you have to suck it up and stay with a job till you can find something better, and most employers won't let you toss anything out, let alone a major chunk of their code base. Doesn't matter if it's utter crap, they paid for it, and as far as they're concerned turd polishing is better than starting from scratch even if starting from scratch would be a hell of a lot cheaper. Can't expect MBAs to understand the difference between good code and bad code, to them it's all just code, and as far as they're concerned, the more the better. It's the old idiotic idea that more lines of code means a better product, therefore anything that reduces lines of code must be a bad thing.
  • by JonTurner ( 178845 ) on Friday January 18, 2008 @02:32PM (#22097388) Journal
    An *excellent* strategy and thorough explanation. Especially the bit about stopping to think and devise a plan rather than just diving in headfirst. All spot on!

    The only thing I could possibly add is to say "gather resources to understand the *purpose* of the system", either through documentation or by speaking with project management and/or end users. If you can learn the business rules and processes, that will be an enormous help in understanding the code's design.
  • Been there... (Score:5, Insightful)

    by seanadams.com ( 463190 ) * on Friday January 18, 2008 @02:40PM (#22097548) Homepage
    There are two kinds of hard problems in programming: problems that are hard because they require ingenuity and deep thought, and problems that are hard because they require weeks of unraveling someone else's garbage.

    There are some horrible programmers out there and I have on many occasions been tasked with cleaning up their messes. In your situation I would suggest either a) try to figure out if it would take less time for you to implement it in a clean and maintainable way or b) find someone else you can hire who knows the code base or at least is more familiar with the specific problem.

    If you can't do a or b then you're screwed. In that situation, personally, I would either quit, ask for a different project, or print out the whole source code and sit back with a pen and start studying and commenting - one of the few tasks for which I still prefer dead trees.
  • by superwiz ( 655733 ) on Friday January 18, 2008 @02:57PM (#22097920) Journal
    The guy asked about a large code base. I am assuming that means on the order of at least half a million lines. Stepping through the code won't even get you into most modules of something that big. Never mind that it will do nothing to help you understand that a certain chunk of the code is a module that gets used only under certain extraordinary conditions. To be sure, what you suggest is what you do on day 1. The post was essentially asking what do you do three weeks into it after you've understood what the loop in main does and yet you still don't know what's tied to what and how.
  • Re:hmm. (Score:2, Insightful)

    by SageinaRage ( 966293 ) on Friday January 18, 2008 @03:31PM (#22098576)
    It's more like a carpenter asking for a nail gun because it's quicker, less tiring, with less chance of damaging themselves. Any carpenter with any sense would ask for one, just like any coder with any sense would ask for these tools.
  • Re:Been there... (Score:5, Insightful)

    by skiflyer ( 716312 ) on Friday January 18, 2008 @03:31PM (#22098578)
    a) is so often the wrong choice and can really submarine a company because they keep getting a cycle of a)'s ... every 5th release becomes a complete rewrite as the new team says "we need a refactoring of the code, no one here is familiar with it and/or it's spaghetti code, just give us 5 months we'll maintain the behavior 100% and we'll clean up a lot of bugs and we promise in the future maintenance will be a breeze"

  • by ChrisA90278 ( 905188 ) on Friday January 18, 2008 @03:45PM (#22098808)
    "That doesn't always work for a code base with millions of lines of atrociously written code. I've worked with code where it is absolutely not feasible to step through everything"

    You are correct. All these people talking about using a debugger and so on... That does NOT work on larger projects, only on fairly simple ones. "Large" projects might have 250 source code files and thousands of functions or classes and likely a dozen or so interacting executable programs. I've seen printouts of source code that fill five bookcase shelves. No one could ever read that.

    I've had to come up to speed on million+ lines of code projects many times. The tool I use is pencil and paper.

    The first step is to become an expert user of the software. Just run the thing, a lot, and learn what it does. Looking at code is pointless until you know it well as a user.
  • by Anonymous Coward on Friday January 18, 2008 @03:55PM (#22098986)
    Don't knock the gotos: they have a legitimate use in environments where exceptions can't be used. I once saw a 1000-2000 line function which should have had an end block and a bunch of gotos to it (for "bail out now but clean up" type situations - you'll see these all over the Linux source). Whoever wrote the function obviously got taught not to use gotos in his programming class, so instead he put all the cleanup code into a ~100-line macro and called it wherever he wanted to bail out.

    Think about that: you can't step most debuggers through that cleanup code, you can't set breakpoints on specific parts of it, it probably won't be properly syntax-highlighted, and you have N inline copies of the code in your executable.
  • by no-body ( 127863 ) on Friday January 18, 2008 @04:15PM (#22099316)
    Run it through a profiler - giving function names, times called, cpu time used, calling hierarchy/tree

    if... there is such an animal around still for the environment in question.
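As an illustration of the kind of report a profiler gives, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The workload (`parse_record`, `load_records`, `run`) is a hypothetical stand-in; for C or C++ you would use whatever profiler your platform provides instead.

```python
import cProfile
import pstats

# Hypothetical workload standing in for the application's typical task.
def parse_record(line):
    return line.split(",")

def load_records(lines):
    return [parse_record(line) for line in lines]

def run():
    lines = ["a,b,c"] * 1000
    return load_records(lines)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

stats = pstats.Stats(profiler)
# Function names, call counts and times, most expensive call chains first.
stats.sort_stats("cumulative").print_stats(5)
# Which functions called parse_record, i.e. one slice of the calling tree.
stats.print_callers("parse_record")
```

The first table covers names, call counts and CPU time; the callers report shows a piece of the calling hierarchy without anyone having to step through it by hand.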
  • Re:Mod parent up (Score:4, Insightful)

    by ckaminski ( 82854 ) <slashdot-nospam.darthcoder@com> on Friday January 18, 2008 @04:19PM (#22099386) Homepage
    The best way to learn the code is to start fixing some low or medium severity bugs. Something that's not a sev1 is neither so endemic to the system that changing it breaks everything, nor likely to be some random data corruption issue that will be impossible to find. It will be stupid user-input problems, or interaction issues.

    Most of my productive code learning was in the first three months of bug-fixing. I think that's why most new hires end up on bug fixing as a rule - it's the fast-path to comprehension.

  • by jgarra23 ( 1109651 ) on Friday January 18, 2008 @04:57PM (#22100104)

    Clearly you don't write (or at least read source for) applications of any substance as that would be mildly described as tedious if not impossible.


    I have no idea how you inferred this from the parent's 2 or 3 sentences.


    One of the best ways to understand code is to do so visually with the software equivalent of blueprints. UML is generally considered a very capable way of modeling/communicating both static structures and dynamic behavior of software.


    A lot of times a programmer is stuck without those tools for any number of reasons. A lot of times people are stuck with spaghetti code for which there is no documentation or design pattern to work with. I think your answer is assuming that the planets are aligned and we live in a utopia. Do you have any suggestions for people who have to deal with reality?
  • by Anonymous Coward on Friday January 18, 2008 @06:16PM (#22101536)
    It just sucks around here, too ... some 40k lines of code plus some more in zend encoded libraries: out of those 40k of lines, 20k are in a big file that's just one big mangrove of if's; what makes it suck even more, register_globals must be ON, so I guess anything might happen anyplace.

    When I am asked when I will be ready, I just say: "I have no f****g idea, better start from scratch or fire me".

    For the fellows that said 'just read the code, loser', my answer is "it's probably one of you that took wages for three years to write such a big chunk of junk".
  • Re:Mod parent up (Score:4, Insightful)

    by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Friday January 18, 2008 @06:34PM (#22101844) Homepage Journal

    By doing this repeatedly, you will quickly get a sense for which parts of the code see the most action, and would provide the most obvious places to start studying the code base, and provide the best bang-for-buck return on your time.

    If only there were some way to automatically generate this information, this "profile" of the running code, if you will.
