Software

Tools For Understanding Code? 383

ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?"
This discussion has been archived. No new comments can be posted.

  • Doxygen (Score:5, Informative)

    by Raedwald ( 567500 ) on Friday January 18, 2008 @12:39PM (#22094886)

    For C++ code, Doxygen [stack.nl] can be useful, as it shows the class inheritance. As requested, it uses a (rudimentary) parser. It works with several other languages too, although I can't vouch for its utility for them.

  • doxygen (Score:3, Informative)

    by greywar ( 640908 ) on Friday January 18, 2008 @12:41PM (#22094922) Journal
    If it's in a language that Doxygen can understand, that's the tool I would HIGHLY recommend.
  • Understand C++ (Score:5, Informative)

    by SparkleMotion88 ( 1013083 ) on Friday January 18, 2008 @12:43PM (#22094978)
    Sorry I don't have an open source tool for you, but I've used Understand for C++ [scitools.com] in the past and it was pretty helpful. To me, the most useful piece of information for understanding a large codebase is a browseable call graph. I'm sure there are simpler tools out there that generate a call graph, but this is the only one I've used with C++.
  • RR & EA (Score:3, Informative)

    by Anonymous Coward on Friday January 18, 2008 @12:44PM (#22094988)
    Sometimes tools like Rational Rose [ibm.com] or Enterprise Architect [sparxsystems.com.au] are successful at reading in the code and building a UML model that you can then attempt to parse through. I'm not familiar with the use of either, but I know it can be done, with mixed results depending on the size and complexity of the code being analyzed. Both tools are fairly expensive though, I believe.
  • lxr (Score:1, Informative)

    by Anonymous Coward on Friday January 18, 2008 @12:45PM (#22095026)
    I often use LXR for understanding the kernel, but have used it for other large code bases. If you pair it with some sort of sticky note firefox add-on it becomes particularly useful.

    http://lxr.linux.no/ [linux.no]
  • What I do (Score:5, Informative)

    by laughing_badger ( 628416 ) on Friday January 18, 2008 @12:48PM (#22095078) Homepage
    SourceNavigator : A good visualisation package http://sourcenav.sourceforge.net/ [sourceforge.net]

    ETrace : Run-time tracing http://freshmeat.net/projects/etrace/ [freshmeat.net]

    This book is worth a read http://www.spinellis.gr/codereading/ [spinellis.gr]

    Draw some static graphs of functions of interest using CodeViz http://freshmeat.net/projects/codeviz/ [freshmeat.net]

    Write lots of notes, preferably on paper with a pen rather than electronically.

  • by Mr.Bananas ( 851193 ) on Friday January 18, 2008 @12:50PM (#22095130)
    I use Doxygen for C code, and it is really helpful. One of its most useful features is that it generates caller and callee graphs for all functions. You can also browse the code itself in the generated HTML pages, and the function calls are turned into links to the implementation. Data structures and file includes are also pictorially graphed for easy browsing.

    If the system you need to understand has a really big undocumented architecture, then this presentation [uwaterloo.ca] might be useful to you (there is a research paper, but it's not free yet). In it, the authors present a systematic method of extracting the underlying architecture of the Linux kernel.
  • GNU Global (Score:4, Informative)

    by Masa ( 74401 ) on Friday January 18, 2008 @12:50PM (#22095134) Journal
    GNU Global is able to generate a set of HTML pages from C/C++ source code. This tool has helped me several times. All member variables, functions, classes and class instances are hyperlinks. It provides an easy way to examine source code. It also provides tags for several text editors (for Vim and Emacs especially). http://www.gnu.org/software/global/ [gnu.org]
  • Re:Doxygen (Score:3, Informative)

    by zeekec ( 795504 ) on Friday January 18, 2008 @12:59PM (#22095324)
    Doxygen can produce UML diagrams for undocumented code. (UML_LOOK and EXTRACT_ALL)
  • by Anonymous Brave Guy ( 457657 ) on Friday January 18, 2008 @01:01PM (#22095364)

    I'm afraid you've set yourself an almost impossible task. IME, there are no shortcuts here, and it's going to take anywhere from a few months to a couple of years for a new developer to really get their head around a large, unfamiliar code base.

    That said, I recommend against just diving in to some random bit of code. You'll probably never need most of it. Heck, I've never read the majority of the code of the project I work on, and that's after several years, with approx 1M lines to consider.

    You need to get the big picture instead. Identify the entry point(s), and look for the major functions they call, and so on down until you start to get a feel for how the work is broken down. Look for the major data structures and code operating on them as well, because if you can establish the important data flows in the program you'll be well on your way. Hopefully the design is fairly modular, and if you're in OO world or you're working in a language with packages, looking at how the modules fit together can help a lot too. Any good IDE will have some basic tools to plot things like call graphs and inheritance/containment diagrams, if not there are tools like Doxygen that can do some of it independently.
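
    The top-down reading described above (start at an entry point, then look at what it calls, one level at a time) can be sketched in a few lines. This is only an illustration: the call graph below is made up, and in practice the edges would come from whatever tool you use to extract them. The function name `functions_by_depth` and the `max_depth` cut-off are my own assumptions, not part of any tool mentioned in this thread.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees. The names are invented;
# real edges would come from a call-graph extraction tool.
CALL_GRAPH = {
    "main": ["parse_args", "load_config", "run_server"],
    "load_config": ["read_file", "parse_ini"],
    "run_server": ["accept_loop", "shutdown"],
    "accept_loop": ["handle_request"],
    "handle_request": ["read_file", "send_reply"],
}


def functions_by_depth(graph, entry, max_depth=2):
    """Breadth-first walk from an entry point, grouping functions by call depth."""
    depth_of = {entry: 0}
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        if depth_of[caller] == max_depth:
            continue  # do not descend past the requested depth
        for callee in graph.get(caller, []):
            if callee not in depth_of:  # the first (shallowest) visit wins
                depth_of[callee] = depth_of[caller] + 1
                queue.append(callee)
    levels = {}
    for name, depth in depth_of.items():
        levels.setdefault(depth, []).append(name)
    return levels


if __name__ == "__main__":
    for depth, names in sorted(functions_by_depth(CALL_GRAPH, "main").items()):
        print(depth, ", ".join(names))
```

    Capping the depth is the point: you read the first two or three levels to see how the work is broken down, and only go deeper in the branches you actually need.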

    If you're working on a large code base without a decent overall design that you can grok within a few days, then I'm afraid you're doomed and no amount of tools or documentation or reading files full of code will help you. Projects in that state invariably die, usually slowly and painfully, IME.

  • by Yiliar ( 603536 ) on Friday January 18, 2008 @01:04PM (#22095458)
    See:
    http://www.stack.nl/~dimitri/doxygen/ [stack.nl]
    and:
    http://uml.sourceforge.net/index.php [sourceforge.net]

    These tools allow you to 'visualize' a codebase in several very helpful ways.
    One important way is to generate connection graphs of all functions.
    These images can look like a mess, or a huge rail yard with hundreds of connections.
    The modules, libraries, or source files that are a real jumble of crossconnected lines are a clear indication of where to start clean up activities. :)

    Good luck!
  • Re:What I do (Score:1, Informative)

    by Anonymous Coward on Friday January 18, 2008 @01:08PM (#22095536)
    Use Source Insight for C/C++/Java/C#/Perl/ksh/etc. It is a very light yet powerful IDE, and it works well for browsing through code.

    I have used it on code bases of more than 3000 C/C++ files and the IDE still behaved well, just like Eclipse does for Java, and it consumes very little memory.
  • Re:Stepping Through (Score:3, Informative)

    by The_reformant ( 777653 ) on Friday January 18, 2008 @01:09PM (#22095570)
    Absolutely. Since joining the real world I have found the Visual Studio debugger to be my most prized tool. Somehow I managed all through my degree to never come into contact with one (probably because all the free ones are rubbish and most schools won't shell out for Visual Studio). I now extol the virtues of debugging to all and sundry!
  • Where be dragons? (Score:2, Informative)

    by mm4 ( 1089615 ) on Friday January 18, 2008 @01:24PM (#22095862)
    Apart from Understand for C++, I'd also suggest SourceMonitor - http://www.campwoodsw.com/sm20.html [campwoodsw.com] It will at least quickly point you to potentially problematic parts (long functions, deep nesting, etc.).
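
    To give a feel for the kind of metric meant by "deep nesting", here is a deliberately naive sketch that reports the deepest `{ }` nesting in a piece of C-like source. It is not how SourceMonitor computes anything; `max_brace_depth` and the sample function are invented for illustration, and braces inside string literals or comments are counted as well.

```python
def max_brace_depth(source):
    """Return the deepest level of { } nesting in C-like source text.

    Naive on purpose: braces inside string literals or comments count too.
    """
    depth = deepest = 0
    for ch in source:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)  # tolerate unbalanced input
    return deepest


# Made-up C function used only as sample input.
SAMPLE = """
int clamp_sum(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] > 0) {
            total += v[i];
        }
    }
    return total;
}
"""

if __name__ == "__main__":
    print(max_brace_depth(SAMPLE))
```

    Running something like this over every file and sorting by the result gives a rough first list of places worth a closer look.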
  • Browse-by-Query (Score:3, Informative)

    by mmacdona86 ( 524915 ) on Friday January 18, 2008 @01:32PM (#22096040)
    I'll plug my own open-source project for this:
    Browse-by-Query [sourceforge.net] -- it won't help with C/C++ (sorry to the original questioner), but it will handle Java or C#.
    It dumps the code into a database and lets you query it to find the relationships.
    I'm biased, of course, but I've found it's just the thing to understand how a particular piece of functionality in an unfamiliar code base fits into the big picture.
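
    For anyone wondering what "dump the code into a database and query it" looks like in principle, here is a minimal sketch using SQLite. It is an assumption-laden illustration, not Browse-by-Query's schema or query language: the `calls` table, the helper names, and the sample relationships are all invented.

```python
import sqlite3


def build_db(edges):
    """Load (caller, callee) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", edges)
    return conn


def callers_of(conn, name):
    """Return the sorted names of functions that call `name` directly."""
    rows = conn.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        (name,),
    )
    return [row[0] for row in rows]


# Made-up relationships, standing in for what a real indexer would extract.
EDGES = [
    ("OrderService.submit", "Inventory.reserve"),
    ("OrderService.cancel", "Inventory.release"),
    ("BatchJob.run", "Inventory.reserve"),
]

if __name__ == "__main__":
    conn = build_db(EDGES)
    print(callers_of(conn, "Inventory.reserve"))
```

    Once the relationships are rows in a table, "who touches this?" becomes an ordinary query rather than a grep session.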
  • Re:Mod parent up (Score:5, Informative)

    by Lazerf4rt ( 969888 ) on Friday January 18, 2008 @01:53PM (#22096506)

    Fire up the application, start it on one of its typical tasks, and then interrupt it in the debugger to catch it. While the process is stopped mid-flight, take note of the call stack.

    Good advice -- breaking randomly. However, it works best in CPU-intensive applications. If the app is mostly idle and event-driven, you're best off searching the code and looking for a place to set breakpoints.

    Also, when I use the debugger to help understand some new code, often I'll open a text file and build a "trace" as I go. As I explore things in the debugger and find new call stacks, I add more detail to the trace, in a hierarchical (indented) style. Then I save the traces in case I forget something later.
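
    The hand-built trace described above is essentially a tree of merged call stacks. A rough sketch of that idea, assuming each stack is written outermost frame first; the function names `merge_stacks` and `render` and the sample stacks are hypothetical.

```python
def merge_stacks(stacks):
    """Merge call stacks (outermost frame first) into a nested dict tree."""
    tree = {}
    for stack in stacks:
        node = tree
        for frame in stack:
            node = node.setdefault(frame, {})  # reuse shared prefixes
    return tree


def render(tree, indent=0):
    """Render the tree as indented lines, one frame per line."""
    lines = []
    for frame, children in tree.items():
        lines.append("  " * indent + frame)
        lines.extend(render(children, indent + 1))
    return lines


# Hypothetical stacks, as copied by hand from a debugger's call stack window.
STACKS = [
    ["main", "run_loop", "dispatch", "on_paint"],
    ["main", "run_loop", "dispatch", "on_key"],
    ["main", "load_settings"],
]

if __name__ == "__main__":
    print("\n".join(render(merge_stacks(STACKS))))
```

    Each new stack you capture only adds the frames that were not already in the trace, so the file grows into an outline of the program's real control flow.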

    As for the original question, I would recommend staying focused. Don't go all over the program trying to understand every system at once. Pick a specific part you really need to understand (say, based on a task you have to do) and focus on understanding that.

    Unfortunately, the best tool for understanding code is experience. Not theory and not some fancy visualization program. Once you've seen a lot of different code, you come to recognize what each person was thinking when they wrote it. Once that kind of thing comes easily, you no longer find it necessary to bitch about each different programmer's coding style (as some do). So in a way, the guy who posts this question is lucky to have such a big pile of code in front of him.

  • by davide marney ( 231845 ) on Friday January 18, 2008 @01:53PM (#22096512) Journal
    If your project is object oriented, you may be able to get your UML modeling tool to import the code and visualize the classes. When you do this, you'll probably get a HUGE diagram that seems just as unwieldy as looking at the code. The trick is to apply a filter to the model, so you're not overwhelmed with detail. Your UML tool should be able to do that for you.

    I recommend focusing on all interface classes first. This can give you a remarkably sane picture of a system, and will help you divide up the code into more conceptually meaningful chunks.

    The tool I use is Enterprise Architect [sparxsystems.com], which does quite a lot of heavy lifting yet is still inexpensive enough for me to own a personal copy.
  • Re:Doxygen (Score:5, Informative)

    by Bill_the_Engineer ( 772575 ) on Friday January 18, 2008 @02:10PM (#22096854)

    Doxygen I thought did java-doc like parsing for C++? I was thinking he should look for something able to build a UML diagram based on the code... I hate UML, but if there isn't any documentation telling you the structures of the code it might be a place to look.

    Doxygen is more than a javadoc replacement.

    I like Doxygen + Graphviz. Just set Doxygen to document all (instead of just the code with tags) and set it to generate class diagrams, call trees, and dependency graphs and allow it to generate a cross reference document that you can read using your web browser. Set the html generator to frame based, and your browsing of code will be easier. I would also set Doxygen to inline the code within the documentation.

    I've used Doxygen to reverse engineer very large programs and had good luck with it. I will say Doxygen is not going to do all your work for you, but it will make your job easier, especially if you add comments to the code as you figure each section out.

    Now if you'd like to see the logical flow of each method, then try JGrasp (jgrasp.org). It has a neat feature called CSD that allows you to follow the logic of the code a little better. It's a Java-based IDE, so that may be a turn-off for you. I do wholeheartedly recommend Doxygen (w/ Graphviz).

    Good luck.

  • Re:Absolute tosh ! (Score:3, Informative)

    by mabhatter654 ( 561290 ) on Friday January 18, 2008 @02:11PM (#22096886)
    I'd agree. He's being considered a "code monkey" and not a software engineer. Typical situation is that they'll drop some random user problem on his desk after a week "to familiarize himself" then expect him to figure out what program it is and why it broke and suggest a process improvement. Then tell him he's all wrong because "they already tried that 5 years ago."

    The question he's trying to answer is: what does the code "do"? Why does it exist? What problem does it solve? When you inherit some homegrown ERP system, for example, it's easy to find a bug in a routine. Not so easy is working out why input from program A is displayed wrong in program E after being processed by B, C, & D and then stored for a week. He's looking for a quick picture of what it all looks like. In 90% of cases nobody has that info for the CURRENT version of their homegrown system. They might have made the flowcharts, data dictionaries, and code books years ago, but nobody keeps them current and DOCUMENTED. How do you get enough info in a short amount of time?
  • Re:Doxygen (Score:4, Informative)

    by mhall119 ( 1035984 ) on Friday January 18, 2008 @02:28PM (#22097296) Homepage Journal

    Only problem is, it is a pain to configure. Also, windows versions don't look very stable.
    Windows version has been very stable for me, I've not had any problems with either Doxygen or Graphviz. It also includes a configuration wizard that is both easy to understand and powerful. There is also an Eclipse plugin that lets you configure and run Doxygen.
  • More than tools (Score:5, Informative)

    by sohp ( 22984 ) <.moc.oi. .ta. .notwens.> on Friday January 18, 2008 @02:31PM (#22097350) Homepage
    The best tool is your brain, applied liberally. Here are some thoughts to put in it:

    Feathers, Michael. Working Effectively with Legacy Code [amazon.com], Chapter 16 especially.

    Spinellis, Diomidis. Code Reading: The Open Source Perspective [amazon.com], Chapter 10 lists some tools for you.

    My own thoughts now. First, don't trust the comments, they are probably outdated. Second, if it's a big code base, forget the debugger. Write some little unit test cases that exercise the sections of code you need to understand, and assert what you think the code is supposed to do.
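
    A small sketch of what such a test can look like. Everything here is assumed for illustration: `legacy_discount` is an invented stand-in for whatever undocumented routine you inherited (in real use you would import the actual function), and the expected values simply record what the code does today rather than what a spec says.

```python
import unittest


def legacy_discount(total_cents, customer_years):
    """Invented stand-in for an undocumented routine inherited with the code."""
    rate = 5 if customer_years >= 3 else 0
    if total_cents > 100_00:
        rate += 2
    return total_cents - total_cents * rate // 100


class LegacyDiscountBehaviour(unittest.TestCase):
    """Each test records one belief about what the code does today."""

    def test_new_customer_small_order_pays_full_price(self):
        self.assertEqual(legacy_discount(50_00, 0), 50_00)

    def test_long_term_customer_gets_five_percent(self):
        self.assertEqual(legacy_discount(100_00, 3), 95_00)

    def test_large_order_adds_two_percent(self):
        self.assertEqual(legacy_discount(200_00, 3), 186_00)


def run_tests():
    """Run the recorded beliefs and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LegacyDiscountBehaviour)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

    When an assertion fails, either your understanding was wrong or you have found a bug; both are worth knowing, and the passing tests stay behind as a safety net for later changes.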

    Finally, unless you are cursed with a codebase which is not kept in version control (in which case, ugh, time to start the jobhunt up again maybe), then take a look at the revision history. See what changes have been made to the area you are working on. With luck, someone will have put in a revision message that points you towards greater understanding of why a change was made, which will in turn nudge you towards knowing the purpose of the section of code that was changed.

  • My main tool for figuring it all out was to use exuberant ctags [sourceforge.net] to create a tags file, and Nedit [nedit.org] to navigate through the source under Solaris, with a little grep thrown in. I also used gdb with the DDD [gnu.org] front-end to do a little real-time snooping.

    I've since added both cscope [sourceforge.net] and freescope [sourceforge.net], as well as the old Red Hat Source Navigator [sourceforge.net] for good measure.

  • Re:Stepping Through (Score:3, Informative)

    by Nethemas the Great ( 909900 ) on Friday January 18, 2008 @04:31PM (#22099618)

    Clearly you don't write (or at least read source for) applications of any substance as that would be mildly described as tedious if not impossible.

    One of the best ways to understand code is to do so visually with the software equivalent of blueprints. UML is generally considered a very capable way of modeling/communicating both static structures and dynamic behavior of software. There exist any number of tools that are capable of reverse-engineering existing source into UML. Two tools that I consider to be more capable than others are IBM's Rational Rose [ibm.com], and No Magic's MagicDraw [magicdraw.com]. If commercial products aren't a possibility there are likely a number of open-source/free tools--though likely of lesser ability--available. A Google search on "reverse engineering UML" should point you at some.

  • Re:Stepping Through (Score:2, Informative)

    by orclevegam ( 940336 ) on Friday January 18, 2008 @05:01PM (#22100208) Journal
    Yes, I know the difference between their, they're and there. I noticed the mistake after I posted it. It's one of the few mistakes I seem to be prone to in writing.

    As for using "lines of code", I don't, they do. It seems the biggest issue they have with rewriting code (or refactoring if you prefer) has something to do with the way it's budgeted and accounted for. Apparently adding new code/features to a project comes out of a different budget, than replacing or repairing already existing code does. Don't ask me why that is, I just know whenever we've tried to push to replace some horrendous piece of code they would tell us it wasn't in the budget, and as long as the code ran we weren't allowed to change it. We had to work our way around the bean counters eventually by carefully picking features to implement that touched on code we wanted to replace, then as part of implementing the feature we would rip out and rewrite the code we wanted to.
  • Source Insight (Score:3, Informative)

    by Effugas ( 2378 ) * on Friday January 18, 2008 @06:22PM (#22101634) Homepage
    It's inexpensive, and scales astonishingly. I've spent the last two years in it, and it's just how I audit code nowadays.
  • by Anonymous Coward on Friday January 18, 2008 @06:36PM (#22101870)
    Red Hat Source Navigator: a nice free tool for grokking C/C++.
  • Re:Stepping Through (Score:5, Informative)

    by plover ( 150551 ) * on Saturday January 19, 2008 @03:31AM (#22106278) Homepage Journal
    (Warning: you asked!)

    Well, the learning curve is certainly important in the real world, although I expect a professional to know his or her tools before they arrive on the job. But there are a metric crapload of things I like better about Visual Studio that make it a much more effective debugger than gdb, in my opinion. (Note that I am not a big gdb user, so I may be cutting it a bit short in the feature set here. My apologies in advance if I do so.)

    Things I've found I prefer include many tool windows simultaneously showing the states of registers, memory, the call stack, an object or seven (expanded to show a few properties), and automatic resolution of virtually every symbol and name, including the operating system (although you have to download the symbol files for your OS version from Microsoft.) And you still have full navigation through the source.

    Simply hovering the mouse over a symbol will bring up a tool-tip to display the contents. If you highlight an entire expression such as pFoo->pBar->Blah.count+7 and hover, the tooltip will display the calculated result.

    You can set a temporary breakpoint by setting the cursor on a line of code and clicking "run to cursor." You can run, single step, run to the current cursor, or run till function return. That last one is great for re-entering a function multiple times to test different conditions.

    The variables window contains the current call stack as a dropdown list -- changing the stack lets you see the newly-local variables. Watch windows can display data as hex or decimal, just right click and select. Watch entries can even be used as calculators (enter a literal value, such as 0xf0 + 12, and it will display the results.)

    In the watch windows, you can also call arbitrary functions (good for testing without driving your code to that point) or other functions in your memory space, such as the C runtime memory checkers. If you're trying to track an errant pointer, create a debug build, start running and break, type _CrtCheckMemory() into a watch window, and every time the watch window is refreshed, it will check all your fenceposts. You might get lucky and spot your corruption as it happens. The /GZ compiler option will perform a similar task at the function level, but this would let you do it at a line level.

    There are also dozens of possible formats it can display your watch variables in -- suffix a pointer with ,s and it'll display the contents as an ASCII string. Only see one byte because of Unicode? Suffix the pointer with ,su and you'll see the unicode string. A ,wm suffix displays window messages by name. ,hr suffix displays HRESULTs by name.

    The memory windows will highlight in another color any data that's changed since the last time it was refreshed, whether it be a single step or a previous breakpoint. You can have memory displayed as bytes, shorts, or longs. And with the newer visual studios, you can have multiple memory windows, so you can keep track of two, three or four arrays simultaneously. You simply drag and drop them wherever they're convenient, then step through the code and watch for colored variables indicating change.

    Again, all these windows are automatically updated every time the debugger drops from the program to your control. I've got two 17" monitors, and I can fill them both. The problem with debugging is that sometimes you are really starting blind, and the faster you can get more information, the less time you waste debugging.

    There's a cute "magic trick" I like to show people with the memory window and the disassembly window. Let's say you've had a crash, and attached the debugger to the running program. You're looking at a corrupt stack in the call stack window -- just one line of garbage data. What to do? Where did it break? Enter @ESP in the memory window. Change the view to 'long' and it displays the memory as 8-digit numbers. If y

  • Re:Stepping Through (Score:3, Informative)

    by plover ( 150551 ) * on Monday January 21, 2008 @05:32PM (#22130872) Homepage Journal
    Thanks for the compliment!

    I used to teach a course in debugging with Visual Studio, and I basically trawled through my syllabus looking for the cool tricks. Using the stack-crash demo to drop into the source code of the crashing module is a real attention-grabber.

    I found debugging in gdb to be a lot like debugging in WinDBG. You have to learn a lot of esoteric commands that you don't use very often, so it takes a lot of practice to learn them. And if you aren't constantly searching for the side effects of each step, you can miss a valuable clue. Seeing the color change on watched values that have changed is a great way to pick up on otherwise subtle corruptions. Seeing an entire object's value hierarchy go red because you munged its pointer really stands out, at least to my eye.

    Here is the bibliography and recommended references from my syllabus. It's pretty out of date these days (I especially miss the C/C++ Journal,) but the references are still good if you can find them.

    Bibliography

    Debugging Applications by John Robbins (Microsoft Press, 2000, ISBN 0-7356-0886-5)

    Visual C++ Development Stunts by Mike Blaszczak (Lecture at Tech Ed 98)

    How to Debug Quickly and Effectively with Visual Studio 97 by Martyn Lovell (Lecture at Tech Ed 98)

    MSDN Library Visual Studio 6.0 (Microsoft, 1998), see also
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvc60/html/memleaks.asp

    Customising Autoexp_dat.doc, by EMCC Software, see http://www.emccsoft.com/devzone/tools/devstudio.html
    The Bugslayer by John Robbins (Column in Microsoft Systems Journal / MSDN Magazine)
    MFC mailing list, 1998-1999 (see http://www.microsoft.com/workshop/essentials/mail.asp)
    2600 Magazine, Finding and Exploiting Bugs, Spring 2000, (2600 Enterprises, Inc.)

    Resources

    MSDN Library Visual Studio 6.0 (Microsoft, 1998), Visual C++ Documentation, Using Visual C++, Visual C++ Programmer's Guide, Debugging. Actually, the whole of the MSDN Visual Studio Library is my number one resource for Windows development. Lean on the F1 key first. It's available online at: http://msdn.microsoft.com/library/default.asp

    Microsoft Knowledge Base: http://search.support.microsoft.com/kb/c.asp

    MSDN Magazine (the combined former Microsoft Systems Journal and MIND Magazine) is the official Microsoft developer's publication. You'll find all the current and upcoming Microsoft acronyms detailed here. The writing is usually top notch, but the content is usually based on the over-hyped acronym of the month, and is frequently too specialized to be of real value. Since about 2000, it's been the official mouthpiece of .NET. But, it's pretty much required reading to stay on top of what Microsoft is rolling out the door next month.

    C/C++ Users Journal is a cutting edge independent magazine that offers the latest developments in C++ techniques, STL work, exception handling research, and C++ language development. It has articles written to many levels, from beginners to experts.

    Dr. Dobb's Journal is another independent magazine that more broadly approaches development with a wider variety of tools including C, C++, Java, perl, and Python. It is very strong in any subject it touches, but it is not Windows specific (it has a definite UNIX slant and a frequent anti-Microsoft bias) and some of it will be of less value to a Windows-only programmer.
