Programming Software

Learning and Maintaining a Large Inherited Codebase?

Posted by timothy
from the bequeathed-and-devised dept.
An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday February 12 2010, @08:12PM (#31122192)
    Yes it's still a bitch to maintain it. But 30k to 40k is by no means large.
  • Visualisation (Score:5, Informative)

    by gilleain (1310105) on Friday February 12 2010, @08:17PM (#31122290)

    Anything ranging from just sketching out some informal package diagrams on some paper (I quite like using an A3 sketchpad) to something more like Code City [inf.usi.ch], which can work with code in Smalltalk, Java, and C++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.

    In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.

  • by martin-boundary (547041) on Friday February 12 2010, @08:37PM (#31122538)
    No, he meant that as an actual offering to the Perl God, Quetzal$@[&shift]L. It's a bloodthirsty god, who never sends the Divine Debugger without at least two pints of the red stuff. I would have immolated a coworker, but the parent poster seems to have been alone in the room :-/
  • by Garridan (597129) on Friday February 12 2010, @08:45PM (#31122642)

    Oh yeah, well I just inherited a code base of 2.8 trillion lines of assembly code, and I have to read it over a 12.734 baud VAX connection! Why, back in my day...

    Anyway... I've taken on a few large-scale software projects before, and my approach has always been "read twice, hack once". I agree with the parent, and I'll add a note: for the love of everything sacred and unholy, use revision control, and don't trust it -- that is, back up incessantly. Document the hell out of your process. Once you've really learned the system, you might want to back out some of the newbie mistakes that you're making right now.

    And yes. Learning a big system takes a lot of time -- you should be reading much more than writing until you've learned it. I find it helpful to diagram dependencies / draw up finite state machines.
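
    For the dependency diagrams, a rough sketch of how the first draft could be generated rather than drawn by hand. This assumes a C-style codebase where local dependencies show up as `#include "..."` lines; the file names and the helpers `include_graph` and `to_dot` are invented for illustration, and the output is Graphviz DOT text that still needs a human to tidy it up.

```python
import re

# Matches local includes such as: #include "orders.h"
# (system includes in angle brackets are deliberately ignored)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_graph(sources):
    """Map each file name to the sorted list of local headers it includes.

    `sources` is a dict of {file name: file contents}; in practice it would
    be filled by walking the source tree.
    """
    graph = {}
    for name, text in sources.items():
        graph[name] = sorted(set(INCLUDE_RE.findall(text)))
    return graph


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one edge per include."""
    lines = ["digraph deps {"]
    for name in sorted(graph):
        for dep in graph[name]:
            lines.append(f'    "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-file project, invented for illustration.
    demo = {
        "orders.c": '#include <stdio.h>\n#include "orders.h"\n#include "db.h"\n',
        "invoices.c": '#include "invoices.h"\n#include "db.h"\n',
        "db.c": '#include "db.h"\n',
    }
    print(to_dot(include_graph(demo)))
```

    Piping the printed text through Graphviz's `dot` gives a picture to scribble on, which is probably where the real learning happens.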

  • Re:Use Doxygen (Score:2, Informative)

    by eggy78 (1227698) on Friday February 12 2010, @08:47PM (#31122668)
    I have found Doxygen's caller/callee graphs (and the source browser as well!) just as useful as its standard documentation. These features are invaluable, but they are not produced when you generate documentation with a more-or-less default config; you have to switch them on.
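
    As a rough starting point, these are the Doxyfile settings that would typically be changed from their defaults to get the graphs and the source browser. Treat it as a sketch, not a complete config: the graph options rely on Graphviz being installed, and EXTRACT_ALL is an assumption that the inherited code has few or no Doxygen comments.

```
# Document everything, even code with no Doxygen comments
EXTRACT_ALL            = YES
# Cross-referenced source listing
SOURCE_BROWSER         = YES
REFERENCED_BY_RELATION = YES
REFERENCES_RELATION    = YES
# Graphs need Graphviz's dot tool
HAVE_DOT               = YES
CALL_GRAPH             = YES
CALLER_GRAPH           = YES
```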
  • Divide and Conquer (Score:4, Informative)

    by Whomp-Ass (135351) on Friday February 12 2010, @08:57PM (#31122788)

    Identify each major portion of functionality. If you are working with a sales/billing system you would probably end up with: Orders, Invoices, Payments, Admin.

    Go through each of those portions and identify its major parts. Orders: Order headers, Order details, business logic, ui logic, reports, datalayer, etc. Repeat until reduced into easily consumable units.

    Pick and stick to an SDLC. Use whatever fits the situation and the resources. For a small project (under 100k lines of code) you should be good by yourself. Anything more and you'll have to involve at least 1 other person for testing. For medium (100k-500k lines) you'll need an additional dev...For large projects (500K-5M lines) you'll need a project manager, lead dev, 2 devs, 1 test, and a UAT team...For larger projects you'll have something unique and frightening to the specifics of the software project and corporation/agency making it...anyway, I digress...

    Go through each subdivision line-by-line and re-write it yourself (even if you aren't going to put your re-written version into production); the only way you're going to truly understand what is going on is if you do it yourself. Use whatever language you are most comfortable with or is most appropriate to the task (or languages), it does not need to be the same as the original.

    Verify that, for a given input, your version produces exactly the same output as the original.
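
    That check can be automated. A minimal sketch, assuming both versions can be called as plain functions: `legacy_total` and `rewritten_total` are hypothetical stand-ins for an inherited routine and its rewrite, and `find_mismatches` simply runs both on the same inputs and reports where they disagree.

```python
def legacy_total(quantity, unit_price_cents):
    """Stand-in for the inherited routine (hypothetical)."""
    total = 0
    for _ in range(quantity):
        total = total + unit_price_cents
    return total


def rewritten_total(quantity, unit_price_cents):
    """Stand-in for the clean re-implementation (hypothetical)."""
    return quantity * unit_price_cents


def find_mismatches(old, new, cases):
    """Run both versions on every case and collect the inputs that differ."""
    mismatches = []
    for args in cases:
        expected = old(*args)
        actual = new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


if __name__ == "__main__":
    cases = [(q, p) for q in range(0, 5) for p in (0, 1, 199, 2500)]
    print("mismatches:", find_mismatches(legacy_total, rewritten_total, cases))
```

    The interesting cases are the ones nobody thought about: with a negative quantity the loop version returns 0 while the multiplication returns a negative total, and a comparison like this is what surfaces that kind of wart.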

    Take a deep breath. It's not a race. It's a one-to-one functional mapping between your software (your mindspace) and the original software (the other developers' mindspace). The code probably will not be straightforward. It has also been battle-scarred and will be warty. Changes to the initial requirements over time and feature enhancements (feature creep) will have taken their toll on what may have originally been something simple or even elegant. It's something of a niche mindset and if it is not for you, there exist many other exciting things to be programming.

    Ultimately, if you do as outlined above, you'll solve many problems, be able to make whatever changes you like, and in so doing have a way to present your design as a replacement if you want...Or not, if you don't; for 30-40k lines parallel development makes sense, in a way, for one person.

  • Re:Use it (Score:4, Informative)

    by mosb1000 (710161) <mosb1000@mac.com> on Friday February 12 2010, @09:12PM (#31122924)

    Not without variables, but without unnecessary ones. For example, someone might write:

    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    a = dropBox1.Value;
    b = dropBox2.Value;
    c = dropBox3.Value;
    d = dropBox4.Value;
    e = a + b;
    f = c + d;
    g = e * f;
    result.Value = g;

    While I would write:

    result.Value = ( dropBox1.Value + dropBox2.Value ) * ( dropBox3.Value + dropBox4.Value );

  • by npsimons (32752) * on Friday February 12 2010, @09:42PM (#31123202) Homepage Journal

    As someone who has done probably 90% of his work in maintenance programming, let me give you my tips:

    • Snapshot what you get - don't change it, don't even look at it. As soon as you get it, check it in, binaries and all, to a change tracking system (eg, CVS, SVN, etc).
    • Now that you know what they gave you, and you can get back to it at any time, your options are seemingly limitless, but for the quickest way to get up to speed, I would recommend writing unit tests for the software. This will be long and tedious, but by writing unit tests you will a) learn what to expect out of the software, b) be able to tell when you break something and c) truly learn the software.
    • Automate, automate, automate! It's a close call as to whether you should start right away on your first unit test, or get the build system automated, but let me just say that it will save you a ton of time to have a "one button push" way to build, run and test the software. From there, you should be having your machine build and run the unit tests automatically, preferably nightly, from a clean checkout of the repository, just in case you forget to run a test after you change something or you forget to check something in.
    • Run the software (including unit tests) through the gauntlet - valgrind's memcheck, electric fence, fuzz, bfbtester, rats, gcc's -fstack-protector-all flag, libc's MALLOC_CHECK_=3, gcc's _FORTIFY_SOURCE=2 define, gcc's -fmudflap flag, gcc's -Wall -Wextra and -pedantic flags; any way you can think to flush out bugs, do it, and start fixing them; you will learn much, not just about the code, but about the thought process of the original coder(s) this way. Change tools as appropriate for your programming language and environment (including compiler/interpreter, libs, OS, etc). As you can tell, I do a lot of C and C++ programming.
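
    The "one button push" idea above might look roughly like this. It is only a sketch: `run_steps` is a made-up helper, and the commands shown are placeholders that just print a message; the project's real build, unit test, and checker invocations would go in their place.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (label, command) pair in order; stop at the first failure.

    Returns the label of the failing step, or None if every step passed.
    """
    for label, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            return label
    return None


if __name__ == "__main__":
    # Placeholder commands: swap in the project's real build and test
    # invocations (make, the test runner, a memcheck run, ...).
    steps = [
        ("build", [sys.executable, "-c", "print('building')"]),
        ("unit tests", [sys.executable, "-c", "print('testing')"]),
    ]
    failed = run_steps(steps)
    print("OK" if failed is None else f"FAILED at: {failed}")
```

    Running the same script nightly from a clean checkout (cron or any scheduler will do) covers the case where a test was skipped or a file never got checked in.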

    BTW, the fact that you have a hard time understanding this code may be more a reflection on the original authors' coding skills than on your abilities; any idiot can write code that "just works"; it takes a lot of thought, time and effort to write code that is maintainable, and more often than not, the original coders were short on at least one of those (if not all three). Here's hoping you have the time to follow my above tips; they take a lot of time, but can be worth it if you really need to maintain the code. It's funny to note that apart from the first one, most of those tips apply equally well to developing software from scratch. If the code already has a change tracking system, unit tests, a build/run/test system, *and* automated testing, consider yourself lucky and just start picking apart the unit tests.

  • Re:Large? (Score:4, Informative)

    by snowgirl (978879) on Friday February 12 2010, @09:44PM (#31123224) Journal

    Most of these machines require no more input than logging in and starting up a single app... thus no reason to install special software on them.

    Then, something would break, and I would have to read logs, and/or code on the actual box that had the exact problem. Spending an hour installing apps to do my job would be an unacceptable use of my time, and delay the build unnecessarily.

    "Then something would break" contradicts the earlier statement "no more input than logging in"

    The fact that something is likely to break, and you will need to troubleshoot it, should be reason enough in itself to install some (small) convenient, unobtrusive troubleshooting tools, as standard practice, and as part of the standard initial installs for those servers, to make troubleshooting faster and not require software installations or elaborate practices when things do break.

    You missed a part before the quote that you pulled out. "Most of the machines required no more input".

    My statement remains consistent and not contradictory when only 2 machines typically need direct interfacing.

    And small convenient, unobtrusive troubleshooting tools WERE installed as standard practice on the machines... I already said that there was dir /s /b, and findstr... do I have to have "find" and "grep" when I had tools with the same functionality?

    When I started off, there was a big learning curve because of the new tools, but by the time I left, it was as second nature to me as was find and grep when I joined.

  • Re:Large? (Score:3, Informative)

    by benjamindees (441808) on Friday February 12 2010, @09:46PM (#31123246) Homepage

    At my job at Microsoft, we were in the support end of the core os group.

    Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]""

    When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games.

    I did almost all of my programming at Microsoft in notepad.exe

    it took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.

    Dear lord, this is the most hilarious thing ever posted to /.

  • by mstockmyer (565058) <mark&markris,net> on Friday February 12 2010, @09:52PM (#31123280)
    When I joined a group that had a 2 Million SLOC program, I learned the most by fixing defects. It gave me a good reason to go traipsing through the codebase. It's painful, but it gives you purpose while reading the code. Just plain reading it gets boring.
  • by tjstork (137384) <todd.bandrowsky@ ... m minus language> on Friday February 12 2010, @10:05PM (#31123400) Homepage Journal

    he ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?

    I had Win7 and Vista Ent with Services for Unix I downloaded, and it just did not feel right or work right. The command line utilities work, in part, because the whole OS in Unix is basically a tree of text files. Windows isn't, and so the utilities tend to be less effective. Plus, there are some gotchas, like how Windows handles open files with applications; it's all different.

    I thought Interix would be the ultimate, but instead it taught me the opposite. If you want Unix, use Unix. It's that simple.

  • by BerntB (584621) on Friday February 12 2010, @10:16PM (#31123468)

    Funny you should say that...

    I quite like this reference from the Perl world about understanding large systems: http://www.perlmonks.org/?node_id=788328 [perlmonks.org]

  • Re:That's small (Score:2, Informative)

    by Kagetsuki (1620613) on Friday February 12 2010, @10:24PM (#31123524)
    25 million lines compiled in 3 hours is actually pretty fast (unless you are talking about, say, assembling 25M lines of ASM).

    An associate of mine was working at a very high-tech electric company (as in production and distribution of electricity). Apparently they had a very complex control system for a huge proprietary piece of hardware that was basically the core of the control rooms. It had to take in data from all sorts of different devices spread out across hundreds of kilometers over a variety of proprietary protocols, make sense of all that data, figure out what the most likely failure scenarios were, and automatically implement control scenarios to mitigate damage or keep parts of the system running.

    So the story is that the thing was written in a combination of C and assembler, and the file count alone was in the hundreds of thousands. They had two extremely beefy boxes set up just to do compiles, with incremental compiles and relinking taking a few hours and clean compiles taking basically an entire work day (which is why they had two boxes: they could start one compile after the other, so different people could test their changes more often). The thing is, to test their changes they actually had a small control room and a collection of devices on a grid, and pushing the new binaries and data files and getting a test set up would take hours as well. Needless to say, most of the developers would basically just live in the office during the last month or so of development, but the facility was running 24 hours a day either way, so they had a full-service cafeteria, lounges, etc. all in the building.

    Anyway, THAT is the biggest code base I have ever heard of, and I'd bet there are quite a few similar situations around the world.
  • Re:Use it (Score:5, Informative)

    by ciggieposeur (715798) on Friday February 12 2010, @10:37PM (#31123616)

    What do you think about intermediate variables that are not strictly necessary?

    My general rules of thumb:

    1) I don't care how many variables are declared, so long as each makes sense on its own. Like another poster's example, 'fullName' is perfectly acceptable (especially for i18n/l10n aware code that may have different rules for generating a name).

    2) I ABSOLUTELY HATE clever arithmetic / pointer arithmetic / expressions all crunched into one line that can be split out. Example: in C-like languages that support pre- and post-increment, I expect the code to use only one or the other consistently, and never mix it with another expression. So this is fine:

    i++;
    j = i + 4;

    ...but this I can't stand:

    j = ++i + 4;

    #2 I picked up from a very experienced developer who pointed out that making the code harder to read is never worth it, since the compiler produces the same code for the easy-to-read version. The same developer also noted that making code that looks 'too easy to be clever' is quite a bit harder than making code that looks 'too clever to always work'.

  • Re:Use Doxygen (Score:2, Informative)

    by erictheturtle (1675730) on Friday February 12 2010, @10:39PM (#31123622)

    I feel the same way as OP when trying to make sense of some open source library I'm interested in extending. Doxygen has been a big help. In the future I might also try Source-Navigator [sourceforge.net].

  • Re:Time (Score:1, Informative)

    by Anonymous Coward on Friday February 12 2010, @11:11PM (#31123838)

    Anonymous Coward rarely gets any mod points, no matter how good his/her commentary is, which is really annoying to me.

    This comment is guru-level advice but gets no mod points, and yet there will be mod points for all sorts of fluffy comments.

  • Re:Large? (Score:3, Informative)

    by StuartHankins (1020819) on Friday February 12 2010, @11:42PM (#31124048)
    Sysinternals has a great tool you can use to automate installs / run software on multiple machines at once, called psexec. Depends on whether you need to run them interactively, in which case you'd have to also script a login. In the future maybe that's a workable solution for you, especially if you have to use large numbers of computers running Windows. Without grep, head, tail, less, etc I'd feel a bit frustrated. Of course if you're discouraged from installing something that's another issue as well. If nothing else there's always group policy. YMMV.
  • Re:Large? (Score:1, Informative)

    by Anonymous Coward on Saturday February 13 2010, @12:01AM (#31124136)

    Explains a lot about MS products.

  • by bill_mcgonigle (4333) * on Saturday February 13 2010, @12:40AM (#31124370) Homepage Journal

    truly learn the software.

    And then if your unit tests work you'll know enough to comment the code correctly for the next time you or your successor comes back to it.

  • Re:Large? (Score:1, Informative)

    by Anonymous Coward on Saturday February 13 2010, @12:51AM (#31124420)
    I used to work in a similar environment at a university: tons of Windows machines that I didn't have admin access to. I just carried a USB stick with me with all sorts of tools that didn't require any more access than a user would have. Seriously, Borland made a grep for DOS that was 7 KB back in the '90s. It doesn't sound like you were very creative, but your story does illustrate why the lack of decent command line tools *by default* sucks.
  • by chentiangemalc (1710624) on Saturday February 13 2010, @05:39AM (#31125514) Homepage
    You're using Windows 7 and batch files? Use PowerShell; it's more powerful than any of the Unix-based shells (that I've seen). And there must be something wrong with your system, or non-OS processes using up CPU and memory, because I've used Windows 7 on below-minimum-spec machines (1 GHz CPU and 512 MB RAM) and the command prompt was still very responsive.
  • by Hal_Porter (817932) on Saturday February 13 2010, @08:08AM (#31126060)

    Source Insight [sourceinsight.com] lets you browse source code - very useful for largish codebases. It's much quicker than findstr or grep because it has an index rather than having to search the whole thing. It's not free of course but I'd never go back to findstr having used it.
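
    To see why an index beats a fresh search, here is a toy version of the idea. It is not how Source Insight works internally (that isn't described here), just a minimal sketch with invented helpers `build_index` and `lookup`: scan every file once, remember where each identifier occurs, and answer later queries with a dictionary lookup instead of re-reading the whole tree.

```python
import re
from collections import defaultdict

# Rough approximation of a C-like identifier
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(sources):
    """Map every identifier to the sorted (file, line number) pairs where it occurs.

    `sources` is a dict of {file name: file contents}.
    """
    index = defaultdict(set)
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for ident in IDENT_RE.findall(line):
                index[ident].add((name, lineno))
    return {ident: sorted(places) for ident, places in index.items()}


def lookup(index, ident):
    """Return every known location of `ident`, or an empty list."""
    return index.get(ident, [])


if __name__ == "__main__":
    # Hypothetical two-file project, invented for illustration.
    demo = {
        "a.c": "int total;\ntotal = add(total, 1);\n",
        "b.c": "int add(int x, int y);\n",
    }
    idx = build_index(demo)
    print(lookup(idx, "add"))
```

    The scan costs about as much as one grep over everything; every lookup after that is nearly free, which matters when you are jumping around an unfamiliar codebase all day.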
