Learning and Maintaining a Large Inherited Codebase?
Posted by timothy from the bequeathed-and-devised dept.
An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"
don't feel bad at all (Score:5, Insightful)
So you have been handed the steamin' pile o' code. It is great that you are very cautious and deliberate when modifying it. Make a set of regression tests: test data, procedures, and expected results that confirm the original functionality you still want is still working and that no other errors have been introduced. It is hard, and much more tedious than just creating new code with few constraints.
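A minimal sketch of that kind of regression test, in Python. Everything named here is hypothetical: `legacy_price` stands in for whatever inherited routine you need to protect, and the cases and file path are made up. The idea is to record what the untouched code does today, then compare against that recording after every change.

```python
import json
from pathlib import Path


def legacy_price(quantity, unit_cost):
    """Hypothetical stand-in for an inherited routine whose behavior must be preserved."""
    total = quantity * unit_cost
    if quantity >= 10:  # an odd-looking rule that may well be an old bug fix
        total = total * 0.9
    return round(total, 2)


# Hypothetical inputs; in practice, use data that resembles real usage.
CASES = [(1, 9.99), (10, 9.99), (25, 0.5)]


def record(path):
    """Run once against the untouched code to capture its current output."""
    results = [legacy_price(q, c) for q, c in CASES]
    Path(path).write_text(json.dumps(results))


def check(path):
    """Run after every change; returns the cases whose output differs from the recording."""
    expected = json.loads(Path(path).read_text())
    actual = [legacy_price(q, c) for q, c in CASES]
    return [case for case, e, a in zip(CASES, expected, actual) if e != a]
```

Call `record("golden.json")` once before touching anything, then `check("golden.json")` after each edit; an empty list means the recorded behavior is unchanged. Note that this pins down what the code does, not what it should do, so a recorded bug stays recorded until you decide to change it.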
Use Doxygen (Score:5, Insightful)
Doxygen is your friend. Run it over the source code and keep the HTML handy for searches and cross references.
Not lots of code (Score:5, Insightful)
First of all, 30-40,000 lines of code is not lots of code. Try 250,000 lines of code.
To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such. Run some sort of profiler or flowchart-type program on it to get a high-level view of the code and how it fits together. If you can, get the person(s) who worked on it before you to give you an idea of how it fits together.
Not at all. (Score:5, Insightful)
Re:A good starting point (Score:3, Insightful)
Trace sessions and time (Score:5, Insightful)
I'll echo some earlier comments.
Set up an execution environment with a debugger, run several typical scenarios, and trace them with the debugger. Get a feel for the big-picture execution scenarios/paths.
It will take time for your brain to get comfortable with it, though. And the details, when you look into them, will throw odd stuff at you. But that's the nature of our work.
Tried and True (Score:3, Insightful)
The time and money you spend tracing and inserting noodles in the spaghetti will end up being larger than the time it takes to cook a new batch (no pun intended).
For auto folks...
The time and money you spend bondo-ing, welding, rewiring, duct-taping, and C'n'Cing parts for the car will end up being larger than the time it takes to design and build a new car. (Although restoring an old/vintage car for the sake of nostalgia is a much more pleasing experience than buying a new one).
Gain an understanding of the purpose of each pivotal region. Know what your desired result should be, then begin the rewriting endeavor.
Large? (Score:3, Insightful)
Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.
Hope your management understands (Score:4, Insightful)
I have inherited projects and do my best to convince management that a pause is needed to document the code. Personally I try to flowchart the functionality and cover a couple of office walls with Visio printouts. Later on I can use such work to add detail and further documentation.
I inherited some code where the developer used names of girlfriends in variable names. It was just dumb and completely unprofessional. I didn't worry so much about keeping track of those; I was more worried about a change in one spot having unintended (and perhaps unknown until too late) consequences. Rather than spend time fixing problems, I thought it best to do some up-front documenting to at least provide a path to successful maintenance.
When I left the project, the manager had a binder of documentation and almost cried.
Try to learn the structure (Score:5, Insightful)
Try to understand the structure of the program. What is the basic flow? It should have an initialization routine, a main loop, and a shutdown routine. Find out roughly where they are, then focus on the main loop. Usually there will be one piece of code that is central, and it will occasionally pass control into other large pieces of the program. Sometimes there will be more than one main loop, and control switches back and forth between the various main loops. If the program is event driven, this will make a difference in the structure.
If you are just trying to make a small change, try to find the sequence of events that will lead up to where that change needs to be made. Follow the sequence of execution until you get to the line you need to change. If you are changing a single variable, sometimes it's helpful to do a search and find all the places that variable is used, to make sure your change won't have any side effects. This may seem time consuming, but it can save 10 times more in debugging.
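The "find all the places that variable is used" step can be sketched in a few lines of Python if you don't have grep handy. The function name, the default file suffixes, and the example identifier are all assumptions for illustration. It is a plain text search, so it will also match the name inside comments and strings, and it cannot tell apart two different variables that share a name.

```python
import re
from pathlib import Path


def find_usages(root, identifier, suffixes=(".c", ".h")):
    """Return (path, line number, line text) for each whole-word match of identifier."""
    # \b keeps "count" from matching "counter" or "recount".
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="replace")  # tolerate odd encodings in old files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Something like `find_usages("src", "retry_limit")` would then give you a list to walk through before you change how that variable behaves.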
Learn to follow code execution with your eyes, without running a debugger. One thing that separates good coders from not so good coders is the ability to follow code that isn't being executed.
Re:Not at all. (Score:5, Insightful)
Man, always when I run out of mod points.
Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.
Re:Large? (Score:5, Insightful)
Ha, ha! Just 4 months ago I joined a project with a code base of about 500k lines. I would call that (the 500k lines one) intermediate in size. There are code bases with many millions of lines. I now feel pretty comfortable finding things in it. And I mostly use find and grep.
At my job at Microsoft, we were in the support end of the core os group. That meant that core os wrote WinXP, Server 2003, Vista, etc, and then it got completely moved over to us to maintain.
Unfortunately, Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]"". Once I learned those, that's a lot of what I used to find the code that I needed to fix.
All I can say is that it takes time, and effort to become familiar... and you're just stuck with it.
Re:Not lots of code (Score:5, Insightful)
First of all, 30-40,000 lines of code is not lots of code. Try 250,000 lines of code.
To start, use a good programming editor/environment (Xcode, Vslick, Visual Studio, etc.) that gives you the ability to easily go to definition or references to variables, functions, structs and such.
30-40,000 lines can be lots of code, it really depends on how maintainably it is written. I've had to pick up codebases that were somewhat smaller but were still diabolical ... good programming environments don't buy you much when the code consists of functions that are many thousands of lines long making little or no use of typedefs or structs (arrays and lots of variables should be enough right?) and convenient variable names like 'e', 'ee', and 'eee'. Even small codebases can become practically incomprehensible if written with little thought given to long term maintenance.
Re:Large? (Score:0, Insightful)
Are you Microsofties really so stupid and ignorant that you're not aware of the ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?
Re:30 to 40 thousand lines isn't large by any meas (Score:3, Insightful)
I currently maintain several million lines of perl. It's not hard, it mostly just works, and when it doesn't, it's not that hard to figure out where it's broken IFF there is a consistent repro case for the problem.
If you have a proper development/production divide, there shouldn't be any weird production issues unless you or your predecessor missed some test cases. If you don't have test cases, that's a problem. If you don't have a properly firewalled and complete development environment, that's a problem. The code itself? Shouldn't be a problem.
Re:Not at all. (Score:4, Insightful)
Re:Large? (Score:1, Insightful)
What the hell? Are you serious?
So Microsoft themselves hired you to work on Windows, although you were a Mac user and had absolutely no real experience with Windows?
Not only that, but you had to manually log in to hundreds of systems just to run a script? They didn't push for this to be automated, and you tossed back on the street where you belong? What the hell?
Don't get me wrong, I don't doubt that your story is true. It's the sort of shit that we should expect from any large company, especially Microsoft. Please tell me you're an H1B, though. At least then it'd make some sense why they'd hire you. H1Bs typically aren't worth more than a batch file.
Re:Time (Score:5, Insightful)
Everyone, including me, always wants to go for the clean rewrite. But in my experience it almost never turns out for the best. There's a reason for all that messy code. Much of it was bug fixes that real-world users needed. Other complexities were needed in the first place to make the user experience simple (natural, giving it that "hey, it just works like I expected" feeling).
The reason you don't understand the code is that you weren't part of the original design discussions, in which weeks or months were spent learning, debating, arguing, etc., about many different design decisions at many different levels of abstraction. You don't know why the trade-offs were made. You just see the finished product.
Rewriting the code won't give you insight into any of this. Learning the code the hard way, fixing bugs, rewriting *small* pieces and seeing what breaks the regression tests, etc. will eventually help you to understand it.
There is no point in rewriting it before you fully understand it. Attempting that can kill a product. Conversely, by the time you fully understand it, there won't be any need to rewrite it, because you'll own the code.
Re:Large? (Score:1, Insightful)
Mentioning that you work/have worked for Microsoft on Slashdot is one of the quickest ways to a flaming.
Re:Tried and True (Score:4, Insightful)
Re:Use it (Score:4, Insightful)
What do you think about intermediate variables that are not strictly necessary?
Use them if they make things clearer for someone reading the code, otherwise don't. For example, you can write:
screen.displayName = user.firstName + user.lastName;
or you can write
String fullName = user.firstName + user.lastName;
screen.displayName = fullName;
This makes it clearer to someone reading that you are trying to use the full name. That is probably not the best example, because anyone would probably understand that user.firstName + user.lastName is the full name, but I think you can see the main point: sometimes it is easier to read a few meaningfully named intermediate variables than a long equation. If it isn't easier to read, don't do it. But really, when I read code, or even write it, I am willing to conform to either way of doing it if someone else feels strongly about it, because that is far less important than things like the flexibility of major structures in the code.
Re:don't feel bad at all (Score:5, Insightful)
I have inherited huge code bases. I actually kind of like it. There were lots of people whose code I cursed and whom I thought were idiots, and I later found out that they were quite smart. Others, I found, just thought about problems vastly differently than I did, and learning how they tackled problems gave me many more tools in my personal arsenal.
That said, find a big wall or something. Use a debugger or code analysis tool to find the main execution paths (what calls what and when, etc). Diagram that up on the wall really large. Then use the tools to determine when and why certain auxiliary functions get called. Diagram that up, and you'll start getting a spider on your wall. Go from there, using your new understanding to re-arrange the program flow, not in terms that make sense to you, but rather in terms of how it seems to be programmed (functional, objective, some pattern). Rinse and repeat until you know pretty much what the code is trying to accomplish in 90+% of the situations, and its general plan of attack.
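As a rough illustration of the "what calls what" step, here is a sketch that assumes the inherited code happens to be Python, which the thread never says. It uses the standard `ast` module to list, for each function, the names it calls directly. `call_map` and `SAMPLE` are made up for the example, and a static pass like this says nothing about when or how often a call happens, so it is a starting point for the wall diagram rather than a substitute for tracing.

```python
import ast


def call_map(source):
    """Map each function name to the sorted names it calls directly."""
    tree = ast.parse(source)
    calls = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Name):         # plain call: run(config)
                        names.add(func.id)
                    elif isinstance(func, ast.Attribute):  # method call: log.flush()
                        names.add(func.attr)
            calls[node.name] = sorted(names)
    return calls


# Hypothetical miniature program standing in for the inherited code.
SAMPLE = '''
def main():
    config = load()
    run(config)

def load():
    return {}

def run(config):
    report(config)
    log.flush()

def report(config):
    pass
'''

for name, callees in call_map(SAMPLE).items():
    print(name, "->", ", ".join(callees) or "(nothing)")
```

For other languages, a cross-referencing tool such as Doxygen, mentioned elsewhere in this thread, serves the same purpose.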
With that diagram, dive in! There's tons of little details in every function that look useless but are usually bug fixes. Use a scalpel, not a hatchet.
I was deployed remotely with no way for the main programmer to get at me. We had prepared 9 months to collect 4 minutes of data, and the test wouldn't wait for us. I found an odd bug hidden somewhere in ~22k lines of code. I did this over a weekend, and found about 4-5 nasty bugs that were combining to produce what I was seeing, and fixed them. I did this with zero input or help, over a weekend in code I had never seen spread around about 60 files. I spent the first half day just diving in and trying things, and nearly shot myself. That's when I went high-level and dug in from there.
When that was done, I then took over code maintenance and updates on that project. The other guy had written it 100% himself, but after that exercise I knew the code better than him. Sometimes being new is good; you don't have all that cruft of implementations that didn't work, etc, but still linger in the original programmer's head.
Re:Not at all. (Score:2, Insightful)
Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.
I always thought clever code was code that everyone could understand, not code that no-one could understand.
It’s like Blaise Pascal’s apology for writing a long letter because he didn’t have the time to make it shorter: it’s often easier to produce some grandiose design that treats anything awkward as a special case than it is to identify a simpler, more consistent underlying concept and then write simpler code to model that.
Re:That's small (Score:1, Insightful)
You call *that* large? (Score:2, Insightful)
Re:30 to 40 thousand lines isn't large by any meas (Score:3, Insightful)
Well, that depends on how many developers we are talking about. The original question seems to indicate that the author has inherited the codebase. The need for this question wouldn't exist if the person were on some large team.
For one or two or five people, 40K lines is a sizable codebase, especially if it has been poorly maintained / designed.
Re:30 to 40 thousand lines isn't large by any meas (Score:5, Insightful)
I am currently working with a mission-critical codebase, which is written in PHP and has absolutely no cohesive design to it. Well, unless you consider making everything static and unnecessarily inheriting other classes and overwriting static variables willy-nilly a cohesive design. There are business rules just everywhere and API requests everywhere and all kinds of calls that overwrite static variables. If you don't methodically trace logic it's really easy to get lost. What makes it worse is that there are many many variables that are named very similarly and you don't really know which one is right and which one is just going to get overwritten in some method call you are not looking at right now. And if this software fails, the worst case scenario is that my company makes no money. It really has made my life over the last few weeks pretty horrid. Fortunately I enjoy the job and the co-workers and am well respected there. Otherwise, it wouldn't be worth the aggravation.
My advice: communicate your difficulties to everyone who will listen (refrain from complaining or bellyaching, just communicate). If you inherit something like this, and it is mission critical, then you need to take as long as it takes to get it right. That's right, AS LONG as it takes. Take the time to document everything. Bother the crap out of anyone who can help you. You are responsible for doing your job, and part of doing your job is figuring out how to maintain this beast. And in order to do that, you need to use every resource at your disposal. If anyone wants to rush you along, you need to communicate the difficulty and the importance of the task. If you have been working at a place for a while and have done a good job to date, then they should trust you. If you're brand new, then you'd better hope someone there values your opinion and doesn't merely think you are incompetent. If you are asked to make enhancements, don't refactor until you understand the code. So make enhancements, leaving the potentially crappy code in place, even copying it if necessary. Steadfastly resist the temptation to refactor until you understand the entire piece that you are trying to refactor. Don't remove seemingly unnecessary variables, and don't reduce seemingly redundant database calls. That comes later when you actually know what you are doing in there. IOW, if you have to navigate a lion's den by touch, don't stop to groom the sleeping lion (unless of course, that is your given task.)
The word inherit seems to imply that either the original maintainer no longer works there or has moved on to a different position. This means that it's you on the hook to figure it out. You've gotta dig in, buckle down, and get to it.
Re:30 to 40 thousand lines isn't large by any meas (Score:2, Insightful)
That *totally* depends on the code base and the way the OP thinks. Sometimes they're a complete waste of time. Others...not so much.
I've worked with plenty of programmers who see pretty much every software problem in terms of FSMs. One size does not fit all.
Re:Not at all. (Score:3, Insightful)
Then again, the creator MAY have been a genius. Perhaps he was told "Put this enormous program together in one month or the company is screwed." In cases like that, poorly thought out algorithms, bloated classes, uncommented variables with names like "x", "y", "z", nothing working beyond the absolute bare minimum required, and other coding no-nos probably do not seem that important. Given appropriate time and resources, perhaps he could have written the greatest code EVAR! Managing to save the company within a very limited time frame would probably qualify him as a genius.
Re:30 to 40 thousand lines isn't large by any meas (Score:3, Insightful)
It somewhat depends on the language used - some languages are easier to penetrate than others. And some languages do more in 10 lines than other languages do in 100.
But anyway - to learn the code you may have to find a starting point (there is usually at least one logical point to start) and then make a flowchart in PowerPoint or something for the general structure. There's no point trying to get into the finer details, just a general sense of flow. You will get things wrong in the beginning, but don't worry. And you may end up finding a lot of dead code too.
When you have a satisfactory overview of the code it's time to really swim and drink the code. Many programmers have a tendency to accept that "it works" and stop there. If you throw the code at the compiler at maximum warning level and then try to fix all the warnings, you will get even more involved. And if you aren't satisfied you can take on the code with code analysis tools like Splint (for C) or FindBugs (for Java).
And don't forget that the commands "find" and "grep" in *NIX are your friends. Other environments usually have other tools, and IDEs have their own, so you don't have to install Cygwin or something to get a grip on things.
And if you think that you don't understand the code well enough - try to port it to another operating system or another language.
Of course - this takes a lot of time and consumption of your favorite hacking beverage.
And yes - I'm involved as a single developer in a system with about 400k lines of code written in Java, and it was ported from an older system written in C, C++, Basic, Java, DCL...
Re:Not lots of code (Score:3, Insightful)
Sure, if you only have a trivial 250K lines of code, I guess you can use crappy tools like Xcode and Visual Studio to maintain your project. The rest of us have to use grown-up tools that look like this:
src$ find . -print | xargs wc | tail -n 1
1950894 7085675 56777966
There's only one way to learn your way around a new codebase, and the worst thing you can do is use a tool that aims to help with the job. Want to know how stuff flows through the program? Find where the program starts and draw the diagram yourself as you map it out. What I do is find something that I think I need to change, and a clear goal for what change I want to make to it, then map out exactly how the program reaches that point. You need to have a targeted goal to make progress with a stack of new code; just trying to read the whole thing or stare at diagrams of it won't teach you anything. Put the sucker into version control, generate regression tests of its output, figure out how to build after making a trivial change, and then try making a small non-trivial one. That's the only real way to learn how a program really works that internalizes enough of it into your brain that you can move upward to bigger maintenance tasks.
And, for the record, I would like to tell everyone who suggested using a debugger to trace through the code instead of figuring it out by inspection and experiments that you are all a bunch of pussies. Good luck with that when the code breaks in production and you've got nothing but log files from the period leading up to the crash to work with. If I can get a debugger to attach to a broken program when the problem exists, it is by definition a trivial problem to solve; if I can even get a backtrace of where the thing is stuck at when it goes bad that's automatically an easy one. The only way to learn what you should be logging and defensively doing is by only relying on logs, assertions, and testing all the time--never a debugger. Because when things go really wrong, you won't have your debugger to save your ass--but if you built in good testing and logging capabilities, they'll be there.
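For what the logs-and-assertions approach can look like in practice, here is a small Python sketch. The logger name and the `apply_discount` routine are hypothetical. It uses an explicit check that raises instead of a bare `assert`, since Python strips `assert` statements when run with `-O`. The point is that the inputs, the result, and any violated assumption all end up in the log, so the log alone can explain a failure after the fact.

```python
import logging

logger = logging.getLogger("legacy.billing")  # hypothetical logger name


def apply_discount(total, percent):
    """Hypothetical routine instrumented so that a log file alone explains a failure."""
    logger.info("apply_discount total=%r percent=%r", total, percent)
    if not 0 <= percent <= 100:
        # Record the violated assumption before failing loudly.
        logger.error("percent out of range: %r", percent)
        raise ValueError(f"percent must be within 0..100, got {percent!r}")
    result = total * (100 - percent) / 100
    logger.info("apply_discount result=%r", result)
    return result
```

In a real program you would configure a handler once at startup, for example with `logging.basicConfig(filename="app.log", level=logging.INFO)`, where the file name is again just an example.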
Re:Large? (Score:2, Insightful)
Re:Time (Score:3, Insightful)
There is no point in rewriting it before you fully understand it.
I fully support this statement.
I recently worked with a guy new to contracting. He came onboard to a project that had a lot of problems. He argued for re-writing it thinking he could do it quickly and simply; I didn't dispute that the system could use significant changes, and I asked him to read through and understand the existing code.
He never did.
Consequently I suggested to senior managers that he should be let go. Reading other people's code, particularly undocumented code, is painful - even for experienced coders. But it is necessary and failure to do so before recommending changes is unprofessional, dangerous, and lazy.