Learning and Maintaining a Large Inherited Codebase?
Posted by timothy from the bequeathed-and-devised dept.
An anonymous reader writes "A couple of times in my career, I've inherited a fairly large (30-40 thousand lines) collection of code. The original authors knew it because they wrote it; I didn't, and I don't. I spend a huge amount of time finding the right place to make a change, far more than I do changing anything. How would you learn such a big hunk of code? And how discouraged should I be that I can't seem to 'get' this code as well as the original developers?"
30 to 40 thousand lines isn't large by any measure (Score:0, Informative)
Visualisation (Score:5, Informative)
Anything ranging from just sketching out some informal package diagrams on paper (I quite like using an A3 sketchpad) to something more like Code City [inf.usi.ch], which can work with code in Smalltalk, Java, and C++. There are UML diagram makers, of course, but automated diagrams like that probably need to be edited.
In fact, it is not the finished diagram that helps so much as the drawing of it, which is why paper and pencil is so good. Or a vector graphics package.
Re:It depends on the language (Score:5, Informative)
Re:30 to 40 thousand lines isn't large by any meas (Score:4, Informative)
Oh yeah, well I just inherited a code base of 2.8 trillion lines of assembly code, and I have to read it over a 12.734 baud VAX connection! Why, back in my day...
Anyway... I've taken on a few large-scale software projects before, and my approach has always been "read twice, hack once". I agree with the parent, and I'll add a note: for the love of everything sacred and unholy, use revision control, and don't trust it -- that is, back up incessantly. Document the hell out of your process. Once you've really learned the system, you might want to back out some of the newbie mistakes that you're making right now.
And yes. Learning a big system takes a lot of time -- you should be reading much more than writing until you've learned it. I find it helpful to diagram dependencies / draw up finite state machines.
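One way to get a first dependency diagram without drawing it by hand is to scrape it out of the source. The sketch below is only an illustration and assumes a C-style codebase where local dependencies show up as `#include "..."` lines; the function names and the demo file names are made up for the example.

```python
import re
from collections import defaultdict

# Matches local includes such as: #include "orders.h"
# System includes written with angle brackets are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def build_include_graph(sources):
    """Map each file name to the sorted list of local headers it includes.

    `sources` is a dict of {file name: file text}. In practice you would
    fill it by walking the source tree and reading each file.
    """
    graph = {}
    for name, text in sources.items():
        graph[name] = sorted(set(INCLUDE_RE.findall(text)))
    return graph


def most_depended_on(graph):
    """Return (header, count) pairs, most frequently included first."""
    counts = defaultdict(int)
    for deps in graph.values():
        for dep in deps:
            counts[dep] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical file contents, just to show the shape of the output.
    demo = {
        "orders.c": '#include <stdio.h>\n#include "orders.h"\n#include "db.h"\n',
        "invoices.c": '#include "invoices.h"\n#include "db.h"\n',
    }
    graph = build_include_graph(demo)
    for name, deps in graph.items():
        print(name, "->", ", ".join(deps))
    print(most_depended_on(graph))
```

The headers that come out on top of `most_depended_on` are usually a good place to start reading, since most of the rest of the code leans on them.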
Re:Use Doxygen (Score:2, Informative)
Divide and Conquer (Score:4, Informative)
Identify each major portion of functionality. If you are working with a sales/billing system you would probably end up with : Orders, Invoices, Payments, Admin.
Go through each of those portions and identify the major portions. Orders: Order headers, Order details, business logic, ui logic, reports, datalayer, etc. Repeat until reduced into easily consumable units.
Pick and stick to an SDLC. Use whatever fits the situation and the resources. For a small project (under 100k lines of code) you should be good by yourself. Anything more and you'll have to involve at least 1 other person for testing. For medium (100k-500k lines) you'll need an additional dev...For large projects (500K-5M lines) you'll need a project manager, lead dev, 2 devs, 1 test, and a UAT team...For larger projects you'll have something unique and frightening to the specifics of the software project and corporation/agency making it...anyway, I digress...
Go through each subdivision line-by-line and re-write it yourself (even if you aren't going to put your re-written version into production); the only way you're going to truly understand what is going on is if you do it yourself. Use whatever language you are most comfortable with or is most appropriate to the task (or languages), it does not need to be the same as the original.
Verify that for a given input, your version produces exactly the same output as the original.
Take a deep breath. It's not a race. It's a one-to-one functional mapping between your software (your mindspace) and the original software (the other developer(s) mindspace(s)). The code probably will not be straightforward. It has also been battle-scarred and will be warty. Changes of initial requirements through time and feature enhancements (feature creep) will have taken their toll on what may have originally been something simple or even elegant. It's something of a niche mindset and if it is not for you, there exist many other exciting things to be programming.
Ultimately, if you do as outlined above, you'll solve many problems, be able to make whatever changes you like, and in so doing have a way to present your design as a replacement if you want...Or not, if you don't; for 30-40k lines parallel development makes sense, in a way, for one person.
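The verification step above (same input, same output) can be sketched as a small comparison harness. Everything here is hypothetical: `legacy_invoice_total` stands in for whatever inherited routine you are rewriting, and `rewritten_invoice_total` for your own version of it.

```python
def legacy_invoice_total(quantities, unit_price_cents):
    """Stand-in for the inherited routine (hypothetical)."""
    total = 0
    for quantity in quantities:
        total = total + quantity * unit_price_cents
    return total


def rewritten_invoice_total(quantities, unit_price_cents):
    """Your own rewrite of the same behavior (hypothetical)."""
    return unit_price_cents * sum(quantities)


def find_mismatches(original, rewrite, cases):
    """Run both versions on each argument tuple and collect disagreements.

    Returns a list of (args, original_result, rewrite_result) tuples.
    An empty list means the two versions agreed on every case tried.
    """
    mismatches = []
    for args in cases:
        expected = original(*args)
        actual = rewrite(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


if __name__ == "__main__":
    cases = [([], 100), ([1, 2, 3], 250), ([0], 999)]
    for args, expected, actual in find_mismatches(
        legacy_invoice_total, rewritten_invoice_total, cases
    ):
        print("mismatch for", args, ": original", expected, "rewrite", actual)
```

Agreement on the cases you tried is not proof of equivalence, so it pays to feed in the ugly inputs (empty, zero, negative, huge) that the warts in the original were probably added to handle.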
Re:Use it (Score:4, Informative)
Not without variables, but without unnecessary ones. For example, someone might write:
int a;
int b;
int c;
int d;
int e;
int f;
int g;
a = dropBox1.Value;
b = dropBox2.Value;
c = dropBox3.Value;
d = dropBox4.Value;
e = a + b;
f = c + d;
g = e * f;
result.Value = g;
While I would write:
result.Value = ( dropBox1.Value + dropBox2.Value ) * ( dropBox3.Value + dropBox4.Value );
As a maintenance programmer (Score:5, Informative)
As someone who has done probably 90% of his work in maintenance programming, let me give you my tips:
BTW, the fact that you have a hard time understanding this code may be more a reflection on the original authors' coding skills than on your abilities; any idiot can write code that "just works"; it takes a lot of thought, time and effort to write code that is maintainable, and more often than not, the original coders were short on at least one of those (if not all three). Here's hoping you have the time to follow my above tips; they take a lot of time, but can be worth it if you really need to maintain the code. It's funny to note that apart from the first one, most of those tips apply equally well to developing software from scratch. If the code already has a change tracking system, unit tests, a build/run/test system, *and* automated testing, consider yourself lucky and just start picking apart the unit tests.
Re:Large? (Score:4, Informative)
"Then something would break" contradicts the earlier statement "no more input than logging in"
The fact that something is likely to break, and you will need to troubleshoot it, should be reason enough in itself to install some (small) convenient, unobtrusive troubleshooting tools, as standard practice, and as part of the standard initial installs for those servers, to make troubleshooting faster and not require software installations or elaborate practices when things do break.
You missed a part before the quote that you pulled out. "Most of the machines required no more input".
My statement remains consistent and not contradictory when only 2 machines typically need direct interfacing.
And small convenient, unobtrusive troubleshooting tools WERE installed as standard practice on the machines... I already said that there was dir /s /b, and findstr... do I have to have "find" and "grep" when I had tools with the same functionality?
When I started off, there was a big learning curve because of the new tools, but by the time I left, it was as second nature to me as was find and grep when I joined.
Re:Large? (Score:3, Informative)
At my job at Microsoft, we were in the support end of the core os group.
Windows doesn't really have find and grep, but it does have "dir /s /b [pattern]" and "findstr /sipc:"[pattern]""
When I joined Microsoft, I hadn't used any version of Windows at all for any reason other than playing games.
I did almost all of my programming at Microsoft in notepad.exe
It took me about a month before I understood that my entire group would be replaced by a few scripts in the Open Source world.
Dear lord, this is the most hilarious thing ever posted to /.
Fix some bugs in the defect tracking database (Score:2, Informative)
It's just not the same. (Score:3, Informative)
the ports of GNU utilities to Windows [sourceforge.net] or Cygwin [cygwin.com] or even your own company's Interix [wikipedia.org] and Services for UNIX [wikipedia.org] products?
I had Win7 and Vista Ent with Services for Unix I downloaded, and it just did not feel right or work right. The command line utilities work, in part, because the whole OS in Unix is basically a tree of text files. Windows isn't, and so the utilities tend to be less effective. Plus, there are gotchas like how Windows handles files held open by applications; it's all different.
I thought Interix would be the ultimate, but instead it taught me the opposite. If you want Unix, use Unix. It's that simple.
Re:It depends on the language (Score:3, Informative)
Funny you should say that...
I quite like this reference from the Perl world about understanding large systems: http://www.perlmonks.org/?node_id=788328 [perlmonks.org]
Re:That's small (Score:2, Informative)
An associate of mine was working at a very high-tech electric (as in production and distribution of electricity) company. Apparently they had this very complex control system for a huge proprietary piece of hardware that was basically the core of the control rooms. It had to take in data from all sorts of different devices spread out across hundreds of kilometers over a variety of proprietary protocols, make sense of all that data, try to figure out what the most likely scenarios for failures were, and automatically implement control scenarios to mitigate damage or keep parts of the system running, etc. So the story is the thing was written in a combination of C and assembler, and the file count alone was in the hundreds of thousands. They had two extremely beefy boxes set up to just do compiles, with incremental compiles and relinking taking a few hours and clean compiles taking basically an entire work day (which is why they had two boxes, so they could start one compile after the other and different people could test their changes more often). The thing is, to test their changes they actually had a small control room and a collection of devices on a grid they used to test, and to push the new binaries and data files and get a test set up would take hours as well. Needless to say, most of the developers would basically just live in the office during the last month or so of development, but the facility was running 24 hours a day either way, so they had a full service cafeteria, lounges, etc. all in the building. Anyway, THAT is the biggest code base I have ever heard of, and I'd bet there are quite a few similar situations around the world.
Re:Use it (Score:5, Informative)
What do you think about intermediate variables that are not strictly necessary?
My general rules of thumb:
1) I don't care how many variables are declared, so long as each makes sense on its own. Like another poster's example, 'fullName' is perfectly acceptable (especially for i18n/l10n aware code that may have different rules for generating a name).
2) I ABSOLUTELY HATE clever arithmetic / pointer arithmetic / expressions all crunched into one line that can be split out. Example: in C-like languages that support pre- and post-increment, I expect the code to use only one or the other consistently, and never mix it with another expression. So this is fine:
i++;
j = i + 4;
...but this I can't stand:
j = ++i + 4;
#2 I picked up from a very experienced developer who pointed out that making the code harder to read is never worth it, since the compiler produces the same code as for the easy-to-read version. And that making code that looks 'too easy to be clever' is quite a bit harder than making code that looks 'too clever to always work'.
Re:Use Doxygen (Score:2, Informative)
I feel the same way as OP when trying to make sense of some open source library I'm interested in extending. Doxygen has been a big help. In the future I might also try Source-Navigator [sourceforge.net].
Re:Time (Score:1, Informative)
Anonymous Coward rarely gets any mod points, no matter how good his/her commentary is, which is really annoying to me.
This commentary is guru-level advice, yet it gets no mod points, while all sorts of fluffy comments do.
Re:Large? (Score:3, Informative)
Re:Large? (Score:1, Informative)
Explains a lot about MS products.
Re:As a maintenance programmer (Score:3, Informative)
truly learn the software.
And then if your unit tests work you'll know enough to comment the code correctly for the next time you or your successor comes back to it.
Re:Large? (Score:1, Informative)
Re:No "find" and "grep"? (Score:2, Informative)
Re:30 to 40 thousand lines isn't large by any meas (Score:2, Informative)
Source Insight [sourceinsight.com] lets you browse source code - very useful for largish codebases. It's much quicker than findstr or grep because it has an index rather than having to search the whole thing. It's not free of course but I'd never go back to findstr having used it.
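To show why an index beats rescanning every file on each search, here is a toy identifier index. It is only a sketch of the general idea and makes no claim about how Source Insight builds or stores its own index; the function names and sample files are invented for the example.

```python
import re
from collections import defaultdict

# A C-like identifier: a letter or underscore, then letters, digits, underscores.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(sources):
    """Map each identifier to a sorted list of (file name, line number) hits.

    `sources` is a dict of {file name: file text}. The whole codebase is
    scanned once here; every later lookup is a single dictionary access.
    """
    index = defaultdict(set)
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for ident in IDENT_RE.findall(line):
                index[ident].add((name, lineno))
    return {ident: sorted(hits) for ident, hits in index.items()}


def lookup(index, ident):
    """Return every (file name, line number) where `ident` appears."""
    return index.get(ident, [])


if __name__ == "__main__":
    # Hypothetical file contents for demonstration.
    demo = {
        "orders.c": "int total;\ntotal = add(x, y);\n",
        "math.c": "int add(int a, int b);\n",
    }
    index = build_index(demo)
    print(lookup(index, "add"))
```

A grep-style search pays the full scanning cost on every query, while this pays it once up front and then has to be refreshed when files change, which is the usual trade-off with any index.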