Reverse Engineering Large Software Projects?
stalebread queries: "Me and a team of other students have been tasked with reverse engineering a massive C/C++ (mostly C) computer game of about half a million lines. We have most of the source, but no clue of how to approach a task of this magnitude. Anyone have suggestions of programs, or techniques we could use to understand the structure of the game?"
Legal? (Score:1, Interesting)
Re:Legal? (Score:2, Interesting)
Re:Legal? (Score:3, Insightful)
Re:Legal? (Score:3, Informative)
"Human capital" is a rather common economics term to refer to those skills and knowledge that enable an employee to produce the desired works. Use the wiki, Luke. In this case, it is the experience and serenity which makes the Tao Master of programming worth several novice salaries :)
Re:Legal? (Score:1)
Re:Legal? (Score:3, Informative)
But the article poster has access to the source code, something not usually associated with 'reverse engineering'. Products are still protected by patents, copyright and trademarks, and writing Samba (for example) after seeing Microsoft's code would open one up to legal woes.
IANAL, or USian.
Flowcharting might help (Score:4, Informative)
best solution? (Score:2)
Re:best solution? (Score:2)
Re:best solution? (Score:2)
pay big rates to skilled people if it's worth it to your business. pretty simple, really.
Re:Flowcharting might help (Score:2)
Cscope [sourceforge.net] is a cool, free source-browsing tool that can make vi feel like a full-featured IDE. If you're an open-source kind of person, take a look at this before you jump into the commercial tools.
Re:Flowcharting might help (Score:1, Flamebait)
Re:Flowcharting might help (Score:1)
Perhaps the C++ products would be better since a diagram would be acceptable if broken down at the class level but the C portion of the program is useless if broken down at the function level.
Also useless is output which is a simple listing on a printer. The output must be in UML, or something similar, to transmit useful information to the reader.
oh boy (Score:4, Informative)
I believe the instructor is assigning... (Score:3, Informative)
What's more, some of these tools can be used to modify programs within the model, and then update the source code (forward-engineering). They can also
Re:I believe the instructor is assigning... (Score:3, Funny)
Re:I believe the instructor is assigning... (Score:2)
UML assumes OOP,
Re:I believe the instructor is assigning... (Score:2)
Yes, and we know that this practically never happens. Especially with performance-critical software like games.
<eyeroll/>
Re:oh boy (Score:2)
(paraphrased) "Commanding many is the same as commanding few; it is generally a matter of organization."
So I would read some general stuff on how to do this (Practical C Programming has a short chapter for example, but you probably want a book all about it). I would then do what they do with their few to few thousand line sample meticulously to the whole thing as the parent post suggests.
You need to flow chart the whole thing with notes, t
Re:If you already have the source... (Score:1, Interesting)
Reverse Engineer or Refactor/Port? (Score:5, Interesting)
As for how to approach it - I think it depends on the size of your team, and what goals you set for the effort. Are you just wanting to learn? Or do you want to improve performance? Or make it work on another platform? What are the goals for this project?
Once you know those details, they might give you an idea where to begin.
Re:Reverse Engineer or Refactor/Port? (Score:5, Informative)
Re:Reverse Engineer or Refactor/Port? (Score:1)
Re:Reverse Engineer or Refactor/Port? (Score:2)
It could help... (Score:3, Insightful)
So it really depends on the kind of game it is. Since I'm assuming you know this, I would suggest trying to first think how you would write the game yourself, and then see if you find any similarities between your ideas for the engine structure and the game's.
Re:It could help... (Score:1)
A UML reverse-engineering tool (Score:3, Informative)
Other UML tools exist, like Argo and Umbrello, but I'm not sure if they reverse engineer.
Re:A UML reverse-engineering tool (Score:2)
I feel bad for students nowadays who have to deal with these gigantic assignments in schools that never provide enough resources. But that's a different story altogether.
Re:A UML reverse-engineering tool (Score:2)
Isn't that exactly what they'll do when they get out of school as well?
Not Rational Rose unless... (Score:2)
Perso
Source navigator (Score:4, Informative)
I like to use it when browsing through code, you can search and browse as much as you like. It will still take an effort though.
Re:Source navigator (Score:2)
Profiling! (Score:3, Informative)
Re:Profiling! (Score:1)
lots of mountain dew.... (Score:4, Funny)
Tarzan and Tanto School of Communication Arts (Score:1, Offtopic)
You, my friend, obviously haven't attended the World Famous Tarzan and Tanto School of Communication Arts.
Come on Down! TaT School of Comm. Arts am accepting applications now!
WTF? (Score:2, Insightful)
If you wish to start getting a handle on a chunk of code, start by reading main() along with a profiler's output. Grep is your friend.
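To make the "grep is your friend" advice concrete, here is a minimal sketch of a whole-word identifier search over a source tree. The function name find_identifier and the list of file extensions are assumptions made for illustration; plain grep -rnw does much the same job from the shell.

```python
import re
from pathlib import Path

def find_identifier(root, name, exts=(".c", ".h", ".cpp", ".hpp")):
    """Return (path, line_number, stripped_line) for each whole-word match.

    `root` is a directory to walk; `exts` is an assumed list of source
    file extensions, so adjust it to whatever the code base really uses.
    """
    # \b keeps "main" from matching "domain" or "main_loop".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in exts:
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Starting from main and searching for each function it calls gives a rough top-down reading order, which you can then weigh against whatever the profiler says is hot.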
Re:WTF? (Score:1)
Re:WTF? (Score:4, Insightful)
Re:WTF? (Score:2)
Re:WTF? (Score:2)
IDA (Score:1)
Re:IDA (Score:2)
Reverse engineering (Score:1, Informative)
Reverse engineering is generally thought of as a "cleanroom" technique that involves having the binary and/or specification but not the source. If you have the source, then you're just reading/rewriting it (or perhaps just copying it and doing s/Old Name/Our Cool New Game That's Nothing Like Old Name/).
>Anyone have suggestions of programs, or techniques we could use to understand the structure of the game?
If it's most
Re:Reverse engineering (Score:2)
Reverse engineering is generally thought of as a "cleanroom" technique that involves having the binary and/or specification but not the source. If you have the source, then you're just reading/rewriting it
If you have the source but not the spec, and you're working on recreating the spec, then you're reverse engineering.
Re:Reverse engineering (Score:1)
Graphviz and GNU Global (Score:2)
Use our tool :) (Score:2, Interesting)
Not that I'm biased or anything. The idea is to monitor the program while it's running and use the call graph to generate sequence diagrams and such. Feedback and ideas for further research welcome.
What's your goal? (Score:2)
For the first, I'd try and find the functions called around when it occurs, and use a debugger to step through what happens.
For the second, I'd study the interface files and use cscope. Figure out what is calling what, and see how it's interlinked.
For the third, you need to do the same as above on a local level- between files of the module. Then d
use the tools that are available (Score:2)
if you have access to (ir)rational rose, running your code through that will probably speed up a lot of this process. otherwise, a combination of cccc and doxygen with the appropriate config files will give you about the best start you can hope for. hopefully, the code has reasonable documentation. if not, you're basically screwed --- you'll have to work out the use cases and reconstruct your software from there.
here you go (Score:1)
Have most of the code? (Score:3, Insightful)
Draw flow charts. Then assign a separate person for each module to make sense out of it. Next you'll do what you plan to do....
Make mods for it? Make a clone? Rewrite the code and sell the code? Recompile and port to Linux?
There are some automatic UML generators (Score:2, Interesting)
http://droogs.org/autodia/ [droogs.org]
Re:What language is C/C++? (Score:1)
C/C++, more correctly but rarely known as C++/C, is C++ written in the style of C, and is a wicked waste of Bjarne's time.
The guy behind xapian.org/xapian.com, Olly Betts, knows how to write C++ with proper and repeated use of the base classes, iterators and templates and, to be frank, his C++ looks almost like perl, and it is a delight to read.
C/C++ is just C with objects and falls so far short.
Sam
Re:What language is C/C++? (Score:1)
for ($you=('like');open(LY,read('ing',$_,('perl')));){ do{};you();}
Re:What language is C/C++? (Score:2)
Strictly speaking: Global symbol "$you" requires explicit package name
Maybe the perl parser doesn't throw up any errors, but it's no more perl than a lot of C/C++ is C++
Sam
Re:What language is C/C++? (Score:1)
$you == $::you == $main::you
So the package name is really implied.
Agreed though. Most perl poetry is to perl and Larry what C/C++ is to C++ and Bjarne. The problem is that many write perl code like poetry - bad poetry.
Re:What language is C/C++? (Score:2)
Re:What language is C/C++? (Score:1)
c was and still is a great generic universal (somewhat portable) assembler language which is why it is at the core of so many OSs.
Why didn't they just generate c from Smalltalk instead of writing C++? Gee I bet that is/was already being done!
And I don't even code in Smalltalk, but am stuck in the J
Cross-reference first: Doxygen is your friend (Score:5, Informative)
It sounds like you are unable to build the complete system and run it, since you're missing functionality. This removes the possibility of using runtime tracing tools.
The first thing I would do is run something like Doxygen [doxygen.org] over it to generate a cross-referenced description of the structures. It won't give you a global view of things, but it will give you a decent browsable view of the code itself. Another response mentioned GNU GLOBAL [gnu.org] which may work better for you. Yet another possibility is LXR [linux.no], though it may not work as well in C++. Regardless, a nice thing about Doxygen is that, when used with GraphViz, you can get useful diagrams generated showing class containment and file inclusion graphs.
After you have that, get out your paper and pencil, and start drawing and manually tracing things. That's how I go about coming up to speed on new code I can't execute and step through. Eventually transfer that knowledge into a text file (or, nowadays, a wiki) so that others can benefit from it.
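If Doxygen is more than you want to set up on day one, the file inclusion graph can be roughed out by hand. The following is only a sketch under stated assumptions: it takes a mapping of file names to source text, treats every line starting with #include as an edge, and ignores conditional compilation entirely. The names includes_in and include_graph_dot are made up for this example.

```python
import re

# Matches lines such as:  #include <stdio.h>   or   #  include "game.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')

def includes_in(text):
    """Return the header names pulled in by one source file's text."""
    headers = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            headers.append(match.group(1))
    return headers

def include_graph_dot(sources):
    """Build Graphviz DOT text from a {filename: source_text} mapping."""
    lines = ["digraph includes {"]
    for name in sorted(sources):
        for header in includes_in(sources[name]):
            lines.append(f'    "{name}" -> "{header}";')
    lines.append("}")
    return "\n".join(lines)
```

Save the returned text to a file such as includes.dot and render it with Graphviz, for example dot -Tpng includes.dot -o includes.png. On half a million lines the full picture will be unreadable, so it is probably better to feed it one directory at a time.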
Re:Cross-reference first: Doxygen is your friend (Score:1)
Re:Cross-reference first: Doxygen is your friend (Score:2)
Can you compile? (Score:1)
So my suggestion is start by getting it compiled, up and running ;) You can then use the debugger to breakpoint the code and follow it through. You say you have most of the source code. Is the rest available as libraries to link to? Otherwise you could create 'fake' libraries just to get it compiling and running.
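One way to bootstrap those 'fake' libraries is to let the linker tell you what is missing. This is a rough sketch, assuming GNU ld style messages of the form "undefined reference to `name'" and plain C symbols only (demangled C++ names such as Foo::bar(int) are deliberately skipped). The helper names, and the choice of an int return type for every stub, are assumptions for illustration; real signatures have to come from the headers.

```python
import re

# Assumed GNU ld wording; other linkers phrase this differently.
UNDEF_RE = re.compile(r"undefined reference to [`']([A-Za-z_]\w*)'")

def undefined_symbols(linker_output):
    """Return the sorted, de-duplicated C symbol names the linker missed."""
    return sorted(set(UNDEF_RE.findall(linker_output)))

def make_stub_source(symbols):
    """Emit C source with one do-nothing placeholder per missing symbol."""
    lines = [
        "/* Auto-generated placeholder stubs; replace with real code. */",
        "#include <stdio.h>",
        "",
    ]
    for sym in symbols:
        # Return type and empty parameter list are guesses, not the real API.
        lines.append(f"int {sym}() {{")
        lines.append(f'    fprintf(stderr, "stub called: {sym}\\n");')
        lines.append("    return 0;")
        lines.append("}")
        lines.append("")
    return "\n".join(lines)
```

Compile the generated file and link it in, then watch stderr while the game runs: every "stub called" line marks a piece of missing functionality that someone actually exercises.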
Re:Can you compile? (Score:1)
Resources For the Code Janitor (Score:5, Informative)
Code Reading: The Open Source Perspective [spinellis.gr]
Object-Oriented Reengineering Patterns [unibe.ch]
Reading Computer Programs: Instructor's Guide and Exercise [deimel.org]
Tips for Reading Code [c2.com]
Re:Resources For the Code Janitor (Score:1)
A Couple of suggestions (Score:2, Informative)
I've been through this sort of exercise several times in my career so far. 500k LOC is too much for a small team to get a handle on in any reasonable amount of time, so don't feel too helpless... Your professor is throwing you guys to the wolves and seeing what you are able to accomplish.
As for the actual suggestions, read on:
First, you'll need a tool to generate some form of cross reference for the entire codebase... I'd recommend Doxygen (hack the config file to generate the inheritance and call
Scripts and Configuration files (Score:1)
Are there any configuration files? If not, there may still be some code that reads conf files it expects to find.
I'm assuming that you have the source as a guide (Score:2, Informative)
I'd start by actually reading the source - building it if you can. Run profilers on it and try to get some kind of visual representation of the underlying code tree. If you have source, try using something like DOXYGEN [doxygen.org] to autogen some documentation (and structure) out of it. Someone menti
Understand for C++ and Source-Navigator (Score:1)
We are evaluating some tools along these lines. The ones we liked most are RedHat's Source-Navigator [sourceforge.net] (GPL) and Scitool's Understand for C++ [scitools.com] ($$$).
Source-Navigator seems to be slow compared to Understand C++; I'm sure this has to do with the way they index the DB. On the other hand, the Linux version of Understand C++ needs some polishing IMHO (too many crashes on Debian/sarge).
As for report-generating tools that just index and cross-reference the whole project, Gonzui [sourceforge.net] is a pretty good one.
Re:Understand for C++ and Source-Navigator (Score:1)
Massive? (Score:2, Interesting)
I'm not exactly sure what you're trying to do here. As many ppl have said reverse engineering something that you already have the source for is not really reverse engineering at all. However if I make the (somewhat suspect) assumption that your objective is to e
Forward Engineer instead... (Score:1)
Reversing Std C (Score:3, Informative)
Tools:
OllyDbg - Awesome usermode debugger, probably better suited than softice for this particular task. You can add assembly wherever you want, and it will create patches for the exe that can be automagically applied. It's also FREE.
Numega Softice - Just in case you need to bring in the big guns.
IDA Pro - Best reverse engineering tool available. Lots of extension scripts to do anything imaginable..
TSearch - Can search memory at runtime, set breakpoints, disassemble code on the stack, and dynamically insert new assembly at runtime. Nice for understanding the flow of the software as it runs, and identifying interesting variables and structures.
REC Decompiler - Awesome decompiler that produces a high level representation of the code. Not a replacement for your brain, but can save a lot of time tracing over assembly code to understand the purpose of a function.
WinPCap & Ethereal - For reversing game protocols, and understanding client-server interaction. Sometimes it's nicer to just figure out where the host name/IP string is located in the binary and replace it with 127.0.0.1, then write a little proxy program to sit in between the client and the server.
HVIEW: Hex editor with the ability to disassemble.
(Use Cygwin or mingw for the following) strace: Traces signals, system calls, and spits them out to the screen.
nm: Dump binary symbol table and names.
I've definitely forgotten a plethora of other useful tools (especially the binutils ones), but the above consist of some of my favorites.
For a game, you'll probably be dealing mostly with OllyDbg, HVIEW, REC, and winpcap/proxy. I'd recommend using nm to get a list of all of the symbols in the program, and then maybe split up and assign each student some number of symbols to understand and rewrite in C. Then they can use HVIEW or OllyDbg to navigate to those symbols, and try translating them. If they have a difficult time, have them use REC to get a higher level representation they can cheat off of.
-Jason Thomas.
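The "run nm, then split the symbols among the students" step could be sketched like this. It assumes the usual three-column nm output (address, type letter, name), keeps only code symbols of type T or t, and deals them out round-robin; the function names and the student names in the usage note are hypothetical.

```python
def parse_nm(nm_output, kinds=("T", "t")):
    """Return sorted symbol names whose nm type letter is in `kinds`.

    Lines look like "0000000000401136 T main"; undefined symbols have
    no address column, e.g. "                 U printf".
    """
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            _addr, kind, name = fields
        elif len(fields) == 2:
            kind, name = fields
        else:
            continue  # blank or unexpected line
        if kind in kinds:
            symbols.append(name)
    return sorted(symbols)

def assign_symbols(symbols, students):
    """Deal the symbols out round-robin; returns {student: [symbols]}."""
    plan = {student: [] for student in students}
    for i, sym in enumerate(symbols):
        plan[students[i % len(students)]].append(sym)
    return plan
```

For example, assign_symbols(parse_nm(text), ["ann", "bob"]) hands every other function to each person. This only helps if the binary has not been stripped, and a round-robin split ignores which functions belong together, so grouping by source file or name prefix first is probably wiser.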
Clarifications (Score:1)
Re:Clarifications (Score:1)
Re:Clarifications (Score:1)
We want to do both. Right now we're at the point where we're trying to document and understand the code. Eventually our goal will be to modify the source to add some features.
Re:Clarifications (Score:1)
Presumably you have some specific types of modification in mind so start by creating a list of each functional area that you think each modification will impact. You probably don't even need to look at the code for this part. For example, if you want to add a 'boost' feature to a racing game, then you'll
Re:Clarifications (Score:2)
an autodiagrammer? (Score:2)
See graphviz.org's resources section for some links to profilers
I wonder if something like that is available for C++. Found ROCASE [ubbcluj.ro] which looks like a CASE tool that can "reverse-engineer" (analyze) C++ files and automatically format diagrams for you to help understand the code structure. Post back he
It's easy (Score:1)
Use Data as the X-Ray (Score:1)
For programs that primarily do file processing, you can get a similar understanding by analyzing the input files and the output files.
For database programs you often can get the DBMS to log the transactions or the SQL.
For embedded systems you wou
Understand for C++ (Score:1)
Just in case you're still looking at this topic.
Reverse Engineer or Refactor/Port? (Score:1)
What is your target environment, language?
What is your objective?
What is your time line?
What are your available resources and what is their available time to apply to this effort?
Additional questions?
Do you have a working version of the original installed Game?
Can you get the missing source or if not possibly "Reverse Engineer" it?
The people who read (Score:1)
Prolog (Score:1)
Ok - that should have been under the grammer flame (Score:1)