Resources on the Theory Behind Decompilers?
An ever-questing Anonymous Coward asks: "I took a compiler design class last fall and found the material to be very interesting. I've recently started to become interested in decompilers. I looked around on the Internet to see if I could find a general description of how a decompiler is designed, and the theory behind them. Unfortunately all I was able to find were sites that had decompilers for some language and not a general discussion of decompilers. Does anyone know of any sites or books that discuss the theory of decompilers and their design?"
Re:Ask... (Score:1)
http://www.csee.uq.edu.au/~csmweb/decompilation
is also interesting.
Re:Ask... (Score:2)
It's a description of a college course on the theory of Decompilers, with full references.
(First Post, With Content, Nothing but Net. Eat Flaming Death, Trolls.)
--
Evan
Ask... (Score:5)
http://www.csee.uq.edu.au/~csmweb/decompilation/history.html
--
Evan
How decompilers work (Score:2)
The first step is deciding what's code and what's data, and isolating the individual instructions. Some disassemblers do this by starting at the start address and tracing through to all the reachable code. This is a good way to find subroutine entry points and jump targets.
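To make the tracing idea concrete, here is a minimal sketch in Python. The instruction table, the mnemonics, and the function name trace_code are all made up for illustration; a real disassembler would decode actual machine code and would also have to cope with indirect jumps, which this ignores.

```python
# Toy program: address -> (mnemonic, operand). Hypothetical format, not a real ISA.
PROGRAM = {
    0: ("mov", None),
    1: ("jz", 4),       # conditional jump: falls through or goes to 4
    2: ("call", 6),     # call: subroutine at 6, execution resumes at 3
    3: ("jmp", 5),      # unconditional jump
    4: ("nop", None),
    5: ("ret", None),
    6: ("mov", None),
    7: ("ret", None),
    8: ("data", None),  # never reached from the entry point, so treated as data
}

def trace_code(program, entry):
    """Return (code addresses, jump targets, subroutine entries) reachable from entry."""
    code, jump_targets, sub_entries = set(), set(), set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in program:
            continue
        code.add(addr)
        op, target = program[addr]
        if op == "ret":
            continue                      # control does not fall through
        if op == "jmp":
            jump_targets.add(target)
            worklist.append(target)
            continue                      # no fall-through after an unconditional jump
        if op == "jz":
            jump_targets.add(target)
            worklist.append(target)
        elif op == "call":
            sub_entries.add(target)
            worklist.append(target)
        worklist.append(addr + 1)         # fall through to the next instruction
    return code, jump_targets, sub_entries

code, jump_targets, sub_entries = trace_code(PROGRAM, 0)
print(sorted(code), sorted(jump_targets), sorted(sub_entries))
```

Anything the trace never reaches (address 8 here) is left as presumed data.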
Code areas must be parsed into individual instructions. For machines with variable-length instructions, like IA-32 (x86, etc.), this can be tough. For machines with fixed-length instructions, it's trivial.
Each instruction changes the machine state in some way. The disassembler must understand this, and must convert the code into a representation in which this is explicit. In particular, what's in each register must be tracked, and, typically, turned into a representation which has lots of temporary variables, rather than a few registers. This representation looks like the flow graphs compilers use internally. Local flow analysis determines which register contents, stack contents, and state bits will not be used further.
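Roughly what the register-to-temporary step looks like for one straight-line block, sketched in Python. The tuple format (destination register, operation, sources) and both function names are assumptions for the example; real tools also handle branches, flags, and the stack.

```python
def to_temporaries(instrs):
    """Rename registers so every write gets a fresh temporary (straight-line code only)."""
    current = {}   # register -> temporary that currently holds its value
    counter = 0
    out = []
    for dest, op, srcs in instrs:
        # Reads refer to whichever temporary last wrote that register.
        new_srcs = [current.get(s, s) for s in srcs]
        counter += 1
        temp = f"t{counter}"
        current[dest] = temp
        out.append((temp, op, new_srcs))
    return out

def dead_temporaries(renamed, live_out=()):
    """Temporaries that are never read in the block and are not needed afterwards."""
    used = set(live_out)
    for _, _, srcs in renamed:
        used.update(srcs)
    return [t for t, _, _ in renamed if t not in used]

block = [
    ("eax", "load", ["x"]),
    ("eax", "add", ["eax", "1"]),
    ("ebx", "mul", ["eax", "eax"]),
    ("eax", "load", ["y"]),
]
renamed = to_temporaries(block)
print(renamed)
print(dead_temporaries(renamed, live_out=("t3",)))
```

Because each temporary is written exactly once, "never read and not live on exit" is enough to call it dead in this simplified setting.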
Recovering data structures usually requires some knowledge of the idioms compiled code uses to represent them. The goal is to recover as much type and data structure info as possible. Typically, you don't have debug info available, so this has to be worked out by watching for code that accesses memory at offsets from a base address.
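A minimal sketch of the base-plus-offset idea, in Python. The access tuples, the field naming scheme, and the size-to-C-type mapping are all assumptions for illustration; the mapping in particular depends on the target ABI, and real type recovery also has to deal with arrays, unions, and aliasing.

```python
from collections import defaultdict

def infer_structs(accesses):
    """accesses: iterable of (base, offset, size_in_bytes) seen in the code.
    Returns base -> sorted list of distinct (offset, size) fields."""
    fields = defaultdict(set)
    for base, offset, size in accesses:
        fields[base].add((offset, size))
    return {base: sorted(f) for base, f in fields.items()}

def render_struct(name, fields):
    """Print a guessed C struct; the size-to-type mapping is an assumption."""
    ctype = {1: "char", 2: "short", 4: "int", 8: "long long"}
    lines = [f"struct {name} {{"]
    for offset, size in fields:
        lines.append(f"    {ctype[size]} field_{offset:x};  /* offset {offset}, {size} bytes */")
    lines.append("};")
    return "\n".join(lines)

# Accesses observed through one pointer, e.g. [p+4], [p], [p+8], [p+4] again.
seen = [("p", 4, 4), ("p", 0, 4), ("p", 8, 1), ("p", 4, 4)]
layout = infer_structs(seen)
print(render_struct("node", layout["p"]))
```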
An attempt is made to recognize flow patterns that correspond to the usual "if", "for", "while", "until", and "case" patterns. It's not guaranteed that a proper nested structure will emerge, because the code may have been optimized, but for most reasonably written code, it will.
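Two of the simplest flow patterns can be sketched in Python as follows. The control flow graph is a plain dict of block name to successor list, and both function names are made up for the example; this is only a toy, and real structuring algorithms handle far more shapes (nested loops, breaks, switch tables).

```python
def back_edges(cfg, entry):
    """Edges (u, v) where v is still on the DFS stack: each one suggests a loop headed at v."""
    state = {}   # node -> "active" (on the stack) or "done"
    found = []

    def visit(node):
        state[node] = "active"
        for succ in cfg.get(node, []):
            if state.get(succ) == "active":
                found.append((node, succ))
            elif succ not in state:
                visit(succ)
        state[node] = "done"

    visit(entry)
    return found

def if_else_diamonds(cfg):
    """Blocks with two branches that each flow straight into one common join block."""
    result = []
    for head, succs in cfg.items():
        if len(succs) != 2:
            continue
        a, b = succs
        if len(cfg.get(a, [])) == 1 and cfg.get(a) == cfg.get(b):
            result.append((head, a, b, cfg[a][0]))
    return result

cfg = {
    "entry": ["cond"],
    "cond": ["body", "exit"],   # loop test
    "body": ["cond"],           # jumps back to the test: a "while" shape
    "exit": ["then", "else"],   # two-way branch
    "then": ["join"],
    "else": ["join"],
    "join": [],
}
print(back_edges(cfg, "entry"))
print(if_else_diamonds(cfg))
```

When optimized code doesn't match any such shape, a decompiler typically has to fall back on gotos.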
At this point, there's enough information to generate rather low-level C code. If that's enough, code can be output. It will be tough to read (few variable names), but it should compile and reproduce the semantics of the original program.
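For a feel of what that low-level output looks like, here is a small Python sketch that prints C from a three-address representation. The tuple format, the function name emit_c, the generated names (sub_401000, a1, t1), and the blanket use of int are all assumptions for the example.

```python
C_OPS = {"add": "+", "sub": "-", "mul": "*"}

def emit_c(name, params, body, result):
    """Emit a C function from (dest, op, (lhs, rhs)) tuples. Everything is typed int here."""
    lines = [f"int {name}({', '.join('int ' + p for p in params)})", "{"]
    for dest, op, (lhs, rhs) in body:
        lines.append(f"    int {dest} = {lhs} {C_OPS[op]} {rhs};")
    lines.append(f"    return {result};")
    lines.append("}")
    return "\n".join(lines)

body = [("t1", "add", ("a1", "a2")), ("t2", "mul", ("t1", "2"))]
print(emit_c("sub_401000", ["a1", "a2"], body, "t2"))
```

The names carry no meaning, which is exactly the "tough to read" part.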
Interactive tools which allow working on the intermediate representation can be useful. High-quality decompilation often requires manual hinting to the system.
These books are good. (Score:2)
--
Modern Compiler Implementation in Java
Appel
Cambridge University Press, 1998
--
Compilers : Principles, Techniques, and Tools
Aho, Sethi, and Ullman
Addison-Wesley, 1985
MASM (Score:1)
Scarily, yes they do. Snipped from my MSDN Subscription list:
Disc 0232: Macro Assembler 6.11, Team Manager 97, FrontPage® 2000 Server Extensions
Is it just me, or is it rather disconcerting that this is bundled with the "applications" instead of the "development tools"? And what's with putting it on the same disk as FP Server Extensions? *shudder*
Re:Already got one.... (Score:1)
but you can't have the source [geocities.com]
Q1: Is the source code of Jad available?
A1: Currently I have no plans to release the Jad source code for any purposes including porting to other computer platforms.
Data Mining Analysis (Score:2)
One of the areas where I found myself spending the most time was data analysis -- determining which data was which type (long, int, char, et al.).
Of course this all stemmed from poking around in Apple // games to figure out how to give yourself extra "lives" ;)
My recommendation: build a Java decompiler. Looking at the virtual machine and how well documented its handling of objects is, I think you'd have a ready project (open source?) that would not only be entertaining, but something you could (should?) write in Java itself!
Already got one.... (Score:2)
--
He had come like a thief in the night,