
Resources on the Theory Behind Decompilers?

An ever-questing Anonymous Coward asks: "I took a compiler design class last fall and found the material to be very interesting. I've recently started to become interested in decompilers. I looked around on the Internet to see if I could find a general description of how a decompiler is designed, and the theory behind them. Unfortunately, all I was able to find were sites that had decompilers for some language, not a general discussion of decompilers. Does anyone know of any sites or books that discuss the theory of decompilers and their design?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    The main page at:
    http://www.csee.uq.edu.au/~csmweb/decompilation/

    is also interesting.
  • (And obviously, the space was added by /.'s filter... remove the space in "hi story" to make it "history").

    It's a description of a college course on the theory of Decompilers, with full references.

    (First Post, With Content, Nothing but Net. Eat Flaming Death, Trolls.) --
    Evan

  • by JabberWokky ( 19442 ) <slashdot.com@timewarp.org> on Thursday May 10, 2001 @02:23AM (#232963) Homepage Journal
    ... and you shall receive:

    http://www.csee.uq.edu.au/~csmweb/decompilation/hi story.html

    --
    Evan

  • Here's an overview of how to do it the hard way. There are easier ways that will work for languages like Java, but with enough work, any program that doesn't do heavy self-modification can be decompiled.
    • Code/data recognition
      The first step is deciding what's code and what's data, and isolating the individual instructions. Some disassemblers do this by starting at the start address and tracing through to all the reachable code. This is a good way to find subroutine entry points and jump targets.
    • Disassembly
      Code areas must be parsed into individual instructions. For machines with variable-length instructions, like IA-32 (x86, etc) this can be tough. For fixed-length machines, it's trivial.
    • State tracking
      Each instruction changes the machine state in some way. The decompiler must understand this, and must convert the code into a representation in which this is explicit. In particular, what's in each register must be tracked, and, typically, turned into a representation which has lots of temporary variables, rather than a few registers. This representation looks like the flow graphs compilers use internally. Local flow analysis determines which register contents, stack contents, and state bits will not be used further.
    • Data structure recognition
      This usually requires some knowledge about the idioms of compiled code representing data structures. The goal is to recover as much type and data structure info as possible. Typically, you don't have debug info available, so this has to be worked out by watching code that accesses memory at offsets from a base address.
    • Code structuring
      An attempt is made to recognize flow patterns that correspond to the usual "if", "for", "while", "until", and "case" patterns. It's not guaranteed that a proper nested structure will emerge, because the code may have been optimized, but for most reasonably written code, it will.
    • Output
      At this point, there's enough information to generate rather low-level C code. If that's enough, code can be output. It will be tough to read (few variable names), but it should compile and reproduce the semantics of the original program.
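
    To make the code/data recognition step above a bit more concrete, here is a minimal sketch of the "start at the entry point and trace everything reachable" idea. The instruction set is invented purely for illustration (every instruction is two bytes: an opcode and an operand), and the name trace_code is hypothetical. A real disassembler would also have to cope with indirect jumps and variable-length encodings, which this ignores.

```python
# Toy ISA, invented for illustration: each instruction is 2 bytes (opcode, operand).
NOP, JMP, JZ, CALL, RET = 0x00, 0x01, 0x02, 0x03, 0x04


def trace_code(image, entry):
    """Return (code addresses, subroutine entries, jump targets) reachable from entry."""
    code, subs, targets = set(), {entry}, set()
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        # Follow one straight-line run until it ends or rejoins known code.
        while 0 <= pc < len(image) - 1 and pc not in code:
            op, arg = image[pc], image[pc + 1]
            code.add(pc)
            if op == RET:
                break                      # control does not fall through
            if op == JMP:
                targets.add(arg)
                worklist.append(arg)
                break                      # unconditional: no fall-through
            if op == JZ:
                targets.add(arg)           # conditional: target AND fall-through
                worklist.append(arg)
            elif op == CALL:
                subs.add(arg)              # callee is a subroutine entry point
                worklist.append(arg)
            pc += 2
    return code, subs, targets


# Bytes 4-5 are never reached from the entry point, so they are treated as data.
image = bytes([CALL, 8, JMP, 6, 0xFF, 0xFF, RET, 0, NOP, 0, RET, 0])
code, subs, targets = trace_code(image, 0)
data = [addr for addr in range(0, len(image), 2) if addr not in code]
```

    Anything the trace never visits is presumed to be data, which is exactly where this approach goes wrong when code is only reached through jump tables or function pointers.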

    Interactive tools which allow working on the intermediate representation can be useful. High-quality decompilation often requires manual hinting to the system.
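
    The state tracking step can be sketched in the same spirit. The code below is only an illustration under simplifying assumptions: it handles straight-line code with no branches, the tuple format (dest, op, sources...) and the function name to_temporaries are made up for this example, and instructions are assumed to have no side effects other than writing their destination register.

```python
def to_temporaries(instrs, live_out=()):
    """Rename every register write to a fresh temporary (straight-line code only).

    Returns (all renamed instructions, the subset whose results are still needed).
    """
    current = {}  # register name -> temporary currently holding its value
    out = []
    for n, (dest, op, *srcs) in enumerate(instrs):
        # A source register reads whichever temporary last defined it; a register
        # with no earlier definition is an input to the block and keeps its name.
        args = [current.get(s, s) if isinstance(s, str) else s for s in srcs]
        temp = f"t{n}"
        current[dest] = temp
        out.append((temp, op, *args))

    # Backward pass: a temporary is needed if it feeds a live-out register
    # or another temporary that is itself needed.
    live = {current[r] for r in live_out if r in current}
    kept = []
    for temp, op, *args in reversed(out):
        if temp in live:
            live.update(a for a in args if isinstance(a, str))
            kept.append((temp, op, *args))
    kept.reverse()
    return out, kept


instrs = [
    ("eax", "mov", 5),
    ("ebx", "add", "eax", 1),
    ("eax", "mov", 7),             # eax is reused for an unrelated value
    ("ecx", "mul", "eax", "ebx"),
    ("edx", "mov", 0),             # never read again: dead
]
renamed, needed = to_temporaries(instrs, live_out=("ecx",))
```

    After renaming, the two unrelated uses of eax become separate temporaries (t0 and t2), and the write to edx drops out because nothing reads it.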

  • I found these mighty useful...

    --

    Modern Compiler Implementation in Java
    Appel
    Cambridge University Press, 1998

    --

    Compilers : Principles, Techniques, and Tools
    Aho, Sethi, and Ullman
    Addison-Wesley, 1985
  • Scarily, yes they do. Snipped from my MSDN Subscription list:

    Disc 0232: Macro Assembler 6.11, Team Manager 97, FrontPage® 2000 Server Extensions

    Is it just me, or is it rather disconcerting that this is bundled with the "applications" instead of the "development tools"? And what's with putting it on the same disk as FP Server Extensions? *shudder*

  • yeah...

    but you can't have the source [geocities.com]

    Q1: Is the source code of Jad available?

    A1: Currently I have no plans to release the Jad source code for any purposes including porting to other computer platforms.

    I did a fair amount of this on old x86 DOS applications in the early nineties. I found the most useful tool to be "sourcer", which would convert the .exe back into an .ASM file that 8 times out of 10 could be run through MASM (do they still even ship MASM?) into a working .exe.

    One of the areas that I found myself spending the most time was data analysis -- determining what data was what type (long, int, char, et al).

    Of course this all stemmed from poking around in Apple // games to figure out how to give yourself extra "lives" ;)

    My recommendation: build a Java decompiler. Looking at the virtual machine and the known world of how it deals with objects, I think you'd have a ready project (open source?) that would not only be entertaining, but something you could (should?) write in Java itself!

  • There already is a Java decompiler. It is called Jad, and can be found here [geocities.com].
    --
    He had come like a thief in the night,

