Reverse Engineering?

codec7 asks: "Ever since I read the article about Australia legalizing reverse engineering, I've been curious -- How DO you reverse engineer software? I'm an average programmer really interested in computer graphics, and would love to get into some software packages to see how they work. Nothing underhanded, strictly educational. I get off on algorithms. Anyway, am I in over my head even contemplating it? I have a feeling that by the time I could really reverse engineer anything (even with help) the information would be grossly obsolete and I could pick up better tips and tricks from some gaming mags. I would appreciate any direction I could get from readers who know a little about this kind of stuff." I figure it's probably best to discuss this now while it is still legal someplace in the world.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    I agree with the last part of this posting: >>If you want to learn, check Fravia's Pages of Reverse Engineering. While there's lots of crap there, there's also some nuggets of good information. Fravia's [129.105.116.5] as eddy says. Quite the contrary: I have been mining that site for valuable information for more than a year and I still seem to be only scratching the surface... Giglio
  • by Anonymous Coward
    Rational Rose will reverse engineer all type of code. www.rational.com.
  • by Anonymous Coward
    search for fravia and mammon's mirrors, read a lot from over there to get used to cracking/reverse engineering techniques/tricks - i saw the url above, so i won't repeat them. Cracking and reverse engineering is like anything else, it needs practice. On the other hand, the ORC Tutorials are definitely the best papers you can read around. they cover theory and are not so much about practice as they look. Programs you need: ** soft-ice - THE debugger around for dos/windows platforms. and gdb for un*x systems. you can do nearly all you want with these. ** IDA (Interactive DisAssembler) - THE disassembler. don't trust its name, it can reverse anything via an interesting technique called flirt. you have the compiler, you can decompile. (java, c, pascal, whatever...). works as well for un*x systems and supports several processors. you can get both from http://protools.cjb.net [cjb.net]. anything else needs appropriate tools, so look at your favourite search engine.
  • by Anonymous Coward
    before you'll have any hope of gleaning interesting information from reverse-engineering graphics code, you'll really need to have a background in graphics.

    it may sound much less exciting, but why not pick up a graphics book? once the material found in those is old hat, you can move on to recent SIGGRAPH articles, etc...

  • by Anonymous Coward
    Errrm, no. Rational Rose is just an Object Team clone that lets you go from code to design diagrams. Yes, that's called reverse engineering too, but it's not the same thing that is being talked about in this thread (going from binary to source code).
  • The snow blind alliance reverse engineered the software needed to upload/download mp3s to the Diamond Rio. They basically wrote a VxD to monitor traffic through the parallel port while running the windows version of the software. Interestingly they found any files can be uploaded or downloaded from the thing, essentially making it a small hard drive.

    Check out these links for more info

    Rio support under linux (GPLed)
    http://www.world.co.uk/sba/rio.htm
    How they did it
    http://www.world.co.uk/sba/rio.txt

    This is a perfect example of how reverse engineering benefited everyone including Diamond. It only increases Diamond's customer base if linux, NT, and DOS users can use the Rio. Diamond only supported win9x.

    Zenor is a Dork
  • Reverse Assembling a program is a tedious process where you must single step through machine instructions, analyzing each register of the machine at each step, and, analyzing the data sent to each address accessed by the program. The unfortunate truth is that, even though you may have the patience of Job, and painstakingly record every piece of data and instruction in the program, you will have achieved only a small portion of the result you need. If a program is anything less than some trivial routine that you probably already understand, its behavior will be dependent on the data sent to the routine, and upon the state of the system at the time the routine is run. It could also be dependent on asynchronous data that could arrive via interrupts caused by some unrelated event. In those cases, program control would move to sections of code that might have nothing to do with the target routine, however, it would be extremely difficult for you to know this in advance, and it is very difficult to see real time interrupts during controlled execution. The act of placing breakpoints in the code could actually alter the result. You could spend quite some time in a rathole, analyzing program instructions that have no relevance to your problem.

    Reverse Engineering is a process where you first understand the output of a "black box" system for each possible input and state condition, and then you independently create your own black box that produces the same output for each possible input. This is a mathematically very difficult process that increases exponentially as more variables are introduced into the system. How could you know how a complex thing, say a person, will react to every possible set of circumstances? You may be able to observe a person in many situations and get a pretty good understanding of their behavior patterns, but there is no way you could test all possible combinations of input/output.

    Graphics systems are very complex. OpenGL, for example, is a huge state machine, where the value of each state variable controls the way each function behaves during execution. It is not even possible to test every combination of input under every possible state condition, never mind be able to reverse engineer the system that created the output.

    If you are like the rest of us, and only have one life, I'd suggest you spend it studying the relevant literature until you become an expert in the field. At that point, maybe other people can waste their time reverse engineering your work.
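
The exhaustive black-box comparison described in the comment above can be sketched in a few lines of Python. This is only a toy under stated assumptions: `original` is a made-up stand-in for the unknown system, `clone` is an independent rewrite, and `observe` is a hypothetical helper; none of them come from the thread.

```python
from itertools import product


def observe(black_box, n_inputs):
    """Record the output of black_box for every combination of 0/1 inputs."""
    return {bits: black_box(*bits) for bits in product((0, 1), repeat=n_inputs)}


def original(a, b, c):
    # Hypothetical "unknown" system; in practice you could not read this.
    return (a & b) | c


def clone(a, b, c):
    # Independent re-implementation written only from observed behaviour.
    return 1 if c else a * b


table = observe(original, 3)
print(len(table), "input combinations checked")
print("clone matches:", observe(clone, 3) == table)
```

With n two-valued inputs the table has 2**n rows, which is the exponential growth the comment warns about; hidden state multiplies that further.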
  • It's called back-solving. There are one or two, possibly more, chess programs that use a similar method when given a game to analyze/annotate.

    The basic premise is somewhat simple. Start at a won position, and work backwards to a known opening/middlegame position.
  • That's commonly known as random, or Fischer chess.
  • Wow, I haven't heard the name Raphael Quinet for a long long time. I used DEU for a ton of DOOM maps back in the day. I'm pretty sure I still have the zip file I downloaded off a BBS in San Diego, before I knew what the hell FTP was and the only exposure I had to the net was a Usenet feed that I could access through the UCSD library. Jeez, it's a big trip down geek memory lane today.
  • ..trying to work out what is on a billboard by using a microscope. At that level everything looks the same...

    Anyone who has spent time with a disassembler knows what I mean.

    --
    Simon

  • I couldn't live without this app. Cracking is a crapshoot without it.

    Also.. being that it's a cracker's tool.. it's readily available as an "evaluation" version.. heh.
  • "If the original IBM PC bios had been patented, we would probably still be forced to use it to this day."


    I think that it would be more likely that other competitors of that day would have been in a better position to compete and to expand their platforms. The Mac, which was always expensive, would have gained more acceptance. If the IBM were the only x86 game in town, the price would have been kept high, allowing for more sales of Macs. The Apple II and Commodore would have continued to capture the home market. Clones meant more "action" for everybody and kept the price lower, meaning dad could buy the PC for the home that he uses at the office.

    Just my opinion and a darn good one at that!
  • We used to do these cheats on the old Commodore 64. Throw in a few 0xEA (NOP) here and there, wam! Unlimited lives.

    Those were the fun days...
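
As a rough illustration of the NOP trick described above, here is a minimal Python sketch that overwrites a "decrement lives" instruction with NOPs in a byte string. The opcode values (0xEA for NOP, 0xCE for DEC absolute) are standard 6502 encodings; the little routine, the `LIVES` address and the helper name `nop_out_dec` are invented for the example and do not come from any real game.

```python
NOP = 0xEA      # 6502 NOP, one byte
DEC_ABS = 0xCE  # 6502 DEC absolute: opcode followed by a little-endian address


def nop_out_dec(code: bytes, address: int) -> bytes:
    """Return a copy of code with every `DEC address` replaced by NOPs."""
    target = bytes([DEC_ABS, address & 0xFF, address >> 8])
    return code.replace(target, bytes([NOP]) * len(target))


# Hypothetical routine: LDA #$03 / STA $C000 / DEC $C000 / RTS
LIVES = 0xC000  # made-up location of the lives counter
game = bytes([0xA9, 0x03, 0x8D, 0x00, 0xC0, 0xCE, 0x00, 0xC0, 0x60])

patched = nop_out_dec(game, LIVES)
print(game.hex(" "))
print(patched.hex(" "))
```

The patch keeps the code the same length, so every other address in the program stays valid. A plain byte search like this can also hit data that merely looks like the instruction, which is why the real work was finding the right spot in a disassembly first.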
  • Internet.com [internet.com] defines Reverse Engineering as: "The process of recreating a design by analyzing a final product", which is what I always thought it was. It does not mean disassembling, decompiling, or tracing. I don't understand how reverse engineering can possibly be illegal anywhere that even pretends to maintain freedom of thought.
  • I don't know who or what the final arbiter of the meaning of the term would be, but as I understand the term, it specifically does not mean disassembly etc, and that's confirmed by that glossary that I quoted. Other definitions that I can find include: "The regeneration of a specification from a completed design", "analyzing a system and producing a representation at a higher level of abstraction, such as design from code", and according to Case Western Reserve University [cwru.edu], Ohio, "It is legal to use reverse engineering to learn a competitor's trade secret". The evidence isn't 100% in my favour, but it seems to be biased in that direction.
  • well, it depends what you're doing. If you want to reverse engineer an executable, it's basically going to be in assembler without any comments. So, if you're good with assembler, you might be able to derive something from it. If you are going to reverse engineer some sort of protocol, where you can sniff it, it's much more useful.
  • Wow, I (somewhat) map for Q2. If you hadn't been there 5 years ago, who knows how much dumber I'd be!? :-)

    DEU was THE best 2d FPS editor EVER. Good work.
    --
    -- Chris Dunham -- chameleo@xcelco.on.ca -- Chameleon --

  • This tale is not true. It is now an exaggeration of what was initially just a rumor.
    Yea right: They lost all 3 custom chip "designs".

    Believe what you may but the sheer mega-bytage of CAD data involved on so many separate manufacturing and design processes makes it practically un-destroyable.
    Every PET/C-64/Amiga chip that CBM's (east coast!) wafer fab ever made, (The fab makes high power/mixed signal HDD chips presently) is backed up on 9track&8mm&CD&MO&etc. and well professionally archived & indexed thanx to Amiga zealots who made it safe & secure, thru thick or thin. Thanx again to Petro T.

    I personally assisted in the above, so I know.

    Joe Torre
    Sr. HW Engineer
    Amiga Inc. 1998
  • Reverse engineering software is NOT quite the same as in other industries. It is MAINLY for just figuring out interfaces, formats and the like for interoperability purposes.

    E.g., the Samba team does this in order to figure out what a Windows client expects from an NT PDC/SDC Server, etc...

    You don't want to, ethically at least, start disassembling software and reusing code in your own. That is a GOOD WAY TO INVITE A LAWSUIT. Besides, it is pretty damn hard to learn an entire broad concept (like 3D graphics) from disassembling. At most, you would use a disassembler in such cases just to see how OpenGL, or DirectX handles a specific function or object, but not the entire subsystem.

    If anything, only disassemble to see how things work. Otherwise, get yourself a good book, or, if you are a professional with a budget, license a toolkit from an established vendor with a proven product.

    Good luck ...

  • It's also one of the most common practices in silicon chips development.
    While attending university lessons, my Adv. electronics professor candidly stated that something like 20% of all the R&D costs of a chip manufacturer goes into reverse engineering competitors' products....
  • He didn't supply any proof. But since he is one of the most respected teachers at Italy's biggest engineering university, he's got to have some connections, hasn't he?
    I'd consider him a trusted source on this one.
  • > It does not mean disassembling, decompiling, or tracing.

    It pretty much implies it..

    If I give you an executable capable of encrypting a file, which method would you suggest to "reverse engineer" the algorithm?[0] Disassembly, decompilation, etc are methods you can use to reverse engineer.

    [0] The author is long since dead and not available for torture.
  • NOP is shorthand for "no op", or a meaningless, place-holder operation, one which changes no state in the running machine except the program counter (or what's called the instruction pointer on the PC).

    --Corey
  • For example is sniffing packets a'la the ICQ clones RE?
    Or figuring out how a database or file format works?
  • I highly recommend the book by Michael Abrash, The Black Book of Graphics Programming. It has 'The Zen of Code Optimization' and 'The Zen of Graphics Programming' in it as well as most of his other work for DDJ and some unpublished stuff in the 2nd edition. Very good reference, it's easily readable, that covers most of the basics of 2d and 3d graphics. You will need to learn assembler regardless b/c you'll need to hand-optimize a few loops that eat cpu cycles and so forth. If you just want source look at some oss stuff, it will at least give you ideas if you work on win32 or a good code base for *nix.
  • >[snip] back in the days of 8bit cpu's and 16k ram
    >one could easily disassemble code to see how for
    >example a parallax scrolling routine was
    >implemented in a game...not anymore I'm afraid :P
    >
    Well, yes and no.

    You just need to be reasonable about what you look for.

    Trying to look at an ASM dump of Quake.exe for instance, and figure out that it uses BSP trees to store 3D surfaces would be next to impossible. Especially if you didn't know what a BSP tree was, or how it could be used like this.

    But, looking at the same ASM dump of Quake.exe, looking at the texture-mapping routines, trying to see how they got such good speed, wouldn't be a waste.

    In the second case, you already know how the task works, and exactly where to look, so you're just looking at their refinements.

    In this case, perhaps, trying to see how Carmack got his 'free floating-point divides'.

    This is all assuming that you don't want to just go and buy M. Abrash's book _The Graphics Programming Black Book_ where he tells you all the secrets he and Carmack used. :)


    For the curious... The texture mapping used fixed-point math, which is just integer numbers that you pretend are real numbers (ie, the CPU treats them like integers) which was a tradeoff for speed, but sacrificed accuracy. This was nothing new, people had been doing this for a long time. The innovation was using some floating-point math, which was slow but accurate, at the same time to correct the results.

    For instance, imagine doing a long string of calculations and rounding off to one decimal place after each one. Your answer will be fairly imprecise, less early on, and very far off later.

    Now, what if you rounded off your answers for speed, but had a friend give you a value every sixteen numbers which corrected your answer. This way you could do the problems really quickly, but you would only drift a little, and the answer would be corrected, drift a little, be corrected, etc.

    In this case, you would be doing the texture mapping, doing 'fuzzy' calculations to texture those sixteen pixels very quickly, and your friend would be doing one very accurate calculation in the background (the floating-point pipeline) to put you back on track after those sixteen pixels.

    (This whole correction thing is needed because when you look at a wall from the side it doesn't look rectangular anymore. The farther from rectangular, the more your line is likely to drift if you use fuzzy calculations. But fuzzy calculations are sometimes hundreds of times faster than accurate ones... You do the math.)
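
The "fast fuzzy steps, corrected every sixteen pixels" idea above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the actual Quake code: the function names, the 16.16 fixed-point format and the wall parameters (`uz0`, `duz`, `iz0`, `diz`, standing for u/z and 1/z at the left edge and their per-pixel steps) are all chosen for the example.

```python
FRAC_BITS = 16  # 16.16 fixed point: integers we pretend are real numbers
SPAN = 16       # pixels drawn between two exact corrections


def exact_u(x, uz0, duz, iz0, diz):
    """Perspective-correct texture coordinate at pixel x (one float divide)."""
    return (uz0 + x * duz) / (iz0 + x * diz)


def texture_row(width, uz0, duz, iz0, diz):
    """Return the integer texel column for each pixel of one scanline."""
    texels = []
    u_left = exact_u(0, uz0, duz, iz0, diz)
    for start in range(0, width, SPAN):
        count = min(SPAN, width - start)
        # The accurate "friend": one exact value at the end of the span.
        u_right = exact_u(start + count, uz0, duz, iz0, diz)
        # The fast part: step linearly between the two exact values
        # using integer adds and shifts only.
        u_fixed = int(u_left * (1 << FRAC_BITS))
        step = (int(u_right * (1 << FRAC_BITS)) - u_fixed) // count
        for _ in range(count):
            texels.append(u_fixed >> FRAC_BITS)
            u_fixed += step
        u_left = u_right  # back on track for the next span
    return texels


# Hypothetical wall: depth doubles across 64 pixels, texture runs from 0 to 64.
row = texture_row(64, 0.0, 0.5, 1.0, -0.0078125)
print(row[:8], row[-8:])
```

Interpolating straight across the whole 64-pixel row would put the middle pixel more than ten texels off for this wall; with a correction every sixteen pixels the drift stays under two texels.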
  • One of my favorite books is Win95 System Programming Secrets by Matt Pietrek. It is a good example of reverse engineering put to constructive use (rather than copy protection removal). Matt provides pseudo code for many of the system calls and a good understanding of what is going on under the hood. There are mountains of useful information in this book about reverse engineering, check it out...
  • The problem I have at the moment is that I am employed in Systems Management/Support, and seem to spend half my time dealing with badly written system management tools.

    What is so infuriating is most of the things I work with are badly documented, and interdepend on other bits of 3rd party code.

    I used to be able to support systems by knowing how they work, and the processes behind them. This is not possible in my current position, yet no-one will offer me anything in another field, due to what they call a lack of experience. Yet my problem solving experience is probably greater than what they'll encounter in 10 lifetimes. (I was turned down for a post with IBM in Leeds last week for this very reason).

    Maybe I'll have to get noticed by some other means. I own an obscure sound card with a Midi port that the current device driver fails to control. If I could discover how the Win95 driver works, by reverse engineering, I could discover why the Linux driver is failing. Unfortunately I only have access to my Linux box for 2 days a week, and spend most of the time catching up with email.
  • Since nearly all compilers that produce executable code from source first generate assembly, it's a really good idea to understand assembly in the first place.


    The next step (which really applies to C programs but includes some other languages as well) would be to run 'gcc -S yourprog.c' on some C code that contains common constructs such as do {} while, for loops, function calls and library calls.
    You can then have a look at the output and see how a compiler produces assembly from C source.

    Compiler methods differ of course, but many are similar. After enough study, you'll find yourself almost "seeing" the C code that an assembly dump was made from.

    This is a very simplistic method but works well if you're prepared to put in the time and effort.

  • Check out the Decompilation Page at http://www.csee.uq.edu.au/~csmweb/decompilation/ [uq.edu.au]. These guys have published papers on decompiling programs back into semi-readable C code. I'm not sure how well it worked on "real-world" programs. Also, do a search for "binary editors" or "executable editors" (e.g., EEL, ATOM/Alto, Etch for SPARC, Alpha and x86 respectively). These tools edit binaries and have to do some form of decompilation to figure out control flow and so on. But they were not designed for reverse engineering. You could use them for making small tweaks to a binary for which you have no source (or optimizing it).
  • Kindof like the matrix?
    -stax
    /. poster #104543567
  • Yes, good point - the guy did say exactly that. He was curious about reverse engineering, but if you want info on just graphics algorithms, try the gimp code!!
  • Reverse Engineering?
    I think you will really need to use windows (not NT) for this.
    As much as I like linux and that, Windows actually
    lets you have control (mostly).
    Dos Debug is a very handy tool for this and there
    are many many tutorials out on its use.
    Firstly it would be best to learn assembler (a hint- dont touch AT&T, keep with Intel)

    BTW, anyone here into low level stuff also? ive been trying recently to learn how to write a micro kernel (ie OS), and its very hard. Once id gotten the hang of real mode, then i had to learn pmode (currently) and im really getting stuck on this paging vs segmentation stuff.
    Any assembler gurus here got any pointers to webpages/stuff for me?

    Penguin aka Spatula
  • If it's graphics you're into, most of the algorithm stuff (other than some 3D games rendering) is very much in the public domain. Get one of the many excellent books and implement the algorithms therein, you'll learn a lot more.

    Foley & Van Dam's "Computer Graphics: Principles and Practice" is still widely regarded as a definitive text.
  • Greetings to each and everyone,

    I prefer working on the Palms themselves (Palm IIIx and palm V). I do not use pc tools. I work with:
    Quartus Forth (a forth IDE)
    RsrcEDIT (a resource editor)
    LispMe (a lisp shell on the Palm / occasionally, but more to test routines...)
    Insider (the best disassembler/hex editor on the Palm)
    and of course the complete Palm OS SDKs on the Palm in Isilo (Html reader/converter)...
    I do not patch as much as I used to... when I do, it usually takes me less than a few hours to get the bugger(s).. Have you had any probs with self modifying code ? I had...

    Has anyone found a way of having a debugger on board ? Debuffer is a good one (PC/MAC app) http://www.pagesz.net/~sessoms/debuffer/
    Has anyone here found a way of implementing systraps in LispMe http://www.geocities.com/SiliconValley/Lab/9981/ (as in creating a macro that would simulate the systrap function) ?

    Kind regards to all...
    a really kewl site ---> http://palmwarez.backroom.net/index2.html (I have not checked it for some days.. I dunno if it's still up.. Darken if you read this, I'm still expecting an answer.. 2 patches for Vrubix not 8 !!! *smiling, teasing)
  • *pondering...* thank you kindly,
  • I have done a little RE in DOS, and I have found Borland's Turbo Debugger to be helpful. I was trying to modify an old game so it would run off the hard drive. It had a strange copy protection scheme that relied on a bad track on the 5.25" disk. Turbo Debugger isn't the best debugger, but it lets you step through the source code while it's actually running, and change opcodes and memory contents on the fly. Once you figure out what you want to do (like insert a jump instruction to bypass a test), you can make the changes in a hex editor.
  • Check out DCC:

    [uq.edu.au]
    http://www.csee.uq.edu.au/~csmweb/dcc.html

    ...developed by Cristina Cifuentes, who was instrumental in making this kind of thing legal in Australia.
  • This mans site taught me it all. I havent been here in soooo long.
    -Kancer
  • loser
    hey man lets be k-rad and speak like a p00f73R!!
    Yay, im a HACKER now, goodie!
  • > I mainly cracked games on the ZX Spectrum, using a Z80 processor.

    I started out programming by cracking games on my ZX Spectrum, (by cracking I mean altering them to make sure I never ran out of lives, rather than for pirating.)

    I remember for a while there were some good articles in one of the Spectrum magazines by Jon North .. he explained step by step how to alter games to make yourself invincible, etc.

    Right now, as several people have mentioned, there is a resource that is very similar : Fravia's [129.105.116.5] reverse engineering site - This is primarily a cracking resource but it's very informative nonetheless.

    > Uncommented assembler is very difficult to understand,

    This is very true, but for win32 disassemblers at least, you tend to get the names of the functions that are being called in the disassembly. If you use something like IDA it will even give you the names of MFC functions that are being called...

    > I'm working with MS crap all the time, were back engineering is almost impossible, or too time consuming.

    This is true for some things .. but even Microsoft products can be decompiled. Just beware the size of the output files! (A 1Meg .exe can easily be decompiled as a 40Mb text file).

    I still think that a knowledge of disassembly is a good tool for the general programmer, after all how many of us have to work with buggy/undocumented third party libraries?

    Steve
  • If you document it and then code it, then yes thats reverse engineering.
  • Another Java decompiler that I've found useful is wingdis [wingsoft.com]
  • If graphics programs were written as they were 10 years ago, disassembling the programs might prove useful. However, graphics applications nowadays and any application that runs on a windowed system is a big bundle of API code. Reversing from the assembly code to API code is, basically, impossible.

    Esperandi
  • I've always wondered about alien reverse engineering. comic here [smallgrey.com]
  • On the other hand, I'd say it is like chess moves, only rather than normal chess, it's a game of Kriegspiel [chessclub.com], in which you don't get to see your opponent's pieces, only your own...

    --
  • Maybe I misread the original, but i got the impression you were talking about graphics manipulation algorithms. If that's the case then check out the gimp (http://www.gimp.org/). Under the GPL you have full access to every aspect of the code, and I can't think of a better way to get a look at graphics manipulation algorithms. No need to RE from assembly--you've got the source.

  • Sounds interesting. But where did he come up with a figure like 20%. Did he supply references for his "candid" claim?

  • You forgot one:

    4) View the system's behavior under a hardware emulator, or tap into the data/address bus with a logic analyzer.

    To do this these days, you're best off using an older platform for your testbed. Say a slow 386 machine. I don't even want to think about how expensive a Pentium-class emulator is, and you can't just tap into the PCI bus with the logic analyzer *I* can afford (mine is an HP1630G). Obviously the "big boys" can afford some of this more than some kid.

    For the sake of completeness, though, real-time hardware monitoring schemes need to be mentioned.
  • I got some serious flashbacks when you mentioned deu. I can still remember making my first DOOM map. :) Interesting how I just happened to notice a post on slashdot from someone like yourself. Oh well, have a nice day.
  • I've rarely seen crackers and hackers working together. Although it is true that you can crack without being able to program, it's almost the equivalent of being a 'script kiddy'. Many crackers are programmers by profession and all the good ones understand how to program and understand assembly. Take a look at http://win32asm.ownz.com, it's probably the best site for win32asm programming.
  • On RISC machines which have delay slots due to pipelining, the NOP is needed. e.g.

    JSR r1 <-- branch instruction with delay slot
    NOP <-- delay slot, this is executed before the branch is taken so if we don't want to execute anything here, we place a nop
  • by Anonymous Coward


    Some of the best Reverse Engineering Tools

    IDA Pro From Data Rescue


    www.datarescue.com [datarescue.com]

    Soft-Ice From Numega


    www.numega.com [numega.com]
  • by Anonymous Coward
    See also Fravia's site [129.105.116.5]!
  • by Anonymous Coward


    I would have to say the best place to start is with Fravia's Pages of Reverse Engineering

    Fravia's Site of Reverse Engineering [129.105.116.5]

    trust me i know :-) I used to host the pages before

    Signed, 53 68 61 72 70

  • I didn't see this one, but under Unix type systems the strace command can show all of the system calls that a program makes.


    This is quite helpful in reverse engineering networking stuff, and other fun stuff.
    ---------------

  • One Friday night a friend and I sat down with a mac boot floppy and reverse engineered it. (If you have never done something like this you are not a true geek and should get off of /. IMHO)

    We sat down with a hex editor and Motorola's 68000 book. After a while I could recognise a mov command and where it was moving to/from just by the hex value.

    I remember clearly that it set up a few registers, and then did a JSR to something in rom, then a few more registers and another JSR, and then a few more things. We decided to test our work out by modifying things just a little. (I think we put a yellow square on the screen) IT DIDN'T WORK! After much further analysis we discovered that the first JSR never returned on that machine. It seems that on some other macs there were ROM bugs that the boot disk would fix, and the registers the first JSR had set up were just enough to tell the ROMs if they needed to be patched or not.

    Overall this was fun, but it took us 6 hours to deal with a 512 byte sector. We didn't work with the ROMs at all (we are just guessing what happened in the non-returning JSR case above because analysis revealed that the rest of the sector set things up to disable rom, and write a couple fixes to obviously ROM locations, now copied to ram so it was writeable)

  • We all remember Commodore - makers of that much adored Amiga (of which 3 still reside in my bedroom - 1 of which I still use). Well, I remember hearing that Commodore lost the design of the Custom Chips and had to reverse engineer them. Not quite a competitor reverse engineering someone else's software/hardware, but I thought it was interesting.

    And some people wonder why Commodore went down the tubes...
  • As a Tech Support guy, I'm often forced to "reverse-engineer" my own company's products, when the engineers don't provide documentation on how the stuff works, and when everybody's too busy working on the next release to talk to a lowly support rep.

    It's the FUN part of troubleshooting.

    It was MUCH more fun on Novell, because it had a built-in debugger. It's a PAIN IN THE ASS with NT, because they not only don't have a built in debugger, but so far I haven't found a decent one I can give to a customer free of charge to have them do something over the phone to gather info when a process has gone into the weeds.

    I'll check out some of those links that are provided in some of the other messages on this topic, maybe my prayers have been answered. . .
    (funny, I can understand assembler, but not C++)

    "The number of suckers born each minute doubles every 18 months."
  • My best friend has been working on his own 3D engine for rendering as well as gaming opportunities, and you are not going to find the tips he's acquired through gaming magazines. What you need to do is look through Siggraph archive papers, find books, listservs, forums, on complex mathematics, different spline architectures, display methods, and a host of other detailed goodies. If you want to be in this field you're going to get your brain fried more than once, but it's worth it. If you don't want to do that, then contact game engine owners for student deals, because building something that's going to do truly neat things is not a year project, but a long road that will teach you a great many things.

    -Malachi



  • I really didn't expect to see this question on /. ever. The reason is that this question is of exactly the same type as "how do I hack ?".

    And the answer is simple, you have to master the underlying technology, no matter what you are going to reverse engineer/hack.

    There is no tool that will do this for you, there is no magic bullet. There are tools of course, to help, but you will have to apply the brainpower yourself.

    So, in case of reverse engineering, get yourself a book on assembler and programming. Everybody has his/her motives to learn certain things, and curiosity is a wonderful excuse to learn a new thing.

    Happy hacking.

    --
    Why pay for drugs when you can get Linux for free ?
  • > How DO you reverse engineer software?

    Using disassembly/decompilation, debugging and/or probing (as in "black-box"). Have a look at this [passagen.se] essay I've written. It's about my analysis of the program "Net Nanny", but the techniques used are fairly typical.

    >Anyway, am I in over my head even contemplating it?

    No. But it depends on why you want to do it. This is not a good way to pick up on new graphics algorithms, unless there is something very specific that you are after. However, you should give it a try if you think you might enjoy this kind of low-level puzzle (for me, it's a puzzle).

    > I have a feeling that by the time [...]

    Possibly, but I really think you would have the sense to give up before spending that much time :-)

    I might as well tell you what tools I use:

    The most powerful tool is NuMega's SoftIce [numega.com]. It's a system-level debugger for the Win32 platform (would love a Linux version).

    After SoftIce comes IDA [datarescue.com]. IDA is a very competent disassembler. It runs under Win32, but it supports many different processors and file-formats (MZ/NE/PE/ELF/DLL/etc).

    Of course, you also need a good hex-editor. I use HIEW [kemsc.ru].

    I primarily use reverse-engineering techniques to discover backdoors and extract encryption algorithms in commercial software (a friend and I reversed the censorware CyberSitter earlier, which led to the downfall of the Scientologists' "ScienoSitter").

    I also use the techniques to explore unknown file formats, see for example the project [passagen.se] to reverse the file format used in the game Baldur's Gate. When doing this it is much less "debugging/disassembling" than it is hanging around the hex editor.

    If you want to learn, check Fravia's Pages of Reverse Engineering [129.105.116.5]. While there's lots of crap there, there are also some nuggets of good information. You can also use his messageboard [insidetheweb.com] to interact with competent reversers, but beware, you will have to show that you are working on your side too. Don't ask for ready solutions.

    Hope this was of help, be in touch if you have any questions.
  • As the last poster said, NOP is short for "No Operation".

    This is an instruction that's just ignored. It used to be, back on non-pipelined, single execution unit CPUs that a NOP took n (usually 1 or 2) cycles, and could thus be used in timing loops where you couldn't read from the clock chip.

    Nowadays the NOP instruction gets thrown out in the early stages of the pipeline so the execution units never see it, and it usually takes zero cycles to execute.

    So what is it used for now? Taking up space. Why? Either so that you can put in a bunch of NOP instructions where you want to have a subroutine call later, or to put a bunch of NOP instructions over a subroutine call. (Similar to commenting out a few lines in a higher-level language.)

    The removal of a subroutine call is what's usually done in cracking: remove the part where it would call the 'bad registration key' subroutine by putting NOP instructions over the call instruction, and no matter what key you enter, you're fine. (This is the easy part, the hard part is finding the right place to put the NOPs.)

    This has many legitimate uses as well, such as writing an infinite lives cheat for a game, or patching a program that tries to jump to a subroutine that crashes on your hardware, etc.
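
    A minimal sketch of the "NOP over a subroutine call" patch described above, assuming 32-bit x86, where NOP is the single byte 0x90 and a near CALL is the opcode 0xE8 followed by a 4-byte relative offset. The function name and the sample bytes are made up for illustration; locating the right offset in a real binary is still the hard part.

```python
NOP = 0x90          # x86 one-byte no-operation
CALL_REL32 = 0xE8   # x86 near call, followed by a 4-byte relative offset


def nop_out_call(code: bytes, offset: int) -> bytes:
    """Return a copy of code with the 5-byte CALL at offset replaced by NOPs."""
    if offset + 5 > len(code) or code[offset] != CALL_REL32:
        raise ValueError(f"no complete CALL rel32 at offset {offset:#x}")
    patched = bytearray(code)
    patched[offset:offset + 5] = bytes([NOP]) * 5
    return bytes(patched)


# Hypothetical fragment: push eax / call <somewhere> / ret
fragment = bytes.fromhex("50 e8 10 00 00 00 c3")
print(nop_out_call(fragment, 1).hex(" "))   # 50 90 90 90 90 90 c3
```

    The patch keeps the length of the code unchanged, which is the whole point: every later address in the program stays where it was.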
  • Two things.
    First, there are several disassemblers out there - things like Mocha for java which produces .java from .class files, rather than the other way round, etc. (There are ways of circumventing this, too.) Other stuff exists for DOS / Windoze binaries, etc, where you can get your hands on the assembler underneath.
    (Been there, done that, had the shareware "you've used this thing 10, 11, 12.. times" counting backwards by flipping one bit in the executable :)
    I don't know of anything that'll take winword.exe and give you the source though (thank heavens ;) - it's only possible to get it into assembler and from there you have to have the linker's .map file.

    Second, though: how much of this is just talk? If you consider the various "hack this machine & keep the box" sites around, how long have they been up for? You'd think someone with their finger on the pulse of the underground "cracking" world would actually have managed to *do* something about them by now.

    So you might well be better off with gaming magazines, if that's what you want, unless you've got a /lot/ of time to spare...
  • I reverse engineered a few 8-bit games in the mid 80's. I have honestly not even thought of this for at least 10 years. I mainly cracked games on the ZX Spectrum, using a Z80 processor.

    Uncommented assembler is very difficult to understand, although a couple of games did leave an ASCII dump of part of the source giving clues to the purpose of a couple of symbols. Looking at the individual operations may give some clues - XORs were common in many display manipulation routines.

    The best mechanism I had was to look for addresses pointing to the start of the data areas, which could usually be easily found with an ASCII or HEX dump. This worked well with adventure games, until they started to use complex data compression mechanisms. I did manage to reverse engineer some simpler compression mechanisms, but these mainly used substitution through look-up tables.

    Another method I used, which worked with the action games, was to look for an instruction that loaded the accumulator register with the number of lives. This instruction was usually rare, and by a process of elimination the correct one could be deduced. The next step was to look at the code around this instruction to find the address that this counter was stored in, and then look for all other occurrences of this memory address. One of these would be the code to decrease the number of lives. NOP out this instruction, and bingo, infinite lives.
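
    The search step above can be sketched roughly as follows, assuming a Z80 memory image in which the lives counter is set by `LD A,n` (opcode 0x3E) immediately followed by `LD (nn),A` (opcode 0x32). That pairing is only one plausible way a game might store the value, and the function name and sample bytes are invented for the example. A plain byte scan like this can also match operand bytes by accident, so each hit still needs a look in a disassembler.

```python
LD_A_N = 0x3E    # Z80: LD A,n     (load an immediate value into the accumulator)
LD_NN_A = 0x32   # Z80: LD (nn),A  (store the accumulator at a 16-bit address)


def find_lives_stores(image: bytes, lives: int):
    """Yield (offset, address) where LD A,lives is directly followed by LD (nn),A."""
    for i in range(len(image) - 4):
        if (image[i] == LD_A_N and image[i + 1] == lives
                and image[i + 2] == LD_NN_A):
            address = image[i + 3] | (image[i + 4] << 8)   # little-endian operand
            yield i, address


# Hypothetical image: NOP / LD A,3 / LD (8000h),A / RET
image = bytes([0x00, 0x3E, 0x03, 0x32, 0x00, 0x80, 0xC9])
for offset, address in find_lives_stores(image, 3):
    print(f"offset {offset:#06x}: counter probably stored at {address:#06x}")
```

    Once a candidate address turns up, the same kind of scan for its two little-endian bytes finds the other instructions that touch it.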

    The other notable reverse engineering I did was to write a printer driver for Framework that allowed my printer to both handle bitmap graphics and the pound sign by merging two drivers.

    Since then, most reverse engineering has been by means of ASCII dumps of executables to discover hidden command line arguments or other messages. I also discovered a root exploit on Banyan fileservers by means of reverse engineering the toolkit install process - basically the installation mechanism ran as root, extracting a tar file to a temp location, then running a named script in that tar-file. By creating a script to run sh, it was possible to get root access, which was otherwise impossible.
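
    An ASCII dump of that kind takes only a few lines of Python. This is a rough sketch of the idea (similar in spirit to the Unix `strings` tool, not a reimplementation of it); the minimum length of 4 and the sample bytes are arbitrary choices for the example.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up blob standing in for an executable's contents
blob = b"\x00\x01Usage: prog /debug\x00\xff\x02ab\x00secret\x00"
print(ascii_strings(blob))   # ['Usage: prog /debug', 'secret']
```

    For a real file, pass in the result of `open(path, "rb").read()` and look through the output for option names and messages.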

    And now all these talents go wasted. I'm working with MS crap all the time, where back engineering is almost impossible, or too time consuming.
  • Soft-ICE from NuMega technologies is the only tool you need for reverse engineering under Windows.

    Reverse engineering is inspecting how existing software works, typically so that you can change it in some manner, usually by integrating your software.
  • I've had to do some reverse engineering. A modem was having problems with a third party communications package. A large customer used this package exclusively, and we needed to temporarily patch it for a demo for them. Since the modem used a rommed microcontroller, it had to be done in the software package to meet the demo date.

    So first I used a protocol analyzer to figure out where the modem and package were going wrong. This was pretty easy, turned out to be a timing problem with one of the AT commands.

    Then came the task of disassembling the package, which was written in Pascal. This part wasn't so easy, since there was no better way to find the spot that needed patching than to figure out what the code was doing until we happened upon the part in question. I used DOS's DEBUG.COM for the disassembly. It took 3 days of about 18 hours each to find the spot, and I ended up fully disassembling about a quarter of the program. The patch (1 byte change) worked, as did the demo, and all were happy, especially after the modem's rom got fixed.

    In terms of skills, it helps a great deal if you have looked at a lot of assembly generated by high level compilers before. Then you can more easily see the ifs, fors, and cases instead of strange assembly sequences, and you're familiar with how parameters pass into and out of routines thru the stack. You also have to be pretty familiar with what the code is supposed to be doing to have much hope of recognizing the function of blocks once you've disassembled them.

    It was a challenge, and kinda fun for that reason, but it's most certainly not something I'd like to do for extended periods of time. Most of it is very boring grunt work, with a rare "aha!" to lighten the mood. Some parts are *very* opaque when you only have numeric addresses and field offsets. Lots of things remain guesses for a long time, and you can easily go down blind alleys by assuming wrong things.

    Another piece of reverse engineering that was much more fun was discovering the protocol and CRC generator polynomial used in another PC communication package, so I could write something that would file transfer with it from a VAX/VMS system. That was mostly a mathematical problem, and much more interesting. No code disassembly there, just probing with test blocks and watching the CRCs returned with a protocol analyzer.
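
    The CRC part of that kind of probing can be sketched as a brute-force search. The sketch assumes a 16-bit CRC computed MSB-first with a zero initial value and no final XOR; the post does not say which parameters the package actually used, and 0x1021 (the CCITT polynomial) appears only as a stand-in "secret" so that the example can run without a protocol analyzer.

```python
def crc16(data: bytes, poly: int) -> int:
    """Bitwise CRC-16, MSB-first, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def candidate_polys(samples):
    """Return every 16-bit polynomial consistent with all (block, crc) samples."""
    return [poly for poly in range(0x10000)
            if all(crc16(block, poly) == crc for block, crc in samples)]


# Stand-in for captured traffic: test blocks and the CRCs "seen on the wire"
SECRET_POLY = 0x1021
probes = [b"\x01", b"AB", b"hello"]
captured = [(block, crc16(block, SECRET_POLY)) for block in probes]
print([hex(p) for p in candidate_polys(captured)])   # ['0x1021']
```

    Under these assumptions the one-byte block 0x01 is a useful probe, because its CRC comes out equal to the polynomial itself. If no candidate survives, the initial value, bit order or final XOR assumption is the next thing to vary.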
  • There was a recent show on NPR about chess playing which involved the players starting off with a blank board, then placing the pieces on it one by one, effectively playing backwards, until finally both ended up with the "normal" initial placement. It struck me as quirky and intriguing. I believe this was on a Harry Shearer show, so it was most likely a parody, but it seems remarkably similar to reverse engineering on a logical level. Not to mention interesting - have to try it out sometime and see what it's like.

    L.
  • There are two methods being talked about here:

    reverse assembly: This takes executable code and produces source code

    reverse engineering: This is where a programmer works to replicate the functions of a program without referencing the original.

    The latter is by far the hardest to do. The original BIOS clones were done this way. They knew that an interrupt call produced certain end results, so they wrote new code to reproduce this effect.

    This is where software patents come in. Reverse Engineering doesn't affect copyright, unless you have been very unlucky and wrote the code exactly as the original programmer.

    Patents protect methods. This means that if you have a patent that protects "a method of using x to produce y " even if you produce a system that contains no code from the original program, as long as x produces y you would still have to pay the patent holder a fee (or even be blocked from selling your code) and face a legal battle.

    This is why laws to legalise reverse engineering are useful. It means that people can produce systems that are functionally compatible with existing systems and usually are better or less buggy.

    If the original IBM PC bios had been patented, we would probably still be forced to use it to this day. Things like this are why I consider software patents A Bad Thing.
  • If you are familiar with assembly language, reverse engineering is "merely" very difficult :-)

    In the early days of microcomputers, it was relatively easy (with sufficient knowledge of the relevant assembly language) since all the games (which were the only thing one wanted to hack) were "monolithic" blocks of code - no shared libraries, everything in a single self-contained block of code (aside from the calls to what was humorously referred to as the OS!)

    Things are somewhat different now. Often one can find clues through mistakes (NT Service Pack 5 for example) such as forgetting to remove (strip) details about variable names and other identifiers. (This was where the infamous "NSAKEY" identifier came from). Programmers are (usually) human and tend to use logical names for variables; once compiled and stripped, these names are lost.

    Basically, reverse engineering takes a LOT of effort (=time=money)

    S.
  • These guys are into reverse engineering and several far shadier activities. There's no real "beginners guide", nor do I think they're the kind of folks to produce one.

    You need to be a genuine super-hairy assembly wizard to even contemplate RE anyway, so if you don't understand what they're saying, it's best to go away and read up on the whole assembly thang.

    http://www.phrack.com/main-index.html

    They do recommend a few really groovy tools, if you hunt around the site a bit.

  • by tzanger ( 1575 ) on Wednesday September 29, 1999 @09:04AM (#1651276) Homepage
    I'm an old RE... first the C64, then the PC (DOS), then Palm... never cared much for Win. Anyway...

    Reverse Engineering to discover an algorithm is MUCH more difficult than reverse engineering to get around a timebomb or serial number check or dongle. If you want that, go to Fravia's site... It's been mentioned a thousand times here already and is very very good for teaching you how to think like an RE. Too bad it wasn't around when I was in the heyday of RE, I would probably be a lot smarter. :-)

    RE for algorithms starts out the same as RE for cracking -- you need to identify the code that is performing what you want to discover. Is there a button you click to invoke the function? Perhaps when it goes to save, you want to see how it encrypts... Find out what triggers the algorithm you're interested in or your job will become much harder.

    After that's been done you fire up the debugger (many of them have been mentioned, use whatever you feel is best) and trap for that action. When the debugger comes back, you'll be looking at raw assembly or, if you're lucky, pseudo-commented assembly. Since you're interested in the algorithm, start dumping this info out somewhere where you can play with it later.

    Now comes the fun part. Here's where you start using your brain. Identify the inputs and outputs. Try to identify what the registers and memory locations are being used for. Since you've dumped out code regarding the algorithm to a file, try to assemble it with some stubs at the start and finish to feed it your data and deal with the output. This process is iterative. You'll make many many passes, with the code becoming more and more obvious as you go about this. Printouts and pencils are your friends. Don't be afraid to scribble and question mark and feed it data, try forcing loop updates, etc. Remember you're trying to understand what the memory locations represent.

    After some amount of time, you will have a chunk of code with scribblings all over the place and comments and hopefully only a few question marks left. Try to understand what the "small steps" (the assembly instructions) come together to form, and you can then rewrite the algorithm in a higher level language and see if you understood it. That was the goal, wasn't it?

    Obviously, a good solid working knowledge of assembly is required to understand the code. For mathematical functions which use the MMX and 3DNOW! instructions you will need to get the books from the chip guys to figure out what they do, since they're not simple instructions. The single most useful thing I ever used in my RE days was the knowledge of how C, Pascal, etc. created stack frames and how they manipulated the stack, both from a called function and on a calling function standpoint. Not many programs were written in assembly, and without that knowledge you might still be able to deal with the code but it will be much more confusing when you see things referencing [EBP+8] and the like.

    That, and when your debugger throws you in the middle of a function and you wanted to be just before that, you can analyze the stack frame and see where you should have placed the break statement. :-)
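
    As a small illustration of why references like [EBP+8] become readable once you know the frame layout, here is a sketch that names a stack slot from its offset. It assumes the conventional 32-bit x86 frame set up by `push ebp` / `mov ebp, esp` with 4-byte arguments pushed by the caller; compilers that omit the frame pointer or pass arguments in registers will not match it, and the function name is invented for the example.

```python
def describe_frame_slot(offset: int) -> str:
    """Name the stack slot at [EBP+offset] in a conventional 32-bit frame."""
    if offset < 0:
        return f"local variable area ({-offset} bytes below EBP)"
    if offset < 4:
        return "saved EBP of the caller"
    if offset < 8:
        return "return address"
    # Arguments start at [EBP+8]; assume each one occupies 4 bytes
    return f"argument {(offset - 8) // 4 + 1}"


for off in (-4, 0, 4, 8, 12):
    print(f"[EBP{off:+d}] -> {describe_frame_slot(off)}")
```

    The return address at [EBP+4] is also what tells you where the function was called from when the debugger drops you in the middle of it.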

    Back when I hacked/cracked/whatever you want to call it, I did deal with a lot of assembly programs. I probably knew the int21h/25h/26h/27h calls better than Microsoft. I disassembled many BIOSes and learned low-level hardware control. It's an innate knowledge now that I still possess, although I guess it dates me now. Nobody much cares how to access the keyboard controller to toggle A20 or program the PIC to change DRAM refresh rates or look at the actual data stream coming off an RLL hard drive.

    There's an old, old database out there called HelpPC. I've used it since it came out and have added to its database extensively for all the Mode-X VGA graphics, hardware controls, etc. ftpsearch should find a version for you.

    Since most people these days take the easy way out with trying to thwart reverse engineering you generally only have one or two layers to get through right at the beginning. However if the original author was wise, the anti-cracking code will be sprinkled throughout, possibly including the code containing the algorithm you want to learn about! It's unlikely, but you may have more of a task than you first thought.
  • by mvw ( 2916 ) on Wednesday September 29, 1999 @03:03AM (#1651277) Journal
    The latest good trick in the area of reverse engineering I noticed was the use of entropy (the entropy from information theory) to spot interesting parts of an executable, in this case the location of a hidden cryptographic key.

    Could be useful in other areas too, like embedded hidden compressed code.

    Read this [ncipher.com] article for more.
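
    A minimal sketch of the entropy idea, not the linked article's actual method: slide a window over the file and flag windows whose byte distribution is close to uniform. The 64-byte window and the 5.5 bits/byte threshold are arbitrary assumptions (a 64-byte window cannot exceed 6 bits per byte), and compressed data will light up just as brightly as key material.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Entropy of the byte distribution in chunk, in bits per byte."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(chunk).values())


def high_entropy_windows(data: bytes, window: int = 64, threshold: float = 5.5):
    """Yield (offset, entropy) for non-overlapping windows at or above threshold."""
    for offset in range(0, len(data) - window + 1, window):
        h = shannon_entropy(data[offset:offset + window])
        if h >= threshold:
            yield offset, h


# Made-up data: 64 zero bytes (low entropy), then 64 distinct bytes (high entropy)
sample = bytes(64) + bytes(range(64))
print(list(high_entropy_windows(sample)))
```

    Ordinary code and text tend to score noticeably lower than random-looking data, which is what makes the unusual regions stand out.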

  • by crovira ( 10242 ) on Wednesday September 29, 1999 @07:07AM (#1651278) Homepage
    Reverse engineering is not a problem, in an open-source community. But sometimes you just want to make sure your backers get a return for their investment. (We'll get back to the open-source model in a minute.)

    I work for a company which invested hundreds of millions of dollars, that's nine digits' worth of dough, in a row folks, and fifteen years in the doing, developing a complex monster of a financial data model. This thing is REALLY complete.

    Now you may just think that anything less than a billion is pocket change for Bill G. But we're not Bill G. Neither were our investors and some of us sweated blood to evolve this beast.

    How would you feel if YOU and a couple of hundred of your friends had worked for years on something only to see your potential for break-even vanish to null, zip, nada, nothing, by somebody swiping a copy of your database, publishing the data dictionary and reverse-engineering the software you worked fifteen years on to build interfaces to all the data tables.

    I'd venture to offer: "Very broke and broken hearted." Not to mention angry enough to sic a law firm full of angry paperwork at the perpetrators to get them to "cease and desist."

    No, Reverse engineering is not a problem in an open-source community. Because it shouldn't happen. The development should have been collaborative from the get go.

    Open-source is a great concept, if a project was started as open-source and everybody chips in to improve the product and its place in the market and doesn't rip-off the concept or the source code depriving the originators of revenue by contributing nothing and reaping the rewards.

    Also the project has to come first and be acknowledged as THE project. It's no good if we have another Apache project competing for web services or another Samba competing for intersystem operability. You have to contribute to Apache and Samba and not just grab the code and, uh, fork off.

    That's what the corporate world, the backers and users of the fruits of our labors are really worried about. The technical issues don't bother them. Like everybody else, they don't understand them.

    I'm still a little leery of all these Linux distributions. I'm not the only one. Luckily, GNU/Linux, Apache, Samba, (Mozilla some day, I hope,) and a host of other products were well controlled and evolved in a collaborative yet well-controlled atmosphere.

    That's rare and I'm going to OpenSource / OpenScience 99 at Brookhaven labs tomorrow to see what is being done to spread the faith.

    Because it's the competitive aspects of the development process for all of the other 'stuff' that's a real worry.

    That's why there's a hundred lousy accounting packages out there rather than just ONE great one... That's why there's a hundred lousy payroll packages out there rather than ONE great one. We haven't yet learned to share and play nice with the other children.

    Say we learn to spread the wealth, that begs the question "How do you spread the cost?"

    So far, let's face it, GNU/Linux, Samba, Apache and a whole lot of other software out there are at the beginning of the cost curve. We're not talking millions of dollars here. The development has to date been very Mom-and-Pop and devoted hobbyist.

    Will the development slow to a crawl when it's not something that's universally needed, like an OS or a Web server, but gets into niches, like financial models, or if something gets really expensive to build?

    Can an open-source approach work in the alleys off of the Bazaar? That's THE question and we have to come up with a right answer, a complete answer.

    Because if we're to reach farther by standing on the shoulders of giants, let's make sure the giants are not heading in separate directions and leaving us, the development community (not just the hackers) hovering precariously over a growing chasm.
  • by LL ( 20038 ) on Wednesday September 29, 1999 @12:06AM (#1651279)
    Despite the rather black box connotations of reverse engineering, it is a legitimate R&D exercise. Car manufacturers reverse engineer their competitors, chefs try to dissect recipes, etc. In the computer context according to the Centre for Software Maintenance [uq.edu.au],

    Reverse Engineering is the process of analysis of an existing software system to create representations of a different form or higher level of abstraction.

    and

    Reengineering is the process of analysis and modification of an existing software system to reconstitute it in a new improved form.

    Given that the perennial occupation of engineers is to make things better, faster, or cheaper, tinkering with electronic toys or source code is a natural pastime. It is only the marketeers and financial managers that want things to be "hidden" so that the cost (and thus in their mind = value) is higher (basic economics, remember scarcity == higher price). Obfuscation of code is an obvious mechanism to exclude competitors; however, it significantly adds to the long-term cost of maintenance and also reduces the potential market. How many times have you been given a piece of code with the design specs/architecture residing in the head of someone who's just left? Wouldn't it be nice if some intelligent bit of software did the analysis and gave you the answer (yeah, wishful thinking but still ....).

    Perhaps people don't realise it but there are 2 information monopolies, one when one party controls everything and there is no alternative, the other when everything is freely available so that there is no competition (and thus no alternative).

    LL
  • by kekoap ( 37035 ) on Wednesday September 29, 1999 @10:08AM (#1651280)

    Reverse engineering is just like science. You pose hypotheses about the system you are reverse engineering, then you find ways to test those hypotheses.

    Like Raphael [gamers.org], I have been involved in reverse engineering a number of fun systems -- Quake network protocols, Quake map formats, OpenGL programs, LEGO Mindstorms. All of these systems required the same general strategy but different tools and background knowledge.

    My experience has been that the hardest step of reverse engineering a system is getting started. You typically find yourself needing some tool to analyze a system that you just don't have.

    For the Quake network protocol, that tool was a UDP proxy that dumped data in a format I could understand. For OpenGL programs, getting a tracing infrastructure set up was required before meaningful analysis of how programs use OpenGL could proceed.
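
    A dumping UDP proxy of that general kind might look like the sketch below. This is not the tool described above, just the smallest version of the idea: it handles a single client, relays datagrams both ways through one socket, and prints a hex dump of each. The function names and the output format are assumptions made for the example.

```python
import socket


def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes and printable ASCII, one row per line."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{off:04x}  {hexpart:<{width * 3}} {text}")
    return "\n".join(lines)


def run_proxy(listen_port: int, server_addr: tuple) -> None:
    """Relay datagrams between one client and server_addr, dumping each one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", listen_port))
    client = None                       # learned from the first client packet
    while True:
        data, addr = sock.recvfrom(65535)
        if addr == server_addr:
            direction, dest = "server -> client", client
        else:
            client = addr
            direction, dest = "client -> server", server_addr
        print(f"{direction}, {len(data)} bytes")
        print(hexdump(data))
        if dest is not None:
            sock.sendto(data, dest)
```

    Point the game client at the proxy's port instead of the real server and call something like `run_proxy(listen_port, (server_ip, server_port))`, where `server_addr` must be a numeric IP address and port so that it compares equal to what `recvfrom` returns.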

    For LEGO Mindstorms, the hardest part would have been figuring out the baud and bit encoding of a serial stream, since I didn't have easy access to an oscilloscope at the time, and I do not like trial and error when something unrelated -- like my serial port setup -- could go wrong; however, somebody had figured out the serial encoding already, and the starting hump ended up being obtaining a serial line data analyzer. (I ended up using a SGI Indy as a serial proxy.) Later Mindstorms reverse engineering required a disassembler/assembler/compiler tool suite.

    Quake map files were easy; the tools were a hexdump program, a program to factor numbers to find strides, an HP calculator, and some programs to convert number formats.

    The second part of reverse engineering something is finding useful ways to sort through the data that gets collected or generated. A lot of times I found that this boiled down to writing a program to analyze and print out the data, which I could then look over and study.

    For example, the Quake 2 network protocol included some compressed information whose presence or absence was indicated by a bit vector; to figure out which bits mapped to which data, I used a program that tabularized and printed out the data in a really wide format; I then looked for patterns in the compressed data across many, many packets. By lining up columns of numbers that were clearly the same data, it was possible to infer which bits mapped to that data. Kind of like playing a really long game of Mastermind where somebody else gets to choose most of the guesses.

    For Quake map files, after figuring out the basic layout of the records in the file (which hasn't changed much from version to version), the important part was figuring out the meaning of all the data. Early on, a useful tool was one that started at a given offset and printed out the range of numbers located at a particular stride from the starting point; this helped associate records of different types to one another. Later, and by far the most useful tool for analyzing Quake map files, was a level renderer used to verify the meaning of the map data. Related tools verified not only the meaning of certain data structures, but also high-level aspects of the algorithms that used these data structures, e.g. collision detection.
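
    The stride tool mentioned above can be approximated in a few lines. This sketch assumes the field of interest is a little-endian 32-bit signed integer, which is a guess that has to be tested per field like any other hypothesis; the record layout in the sample is invented.

```python
import struct


def values_at_stride(data: bytes, start: int, stride: int, fmt: str = "<i"):
    """Decode one fmt value at start, start + stride, ... until data runs out."""
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, data, off)[0]
            for off in range(start, len(data) - size + 1, stride)]


def value_range(data: bytes, start: int, stride: int, fmt: str = "<i"):
    """Return (min, max, count) of the values at the given stride, or None."""
    values = values_at_stride(data, start, stride, fmt)
    return (min(values), max(values), len(values)) if values else None


# Made-up records of two 32-bit fields each: (100, 0), (200, 1), (300, 2)
blob = b"".join(struct.pack("<ii", a, b) for a, b in [(100, 0), (200, 1), (300, 2)])
print(value_range(blob, 4, 8))   # (0, 2, 3)
```

    A field whose values run from 0 up to just below the record count of some other table is a strong hint that it is an index into that table, which is how records of different types get associated with one another.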

    A single-stepping, single-buffered OpenGL trace player helps enormously when trying to figure out what algorithms an OpenGL program uses.

    In any event, along with these common aspects of reverse engineering (getting started, developing the right tools), the general strategy of posing hypotheses and testing them holds throughout. Once you think you have figured out something new, you need to come up with a way of testing and verifying (or rejecting) the new idea. Unverified knowledge is just a guess, it's not really valid until you have confirmed it with at least one test; the more independent tests the better, as this leads to more confidence in both the new and the established knowledge. Hacking is of the essence here; the faster you can test an idea, the faster you can move on to testing new ones. Not only that, but the results of testing one new idea often opens up more questions and leads to further progress, at least early on.

    This is just like science. The only difference is that when you are reverse engineering something, presumably the underlying mechanisms are already known by others -- the original engineers.

    Since the original poster was interested in graphics, I will add that for OpenGL programs, I use a "DLL proxy" replacement for SGI's OpenGL Stream Codec [sgi.com] based on ideas from a program called gltrace [sgi.com]. The proxy dumps a trace of OpenGL/GLX/WGL calls that can later be replayed, single-stepped, run through a simulator, etc.

    -Kekoa

  • by dr0n3 ( 47107 ) on Wednesday September 29, 1999 @12:36AM (#1651281)


    Quite frankly, if you're interested in graphics algorithms, you'll learn a LOT more by reading a book such as "computer graphics: principles and practice" by foley et al:

    http://www.amazon.com/exec/obidos/ASIN/0201848406/o/qid=938597329/sr=8-1/002-6530067-2628816

    Beware though...don't even bother reading this without a knowledge of matrix algebra in the very least...and it won't hurt to know some multivariate and vector calc. The book gives the algs. in C so you can use them in any way you want. Honestly, I think you can probably learn all the math and algorithms you need to be a CG whiz quicker than trying to even partially RE any graphics package out there by looking at asm code.

    Trying to reverse engineer graphics packages will be a pretty big waste of your time...you'll definitely be a PRO at gdb though by the time you're through :)

    The reason is, compilers can do some pretty crazy optimizations of the code, and trying to understand what's going on can be nearly impossible (given you have other things to do besides trace through jump tables and stack ptrs all day). Disassembling code is mainly done for easy things such as cracking software that requires reg. keys and such...where in the simple case, you're just making sure some conditional (ie: if (keycheck) blahblah ) always evals to true so that the system thinks you have a valid key (it can get much more complicated than this, but this is the easiest and most common case).

    So, if you get off on gfx algorithms, buy a good book such as the one I mentioned and don't bother trying to disassemble anything...the complexity of today's software systems has led to a big decrease in the use of disassemblers....back in the days of 8bit cpu's and 16k ram one could easily disassemble code to see how for example a parallax scrolling routine was implemented in a game...not anymore I'm afraid :P

    -dr0ne
  • by Kitsune Sushi ( 87987 ) on Wednesday September 29, 1999 @12:02AM (#1651282)

    Here's a *gasp* GPL'ed decompiler for Java, of all things: Homebrew Decompiler [freshmeat.net]. I came across it while searching for GPL'ed software on Freshmeat. The annoying thing comes in when you decompile something, you just get the straight source.. no comments. Because comments and what have you are stripped out during compile time, for hopefully obvious reasons.

    Of course, if you're bothering to decompile something, chances are likely that you're doing so because you know code inside and out. If not, the added benefits comments give to code readability are /really/ going to hit home.. and how.

  • by Anonymous Coward on Wednesday September 29, 1999 @04:30AM (#1651283)

    I used to work at Chipworks [chipworks.com], which reverse-engineers integrated circuits. Here's how *hardware* RE works; I'll get to software later.

    You remove the chip from the package by popping it open or (if it's a plastic package) dropping it in boiling sulphuric acid. You prepare several samples, etching each one to a different level of interconnect. The last sample is etched down to the transistor level.

    You then create large photomosaics of the chip. If you do it with conventional film cameras, you end up (for simple memory chips) with huge "carpets" of images about 8 metres long and 1.5 metres high.

    You get a team of engineers to crawl around on the photos for a few months, marking interconnect, labeling signals (first with tentative names; then with real names) and extracting circuitry.

    You get a team of engineers to eyeball the schematics for a few months and organize them. Gradually, the picture of how the chip works emerges. Note that this is for simple chips like DRAM or Flash memory chips. It's totally impractical for a complicated chip like a microprocessor.

    If you're dealing with flash memory, you have to worry about programming algorithms. These chips usually have on-chip ROMs or PLAs which control programming signals. You spend another few months decoding the PLA's and coming up with the algorithms.

    For software, you have to know the background. It really helps if you know which language, compiler, OS, etc. was targeted. Most compilers produce standard assembly blocks for common constructs, so this helps you recognize things.

    In my youth, I partially reverse-engineered the ROM of the TRS-80 Color Computer. Since this was written in assembler, the reverse engineering was not too hard. It's basically a lot of staring.

  • by mischief ( 6270 ) on Wednesday September 29, 1999 @12:16AM (#1651284) Homepage Journal
    Fravia's Pages of Reverse Engineering [instinct.org] are perhaps the most comprehensive pages on the net that cover all sorts of different aspects of the subject. It's great - everything is written in a friendly way and there's absolutely loads of information there.

    --
  • by akey ( 29718 ) on Wednesday September 29, 1999 @12:10AM (#1651285)
    There is no simple answer to the question -- it depends on the platform and on what your specific goals are. To completely reverse engineer a program, i.e. generate source code that can be compiled to exactly the same executable files, is extremely difficult. Commercial disassemblers are generally expensive. So it's usually better to pick a smaller section that you want to analyze, instead of trying to recreate equivalent source code. Since I only have experience disassembling/analyzing programs running on x86 DOS/Win/Win9x, I'll have to limit practical tips to those platforms -- and in most cases, you can use a combination of the methods. You'll also need to at least be able to follow assembly code, and will need to understand the function calling conventions that the particular program and operating system use.

    1) Watch the program under a debugger. This is probably the most time-consuming method, as you've got to single step until you find the section of code you're interested in (and this assumes that you can recognize what you're looking for). Most modern Windows debuggers allow you to break when a DLL is loaded, and you can then set breakpoints in the loaded module.

    2) Use an API Spy program (ala Matt Pietrek, which is unfortunately out-of-print). Windows programs make heavy use of calls to functions in DLLs -- it is often possible to intercept these calls. You can find out what DLLs a program is linked to by either disassembling it, or by looking at the executable under a hex editor. To get the source code to the original API Spy from Pietrek's MSJ article, look for MSDN [microsoft.com] Knowledge Base article Q122274 [microsoft.com].

    3) Rename the DLL that you want to intercept calls to, and write a "wrapper" DLL with the original name of the target DLL. The "wrapper" should have stubs for all functions in the target, which simply log information about the function call, and then call the intended function in the target DLL. But if you don't have header files for the functions you want to intercept, you'll need to watch at least one call to the function under the debugger to determine the number and type of arguments, as well as the calling convention.
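
    The log-then-forward idea behind such a wrapper can be sketched without any Windows machinery. What follows is only a Python analogy, not a DLL: make_logging_wrapper and _forward are made-up names, and the standard math module stands in for the "target" library. Each public callable gets a stub that records the call and then hands it on to the real function.

```python
import functools
import math


def _forward(name, func, log):
    """Build one stub: call the real function, then record what happened."""
    @functools.wraps(func)
    def stub(*args, **kwargs):
        result = func(*args, **kwargs)
        log.append((name, args, kwargs, result))
        return result
    return stub


def make_logging_wrapper(target, log):
    """Return an object exposing the same public callables as `target`,
    each replaced by a stub that logs the call and forwards it."""
    class Wrapper:
        pass

    wrapper = Wrapper()
    for name in dir(target):
        attr = getattr(target, name)
        if name.startswith("_") or not callable(attr):
            continue  # skip private names and plain constants such as math.pi
        setattr(wrapper, name, _forward(name, attr, log))
    return wrapper


# Usage: the caller talks to the wrapper exactly as it would to the target.
calls = []
spy = make_logging_wrapper(math, calls)
root = spy.sqrt(16.0)
print(root, calls)
```

    A real wrapper DLL has the extra burden described above: without headers you must first work out each function's argument count and calling convention before the stub can forward anything safely.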
  • by segmond ( 34052 ) on Wednesday September 29, 1999 @02:58AM (#1651286)

    Reverse engineering is very important, especially in cases where the owner of a piece of software refuses to tell you how a certain feature works. When you reverse engineer, you work at a low level; you cannot work with high-level languages like C. You will need to understand the assembly language of the platform you want to reverse. When you obtain the software, you will have to load it into a debugger or disassembler, dump it into assembly code, and figure it out.

    The best way to build this skill is quite simple, though it takes time and dedication: write lots of small programs in C and compile them, but generate assembly output instead of executable output (gcc -S). Take a look at the assembly output and study it. With time you will easily be able to recognize how compilers generate their code; you will be able to look at an assembly listing and easily tell whether a loop is a while, for, or do-while loop, and so on.

    Reversing this assembly into C is a whole different story. If you don't have access to include files, and if the binary has been stripped, it gets harder. Here is an example of such C source:

    _0x8024639c()
    {
        _0x8032d584 = 1; /* char */
        _0x8032d588 = 0; /* char */
        _0x8032171c();
        _0x8032174c(2, 0x37a);
        _0x8024922c(90);
    }

    This tells us that the function at address 0x8024639c has 5 statements. We store 1 at address 0x8032d584 and 0 at 0x8032d588. We call a function at 0x8032171c, then call another function at 0x8032174c with two arguments, 2 and 0x37a, then call another function at 0x8024922c with the argument 90. Not pretty, but with time we will be able to understand the structure of the program, and be able to assign the functions meaningful names.

    Reverse engineering is a task which requires utmost patience! It takes pain and practice and time. If you want to figure out algorithms, it is better to read papers and come up with your own. If you want to figure out closed algorithms, or learn how a piece of closed software works, then reverse engineering is for you. By the way, the sample snippet of source was done by a friend, and is part of a reverse-engineered mario64.

    In the sample snippet below, he was able to make out some of the variables, and the code is much more readable.

    _0x8024b13c()
    {
        if (!_0x8032ddd0) {
            if (mario->Power > 0) {
                block = mario->Power / 256;
            } else {
                block = 0;
            }
            if (Level > 0) {
                DisplayStats |= 2;
            } else {
                DisplayStats &= 0xfffffffd;
            }
            if (CoinCount < mario->Coin && _0x8032d5d4 & 1) {
                if (mario->_0x0c & 0x00006000) {
                    a = 0x38128081;
                } else {
                    a = 0x38118081;
                }
                CoinCount++;
                SetSound(a, mario->_0x54);
            }
            if (mario->Life > 100) {
                mario->Life = 100;
            }
            if (mario->Coin > 999) {
                mario->Coin = 999;
            }
            if (CoinCount > 999) {
                CoinCount = 999;
            }
            StarCount = mario->Star;
            LifeCount = mario->Life;
            _0x8033b268 = mario->_0xac;
            if (PowerBar _0xb2 > 0) {
                DisplayStats |= 0x8000;
            } else {
                DisplayStats &= 0xffff7fff;
            }
        }
    }

    If you think you can handle such stuff, then jump aboard and have fun. :-)
  • by cjw ( 96656 ) on Wednesday September 29, 1999 @12:05AM (#1651287)
    Unless you're really after ultra-secret proprietary algorithms, your time would probably be much better spent looking at some of the recent research literature down yer local Uni library. You won't get full implementations, but you will get explanations, and it'll be easier to understand than pages upon pages of disassembly.
  • by Raphael ( 18701 ) on Wednesday September 29, 1999 @12:46AM (#1651288) Homepage Journal

    There are at least two things that you can do when attempting to reverse engineer a piece of software. The first one (not legal in several countries) is to decompile the code: take a debugger or decompiler and check what instructions are executed. The second one (legal in most countries) is the "blackbox" approach: consider the software as something that produces some output(s) depending on its input(s), and try to guess what is inside.

    This second approach is the "real" reverse engineering. By carefully crafting some inputs and observing the outputs, you can often draw some conclusions about how the software behaves. With some patience and a lot of trial and error on simple inputs, you can find some patterns in the software: stuff that does not change, stuff that changes depending only on one of the inputs, and so on.
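
    As a rough illustration of that trial-and-error loop, here is a small Python sketch. Everything in it is hypothetical: toy_blackbox is a made-up stand-in for the program under study, and probe is just one way to automate "change one input, see which outputs move". It flips one input byte at a time and records which output positions change.

```python
def probe(blackbox, base_input):
    """For each input byte, flip it and record which output positions change."""
    baseline = blackbox(base_input)
    influence = {}
    for i in range(len(base_input)):
        mutated = bytearray(base_input)
        mutated[i] ^= 0xFF                      # change exactly one input byte
        out = blackbox(bytes(mutated))
        changed = [j for j, (a, b) in enumerate(zip(baseline, out)) if a != b]
        if len(out) != len(baseline):
            changed.append("length")            # the output size itself moved
        influence[i] = changed
    return influence


def toy_blackbox(data):
    """Hypothetical stand-in for the real program: a constant marker byte,
    a copy of the first input byte, and an additive checksum of the input."""
    return bytes([0x7F, data[0], sum(data) & 0xFF])


result = probe(toy_blackbox, b"\x01\x02\x03")
for position, changed in result.items():
    print("input byte", position, "affects output positions", changed)
```

    Output position 0 never shows up (stuff that does not change), position 1 follows a single input, and position 2 reacts to every input, which is the usual fingerprint of a checksum or length field.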

    In the good old days (well, five years ago), I was the author of DEU [doomworld.com] (Doom Editing Utilities), the first program that was able to create new levels for Doom [idsoftware.com]. I also contributed to Matt Fell's Unofficial Doom Specs [gamers.org] and Olivier Montannuy's Unofficial Quake Specs [gamers.org], the documents that describe the WAD and PAK file formats and other internal details about Doom and Quake. Almost everything in the Unofficial Doom Specs was gathered by reverse-engineering. It was only later (with the release of Doom II) that id Software [idsoftware.com] released some information to the community, presumably after they saw that editing Doom levels was a very popular activity. I am grateful for id Software's support of the editing community in their later games, but the first information about Doom had to be found the hard way.

    Most of my efforts in decoding Doom's WAD file format (and later Quake's PAK file format) involved a hex editor for viewing and editing the raw files, and custom tools that I built along the way for making editing easier (or tools that I received from other people, like DEU 3.0 from Brendon Wyber). A key thing is also to share as much information as possible with other people who are progressing on the same front because you often get more in return than what you found by yourself. For WAD files, it was easy to find that the file was organized a bit like a tar archive: a header, a directory containing names of objects and offsets within the file, and the data for the objects. Then the trial and error starts: try to guess what an object might be, modify a few bytes, run the game and see what happens. If your changes produced something useful, write it down and share the info with others. If the game crashed, try again. Repeat until you have understood everything.
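
    For the curious, the header / directory / data layout can be walked in a few lines of Python. This is a minimal sketch and not code from DEU. The field layout assumed here is the one from the publicly documented WAD format: a 12-byte header (4-byte "IWAD" or "PWAD" tag, lump count, directory offset) and 16-byte directory entries (file offset, size, 8-character name), all little-endian. The function name read_wad_directory and the tiny test archive are made up for the example.

```python
import struct


def read_wad_directory(data):
    """Return a list of (name, offset, size) tuples from WAD file contents."""
    magic, num_lumps, dir_offset = struct.unpack_from("<4sii", data, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file: bad magic %r" % magic)
    entries = []
    for i in range(num_lumps):
        # Each directory entry: lump position, lump size, NUL-padded name.
        pos, size, raw_name = struct.unpack_from("<ii8s", data, dir_offset + 16 * i)
        name = raw_name.split(b"\0")[0].decode("ascii")
        entries.append((name, pos, size))
    return entries


# Build a one-lump archive in memory instead of assuming a real file exists.
lump = b"hello"
header = struct.pack("<4sii", b"PWAD", 1, 12 + len(lump))
directory = struct.pack("<ii8s", 12, len(lump), b"THINGS")
tiny_wad = header + lump + directory

for name, pos, size in read_wad_directory(tiny_wad):
    print(name, pos, size, tiny_wad[pos:pos + size])
```

    Once the directory can be listed, the "modify a few bytes and see what happens" loop becomes a matter of picking one lump and editing only inside its offset range.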

    Sometimes, you will find data structures that you do not understand. That was the case for Doom's NODES, SEGS and SSECTORS data. If you share enough information with others, maybe someone will have an idea and find that the data structures are related to something that they know. This is exactly what happened for Doom: Alistair Brown and a group of students from Bradford suggested that the unknown data might be a BSP tree [wisc.edu]. After reading some papers on that topic (I didn't know anything about BSP trees), I was able to implement a first BSP builder in DEU. And then it became possible to create brand new levels for Doom, instead of only changing the textures and location of the monsters as we did in the first few months. Releasing the source code for the tools has probably helped a lot. Other people were able to create their own tools based on that, and then the next reverse-engineering steps became much easier when the other games based on the same engine were released (Doom II, Heretic, Hexen, Strife,...)

    Ah well... The good old times... Sigh!

  • by stevey ( 64018 ) on Wednesday September 29, 1999 @01:49AM (#1651289) Homepage
    I've been reverse engineering, or reversing, Intel binaries for a while now, and although I'm not really qualified this is roughly how I started.


    First of all you need a target program, something that you'd like to reverse. Initially I'd suggest writing a smallish C/C++ program yourself, compiling it, then reversing that - I say this because it'll be small, and you should know how it works.


    Once you have a program to reverse (around 20-40k would be a good size for a start), you'll need a disassembler. There are several around, mostly commercial ones, and some free ones.


    Here are the few that I've heard of / used:


    • IDA This is a great disassembler, with different back-ends that can disassemble different things (for example .class files). It's a commercial one, but you can get a demo. Find this at www.datarescue.com.
    • WinDasm this is also a good one, I can't remember where I saw it.
    • Dis is the best free disassembler I found, with source code at: http://www.geocities.com/~sangcho/disasm.html [geocities.com] - The beauty of this is that it builds under Linux, so you can disassemble a Windows program from your Windows partition, and study it under Linux.


    Anyway by now you should be able to disassemble most executables, and study the assembly language.
    Much of this is going to be strange to you, so try to separate out the different parts of the assembly - such as the startup code, the function calling, and the error handling.


    After a bit of study you'll soon realise what a lot of the common code is doing.


    Here's a small example of the sort of thing that DIS.exe will produce:




    :00402001 E8AA220000 call 004042B0
    :00402006 83F801 cmp eax, 00000001
    :00402009 7434 je 0040203F
    :0040200B 6A00 push 00000000
    :0040200D 68A0034100 push 004103A0
    (StringData)"Startup Message"
    :00402012 6878034100 push 00410378
    (StringData)"Program Starting In Interactive Mode"
    :00402017 6A00 push 00000000
    :00402019 C705F839410000000000 mov dword[004139F8], 00000000
    :00402023 FF1560644100 call dword[00416460]
    ;;call USER32.MessageBoxA
    :00402029 EB0A jmp 00402035




    From this you can see the names of the win32 function calls that the program is making - this will help you "copy" the program back into C.
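
    The listing also shows where each branch goes, and you can check those targets by hand from the raw bytes. The Python sketch below (rel_target is a made-up helper name) covers only the relative forms that appear in this listing: short jumps with an 8-bit displacement (opcodes 70-7F and EB) and near call/jump with a 32-bit displacement (E8, E9). In both cases the target is the address of the next instruction plus the signed displacement.

```python
def rel_target(address, raw_hex):
    """Compute the target of an x86 rel8 jump or rel32 call/jump.

    `address` is where the instruction starts, `raw_hex` its encoded bytes.
    """
    raw = bytes.fromhex(raw_hex)
    opcode = raw[0]
    if opcode == 0xEB or 0x70 <= opcode <= 0x7F:      # JMP rel8, Jcc rel8
        if len(raw) != 2:
            raise ValueError("short jump must be 2 bytes")
        disp = int.from_bytes(raw[1:2], "little", signed=True)
    elif opcode in (0xE8, 0xE9):                       # CALL rel32, JMP rel32
        if len(raw) != 5:
            raise ValueError("near call/jump must be 5 bytes")
        disp = int.from_bytes(raw[1:5], "little", signed=True)
    else:
        raise ValueError("opcode %02X not handled by this sketch" % opcode)
    # Displacement is relative to the end of the instruction (32-bit wrap).
    return (address + len(raw) + disp) & 0xFFFFFFFF


# The three relative branches from the listing above.
print(hex(rel_target(0x00402001, "E8AA220000")))   # call
print(hex(rel_target(0x00402009, "7434")))         # je
print(hex(rel_target(0x00402029, "EB0A")))         # jmp
```

    The call through dword[00416460] is different: FF15 is an indirect call through the import table, so its destination comes from the loader, not from a displacement.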


    This is what I've done - with a good read of the assembly language you can see which Win32 API calls the program is making, and that should give you a good head start on reimplementing the code... *grin*


    Of course if you are just interested in cracking (removing protection from programs, etc.), then the same things apply - you just search through your listing till you find "Incorrect Serial", etc, and change the conditional jumps appropriately - But that's bad so I'm not going to encourage you.


    Once you have your program, you can then try to translate it into C .. could be tricky .. or modify it in place. For that you'll need a good hex editor, and some understanding of assembly language. (A quick tip: you can see the instructions and opcodes in the disassembly, so to replace JNZ with JMP you could search the disassembly for a JMP and find the opcode to use. ;)
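
    If you'd rather script an in-place edit than type it into a hex editor, the one habit worth copying is to check the bytes you expect before overwriting them. A minimal Python sketch follows; patch_bytes is a made-up helper and the byte string is an invented fragment, not taken from any real program. On x86 the short forms are 75 for JNZ and EB for JMP, each followed by a one-byte displacement, so the swap keeps the instruction length the same.

```python
def patch_bytes(data, offset, expected, replacement):
    """Return a copy of `data` with `expected` at `offset` replaced.

    Refuses to patch if the bytes found there are not what was expected,
    or if the replacement would change the length of the code.
    """
    if len(expected) != len(replacement):
        raise ValueError("replacement must be the same length as the original")
    found = bytes(data[offset:offset + len(expected)])
    if found != expected:
        raise ValueError("expected %s at offset %#x, found %s"
                         % (expected.hex(), offset, found.hex()))
    patched = bytearray(data)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


# Invented fragment: cmp eax,1 / jnz +0x0A / push 0
fragment = bytes.fromhex("83f801750a6a00")
patched_fragment = patch_bytes(fragment, 3, b"\x75", b"\xeb")
print(patched_fragment.hex())
```

    Working on a copy and keeping the original untouched means a wrong guess costs you nothing.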


    An alternative to static analysis of a disassembly is to study the program inside a debugger. Without a doubt NuMega's Soft-Ice is the best debugger - but it's also very, very terse, and quite hard to learn.


    To give you some idea of the power of soft ice, when it is loaded you can set a breakpoint on a function such as "MessageBoxA", (Called from AfxMessageBox, et al), with




    bpx MessageBoxA



    Then when any running program calls this function Soft-Ice will pop up, allowing you to study / modify the running process.


    Anyway that's enough encouragement for now. Just have patience and it will all come to you.


    Steve

  • by The Musician ( 65375 ) on Tuesday September 28, 1999 @11:36PM (#1651290) Homepage
    Check out these tools:

    --

  • by JoeShmoe ( 90109 ) <askjoeshmoe@hotmail.com> on Wednesday September 29, 1999 @01:33AM (#1651291)
    Okay...first of all, the most common reason for reverse-engineering something is to remove or bypass the copy protection scheme. I know this because I see the results float by every day on IRC channels. I bought every game Blizzard ever made, but yet I am extremely glad some talented person reverse-engineered their copy to get rid of the damn CD checks...which I just happened to acquire as an "offsite copy for backup purposes".

    In the interest of education about reverse-engineering, I'm going to discuss a step-by-step process as it relates to the most popular use for it...copy protection. If you want to flame me, or moderate this down to -2, or post hateful comments go ahead...your local library has instructions on how to make bombs so I see no reason to feel guilty for teaching something that requires at least ten times the brain power of bomb making.

    Not to mention, if you seriously think that someone who has never reverse-engineered a program in his or her life is going to somehow magically take the information I post here and never have to pay for software again, get real. Warez are just a search engine away so if someone actually takes the time to LEARN a new skill, I say good for them. Okay, here we go...

    Required definitions:

    1) PRC : Palm Resource File. Like an EXE. Contains app's code, graphics and forms
    2) Form (FRM) : A Palm window filled with text, buttons or dropdowns
    3) Alert (ALT) : Popup form, often used to comment on the validity of one's reg code
    4) String (STR) : ASCII characters like "Registration Successful!"
    5) Offset : Location in the PRC file where we will do some editing
    6) ID : 2 byte hex code such as 05 DC that identifies a Resource
    7) Trap : Palm function to perform a task such as sysTrapStrCompare

    Required tools:

    Yes, they are all for Windows, but if you are smart enough to read /. then you are smart enough to have access to a Windows box or know how to VMWare one.

    1) PilotDis [palmgear.com] to thoroughly break down PRC files
    2) Prc2Bin [palmgear.com] to untangle PRC files into Alerts, Forms and Strings
    3) Palm Emulator [palm.com] (POSE) to run PRC's on your Windows machine for testing
    4) Hex WorkShop [bpsoft.com] to reach into PRC files and change the most delicate parts of them
    5) UltraEdit [ultraedit.com] to quickly find text occurrences in files

    Now, you don't need to own a Palm to learn how to reverse engineer a Palm program, but the emulator isn't going to run without a PalmOS ROM file. If you can't figure out how to get a ROM file on the Internet, forget about learning to reverse engineer and instead learn how to use a search engine. Of course, if you own a Palm, or know someone who does, POSE has a button to download the ROM from it.

    Fire up the Palm Emulator (POSE) and load the OS ROM to begin a new emulation session. Load up whatever program it is you want to reverse engineer. I recommend starting with a nice simple program like Yearly [muenster.de] (stand-by for /. effect) because it is easy to understand.

    Click the menu button and navigate to the Info menu where you'll find an About option. Choose that option and note the text "Unregistered Copy" (write this text down). Now choose the Register option and notice the text "Yearly Registration" (write this down too). Enter a bogus number like 111 and notice the message "Registration Failed: You entered a wrong code!"...yes, you need to write this down too.

    Now, let's see where those resources are in the program file. Run PilotDis with the command "dis yearly.prc". Then run PRC2Bin with the command "prc2bin yearly.prc". If everything was done properly then you should have many .BIN files and a file called "yearly.prc.s"

    We know that the "Registration Failed" window is an Alert because it pops up when we enter the wrong number. If you've installed UltraEdit then right-click on one of the Alert files like "Talt138c.bin" and open it. What do you see inside? It says "Registration Successful!" Check out the other Alerts. Open them one by one. You'll notice that A#138D (Alert ID #138D) contains the text "Registration Failed".

    Now, where do these ID's show up in the program? Open up UltraEdit and load "yearly.prc.s". Search for $138D to locate calls to the Failed Alert.

    Here is the code nearby the call:

    00004a02 4e4fa0c5 TRAP #15,$A0C5 = sysTrapStrCopy
    00004a06 6100bcf4 BSR L48 ;What is this?
    00004a0a defc000c ADDA.W #12!$c,A7
    00004a0e 4a6c0028 TST.W 40(A4)
    00004a12 6708 BEQ L607
    00004a14 3f3c138c MOVE.W #5004!$138c,-(A7) ;Successful
    00004a18 60000006 BRA L608
    00004a1c 3f3c138d L607 MOVE.W #5005!$138d,-(A7) ;Failed
    00004a20 4e4fa192 L608 TRAP #15,$A192 = sysTrapFrmAlert

    It is called at x4A1C (Address 4A1C), right after the #5005. Right above it is a call to $138C after #5004. This is our Successful Alert. Where does it decide what Alert to branch to? See the instruction 'BEQ'? That means 'branch if the compare or test equals 0'. The TST.W 40(A4) code above it checks memory location 40(A4). Therefore, somewhere in the program, 40(A4) is set to a value and depending on the value, flags either Pass or Fail responses. In this case, a 0 means we've Failed the check. Let's take a look at the code immediately above it: L48 (label 48), the target of the BSR (Branch to Subroutine).
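
    You can double-check the labels in that listing straight from the hex column. The Python sketch below is only a hand-rolled illustration (m68k_branch_target is a made-up name) and assumes the plain 68000 branch encoding: the first byte is 6x, the second byte is a signed 8-bit displacement, and a second byte of 00 means a signed 16-bit displacement follows in the next word. The target is the instruction's address plus 2 plus the displacement.

```python
def m68k_branch_target(address, raw_hex):
    """Compute the target of a 68000 BRA/BSR/Bcc from its encoded bytes."""
    raw = bytes.fromhex(raw_hex)
    if len(raw) < 2 or raw[0] >> 4 != 0x6:
        raise ValueError("not a 68000 branch opcode")
    disp8 = raw[1]
    if disp8 == 0x00:
        # Word form: 16-bit big-endian displacement in the extension word.
        if len(raw) < 4:
            raise ValueError("missing 16-bit displacement word")
        disp = int.from_bytes(raw[2:4], "big", signed=True)
    else:
        # Byte form: sign-extend the 8-bit displacement.
        disp = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return address + 2 + disp


print(hex(m68k_branch_target(0x4A12, "6708")))       # BEQ L607
print(hex(m68k_branch_target(0x4A18, "60000006")))   # BRA L608
print(hex(m68k_branch_target(0x4A06, "6100bcf4")))   # BSR L48
```

    The first two land on 4A1C and 4A20, the addresses printed next to L607 and L608, and the BSR lands on 6FC, a little before the first L48 line quoted in the truncated excerpt.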

    Here is truncated routine L48 that you found by searching for 'L48':

    0000071e 3e06 MOVE.W D6,D7
    00000720 9e40 SUB.W D0,D7
    00000722 426c0028 CLR.W 40(A4) ;Our memory address! ~~~~~
    000007fa 4e4fa0c8 TRAP #15,$A0C8 = sysTrapStrCompare
    000007fe 4a40 TST.W D0
    00000800 6606 BNE L53 ;Leave 0 or make 1?
    00000802 397c00010028 MOVE.W #1,40(A4)
    00000808 4cee04f8ffe8 L53 MOVEM.L -24(A6),D3-D7/A2
    0000080e 4e5e UNLK A6
    00000810 4e75 RTS

    Notice that the instruction CLR.W 40(A4) refers to the key address? This sets the memory location to 0, which it remains until another instruction affects 40(A4). The only way around it is at x0802, where 40(A4) may become 1. The BNE instruction just above x0802 steers the program away from the Pass outcome. Farther up, the sysTrapStrCompare call is a big tip-off that things are coming to a close in L48. Register D0 will hold 0 if the two compared strings are equal and a nonzero value if they are not. The BNE instruction at x0800 means "branch if the compare or test does not equal 0". So, if we can ensure that the routine always sets 40(A4) to 1, the check will always Pass.

    Let's take the quickest path and plan to get rid of the BNE instruction, ensuring that we will always MOVE.W #1 into 40(A4). When you want to remove an instruction, the easiest thing to fill it with is a NOP, short for no operation. The 2 byte opcode for NOP is 4E 71.

    "Huh?" Well, unfortunately, Palms use Motorola DragonBall processors and the list of instruction codes is copyrighted material. I can't provide a link to it here. If you are seriously interested in reverse engineering on the Palm platform, you'll have to contact Motorola and request a copy from them. I'm providing the NOP number here so that it's possible to learn how a reverse-engineering process works.

    Anyway, at x0800 we want to place 4E 71. Because our BNE L53 instruction is also 2 bytes we only need one NOP. Open Hex Workshop or another hex editor and go to address x0800. In UltraEdit, type CTRL+G and type '0x0800'. You should find '66 06' there. Type over it with '4E 71' and save.

    Now, reload the modified yearly.prc file into POSE. Try to register with any number. Does it work? Of course it does. Check the About screen. It says "Registered" now.

    Thus ends the lesson. You now know why reverse-engineering is such a hot topic on the Internet today.

    - JoeShmoe

    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -=-=-=-=-=-=-=-
