Getting Started Reverse Engineering

As engineers and computer folks, we all like to figure out how things work. We just can't leave well enough alone, we have to poke and prod at things until we can see exactly how the implementers did it. While we sometimes pull things apart just out of curiosity, sometimes we have to get in there and figure out how something was done so that we can take advantage of a feature or work around a bug in the implementation. Reverse engineering allows us to peel the layers of engineering back one at a time until you can see enough information to see the item works.

For some odd reason, I've spent a good bit of time in my career figuring out how other people did things. In this column, I want discuss how you can get started with your own reverse engineering tasks. I'll start out with the biggest mistakes that most people make reverse engineering. Finally, it takes quite a bit of knowledge to become really good at reverse engineering so I'll point out the areas were you can study to learn more. In my next column, I'll go through a real life example of something I reversed engineered so you can see the thought process in action.

Before we get started, I have to discuss a bit about the legal ramifications to reverse engineering. Most of your software licenses have clauses in them where you are not supposed to do any reverse engineering. What I discuss in this column might cause you to break those licenses. Therefore, caveat emptor, buyer beware. If a software manufacturer does sue you, you cannot hold me responsible as I am giving you plenty of warning. I am not a lawyer so check with your own legal council before you proceed. The final note about reverse engineering is that it can make your life very difficult in future releases of your product. While you might figure out how something works internally, if you rely on that internal knowledge in your product, you can easily break the next time the operating system or third party product you integrate changes. You should never rely on hacks you figured out through reverse engineering unless you are prepared to spend a considerable amount of time re-reverse engineering each time a new release comes out. Operating system writers and third party vendors spend a considerable amount of time working on documented interfaces for you to use. If you circumvent them, you can pay an exorbitant price down the road. They are not called undocumented interfaces or techniques for nothing!

The Big Mistakes

The best way to show the first mistake is to start out with the first two lines of an email I received recently: "I need to figure out how Word does word wrapping with variable pitched fonts. How do I start?" The mistake is that people think they can reverse engineer their way to an algorithmic design for their product. While I'm sure if you were given enough resources and infinite time, you could probably figure it out. However, you would take the remainder of your 30+-year career looking at the same four billion assembly language instructions. Reverse engineering will never take the place of designing your application.

The second, and most common, mistake is that people try to reverse engineer far more than they should. It's the age-old case of biting off more than you can chew. To successfully reverse engineer, you need to have a clear and concise goal. My rule of thumb is to never embark on a reverse engineering task unless I feel it's solvable in less than a day or two. It's just not worth the effort to reverse engineer something for several weeks when you could spend a couple of days designing around the problem or issue right up front.

What You Have To Know

While most people think being an assembly language programming god is the first step to reverse engineering, it really isn't. It helps quite a bit, but I've figured out how many things work without ever cracking a disassembler. The most important thing when reverse engineering is to step back and figure out how you would implement the functionality you are reverse engineering. By writing out the algorithm you would use to solve a problem, you can many times "see" very quickly how something works.

An excellent example is when I needed to figure out how compiled VB binaries and p-code VB binaries called into the VB run time, MSVBVM60.DLL. My first thought was that if I were responsible for designing the VB run time, I would want the interfaces to be the same no matter how the VB code was compiled. That way I would have only one way of testing interface calling. I had heard that p-code executes directly and not run through a Just-In-Time (JIT) compilation process. Therefore, the p-code calls would have to go through some "thunk" to call the run time. In scripting languages, thunks allow the scripting language to call into actual CPU code. The interesting thing with thunks is that they are allocated memory that the programmer has the CPU instruction pointer jump to. With this thought, I figured that if I were writing the compiled VB portions, I would use the same technique.

When I was going through this thought process, I never once used the debugger or looked at a disassembly. In essence, I was making a hypothesis. The good old scientific method proves itself yet again. Armed with my hypothesis on how VB made the calls, loaded up one native compiled application and one p-code compiled application into two debuggers. I set a breakpoint on rtcBeep exported from MSVBVM60.DLL because I guessed that the VB intrinsic function, Beep, must call down into rtcBeep. When each compiled program stopped on rtcBeep, I looked up the call stack at the calling function. The Call stack window showed that the address for the caller did not have symbols. I then checked the address of the memory against the Modules dialog and noticed the address of the memory did not appear in any of the loaded modules. I then when through the same process with the p-code compiled application, so I could verify my hypothesis again. Therefore, memory containing the thunk callers came from allocated memory and both native compiled and p-code compiled VB both called through thunks the same way. It didn't take any knowledge of assembly language to figure out the solution, just a hypothesis on how I would have implemented the functionality if I were to write it, and a way to verify that hypothesis.

As you can see from the previous discussion, it also helps to have an idea how different problems can be solved using the facilities provided by the operating system. In the Windows world, that means knowing about how Windows itself works. The first book you need to read cover to cover is Charles Petzold's Programming Windows. Charles covers how the basics of Windows and shows you how it all fits together. Fundamentally, Windows is a simple messaging based system and if you know messaging like the back of your hand, you will have a much better chance at figuring out how to consider solving various reverse engineering challenges. You will learn more about Windows if you sit down and write Notepad in straight C programming than almost anything else. The second book you need to read from cover to cover is Jeffrey Richter's Programming Applications for Windows. Once you understand the fundamentals of Windows, Jeffrey's book will get you up to speed on things like memory management and DLLs. Once you have a good grasp of those two technologies, you will be able to see how many problems in Windows get solved. Depending on what you are doing, a few other books might be useful as well. David Solomon and Mark Russinovich's Inside Windows 2000 can give you insight as to how Windows 2000 works at the kernel level. If you want to learn how to take advantage of the debugger, my own Debugging Applications can show you how to do advanced things with the Visual C++ debugger.

As much as you would like to avoid it, you do need to know assembly language in order to do the most advanced reverse engineering. There are still a few books floating around on how to program Intel x86 assembly language. The one I used to learn with was Mastering Turbo Assembler by Tom Swan, which I am sure is out of print. Assembly language is still taught at the college level so there are good learning books out there. In order to learn assembly language you should look at using the Microsoft Assembler (MASM), which is available with your Universal MSDN subscription, to write either a few simple programs or a DLL with some routines in them. You don't have to get super proficient at assembly language, you just need to be able to read it.

What You Have To Use

After reading the books, you need to start developing your toolkit. There are many tools you can use, but I thought I would list the tools that I have purchased or acquired and I move from machine to machine when reverse engineering. I'll start out with the free products and work my way to the commercial products.


Matt Pietrek wrote PEDUMP and it's available on the MSDN CD or MSDN Online. PEDUMP dumps all the information about a Portable Executable (PE) binary. You can get the same output with DUMPBIN from Visual Studio, but I like the format of PEDUMP better. When looking for imported and exported functions, you need PEDUMP.


Mark Russinovich wrote both REGMON and FILEMON, which are free and downloadable from REGMON monitors and completely reports all registry access on your computer. FILEMON monitors all disk and file accesses on you computer. Both of these tools allow you to easily see who's doing what to whom. One time I purchased a product that was downloadable and as a challenge, I wanted to see if I could break their registration scheme before I entered my valid, purchased ID. A total of two minutes with REGMON and I broke the scheme.


The DEPENDS program from the Platform SDK reports all imported functions used by a program. You can even run an application under depends and see what functions it acquires through GetProcAddress. DEPENDS is the tool for monitoring what exports are used out of a DLL.


BoundsChecker is a commercial error detection tool from Compuware/NuMega. You can get more information about BoundsChecker by visiting What many people don't realize about BoundsChecker is that it will monitor and record each and every API call a program makes and show them in the wonderful Event view. What makes it even more interesting is that BoundsChecker will record the complete parameter information and function return values as well. While you can't see into the APIs, BoundsChecker makes it quite easy to see API functions an algorithm called to get the work done. When I worked at NuMega, one of the demos we had was to show how the Solitaire game did the card magic at the end of the game.


SoftICE is also a commercial product from Compuware/NuMega. When you think of reverse engineering in Windows, SoftICE is right there because it's used by more people to reverse engineer things than anything else. I described how to get started with SoftICE in a previous column so you can turn there to get an idea how to use it. What I've always found amusing is that SoftICE is one of the most heavily pirated pieces of software around today. The beauty of SoftICE is that it allows you to see anywhere and everywhere, as well as get more information about the operating system than anything else.

A Disassembler

The final tool you need for larger reverse engineering chores is a disassembler. You already have one with the -DISASM switch to DUMPBIN. What makes DUMPBIN a little more useable is that it will use any symbols it can find so you can get more information. What you will probably want to do is to write a Perl script to process the output to make it more readable. While you can always use the debugger's Disassembly window, you sometimes need the disassembly in a text file.

Wrap Up

I hope I've given you an idea on how to get started with your reverse engineering challenges and how to deploy it properly. It's a big commitment to reverse engineer something so use it only when you have no other choice. In my next column, I'll apply the lessons and reverse engineer a few things in the operating system so you can see how they work.


  • Great

    Posted by Heather M TechnoWidgets on 02/21/2014 05:40pm

    Good article. Did a search on Google and there it was great info. Thanks

  • please answer

    Posted by roldamn on 01/30/2013 12:30am

    ..its possible to get the source code of any software with the use reserve engineer software? i mean.. i buy software from US through paying credit card. they send me the code and email so that i can complete manipulate the software. so i want now.. is to open the source code of the software that i buy while ago .. i want to change the color and some staff of that software? it possible to get the source code or not?? using the reserve enginer software?? bcoz i saw that . it can dis assemble? please answer tnx

  • Automatic binary file parsing

    Posted by Synalysis on 01/19/2013 02:46am

    There's a new reverse engineering tool that allows to incrementally interpret a file's contents. Additionally a powerful search feature, file histogram, hex editor and scripting are included.

  • mr

    Posted by parente on 01/17/2013 03:57am

    Congratulations for your introductory lecture into RE.

  • Reverse Java

    Posted by rajesh.jadhav.18 on 02/02/2010 03:36am

    See for a dynamic reverse engineering application which generates UML Sequence diagram and view of Participating Class diagram from any Java Application at runtime

  • There has a good Reverse Engineering Tool.

    Posted by pengch on 03/31/2004 02:21am

    There has a good Reverse Engineering Tool.
    Try it now.

    Auto Debug Tool

  • I think you explain nothing!

    Posted by Legacy on 01/31/2004 12:00am

    Originally posted by: RAHP

    I think you explain nothing! You lose the point to write only a few things that I can find just surfing the internet.

    A good recomendation its a step by step process to do that.

  • My appreciation

    Posted by Legacy on 01/03/2004 12:00am

    Originally posted by: Jim

    Just a person lucky enough to find a page that is useful in my search for knowledge.I think your article was very well worth reading ! No one can give another all the answers they require. It is up to the individual to use the brain they have(or not)-- to do that. Thanks for the info !

  • i dont think you realy explained the cracking itself

    Posted by Legacy on 05/25/2003 12:00am

    Originally posted by: abire

    u would had explain further cos as for me i do not know anything about programming .so must i learn all those stuffs

  • IDA Pro?

    Posted by Legacy on 08/31/2002 12:00am

    Originally posted by: Blazde

    Great article, but I'm surprised to see no mention of IDA Pro in the disassemblers section. It's an interactive dissasembler, so while it figures a huge amount of the code out itself, you can tweak it, add labels and comments etc... as you figure out what the code does. It'll also recognise and disassemble most any kind of binary. I was playing with some PIC code from my digital TV box in it the other day. Ermm, I'll go now, I feel like a salesperson.

Leave a Comment
  • Your email address will not be published. All fields are required.

Top White Papers and Webcasts

  • As all sorts of data becomes available for storage, analysis and retrieval - so called 'Big Data' - there are potentially huge benefits, but equally huge challenges...
  • The agile organization needs knowledge to act on, quickly and effectively. Though many organizations are clamouring for "Big Data", not nearly as many know what to do with it...
  • Cloud-based integration solutions can be confusing. Adding to the confusion are the multiple ways IT departments can deliver such integration...

Most Popular Programming Stories

More for Developers

RSS Feeds

Thanks for your registration, follow us on our social networks to keep up-to-date