Getting Started Reverse Engineering

As engineers and computer folks, we all like to figure out how things work. We just can't leave well enough alone; we have to poke and prod at things until we can see exactly how the implementers did it. While we sometimes pull things apart just out of curiosity, sometimes we have to get in there and figure out how something was done so that we can take advantage of a feature or work around a bug in the implementation. Reverse engineering allows us to peel the layers of engineering back one at a time until we can see enough to understand how the item works.

For some odd reason, I've spent a good bit of time in my career figuring out how other people did things. In this column, I want to discuss how you can get started with your own reverse engineering tasks. I'll start out with the biggest mistakes that most people make when reverse engineering. Finally, it takes quite a bit of knowledge to become really good at reverse engineering, so I'll point out the areas where you can study to learn more. In my next column, I'll go through a real-life example of something I reverse engineered so you can see the thought process in action.

Before we get started, I have to discuss a bit about the legal ramifications of reverse engineering. Most of your software licenses have clauses in them that prohibit reverse engineering. What I discuss in this column might cause you to break those licenses. Therefore, caveat emptor: buyer beware. If a software manufacturer does sue you, you cannot hold me responsible, as I am giving you plenty of warning. I am not a lawyer, so check with your own legal counsel before you proceed.

The final note about reverse engineering is that it can make your life very difficult in future releases of your product. While you might figure out how something works internally, if you rely on that internal knowledge in your product, you can easily break the next time the operating system or third-party product you integrate with changes. You should never rely on hacks you figured out through reverse engineering unless you are prepared to spend a considerable amount of time re-reverse engineering each time a new release comes out. Operating system writers and third-party vendors spend a considerable amount of time working on documented interfaces for you to use. If you circumvent them, you can pay an exorbitant price down the road. They are not called undocumented interfaces or techniques for nothing!

The Big Mistakes

The best way to show the first mistake is to start out with the first two lines of an email I received recently: "I need to figure out how Word does word wrapping with variable pitched fonts. How do I start?" The mistake is that people think they can reverse engineer their way to an algorithmic design for their product. I'm sure that, given enough resources and infinite time, you could probably figure it out. However, you would spend the remainder of your 30-plus-year career looking at the same four billion assembly language instructions. Reverse engineering will never take the place of designing your application.

The second, and most common, mistake is that people try to reverse engineer far more than they should. It's the age-old case of biting off more than you can chew. To successfully reverse engineer, you need to have a clear and concise goal. My rule of thumb is to never embark on a reverse engineering task unless I feel it's solvable in less than a day or two. It's just not worth the effort to reverse engineer something for several weeks when you could spend a couple of days designing around the problem or issue right up front.

What You Have To Know

While most people think being an assembly language programming god is the first step to reverse engineering, it really isn't. It helps quite a bit, but I've figured out how many things work without ever cracking open a disassembler. The most important thing when reverse engineering is to step back and figure out how you would implement the functionality you are reverse engineering. By writing out the algorithm you would use to solve a problem, you can often "see" very quickly how something works.

An excellent example is when I needed to figure out how compiled VB binaries and p-code VB binaries called into the VB run time, MSVBVM60.DLL. My first thought was that if I were responsible for designing the VB run time, I would want the interfaces to be the same no matter how the VB code was compiled. That way I would have only one way of testing interface calling. I had heard that p-code executes directly and is not run through a Just-In-Time (JIT) compilation process. Therefore, the p-code calls would have to go through some "thunk" to call the run time. In scripting languages, thunks allow the scripting language to call into actual CPU code. The interesting thing with thunks is that they live in allocated memory that the programmer has the CPU instruction pointer jump to. With this thought, I figured that if I were writing the compiled VB portions, I would use the same technique.

When I was going through this thought process, I never once used the debugger or looked at a disassembly. In essence, I was making a hypothesis. The good old scientific method proves itself yet again. Armed with my hypothesis on how VB made the calls, I loaded up one native compiled application and one p-code compiled application into two debuggers. I set a breakpoint on rtcBeep exported from MSVBVM60.DLL because I guessed that the VB intrinsic function, Beep, must call down into rtcBeep. When the native compiled program stopped on rtcBeep, I looked up the call stack at the calling function. The Call Stack window showed that the address for the caller did not have symbols. I then checked the address against the Modules dialog and noticed that it did not appear in any of the loaded modules. I then went through the same process with the p-code compiled application, so I could verify my hypothesis again. The result was the same: the memory containing the thunk callers came from allocated memory, and native compiled and p-code compiled VB both called through thunks the same way. It didn't take any knowledge of assembly language to figure out the solution, just a hypothesis on how I would have implemented the functionality if I were to write it, and a way to verify that hypothesis.
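The check against the Modules dialog boils down to a range test, which can be sketched in a few lines of Python. This is only an illustration: the module names, base addresses, sizes, and the caller address below are invented for the example (real values come from your debugger), and find_owner is a hypothetical helper, not something any debugger exposes.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    base: int  # load address of the module
    size: int  # size of the mapped image in bytes


def find_owner(address: int, modules: List[Module]) -> Optional[str]:
    """Return the name of the module whose image contains address, else None."""
    for mod in modules:
        if mod.base <= address < mod.base + mod.size:
            return mod.name
    return None


# Hypothetical values for illustration only; read the real ones from the
# debugger's Modules dialog.
MODULES = [
    Module("PROJECT1.EXE", 0x00400000, 0x00010000),
    Module("MSVBVM60.DLL", 0x66000000, 0x00150000),
]

# A caller address that belongs to no module points to allocated memory,
# which is what you would expect to see if the call came through a thunk.
caller = 0x00153A20
owner = find_owner(caller, MODULES)
print(f"{caller:#010x} -> {owner or 'not in any loaded module'}")
```

An address that falls outside every module's range is consistent with the thunk hypothesis, but it does not prove it on its own; you still want to confirm the result with both the native and the p-code build, as described above.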

As you can see from the previous discussion, it also helps to have an idea how different problems can be solved using the facilities provided by the operating system. In the Windows world, that means knowing how Windows itself works. The first book you need to read cover to cover is Charles Petzold's Programming Windows. Charles covers the basics of Windows and shows you how it all fits together. Fundamentally, Windows is a simple message-based system, and if you know messaging like the back of your hand, you will have a much better chance of figuring out how to approach various reverse engineering challenges. You will learn more about Windows by sitting down and writing Notepad in straight C than from almost anything else. The second book you need to read from cover to cover is Jeffrey Richter's Programming Applications for Windows. Once you understand the fundamentals of Windows, Jeffrey's book will get you up to speed on things like memory management and DLLs. Once you have a good grasp of those two technologies, you will be able to see how many problems in Windows get solved. Depending on what you are doing, a few other books might be useful as well. David Solomon and Mark Russinovich's Inside Windows 2000 can give you insight into how Windows 2000 works at the kernel level. If you want to learn how to take advantage of the debugger, my own Debugging Applications can show you how to do advanced things with the Visual C++ debugger.

As much as you would like to avoid it, you do need to know assembly language in order to do the most advanced reverse engineering. There are still a few books floating around on how to program Intel x86 assembly language. The one I learned with was Mastering Turbo Assembler by Tom Swan, which I am sure is out of print. Assembly language is still taught at the college level, so there are good learning books out there. To learn assembly language, you should look at using the Microsoft Assembler (MASM), which is available with your Universal MSDN subscription, to write either a few simple programs or a DLL with some routines in it. You don't have to get super proficient at assembly language; you just need to be able to read it.

What You Have To Use

After reading the books, you need to start developing your toolkit. There are many tools you can use, but I thought I would list the tools that I have purchased or acquired and that I move from machine to machine when reverse engineering. I'll start out with the free products and work my way to the commercial products.

PEDUMP

Matt Pietrek wrote PEDUMP and it's available on the MSDN CD or MSDN Online. PEDUMP dumps all the information about a Portable Executable (PE) binary. You can get the same output with DUMPBIN from Visual Studio, but I like the format of PEDUMP better. When looking for imported and exported functions, you need PEDUMP.
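To give a feel for the kind of information these tools read, here is a minimal Python sketch that pulls a few fields out of the start of a PE file. It is not how PEDUMP or DUMPBIN is implemented and covers only a tiny part of what they report; the function name read_pe_summary and the dictionary keys are my own choices for illustration. The offsets follow the published PE/COFF layout: the DOS header starts with "MZ", the 32-bit value at offset 0x3C locates the "PE\0\0" signature, and the 20-byte COFF file header follows it.

```python
import struct
import sys


def read_pe_summary(data: bytes) -> dict:
    """Return a few COFF file header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew, at offset 0x3C, holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header (20 bytes) immediately follows the signature.
    (machine, sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,              # 0x014C means Intel 386
        "sections": sections,
        "timestamp": timestamp,
        "optional_header_size": opt_size,
        "characteristics": characteristics,
    }


if __name__ == "__main__":
    # Usage: python pesummary.py some.dll other.exe
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, read_pe_summary(f.read()))
```

The import and export tables that make PEDUMP so useful live further in, behind the optional header's data directories, so for real work you still want the full tool.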

REGMON and FILEMON

Mark Russinovich wrote both REGMON and FILEMON, which are free and downloadable from www.sysinternals.com. REGMON monitors and completely reports all registry access on your computer. FILEMON monitors all disk and file accesses on your computer. Both of these tools allow you to easily see who's doing what to whom. One time I purchased a downloadable product and, as a challenge, I wanted to see if I could break its registration scheme before I entered my valid, purchased ID. A total of two minutes with REGMON and I broke the scheme.

DEPENDS

The DEPENDS program from the Platform SDK reports all imported functions used by a program. You can even run an application under DEPENDS and see what functions it acquires through GetProcAddress. DEPENDS is the tool for monitoring what exports are used out of a DLL.

BoundsChecker

BoundsChecker is a commercial error detection tool from Compuware/NuMega. You can get more information about BoundsChecker by visiting www.numega.com. What many people don't realize about BoundsChecker is that it will monitor and record each and every API call a program makes and show them in the wonderful Event view. What makes it even more interesting is that BoundsChecker will record the complete parameter information and function return values as well. While you can't see into the APIs, BoundsChecker makes it quite easy to see which API functions an algorithm called to get the work done. When I worked at NuMega, one of the demos we had was to show how the Solitaire game did the card magic at the end of the game.

SoftICE

SoftICE is also a commercial product from Compuware/NuMega. When you think of reverse engineering in Windows, SoftICE is right there because it's used by more people to reverse engineer things than anything else. I described how to get started with SoftICE in a previous column so you can turn there to get an idea how to use it. What I've always found amusing is that SoftICE is one of the most heavily pirated pieces of software around today. The beauty of SoftICE is that it allows you to see anywhere and everywhere, as well as get more information about the operating system than anything else.

A Disassembler

The final tool you need for larger reverse engineering chores is a disassembler. You already have one with the -DISASM switch to DUMPBIN. What makes DUMPBIN a little more usable is that it will use any symbols it can find, so you can get more information. You will probably want to write a Perl script to process the output and make it more readable. While you can always use the debugger's Disassembly window, you sometimes need the disassembly in a text file.
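As a rough idea of what such a post-processing script might do, here is a short sketch that tallies the targets of call instructions in a listing. It is written in Python rather than the Perl mentioned above, and it assumes each instruction line has the shape "address: hex bytes, mnemonic, operands" as in the made-up SAMPLE below; check that assumption against the output of your own DUMPBIN version before relying on it. The names parse_listing and call_targets are hypothetical helpers invented for this example.

```python
import re
from collections import Counter

# Assumed line shape: "  00401000: 55                 push        ebp"
INSTRUCTION = re.compile(
    r"^\s*([0-9A-Fa-f]{8,16}):\s+((?:[0-9A-Fa-f]{2}\s)+)\s*(\S+)\s*(.*)$"
)


def parse_listing(text):
    """Yield (address, mnemonic, operands) for each line that looks like an instruction."""
    for line in text.splitlines():
        match = INSTRUCTION.match(line)
        if match:
            yield int(match.group(1), 16), match.group(3).lower(), match.group(4).strip()


def call_targets(text):
    """Count how often each call operand appears in the listing."""
    return Counter(ops for _addr, mnem, ops in parse_listing(text) if mnem == "call")


# Invented sample in the assumed format, for illustration only.
SAMPLE = """\
  00401000: 55                 push        ebp
  00401001: 8B EC              mov         ebp,esp
  00401003: E8 F8 0F 00 00     call        00402000
  00401008: E8 F3 0F 00 00     call        00402000
  0040100D: 5D                 pop         ebp
  0040100E: C3                 ret
"""

for target, count in call_targets(SAMPLE).most_common():
    print(f"{count:4d}  {target}")
```

Lines that do not match the pattern, such as headers or continuation lines, are simply skipped, so the script degrades quietly rather than failing on output it does not understand.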

Wrap Up

I hope I've given you an idea of how to get started with your reverse engineering challenges and how to apply reverse engineering properly. It's a big commitment to reverse engineer something, so do it only when you have no other choice. In my next column, I'll apply the lessons and reverse engineer a few things in the operating system so you can see how they work.

