Inside the Executable


The Portable Executable Format is the data structure that describes how the various parts of a Win32 executable file are held together. It allows the operating system to load the executable and locate the dynamically linked libraries required to run that executable and to navigate the code, data, and resource sections compiled into that executable.

Getting Over DOS

The PE format was created for Windows, but Microsoft had to make sure that running such an executable in DOS would yield a meaningful error message and exit. To this end, the very first part of a Windows executable file is actually a small DOS executable (sometimes known as the stub) that writes "This program requires Windows" or similar, then exits.

The stub begins with the DOS header, which has this format:

Private Type IMAGE_DOS_HEADER
   e_magic As Integer           ''\\ Magic number
   e_cblp As Integer            ''\\ Bytes on last page of file
   e_cp As Integer              ''\\ Pages in file
   e_crlc As Integer            ''\\ Relocations
   e_cparhdr As Integer         ''\\ Size of header in paragraphs
   e_minalloc As Integer        ''\\ Minimum extra paragraphs needed
   e_maxalloc As Integer        ''\\ Maximum extra paragraphs needed
   e_ss As Integer              ''\\ Initial (relative) SS value
   e_sp As Integer              ''\\ Initial SP value
   e_csum As Integer            ''\\ Checksum
   e_ip As Integer              ''\\ Initial IP value
   e_cs As Integer              ''\\ Initial (relative) CS value
   e_lfarlc As Integer          ''\\ File address of relocation table
   e_ovno As Integer            ''\\ Overlay number
   e_res(0 To 3) As Integer     ''\\ Reserved words
   e_oemid As Integer           ''\\ OEM identifier (for e_oeminfo)
   e_oeminfo As Integer         ''\\ OEM information; e_oemid specific
   e_res2(0 To 9) As Integer    ''\\ Reserved words
   e_lfanew As Long             ''\\ File address of new exe header
End Type

Apart from the e_magic signature (MZ), the only field of this structure that is of interest to Windows is e_lfanew; it is the file pointer to the new Windows executable header. To skip over the DOS part of the program, set the file pointer to the value held in this field:

Private Sub SkipDOSStub(ByVal hfile As Long)

'\\ Assumes stub is a module-level variable of the DOS header type above
Dim BytesRead As Long

'\\ Go to start of file...
Call SetFilePointer(hfile, 0, 0, FILE_BEGIN)
If Err.LastDllError Then
   Debug.Print LastSystemError
End If

'\\ Read the DOS header, then jump to the new executable header
Call ReadFileLong(hfile, VarPtr(stub), Len(stub), BytesRead, ByVal 0&)
Call SetFilePointer(hfile, stub.e_lfanew, 0, FILE_BEGIN)

End Sub
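
For readers who want to try this step outside VB, here is a minimal sketch of the same idea in Python (not the language of this article). It assumes the whole file has already been read into a bytes object; the name pe_header_offset is made up for this example.

```python
import struct

# The DOS header is 64 bytes: thirty 16-bit words followed by the
# 32-bit e_lfanew field, all little-endian.
DOS_HEADER = struct.Struct("<30HL")


def pe_header_offset(data: bytes) -> int:
    """Return e_lfanew, the file offset of the new executable header."""
    if len(data) < DOS_HEADER.size:
        raise ValueError("file is too small to hold a DOS header")
    fields = DOS_HEADER.unpack_from(data, 0)
    if fields[0] != 0x5A4D:  # e_magic must be "MZ"
        raise ValueError("missing MZ signature")
    return fields[-1]  # e_lfanew is the last field of the header
```

Calling pe_header_offset on the contents of an executable gives the offset you would otherwise pass to SetFilePointer.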

The NT Header

The NT header holds the information needed by the Windows program loader to load the program. It consists of the PE file signature followed by the IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER records.

For applications designed to run under Windows (in other words, not OS/2 or VxD files) the four bytes of the PE File signature should equal &h4550. The other defined signatures are:

Public Enum ImageSignatureTypes
   IMAGE_DOS_SIGNATURE    = &H5A4D    ''\\ MZ
   IMAGE_OS2_SIGNATURE    = &H454E    ''\\ NE
   IMAGE_OS2_SIGNATURE_LE = &H454C    ''\\ LE
   IMAGE_VXD_SIGNATURE    = &H454C    ''\\ LE
   IMAGE_NT_SIGNATURE     = &H4550    ''\\ PE00
End Enum
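
As a rough illustration of how these signatures can be told apart, the following Python sketch (Python rather than VB, purely for brevity) looks at the bytes found at the e_lfanew offset. The function name signature_name and the returned labels are assumptions made for this example.

```python
import struct

# Two-byte signatures from the enum above (LE is shared by OS/2 and VxD).
WORD_SIGNATURES = {0x5A4D: "MZ", 0x454E: "NE", 0x454C: "LE"}


def signature_name(data: bytes, offset: int) -> str:
    """Name the executable signature stored at the given file offset."""
    (dword,) = struct.unpack_from("<L", data, offset)
    if dword == 0x00004550:  # "PE" followed by two zero bytes
        return "PE"
    # The older formats only use the low 16 bits as their signature.
    return WORD_SIGNATURES.get(dword & 0xFFFF, "unknown")
```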

Following the PE file signature is the IMAGE_FILE_HEADER structure that stores information about the target environment of the executable. The structure is:

Private Type IMAGE_FILE_HEADER
   Machine As Integer
   NumberOfSections As Integer
   TimeDateStamp As Long
   PointerToSymbolTable As Long
   NumberOfSymbols As Long
   SizeOfOptionalHeader As Integer
   Characteristics As Integer
End Type
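
The same record can be unpacked in a few lines of Python, shown here only as a sketch alongside the VB declaration. It assumes the file contents are in a bytes object and that pe_offset is the value of e_lfanew; FileHeader and read_file_header are names invented for this example.

```python
import struct
from typing import NamedTuple


class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int


# Integer = 16 bits (H), Long = 32 bits (L); 20 bytes in total.
FILE_HEADER = struct.Struct("<HHLLLHH")


def read_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Check the PE signature and unpack the file header that follows it."""
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return FileHeader(*FILE_HEADER.unpack_from(data, pe_offset + 4))
```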

The Machine member describes what target CPU the executable was compiled for. It can be one of the following:

Public Enum ImageMachineTypes
   IMAGE_FILE_MACHINE_I386      = &H14C  ''\\ Intel 386.
   IMAGE_FILE_MACHINE_R3000     = &H162  ''\\ MIPS little-endian, &H160 big-endian
   IMAGE_FILE_MACHINE_R4000     = &H166  ''\\ MIPS little-endian
   IMAGE_FILE_MACHINE_R10000    = &H168  ''\\ MIPS little-endian
   IMAGE_FILE_MACHINE_WCEMIPSV2 = &H169  ''\\ MIPS little-endian WCE v2
   IMAGE_FILE_MACHINE_ALPHA     = &H184  ''\\ Alpha_AXP
   IMAGE_FILE_MACHINE_POWERPC   = &H1F0  ''\\ IBM PowerPC Little-Endian
   IMAGE_FILE_MACHINE_SH3       = &H1A2  ''\\ SH3 little-endian
   IMAGE_FILE_MACHINE_SH3E      = &H1A4  ''\\ SH3E little-endian
   IMAGE_FILE_MACHINE_SH4       = &H1A6  ''\\ SH4 little-endian
   IMAGE_FILE_MACHINE_ARM       = &H1C0  ''\\ ARM Little-Endian
   IMAGE_FILE_MACHINE_IA64      = &H200  ''\\ Intel Itanium (IA-64)
End Enum

The SizeOfOptionalHeader member indicates the size (in bytes) of the IMAGE_OPTIONAL_HEADER structure that immediately follows the file header. In practice, this structure is not optional for an executable, so the name is a bit of a misnomer. Its first part is defined as:

Private Type IMAGE_OPTIONAL_HEADER
   Magic As Integer
   MajorLinkerVersion As Byte
   MinorLinkerVersion As Byte
   SizeOfCode As Long
   SizeOfInitializedData As Long
   SizeOfUninitializedData As Long
   AddressOfEntryPoint As Long
   BaseOfCode As Long
   BaseOfData As Long
End Type

This in turn is immediately followed by the IMAGE_OPTIONAL_HEADER_NT structure:

Private Type IMAGE_OPTIONAL_HEADER_NT
   ImageBase As Long
   SectionAlignment As Long
   FileAlignment As Long
   MajorOperatingSystemVersion As Integer
   MinorOperatingSystemVersion As Integer
   MajorImageVersion As Integer
   MinorImageVersion As Integer
   MajorSubsystemVersion As Integer
   MinorSubsystemVersion As Integer
   Win32VersionValue As Long
   SizeOfImage As Long
   SizeOfHeaders As Long
   CheckSum As Long
   Subsystem As Integer
   DllCharacteristics As Integer
   SizeOfStackReserve As Long
   SizeOfStackCommit As Long
   SizeOfHeapReserve As Long
   SizeOfHeapCommit As Long
   LoaderFlags As Long
   NumberOfRvaAndSizes As Long
   DataDirectory(0 To 15) As IMAGE_DATA_DIRECTORY
End Type
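
The listings later in this article call an AbsoluteAddress helper to turn a relative virtual address (RVA) into a real address. A plausible reading of that helper, sketched here in Python rather than VB, is simply "load address plus RVA". Both function names below are assumptions, and the offset 28 only holds for the 32-bit layout shown above.

```python
import struct


def read_image_base(optional_header: bytes) -> int:
    """Return ImageBase, the preferred load address (32-bit layout only)."""
    # The eight standard fields above occupy 28 bytes; ImageBase comes next.
    (image_base,) = struct.unpack_from("<L", optional_header, 28)
    return image_base


def absolute_address(load_base: int, rva: int) -> int:
    """Convert an RVA into an address within a 32-bit process."""
    return (load_base + rva) & 0xFFFFFFFF
```

Note that load_base is where the image actually ended up in memory, which is only the same as ImageBase if the loader did not have to relocate it.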

The most useful fields of this structure (for my purposes, anyhow) are the 16 IMAGE_DATA_DIRECTORY entries. These describe where (if at all) the particular sections of the executable are located. The structure is defined thus:

Private Type IMAGE_DATA_DIRECTORY
   VirtualAddress As Long
   Size As Long
End Type

And the directories are held in order, thus:

Public Enum ImageDataDirectoryIndexes
   IMAGE_DIRECTORY_ENTRY_EXPORT       = 0  ''\\ Export Directory
   IMAGE_DIRECTORY_ENTRY_IMPORT       = 1  ''\\ Import Directory
   IMAGE_DIRECTORY_ENTRY_RESOURCE     = 2  ''\\ Resource Directory
   IMAGE_DIRECTORY_ENTRY_EXCEPTION    = 3  ''\\ Exception Directory
   IMAGE_DIRECTORY_ENTRY_SECURITY     = 4  ''\\ Security Directory
   IMAGE_DIRECTORY_ENTRY_BASERELOC    = 5  ''\\ Base Relocation Table
   IMAGE_DIRECTORY_ENTRY_DEBUG        = 6  ''\\ Debug Directory
   IMAGE_DIRECTORY_ENTRY_ARCHITECTURE = 7  ''\\ Architecture Specific Data
   IMAGE_DIRECTORY_ENTRY_GLOBALPTR    = 8  ''\\ RVA of Global Pointer
   IMAGE_DIRECTORY_ENTRY_TLS          = 9  ''\\ TLS Directory
   IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG  = 10 ''\\ Load Configuration Directory
   IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT = 11 ''\\ Bound Import Directory in headers
   IMAGE_DIRECTORY_ENTRY_IAT          = 12 ''\\ Import Address Table
   IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT = 13 ''\\ Delay Load Import Descriptors
End Enum

Note: If an executable does not contain one of the sections (as is often the case), there will be an IMAGE_DATA_DIRECTORY for it, but the address and size will both be zero.
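
To make the directory table concrete, here is a small Python sketch (not VB) that pulls the (VirtualAddress, Size) pairs out of an optional header held in a bytes object. It assumes the 32-bit layout described above, where NumberOfRvaAndSizes sits at offset 92 and the table starts at offset 96; read_data_directories is a name chosen for this example.

```python
import struct


def read_data_directories(optional_header: bytes) -> list:
    """Return a list of (virtual_address, size) tuples, one per directory."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic != 0x10B:  # 0x10B marks the 32-bit optional header
        raise ValueError("this sketch only understands 32-bit images")
    (count,) = struct.unpack_from("<L", optional_header, 92)
    count = min(count, 16)  # never read past the 16 defined slots
    return [
        struct.unpack_from("<LL", optional_header, 96 + 8 * index)
        for index in range(count)
    ]
```

An entry of (0, 0) means the executable has no such directory, as noted above.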

The Image Data Directories

The Exports Directory

The exports directory holds details of the functions exported by this executable. For example, if you were to look in the exports directory of MSVBVM50.dll, it would list all the exported functions that make up the Visual Basic 5 runtime environment.

This directory consists of some info to tell you how many exported functions and names there are, followed by three arrays. The name and ordinal arrays are parallel: for each exported name, the ordinal array holds the index of that function in the address array. The structure is defined thus:

Private Type IMAGE_EXPORT_DIRECTORY
   Characteristics As Long
   TimeDateStamp As Long
   MajorVersion As Integer
   MinorVersion As Integer
   lpName As Long
   Base As Long
   NumberOfFunctions As Long
   NumberOfNames As Long
   lpAddressOfFunctions As Long     '\\ Function addresses, by index (LONG)
   lpAddressOfNames As Long         '\\ Names, parallel with ordinals (LONG)
   lpAddressOfNameOrdinals As Long  '\\ Index into the addresses (INTEGER)
End Type

And you can read this info from the executable thus:

Private Sub ProcessExportTable(ExportDirectory As IMAGE_DATA_DIRECTORY)

'\\ deThis, fExport, image and DebugProcess are assumed to be
'\\ declared at module level
Dim lBytesWritten As Long
Dim lpAddress As Long

Dim nFunction As Long
Dim nIndex As Long

If ExportDirectory.VirtualAddress > 0 And ExportDirectory.Size > 0 Then
   '\\ Get the true address from the RVA
   lpAddress = AbsoluteAddress(ExportDirectory.VirtualAddress)
   '\\ Copy the image_export_directory structure...
   Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
                   VarPtr(deThis), Len(deThis), lBytesWritten)
   With deThis
       If .lpName <> 0 Then
           image.Name = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                 image.AbsoluteAddress(.lpName), 32, False)
       End If
       '\\ Only named exports appear in the name and ordinal arrays
       For nFunction = 1 To .NumberOfNames
           '\\ RVA of this export's name
           lpAddress = LongFromOutOfprocessPointer(DebugProcess.Handle, _
              image.AbsoluteAddress(.lpAddressOfNames) _
              + ((nFunction - 1) * 4))
           fExport.Name = StringFromOutOfProcessPointer( _
              DebugProcess.Handle, _
              image.AbsoluteAddress(lpAddress), 64, False)
           '\\ Index of this export in the address array
           nIndex = IntegerFromOutOfprocessPointer(DebugProcess.Handle, _
              image.AbsoluteAddress(.lpAddressOfNameOrdinals) + _
              ((nFunction - 1) * 2)) And &HFFFF&
           fExport.Ordinal = .Base + nIndex
           fExport.ProcAddress = LongFromOutOfprocessPointer( _
              DebugProcess.Handle, _
              image.AbsoluteAddress(.lpAddressOfFunctions) + _
              (nIndex * 4))
       Next nFunction
   End With
End If

End Sub
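
If you want to experiment with the export arrays without a debugger attached, the walk can be sketched in Python (rather than VB) over a plain bytes buffer. This sketch assumes the buffer is laid out like a loaded image, so that an RVA can be used directly as an index; list_exports and read_cstring are names invented here. It looks up each address through the ordinal array rather than by position.

```python
import struct


def read_cstring(image: bytes, rva: int) -> str:
    """Read a zero-terminated ASCII string starting at the given RVA."""
    end = image.index(b"\0", rva)
    return image[rva:end].decode("ascii", errors="replace")


def list_exports(image: bytes, export_rva: int) -> list:
    """Return (name, ordinal, function_rva) for every named export."""
    (_characteristics, _stamp, _major, _minor, _name_rva, base,
     number_of_functions, number_of_names,
     functions_rva, names_rva, ordinals_rva) = struct.unpack_from(
        "<LLHHLLLLLLL", image, export_rva)
    exports = []
    for i in range(number_of_names):
        (name_rva,) = struct.unpack_from("<L", image, names_rva + 4 * i)
        # The ordinal array holds an index into the address array.
        (index,) = struct.unpack_from("<H", image, ordinals_rva + 2 * i)
        if index >= number_of_functions:
            continue  # malformed entry, skip it
        (function_rva,) = struct.unpack_from(
            "<L", image, functions_rva + 4 * index)
        exports.append((read_cstring(image, name_rva), base + index,
                        function_rva))
    return exports
```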

The Imports Directory

The imports directory lists the dynamic link libraries that this executable depends on and which functions it imports from each of them. It consists of an array of IMAGE_IMPORT_DESCRIPTOR structures terminated by an instance of this structure in which the lpName member is zero. The structure is defined as:

Private Type IMAGE_IMPORT_DESCRIPTOR
   lpImportByName As Long   ''\\ 0 for terminating null import descriptor
   TimeDateStamp As Long    ''\\ 0 if not bound,
                            ''\\ -1 if bound, and real date/time stamp
                            ''\\ in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
                            ''\\ (new BIND)
                            ''\\ otherwise date/time stamp of DLL bound to
                            ''\\ (old BIND)
   ForwarderChain As Long   ''\\ -1 if no forwarders
   lpName As Long           ''\\ RVA of the DLL name
   ''\\ RVA to IAT (if bound this IAT has actual addresses)
   lpFirstThunk As Long
End Type

And you can walk the import directory thus:

Private Sub ProcessImportTable(ImportDirectory As IMAGE_DATA_DIRECTORY)

'\\ diThis, image and DebugProcess are assumed to be declared
'\\ at module level
Dim lpAddress As Long
Dim byteswritten As Long
Dim sName As String
Dim lpNextName As Long
Dim lpNextThunk As Long

Dim lImportEntryIndex As Long

Dim nOrdinal As Integer
Dim lpFuncAddress As Long

'\\ If the image has an imports section...
If ImportDirectory.VirtualAddress > 0 And ImportDirectory.Size > 0 Then
   '\\ Get the true address from the RVA
   lpAddress = AbsoluteAddress(ImportDirectory.VirtualAddress)
   Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
        VarPtr(diThis), Len(diThis), byteswritten)

   While diThis.lpName <> 0
       '\\ Process this import directory entry
       sName = StringFromOutOfProcessPointer(DebugProcess.Handle, _
            image.AbsoluteAddress(diThis.lpName), 32, False)

       '\\ Process the import file's functions list
       If diThis.lpImportByName <> 0 Then
           lImportEntryIndex = 0
           lpNextName = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                image.AbsoluteAddress(diThis.lpImportByName))
           lpNextThunk = LongFromOutOfprocessPointer(DebugProcess.Handle, _
                image.AbsoluteAddress(diThis.lpFirstThunk))
           While (lpNextName <> 0) And (lpNextThunk <> 0)
               '\\ get the function address (once the image is loaded,
               '\\ the IAT entry itself holds it)
               lpFuncAddress = lpNextThunk
               nOrdinal = IntegerFromOutOfprocessPointer( _
                    DebugProcess.Handle, image.AbsoluteAddress(lpNextName))
               '\\ Skip the two-byte ordinal hint
               lpNextName = lpNextName + 2
               '\\ Get this function's name
               sName = StringFromOutOfProcessPointer(DebugProcess.Handle, _
                    image.AbsoluteAddress(lpNextName), 64, False)
               If Trim$(sName) <> "" Then
                   '\\ Get the next imported function...
                   lImportEntryIndex = lImportEntryIndex + 1

                   lpNextName = LongFromOutOfprocessPointer( _
                      DebugProcess.Handle, _
                      image.AbsoluteAddress(diThis.lpImportByName _
                      + (lImportEntryIndex * 4)))

                   lpNextThunk = LongFromOutOfprocessPointer( _
                      DebugProcess.Handle, _
                      image.AbsoluteAddress(diThis.lpFirstThunk _
                      + (lImportEntryIndex * 4)))
               Else
                   lpNextName = 0
               End If
           Wend
       End If

       '\\ And get the next one
       lpAddress = lpAddress + Len(diThis)
       Call ReadProcessMemoryLong(DebugProcess.Handle, lpAddress, _
               VarPtr(diThis), Len(diThis), byteswritten)
   Wend

End If

End Sub
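
The same walk can be tried offline with a short Python sketch (Python, not VB, so it can be run against a plain buffer). As before it assumes a bytes object laid out like a loaded 32-bit image, where an RVA doubles as an index; list_imports and read_cstring are invented names. Unlike the VB listing, it also copes with functions imported by ordinal, which are flagged by the top bit of the table entry.

```python
import struct

ORDINAL_FLAG = 0x80000000  # top bit set: import by ordinal, no name


def read_cstring(image: bytes, rva: int) -> str:
    """Read a zero-terminated ASCII string starting at the given RVA."""
    end = image.index(b"\0", rva)
    return image[rva:end].decode("ascii", errors="replace")


def list_imports(image: bytes, import_rva: int) -> list:
    """Return (dll_name, [function names]) for each import descriptor."""
    result = []
    offset = import_rva
    while True:
        lookup_rva, _stamp, _chain, name_rva, iat_rva = struct.unpack_from(
            "<5L", image, offset)
        if name_rva == 0:  # terminating descriptor
            break
        functions = []
        # Fall back to the IAT if there is no separate lookup table.
        entry_rva = lookup_rva or iat_rva
        while True:
            (entry,) = struct.unpack_from("<L", image, entry_rva)
            if entry == 0:
                break
            if entry & ORDINAL_FLAG:
                functions.append("#%d" % (entry & 0xFFFF))
            else:
                # Skip the two-byte ordinal hint in front of the name.
                functions.append(read_cstring(image, entry + 2))
            entry_rva += 4
        result.append((read_cstring(image, name_rva), functions))
        offset += 20  # each descriptor is five Longs
    return result
```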

The Resource Directory

The structure of the resource directory is somewhat more involved. It consists of a root directory (defined by the structure IMAGE_RESOURCE_DIRECTORY) immediately followed by a number of resource directory entries (defined by the structure IMAGE_RESOURCE_DIRECTORY_ENTRY). These are defined thus:

Private Type IMAGE_RESOURCE_DIRECTORY
   Characteristics As Long    '\\ Seems to be always zero?
   TimeDateStamp As Long
   MajorVersion As Integer
   MinorVersion As Integer
   NumberOfNamedEntries As Integer
   NumberOfIdEntries As Integer
End Type

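Private Type IMAGE_RESOURCE_DIRECTORY_ENTRY
   dwName As Long
   dwDataOffset As Long
End Type

Each resource directory entry can either point to the actual resource data or to another layer of resource directory entries. If the highest bit of dwDataOffset is set, the remaining 31 bits are the offset (from the start of the resource section) of another directory. Otherwise, dwDataOffset is the offset of an IMAGE_RESOURCE_DATA_ENTRY record that describes the resource data itself:

Private Type IMAGE_RESOURCE_DATA_ENTRY
   OffsetToData As Long    '\\ RVA of the resource data
   Size As Long            '\\ Size of the resource data in bytes
   CodePage As Long
   Reserved As Long
End Type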
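
The high-bit test is easy to get wrong, so here is a small Python sketch of it (again Python rather than VB). It assumes the raw resource section is in a bytes object, that a directory header is 16 bytes, and that each entry that follows is 8 bytes (a name or ID, then the offset field), as in the Windows SDK headers. The name read_resource_entries is made up for this example.

```python
import struct

IS_DIRECTORY = 0x80000000  # highest bit of the offset field


def read_resource_entries(section: bytes, directory_offset: int = 0) -> list:
    """Return (name_or_id, is_directory, offset) for one directory level."""
    named, ids = struct.unpack_from("<HH", section, directory_offset + 12)
    entries = []
    for i in range(named + ids):
        name, data_offset = struct.unpack_from(
            "<LL", section, directory_offset + 16 + 8 * i)
        entries.append((
            name,
            bool(data_offset & IS_DIRECTORY),
            data_offset & 0x7FFFFFFF,  # offset from the section start
        ))
    return entries
```

Calling it again with the returned offset of any entry flagged as a directory walks down to the next level.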

How Is This Information Useful?

Once you know how an executable is put together, you can use this information to peer into its workings. You can view the resources compiled into it, the DLLs it depends on, and the actual functions it imports from them. More importantly, you can attach a debugger to the executable and track down any of those really troublesome general protection faults.

About the Author

Duncan Jones

Freelance developer with 10 years' experience in Visual Basic and SQL, now moving on up to the next generation with .NET.


  • Very nice work!

    Posted by Tom Archer on 06/24/2004 11:35am

    Definitely worth a 5

  • Part #1 of 3 (or more)

    Posted by Clearcode on 06/23/2004 11:38am

    This is the first part of 3 - culminating (I hope) in a working application debugger written in VB6.0 I am having some difficulty in the last part - reading the symbols. If anyone has experience with the "dbgHelp.dll" functions in VB6.0, drop me a line...
