This article assumes that the reader has installed MASM32. If you have not, it is available from http://www.masm32.com/.
In the last article, you saw how to set up Visual Studio to compile an Assembler file with the Microsoft Assembler. In this article, I will begin to describe the language itself, and some of the instructions that it contains.
The first thing to understand is that the processor has no variables in the sense of higher-level languages. For instance, it doesn't know that you have an integer called 'nMyInteger', or a class called 'CMyClass'. All it knows about are its 'registers', and that it can access memory given an address.
So what are registers? And how do I access memory?
Put simply, a register is like a variable, but one reserved for the processor's use. When I said that there are no variables in the sense of higher-level languages, I meant it: there is only a fixed number of registers, and they exist physically on the processor chip. Think of it this way: a register is a hard-coded variable for the processor.
Each of these registers is as wide as the processor's 'bit' count. In other words, on a 32-bit processor each register holds 32 bits. In C++ (Windows) terms, each one is a DWORD.
A register itself has no notion of sign: it simply holds a 32-bit pattern, which is easiest to think of as an unsigned number. Negative numbers (if required) are represented in two's complement form, as 0x100000000 + (negative number). So, -1 is represented by 0xFFFFFFFF, -2 by 0xFFFFFFFE, and so forth.
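To make the arithmetic concrete, here is a small Python model (Python is used only for illustration; `to_register` and `as_signed` are hypothetical helper names, not part of MASM). It treats a register as a 32-bit bit pattern and applies the 0x100000000 + (negative number) rule described above.

```python
# Model of how a 32-bit register holds negative numbers (two's complement).
REG_BITS = 32
REG_RANGE = 1 << REG_BITS  # 0x100000000


def to_register(value: int) -> int:
    """Return the 32-bit pattern a register would hold for `value`."""
    if not -(REG_RANGE // 2) <= value < REG_RANGE:
        raise ValueError("value does not fit in 32 bits")
    # For a negative value this is exactly 0x100000000 + value.
    return value % REG_RANGE


def as_signed(pattern: int) -> int:
    """Interpret a 32-bit register pattern as a signed integer."""
    # If the top bit is set, the pattern stands for a negative number.
    return pattern - REG_RANGE if pattern & 0x80000000 else pattern


print(hex(to_register(-1)))   # 0xffffffff
print(hex(to_register(-2)))   # 0xfffffffe
print(as_signed(0xFFFFFFFE))  # -2
```

The range check accepts both signed inputs (down to -0x80000000) and unsigned ones (up to 0xFFFFFFFF), because the same register bits can be read either way; it is the instructions you use that decide the interpretation.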
There are quite a few registers in modern Intel processors, but there are only six you should be using in your applications:
eax - Accumulator register
ebx - Base register
ecx - Counter register
edx - Data register
esi - Source register (for memory operations)
edi - Destination register (for memory operations)
The registers eax, ebx, ecx, and edx can be split into their constituent bytes by changing the way that they are referred to. For instance, for the accumulator (in other words, the a register):
al  : First (lower) byte of the low word in the eax register
ah  : Second (higher) byte of the low word in the eax register
ax  : Lower word (i.e. 2 bytes) of the eax register (i.e. (ah << 8) + al)
eax : The whole register (4 bytes)
The same naming convention applies to ebx, ecx, and edx, as shown in the following image. It does not fully apply to esi and edi: they have 16-bit forms (si and di) but no byte-sized forms.
The names come from the processor's origins. The 'e' prefix stands for 'extended'. 16-bit processors only had ax, bx, cx, dx, and so forth; when 32-bit processors came along, these registers were widened to 32 bits, and the full 32-bit versions were given a preceding 'e'. The old 16-bit names still refer to the lower half of each register, but the top 16 bits have no name of their own, so there is no way of accessing them directly.
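As a rough illustration of this layout, the Python sketch below pulls the al, ah, ax, and eax views out of a 32-bit value with masks and shifts. Python is used only as a modelling language here, and `split_register` is a hypothetical helper, not anything provided by MASM.

```python
# Hypothetical helper: extract the named parts of a 32-bit register value,
# following the al/ah/ax/eax layout described above.
def split_register(eax: int) -> dict:
    """Return the al, ah, ax and eax views of a 32-bit register value."""
    eax &= 0xFFFFFFFF          # keep only 32 bits, as the register does
    al = eax & 0xFF            # bits 0-7
    ah = (eax >> 8) & 0xFF     # bits 8-15
    ax = eax & 0xFFFF          # bits 0-15, equal to (ah << 8) + al
    return {"al": al, "ah": ah, "ax": ax, "eax": eax}


parts = split_register(0x12345678)
for name in ("eax", "ax", "ah", "al"):
    print(f"{name:>3} = {parts[name]:#x}")
# eax = 0x12345678, ax = 0x5678, ah = 0x56, al = 0x78

# The top 16 bits (0x1234 here) have no register name of their own;
# in this model, getting at them takes an explicit shift.
print(f"high word = {parts['eax'] >> 16:#x}")  # high word = 0x1234
```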
The Mov (Move) Instruction
Let's start with the simplest instruction: mov (move). The mov instruction is how you 'move' values around inside the processor. For instance:
mov eax, 100
This 'moves' 100 into the eax register; it's the same as saying eax = 100. (Strictly speaking, mov copies the value: the source is left unchanged.) The general form of the instruction is:
mov (destination), (source)
The source and destination have to be the same size (in bits). Here are some examples of 'mov' instructions:
mov al, bl       ; move the lower byte of ebx into the lower byte of eax
mov al, 0ffh     ; move 0xFF into the lower byte of eax
mov ah, 0ffh     ; move 0xFF into the high byte of the low word
                 ; (2 bytes) of eax
mov ax, 0ffffh   ; move 0xFFFF into the low word of eax
mov eax, 0ffffh  ; move 0xFFFF into eax
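One detail the examples above imply is worth making explicit: a mov into al, ah, or ax replaces only those bits, and the rest of eax keeps its old value, whereas a mov into eax replaces all 32 bits. The Python sketch below models that behaviour. The function name `mov` and the `FIELDS` table are hypothetical, and the `ValueError` merely stands in for the assembler rejecting a value that does not fit the destination size.

```python
# Hypothetical model of "mov" into the named parts of eax.
FIELDS = {            # name -> (bit offset, width in bits)
    "al": (0, 8),
    "ah": (8, 8),
    "ax": (0, 16),
    "eax": (0, 32),
}


def mov(eax: int, dest: str, value: int) -> int:
    """Return the new eax after 'mov dest, value' for dest in al/ah/ax/eax."""
    shift, width = FIELDS[dest]
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value:#x} does not fit in {dest} ({width} bits)")
    mask = ((1 << width) - 1) << shift
    # Clear only the destination's bits, then drop the new value into them.
    return (eax & ~mask & 0xFFFFFFFF) | (value << shift)


eax = 0x12345678
eax = mov(eax, "al", 0xFF)     # 0x123456ff: only the low byte changed
eax = mov(eax, "ah", 0xFF)     # 0x1234ffff: only bits 8-15 changed
eax = mov(eax, "eax", 0xFFFF)  # 0x0000ffff: the whole register replaced
print(f"{eax:#010x}")          # 0x0000ffff
```

Note the last step: `mov eax, 0ffffh` leaves nothing of the old upper 16 bits, while `mov ax, 0ffffh` would have kept them.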
You can move the contents of memory into a register, and vice versa, by using square brackets to mean 'contents of'. The number of bytes moved is determined by the size of the register named:
mov al, [esi]    ; move the byte contained in the memory address
                 ; in register esi into the lower byte of eax
mov [edi], bl    ; move the byte value in the lowest byte of ebx
                 ; into the memory address in register edi
mov cx, [esi]    ; move the word (2-byte) value contained in the
                 ; memory address of register esi into the lower
                 ; word of ecx
mov [edi], edx   ; move the dword (4-byte) value contained in edx
                 ; into the memory address contained in register edi
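To see how the register name sets the transfer size, here is a Python model that treats memory as a flat `bytearray`. The helpers `load`, `store`, and the `SIZES` table are hypothetical names used only for illustration. The model also reflects the fact that x86 stores multi-byte values little-endian (lowest byte at the lowest address), which is why reading a byte or a word from the same address returns the low end of a dword written there.

```python
# Hypothetical model: memory as a flat bytearray, registers as Python ints.
# The operand size comes from the register named in the instruction.
SIZES = {
    "al": 1, "ah": 1, "bl": 1, "bh": 1, "cl": 1, "ch": 1, "dl": 1, "dh": 1,
    "ax": 2, "bx": 2, "cx": 2, "dx": 2,
    "eax": 4, "ebx": 4, "ecx": 4, "edx": 4,
}


def load(memory: bytearray, address: int, reg: str) -> int:
    """Model 'mov reg, [address]': return the value read into reg."""
    size = SIZES[reg]
    return int.from_bytes(memory[address:address + size], "little")


def store(memory: bytearray, address: int, reg: str, value: int) -> None:
    """Model 'mov [address], reg': write reg's bytes, lowest byte first."""
    size = SIZES[reg]
    memory[address:address + size] = value.to_bytes(size, "little")


memory = bytearray(16)
edi = 4
store(memory, edi, "edx", 0x11223344)  # mov [edi], edx
print(memory[4:8].hex())               # 44332211
esi = 4
print(hex(load(memory, esi, "al")))    # mov al, [esi]  -> 0x44
print(hex(load(memory, esi, "cx")))    # mov cx, [esi]  -> 0x3344
```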
You can also include an offset when using the 'contents of' (square brackets) operator:
mov al, [esi + 3]   ; move the byte contained in the memory address
                    ; in register esi + 3 into the lower byte of eax
mov [edi + 2], dx   ; move the lower word (2 bytes) contained in
                    ; edx into the memory address contained in the
                    ; register edi + 2
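The offset form can be modelled the same way: the processor adds the constant to the address held in the register and then reads or writes at that effective address. In the Python sketch below, `effective_address` is a hypothetical helper, memory is a small `bytearray`, and multi-byte writes are little-endian as on x86.

```python
# Hypothetical model of "[register + offset]" addressing.
def effective_address(base: int, offset: int) -> int:
    """Address used by [base + offset], wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


memory = bytearray(b"\x00\x11\x22\x33\x44\x55\x66\x77")
esi = 0
edi = 0

# mov al, [esi + 3]: read one byte at address esi + 3
al = memory[effective_address(esi, 3)]
print(hex(al))  # 0x33

# mov [edi + 2], dx: write the two bytes of dx at address edi + 2
dx = 0xBEEF
addr = effective_address(edi, 2)
memory[addr:addr + 2] = dx.to_bytes(2, "little")
print(memory.hex())  # 0011efbe44556677
```

The register itself is not changed by the offset: esi and edi still hold 0 afterwards, which is what makes this form handy for reaching fields at fixed distances from a base address.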