An Introduction to Assembly Language: Part II

Prerequisite

This article assumes that the reader has installed MASM32. If you have not, it is available from http://www.masm32.com/.

Introduction

In the last article, you saw how to set up Visual Studio to compile an Assembler file with the Microsoft Assembler. In this article, I will begin to describe the language itself, and some of the instructions that it contains.

Variables

There are no variables in Assembler—or at least not in the C++ sense. In Assembler you have registers and memory addresses. Bear in mind, you’re talking the same language as the processor now.

For instance, the processor doesn’t know that you have an integer called ‘nMyInteger’. It doesn’t know you have a class called “CMyClass”. All it knows about is the ‘registers’ and that it can access memory given an address.

So what are registers? And how do I access memory?

Registers

Put simply, a register is like a variable, but just for the processor’s use. When I said that there are no variables in the sense of higher-level languages, I meant what I said. There is only a fixed number of registers on the processor chip. Think of it this way: a register is a hard-coded variable for the processor; it exists physically on the chip.

These registers hold numbers the same size as the ‘bit’ count of the processor. In other words, in a 32-bit processor these numbers are 32 bits in size. In Windows C++ terms, they are DWORDs (32-bit unsigned integers).

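Strictly speaking, a register holds only a pattern of 32 bits; whether that pattern is a signed or an unsigned number depends on how your code interprets it. Negative numbers (if required) are stored in two’s complement form, which works out to 0x100000000 + (negative number). So, -1 would be represented by 0xFFFFFFFF, -2 by 0xFFFFFFFE, and so forth.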

There are quite a few registers in modern Intel processors, but there are only six you should be using in your applications:

eax - Accumulator Register
ebx - Base Register
ecx - Counter Register
edx - Data Register
esi - Source (for memory operations) register
edi - Destination (for memory operations) register

The registers eax, ebx, ecx, and edx can also be accessed in smaller pieces (a 16-bit word or a single byte) by referring to them by a different name. For instance, for the accumulator (in other words, the a register):

al    : First (lower) byte of the low word in the eax register
ah    : Second (higher) byte of the low word in the eax register
ax    : Lower word (2 bytes) of the eax register, i.e. (ah << 8) + al
eax   : The whole register (4 bytes)

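The same naming convention applies to ebx, ecx, and edx, as shown in the following image. The esi and edi registers are different: in 32-bit code you can refer to their lower words as si and di, but they have no byte-sized names.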

The names come from the history of the processor. The ‘e’ stands for ‘extended’: when 16-bit processors gave way to 32-bit processors, the registers were extended to 32 bits. The 16-bit processors only had ax, bx, cx, dx, and so forth, so the full 32-bit versions were named with a preceding ‘e’ (eax, ebx, and so on), while the old names continued to refer to the lower 16 bits. There is no way of accessing the top 16 bits of the registers directly.

The Mov (Move) Instruction

Let’s start with the simplest instruction: the mov (move) instruction. The mov instruction is how you ‘move’ values about inside the processor. For instance:

mov eax, 100

This ‘moves’ 100 into the eax register. It’s the same as saying eax=100. Despite its name, mov copies the value: the source is left unchanged. The general form of the instruction is:

mov (destination), (source)

For register and memory operands, the source and destination have to be the same size (in bits); an immediate (constant) value must fit in the destination. Here are some examples of ‘mov’ instructions:

mov al, bl         ; move the lower byte of ebx into the lower byte
                   ; of eax
mov al, 0ffh       ; move 0xFF into the lower byte of eax
mov ah, 0ffh       ; move 0xFF into the high byte of the low word
                   ; (2-bytes) of eax
mov ax, 0ffffh     ; move 0xFFFF into the low word of eax
mov eax, 0ffffh    ; move 0xFFFF into eax; the upper 16 bits
                   ; become zero

We can move the contents of memory into a register and vice versa by using square brackets to indicate ‘contents of’. The number of bytes moved is determined by the size of the register named:

mov al, [esi]     ; move the byte contained in the memory address
                  ; in register esi into the lower byte of eax
mov [edi], bl     ; move the byte value in the lowest byte of ebx
                  ; into the memory address in register edi
mov cx, [esi]     ; move the word (2-byte) value contained in the
                  ; memory address of register esi into the lower
                  ; word of ecx
mov [edi], edx    ; move the dword (4-byte) value contained in edx
                  ; into the memory address contained in register edi

You can also include an offset when using the ‘contents of’ (square brackets) operator:

mov al, [esi + 3]    ; move the byte contained in the memory address
                     ; in register esi + 3 into the lower byte of eax
mov [edi + 2], dx    ; move the lower word (2-bytes) contained in
                     ; edx into the memory address contained in the
                     ; register edi + 2
