MASM Assembly
The Microsoft Macro Assembler (MASM) provides several advantages over inline assembly. MASM contains macro features that include loops, arithmetic, and text string processing. MASM gives you greater control over your hardware such as CPU and memory.
MASM is generally used for programming firmware, developing operating systems, and programming at system level.
There are several assembly syntax types; the two most important are:
CPU registers are small, high-speed storage locations within the CPU used to hold data and addresses while instructions execute. Most arithmetic (addition, subtraction, multiplication) is carried out on values held in registers, and registers often hold pointers that refer to memory.
CPU registers can mainly be classified into 4 different categories.
General Purpose Registers
Segment Registers
Special purpose application-accessible registers
Special Purpose Kernel-Mode Registers
In this page we will only go over general purpose registers, since they're the ones most commonly used by programmers.
General purpose registers are used to store temporary data. Their contents can be accessed directly from assembly code. In generic terms they are numbered R0, R1, R2, ..., Rn-1.
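In Windows x86 the general purpose registers look like this:

| 32-bit | 16-bit | 8-bit |
| --- | --- | --- |
| EAX | AX | AH / AL |
| EBX | BX | BH / BL |
| ECX | CX | CH / CL |
| EDX | DX | DH / DL |
| ESI | SI | SIL |
| EDI | DI | DIL |
| EBP | BP | BPL |
| ESP | SP | SPL |
| R8D | R8W | R8B |
| R9D | R9W | R9B |
| R10D | R10W | R10B |
| R11D | R11W | R11B |
| R12D | R12W | R12B |
| R13D | R13W | R13B |
| R14D | R14W | R14B |
| R15D | R15W | R15B |

Note that SIL, DIL, BPL, SPL and the R8-R15 rows can only be used by 64-bit code. A 32-bit program has access to the first eight rows only, and in 32-bit mode ESI, EDI, EBP and ESP have no byte-sized alias. MASM names the byte-sized forms of R8-R15 R8B-R15B.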
The following diagram shows the first two registers. EAX & EBX.
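In Windows x64 the general purpose registers look like this:

| 64-bit | 32-bit | 16-bit | 8-bit |
| --- | --- | --- | --- |
| RAX | EAX | AX | AH / AL |
| RBX | EBX | BX | BH / BL |
| RCX | ECX | CX | CH / CL |
| RDX | EDX | DX | DH / DL |
| RSI | ESI | SI | SIL |
| RDI | EDI | DI | DIL |
| RBP | EBP | BP | BPL |
| RSP | ESP | SP | SPL |
| R8 | R8D | R8W | R8B |
| R9 | R9D | R9W | R9B |
| R10 | R10D | R10W | R10B |
| R11 | R11D | R11W | R11B |
| R12 | R12D | R12W | R12B |
| R13 | R13D | R13W | R13B |
| R14 | R14D | R14W | R14B |
| R15 | R15D | R15W | R15B |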
The following diagram shows the first two registers. RAX & RBX.
**Therefore, modifying the value of RAX will change the value of EAX, and therefore the value of AX.** The registers are hierarchical: each smaller register is an alias for the low-order bits of the larger one, so changing one affects the others.
For example, modifying the value of `BL` changes the value of `BX`, which in turn changes the value of `EBX` and of `RBX`. One asymmetry is worth remembering: writing to a 32-bit register such as `EBX` clears the upper 32 bits of the 64-bit register, whereas writing to an 8-bit or 16-bit register leaves the remaining bits untouched.
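The aliasing described above can be traced with a few writes of different widths. This is a minimal sketch assuming 64-bit MASM (ml64.exe); the procedure name `AliasDemo` is hypothetical, and RAX is used because it is a volatile register.

```asm
; Sketch only: AliasDemo is a hypothetical name, assembled with ml64.exe.
.code
AliasDemo proc
    mov rax, 1122334455667788h  ; RAX = 1122334455667788h
    mov al, 0FFh                ; writes bits 0-7 only:   RAX = 11223344556677FFh
    mov ax, 0                   ; writes bits 0-15 only:  RAX = 1122334455660000h
    mov eax, 1                  ; 32-bit write zero-extends: RAX = 0000000000000001h
    ret
AliasDemo endp
end
```

Stepping through this in a debugger and watching RAX after each instruction is an easy way to confirm the behavior.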
During a function or procedure call (assembly functions are called procedures), certain registers must hold the same value on return as they held on entry. These are called non-volatile (callee-saved) registers.
Common non-volatile registers:
x64 systems - `RSP`, `RSI`, `RDI`, `RBP`, `RBX`, `R12`-`R15`.
x86 systems - `EBX`, `EBP`, `ESI`, `EDI`. (The `R12D`-`R15D` registers do not exist in 32-bit code.)
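On the other hand, volatile (caller-saved) registers do not need to be preserved by the called procedure. A caller that still needs their values after the call must save them itself:
x64 systems - `RCX`, `RAX`, `RDX`, `R8`-`R11`.
x86 systems - `ECX`, `EAX`, `EDX`. (The `R8D`-`R11D` registers do not exist in 32-bit code.)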
IMPORTANT: Whenever a value of a non-volatile register is changed by the routine (procedure), the old value has to be saved on the stack prior to changing the register and that value has to be restored before returning.
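The save-and-restore rule can be sketched as follows, assuming 64-bit MASM and the Windows x64 convention in which the first argument arrives in RCX. The procedure name `TripleIt` is hypothetical, and a production procedure would normally also describe the push in its unwind information, which is omitted here.

```asm
; Sketch only: TripleIt is a hypothetical procedure that returns 3 * its argument.
.code
TripleIt proc
    push rbx                    ; save the caller's RBX (non-volatile)
    mov  rbx, rcx               ; RBX can now be used as scratch
    lea  rax, [rbx + rbx*2]     ; RAX = 3 * first argument
    pop  rbx                    ; restore the caller's RBX before returning
    ret
TripleIt endp
end
```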
The RIP/EIP register is a special-purpose register that holds the memory address of the next instruction to be executed. The processor advances RIP/EIP automatically as each instruction executes, and jumps, calls and returns load it with a new address.
The RSP/ESP register is called the stack pointer register. It holds the memory address of the top of the stack. (The stack is a memory region that's used to store temporary data and function call information; RSP/ESP keeps track of the stack's current location.)
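The general purpose registers rdi, rsi, rdx, rcx, r8, and r9 are typically used for parameter passing under the System V AMD64 calling convention (used on Linux and macOS); the Windows x64 convention uses rcx, rdx, r8, and r9 instead. These registers are known as argument registers: they hold the values that are passed to a function.
In the example above, the values 3 and 6 would be passed to the add function using registers. Under System V, rdi would hold 3 and rsi would hold 6; on Windows x64, rcx and rdx would hold them.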
The RFLAGS (Register Flags) is a special-purpose register that contains several status and control flags that are used by the processor to control program execution.
On 64-bit machines RFLAGS is 64 bits in size; on 32-bit machines the equivalent register, EFLAGS, is 32 bits. The register comprises several single-bit values, where each bit corresponds to a single flag. A flag is set to 1 when activated and 0 when deactivated.
IMPORTANT: The majority of RFLAGS flags are reserved for kernel-mode use, so only a handful are relevant to user-mode programs.
The relevant flags are explained below:
Carry Flag (CF) - This flag is set when an arithmetic operation generates a carry out of, or a borrow into, the most significant bit, which signals overflow for unsigned arithmetic. Shift and rotate instructions also use it to hold the last bit shifted out.
Parity Flag (PF) - This flag is set when the least significant byte of the result of an arithmetic operation has an even number of set bits.
Zero Flag (ZF) - This flag is set when the result of an arithmetic operation is zero.
Sign Flag (SF) - This flag is set when the result of an arithmetic operation is negative.
Overflow Flag (OF) - This flag is set when an arithmetic operation generates a signed overflow, meaning that the result is too large to be represented in the available number of bits.
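The flag descriptions above can be checked against two 8-bit additions. This is a small sketch assuming 64-bit MASM; the procedure name `FlagsDemo` is hypothetical.

```asm
; Sketch only: FlagsDemo is a hypothetical name.
.code
FlagsDemo proc
    mov al, 0FFh
    add al, 1       ; AL = 00h: CF = 1, ZF = 1, SF = 0, OF = 0, PF = 1
    mov al, 7Fh
    add al, 1       ; AL = 80h: CF = 0, ZF = 0, SF = 1, OF = 1, PF = 0
    ret
FlagsDemo endp
end
```

The first addition wraps around as an unsigned value (carry), while the second crosses from +127 to -128 as a signed value (overflow).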
Here is a typical MASM program. The semicolon `;` denotes a comment.
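As an illustration of the overall layout, a minimal 64-bit MASM source file might look like the sketch below. The names `counter` and `main` are assumptions, and how the entry point is selected at link time depends on your toolchain setup.

```asm
; Sketch only: a minimal ml64 source file with one variable and one procedure.
.data
counter dword 0         ; a 32-bit variable initialized to 0

.code
main proc
    mov eax, counter    ; load the variable into EAX
    add eax, 1          ; increment the copy
    mov counter, eax    ; store it back
    xor eax, eax        ; return value 0
    ret
main endp
end                     ; marks the end of the source file
```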
Variables must be declared within the .data section of the program.
VarName is the variable name you want.
Here is a list of possible directives:
`word` - Unsigned 16-bit value (word).
`sword` - Signed 16-bit integer value.
`dword` - Unsigned 32-bit value (double word).
`sdword` - Signed 32-bit integer value.
`qword` - Unsigned 64-bit value (quad word).
`sqword` - Signed 64-bit integer value.
`oword` - 128-bit value (octal word).
`tbyte` - Unsigned 80-bit value.
`real4` - 32-bit floating point value.
`real8` - 64-bit floating point value.
`real10` - 80-bit floating point value.
`byte` - Unsigned 8-bit value.
`sbyte` - Signed 8-bit integer value.
Declaring Value:
VarValue is our value:
Declaring value as Hexadecimal:
We can initialize a value with hexadecimal using the `h` suffix. A hexadecimal literal must begin with a digit, so a value such as FFh is written `0FFh`.
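The declaration forms discussed above can be sketched in a single `.data` section. All variable names here are hypothetical, and the file is assumed to be assembled with ml64.exe.

```asm
; Sketch only: every variable name below is hypothetical.
.data
counter   dword  100        ; 32-bit value, initialized to decimal 100
delta     sdword -8         ; signed 32-bit value
colorVal  dword  0FF00FFh   ; hexadecimal: h suffix, leading 0 because it starts with a letter
scratch   qword  ?          ; 64-bit value left uninitialized
ratio     real8  1.5        ; 64-bit floating point value
end
```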
Strings are declared using the `byte` directive.
The MASM assembler stores the above string as an array of bytes, one character code per byte. We can add a newline character by appending its numeric code: 10.
MASM reads numeric literals as decimal unless told otherwise, and 10 is the decimal code of the line feed character (0Ah in hexadecimal), so it is unnecessary to include the `h` suffix.
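A string declaration with a trailing newline might look like the following sketch. The name `greeting` is hypothetical, and the final 0 is an assumption that the string will be passed to code expecting a NUL-terminated C string.

```asm
; Sketch only: greeting is a hypothetical name.
.data
greeting byte "Hello, world", 10, 0   ; text, line feed (decimal 10 = 0Ah), NUL terminator
end
```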
The following section goes over common Assembly instructions. A full list can be found:
The `mov` instruction is the most frequently used instruction in assembly. As the name suggests, it is used to move (copy) data between registers and memory locations.
Both the destination and the source can be a general purpose register or a memory variable, with two restrictions:
At most one of the source and destination operands can be a memory variable.
Both the source and destination operands must be of the same size. Mixing different operand sizes within a single `mov` instruction will result in an assembly error.
Here is a list of all legal mov instructions
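The common operand combinations can be sketched as below, assuming 64-bit MASM. The variable `total` and the procedure `MovForms` are hypothetical, and the two commented-out lines show forms that violate the restrictions above.

```asm
; Sketch only: total and MovForms are hypothetical names.
.data
total qword 0

.code
MovForms proc
    mov rax, 42         ; register <- immediate
    mov rcx, rax        ; register <- register
    mov total, rax      ; memory   <- register
    mov rdx, total      ; register <- memory
    mov total, 7        ; memory   <- immediate (size taken from the declaration)
    ; mov total, total  ; not allowed: both operands are memory
    ; mov eax, rcx      ; not allowed: operand sizes differ
    ret
MovForms endp
end
```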
In assembly language, square brackets `[]` are used to indicate indirect memory access. The value inside the brackets is treated as an address, and the instruction reads or writes the memory at that address, similar to dereferencing a pointer in C.
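A short sketch of indirect access through a register, assuming 64-bit MASM; the array `numbers` and the procedure `IndirectDemo` are hypothetical names.

```asm
; Sketch only: numbers and IndirectDemo are hypothetical names.
.data
numbers dword 10, 20, 30

.code
IndirectDemo proc
    lea rcx, numbers                ; RCX = address of the first element
    mov eax, [rcx]                  ; EAX = 10 (value at that address)
    mov edx, [rcx + 4]              ; EDX = 20 (next dword, 4 bytes further on)
    mov dword ptr [rcx + 8], 99     ; third element becomes 99
    ret
IndirectDemo endp
end
```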
The `add` and `sub` instructions add and subtract two operands, storing the result in the first (destination) operand. They share the same syntax.
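A minimal sketch of both instructions, assuming 64-bit MASM; the procedure name `AddSubDemo` is hypothetical.

```asm
; Sketch only: AddSubDemo is a hypothetical name.
.code
AddSubDemo proc
    mov eax, 10
    add eax, 5      ; EAX = 10 + 5 = 15
    sub eax, 3      ; EAX = 15 - 3 = 12
    ret
AddSubDemo endp
end
```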
Procedure calls are made with the `call` instruction. The `ret` instruction is then used to return execution to the caller, much like `return` in C/C++.
The `ret` instruction does not require any operands. It does not itself return a value; its purpose is to end the current procedure. The address that `ret` returns to is taken from the top of the stack, where `call` pushed it.
Here is an example of the `ret` and `call` instructions.
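One way the pair might be used is sketched below, assuming 64-bit MASM and the Windows x64 convention, under which the caller reserves 32 bytes of shadow space and keeps the stack 16-byte aligned at the call. The names `GetAnswer` and `Caller` are hypothetical.

```asm
; Sketch only: GetAnswer and Caller are hypothetical names.
.code
GetAnswer proc
    mov eax, 42         ; value handed back to the caller in EAX
    ret                 ; pops the return address and jumps to it
GetAnswer endp

Caller proc
    sub rsp, 40         ; 32 bytes of shadow space + 8 to keep RSP 16-byte aligned
    call GetAnswer      ; pushes the return address, then jumps to GetAnswer
    add rsp, 40         ; EAX now holds 42; release the reserved space
    ret
Caller endp
end
```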
The Load Effective Address (`lea`) instruction computes the address of its memory operand and loads it into a register, without actually accessing the memory itself. It's essentially the `&` address-of operator in C/C++.
Where `reg64` (the destination operand) represents any 64-bit general-purpose register that will hold the address of the source memory location.
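The difference between taking an address and reading memory can be sketched like this, assuming 64-bit MASM; `value` and `LeaDemo` are hypothetical names.

```asm
; Sketch only: value and LeaDemo are hypothetical names.
.data
value qword 1234h

.code
LeaDemo proc
    lea rax, value          ; RAX = address of value (no memory read)
    mov rcx, [rax]          ; RCX = 1234h (this instruction does read memory)
    lea rdx, [rax + 8]      ; RDX = address 8 bytes past value, computed without a read
    ret
LeaDemo endp
end
```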
The logical instructions `and`, `or`, `xor`, and `not` are all used to perform logical operations on bits.
The `and` instruction performs a bitwise AND operation between two operands and stores the result in the destination operand.
The `or` instruction performs a bitwise OR operation between two operands and stores the result in the destination operand.
The `xor` instruction performs an exclusive OR operation between two operands and stores the result in the destination operand. One common use of the `xor` instruction is to clear a register, which is achieved by XORing the register with itself. The syntax of the `xor` instruction is as follows:
The `not` instruction takes a single operand and inverts every bit of it in place. The syntax of the `not` instruction is as follows:
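The four instructions can be compared side by side in a short sketch, assuming 64-bit MASM; the procedure name `LogicDemo` is hypothetical.

```asm
; Sketch only: LogicDemo is a hypothetical name.
.code
LogicDemo proc
    mov eax, 0F0h
    and eax, 3Ch    ; 11110000b AND 00111100b = 00110000b, EAX = 30h
    or  eax, 03h    ; 00110000b OR  00000011b = 00110011b, EAX = 33h
    xor eax, eax    ; any value XOR itself = 0, EAX = 0
    mov eax, 0Fh
    not eax         ; every bit inverted, EAX = 0FFFFFFF0h
    ret
LogicDemo endp
end
```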
The `jmp` instruction jumps to its destination operand, which can be a memory address, a register, or a label. It's used for unconditional branching.
NOTE: In assembly language, a label is a name given to a specific location in the program's code. It is usually defined by writing a colon (`:`) at the end of a name or identifier.
Example `jmp`:
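A minimal sketch of an unconditional jump over one instruction, assuming 64-bit MASM; the procedure name `JmpDemo` and the label `skip` are hypothetical.

```asm
; Sketch only: JmpDemo and skip are hypothetical names.
.code
JmpDemo proc
    mov eax, 1
    jmp skip        ; always taken
    mov eax, 2      ; never executed
skip:
    ret             ; EAX is still 1 here
JmpDemo endp
end
```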
`jz` and `jnz` are conditional jump instructions, which allow for conditional execution of code. These instructions work by checking a specific flag in the RFLAGS register.
`jz`, which stands for "jump if zero", jumps if the zero flag is set (1), while `jnz` ("jump if not zero") executes the jump if the zero flag is clear (0). There are many other conditional jump instructions:
| Instruction | Description |
| --- | --- |
| `jc` | Jump if Carry - Executes the branch if the Carry Flag is set (1). |
| `jnc` | Jump if Not Carry - Executes the branch if the Carry Flag is not set (0). |
| `jo` | Jump if Overflow - Executes the branch if the Overflow Flag is set (1). |
| `jno` | Jump if Not Overflow - Executes the branch if the Overflow Flag is not set (0). |
| `js` | Jump if Sign - Executes the branch if the Sign Flag is set (1). |
| `jns` | Jump if Not Sign - Executes the branch if the Sign Flag is not set (0). |
| `je` | Jump if Equal - Executes the branch if the Zero Flag is set (1). |
| `jne` | Jump if Not Equal - Executes the branch if the Zero Flag is not set (0). |
| `ja` | Jump if Above - After a `cmp`, executes the branch if the first operand is greater than the second, compared as unsigned values (CF = 0 and ZF = 0). |
| `jae` | Jump if Above or Equal - After a `cmp`, executes the branch if the first operand is greater than or equal to the second, compared as unsigned values (CF = 0). |
| `jb` | Jump if Below - After a `cmp`, executes the branch if the first operand is less than the second, compared as unsigned values (CF = 1). |
| `jbe` | Jump if Below or Equal - After a `cmp`, executes the branch if the first operand is less than or equal to the second, compared as unsigned values (CF = 1 or ZF = 1). |
The `cmp` ("compare") instruction is the one most often executed just before a conditional jump instruction.
The `cmp` instruction subtracts the second operand from the first operand and sets the condition code flags based on the result of the subtraction. NOTE: It does not store the difference back into the first (destination) operand.
The following examples demonstrate how `cmp` can set a flag's value based on the value of its operands.
If the first operand is greater than the second operand (compared as unsigned values), the Carry and Zero flags are both cleared.
If the second operand is greater than the first operand (compared as unsigned values), the Carry flag is set and the Zero flag is cleared. The Sign flag copies the most significant bit of the difference, so for small values it is set in this case, because the difference is negative.
If the two operands are equal, the Zero flag is set and the Carry and Sign flags are cleared.
The `cmp` instruction is usually used in conjunction with a conditional jump. Here's an example of disassembled C code: the following assembly code shows a `je` instruction directly below a `cmp` instruction.
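The `cmp` and `je` pairing might look like the sketch below. It is hand-written rather than real disassembler output, assumes 64-bit MASM with the first argument in ECX (Windows x64), and uses the hypothetical names `IsFive` and `is_five`.

```asm
; Sketch only. Roughly equivalent C: int IsFive(int x) { if (x == 5) return 1; return 0; }
.code
IsFive proc
    cmp ecx, 5      ; computes ECX - 5 and sets the flags; ECX itself is unchanged
    je  is_five     ; taken when ZF = 1, that is, when ECX == 5
    xor eax, eax    ; not equal: return 0
    ret
is_five:
    mov eax, 1      ; equal: return 1
    ret
IsFive endp
end
```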
The `push` and `pop` instructions are used to manipulate the stack.
`push` takes a value from a register and pushes it onto the top of the stack.
`pop` takes the value at the top of the stack and pops it off, storing it in the destination register or memory location.
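Because the stack is last-in, first-out, popping in the same order as pushing swaps two values. A small sketch, assuming 64-bit MASM; the procedure name `StackDemo` is hypothetical.

```asm
; Sketch only: StackDemo is a hypothetical name.
.code
StackDemo proc
    mov rax, 1
    mov rcx, 2
    push rax        ; RSP decreases by 8; top of stack = 1
    push rcx        ; RSP decreases by 8; top of stack = 2
    pop rax         ; RAX = 2 (the value pushed last comes off first)
    pop rcx         ; RCX = 1; RSP is back to its original value
    ret
StackDemo endp
end
```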
The `leave` instruction is used to tear down a stack frame before exiting a subroutine or function.
When executed, it first copies the value of the base pointer register (`RBP`) into the stack pointer register (`RSP`). It then pops the saved base pointer value from the stack into `RBP`, restoring it to its previous value.
Essentially, the `leave` instruction performs the same task as the following instructions:
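The equivalence can be sketched in the context of a complete frame, assuming 64-bit MASM. The procedure name `FrameDemo` is hypothetical, and the two marked instructions could be replaced by a single `leave`.

```asm
; Sketch only: FrameDemo is a hypothetical name.
.code
FrameDemo proc
    push rbp                    ; save the caller's base pointer
    mov  rbp, rsp               ; establish this procedure's frame
    sub  rsp, 32                ; reserve space for local variables
    mov  qword ptr [rbp - 8], 0 ; use one local variable
    mov  rsp, rbp               ; these two instructions together
    pop  rbp                    ; do the same work as: leave
    ret
FrameDemo endp
end
```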
In assembly, memory access specifiers are used to determine the size and the type of data being accessed in memory. These specifiers act like type-casting in a programming language.
The most commonly used Memory Access Specifiers are:
A quadword pointer is used to access a 64-bit data value stored in memory. It is specified using the `qword ptr` specifier. For instance, if you want to access a 64-bit integer value stored in a particular memory location, you can use the `qword ptr` specifier with the `mov` instruction. Here are two examples:
In the first example, the 64-bit integer value stored at the memory location pointed to by the `rbx` register is accessed using the `qword ptr` specifier with the `mov` instruction. In the second example, the `qword ptr` specifier is used with the `mov` instruction to access the 64-bit integer value stored at an offset of `32h` bytes from the `rsp` register.
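The two kinds of access described above might look like the following sketch, assuming 64-bit MASM. The destination registers, the variable `value` and the procedure `QwordDemo` are assumptions; the surrounding `push`, `sub` and `add` exist only so that RBX is preserved and `[rsp + 32h]` falls inside space the procedure owns.

```asm
; Sketch only: value and QwordDemo are hypothetical names.
.data
value qword 1122334455667788h

.code
QwordDemo proc
    push rbx                            ; RBX is non-volatile, so save it first
    sub  rsp, 64                        ; reserve stack space covering [rsp + 32h]
    lea  rbx, value                     ; RBX = address of value
    mov  rax, qword ptr [rbx]           ; 64-bit load from the address in RBX
    mov  qword ptr [rsp + 32h], rax     ; 64-bit store at offset 32h from RSP
    mov  rcx, qword ptr [rsp + 32h]     ; 64-bit load from the same stack slot
    add  rsp, 64
    pop  rbx
    ret
QwordDemo endp
end
```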
A doubleword pointer tells the assembler that the memory operand is 32 bits wide. It is used when manipulating data stored in memory, particularly 32-bit integer values. To access a 32-bit integer value stored at a specific memory location, the `dword ptr` specifier should be used in the instruction, as shown in the following examples:
A byte pointer is used to indicate the size of 8-bit data in memory. To access a single byte of data stored at a specific memory location, the `byte ptr` specifier is used.
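The specifier matters most when nothing else in the instruction implies a size, for example when storing an immediate through a register. A sketch assuming 64-bit MASM, with the hypothetical names `buffer` and `PtrDemo`:

```asm
; Sketch only: buffer and PtrDemo are hypothetical names.
.data
buffer byte 8 dup(0)

.code
PtrDemo proc
    lea   rcx, buffer
    mov   dword ptr [rcx], 11223344h    ; writes 4 bytes: 44h 33h 22h 11h (little-endian)
    mov   byte ptr [rcx + 4], 0FFh      ; writes exactly 1 byte
    mov   eax, dword ptr [rcx]          ; EAX = 11223344h
    movzx edx, byte ptr [rcx]           ; EDX = 44h (lowest byte, zero-extended)
    ret
PtrDemo endp
end
```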
Calling functions in assembly can happen a couple of ways:
1.) Calling the assembly function via the `call` instruction, with `ret` used to return to the caller.
We can import an assembly function into a C file. The function prototype is declared with the `extern` keyword. This informs the compiler that the function is defined in another file, such as an `.asm` file.
Example of calling assembly from C:
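The assembly side of such a pairing might look like this sketch, assuming 64-bit MASM and the Windows x64 convention. The name `AsmSquare` and the C prototype in the comment are hypothetical.

```asm
; Sketch only. Hypothetical C-side declaration: extern int AsmSquare(int x);
.code
AsmSquare proc          ; procedures are public by default, so the C linker can find it
    mov  eax, ecx       ; first argument arrives in ECX
    imul eax, ecx       ; EAX = x * x
    ret                 ; result is returned in EAX
AsmSquare endp
end
```

The C file would then call `AsmSquare(7)` like any other function once both object files are linked together.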
To call a C function from assembly, the assembly code must first declare the C function using the `externdef` directive. This tells the MASM assembler that the function is defined in another file.
Here the name that follows `externdef` is the name of the function and the `type` specifies the function type.
Example calling C code from assembly:
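A minimal sketch of the assembly side, assuming 64-bit MASM and the Windows x64 convention. `CFunc` stands for a hypothetical C function taking one `int`, and `CallC` is a hypothetical wrapper name.

```asm
; Sketch only. Hypothetical C function defined elsewhere: void CFunc(int value);
externdef CFunc:proc

.code
CallC proc
    sub  rsp, 40        ; 32 bytes of shadow space + 8 to keep RSP 16-byte aligned
    mov  ecx, 7         ; first argument goes in ECX
    call CFunc
    add  rsp, 40
    ret
CallC endp
end
```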
The first four parameters (if they exist) are passed through the registers `RCX`, `RDX`, `R8`, and `R9`.
NOTE: If a procedure requires more than four parameters, the remaining ones are passed on the stack. These parameters are known as stack parameters, and the stack must be 16-byte aligned at the point of the call.
Example passing parameters:
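From the caller's side, passing five integer parameters might look like the sketch below, assuming 64-bit MASM and the Windows x64 convention. `SumFive` is a hypothetical procedure defined elsewhere, and `CallSumFive` is a hypothetical name.

```asm
; Sketch only. SumFive is a hypothetical external procedure taking five int parameters.
externdef SumFive:proc

.code
CallSumFive proc
    sub  rsp, 40                    ; 32 bytes of shadow space + 8 for the fifth argument;
                                    ; this also leaves RSP 16-byte aligned at the call
    mov  ecx, 1                     ; 1st parameter
    mov  edx, 2                     ; 2nd parameter
    mov  r8d, 3                     ; 3rd parameter
    mov  r9d, 4                     ; 4th parameter
    mov  dword ptr [rsp + 32], 5    ; 5th parameter: first slot above the shadow space
    call SumFive
    add  rsp, 40
    ret
CallSumFive endp
end
```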
Calling `AsmFunc11Parms` from C is done below:
When an assembly procedure returns a value, the value is stored in the `RAX` register. Before executing the `ret` instruction, the procedure places the value in `RAX`, which allows the function to return it to the caller.
The following `AddtwoNumbers` procedure takes two parameters and returns their sum.
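One possible shape for such a procedure is sketched below, assuming 64-bit MASM, the Windows x64 convention, and two 64-bit integer parameters. It is an illustration, not the original listing.

```asm
; Sketch only: assumes two 64-bit integer parameters in RCX and RDX.
.code
AddtwoNumbers proc
    mov rax, rcx        ; RAX = first parameter
    add rax, rdx        ; RAX = first + second
    ret                 ; the caller reads the sum from RAX
AddtwoNumbers endp
end
```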
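IMPORTANT: The first stack parameter (5th procedure parameter) is located at an offset from the `rsp` register that depends on how the function has set up its stack. On entry to a 64-bit MASM function, the fifth parameter is usually located at `[rsp + 40]`: 8 bytes for the return address plus 32 bytes of shadow space reserved by the caller for the four register parameters.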
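From the callee's side, reading the fifth parameter might look like this sketch, assuming 64-bit MASM, the Windows x64 convention, five `int` parameters, and a procedure that has not yet changed RSP. The name `SumFive` is hypothetical.

```asm
; Sketch only: SumFive is a hypothetical name; it does not touch RSP before the read.
.code
SumFive proc
    mov eax, ecx                        ; 1st parameter
    add eax, edx                        ; + 2nd parameter
    add eax, r8d                        ; + 3rd parameter
    add eax, r9d                        ; + 4th parameter
    add eax, dword ptr [rsp + 40]       ; + 5th parameter: 8 (return address) + 32 (shadow space)
    ret                                 ; sum returned in EAX
SumFive endp
end
```

If the procedure pushed registers or subtracted from RSP first, the offset would grow by the same number of bytes.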