May 2018

A gentle introduction to ARM assembly


Computer architecture 101

There are two main types of microprocessor architecture: Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC). An instruction set is the collection of instructions a processor understands. Every program is ultimately a sequence of these instructions, telling the computer how to manipulate and process data. x86, found in typical desktop and laptop computers, is the classic CISC architecture with a large and complex set of instructions. RISC designs such as ARM are widely used in portable devices and embedded systems because of their power efficiency and their simpler instructions, which are easier to execute quickly.

ARM is a family of RISC processor architectures, and ARM assembly is the low-level language used to program them. ARM processors commonly ship as part of a System on Chip (SoC), an integrated circuit that contains all the components of a computer system on a single chip. Examples of ARM-based systems include mobile devices, routers and a wide assortment of Internet of Things (IoT) devices.

The smallest unit of data in a computer is a binary digit, or bit. A bit can have a value of 0 or 1, because the microprocessor works with binary numbers: each 0 or 1 corresponds to an electrical state of Off or On, for example 0V or 5V in classic digital logic (modern chips use lower voltages). Each 0 and 1 represents one bit of information to the microprocessor. Four bits make 1 nibble, and 2 nibbles, or 8 bits, make 1 byte. To a computer, all data is ultimately a combination of binary digits.

Users interact with computers by providing alphanumeric or special-character input, which the computer does not understand directly; it has to convert that input into bits. A detailed overview of number systems is out of the scope of this blog post, but to grasp how computers represent numbers, letters and special characters, it is important to understand how to convert values between the binary, hexadecimal and decimal number systems. Knowing how to perform arithmetic on them, such as addition, subtraction, multiplication and division, is also a must. Video tutorials on YouTube are a good way to gain some insight into these topics.

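When it comes to representing integers, the following table is a really good reference for converting between the binary, hexadecimal and decimal number systems for the first 16 values (0 to 15):

 Decimal   Hex   Binary
 0         0x0   0000
 1         0x1   0001
 2         0x2   0010
 3         0x3   0011
 4         0x4   0100
 5         0x5   0101
 6         0x6   0110
 7         0x7   0111
 8         0x8   1000
 9         0x9   1001
 10        0xA   1010
 11        0xB   1011
 12        0xC   1100
 13        0xD   1101
 14        0xE   1110
 15        0xF   1111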

Word length 

ARM devices ship with either a 32-bit or a 64-bit ARM CPU. A 32-bit CPU works on 4 bytes of data at a time, and in this post those 4 bytes make up a word. A 64-bit CPU, on the other hand, works on 8 bytes at a time, so here the 8 bytes make up a word. Since 8 bits make 1 byte, it's easy to see how these add up to 32 bits (4 x 8) and 64 bits (8 x 8) respectively. Note that ARM's own documentation always calls a 32-bit value a word and a 64-bit value a doubleword, even on 64-bit CPUs.

32 bit CPU:
 0x01234567 ==> 0x67 = 1 byte (8 bits)
 0x01234567 ==> 0x01234567 = 4 bytes (32 bits)

64 bit CPU:
 0x0123456789ABCDEF ==> 0xEF = 1 byte (8 bits)
 0x0123456789ABCDEF ==>  0x0123456789ABCDEF  = 8 bytes (64 bits)

It's worth mentioning that, when a value is written out, the most significant bits are on the left and the least significant bits are on the right. On a 32-bit ARM CPU the most significant bit of a word is bit 31 (bits are numbered 0 to 31), so when an unsigned addition overflows past bit 31, a carry is generated. On a 64-bit CPU the most significant bit is bit 63.

32 bit CPU:
 0x01234567 ==> 0x67 = least significant byte
 0x01234567 ==> 0x01 = most significant byte

64 bit CPU:
 0x0123456789ABCDEF ==> 0xEF = least significant byte
 0x0123456789ABCDEF ==> 0x01 = most significant byte

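ARM memory is byte-addressable, but the processor works most naturally with word-sized blocks, i.e. every 4 bytes on a 32-bit CPU. The address at the start of a word is known as a word boundary, and on a 32-bit CPU it is always divisible by 4 (on a 64-bit CPU, by 8 for the 8-byte words used here). Here is a quick illustration: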

32 bit CPU:
0x00000000 ==> word boundary 
0x00000004 ==> word 1
0x00000008 ==> word 2
0x0000000C ==> word 3
0x00000010 ==> word 4

64 bit CPU:
0x0000000000000000 ==> word boundary 
0x0000000000000008 ==> word 1
0x0000000000000010 ==> word 2
0x0000000000000018 ==> word 3
0x0000000000000020 ==> word 4

Register width and the available instructions vary from architecture to architecture. Some examples of ARM architecture versions are ARMv4, ARMv5, ARMv6, ARMv7-A, ARMv7-R and ARMv7-M; 64-bit support arrived with ARMv8-A. So it is always a good idea to keep in mind which architecture you are dealing with, because instructions can vary in length. Understanding how the processor fetches and executes instructions is a very important concept in reverse engineering.

ARM CPU instruction sets

32-bit ARM processors execute code in two main states: ARM and Thumb. ARM instructions are 32 bits long, while the original Thumb instructions are 16 bits long. Thumb is a subset of ARM, so not all ARM instructions are available in Thumb. Thumb-2 is an extension of Thumb that adds more instructions, including 32-bit ones, so Thumb-2 code is in essence a mix of 16-bit and 32-bit instructions.

Many modern processors can execute 16-bit Thumb, 32-bit Thumb-2 and 32-bit ARM instructions, although some cores (such as the Cortex-M series) support only Thumb. Processors can switch between ARM and Thumb states to optimize execution, which matters especially for embedded devices where Thumb's smaller code size saves memory. A section of code can be compiled as either ARM or Thumb, and when compiling C or C++ you can set compiler options to choose between them, for example GCC's -marm and -mthumb.

ARM Registers

A register is a small, very fast storage location inside the processor. A 32-bit ARM processor provides 13 general-purpose registers, R0 to R12, each of which stores 32 bits of data. There are also three special-purpose registers, R13 to R15, plus the CPSR register. It's worth mentioning that, according to the ARM procedure call standard (AAPCS), the first four arguments of a function are passed in registers R0 to R3.

ARM is a load/store architecture: instructions that do arithmetic or logic operate only on registers, and memory is accessed only through dedicated load and store instructions. A value must therefore be loaded from memory into a register before it can be worked on, and stored back to memory afterwards.

LDR instructions load a register with a value from memory, while STR instructions store a register's value into memory.

ldr r3, [r5]      @ load r3 with the word at the memory address held in r5
add r3, r3, #1    @ increment r3 (arithmetic works on registers only)
str r3, [r5]      @ store the value of r3 back to the memory address held in r5

Azeria made some really awesome register images, shown below, especially the ARM vs x86 register comparison. You should definitely check out her site, Azeria Labs, for some great tutorials and challenges, and to download her pre-configured ARM lab.





The Frame Pointer (R11) tracks the current stack frame. It acts as a fixed base pointer from which the local variables stored in the frame are accessed at known offsets.

The Intra-Procedure-call scratch register (R12, IP) may be used by a linker as a scratch register between a routine and any subroutine it calls. It can also be used within a routine to hold temporary values between subroutine calls.

The Stack Pointer (R13) holds the address of the top of the stack in memory. On ARM the stack grows downwards: each 32-bit register pushed decreases the stack pointer by 4 bytes. The stack follows the Last In First Out (LIFO) principle. When we push a value, the stack pointer is decremented and the value is stored at the address it now points to; when we pop a value, it is copied from the top of the stack into a register of your choice and the stack pointer is incremented. For example, executing pop {r7} takes the word at the top of the stack and stores it in the r7 register.


The Link Register (R14) holds the return address for subroutine/function calls. Instructions such as Branch with Link (BL) copy the address of the next instruction, the one right after the BL, into the link register before performing the branch. (A branch moves execution to a specific location in the program.) When the subroutine finishes, it returns by copying the link register back into the program counter, for example with bx lr.

This is how subroutines/functions are called, and how they return and resume execution at the instruction right after the call. Because the return address is kept in a register instead of being written to the stack (as the x86 CALL instruction does), calling a function that makes no further calls saves a memory access, which adds up over repeated calls. A function that calls other functions must save the link register first, usually by pushing it onto the stack.

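The Program Counter (R15) holds the address of the instruction being executed. Because of the processor pipeline, reading the PC in ARM state returns the address of the current instruction plus 8 (plus 4 in Thumb state). The PC changes every time control is transferred; for example, when a function such as printf in a dynamically linked C library is called, execution jumps through a linker-generated stub into the library's code. In x86, the program counter (EIP/RIP) stores the address of the next instruction to be executed.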

When a branch instruction is executed, the PC is loaded with the address of the branch destination, which becomes the next instruction to execute. This register should be treated with care, as manipulating it incorrectly can crash the entire program. When hacking or reverse engineering, inspecting the program counter is essential, as it helps you understand how functions are executed and how the program flows. Being able to control it can help you make a program run your own code. Hint! Hint! :D

The Current Program Status Register (CPSR) stores information about the program's state and the results of operations. This 32-bit register holds the four condition flags in its most significant bits: N (negative, bit 31), Z (zero, bit 30), C (carry, bit 29) and V (overflow, bit 28). The remaining bits hold other processor state, including interrupt mask bits, the current mode (e.g. USR or SVC) and the execution state (ARM or Thumb), as illustrated in the image below.


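The condition flags are updated when a flag-setting instruction completes, not after every instruction. Each flag bit is set to 1 if its condition occurred (N: the result is negative; Z: the result is zero; C: the operation produced a carry, or no borrow for a subtraction; V: the result overflowed as a signed number) and cleared to 0 otherwise.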


Some of the instructions that directly affect the CPSR are Compare (CMP) and Compare Negative (CMN). CMP and CMN compare register values and update the condition flags based on the result, but do not place the result in any register. The examples below illustrate this:
CMP r1, r0 ==> computes r1 - r0; if the result is zero (r1 equals r0), bit 30 (the Z flag) is set to 1
CMN r1, r0 ==> computes r1 + r0; if the result is negative, bit 31 (the N flag) is set to 1

It is worth mentioning that plain add (ADD) and subtract (SUB) instructions do not update the CPSR, whereas ADDS and SUBS update the respective condition flags. Conditional branches then read those flags: BEQ, which stands for Branch if Equal, branches to its target label only if the zero flag is set, e.g. after a CMP of two equal values.

Disclaimer: Most of the images used in this blog are the property of Azeria and Azeria Labs. I am not affiliated with Azeria Labs in any way; I have used the images here as a point of reference. However, I do use her blog content to learn ARM assembly.

References: