Working with ARM Assembly

Don’t ask me why I started looking at writing basic ARM assembly routines. Perhaps it’s for the thrill of it, or taking a walk down memory lane. My first assembly language program was for an IBM System/360 using WYLBUR in college.

This post is not a tutorial on assembly language itself, or the ARM processor for that matter. If the phrases mnemonic, register, or branch on not equal are foreign to you, have a look here. I just wanted to write some easy routines and pick up some basics.

Editor’s note: All of the code below is available on GitHub.

We’ll be using a Raspberry Pi 4. You will (obviously) need the GCC toolchain installed, which can be accomplished with sudo apt-get install build-essential.

Let’s save the following in a file named helloworld.s:


Assemble and link the application together with gcc helloworld.s -o helloworld and run it.

It doesn’t get much more straightforward than this, and you’ve learned three new ARM assembly instructions: ldr, mov, and bl. The remaining text are directives to the GNU assembler which we’ll cover in a minute.

The ldr instruction loads some value from memory into a register. This is key with ARM: load instructions load from memory. In the example above we’re loading the address of the beginning of the string into register r0. A technical note: ldr is actually a psuedoinstruction, but let’s gloss over that.

bl branches to the label indicated (and updates the link register), and in our case, there is this magical printf we’re branching to. More on that later.

Finally, mov r0,#0 is positioning our program’s return code (zero) into r0. Check it:

What if we change the mov r0,#0 to mov r0,#0xff? Try it:

Okay, now for something interesting. Let’s count down from 10 to 1 and then print Hello, world!.


Okay, that escalated quickly! One of the reasons assembly language is so much fun. Let’s take a look at what is going on here and add some comments to our code.

There’s a few things to note here. First, let’s talk about the use of r5 and why that register was deliberately chosen. It turns out that when calling routines in assembly you better not use registers that will get trashed by whatever subroutine your calling (r0-r3). printf can use these registers, so we’ll use r5 in our routine.

Now, I will confess, I am not an assembly language expert much less an ARM assembly language expert. Someone may look at the above code and ask why I didn’t use the subtract-and-compare-to-zero instruction (if there is one) or some other technique. If there is a better way to write the above, please let me know!

Counting Up

In the above example we counted down, now let’s count up, and instead of counting from zero to some max, let’s count up from some minimum value to some maximum value. In other words, we’ll step through a sequence of values using an increment of one.


There’s some new syntax here, in particular the ldr rx,[rx]. This syntax is “load the value that is pointed to by the address in the register.” It makes sense in that there is an instruction immediately before it ldr,=min which is load the address identified by the label min. To be clear, the actual value of that label is going to be dependent on the assembler, your application size, and where it gets loaded into memory. Let’s look at an example of that:


Compile and execute this code to see something like Address of x is 0x21028. Then move x to after fmtstr and you will see the address change. What it will change to, again, is highly dependent on a number of factors. Suffice it to say, using ldr with memory addresses loads the address into a register, not the value at the address. That is what we use ldr rx,[rx] for.

Running our countup code indeed counts up from 14 to 28 and if we look at the return code (echo $?) we get 29, the last value that was in r5.

Writing a Procedure

Here is a basic ARM assembly procedure that computes and returns fib(n), the nth element of the Fibonacci sequence. We chose this specifically to demonstrate the use of the stack with push and pop.


What should be noticed here is the use of push and the list of register values we’re going to save onto the stack. In ARM assembly the Procedure Call Standard convention is to save registers r4-r8 if you’re going to work with them in your subroutine. In the above example we use r4 and r5 to compute fib(n) so we first push r4 and r5 along with the link register. Before returning we pop the previous values off the stack back into the registers.

To use this routine in C we can write:


Then, compile, assemble, and link with gcc fibmain.c fib.s -o fibmain. Recall the procedure call standard convention that the arguments to the procedure will be in r0-r3, hence why our first instruction mov r4,r0 to capture what n we’re calculating the Fibonacci number of.

Taking the Average

Okay, one last routine. We want to take the average of an array of integers. In C that would look something like this:

Here’s a go at it in ARM assembly.


There are some new instructions, an interesting form of the ldr instruction, and a new type of register.

First, the vmov instruction and register s1. vmov moves values into registers of the Vector Floating-Point Coprocessor, assuming your processor has one (if it is a Pi it will). s1 is one of the single-precision floating-point registers. Note that this is a 32-bit wide register that can store a C float.

Next up is the ldr r5,[r0],#4 instruction. Recall that ldr ra, loads the value stored at the address in rb into ra. The #4 at the end instructs the processor to then increment the value in rb by 4. In effect we are walking the array of integers whose starting address is in r0.

Finally, once we add all of the values in the array we have the sum in the register r4. To divide that sum by the length of the array (which was saved off in the floating-point register s1) we load r4 into s0 and perform one last thing: vcvt. vcvt converts between integers and floating-point numbers (which are, after all, an encoding). So s0 gets converted to a floating-point value, as does s1, and then we perform our division with vdiv.

As with r0 being the standard for returning an int from a procedure call, s0 will hold our float value.

We can use this function in our main routine.


Compile with gcc averagemain.c average.s -o averagemain and run.

Closing Thoughts

This post has been a lot of fun to write because assembly is actually fun to write and serves as a reminder that even the highest-level languages get compiled down to instructions that the underlying CPU can execute. One instruction set we didn’t touch on is the store instructions. These are used to save the contents (store) of registers to memory. Perhaps next time.

Once again, all of the code in this post can be found in the armassembly repository on GitHub.