Software Development Tips and Tricks


Working with ARM Assembly

Don’t ask me why I started looking at writing basic ARM assembly routines. Perhaps it’s for the thrill of it, or taking a walk down memory lane. My first assembly language program was for an IBM System/360 using WYLBUR in college.

This post is not a tutorial on assembly language itself, or the ARM processor for that matter. If the phrases mnemonic, register, or branch on not equal are foreign to you, have a look here. I just wanted to write some easy routines and pick up some basics.

Editor’s note: All of the code below is available on GitHub.

We’ll be using a Raspberry Pi 4. You will (obviously) need the GCC toolchain installed, which can be accomplished with sudo apt-get install build-essential.

Let’s save the following in a file named helloworld.s:


Assemble and link the application together with gcc helloworld.s -o helloworld and run it.

It doesn’t get much more straightforward than this, and you’ve learned three new ARM assembly instructions: ldr, mov, and bl. The remaining text are directives to the GNU assembler which we’ll cover in a minute.

The ldr instruction loads some value from memory into a register. This is key with ARM: load instructions load from memory. In the example above we’re loading the address of the beginning of the string into register r0. A technical note: ldr is actually a psuedoinstruction, but let’s gloss over that.

bl branches to the label indicated (and updates the link register), and in our case, there is this magical printf we’re branching to. More on that later.

Finally, mov r0,#0 is positioning our program’s return code (zero) into r0. Check it:

What if we change the mov r0,#0 to mov r0,#0xff? Try it:

Okay, now for something interesting. Let’s count down from 10 to 1 and then print Hello, world!.


Okay, that escalated quickly! One of the reasons assembly language is so much fun. Let’s take a look at what is going on here and add some comments to our code.

There’s a few things to note here. First, let’s talk about the use of r5 and why that register was deliberately chosen. It turns out that when calling routines in assembly you better not use registers that will get trashed by whatever subroutine your calling (r0-r3). printf can use these registers, so we’ll use r5 in our routine.

Now, I will confess, I am not an assembly language expert much less an ARM assembly language expert. Someone may look at the above code and ask why I didn’t use the subtract-and-compare-to-zero instruction (if there is one) or some other technique. If there is a better way to write the above, please let me know!

Counting Up

In the above example we counted down, now let’s count up, and instead of counting from zero to some max, let’s count up from some minimum value to some maximum value. In other words, we’ll step through a sequence of values using an increment of one.


There’s some new syntax here, in particular the ldr rx,[rx]. This syntax is “load the value that is pointed to by the address in the register.” It makes sense in that there is an instruction immediately before it ldr,=min which is load the address identified by the label min. To be clear, the actual value of that label is going to be dependent on the assembler, your application size, and where it gets loaded into memory. Let’s look at an example of that:


Compile and execute this code to see something like Address of x is 0x21028. Then move x to after fmtstr and you will see the address change. What it will change to, again, is highly dependent on a number of factors. Suffice it to say, using ldr with memory addresses loads the address into a register, not the value at the address. That is what we use ldr rx,[rx] for.

Running our countup code indeed counts up from 14 to 28 and if we look at the return code (echo $?) we get 29, the last value that was in r5.

Writing a Procedure

Here is a basic ARM assembly procedure that computes and returns fib(n), the nth element of the Fibonacci sequence. We chose this specifically to demonstrate the use of the stack with push and pop.


What should be noticed here is the use of push and the list of register values we’re going to save onto the stack. In ARM assembly the Procedure Call Standard convention is to save registers r4-r8 if you’re going to work with them in your subroutine. In the above example we use r4 and r5 to compute fib(n) so we first push r4 and r5 along with the link register. Before returning we pop the previous values off the stack back into the registers.

To use this routine in C we can write:


Then, compile, assemble, and link with gcc fibmain.c fib.s -o fibmain. Recall the procedure call standard convention that the arguments to the procedure will be in r0-r3, hence why our first instruction mov r4,r0 to capture what n we’re calculating the Fibonacci number of.

Taking the Average

Okay, one last routine. We want to take the average of an array of integers. In C that would look something like this:

Here’s a go at it in ARM assembly.


There are some new instructions, an interesting form of the ldr instruction, and a new type of register.

First, the vmov instruction and register s1. vmov moves values into registers of the Vector Floating-Point Coprocessor, assuming your processor has one (if it is a Pi it will). s1 is one of the single-precision floating-point registers. Note that this is a 32-bit wide register that can store a C float.

Next up is the ldr r5,[r0],#4 instruction. Recall that ldr ra, loads the value stored at the address in rb into ra. The #4 at the end instructs the processor to then increment the value in rb by 4. In effect we are walking the array of integers whose starting address is in r0.

Finally, once we add all of the values in the array we have the sum in the register r4. To divide that sum by the length of the array (which was saved off in the floating-point register s1) we load r4 into s0 and perform one last thing: vcvt. vcvt converts between integers and floating-point numbers (which are, after all, an encoding). So s0 gets converted to a floating-point value, as does s1, and then we perform our division with vdiv.

As with r0 being the standard for returning an int from a procedure call, s0 will hold our float value.

We can use this function in our main routine.


Compile with gcc averagemain.c average.s -o averagemain and run.

Closing Thoughts

This post has been a lot of fun to write because assembly is actually fun to write and serves as a reminder that even the highest-level languages get compiled down to instructions that the underlying CPU can execute. One instruction set we didn’t touch on is the store instructions. These are used to save the contents (store) of registers to memory. Perhaps next time.

Once again, all of the code in this post can be found in the armassembly repository on GitHub.


An ARM Build Farm with Jenkins

There was a time when setting up a Continuous Integration server took a lot of work. I personally have spent several days wrestling with getting CruiseControl installed, configured, and working just right. Today it is much more straightforward and, for the most part, a simple apt-get install jenkins is all it takes to get a functional Jenkins CI server up and running.

In this tutorial we’re going to look at using Jenkins to set up a build farm of ARM systems. My personal interest in doing so is to support the Swift on ARM group of folks working to get Swift 3.0 support for ARMv7 devices such as the Raspberry Pi 2, BeagleBone Black, etc. While support for cross-compiling Swift is maturing there is still a desire to natively compile on an ARM system.

The Jenkins Build Server

To make things simple we’re going to focus on installing Jenkins on an Ubuntu 14.04 server. Using the instructions on the Jenkins Wiki for an Ubuntu 14.04 system:

The Jenkins daemon will start automatically upon installation. Once it does, open a browser and go to http://YOUR_HOST:8080/, where YOUR_HOST is the server you just installed Jenkins on.



The initial password can be found on the Jenkins server with sudo cat /var/lib/jenkins/secrets/initialAdminPassword, and will be something like “54d5f68be4554b5c8316689728721b37”. Paste it in and click Continue.

You’ll be prompted to choose whether or not to install suggested plugins or specifically select plugins. Go with Install suggested plugins for now. The next screen will be all of the plugins being loaded in. Once complete you’ll move on to creating the first admin user:

Create an Admin User

Create an Admin User

Once entering the admin user information, you’ll see Jenkins is ready!



Click that Start using Jenkins button and (wait for it), start using Jenkins.

Build Agents

We’re interested in using Jenkins to perform native builds on ARM systems. To that end we’ll add slave nodes to our build server.

First, let’s make sure we have an ARM system to compile on. I’ve become a big fan of using Scaleway for spinning up ARM systems. If you have your own ARM device such as BeagleBoard or Raspberry Pi, those will work as well, just be aware that compiling on single-core ARM devices with limited RAM can be, well, painful. With Scaleway you can spin up quad-core ARM systems with 2G of RAM for about $3.50 a month.

I’m going to assume you have an ARM system (either from Scaleway or a physical device that your Jenkins host can communicate with). Let’s look at what it takes to get things configured. There’s a bit of a dance to do to ensure that the master and slave can communicate with each other. In short we need to:

  • Install Java on the ARM system so the slave agent can run
  • Create a build user on the ARM system
  • Create a public/private key pair on the build server
  • Add the build server’s public key to the ARM build user’s authorized_keys file

1. Install Java and create a build user

Your ARM device is going to need Java to run the Jenkins slave agent, so on the ARM system run sudo apt-get install openjdk-7-jre-headless.

Let’s also set up the build user while we’re here with sudo adduser build. Follow the prompts to create the user appropriately.

2. Create a public/private key pair on the build server

On the build server, su to the jenkins user and cd to its home directory. Run ssh-keygen to create the key pair.

3. Add the public key to the authorized_keys file

Now, add the ~/.ssh/id_rsa.pub contents of the Jenkins server jenkins user to the /home/build/.ssh/authorized_keys file of the ARM system build user.

Adding the Build Slave Node

Now, let’s add the ARM device as a build slave. Under Manage Jenkins go to the Manage Nodes link.

Manage Nodes

Manage Nodes

Click on New Node and we’ll give it a name like arm1. Select the Permanent Agent radio button and then click OK.

You’ll move on to a screen that requires some more information (typical!) We’re interested in:

  • Remote root directory
  • Labels
  • Launch method

For Remote root directory we’ll use the home directory of our new build user, so /home/build. For the label, type in arm. It will become apparent as to why in a bit.

For Launch method choose Launch slave agents on Unix machines via SSH. In the host field enter your ARM system’s hostname or IP address that is accessible from your Jenkins build server.

Launch method

Launch method

Now, click on the Add button (the one with the key) to go to the Add Credentials configuration page. For the Kind choose SSH Username with private key. For the Username we’ll use build. For the Private Key select From the Jenkins master ~/.ssh/. Recall earlier when we had you create a public/private key pair for the jenkins user on the master? This is why. When trying to contact the slave agent the Jenkins system will load the private key from the ~/.ssh/ directory of the jenkins user. The slave agent will be ready with the public key in the authorized_keys file.

Adding Credentials

Adding Credentials

Click Add to add the credentials, and then Save to save the new node configuration.

You should be taken back to the Node Management page where you’ll see your new node (probably in an offline state):

arm1 Offline

arm1 Offline

You can either click Refresh status or go to the node’s logs by clicking on the nodename arm1 and then selecting Log. If everything was configured properly you’ll see:

[04/27/16 17:01:42] [SSH] Starting slave process: cd "/home/build" && java  -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>channel started
Slave.jar version: 2.56
This is a Unix agent
Evacuated stdout
Agent successfully connected and online

Now let’s create a new job and use our ARM agent to build opencv.

Building OpenCV

We know we’re going to be using git to check out the OpenCV source tree from Github, and there are a number of other dependencies required for building, so we need to install all of them on our ARM system:

sudo apt-get install build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev

OpenCV is traditionally built with the following commands:

These will be our basic build steps, with the exception we’ll use make -j4 to take advantage of our 4 core ARM build agent.

Creating the Build Project

To create a new build project, click on Create New Item in the main Jenkins menu, and choose Freestyle project. We’ll name our project OpenCV 3.0. Click OK at the bottom of the options.

Creating a new Freestyle Project

Creating a new Freestyle Project

You’ll be on the General tab to get started. For now we’re interested in the general project options, Source Code Management section, and Build section.

Since we want to build the OpenCV project on the ARM system we are going to restrict the project to only build on those nodes matching our arm label.

Restrict Project

Restrict Project

For OpenCV 3.0 we’ll choose Git as the SCM type, and then enter the URL to the OpenCV source, https://github.com/itseez/opencv/. Jenkins allows you to specify the branch to build on, and we’ll leave the default master.

GIT for OpenCV

GIT for OpenCV

For the actual build we will enter a single Execute shell build step. It should be noted that this is an example only. There are a number of ways to configure and script out Jenkins project; each project will have different build steps to fit the needs of the underlying task at hand.

Configure and Build OpenCV

Configure and Build OpenCV

Save your project, and then click Build Now! In the Build History pane you’ll see a flashing blue dot (hopefully). Click on the dot to go to the build page where you can select to look at the console output of the build. If everything was set up correctly you should now see OpenCV building away on your ARM build agent!

Building on arm1

Building on arm1

OpenCV 3.0 Build Log

OpenCV 3.0 Build Log

In our example OpenCV took 42 minutes to build on a quad-core ARM system from Scaleway. Not bad.

Final Thoughts

This was clearly not a comprehensive Jenkins tutorial. Folks familiar with the craft know that setting up and configuring continuous integration servers and build projects to produce traceable artificacts is a disclipline unto itself. However, it should be clear that creating a compile farm consisting of slave build agents does not have to be complicated.