RISC-V Assembly Language

As described in the chapter on machine language, assembly language is an intermediate format between a high-level programming language and pure (binary) machine code. While it is close to the target ISA, it contains human-readable annotations and symbols (such as labels and register names). This section provides some basic information on how to write and read assembler programs for the RISC-V base ISA.

External Documentation

Here are some additional sources which go beyond this text or which provide useful tools:

A detailed documentation of each single instruction, including its format and functional specification, can be found here
An online encoder/decoder for the RISC-V instruction set can be found here
The GNU assembler syntax including pseudo instructions and macros is described here

Basic Syntax

Assembler programs do not have any high-level structure (such as classes, modules, functions, loops, ...). They are essentially a list of instructions, labels, and some additional directives and macros.

Instructions

Each instruction starts with the mnemonic name, followed by the arguments, usually registers or literals (constant values):

add x1, x2, x3

For arithmetic and logic instructions such as this one, the destination register comes first, followed by the source registers and/or literals. The above instruction stores the sum of registers x2 and x3 in register x1.
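A few more instructions in the same notation, given as an illustrative sketch (the register choices are arbitrary). Note that store instructions are an exception to the operand order: they have no destination register, and their first operand is the value to be written to memory.

```asm
add  x1, x2, x3      # x1 = x2 + x3   (two source registers)
addi x1, x2, 10      # x1 = x2 + 10   (one source register and a literal)
sw   x1, 0(x2)       # store x1 to the memory address held in x2 (offset 0)
```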

Register Names

Most instructions use registers as data source and target. In assembler notation, there are two alternative ways to denote a register, and they can be mixed within a single assembler program. The first one uses the register names x0, x1, ... x31. The second one uses register names defined by the application binary interface (ABI). The ABI is essentially a convention on how software uses the registers. For example, x1 is used to store the return address of a subroutine, and is therefore called ra. Register x2 contains the stack pointer (named sp). Other registers are reserved for temporary results (t0, t1, ...) or function arguments and results (a0, a1, ...). The ABI also introduces the name zero for the constant zero register x0. A complete description of the register names can be found here.
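As a small illustration of the two naming schemes, the following two lines should assemble to the same machine instruction. The sketch relies on the standard RISC-V ABI mapping, in which t0, t1 and t2 are the ABI names of x5, x6 and x7.

```asm
add x5, x6, x7       # numeric register names
add t0, t1, t2       # the same instruction written with ABI names
```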

Literals

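Constant values can be written in either (signed) decimal or hexadecimal notation. The immediate operands of instructions such as addi and ori are 12 bits wide and sign-extended, so they must lie in the range -2048 to 2047:

addi t1, zero, -127
ori  t2, t1, 0x7fe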

Labels

Some instructions take addresses as arguments, in particular memory and branch instructions. Since the exact addresses are only known after the translation and linking, it would be very complicated to use absolute (constant) addresses in a program. Furthermore, inserting an instruction will shift all subsequent addresses. Therefore, labels can be used as a placeholder for an address, and the assembler will take care of the translation for us:

main:
    addi t1, zero, -127
    j main

Besides global labels (as in the example above), we can use integer numbers as local labels. In order to refer to a local label, the number must be suffixed with the letter 'b' for a backward reference (the label precedes the instruction using it), or the letter 'f' for a forward reference (the label follows the instruction using it). The same number can be used several times in the same program; each reference is resolved to the nearest matching label in the given direction:

1:  j 2f  # jump forward to label 2
    nop
    nop
2:  j 1b  # jump backward to label 1
1:  nop   # another local label
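Local labels are particularly handy for short loops, where inventing a unique global name would add little. The following is a minimal sketch of a countdown loop; the counter register t0 and the iteration count 3 are arbitrary choices for illustration.

```asm
    addi t0, zero, 3     # loop counter: three iterations
1:  addi t0, t0, -1      # decrement the counter
    bne  t0, zero, 1b    # branch back to the nearest preceding "1:" while t0 != 0
```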

Pseudo Instructions

The RISC-V ISA is very compact, and contains virtually no redundant instructions. This means that there is often a single way to express a given operation, and that sometimes we need to "abuse" instructions in order to obtain a certain functionality. As an example, there is no instruction for bitwise negation. Instead, we use exclusive or with an all-one constant value. While this works perfectly, it does not express the programmer's intent well and makes the code hard to read. This is why assembly languages introduce pseudo instructions. These are additional instructions that the assembler maps to one or several machine code instructions. We will give some examples in the following, for the GNU assembly language. For a complete list, please refer to the online documentation of the GNU assembler.
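The bitwise negation mentioned above can be sketched as follows. The GNU assembler provides the pseudo instruction not, which stands for an xori with the immediate -1 (all bits set after sign extension); the registers t0 and t1 are arbitrary.

```asm
not  t1, t0          # pseudo instruction: t1 = bitwise negation of t0
xori t1, t0, -1      # the base instruction it stands for
```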

NOP

The nop instruction (no operation) does not do anything: it has no effect on the memory or the registers. This can be achieved by many different instructions. However, the RISC-V ISA specifies that nop should be realised by the instruction addi x0, x0, 0, i.e. adding zero to the constant zero register and storing the result in the constant zero register. Since x0 is a read-only register, any write access to it will be ignored.

Load Immediate

A very common operation is initialising a register with a constant value. Depending on the processor architecture (32 or 64 bit) and the constant value (byte, word, double word...) this can be realised by one or several instructions. For example, if the value fits into a 12-bit signed immediate (-2048 to 2047), we can use the addi instruction: addi x5, x0, 127 will load the value 127 into the register x5. If, however, the value does not fit into 12 bits, we need an additional instruction, lui (load upper immediate), which sets the upper 20 bits, combined with addi for the lower 12 bits. To save the programmer the manual and error-prone work of splitting up the constant into upper and lower parts, she can just use the li (load immediate) pseudo instruction, and the assembler will take care of the proper translation into one or several machine instructions.

As an example, the following assembler code

li x3, 0x1234cafe

will be translated into

lui x3, 0x1234d
addi x3, x3, -1282
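The upper part in this translation is 0x1234d rather than 0x1234c because addi sign-extends its 12-bit immediate. The arithmetic can be traced in the comments of the following sketch, which repeats the expansion shown above:

```asm
# Constant to load: 0x1234cafe
#   lower 12 bits: 0xafe = 2814, which exceeds 2047,
#                  so addi interprets it as 2814 - 4096 = -1282
#   upper 20 bits: 0x1234c, incremented to 0x1234d to compensate
#                  for the negative lower part
#   check:         0x1234d000 - 1282 = 0x1234cafe
lui  x3, 0x1234d         # x3 = 0x1234d000
addi x3, x3, -1282       # x3 = 0x1234cafe
```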

Load Address

Another common operation is initialising a register with the address of some data or code. Since the real address is only known after assembly and linking, we can instead use a label and the la (load address) pseudo instruction. In contrast to the li pseudo instruction, the GNU assembler will usually translate la to the auipc (add upper immediate to pc) instruction (and potentially another addi instruction). The somewhat unusual auipc instruction adds a 20-bit upper immediate to the current value of the program counter and writes the result to a register; with an immediate of 0, it simply copies the program counter into the register. This allows for PC-relative addressing, which is useful to achieve position-independent code.

As an example, the following assembler code

main:
    nop
    la a0, foo
    nop
    nop
    nop
foo:
    .byte 0xff

will be translated into

main:
    nop
    auipc a0, 0x0
    addi  a0, a0, 20
    nop
    nop
    nop
foo:
    .byte 0xff

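The auipc instruction with immediate 0 copies its own address into a0, and the addi then adds 20. Register a0 thus holds the address 20 bytes past the auipc instruction (i.e. five 32-bit instructions: the auipc, the addi and the three nops), which corresponds to the label foo.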

Move

Copying the value of one register to another can be achieved by the mv (move) pseudo instruction. It translates to the addi instruction. For example:

mv a1, a0    # copy a0 to a1

will be translated to

addi a1, a0, 0

Data

When your program is supposed to deal with (constant) data, it makes sense to store this data in a separate section of the binary file. As you will learn later, the compilation process takes care of this, creating different sections for the code and for read-only data, among others. When we write a program in assembler, we can mix code and data as we like, identifying each block of data with a label. In the GNU assembly language, there are directives to indicate that we want to store literal values at a certain place in the program. The type of the directive determines the size of the data items:

my_data:
    .byte  0xff, 0x87, 123, -6    # Four bytes
    .half  0xabcd, -1             # Two 16-bit half words
    .word  0xff00ff00             # One 32-bit word
    .dword 0xff00ff11ff22ff33     # One 64-bit double word

We can also store ASCII strings like this (the .string directive appends a terminating zero byte):

message: 
    .string "Hello world\n"
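To read such data back, a label can be combined with la and the load instructions. The following is a minimal sketch that repeats the first two lines of my_data from above; the register choices and the final endless loop (which keeps execution from running into the data) are assumptions made for illustration.

```asm
main:
    la   t0, my_data     # t0 = address of the first data item
    lbu  t1, 0(t0)       # t1 = 0xff   (first byte, zero-extended)
    lb   t2, 3(t0)       # t2 = -6     (fourth byte, sign-extended)
    lhu  t3, 4(t0)       # t3 = 0xabcd (first half word, zero-extended)
1:  j    1b              # loop forever instead of executing the data

my_data:
    .byte  0xff, 0x87, 123, -6    # Four bytes at offsets 0 to 3
    .half  0xabcd, -1             # Two 16-bit half words at offsets 4 and 6
```

The choice between lb/lbu (and lh/lhu) decides whether the loaded value is sign-extended or zero-extended to the full register width.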

Using the GNU Assembler on the Command Line

Suppose we have a file foo.s with assembler code. In order to produce machine code, the first step is to assemble it:

riscv64-unknown-elf-as -march=rv32i foo.s -o foo.o

The above command calls the GNU assembler with an option that selects the RISC-V 32-bit base ISA as the target architecture. This will already produce machine code. However, the addresses in the resulting object file are not final yet, which matters especially if you are using la pseudo instructions. The following command links the object file, which resolves the addresses and creates the different sections in an executable file:

riscv64-unknown-elf-ld -Ttext 0 -melf32lriscv foo.o -o foo.x

The option -Ttext 0 places the .text section (the code) at address 0, and -melf32lriscv selects the 32-bit little-endian RISC-V ELF format for the output. In order to see the (disassembled) result, you can inspect the binary file like this:

riscv64-unknown-elf-objdump foo.x -d -j .text
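To try out the three commands above, a small input file is sufficient. The following content for foo.s is a hypothetical example, not part of the course material. It defines the symbol _start, which the GNU linker looks for as the default entry point; without it, the linker typically prints a warning and falls back to a default entry address.

```asm
# foo.s: hypothetical minimal input for the commands above
    .text
    .globl _start
_start:
    la   a0, value       # a0 = address of the data word
    lw   a1, 0(a0)       # a1 = 42
1:  j    1b              # loop forever
value:
    .word 42             # One 32-bit word
```

In the disassembly of foo.x, the la line should appear as an auipc/addi pair whose offsets add up to the address of value.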

Using the Ripes Simulator

Ripes is a graphical RISC-V processor simulator, and it also includes an assembler. The tool is installed on the lab machines. To start it, just type the following into a terminal:

Ripes

INF107