RISC-V Assembly Language
As described in the chapter on machine language, assembly language is an intermediate format between a high-level programming language and pure (binary) machine code. While it is close to the target ISA, it contains human-readable annotations and symbols (such as labels and register names). This section provides some basic information on how to write and read assembler programs for the RISC-V base ISA.
External Documentation
Here are some additional sources which go beyond this text or which provide useful tools:
- A detailed documentation of each single instruction, including its format and functional specification, can be found here
- An online encoder/decoder for RISC-V instruction set can be found here
- The GNU assembler syntax including pseudo instructions and macros is described here
Basic Syntax
Assembler programs do not have any high-level structure (such as classes, modules, functions, loops, ...). They basically consist of a list of instructions, labels, and some additional directives and macros.
Instructions
Each instruction starts with the mnemonic name, followed by the arguments, usually registers or literals (constant values):
add x1, x2, x3
For instructions that write a register, the destination comes first, followed
by the source registers and/or literals. The above instruction stores the sum
of registers x2 and x3 in register x1.
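To make the link between this notation and the binary machine code concrete, here is a minimal Python sketch that packs the fields of an R-type instruction into a 32-bit word. The helper name encode_r_type is hypothetical; the field layout and the opcode and function codes for add are taken from the RV32I base ISA.

```python
# Sketch only: encode_r_type is a hypothetical helper, not part of any tool.
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into one 32-bit instruction word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 31:
            raise ValueError(f"register index out of range: {reg}")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

# add x1, x2, x3: opcode 0b0110011, funct3 = 0b000, funct7 = 0b0000000
ADD_X1_X2_X3 = encode_r_type(0b0000000, 3, 2, 0b000, 1, 0b0110011)
print(f"{ADD_X1_X2_X3:#010x}")  # prints 0x003100b3
```

The destination register x1 ends up in bits 7 to 11, the two sources x2 and x3 in bits 15 to 19 and 20 to 24.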
Register Names
Most instructions use registers as data source and target. In the assembler
notation, there are two alternative ways to denote a register, and they can be
mixed within a single assembler program. The first one uses the register names
x0, x1, ... x31. The second one uses register names referring to the
application binary interface (ABI). The ABI is basically a convention between
hardware and software on the usage of the registers. For example, x1 is used
to store the return address of a subroutine, and is therefore called ra.
Register x2 contains the stack pointer (named sp). Other registers are reserved
for temporary results (t0, t1, ...) or function results and arguments (a0,
a1, ...). The ABI also introduces the name zero for the constant zero
register x0. A complete description of the register names can be found
here.
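As a compact reference, the following Python sketch builds the mapping between the two naming schemes for the 32 integer registers. The function names build_abi_names and register_index are hypothetical; the name assignment follows the standard RISC-V calling convention, in which x8 is known as both s0 and fp.

```python
# Sketch only: function names are made up for this illustration.
def build_abi_names():
    """Return the ABI name of x0..x31, indexed by register number."""
    names = ["zero", "ra", "sp", "gp", "tp"]      # x0..x4
    names += ["t0", "t1", "t2"]                   # x5..x7
    names += ["s0", "s1"]                         # x8..x9
    names += [f"a{i}" for i in range(8)]          # x10..x17
    names += [f"s{i}" for i in range(2, 12)]      # x18..x27
    names += [f"t{i}" for i in range(3, 7)]       # x28..x31
    return names

ABI_NAMES = build_abi_names()

def register_index(name):
    """Translate 'x5', 't0' or 'fp' into the register number."""
    if name == "fp":                              # alias of s0 / x8
        return 8
    if name in ABI_NAMES:
        return ABI_NAMES.index(name)
    if name.startswith("x") and name[1:].isdigit() and int(name[1:]) < 32:
        return int(name[1:])
    raise ValueError(f"unknown register name: {name}")

print(register_index("ra"), register_index("sp"), register_index("a0"))
```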
Literals
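Constant values can be written in either (signed) decimal or hexadecimal notation. The immediate operand of instructions such as addi and ori is limited to a signed 12-bit value (-2048 to 2047):
addi t1, zero, -127
ori t2, t1, 0x7f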
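The immediate field of instructions such as addi and ori holds a signed 12-bit value, so not every literal can be used directly. The Python sketch below parses a decimal or hexadecimal literal and checks that range. The names parse_literal and fits_imm12 are hypothetical and not part of any assembler.

```python
# Sketch only: helper names are made up for this illustration.
def parse_literal(text):
    """Parse a signed decimal or a hexadecimal (0x...) literal."""
    return int(text, 0)   # base 0 lets Python detect the 0x prefix

def fits_imm12(value):
    """True if value fits a signed 12-bit immediate field."""
    return -2048 <= value <= 2047

for literal in ("-127", "0x7ff", "0x800"):
    value = parse_literal(literal)
    print(literal, value, fits_imm12(value))
```

A literal outside this range needs more than one machine instruction to load.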
Labels
Some instructions take addresses as arguments, in particular memory and branch instructions. Since the exact addresses are only known after the translation and linking, it would be very complicated to use absolute (constant) addresses in a program. Furthermore, inserting an instruction will shift all subsequent addresses. Therefore, labels can be used as a placeholder for an address, and the assembler will take care of the translation for us:
main:
addi t1, zero, -127
j main
Besides global labels (as in the example above), we can use integer numbers as local labels. In order to refer to a local label, the number must be suffixed with the letter 'b' for a backward reference (the label precedes the instruction using it), or the letter 'f' for a forward reference (the label follows the instruction using it). The same number can be used several times in the same program; each reference is resolved to the nearest matching label in the given direction:
1: j 2f # jump forward to label 2
nop
nop
2: j 1b # jump backward to label 1
1: nop # another local label
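The nearest-match rule can be sketched in a few lines of Python. The listing format (a list of label/instruction pairs) and the function resolve_local are assumptions made for this illustration; a real assembler resolves these references internally.

```python
# Sketch only: the listing format and resolve_local are hypothetical.
PROGRAM = [
    ("1", "j 2f"),    # index 0
    (None, "nop"),    # index 1
    (None, "nop"),    # index 2
    ("2", "j 1b"),    # index 3
    ("1", "nop"),     # index 4
]

def resolve_local(program, index, ref):
    """Return the index of the line that a reference like '1b' or '2f' names.

    A label on the same line precedes its instruction, so a backward
    search starts at the current line and a forward search at the next one.
    """
    number, direction = ref[:-1], ref[-1]
    if direction == "b":
        candidates = range(index, -1, -1)
    elif direction == "f":
        candidates = range(index + 1, len(program))
    else:
        raise ValueError(f"not a local label reference: {ref}")
    for i in candidates:
        if program[i][0] == number:
            return i
    raise LookupError(f"no matching label for {ref}")

print(resolve_local(PROGRAM, 0, "2f"))  # line holding label 2
print(resolve_local(PROGRAM, 3, "1b"))  # first label 1, not the later one
```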
Pseudo Instructions
The RISC-V ISA is very compact, and contains virtually no redundant instructions. This means that there is often a single way to express a given operation, and that sometimes we need to "abuse" instructions in order to obtain a certain functionality. As an example, there is no instruction for bitwise negation. Instead, we use an exclusive or with an all-ones constant value. While this works perfectly, it does not express the programmer's intent well and leads to code that is hard to read. This is why assembly languages introduce pseudo instructions. These are additional instructions which are mapped to one or several machine code instructions. We will give some examples in the following, for the GNU assembly language. For a complete list, please refer to the online documentation of the GNU assembler.
NOP
The nop instruction (no operation) does not do anything: it does not have any
effect on the memory or the registers. This can be achieved by many different
instructions. However, the RISC-V ISA specifies that nop should be realised by
the instruction addi x0, x0, 0, i.e. adding zero to the constant zero register
and storing the result in the constant zero register. Since x0 is a read-only
register, any write access to it will be ignored.
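For illustration, the machine word behind addi x0, x0, 0 can be computed with a short Python sketch of the I-type format. The helper encode_i_type is hypothetical; the field layout and the opcode of addi come from the RV32I base ISA.

```python
# Sketch only: encode_i_type is a hypothetical helper.
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack an I-type instruction; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError(f"immediate does not fit into 12 bits: {imm}")
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

OPCODE_OP_IMM = 0b0010011   # opcode shared by addi, ori, xori, ...

NOP = encode_i_type(0, 0, 0b000, 0, OPCODE_OP_IMM)   # addi x0, x0, 0
print(f"{NOP:#010x}")  # prints 0x00000013
```

Since all register and immediate fields are zero, only the opcode bits remain set.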
Load Immediate
A very common operation is initialising a register with a constant value.
Depending on the processor architecture (32 or 64 bit) and the constant value
(byte, word, double word...) this can be realised by one or several
instructions. For example, if the value fits into a signed 12-bit immediate
(-2048 to 2047), we can use the addi instruction: addi x5, x0, 127 will load
the value 127 into the register x5. If, however, the value does not fit into
12 bits, we need an additional instruction, lui (load upper immediate),
combined with addi for the lower bits. Because addi sign-extends its
immediate, the upper part must be incremented by one whenever the lower 12
bits represent a negative number. To save the programmer the manual and
error-prone work of splitting up the constant into upper and lower parts, she
can just use the li (load immediate) pseudo instruction, and the assembler
will take care of the proper translation into one or several machine
instructions.
As an example, the following assembler code
li x3, 0x1234cafe
will be translated into
lui x3, 0x1234d
addi x3, x3, -1282
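The split into upper and lower parts is easy to get wrong, because addi sign-extends its 12-bit immediate. The following Python sketch shows the arithmetic for a 32-bit constant. The function name split_li is hypothetical, and the sketch only mirrors the lui/addi case shown above, not every strategy an assembler may choose.

```python
# Sketch only: split_li is a hypothetical helper for 32-bit constants.
def split_li(value):
    """Return (upper 20 bits for lui, signed lower 12 bits for addi)."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:          # addi would sign-extend this as negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensates a negative lo
    return hi, lo

hi, lo = split_li(0x1234CAFE)
print(f"lui x3, {hi:#x}")    # prints lui x3, 0x1234d
print(f"addi x3, x3, {lo}")  # prints addi x3, x3, -1282
```

The lower 12 bits 0xafe have their top bit set, so they are read as -1282, and the upper part is incremented from 0x1234c to 0x1234d to compensate.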
Load Address
Another common operation is initialising a register with the address of some
data or code. Since the real address is only known after assembly and linking,
we can instead use a label and the la (load address) pseudo instruction.
In contrast to the li pseudo instruction, the GNU assembler will usually
translate la to the auipc (add upper immediate to pc) instruction (and
potentially another addi instruction). The somewhat unusual auipc instruction
adds an immediate to the current value of the program counter and stores the
result in a register, allowing for PC-relative addressing, which is useful to
achieve position-independent code.
As an example, the following assembler code
main:
nop
la a0, foo
nop
nop
nop
foo:
.byte 0xff
will be translated into
main:
nop
auipc a0, 0x0
addi a0, a0, 20
nop
nop
nop
foo:
.byte 0xff
This will load the address at offset 20 (i.e. five 32-bit instructions) after
the auipc instruction into register a0, which corresponds to the label foo.
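The offset 20 can be reproduced with a small Python sketch that lays out the listing and computes the PC-relative distance. The functions layout and pc_relative_parts are hypothetical. The sketch assumes that every instruction occupies 4 bytes and that la always expands to two instructions (auipc plus addi); the .byte directive is left out because it follows the label.

```python
# Sketch only: helper names and the listing format are hypothetical.
INSTRUCTION_SIZE = 4   # bytes per RV32I base instruction

def layout(lines):
    """Return (label addresses, addresses of the auipc of each la)."""
    labels, la_sites, pc = {}, [], 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = pc
        elif line.startswith("la "):
            la_sites.append(pc)
            pc += 2 * INSTRUCTION_SIZE     # assumed: auipc + addi
        else:
            pc += INSTRUCTION_SIZE
    return labels, la_sites

def pc_relative_parts(pc, target):
    """Split target - pc into (auipc immediate, addi immediate)."""
    offset = target - pc
    lo = offset & 0xFFF
    if lo >= 0x800:                        # addi sign-extends 12 bits
        lo -= 0x1000
    hi = ((offset - lo) >> 12) & 0xFFFFF
    return hi, lo

listing = ["main:", "nop", "la a0, foo", "nop", "nop", "nop", "foo:"]
labels, la_sites = layout(listing)
hi, lo = pc_relative_parts(la_sites[0], labels["foo"])
print(f"auipc a0, {hi:#x}")   # prints auipc a0, 0x0
print(f"addi a0, a0, {lo}")   # prints addi a0, a0, 20
```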
Move
Copying the value of one register to another can be achieved by the mv
(move) pseudo instruction. It translates to the addi instruction. For example:
mv a1, a0 # copy a0 to a1
will be translated to
addi a1, a0, 0
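The translations described in this section amount to simple textual rewriting. The Python sketch below expands three pseudo instructions: nop and mv as described above, and not, the GNU assembler's pseudo instruction for the bitwise negation mentioned earlier, which maps to xori with the constant -1. The function expand_pseudo is hypothetical and far simpler than a real assembler.

```python
# Sketch only: expand_pseudo is a hypothetical, heavily simplified helper.
def expand_pseudo(line):
    """Return the list of base instructions for one source line."""
    code = line.split("#", 1)[0].strip()          # drop trailing comment
    mnemonic, _, rest = code.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    if mnemonic == "nop" and not operands:
        return ["addi x0, x0, 0"]
    if mnemonic == "mv" and len(operands) == 2:
        rd, rs = operands
        return [f"addi {rd}, {rs}, 0"]
    if mnemonic == "not" and len(operands) == 2:
        rd, rs = operands
        return [f"xori {rd}, {rs}, -1"]           # -1 is the all-ones value
    return [code]                                 # anything else is unchanged

print(expand_pseudo("mv a1, a0 # copy a0 to a1"))
```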
Data
When your program is supposed to deal with (constant) data, it makes sense to store this data in a separate section in the binary file. As you will learn later, the compilation process takes care of this, creating different sections for the code and for read-only data, among others. When we write a program in assembler, we can mix code and data as we like, identifying each block of data with a label. In the GNU assembly language, there are directives to indicate that we want to store literal values at a certain place in the program. The type of the directive determines the size of the data items:
my_data:
.byte 0xff, 0x87, 123, -6 # Four bytes
.half 0xabcd, -1 # Two 16-bit half words
.word 0xff00ff00 # One 32-bit word
.dword 0xff00ff11ff22ff33 # One 64-bit double word
We can also store ASCII strings like this:
message:
.string "Hello world\n"
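To see which bytes these directives produce, the following Python sketch emulates them. The helpers emit and emit_string are hypothetical. The sketch relies on two facts: RISC-V stores multi-byte values in little-endian order, and the .string directive appends a terminating zero byte. Unlike a real assembler, it silently truncates values that are too large.

```python
# Sketch only: emit and emit_string are hypothetical helpers.
SIZES = {".byte": 1, ".half": 2, ".word": 4, ".dword": 8}

def emit(directive, values):
    """Return the little-endian bytes stored for one data directive."""
    size = SIZES[directive]
    mask = (1 << (8 * size)) - 1        # two's complement for negatives
    return b"".join((v & mask).to_bytes(size, "little") for v in values)

def emit_string(text):
    """Return the bytes of a .string directive (ASCII plus a zero byte)."""
    return text.encode("ascii") + b"\x00"

my_data = (emit(".byte", [0xFF, 0x87, 123, -6])
           + emit(".half", [0xABCD, -1])
           + emit(".word", [0xFF00FF00])
           + emit(".dword", [0xFF00FF11FF22FF33]))
message = emit_string("Hello world\n")
print(my_data.hex(" "))
print(len(my_data), len(message))
```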
Using the GNU Assembler on the Command Line
Let's consider that we have a file foo.s with assembler code. In order to
produce machine code, these are the necessary steps:
riscv64-unknown-elf-as -march=rv32i foo.s -o foo.o
The above command calls the GNU assembler with an option to choose the target
architecture as the RISC-V 32 bit base ISA. This will already produce machine
code. However, addresses in the object file are not final yet, especially if
you are using la pseudo instructions. The following linker command will
resolve the addresses and create the different sections in an executable file:
riscv64-unknown-elf-ld -Ttext 0 -melf32lriscv foo.o -o foo.x
The option -Ttext 0 places the .text section at address 0, which also applies
when no .text section is defined explicitly in the source code. We also
specify the binary format. In order to see the (disassembled) result, you can
inspect the binary file like this:
riscv64-unknown-elf-objdump foo.x -d -j .text
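If these three steps are repeated often, they can be wrapped in a small script. The Python sketch below only assembles the command lines shown above and runs them when all tools are installed. The function names build_commands and run_all are hypothetical, and the tool prefix and options are taken unchanged from the commands above.

```python
# Sketch only: build_commands and run_all are hypothetical helpers.
import shutil
import subprocess

def build_commands(source="foo.s", prefix="riscv64-unknown-elf-"):
    """Return the assemble, link and disassemble command lines."""
    stem = source.rsplit(".", 1)[0]
    obj, exe = stem + ".o", stem + ".x"
    return [
        [prefix + "as", "-march=rv32i", source, "-o", obj],
        [prefix + "ld", "-Ttext", "0", "-melf32lriscv", obj, "-o", exe],
        [prefix + "objdump", exe, "-d", "-j", ".text"],
    ]

def run_all(commands):
    """Run the commands in order; return False if a tool is missing."""
    if any(shutil.which(argv[0]) is None for argv in commands):
        return False
    for argv in commands:
        subprocess.run(argv, check=True)   # raises if a step fails
    return True

for argv in build_commands():
    print(" ".join(argv))
```

Calling run_all(build_commands("foo.s")) in a directory containing foo.s would then perform all three steps.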
Using the Ripes Simulator
Ripes is a graphical RISC-V processor simulator, and it also includes an assembler. The tool is installed on the lab machines. To start it, type the following into a terminal:
Ripes