Related Classes: EECS 370
LEGv8 is a subset of the ARMv8 ISA. It is simplified to make it easier to learn and understand in a classroom environment.
It has 32 registers (X0-X31), each 64 bits wide.
Some registers have special uses: for example, X31 always reads as 0 (it is also called XZR).
LEGv8 is byte addressable.
ARM Instruction Set
- Arithmetic
  - Add
  - Subtract
- Data Transfer
  - Loads and stores: LDUR (load register, unscaled), STUR, etc.
- Logical
  - AND, ORR, EOR, etc.
  - Logical shifts (LSL, LSR)
- Conditional Branch
  - CBZ, CBNZ, B.cond
- Unconditional Branch (jumps)
  - B, BR, BL
Arithmetic Instructions (R Instructions)
Format: 3 Operand Fields
- Destination Register is the FIRST ONE
ADD X3, X4, X7 // X3 = X4 + X7
Encoding (for ADD, R[Rd] = R[Rn] + R[Rm]):
Field | opcode | Rm | shamt | Rn | Rd |
---|---|---|---|---|---|
Width | 11 bits | 5 bits | 6 bits | 5 bits | 5 bits |
Bit positions | 31-21 | 20-16 | 15-10 | 9-5 | 4-0 |
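To make the field layout concrete, here is a minimal C sketch that packs and unpacks the five R-format fields with shifts and masks. The helper names are hypothetical; the test uses 0x458 (binary 10001011000) as the ADD opcode, which is our reading of the LEGv8 reference card, so double-check it against the course's own card.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the five R-format fields into one 32-bit instruction word.
   Layout (high to low): opcode[31:21] Rm[20:16] shamt[15:10] Rn[9:5] Rd[4:0]. */
uint32_t encode_r(uint32_t opcode, uint32_t rm, uint32_t shamt,
                  uint32_t rn, uint32_t rd) {
    assert(opcode < (1u << 11) && shamt < (1u << 6));
    assert(rm < 32 && rn < 32 && rd < 32);
    return (opcode << 21) | (rm << 16) | (shamt << 10) | (rn << 5) | rd;
}

/* Pull individual fields back out of an encoded instruction. */
uint32_t r_opcode(uint32_t insn) { return insn >> 21; }
uint32_t r_rm(uint32_t insn)     { return (insn >> 16) & 0x1F; }
uint32_t r_shamt(uint32_t insn)  { return (insn >> 10) & 0x3F; }
uint32_t r_rn(uint32_t insn)     { return (insn >> 5) & 0x1F; }
uint32_t r_rd(uint32_t insn)     { return insn & 0x1F; }
```

With that opcode, ADD X3, X4, X7 packs to 0x8B070083: Rm = 7, shamt = 0, Rn = 4, Rd = 3.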
LEGv8 Logical Instructions
- Logical operations are bit-wise
- For immediate forms (such as ORRI), the 12-bit constant is zero-extended: padded with zeros on the left to fill all 64 bits
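As a small illustration of zero-extension, the sketch below models the immediate forms of AND, ORR, and EOR in C. The function names are hypothetical; the point is that bit 11 of the constant is never copied upward, so 0x800 stays 0x800 instead of turning into a negative number.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 12 bits of the immediate and zero-extend to 64 bits:
   the upper 52 bits are always 0, never copies of bit 11. */
static uint64_t zext12(uint32_t imm) { return (uint64_t)(imm & 0xFFF); }

/* Bitwise operations with a zero-extended 12-bit constant. */
uint64_t andi(uint64_t rn, uint32_t imm) { return rn & zext12(imm); }
uint64_t orri(uint64_t rn, uint32_t imm) { return rn | zext12(imm); }
uint64_t eori(uint64_t rn, uint32_t imm) { return rn ^ zext12(imm); }
```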
LEGv8 Shift Logical Instructions
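This heading has no body in these notes. As a hedged sketch: LSL and LSR take their shift amount from the 6-bit shamt field of the R-format, so the amount is 0 through 63, and LSR is a logical shift, so zeros (not copies of the sign bit) enter from the left. The C functions below are hypothetical models of the two instructions.

```c
#include <assert.h>
#include <stdint.h>

/* LSL Rd, Rn, #shamt: shift left, zeros fill in from the right. */
uint64_t lsl(uint64_t rn, unsigned shamt) {
    assert(shamt < 64);           /* shamt is a 6-bit field */
    return rn << shamt;
}

/* LSR Rd, Rn, #shamt: logical shift right, zeros fill in from the left. */
uint64_t lsr(uint64_t rn, unsigned shamt) {
    assert(shamt < 64);
    return rn >> shamt;
}
```

Shifting left by 3 multiplies by 8, which is how array indices get scaled to byte offsets for 8-byte elements.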
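Pseudo-Instructions
- Instructions that use a shorthand “mnemonic” that the assembler expands into existing assembly instructions
- Examples:
MOV X12, X2 // copy X2 into X12
⇒ ORR X12, XZR, X2
CMP X1, X2 // compare X1 with X2, setting the flags
⇒ SUBS XZR, X1, X2
Note that MOVZ X12, #23 (set X12 to 23) is a real LEGv8 instruction rather than a pseudo-instruction, although ORRI X12, XZR, #23 has the same effect.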
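The MOV expansion works only because XZR always reads as 0. Below is a minimal C model of a register file where X31 reads as zero and ignores writes; the type and function names are illustrative, not part of any real simulator.

```c
#include <assert.h>
#include <stdint.h>

#define XZR 31

/* Hypothetical register file: X31 always reads as 0 and ignores writes. */
typedef struct { uint64_t x[32]; } Regs;

uint64_t reg_read(const Regs *r, int n) {
    assert(n >= 0 && n < 32);
    return n == XZR ? 0 : r->x[n];
}

void reg_write(Regs *r, int n, uint64_t v) {
    assert(n >= 0 && n < 32);
    if (n != XZR) r->x[n] = v;    /* writes to XZR are discarded */
}

/* ORR Rd, Rn, Rm */
void orr(Regs *r, int rd, int rn, int rm) {
    reg_write(r, rd, reg_read(r, rn) | reg_read(r, rm));
}

/* MOV Rd, Rm expands to ORR Rd, XZR, Rm, since 0 | v == v. */
void mov(Regs *r, int rd, int rm) { orr(r, rd, XZR, rm); }
```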
Memory Instructions
Unlike LC2K, which is word addressable, ARM (like most modern ISAs) is byte addressable. Recall that a word is four bytes.
Like LC2K, ARM uses base + displacement addressing. Unlike LC2K, ARM can also transfer smaller sizes instead of the whole 64-bit value, which lets us work with values narrower than 64 bits.
Loads
When we’re loading smaller elements from memory, there are two options on what to do with the other bits:
- Set them to zero, or zero-extend
- Sign-extend (fill them with copies of the most significant bit)
ARM has different instructions for each option so the processor knows what to do.
Desired amount of data to transfer | Operation | Unused bits in register |
---|---|---|
64-bits (double word, or whole register) | LDUR | N/A |
16-bits (half-word) into the lower bits of register | LDURH | Set to zero |
8-bits (byte) into lower bits of register | LDURB | Set to zero |
32-bits (word) into lower bits of register | LDURSW (load signed word) | Sign extend (0 or 1 based on the most significant bit) |
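The table can be modeled in C by treating memory as a byte array. This sketch assumes little-endian byte order (the notes don't state one) and uses hypothetical helper names that mirror the mnemonics; note how LDURSW copies bit 31 into the upper half while the other narrow loads leave it zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read n bytes starting at mem[addr] as an unsigned little-endian value. */
static uint64_t read_le(const uint8_t *mem, size_t addr, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | mem[addr + i];
    return v;
}

uint64_t ldur(const uint8_t *mem, size_t addr)  { return read_le(mem, addr, 8); }
uint64_t ldurh(const uint8_t *mem, size_t addr) { return read_le(mem, addr, 2); } /* upper 48 bits = 0 */
uint64_t ldurb(const uint8_t *mem, size_t addr) { return read_le(mem, addr, 1); } /* upper 56 bits = 0 */

/* LDURSW: copy bit 31 of the loaded word into bits 63..32. */
uint64_t ldursw(const uint8_t *mem, size_t addr) {
    uint64_t w = read_le(mem, addr, 4);
    return (w & 0x80000000u) ? (w | 0xFFFFFFFF00000000ULL) : w;
}
```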
Stores
For stores, signedness doesn’t actually matter: you’re storing exactly what you said, there’s no extension necessary (or even relevant).
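Stores can be sketched the same way: each one writes only the low bytes of the register, and the code never needs to know whether the value was signed. This also assumes little-endian byte order, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low n bytes of v to mem[addr..addr+n-1], little-endian.
   The remaining high bits of the register are simply not written. */
static void write_le(uint8_t *mem, size_t addr, uint64_t v, int n) {
    for (int i = 0; i < n; i++) {
        mem[addr + i] = (uint8_t)(v & 0xFF);
        v >>= 8;
    }
}

void stur(uint8_t *mem, size_t addr, uint64_t v)  { write_le(mem, addr, v, 8); }
void sturw(uint8_t *mem, size_t addr, uint64_t v) { write_le(mem, addr, v, 4); }
void sturh(uint8_t *mem, size_t addr, uint64_t v) { write_le(mem, addr, v, 2); }
void sturb(uint8_t *mem, size_t addr, uint64_t v) { write_le(mem, addr, v, 1); }
```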
Sequencing Instructions (Control Flow Instructions)
Sequencing Instructions change the flow of instructions that are executed. This is achieved by modifying the program counter (PC).
Conditional Instructions
Type 1: Compare to Zero
The first type of conditional instruction compares a register to zero and then branches. This includes CBZ and CBNZ.
Instruction | Example | What it does |
---|---|---|
CBZ | CBZ X1, 25 | If the value in the specified register equals zero, move the PC by that many instructions (multiply by four to get the number of bytes); otherwise continue with the next instruction |
CBNZ | CBNZ X1, 25 | If the value in the specified register does not equal zero, adjust the PC in the same way |
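A hedged C sketch of where execution goes next after CBZ or CBNZ: the offset counts instructions, so it is scaled by 4, and a branch that is not taken simply falls through to PC + 4. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Next PC after CBZ (is_cbz = true) or CBNZ (is_cbz = false).
   offset counts instructions, so it is multiplied by 4 to get bytes;
   unsigned wraparound makes negative offsets move the PC backward. */
uint64_t cb_next_pc(uint64_t pc, int64_t offset, uint64_t reg, bool is_cbz) {
    bool taken = is_cbz ? (reg == 0) : (reg != 0);
    return taken ? pc + 4 * (uint64_t)offset : pc + 4;
}
```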
Type 2: Program Status Register (comparisons other than against zero)
Sometimes you want comparisons that are not just against zero: for example a > b, or a = b where b is a variable. ARM lets us do this with flags.
The flags used in ARM for this are NZVC: Negative, Zero, oVerflow, and Carry. If you run an instruction that sets the flags (ADDS, SUBS, or anything else that ends in S), a subsequent conditional branch can then reference them.
For EECS 370, we’ll focus on N and Z.
While you can use flag-setting arithmetic instructions to set the flags (which can save lines of assembly), the most common way is the CMP pseudo-instruction. It updates all the flags so that the following branch can test whichever condition it needs.
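The sketch below computes the four flags the way a subtract (and therefore CMP) would, then evaluates a few signed B.cond conditions from them. The notes focus on N and Z; V is included because ARM defines signed conditions such as B.LT and B.GE using N and V together, and they reduce to checking N alone only when the subtraction does not overflow. The type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } Flags;

/* Flags as set by SUBS XZR, Xa, Xb (what CMP Xa, Xb expands to).
   Unsigned arithmetic avoids C's undefined behavior on signed overflow. */
Flags cmp(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Flags f;
    f.n = (r >> 63) & 1;                    /* result is negative */
    f.z = (r == 0);                         /* result is zero */
    f.c = (a >= b);                         /* no borrow (unsigned a >= b) */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;  /* signed overflow */
    return f;
}

/* A few signed branch conditions, expressed with the flags. */
bool cond_eq(Flags f) { return f.z; }
bool cond_ne(Flags f) { return !f.z; }
bool cond_lt(Flags f) { return f.n != f.v; }
bool cond_ge(Flags f) { return f.n == f.v; }
bool cond_gt(Flags f) { return !f.z && f.n == f.v; }
bool cond_le(Flags f) { return f.z || f.n != f.v; }
```

For example, comparing INT64_MIN with 1 gives N = 0 but V = 1, and B.LT is still (correctly) taken.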
C to Assembly
When converting C code to LEG code, there are a few important things to keep in mind:
- At least for EECS 370, exams will include operating on values of one size and then storing the results into variables of another size. Being careful about the sizes is important.
Example Conversion 1
C Code:
struct {int a; unsigned char b, c; } y;
y.a = y.b + y.c; // NOTE: a is an int (32 bits), while b and c are bytes (8 bits)
Assembly
/* Assume that a pointer to y is in X1 */
LDURB X2, [X1, #4] ; b starts 4 bytes after the start of the struct (right after the 4-byte int a)
LDURB X3, [X1, #5] ; c is the next byte
ADD X4, X2, X3
STURW X4, [X1, #0] ; STURW stores four bytes (a single word) into a
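One way to check the offsets used in the assembly is to give the struct a name and ask the compiler. This assumes a platform with a 4-byte, 4-byte-aligned int (true of common 64-bit targets); the struct and function names are hypothetical since the original struct is anonymous.

```c
#include <assert.h>
#include <stddef.h>

/* Named copy of the anonymous struct from the example. */
struct Y {
    int a;            /* offset 0, 4 bytes */
    unsigned char b;  /* offset 4, 1 byte  */
    unsigned char c;  /* offset 5, 1 byte  */
};                    /* 2 bytes of tail padding bring the size to 8 */

/* Same computation as the assembly: add the two bytes, store into a. */
void add_fields(struct Y *y) { y->a = y->b + y->c; }
```

Because 200 + 100 does not fit in a byte, a test like the one below also shows why the assembly loads b and c into full registers before adding.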
Memory Alignment
When laying out variables in C or C++, the compiler often inserts padding between data members so that each one is properly aligned (which helps modern hardware access them quickly and efficiently).
This means that even though a struct appears to use, say, 7 bytes of data, it may actually occupy 12 bytes because of alignment.
Modern ISAs generally require (or run fastest with) the following: an $N$-byte variable must start at an address $A$ such that $A \bmod N = 0$. This is called the “Golden Rule of Alignment”.
Memory Alignment Example
Say we have the following C Code:
char c;
short s;
int i;
The following memory layout would be created
0x1000 | 0x1001 | 0x1002 | 0x1003 | 0x1004 | 0x1005 | 0x1006 | 0x1007 |
---|---|---|---|---|---|---|---|
[c] | PADDING | [s] | [s] | [i] | [i] | [i] | [i] |
So this ends up being eight bytes long. If the compiler cannot reorder fields and the short and int declarations were swapped, the same variables would take 10 bytes of memory instead: c at 0x1000, three bytes of padding, i at 0x1004 through 0x1007, and s at 0x1008 through 0x1009.
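The placement rule can be written as a short loop: round the next free offset up to a multiple of each variable's size, then place the variable there. This sketch measures offsets from the first variable, which works because 0x1000 is itself a multiple of every size involved; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round addr up to the next multiple of n. */
size_t align_up(size_t addr, size_t n) {
    assert(n > 0);
    return (addr + n - 1) / n * n;
}

/* Place primitives of the given sizes one after another, in order, each at
   an offset divisible by its own size (the Golden Rule). Stores each start
   offset in offsets[] and returns the offset just past the last variable. */
size_t place_in_order(const size_t *sizes, size_t count, size_t *offsets) {
    size_t next = 0;
    for (size_t k = 0; k < count; k++) {
        offsets[k] = align_up(next, sizes[k]);
        next = offsets[k] + sizes[k];
    }
    return next;
}
```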
Aligning Arrays
We treat an array as several consecutive elements of the original type, so the start of the array is aligned according to that type (and every following element is then aligned automatically).
Aligning Structs
Structs complicate things, specifically when you have an array of structs. Following only the procedure above, different elements of the array could end up with different padding gaps, which would make it very difficult (if not impossible) to compute where a given element starts.
This requires an addition to the golden rule: identify the largest primitive field, align the starting address of the whole struct to that field's size, and add padding at the end so that the total size is a multiple of that largest primitive.
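Compilers on common platforms apply this rule automatically. Assuming 1-byte char, 2-byte short, and 4-byte int (typical, but not guaranteed by C), the reordered fields from the alignment example grow from 10 bytes as loose variables to 12 bytes inside a struct, because 2 bytes of tail padding round the size up to a multiple of 4. The struct and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The reordered fields from the alignment example, now inside a struct. */
struct Swapped {
    char  c;   /* offset 0, followed by 3 bytes of padding */
    int   i;   /* offset 4 */
    short s;   /* offset 8, followed by 2 bytes of tail padding */
};             /* largest primitive is int (4 bytes), so size rounds up to 12 */

/* Two back-to-back elements: every element starts on a 4-byte boundary. */
struct Swapped table[2];
```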
Example Conversion 2: Branching
C Code
int x, y; // assume x is in X1, y is in X2
if (x == y)
x++;
else
y++;
Assembly Code
CMP X1, X2
B.EQ if
else:
ADD X2, X2, #1
B end ; unconditionally branch to end
if:
ADD X1, X1, #1 ; the # symbol makes it so that you don't need to specify ADDI
end:
Also Valid Assembly Code
CMP X1, X2
B.NE else
if:
ADD X1, X1, #1
B end
else:
ADD X2, X2, #1
end:
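For comparison, here is the second assembly version written back in C with goto, so each statement lines up with one instruction. This is deliberately unidiomatic C meant only to show the control flow; the function name is hypothetical.

```c
#include <assert.h>

/* Mirrors the second assembly version: branch to else_part when x != y,
   otherwise fall through into the if-part and jump over the else-part. */
void if_else(int *x, int *y) {
    if (*x != *y) goto else_part;   /* CMP X1, X2 ; B.NE else */
    *x += 1;                        /* ADD X1, X1, #1 */
    goto end;                       /* B end */
else_part:
    *y += 1;                        /* ADD X2, X2, #1 */
end:
    return;
}
```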
Example Conversion 3: Loops
C Code
// assume all variables are long long integers (8 bytes, 64 bits)
// i is in X1, start of a is at address 100, and sum is in X2
sum = 0;
for (i = 0; i < 10; i++) {
if (a[i] >= 0) {
sum += a[i];
}
}
Assembly Code:
MOVZ X2, #0 ; set sum to zero
MOVZ X1, #0
start:
CMPI X1, #10 ; compare i to 10
B.GE end
LSL X3, X1, #3 ; i * 8, since each element is eight bytes
LDUR X3, [X3, #100] ; load a[i] from address 100 + i * 8
CMPI X3, #0
B.LT elseif ; if a[i] < 0, skip the add
ADD X2, X2, X3 ; add value to sum
elseif:
ADD X1, X1, #1 ; increment i
B start ; loop back to the condition check
end:
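The same trick applied to the loop: the C below follows the assembly label by label, with a pointer standing in for the array at address 100 (an assumption, since the sketch cannot use a fixed address). The function name is hypothetical.

```c
#include <assert.h>

/* Mirrors the assembly loop label by label; a stands in for address 100. */
long long sum_nonneg(const long long *a) {
    long long sum = 0;            /* MOVZ X2, #0 */
    long long i = 0;              /* MOVZ X1, #0 */
    long long v;                  /* plays the role of X3 */
start:
    if (i >= 10) goto end;        /* CMPI X1, #10 ; B.GE end */
    v = a[i];                     /* LSL X3, X1, #3 ; LDUR X3, [X3, #100] */
    if (v < 0) goto elseif;       /* CMPI X3, #0 ; B.LT elseif */
    sum += v;                     /* ADD X2, X2, X3 */
elseif:
    i += 1;                       /* ADD X1, X1, #1 */
    goto start;                   /* B start */
end:
    return sum;
}
```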
Function Calls
Sometimes a function being called is very far away: farther than a CBZ or CBNZ branch can reach. This is where unconditional branches come into play, letting us jump to distant locations.
Since unconditional branches are, well, unconditional, they don't need to specify a register, which leaves 26 bits for the offset. Because the offset is signed and counts instructions, it reaches about $2^{25}$ (roughly 33 million) instructions in either direction, about 64 million in total. This should be plenty.
There are three types of unconditional branches in the LEGv8 ISA:
- Branch: B
  - Syntax: B #OFFSET → go to PC + 4 * OFFSET (OFFSET counts instructions; the 4 converts it to bytes, since memory is byte addressable)
- Branch to Register: BR
  - Syntax: BR X30 (or any other register) → go to the address stored in that register
- Branch with link: BL
  - This is commonly used in function calls
  - Syntax: BL #OFFSET → store PC + 4 into X30 and go to PC + 4 * OFFSET
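A hedged C model of the three branches, tracking only the PC and the registers. It assumes, as above, that offsets count instructions; the struct and function names are hypothetical. The round trip through BL and BR X30 is the basic shape of a function call and return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal CPU state: the PC plus the 32 registers. */
typedef struct { uint64_t pc; uint64_t x[32]; } Cpu;

/* B #offset: offset is a signed 26-bit instruction count, scaled by 4. */
void branch(Cpu *cpu, int64_t offset) {
    assert(offset >= -(1 << 25) && offset < (1 << 25));
    cpu->pc += 4 * (uint64_t)offset;
}

/* BL #offset: save the next instruction's address in X30, then branch. */
void branch_link(Cpu *cpu, int64_t offset) {
    cpu->x[30] = cpu->pc + 4;
    branch(cpu, offset);
}

/* BR Xn: jump to the address held in register n (BR X30 returns from BL). */
void branch_reg(Cpu *cpu, int n) {
    assert(n >= 0 && n < 32);
    cpu->pc = cpu->x[n];
}
```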
The Call Stack
In many situations, a function is “passed in” more data than fits in the registers set aside for passing arguments. Those values need a temporary place to live while the function runs: the solution is the call stack.
Think of the call stack as a stack of scratch paper: you can add sheets on top and scribble on them, but it is all just temporary. When you’re done with a sheet, you can throw it out, shred it, and eliminate any trace that it was actually there.
Every time we call a function, a stack frame is allocated for that function, and then discarded when that function is complete.
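As a rough model of frame allocation, the sketch below assumes the stack grows toward lower addresses (as it does on ARM) and represents the stack by nothing more than a stack pointer: allocating a frame moves the pointer down, and discarding it moves the pointer back up so the space can be reused by the next call. The frame sizes in the test are made up, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stack model: only the stack pointer is tracked. */
typedef struct { uint64_t sp; } Stack;

/* Allocate a frame on function entry; it occupies [sp, sp + frame_bytes). */
uint64_t push_frame(Stack *s, uint64_t frame_bytes) {
    assert(frame_bytes <= s->sp);
    s->sp -= frame_bytes;
    return s->sp;
}

/* Discard the frame on return; the space becomes free for the next call. */
void pop_frame(Stack *s, uint64_t frame_bytes) {
    s->sp += frame_bytes;
}
```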