Stack Based Virtual Machines - 5
This post is part of the Stack Based Virtual Machines series.
In the previous part we have added many instructions to our virtual machine, such as jumps and local variables. It’s now time to discuss how we can use these instructions to implement an “if” statement, or a “while” loop.
The complete unit tests implementing the code samples below can be on github.
Implementing an IF Link to heading
Let start by implementing an if
statement in our bytecode. We want to implement this snippet:
if (a > b) {
c = a;
} else {
c = b;
}
We’re going to use the local variables 0
for a
, 1
for b
and 2
for c
. Since our language doesn’t yet do I/O, nor does it have function calls, we’ll hard code the values for a
and b
, which means that our program won’t exactly be the most useful ever written, but that’s ok for the moment.
So, our program will be as follows:
CPU cpu = new CPU(
// Init a with "6"
PUSH, 6,
STORE, 0,
// Init b with "4"
PUSH, 4,
STORE, 1,
// Load a and b into the stack
LOAD, 0, // Stack contains a
LOAD, 1, // Stack contains a, b
ISGT, // Stack contains a > b
JIF, 21,
// This is the "else" path
LOAD, 1, // Stack contains b
STORE, 2, // Set c to the stack head, meaning c = b
JMP, 25,
// This is the "if" path, and this is the address 21
LOAD, 0, // Stack contains a
STORE, 2, // Set c to the stack head, meaning c = a
// Done; this is address 25
HALT
);
Let’s look at it slowly. In the first part, we are simply initializing the values of a
and b
with some hard-coded values. To do this, we have to first PUSH
the value to the stack and then use the STORE
instruction to pop the value and save it in the local variable.
Next, we push the variable’s values back to the stack using two LOAD
instructions. Afterwards, by using the ISGT
instruction, we pop both from the stack and replace them with a boolean representing whether a > b
: this is the boolean condition that we want to test in our if
.
It’s now time to implement the if
. Since our bytecode doesn’t have the concept of blocks, we have to simulate it somehow. We’re doing it by writing both paths of the if (in this case first the else
path and then the if
path, given how our JIF
instruction works) in a row and use some jumps instructions. The code follows this structure:
load the boolean value to test to the stack
JIF after the else code path
[[ else code path ]]
JMP after the if code path
[[ if code path ]]
So, depending on the value of the boolean value, we will execute only one of the two code paths - which is exactly what we want.
In our program, both paths are similar: they load to the stack either a
or b
, and then save the value to the local variable 2
, meaning c
.
The last instruction is the required HALT
, to shutdown cleanly our program.
Intermezzo: some x64 assembler Link to heading
While this looks quite complex, it is actually the standard way to write an if
statement in bytecode and/or assembler - as they generally don’t have blocks but only jump instructions.
To demonstrate it, let’s write the example above in C, compile it and then look at the generated code. This is our test.c
:
int main()
{
int a = 4, b = 6, c;
if (a > b) {
c = a;
} else {
c = b;
}
return 0;
}
When compiled (with a gcc test.c
) and then disassembled (objdump -d a.out
) on my machine, it produces this output:
; skipped code
400508: 8b 45 f4 mov -0xc(%rbp),%eax
40050b: 3b 45 f8 cmp -0x8(%rbp),%eax
40050e: 7e 08 jle 400518 <main+0x22>
400510: 8b 45 f4 mov -0xc(%rbp),%eax
400513: 89 45 fc mov %eax,-0x4(%rbp)
400516: eb 06 jmp 40051e <main+0x28>
400518: 8b 45 f8 mov -0x8(%rbp),%eax
40051b: 89 45 fc mov %eax,-0x4(%rbp)
; here is the return 0 part
40051e: b8 00 00 00 00 mov $0x0,%eax
400523: 5d pop %rbp
400524: c3 retq
400525: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40052c: 00 00 00
40052f: 90 nop
While definitely more complicated to understand, the structure of the code is actually quite similar: there is a cmp
instruction, followed by a jle
(jump if less or equals; basically does the job of a sequence of our ISGT
, NOT
, JIF
in one instruction), then the “if” body, then a jmp
, and then the “else” body.
If you haven’t really understood this part, no worries; this after all is not a series about x64 assembler. Just trust that the way we have designed our virtual machine is actually not that far from from how a real CPU works. :-)
Implementing a WHILE Link to heading
Let’s now implement a simple while loop. We’re going to multiply two numbers without using the MUL
instruction, which is quite strange, but makes for a simple example - I hope. :-)
Since, as you obviously know, a * b = a + a + a... (b times)
, it should be clear that this loop computes a * b
, assuming that b
is non negative:
int total = 0;
while (b >= 1) {
total += a;
--b;
}
Similarly to what we have done before, we’re going to use the variables 0
for a
, 1
for b
and 2
for total
. The code is:
CPU cpu = new CPU(
// Init a with "6"
PUSH, 6,
STORE, 0,
// Init b with "4"
PUSH, 4,
STORE, 1,
// Init total to 0
PUSH, 0,
STORE, 2,
// While part
// Here is address 12
LOAD, 1, // Stack contains b
PUSH, 1, // Stack contains b, 1
ISGE, // Stack contains b >= 1
NOT, // Stack contains b < 1
JIF, 36, // 36 is the address of the HALT label
// Inner loop part
LOAD, 0, // Stack contains a
LOAD, 2, // Stack contains a, total
ADD, // Stack contains a + total
STORE, 2, // Save in total, meaning total = a + total
LOAD, 1, // Stack contains b
PUSH, 1, // Stack contains b, 1
SUB, // Stack contains b - 1
STORE, 1, // Save in b, meaning b = b - 1
JMP, 12, // Go back to the start of the loop
HALT
);
The beginning of the code is the same as before: we simply initialize the three local variables. We could have relied on the fact that an uninitialized variable is automatically set to 0 and avoided the initialization of total
, but being explicit should be a bit clearer.
Afterwards comes the while statement. First we compose on the stack the boolean expression b < 1
using a short sequence of instructions, and then we jump to the end of the program if the condition is true; this is equivalent to saying that we go on in case b < 1
is false, or rather b >= 1
as we have written in the code above.
Next, in the inner part of the loop we increment total
by a
and decrement b
by 1
. Finally we go back to the beginning of the loop and test the value of b
again.
Conclusions Link to heading
I hope you found this enjoyable; personally I’ve always found quite impressive how an if
statement can be mapped in bytecode/assembler with just some jumps instructions.
In the next part, we’ll start talking about function calls.
Update: part 6 is online.