In this series we’re going to delve a bit into stack based virtual machines. First we’re going to see an overview, then we’ll build our own toy VM. Next, we’re going to see how what we’ll build maps to a real CPU. Finally, we’ll discuss the most famous of all: the JVM.
What’s a stack based virtual machine then? It’s an abstraction of a computer, that emulates a real machine. Generally it is built as an interpreter of a special bytecode, that translates its in real time for execution on the CPU.
Let’s start with a trivial example: suppose your program needs to add two numbers. To do that in a stack VM, the program will push the first number to the stack, push the second and then execute some form of the special instruction add, that will pop the first two elements of the stack and replace them with their sum. Let’s see it step by step:
SP is the stack pointer, which refers to the head of the stack. The
IP is the instruction pointer, which points to the address of the current instruction to execute. Let’s now execute the first instruction:
You can see that the stack now contains
1 and that the instruction pointer has been moved to the next instruction. Let’s now simulate the second instruction:
Finally we can execute the
The head of the stack and the previous element have been popped and replaced with their sum.
As you have seen from the example, the main two data structures for a stack VM are the code listing, with the instruction pointer, and the stack data, which is accessed only via the stack pointer. While these two data structures seem trivial, they are more than enough to implement a lot of complex programs. By adding some external memory area (what is generally known as “memory on the heap”), this structure will become complex enough to form the basis of a real language such as Java or Scala.
The stack will be the central structure. From what we hve seen above you should have some idea on how to use it to implement the basic arithmetic, but it is also going to be the basis of implementing loops and conditional execution (if) and even function calls! We’re going to discuss all of this while building our toy VM.
Closely related to stack based VM are register based virtual machines. They are also interpreters of bytecode, but their design is quite different, since they don’t use the stack for the operands but rather a set of registers. While they tend to be more complex, they are also generally faster at runtime, since they map much more closely to the CPU (which, as we’ll see later, is actually an hardware register machine) and thus they tend to generate and execute better efficient code.
However stack based virtual machines are not just a learning toy. The most successful VM ever, the Java Virtual Machine, is a stack-based virtual machine (and so is the CLR, the basis of .NET). Furthermore the JVM is extremely high performant, while still quite simple - although that has been achieved more by the immense amount of money that has flown into its development than by some special characteristic of its architecture.
I hope this short introduction has stimulated your curiosity. Here are a few links in case you wish to start reading something more:
About the JVM:
- JVM bytecode introduction
- JVM bytecode instructions
- Introduction to ASM, a JVM bytecode manipulation library
Update: part two is online.