Emjay - implementing function calls
In this post, I am following up on the explanation of my simple JIT compiler glorified calculator Emjay and I will show you how I implemented function calls.
Why the complexity? Link to heading
Imagine we are JIT-compiling the following program:
fn f() {
return 1 + g(2);
}
fn g(x) {
return 2 * x;
}
In the backend, when generating the code for f
, we need to insert a call to g
. But to do that, we need the address of g
which we don’t know when we are generating the code, because we will do the call to mmap
to get the address only in a further step of the pipeline:
How to solve it Link to heading
My original idea was to do a sort of relocation:
- when generating a function call, generate a bunch of
nop
instead of the jump instruction initially; - keep track of a mapping between positions in the assembler and functions being called;
- JIT all functions and create a mapping between function and address;
- go over all the mappings and replace the
nop
with jump instructions with the actual address.
However, a colleague suggested I try to use a trampoline function, which is what I ended up implementing. The idea I have implemented is the following:
- each function gets assigned an id (simply a progressive integer);
- the assembly code generated for a function call will always invoke a certain function defined in the Rust “runtime”, which has a fixed address;
- one of the parameters of the trampoline call will be the id of the function to call;
- the trampoline will rely on a map that associates function ids to their addresses, filled when the JIT compiles all the functions, but before we start executing code. I’ve called this
CompiledFunctionCatalog
, and the trampoline function will receive a pointer to it.
Pretty often trampolines are used to lazily compile functions as necessary; for example, see how Mono does it. However, Emjay compiles all functions eagerly.
An example Link to heading
For the example above, the frontend will assign the id 0
to f
and 1
to g
. The generated IR for the function f
will then be:
fn f - #args: 0, #reg: 4 {
0: mvi @r0, 1
1: mvi @r2, 2
2: call @r1, g:1(r2)
3: add @r3, r0, r1
4: ret r3
}
where the syntax call @r1, g:1(r2)
means “call the function named g
, which has id 1
, passing as argument r2
, and store the result in r1
”. (I know, the syntax for my IR is a bit cryptic).
In the Apple Silicon (aarch64) architecture, the calling conventions say that:
- the first 8 function parameters go in registers
x0
..x7
and the rest go on the stack; - the return value is stored into
x0
; - some registers (
x9
..x15
) are caller-saved, meaning that the callee can modify them freely; - other registers (
x19
..x28
) are callee-saved, meaning that the callee must save and restore them if it uses them.
Here is a great article about calling conventions (in general) and the specific ones for aarch64
if you’re interested.
I have not deviated from the standard function call; arguments go in x0
..x7
, return value is in x0
and I have chosen to use x19
to store the trampoline function address. It’s not uncommon for a trampoline to have a special calling convention though, but having used the standard one allows me to define it very simply as a standard Rust function.
The code in the backend, when generating a function call, uses a pretty “naive” implementation where all the registers that are used in the function get saved, even if they are not actually used at that moment. Continuing the example above, the generated assembler for f
becomes:
# standard function prologue
stp x29, x30, [sp, #-64]!
mov x29, sp
# implementation of mvi @r0, 1
movz x9, 1
# implementation of mvi @r2, 2
movz x10, 2
## call to g: start
# save registers on the stack
str x0, [x29, #24]
str x19, [x29, #32]
str x9, [x29, #40]
str x10, [x29, #48]
str x11, [x29, #56]
# parameters: first the function catalog...
movz x0, 105553154965536
# ...then the called function id (g is 1)
movz x1, 1
# ...and then the actual parameter of g
mov x2, x10
# x19 will contain the trampoline address
movz x19, 4372771476
# perform the actual call!
blr x19
# restore saved registers from the stack
ldr x11, [x29, #56]
ldr x10, [x29, #48]
ldr x9, [x29, #40]
ldr x19, [x29, #32]
# copy return result to x11
mov x11, x0
ldr x0, [x29, #24]
## call to g: done
# implementation of add @r3, r0, r1
add x10, x9, x11
# move the return value in x0
mov x0, x10
# standard function epilogue and return
ldp x29, x30, [sp], #64
ret
All the function call code basically represents:
jit_call_trampoline(function_catalog_ptr, 2)
Yes, assembler is more verbose than high level languages! 😁
The trampoline function is defined as follows in the Rust code:
pub fn jit_call_trampoline(
function_catalog_ptr: *const CompiledFunctionCatalog,
function_index: usize,
a0: i64,
a1: i64,
a2: i64,
a3: i64,
a4: i64,
a5: i64,
) -> i64 {
Notice that the signature always takes 2 + 6 = 8 arguments, because I did not implement spilling to the stack in Emjay’s backend. Therefore, since we need to use two parameters for the function catalog and the callee’s id, only six registers are left; thus functions in Emjay can only take up to six parameters. It is one of its many limitations 🙂.
The actual implementation of the trampoline is very simple, once we remove all the debug print and comments: it simply looks up the callee’s address, given its id, and then invokes it:
let function_catalog = unsafe { &*function_catalog_ptr };
let fun = function_catalog.get_function_pointer(FunctionId(function_index));
fun(a0, a1, a2, a3, a4, a5);
The function_catalog
is a very simple struct:
#[derive(Debug)]
pub struct CompiledFunctionCatalog {
// Indexed by FunctionId, which are dense. Thus, we can use a simple Vec
// and avoid the extra cost of an hash map
addresses: Vec<JitFn>,
}
pub type JitFn = fn(i64, i64, i64, i64, i64, i64) -> i64;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FunctionId(pub usize);
The catalogue is created before the JIT creates the function pointers, so it has a fixed address. When the JIT compiles all the functions, it will fill the catalogue with all the functions’ addresses. When the main function’s execution starts the catalogue will be fully populated. So, our trampoline will work correctly.
Conclusions Link to heading
I have chosen to use a trampoline to handle function calls in my JIT Emjay. It has been pretty simple to implement, and it is a useful technique with a lot of applications. My implementation is quite basic and limited, as is the rest of Emjay, but it was still a very useful learning exercise - and it was an elegant and simple solution to handle relocation logic. If you’re interested in the complete implementation, check it out on GitHub.
I hope you have found this post interesting! ☺️