A JVM in Rust part 6 - Methods and exceptions

Published Sunday, Sep 24, 2023 - 1943 words, 10 minutes

This post is part of the Writing a JVM in Rust series.

In this post, I will continue the discussion about how RJVM executes Java bytecode instructions. If you haven’t read the previous parts, most of this discussion will be hard to follow, so use the links above to check them out!

Invoking methods

As I have discussed in part 4, the JVM has various instructions for invoking methods. As a recap, there are:

invokestatic to invoke a static method;
invokevirtual and invokeinterface to execute standard methods, while performing virtual method resolution;
invokespecial to execute a standard method while not performing virtual method resolution.

I will not cover invokedynamic since I have not implemented it in RJVM.

The gist of all these instructions is:

determine the actual method to invoke, performing virtual method resolution if necessary;
find the receiver (i.e. the object on which the method is invoked);
collect the arguments;
allocate a new call frame;
start executing the new method.

As discussed in part 5, in RJVM I have also used the native stack of Rust to execute methods.

Let us see some code! We start with the following enum:

/// One of the possible kinds of method invocation in the JVM.
#[derive(Clone, Copy)]
enum InvokeKind {
    /// Special instance methods include constructors and calls
    /// to methods of this class, bypassing virtual function resolution
    Special,
    /// Static methods do not take a receiver object
    Static,
    /// Virtual instance methods will apply the virtual function resolution
    Virtual,
    /// Invocation of an interface's method. Will apply the
    /// virtual function resolution
    Interface,
}

In the big switch on the instruction bytecode, we see that all the opcodes have a similar implementation:

Instruction::Invokespecial(constant_index) => {
    self.invoke_method(vm, call_stack, constant_index, InvokeKind::Special)?
}

Where CallFrame::invoke_method is as follows (I am omitting an ugly hack here; you can look it up on GitHub if you want 😅):

fn invoke_method(
    &mut self,
    vm: &mut Vm<'a>,
    call_stack: &mut CallStack<'a>,
    constant_index: u16,
    kind: InvokeKind,
) -> Result<(), MethodCallFailed<'a>> {
    let method_reference = self.get_constant_method_reference(constant_index)?;
    let static_method_reference =
        self.get_method_to_invoke_statically(
            vm, call_stack, method_reference, kind
        )?;

    let (receiver, params, new_stack_len) =
        self.get_method_receiver_and_params(&static_method_reference)?;
    let class_and_method = match kind {
        InvokeKind::Virtual | InvokeKind::Interface => {
            Self::resolve_virtual_method(
                vm, receiver.clone(), static_method_reference
            )?
        }
        _ => static_method_reference,
    };

    self.stack.truncate(new_stack_len)?;
    let method_return_type = class_and_method.return_type();
    let result = vm.invoke(call_stack, class_and_method, receiver, params)?;
    Self::validate_type_opt(vm, method_return_type, &result)?;

    if let Some(value) = result {
        self.push(value)?;
    }
    Ok(())
}

Let’s go through this code with an example. Suppose we have something like this:

class Base {
    void foo() {}
}

class Derived extends Base {
    @Override void foo() {}
}

class Main {
    void test(Base a) {
        a.foo();
    }
}

We can see using javap that the bytecode for Main::test is as follows:

void test(Base);
  Code:
    0: aload_1
    1: invokevirtual #7  // Method Base.foo:()V
    4: return

The instruction invokevirtual refers to the constant number 7, which is shown as a comment to be Base.foo. Notice that the bytecode also includes the method signature, in this case ()V (i.e. a void method taking no arguments). The signature is essential in retrieving the correct method to invoke, since Java supports method overloading, and thus the name alone is not enough to identify the actual function to invoke.
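To make the point about overloading concrete, here is a minimal sketch of a lookup that uses both the name and the descriptor. The `Method` struct and the `find_method` and `example_methods` functions are hypothetical simplifications for illustration, not RJVM's actual types:

```rust
/// Hypothetical, simplified method entry: just a name and a type descriptor.
#[derive(Debug, PartialEq)]
struct Method {
    name: &'static str,
    type_descriptor: &'static str,
}

/// Finds a method by name *and* descriptor; the name alone may be ambiguous.
fn find_method<'m>(
    methods: &'m [Method],
    name: &str,
    type_descriptor: &str,
) -> Option<&'m Method> {
    methods
        .iter()
        .find(|m| m.name == name && m.type_descriptor == type_descriptor)
}

/// Two overloads of `foo`, as `void foo()` and `void foo(int)` would be described.
fn example_methods() -> Vec<Method> {
    vec![
        Method { name: "foo", type_descriptor: "()V" },
        Method { name: "foo", type_descriptor: "(I)V" },
    ]
}
```

With these two overloads, looking up `foo` with descriptor `(I)V` selects the second entry, while a lookup by name only could not tell them apart.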

Therefore, the first step of CallFrame::invoke_method is to retrieve a reference to the method to invoke, without performing virtual method resolution:

let method_reference = self.get_constant_method_reference(constant_index)?;
let static_method_reference =
    self.get_method_to_invoke_statically(vm, call_stack, method_reference, kind)?;

Next, the method receiver (i.e. the object on which the method is invoked) is popped from the stack, and so are all the method arguments. In case of a static method, we do not pop the receiver from the stack, since static methods do not have a this. Otherwise, the value of this is always passed in the first local variable slot of the new call frame.

let (receiver, params, new_stack_len) =
    self.get_method_receiver_and_params(&static_method_reference)?;

Notice how this method returns several pieces of information at once:

the receiver, wrapped in an Option;
a vector with the parameters;
and the new stack length after popping the receiver and the arguments.
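The body of get_method_receiver_and_params is not shown in this post, so here is a rough sketch of the idea under simplifying assumptions: the `Value` enum and the `receiver_and_params` function are hypothetical stand-ins, every argument is assumed to take exactly one stack slot, and stack underflow is reported with a plain `None` rather than a proper error.

```rust
/// Hypothetical, simplified value type; RJVM's real `Value` is richer.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Object(u32), // stand-in for an object reference
}

/// Splits the top of the operand stack into receiver and arguments,
/// without modifying the stack. Returns `None` on stack underflow.
fn receiver_and_params(
    stack: &[Value],
    num_params: usize,
    is_static: bool,
) -> Option<(Option<Value>, Vec<Value>, usize)> {
    // Arguments are on top of the stack, in order; the receiver sits below them.
    let params_start = stack.len().checked_sub(num_params)?;
    let params = stack[params_start..].to_vec();
    if is_static {
        // Static methods have no receiver: only the arguments are consumed.
        Some((None, params, params_start))
    } else {
        let receiver_index = params_start.checked_sub(1)?;
        Some((Some(stack[receiver_index].clone()), params, receiver_index))
    }
}
```

Returning the new stack length instead of popping right away lets the caller truncate the stack in a single step later on, which matches the shape of the tuple described above.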

The last step before actually invoking the method is to perform virtual method resolution, if necessary:

let class_and_method = match kind {
    InvokeKind::Virtual | InvokeKind::Interface => {
        Self::resolve_virtual_method(vm, receiver.clone(), static_method_reference)?
    }
    _ => static_method_reference,
};

Notice how we do not do virtual method resolution for invokestatic or invokespecial.

Finally, we can invoke the method and, if it is not void, push its result on the stack:

self.stack.truncate(new_stack_len)?;
let method_return_type = class_and_method.return_type();
let result = vm.invoke(call_stack, class_and_method, receiver, params)?;
Self::validate_type_opt(vm, method_return_type, &result)?;

if let Some(value) = result {
    self.push(value)?;
}
Ok(())

Notice how I have liberally used Rust's ? operator to return early in case of an error during any of these steps. I have said it before, but I’ll say it again: it is a fantastic feature of the language.

Another detail to point out is that my implementation is very basic. It performs linear searches every single time we invoke a function. Obviously, this is not how a real JVM (or any interpreter, for that matter) works: there are various optimization strategies to avoid repeating the lookup every time.
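To give an idea of the simplest such strategy, here is a sketch of a cache in front of the slow lookup. This is not something RJVM does: `MethodCache`, `MethodKey`, and `get_or_resolve` are hypothetical names, and a `usize` stands in for the resolved method. Production JVMs rely on more sophisticated techniques, such as virtual method tables and inline caches.

```rust
use std::collections::HashMap;

/// Hypothetical key identifying a method: (class name, method name, descriptor).
type MethodKey = (String, String, String);

/// Caches the result of an expensive lookup; `usize` stands in for a resolved method.
#[derive(Default)]
struct MethodCache {
    resolved: HashMap<MethodKey, usize>,
    /// Counts how many times the slow path actually ran.
    slow_lookups: usize,
}

impl MethodCache {
    /// Returns the cached entry, running `slow_lookup` only on a miss.
    fn get_or_resolve(
        &mut self,
        key: MethodKey,
        slow_lookup: impl FnOnce(&MethodKey) -> usize,
    ) -> usize {
        if let Some(&found) = self.resolved.get(&key) {
            return found;
        }
        self.slow_lookups += 1;
        let found = slow_lookup(&key);
        self.resolved.insert(key, found);
        found
    }
}
```

For virtual calls the key would have to use the class of the receiver rather than the class named in the bytecode, otherwise every subclass would share the same cached answer.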

Resolving methods statically

To resolve a method, we simply look it up in the referred class:

fn get_method_to_invoke_statically(
    &self,
    vm: &mut Vm<'a>,
    call_stack: &mut CallStack<'a>,
    method_reference: MethodReference,
    kind: InvokeKind,
) -> Result<ClassAndMethod<'a>, MethodCallFailed<'a>> {
    let class = vm.get_or_resolve_class(call_stack, method_reference.class_name)?;
    match kind {
        InvokeKind::Special | InvokeKind::Static => {
            Self::get_method_of_class(class, method_reference)
                .map(|method| ClassAndMethod { class, method })
        }
        InvokeKind::Virtual | InvokeKind::Interface => {
            Self::get_method_checking_superclasses(class, method_reference)
        }
    }
}

In case of virtual methods, the bytecode might contain things like Derived::method, even when method is actually defined on a superclass. This does not happen for invokestatic or invokespecial, though, so we can do the lookup only on the concrete class, rather than checking its superclasses (and implemented interfaces) as well.

Virtual method resolution

In the example above, when we execute Main::test we might end up invoking Base::foo or Derived::foo, depending on the actual type of the object passed as argument. Determining which version to invoke is done by CallFrame::resolve_virtual_method:

fn resolve_virtual_method(
    vm: &Vm<'a>,
    receiver: Option<AbstractObject>,
    class_and_method: ClassAndMethod,
) -> Result<ClassAndMethod<'a>, MethodCallFailed<'a>> {
    match receiver {
        Some(receiver) if receiver.kind() == ObjectKind::Object => {
            let receiver_class = vm.find_class_by_id(receiver.class_id()).ok_or(
                VmError::ClassNotFoundException(receiver.class_id().to_string()),
            )?;
            let resolved_method = Self::get_method_checking_superclasses(
                receiver_class,
                MethodReference {
                    class_name: &class_and_method.class.name,
                    method_name: &class_and_method.method.name,
                    type_descriptor: &class_and_method.method.type_descriptor,
                },
            )?;
            Ok(resolved_method)
        }
        _ => Err(MethodCallFailed::InternalError(
            VmError::ValidationException,
        )),
    }
}

The idea is to do a lookup of the method starting with the actual class of the receiver object. The static resolution that we saw earlier used the class defined in the bytecode, i.e. Base in our example above. But here we do the lookup once again, starting with the class of the actual object we have on the stack and going up the superclass hierarchy, until we find the “most overridden” version of the method.
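The walk up the hierarchy can be sketched as follows. This is a simplified model, not RJVM's get_method_checking_superclasses: the `Class` struct, the name-keyed map, and the `resolve_virtual` and `example_classes` functions are assumptions made for illustration, and interfaces are ignored entirely.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified class: an optional superclass and its own methods.
struct Class {
    superclass: Option<&'static str>,
    methods: Vec<(&'static str, &'static str)>, // (name, descriptor)
}

/// Walks from `start` up the superclass chain and returns the name of the
/// first class that declares the method.
fn resolve_virtual<'c>(
    classes: &'c HashMap<&'static str, Class>,
    start: &str,
    name: &str,
    descriptor: &str,
) -> Option<&'c str> {
    let mut current = classes.get_key_value(start);
    while let Some((class_name, class)) = current {
        if class.methods.iter().any(|(n, d)| *n == name && *d == descriptor) {
            return Some(*class_name);
        }
        // Not declared here: continue with the superclass, if there is one.
        current = class.superclass.and_then(|s| classes.get_key_value(s));
    }
    None
}

/// The `Base` / `Derived` hierarchy from the example, plus a hypothetical
/// subclass `Other` that does not override `foo`.
fn example_classes() -> HashMap<&'static str, Class> {
    HashMap::from([
        ("Base", Class { superclass: None, methods: vec![("foo", "()V")] }),
        ("Derived", Class { superclass: Some("Base"), methods: vec![("foo", "()V")] }),
        ("Other", Class { superclass: Some("Base"), methods: vec![] }),
    ])
}
```

Starting from Derived, the search stops immediately at Derived::foo; starting from a subclass that does not override foo, it falls back to Base::foo.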

Exceptions

I have already given a high level view of how exceptions work in RJVM in part 1, but let us see some more details. The JVM has an instruction athrow that will pop the exception from the stack and start the process of dispatching the exception to the handler, but it has no catch instruction. Rather, the catch blocks in the source code are implemented via exception tables.

You might recall that, in part 2, I mentioned that the code attribute of a method contains an exception table. This table has the following structure:

{   u2 start_pc;
    u2 end_pc;
    u2 handler_pc;
    u2 catch_type;
} exception_table[exception_table_length];

Basically it is a list of entries, each containing:

the range of bytecode addresses to which it applies (start is inclusive, end is not),
the bytecode address of the handler,
and the type of the caught class, referenced by a constant in the constant pool. It can be zero to indicate a “catch all classes”, which is used by the Java compiler to implement finally.

This table is global for the whole method, even if a given function contains multiple try/catch blocks.

In RJVM, I have modelled this via an explicit ExceptionTable struct, referenced by the ClassFileMethodCode:

pub struct ClassFileMethodCode {
  // ...
  pub exception_table: ExceptionTable,
}

/// Exception table of a method's code
#[derive(Debug, Default, PartialEq)]
pub struct ExceptionTable {
    entries: Vec<ExceptionTableEntry>,
}

/// Entries of the exception table
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct ExceptionTableEntry {
    /// The range of program counters that this entry covers
    pub range: Range<ProgramCounter>,
    /// The address of the handler of this entry
    pub handler_pc: ProgramCounter,
    /// The class or superclass that matches this entry
    pub catch_class: Option<String>,
}
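The lookup over these entries is not shown in this post, but given the half-open ranges it could be as simple as a filter. The following is a sketch under stated assumptions, not RJVM's code: `Entry` is a simplified stand-in that uses plain `u16` program counters, and the free function `lookup` and the table in `example_table` are made up for illustration.

```rust
use std::ops::Range;

/// Simplified entry: plain `u16` program counters instead of `ProgramCounter`.
#[derive(Debug, PartialEq, Clone)]
struct Entry {
    range: Range<u16>,
    handler_pc: u16,
    catch_class: Option<String>,
}

/// Returns the entries whose range covers `pc`, preserving table order.
/// `Range::contains` is inclusive at the start and exclusive at the end,
/// which matches the start_pc / end_pc semantics.
fn lookup(entries: &[Entry], pc: u16) -> Vec<&Entry> {
    entries.iter().filter(|e| e.range.contains(&pc)).collect()
}

/// A made-up table in the spirit of a try / catch / finally block.
fn example_table() -> Vec<Entry> {
    vec![
        Entry { range: 0..8, handler_pc: 11, catch_class: Some("java/io/IOException".to_string()) },
        Entry { range: 0..8, handler_pc: 20, catch_class: None },
        Entry { range: 11..17, handler_pc: 20, catch_class: None },
    ]
}
```

Keeping the table order matters, because the first matching entry wins when looking for a handler.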

To model the fact that an exception has been thrown, I have decided to use the following type as the possible returned failure of the execution of an instruction:

/// Models the fact that a method execution has failed
#[derive(Debug, PartialEq)]
pub enum MethodCallFailed<'a> {
    InternalError(VmError),
    ExceptionThrown(JavaException<'a>),
}

/// Newtype that wraps a java exception
#[derive(Debug, PartialEq)]
pub struct JavaException<'a>(pub AbstractObject<'a>);

and this is the code that handles it:

let instruction_result = self.execute_instruction(vm, call_stack, instruction);
match instruction_result {
    Ok(ReturnFromMethod(return_value)) => return Ok(return_value),
    Ok(ContinueMethodExecution) => {}

    Err(MethodCallFailed::InternalError(err)) => {
        return Err(MethodCallFailed::InternalError(err))
    }

    Err(MethodCallFailed::ExceptionThrown(exception)) => {
        let exception_handler = self.find_exception_handler(
            vm,
            call_stack,
            executed_instruction_pc,
            &exception,
        );
        match exception_handler {
            Err(err) => return Err(err),
            Ok(None) => {
                // Bubble exception up to the caller
                return Err(MethodCallFailed::ExceptionThrown(exception));
            }
            Ok(Some(catch_handler_pc)) => {
                // Re-push exception on the stack and continue execution
                // of this method from the catch handler
                self.stack.push(Value::Object(exception.0))?;
                self.pc = catch_handler_pc;
            }
        }
    }
}

Therefore, if executing the current instruction has resulted in an exception being thrown, we check if the current method has a handler for it in its exception table. If a matching handler is found, we push the exception on the stack and we resume execution from the handler’s address. Otherwise, we simply do an early return from the method, propagating the same error.

Thus, regardless of whether the exception has been thrown directly by the current method (if the instruction was an athrow) or by another method invoked by the current one (if the instruction was one of the invokeXXX family), both situations lead to the same result: the execution of the instruction will return an Err(ExceptionThrown) and the exception will either be handled by the current method or propagated to the caller. Just as it should be. 😊
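Stripped of all the JVM details, the propagation boils down to the following toy model. The `Failed` enum mirrors the two variants of MethodCallFailed, but everything else here is hypothetical: the three functions stand in for call frames, and a class name stands in for the exception object.

```rust
/// Simplified failure type, mirroring the two variants of `MethodCallFailed`.
#[derive(Debug, PartialEq)]
enum Failed {
    InternalError(String),
    ExceptionThrown(String), // the exception's class name stands in for the object
}

/// A callee that throws, as an `athrow` would.
fn callee() -> Result<i32, Failed> {
    Err(Failed::ExceptionThrown("java/lang/IllegalStateException".to_string()))
}

/// A frame with no matching handler: the exception bubbles up unchanged.
fn frame_without_handler() -> Result<i32, Failed> {
    let value = callee()?;
    Ok(value + 1)
}

/// A frame with a handler: the exception is caught and execution continues.
fn frame_with_handler() -> Result<i32, Failed> {
    match frame_without_handler() {
        // Stands in for "jump to the catch block".
        Err(Failed::ExceptionThrown(_)) => Ok(-1),
        // Success and internal errors are passed through untouched.
        other => other,
    }
}
```

The native Rust stack unwinds one `Err` return at a time, which is what lets each frame inspect its own exception table before giving up.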

Finding the exception handler

Finding a matching exception handler is just a question of checking all of them, in order, and stopping at the first one that matches, i.e. whose catch class is the actual exception class or one of its superclasses, since obviously a catch (Exception e) will handle all subclasses of Exception:

fn find_exception_handler(
    &self,
    vm: &mut Vm<'a>,
    call_stack: &mut CallStack<'a>,
    executed_instruction_pc: ProgramCounter,
    exception: &JavaException<'a>,
) -> Result<Option<ProgramCounter>, MethodCallFailed<'a>> {
    let exception_table = &self
        .class_and_method
        .method
        .code
        .as_ref()
        .unwrap()
        .exception_table;

    // We shouldn't use self.pc, since we have already incremented it!
    let catch_handlers = exception_table.lookup(executed_instruction_pc);

    // Stop at the first matching catch handler.
    // We expect to have very few for a given instruction, in real code!
    for catch_handler in catch_handlers {
        match &catch_handler.catch_class {
            None => return Ok(Some(catch_handler.handler_pc)),
            Some(class_name) => {
                let catch_class = vm.get_or_resolve_class(call_stack, class_name)?;
                let exception_class = vm.get_class_by_id(exception.0.class_id())?;
                if exception_class.is_subclass_of(catch_class) {
                    return Ok(Some(catch_handler.handler_pc));
                }
            }
        }
    }
    Ok(None)
}

Once again, notice how we express with a combination of the Result and Option types the fact that this search can find a match, that it can find no matches, or that an unexpected error can happen.
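The subclass check itself is not shown above. Here is a minimal sketch of how it and the "first match wins" rule could work together, assuming a hypothetical `Superclasses` map from class name to superclass name; `is_subclass_of`, `first_matching_handler`, and `example_superclasses` are illustrative functions, not RJVM's API.

```rust
use std::collections::HashMap;

/// Hypothetical class table: maps each class name to its superclass, if any.
type Superclasses = HashMap<&'static str, Option<&'static str>>;

/// True if `class` is `ancestor` or inherits from it.
fn is_subclass_of(superclasses: &Superclasses, class: &str, ancestor: &str) -> bool {
    let mut current = Some(class);
    while let Some(name) = current {
        if name == ancestor {
            return true;
        }
        current = superclasses.get(name).copied().flatten();
    }
    false
}

/// Returns the address of the first handler that matches, in table order.
/// Each handler is (catch class, handler pc); `None` means "catch everything".
fn first_matching_handler(
    superclasses: &Superclasses,
    handlers: &[(Option<&str>, u16)],
    exception_class: &str,
) -> Option<u16> {
    handlers
        .iter()
        .find(|(catch_class, _)| match catch_class {
            None => true,
            Some(name) => is_subclass_of(superclasses, exception_class, name),
        })
        .map(|(_, handler_pc)| *handler_pc)
}

/// A small slice of the standard exception hierarchy.
fn example_superclasses() -> Superclasses {
    HashMap::from([
        ("java/lang/Throwable", None),
        ("java/lang/Exception", Some("java/lang/Throwable")),
        ("java/io/IOException", Some("java/lang/Exception")),
    ])
}
```

Since the first match wins, a more specific catch has to appear in the table before a more general one for it to ever be selected.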

Conclusions

We are nearing the end of this (long) series! 🎉 The next part will cover how objects and arrays are implemented in RJVM and some more details about the garbage collector that weren’t covered in part 1.

Once again, thanks a lot for reading! 😊