A JVM in Rust part 7 - Objects and GC

This post is part of the Writing a JVM in Rust series.

In this post, I discuss how I have modelled objects in RJVM - we’ll see a bit of lower-level code, with some pointer fiddling! 🤓 Again, if you haven’t read the previous parts, some of this discussion might be hard to follow, so use the links above to check them out!

Modelling objects

The initial implementation I did for objects was the simplest possible: the id of the class, and a flat array of Values, one for each field, including those inherited from the superclasses:

#[derive(Clone, PartialEq)]
pub struct ObjectValue<'a> {
    pub class_id: ClassId,
    fields: RefCell<Vec<Value<'a>>>,
}

impl<'a> ObjectValue<'a> {
    pub fn new(class: &Class<'a>) -> Self {
        let fields = (0..class.num_total_fields)
            .map(|index| {
                let field = class.field_at_index(index).unwrap();
                match &field.type_descriptor {
                    FieldType::Base(base_type) => match base_type {
                        BaseType::Byte => Value::Int(0),
                        BaseType::Char => Value::Int(0),
                        BaseType::Double => Value::Double(0f64),
                        BaseType::Float => Value::Float(0f32),
                        BaseType::Int => Value::Int(0),
                        BaseType::Long => Value::Long(0),
                        BaseType::Short => Value::Int(0),
                        BaseType::Boolean => Value::Int(0),
                    },
                    FieldType::Object(_) => Value::Null,
                    FieldType::Array(_) => Value::Null,
                }
            })
            .collect();
        Self {
            class_id: class.id,
            fields: RefCell::new(fields),
        }
    }

    pub fn set_field(&self, index: usize, value: Value<'a>) {
        self.fields.borrow_mut()[index] = value;
    }

    pub fn get_field(&self, index: usize) -> Value<'a> {
        self.fields.borrow()[index].clone()
    }
}

Easy enough. But this implementation had the downside that each object, and its vector of fields, required separate allocations, and I wanted to improve on that to play a bit with the GC. Therefore, I eventually switched to a more complex, lower-level representation.

First, I allocate a large chunk of memory at the beginning of the program. Then, I use a classic bump allocator so that allocating each object and array means just bumping a pointer. All objects are thus contiguous in memory. Each object is preceded by a header, which contains a reference to the class and some other bookkeeping data.

The following are the most interesting parts of the code for the memory allocator:

/// An allocation on our memory chunk
pub struct AllocEntry {
    pub(crate) ptr: *mut u8,
    pub(crate) alloc_size: usize,
}

/// Models an allocated chunk of memory
struct MemoryChunk {
    memory: *mut u8,
    used: usize,
    capacity: usize,
}

impl MemoryChunk {
    fn new(capacity: usize) -> Self {
        let layout = Layout::from_size_align(capacity, 8).unwrap();
        let ptr = unsafe { std::alloc::alloc_zeroed(layout) };

        MemoryChunk {
            memory: ptr,
            capacity,
            used: 0,
        }
    }

    /// Allocates from the chunk, or returns None if there is not enough space
    fn alloc(&mut self, required_size: usize) -> Option<AllocEntry> {
        if self.used + required_size > self.capacity {
            return None;
        }

        let ptr = unsafe { self.memory.add(self.used) };
        self.used += required_size;

        Some(AllocEntry {
            ptr,
            alloc_size: required_size,
        })
    }
}
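
As a quick illustration of the bump allocation (this snippet is just a sketch of mine, not code from RJVM), two consecutive allocations from the same chunk end up adjacent in memory:

// Illustration only, not RJVM code: the second allocation starts
// exactly where the first one ends.
let mut chunk = MemoryChunk::new(1024 * 1024);
let first = chunk.alloc(64).expect("chunk has space");
let second = chunk.alloc(128).expect("chunk has space");
assert_eq!(second.ptr, unsafe { first.ptr.add(64) });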

On top of this basic abstraction, I have built the ObjectAllocator, which is responsible for allocating new instances of objects, and for running the GC when there is no more free memory. This is the part of the code that allocates an object:

pub struct ObjectAllocator<'a> {
    current: MemoryChunk,
    // other fields omitted for brevity
}

impl<'a> ObjectAllocator<'a> {
    /// Allocates a new object, or returns None if the memory is full
    pub fn allocate_object(&mut self, class: &Class<'a>)
      -> Option<AbstractObject<'a>> {
        let size = AbstractObject::size_of_object(class);
        self.current
            .alloc(size)
            .map(|alloc_entry| AbstractObject::new_object(class, alloc_entry))
    }
}

As you can see, it is pretty simple: we compute the size of the object and allocate that much space from the memory chunk. Computing the size is done in AbstractObject:

impl<'a> AbstractObject<'a> {
    pub(crate) fn size_of_object(class: &Class) -> usize {
        let fields_sizes: usize = 8 * class.num_total_fields;
        ALLOC_HEADER_SIZE + OBJECT_HEADER_SIZE + fields_sizes
    }
}

const ALLOC_HEADER_SIZE: usize = align_to_8_bytes(size_of::<AllocHeader>());
const OBJECT_HEADER_SIZE: usize = align_to_8_bytes(size_of::<ObjectHeader>());

/// The first word of any allocated object or array
#[bitfield(u64)]
#[derive(PartialEq, Eq)]
pub(crate) struct AllocHeader {
    #[bits(1)]
    pub(crate) kind: ObjectKind,

    #[bits(1)]
    pub(crate) state: GcState,

    #[bits(30)]
    identity_hash_code: i32,

    #[bits(32)]
    pub(crate) size: usize,
}

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub(crate) enum GcState {
    Unmarked,
    Marked,
}

#[derive(PartialEq, Eq, Clone, Copy, Debug)]
pub enum ObjectKind {
    Object,
    Array,
}

/// The second word of an allocated "classical" object
#[repr(transparent)]
struct ObjectHeader {
    class_id: ClassId,
}

Let’s see this in detail. First, we have 64 bits dedicated to a generic allocation header - which contains a bit to determine whether the allocation refers to an object or to an array, another bit for the garbage collector, some bits for the identity hash code, and then some for the size of the allocation. Then, we have the class id, which is a u32… but we pad it to 64 bits, because CPUs like aligned reads and the code is complicated enough as is. 🙂
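
The align_to_8_bytes helper is not shown above; a minimal version (the actual implementation in RJVM might differ) just rounds a size up to the next multiple of eight, and it has to be a const fn since it is used to compute the header size constants:

// A possible implementation of align_to_8_bytes (mine, possibly different
// from RJVM's): round the given size up to the next multiple of 8.
const fn align_to_8_bytes(required_size: usize) -> usize {
    (required_size + 7) & !7
}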

Incidentally, this header layout was greatly inspired by the JVM, which currently also uses an object header of 128 bits on a 64-bit architecture. However, there’s an ongoing project, called Lilliput, which aims to reduce it to 64 bits - if you are curious, I can recommend this recent talk about it.

For storing the fields’ values, I have decided to simply use 8 bytes for each field - which obviously wastes memory, because an int would fit in 4 bytes, a boolean in 1, etc… but, again, I wanted to keep things relatively simple. In this way, computing a field’s offset is just a matter of some trivial pointer arithmetic:

impl<'a> AbstractObject<'a> {
    pub(crate) unsafe fn ptr_to_field_value(&self, field_index: usize) -> *mut u8 {
        let preceding_fields_size = 8 * field_index;
        let offset = ALLOC_HEADER_SIZE + OBJECT_HEADER_SIZE + preceding_fields_size;
        self.data.add(offset)
    }
}

Then, accessing an object’s fields is done via some simple pattern matching:

unsafe fn write_value(ptr: *mut u8, value: Value) {
    match value {
        Value::Int(int) => std::ptr::write(ptr as *mut i32, int),
        Value::Long(long) => std::ptr::write(ptr as *mut i64, long),
        Value::Float(float) => std::ptr::write(ptr as *mut f32, float),
        Value::Double(double) => std::ptr::write(ptr as *mut f64, double),
        Value::Uninitialized | Value::Null => std::ptr::write(ptr as *mut u64, 0),
        Value::Object(obj) => std::ptr::write(ptr as *mut AbstractObject, obj),
    }
}

// Notice that we do not store the value's type, but we expect the caller to pass
// to us the correct type.
unsafe fn read_value<'a>(ptr: *const u8, field_type: &FieldType) -> Value<'a> {
    match field_type {
        FieldType::Base(BaseType::Boolean)
        | FieldType::Base(BaseType::Byte)
        | FieldType::Base(BaseType::Char)
        | FieldType::Base(BaseType::Short)
        | FieldType::Base(BaseType::Int) =>
            Value::Int(std::ptr::read(ptr as *const i32)),
        FieldType::Base(BaseType::Long) =>
            Value::Long(std::ptr::read(ptr as *const i64)),
        FieldType::Base(BaseType::Float) =>
            Value::Float(std::ptr::read(ptr as *const f32)),
        FieldType::Base(BaseType::Double) =>
            Value::Double(std::ptr::read(ptr as *const f64)),
        FieldType::Object(_) | FieldType::Array(_) =>
            match std::ptr::read(ptr as *const i64) {
                0 => Value::Null,
                _ => Value::Object(std::ptr::read(ptr as *const AbstractObject)),
            },
    }
}
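
Putting the pieces together, reading or writing a field boils down to computing the pointer and delegating to these helpers. Here is a sketch of how this could look (the exact method names and signatures in RJVM may differ):

// Sketch only - the real RJVM methods may be named or structured differently.
impl<'a> AbstractObject<'a> {
    pub fn set_field(&self, field_index: usize, value: Value<'a>) {
        unsafe { write_value(self.ptr_to_field_value(field_index), value) }
    }

    pub fn get_field(&self, field_index: usize, field_type: &FieldType) -> Value<'a> {
        unsafe { read_value(self.ptr_to_field_value(field_index), field_type) }
    }
}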

Low-level code, but (IMHO) it is much nicer to read and write than the equivalent C or C++!

Arrays

Arrays are modelled in a similar way. The second word of the header is slightly different - for various reasons, RJVM needs to keep track of the elements’ type, along with the array’s length:

/// The second word of an allocated array
struct ArrayHeader {
    elements_type: ArrayEntryType,
    length: u32,
}

However, storing entries is done with the same code as for objects - we simply compute the offset of the i-th element and read or write the value.
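
To illustrate, the offset of the i-th element mirrors the field offset computation, with the array header taking the place of the object header. This is a sketch of mine rather than RJVM’s exact code, and it assumes an ARRAY_HEADER_SIZE constant analogous to the ones above:

// Sketch, not RJVM's exact code - assumes an ARRAY_HEADER_SIZE constant
// computed like ALLOC_HEADER_SIZE and OBJECT_HEADER_SIZE above.
const ARRAY_HEADER_SIZE: usize = align_to_8_bytes(size_of::<ArrayHeader>());

impl<'a> AbstractObject<'a> {
    unsafe fn ptr_to_array_element(&self, index: usize) -> *mut u8 {
        let offset = ALLOC_HEADER_SIZE + ARRAY_HEADER_SIZE + 8 * index;
        self.data.add(offset)
    }
}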

Garbage collection

I have discussed the algorithm for the garbage collector I have implemented for RJVM in part 1, so I will not go into more detail here. However, there is one interesting part that I want to discuss - finding the alive objects, i.e. the GC roots.

To do this, RJVM iterates over each call frame and retrieves all the values on the stack and in the local variables. Since there is no threading, this is relatively simple to do - it is much more complicated in a real garbage collector with threading; there are whole books (and careers) dedicated to this topic! The slightly simplified code is as follows:

impl<'a> Vm<'a> {
    pub fn run_garbage_collection(&mut self) -> Result<(), VmError> {
        let mut roots = vec![];
        roots.extend(self.call_stacks.iter_mut().flat_map(|s| s.gc_roots()));

        unsafe {
            self.object_allocator
                .do_garbage_collection(roots, &self.class_manager)?;
        }
        Ok(())
    }
}

impl<'a> CallStack<'a> {
    pub fn gc_roots(&mut self)
        -> impl Iterator<Item = *mut AbstractObject<'a>> {
        let mut roots = vec![];
        roots.extend(
            self.frames
                .iter_mut()
                .flat_map(|frame| frame.as_mut().gc_roots()),
        );
        roots.into_iter()
    }
}

impl<'a> CallFrame<'a> {
    pub fn gc_roots(&mut self)
        -> impl Iterator<Item = *mut AbstractObject<'a>> {
        let mut roots = vec![];
        roots.extend(self.stack.iter_mut().filter_map(|v| match v {
            Value::Object(o) => Some(o as *mut AbstractObject),
            _ => None,
        }));
        roots.extend(self.locals.iter_mut().filter_map(|v| match v {
            Value::Object(o) => Some(o as *mut AbstractObject),
            _ => None,
        }));
        roots.into_iter()
    }
}

However, as some people pointed out in the Hacker News discussion, this approach has a big bug. In particular, if the garbage collector is triggered while executing a native method (i.e. a method whose implementation is in Rust and not in Java), all sorts of things may go wrong! This is because RJVM does not consider the native (i.e. Rust) stack when collecting the roots, so any object referenced only from the native stack might be collected. And, in any case, objects will be moved from one semispace to the other - so the native method will be left with dangling references to the old semispace, which, while not a cause of invalid memory accesses, is still a bug.

Root determination in a GC is a hard problem - my simplified implementation is definitely buggy. But, again, this was just a toy project, so I am still happy - the implementation was not trivial, and while it is not a production-ready GC, building it has been a fantastic learning exercise.

Conclusions

And that’s it for this part - and for the explanation of how RJVM works! 🎉 In the next (and last!) part, I will do a bit of a retrospective about the project and my experience with Rust.

Once more, thanks a lot for reading! 😊