
Arena Allocated Caches in Rust and Zig

I really enjoy using Rust, whether it’s for web services with async, CLI tools, or building libraries. I find it both fun to use and very pragmatic for anything where I want high confidence in both reliability and performance. More recently I have been spending a lot of time learning and using Zig. I’m finding it excellent as well, though more as a “better C”: Zig makes direct control easy and ergonomic while still providing some of the niceties of a modern language, like exhaustive switching on tagged unions, better error handling, and defer blocks.

On this journey of building and learning with Zig, different coding patterns arise for optimizations, and I like to think about what those patterns might look like in Rust. Zig makes many of these patterns direct and easy to implement. Coming from Rust, achieving the same control often requires a lot more work: dabbling in unsafe or pulling in various crates that provide safe wrappers can be unwieldy. This is a look at a coding pattern for memory allocation that hits a few significant challenges in Rust: direct allocator control, self-referential structs, and Send/Sync trait restrictions. I’ll show that by composing a few crates with a smidgen of unsafe, we arrive at a fairly elegant solution.

The Problem

Motivation

In Rust web services that process many requests per second and cache data, allocations and deallocations are often one of the first things to optimize. The default owning types like String, Vec, and HashMap lead to many small, differently sized allocations: if you have a struct with five String fields, that is five separate dynamically sized allocations, and each has to go through the global allocator to alloc when created and dealloc on drop.

These allocations put heavy pressure on the global allocator, and when compared to the other work being done, such as processing network requests, reading strings, or doing math operations, they can dominate the workload. This allocation spam can also cause significant blocking within a thread when a large object graph containing thousands of differently sized small allocations is dropped and the memory is deallocated for every allocation all at once.
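To make the allocation pressure concrete, here is a small std-only sketch that counts global allocations with a wrapper allocator (the `CountingAlloc` and `Record` types are illustrative, not from the cache code): constructing one value with five String fields performs five separate heap allocations.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A global allocator wrapper that counts every allocation it hands out.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Five owning String fields -> five separate dynamically sized allocations.
struct Record {
    a: String,
    b: String,
    c: String,
    d: String,
    e: String,
}

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    let record = Record {
        a: "alpha".into(),
        b: "bravo".into(),
        c: "charlie".into(),
        d: "delta".into(),
        e: "echo".into(),
    };
    let after = ALLOCS.load(Ordering::Relaxed);
    println!("{}", after - before); // prints 5: one allocation per String field
    drop(record); // and five matching deallocs happen here
}
```

The same counter makes the dealloc side visible too: dropping a large graph of such records triggers one dealloc per field, all at once.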

I mention “differently sized allocations” explicitly because modern allocators like mimalloc and jemalloc can handle lots of repeated same-sized allocs/deallocs very efficiently (although there can still be benefits to not going through them at all).

Notably, this is handled elegantly in Zig, where one of the core patterns is to pass allocators explicitly to any function that needs to allocate. This makes arena-based allocation ergonomic (for better cache locality and to avoid thrashing the global allocator), and more generally it lets you avoid allocations entirely by architecting your code to keep memory better scoped through explicit allocator passing.

The Pattern

  • We have an entry of data that is written once and read many times (effectively a cache)
  • The entry has lots of heap-allocated and nested dynamically sized types
  • Data structures in the entry are dynamically sized and used for indexing (hash maps and other pivoting storage)
  • The entry should be easily shared between threads as read-only after creation
  • The entry should be fast to create and destroy
  • Atomic reference counting is used for coordinated cleanup as the entry is shared across threads
  • Bonus: it’d be nice to be able to share substructures that are also reference counted, enabling passing references directly to relevant data within the entry

This type of pattern is useful for short-lived rolling caches where you want to cache state associated with events that have recently occurred, but only for some sliding window, with the entries eventually being cleared from the cache.
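As a sketch of that surrounding usage (the `Entry` and `RollingCache` names here are hypothetical, not from the example code): a rolling cache can keep entries keyed by slot behind Arcs and evict the oldest slots as new ones arrive; a reader that still holds an Arc keeps an evicted entry alive until it finishes.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

// Hypothetical stand-in for the read-only cache entry described above.
struct Entry {
    slot: u64,
}

// Minimal sliding-window cache: keeps the newest `capacity` slots. Evicting an
// entry just drops the cache's Arc; the entry is actually freed only once the
// last reader's Arc is dropped.
struct RollingCache {
    capacity: usize,
    order: VecDeque<u64>,
    entries: HashMap<u64, Arc<Entry>>,
}

impl RollingCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    fn insert(&mut self, entry: Arc<Entry>) {
        self.order.push_back(entry.slot);
        self.entries.insert(entry.slot, entry);
        while self.order.len() > self.capacity {
            let oldest = self.order.pop_front().unwrap();
            self.entries.remove(&oldest);
        }
    }

    fn get(&self, slot: u64) -> Option<Arc<Entry>> {
        self.entries.get(&slot).cloned()
    }
}

fn main() {
    let mut cache = RollingCache::new(2);
    let held = Arc::new(Entry { slot: 1 });
    cache.insert(held.clone());
    cache.insert(Arc::new(Entry { slot: 2 }));
    cache.insert(Arc::new(Entry { slot: 3 })); // slides the window past slot 1
    assert!(cache.get(1).is_none());
    assert!(cache.get(3).is_some());
    assert_eq!(held.slot, 1); // a held Arc keeps the evicted entry alive
    println!("ok");
}
```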

Here are the example structures we’ll use for demonstrating this pattern:

const Account = struct {
    owner: []u8,
    data: []u8,
    index: u64,
};

const SelectedAccount = struct {
    id: []const u8,
    account: Account,
    source: *Accounts,
};

const Accounts = struct {
    slot: u64,
    // Map from account ids to accounts
    map: std.StringHashMapUnmanaged(Account),
};

The specific names and fields don’t matter much. The structures are intended to demonstrate some nested dynamically sized types (HashMap and u8 slices).

  • Accounts is the “entry” in the pattern; it holds all the accounts associated with a slot number.
  • Account represents a single account, and notably this type is not restricted to just being used as a value in the Accounts map.
  • SelectedAccount represents a “projected borrow” of a specific account from the Accounts map along with the reference-counted handle back to the owning Accounts entry.

I’m going to walk through what this looks like in Zig, then I’ll dive into one approach for tackling it in Rust. There are some concepts I won’t dive too deeply into, but I will provide references for further reading. The repository with all the code examples is available here.

Building in Zig

In Zig with the previously defined structures, we can easily allocate everything in an arena (modifying Accounts to contain the arena):

const Accounts = struct {
    arena: std.heap.ArenaAllocator,
    slot: u64,
    map: std.StringHashMapUnmanaged(Account),

    pub fn init(backing_allocator: std.mem.Allocator, slot: u64) !*Accounts {
        const self = try backing_allocator.create(Accounts);
        self.* = .{
            .arena = std.heap.ArenaAllocator.init(backing_allocator),
            .slot = slot,
            .map = .{},
        };
        return self;
    }

    pub fn insert(
        self: *Accounts,
        id: []const u8,
        owner: []const u8,
        index: u64,
        data: []const u8,
    ) !void {
        const alloc = self.arena.allocator();
        try self.map.put(alloc, try alloc.dupe(u8, id), .{
            .owner = try alloc.dupe(u8, owner),
            .data = try alloc.dupe(u8, data),
            .index = index,
        });
    }

    pub fn get(self: *Accounts, id: []const u8) ?Account {
        return self.map.get(id);
    }

    pub fn deinit(self: *Accounts) void {
        const backing = self.arena.child_allocator;
        self.arena.deinit(); // frees everything at once
        backing.destroy(self);
    }
};

// Usage:
const accounts = try Accounts.init(allocator, 42);
defer accounts.deinit();

try accounts.insert("acct-rusty", "Rusty-03", 7, &.{ 1, 2, 3, 4 });
try accounts.insert("acct-sparky", "Sparky-11", 11, &.{ 0, 3, 6, 9, 12 });

This now gives us an efficient way to build an entry of Accounts. Using an arena allocator we avoid fragmentation and make it fast to allocate and free everything at once. In a real scenario these accounts would likely be loaded from a file, database, or network request, and there would be other considerations for how to efficiently read and load the data; I am avoiding those details for this example.

The main idea with arena allocators is to allocate chunks of memory for which everything can be allocated and freed together. In this example the id strings, owner strings, data slices, and the internal structure of the HashMap are all allocated in shared chunks. This means even if there are thousands of accounts we only need to allocate a few chunks from the main/global allocator, and when we need to free everything there are only a few deallocs to perform.1

Notice with Zig we didn’t have to change our types much:

  • HashMap: uses the explicitly passed-in arena allocator for its internal structure
  • Account: arena provides allocation for slices to be copied into

In another context we could use an Account with a different allocator, mutate its contents, etc. This is helpful to avoid needing to declare new types and maintain compatibility across functions consuming these types, but it can also be somewhat tricky to work with as the types do not indicate how the memory is managed.

Complete Zig Code

We can add a reference counter and make SelectedAccount hold a pointer back to Accounts for reference counting, and then we have everything for our pattern.

Here is the complete Zig code (minus the ReferenceCounter implementation):

const std = @import("std");
const ReferenceCounter = @import("ReferenceCounter.zig");

const Account = struct {
    owner: []u8,
    data: []u8,
    index: u64,
};

const SelectedAccount = struct {
    id: []const u8,
    account: Account,
    source: *Accounts,

    pub fn acquire(self: SelectedAccount) SelectedAccount {
        return .{ .id = self.id, .account = self.account, .source = self.source.acquire() };
    }

    pub fn release(self: SelectedAccount) void {
        self.source.release();
    }
};

const Accounts = struct {
    arena: std.heap.ArenaAllocator,
    slot: u64,
    map: std.StringHashMapUnmanaged(Account),
    rc: ReferenceCounter,

    pub fn init(backing_allocator: std.mem.Allocator, slot: u64) !*Accounts {
        const self = try backing_allocator.create(Accounts);
        self.* = .{
            .arena = std.heap.ArenaAllocator.init(backing_allocator),
            .slot = slot,
            .map = .{},
            .rc = .init,
        };
        return self;
    }

    pub fn insertAccount(
        self: *Accounts,
        id: []const u8,
        owner: []const u8,
        index: u64,
        data: []const u8,
    ) !void {
        const alloc = self.arena.allocator();
        try self.map.put(alloc, try alloc.dupe(u8, id), .{
            .owner = try alloc.dupe(u8, owner),
            .data = try alloc.dupe(u8, data),
            .index = index,
        });
    }

    pub fn len(self: *const Accounts) usize {
        return self.map.count();
    }

    pub fn acquire(self: *Accounts) *Accounts {
        std.debug.assert(self.rc.acquire());
        return self;
    }

    pub fn release(self: *Accounts) void {
        if (self.rc.release()) {
            const backing = self.arena.child_allocator;
            self.arena.deinit(); // frees everything at once
            backing.destroy(self);
        }
    }

    /// Acquire a ref and project a single account as a SelectedAccount.
    pub fn acquireProject(self: *Accounts, id: []const u8) ?SelectedAccount {
        const entry = self.map.getEntry(id) orelse return null;
        return .{
            .id = entry.key_ptr.*,
            .account = entry.value_ptr.*,
            .source = self.acquire(),
        };
    }
};

The ReferenceCounter API used here should be self-documenting and is not important for this example, but the examples repository has the full implementation if you’re curious.

With just this little bit of code in Zig we have everything we wanted:

  • Accounts entry is fast to create and destroy (using arena allocator)
  • acquireProject performs substructure projection, with lifetime of projection tied to the reference count, allowing for automatic cleanup with sharing across threads

Building in Rust

Rust’s std library doesn’t have an arena allocator, and the language does not support this kind of self-referential struct without unsafe. So we’ll pull in some useful crates to do the heavy lifting:

  • bumpalo for arena allocation (and hashbrown for a hash map that works with it)
  • self_cell for self-referential struct support
  • yoke for projecting substructure borrows with reference counting

And we’ll need to use unsafe to get Send/Sync support for sharing across threads with bumpalo.

I’ll give a quick example of each crate to show how things are built up before putting everything together.

Bumpalo

The bumpalo crate is the Rust equivalent of Zig’s std.heap.ArenaAllocator, but it incorporates Rust’s compile-time safety guarantees around lifetimes and threads.

Here’s basic usage of the bumpalo arena, using a hashbrown HashMap that supports the allocator trait implemented by bumpalo.

let arena = bumpalo::Bump::new();
let mut map = hashbrown::HashMap::new_in(&arena);

let id = arena.alloc_str("acct-sparky");
let owner = arena.alloc_str("Sparky-11");
let data = arena.alloc_slice_copy(&[0, 3, 6, 9, 12]);

map.insert(id, Account { owner, data, index: 11 });

let account = map.get("acct-sparky").unwrap();
assert_eq!(account.owner, "Sparky-11");

The Bump type is the arena, and it implements the same growing chunked list structure that the Zig std.heap.ArenaAllocator uses (with some minor differences for the growth coefficient and where the metadata is stored in the chunks). Under the hood it’s a list of allocated memory chunks, with each progressive chunk growing exponentially in size similar to a Vec in Rust (or std.ArrayList in Zig) to amortize the cost.

Note that it never reallocates memory as it grows like a Vec would because all of the memory locations must remain stable to avoid invalidating pointers already allocated by the arena.
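The chunk-list idea can be illustrated with a simplified, std-only sketch (a toy model of the shared design, not bumpalo’s or Zig’s actual implementation): allocation bumps an offset in the current chunk, a full chunk triggers pushing a new, larger chunk, and existing chunks are never moved or reallocated.

```rust
// Toy arena: a list of chunks that grow exponentially. Unlike a single Vec,
// old chunks are never reallocated or moved, so "addresses" already handed
// out stay stable.
struct MiniArena {
    chunks: Vec<Vec<u8>>,
    // Bump offset into the last chunk.
    offset: usize,
}

impl MiniArena {
    fn new() -> Self {
        Self { chunks: vec![Vec::with_capacity(4096)], offset: 0 }
    }

    // Reserve `len` bytes; returns (chunk index, offset) as a stable address.
    fn alloc(&mut self, len: usize) -> (usize, usize) {
        let cap = self.chunks.last().unwrap().capacity();
        if self.offset + len > cap {
            // Current chunk is full: push a bigger one. Exponential growth
            // amortizes the number of calls to the backing allocator.
            let new_cap = (cap * 2).max(len);
            self.chunks.push(Vec::with_capacity(new_cap));
            self.offset = 0;
        }
        let chunk = self.chunks.len() - 1;
        let start = self.offset;
        self.chunks[chunk].resize(start + len, 0);
        self.offset = start + len;
        (chunk, start)
    }
    // Dropping MiniArena frees every chunk at once: a handful of deallocs no
    // matter how many alloc() calls were made.
}

fn main() {
    let mut arena = MiniArena::new();
    for _ in 0..1000 {
        arena.alloc(64); // 64_000 bytes across 1000 allocations
    }
    // All of those landed in only a few chunks (5 with these sizes).
    println!("{}", arena.chunks.len());
}
```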

Self Cell

self_cell makes it easy to define self-referential structs via a macro. Under the hood it performs a single heap allocation to ensure the memory location is stable for the struct to avoid dangling self-references if the struct is moved.

Here’s an example where we store the Bump arena alongside a string allocated by the arena.

// Type alias required for identifier in macro
type ArenaStr<'a> = &'a str;

// Macro defines the new self-referential type
self_cell::self_cell! {
    struct StringCell {
        owner: bumpalo::Bump,

        #[covariant]
        dependent: ArenaStr,
    }
}

// Construct and use the self-referential type
let cell = StringCell::new(bumpalo::Bump::new(), |arena| arena.alloc_str("hello arena"));
let value = cell.borrow_dependent();
assert_eq!(*value, "hello arena");

Without self_cell, another crate, or the use of unsafe, it would be impossible to define this structure in Rust. The str borrows data from the Bump, and both exist as fields within the same structure.

For more on Rust dealing with self-referential structs see the self_cell crate docs and this deeper dive post.
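For contrast, the closest you can get in safe Rust without a crate is to store indices into the owner instead of borrows, reborrowing on access (the `IndexCell` type here is an illustrative sketch). This works for simple cases like a string slice, but not for our arena case, where the dependent is a whole HashMap borrowing from the arena; that is why self_cell is pulled in.

```rust
use std::ops::Range;

// Safe workaround for self-reference: store a byte range into the owner
// rather than a &str borrowing from it, and reborrow on each access.
struct IndexCell {
    owner: String,
    dep: Range<usize>, // byte range into `owner`
}

impl IndexCell {
    fn new(owner: String, dep: Range<usize>) -> Self {
        Self { owner, dep }
    }

    fn dependent(&self) -> &str {
        &self.owner[self.dep.clone()]
    }
}

fn main() {
    let cell = IndexCell::new("hello arena".to_string(), 6..11);
    let moved = cell; // moving is fine: indices cannot dangle
    println!("{}", moved.dependent()); // prints "arena"
}
```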

Yoke

yoke is a great crate I have used in the past to make zero-copy deserialization of JSON bytes shareable across threads. I’ll use that as the example here.

// Type for serde to deserialize as borrowed data and support use in yoke
// Note `Cow` is needed here as serde has to allocate instead of borrowing directly from JSON bytes
// if the strings have escapes; e.g., `"hello world\n"` bytes cannot be directly borrowed as a `str`
#[derive(Debug, serde::Deserialize, yoke::Yokeable)]
struct UserProfile<'a> {
    #[serde(borrow)]
    id: std::borrow::Cow<'a, str>,
    #[serde(borrow)]
    owner: std::borrow::Cow<'a, str>,
}

// Type that can be used like an `Arc<UserProfile<'static>>`, so it can be shared across threads
type SharedUserProfile = yoke::Yoke<UserProfile<'static>, std::sync::Arc<[u8]>>;

// The JSON bytes as Arc<[u8]> for simplicity; in my past use case the bytes were read from the
// network and loaded into a `Bytes` type (from the `bytes` crate), which is a shareable
// ref-counted buffer of immutable bytes
let bytes: std::sync::Arc<[u8]> =
    std::sync::Arc::from(&br#"{"id":"acct-sparky","owner":"Sparky-11"}"#[..]);

// Deserialize and construct the type
let profile: SharedUserProfile = yoke::Yoke::attach_to_cart(bytes, |bytes| {
    serde_json::from_slice::<UserProfile<'_>>(bytes).expect("deserialize should work")
});

let value = profile.get();
assert_eq!(value.id.as_ref(), "acct-sparky");
assert_eq!(value.owner.as_ref(), "Sparky-11");

The “cart” of bytes acts as a stable backing store for the deserialized UserProfile struct to borrow from.

If you want to learn more about yoke, I highly recommend the blog post explaining its background (slightly outdated regarding GATs) and the yoke design doc.

Putting It Together

Here’s our example fully put together in Rust:

use std::sync::Arc;

use bumpalo::Bump;
use hashbrown::{DefaultHashBuilder, HashMap};
use yoke::{Yoke, Yokeable};

#[derive(Clone, Copy, Debug, Yokeable)]
pub struct Account<'a> {
    pub owner: &'a str,
    pub data: &'a [u8],
    pub index: u64,
}

#[derive(Debug)]
pub struct Accounts<'a> {
    slot: u64,
    map: HashMap<&'a str, Account<'a>, DefaultHashBuilder, &'a Bump>,
}

self_cell::self_cell!(
    struct AccountsCell {
        owner: Bump,

        #[covariant]
        dependent: Accounts,
    }
);

impl<'a> Accounts<'a> {
    fn new(slot: u64, arena: &'a Bump) -> Self {
        Self {
            slot,
            map: HashMap::new_in(arena),
        }
    }
}

pub struct AccountsBuilder {
    inner: AccountsCell,
}

impl AccountsBuilder {
    pub fn new(slot: u64) -> Self {
        Self {
            inner: AccountsCell::new(Bump::new(), |arena| Accounts::new(slot, arena)),
        }
    }

    pub fn insert_account(&mut self, id: &str, owner: &str, index: u64, data: &[u8]) {
        self.inner.with_dependent_mut(|arena, accounts| {
            let id = arena.alloc_str(id);
            let owner = arena.alloc_str(owner);
            let data = arena.alloc_slice_copy(data);

            accounts.map.insert(id, Account { owner, data, index });
        });
    }

    pub fn finish(self) -> FrozenAccounts {
        FrozenAccounts { inner: self.inner }
    }
}

pub struct FrozenAccounts {
    inner: AccountsCell,
}

// SAFETY: `Bump` is `Send`, and the `FrozenAccounts` API never exposes the underlying `Bump`
// and only returns immutable borrowed views into the frozen data.
unsafe impl Send for FrozenAccounts {}
// SAFETY: Same reasoning as above. Shared access is read-only and all exposed data is `Sync`.
unsafe impl Sync for FrozenAccounts {}

impl FrozenAccounts {
    pub fn slot(&self) -> u64 {
        self.inner.with_dependent(|_, accounts| accounts.slot)
    }

    pub fn len(&self) -> usize {
        self.inner.with_dependent(|_, accounts| accounts.map.len())
    }

    pub fn is_empty(&self) -> bool {
        self.inner.with_dependent(|_, accounts| accounts.map.is_empty())
    }

    fn get_account(&self, id: &str) -> Option<AccountKeyValue<'_>> {
        self.inner.with_dependent(|_, accounts| {
            // Need `get_key_value()` here, not just `get()`, because `AccountKeyValue` borrows
            // the arena-backed key stored in the map, not the shorter-lived `id` argument.
            let (id, account) = accounts.map.get_key_value(id)?;
            Some(AccountKeyValue(id, *account))
        })
    }
}

#[derive(Clone, Copy, Debug, Yokeable)]
struct AccountKeyValue<'a>(&'a str, Account<'a>);

#[derive(Clone)]
pub struct SelectedAccount(Yoke<AccountKeyValue<'static>, Arc<FrozenAccounts>>);

impl SelectedAccount {
    pub fn id(&self) -> &str {
        self.0.get().0
    }

    pub fn account(&self) -> Account<'_> {
        self.0.get().1
    }

    pub fn slot(&self) -> u64 {
        self.0.backing_cart().slot()
    }

    pub fn backing_cart(&self) -> &Arc<FrozenAccounts> {
        self.0.backing_cart()
    }
}

pub fn project_account(accounts: Arc<FrozenAccounts>, id: &str) -> Option<SelectedAccount> {
    // Looks a bit weird, but we use `try_attach_to_cart()` to propagate None if the account
    // doesn't exist, in order to have Option<Yoke<_, _>> instead of Yoke<Option<_>, _>.
    Yoke::try_attach_to_cart(accounts, |cart| cart.get_account(id).ok_or(()))
        .ok()
        .map(SelectedAccount)
}

There’s quite a bit going on here, so here’s the general idea:

  • AccountsCell is our self-referential struct with the Bump arena allocator
  • AccountsBuilder makes it easy to incrementally build the AccountsCell and then freeze it
  • FrozenAccounts is able to implement Send and Sync since it hides the inner type to prevent access to Bump which is !Sync
  • AccountKeyValue is the account’s borrowed (key, value) pair implementing Yokeable; this avoids needing to look up the account in the hash map again
  • SelectedAccount represents the borrowed projection with reference counting from the same Arc that wraps FrozenAccounts

unsafe is needed to deal with Bump being !Sync; this is because the Bump API allows mutation through &Bump without any thread synchronization. Fortunately the safety is fairly easy to reason about: Bump is Send (it can be sent across threads), and the references it returns after allocating are Sync if the referenced types themselves are Sync.

Note that if Bump was !Send then we could not implement Send on FrozenAccounts! Hiding the Bump behind the API of FrozenAccounts is not enough to uphold !Send requirements, but it does uphold the !Sync requirements. Also if the types allocated by the Bump were !Sync, then we would not be able to implement Sync on FrozenAccounts.
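This reasoning can be sketched with a std-only analogue (the `Inner`/`Frozen` types are illustrative, not from the example): a type that mutates through &self (here via Cell, like Bump’s &self allocation) is automatically !Sync, and a wrapper whose API is read-only after construction can assert Sync with one contained unsafe impl.

```rust
use std::cell::Cell;
use std::sync::Arc;
use std::thread;

// Analogue of Bump: `Inner` can mutate through &self via Cell, so the compiler
// marks it !Sync (unsynchronized shared mutation would be a data race).
struct Inner {
    counter: Cell<u64>,
}

// Read-only wrapper: after construction, the public API never mutates `inner`.
struct Frozen {
    inner: Inner,
}

// SAFETY: `Frozen` never exposes `inner` and only reads it after construction,
// so shared references cannot race. (Cell is Send, so `Frozen` is already Send
// automatically; only Sync needs the manual assertion here.)
unsafe impl Sync for Frozen {}

impl Frozen {
    fn value(&self) -> u64 {
        self.inner.counter.get()
    }
}

fn main() {
    let frozen = Arc::new(Frozen { inner: Inner { counter: Cell::new(7) } });
    let shared = frozen.clone();
    let handle = thread::spawn(move || shared.value());
    assert_eq!(handle.join().unwrap(), 7);
    assert_eq!(frozen.value(), 7);
    println!("ok");
}
```

Without the unsafe impl, the `thread::spawn` line would not compile, because `Arc<Frozen>` is only Send when `Frozen` is Sync.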

Some things of note with this approach in Rust:

  • We should have the exact same optimization benefits as we did with the Zig code
  • We must define dedicated view types, e.g., Account<'a> rather than Account, to track lifetimes for the compiler (an “owning” Account would use String or Box<str> instead of &str, etc.)
  • We maintain Rust’s compile time safety guarantees with minimal need for unsafe thanks to the safe APIs of the crates
  • The coding pattern would be easy to follow and apply to other data shapes by defining the types needed
  • We maintain Rust ownership patterns and utilize Drop for cleanup (decrementing reference count), no manual deinit pattern is required

I think the biggest downside to this approach is the need to define explicit view types with lifetimes. In real code you may need to define traits to abstract reading across both owned and borrowed types, or have owning types implement functions that return the same view types used by the cache. Zig does not have this downside, as memory management is external to the types.
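A sketch of that trait-based abstraction (hypothetical names; the owning twin is not part of the example code): one read trait implemented for both the arena-backed view type and an owning equivalent, so consumers are written once against the trait.

```rust
// Borrowed view type, as the cache would hand out.
struct AccountView<'a> {
    owner: &'a str,
    index: u64,
}

// Hypothetical owning twin used outside the cache.
struct OwnedAccount {
    owner: String,
    index: u64,
}

// One read trait abstracting over both representations.
trait AccountRead {
    fn owner(&self) -> &str;
    fn index(&self) -> u64;
}

impl AccountRead for AccountView<'_> {
    fn owner(&self) -> &str {
        self.owner
    }
    fn index(&self) -> u64 {
        self.index
    }
}

impl AccountRead for OwnedAccount {
    fn owner(&self) -> &str {
        &self.owner
    }
    fn index(&self) -> u64 {
        self.index
    }
}

// Consumers don't care which representation they were handed.
fn describe(account: &impl AccountRead) -> String {
    format!("{}#{}", account.owner(), account.index())
}

fn main() {
    let borrowed = AccountView { owner: "Sparky-11", index: 11 };
    let owned = OwnedAccount { owner: "Rusty-03".to_string(), index: 7 };
    assert_eq!(describe(&borrowed), "Sparky-11#11");
    assert_eq!(describe(&owned), "Rusty-03#7");
    println!("ok");
}
```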

The major upside to this Rust approach is that it is very safe compared to the Zig code. The one place that requires reasoning about safety for unsafe is very contained, and the actual usage of the types after that is all safe Rust.

Some Final Thoughts

There are some fairly elegant solutions to optimizing things in Rust, even things that typically would go against the grain of Rust’s ownership model and borrow checker. It requires a bit of deeper exploration and learning.

Now with LLMs for code generation, I think these types of patterns (or patterns of types) have become more ergonomic. LLMs are great at generating these repeated structural patterns when given an example to start with, and Rust’s compile time guarantees can do most of the heavy lifting to verify the types generated.

When using a language like C or Zig, you’re often forced to deeply consider the architecture required to solve the problem at hand, because everything you must manage is low level and the risks are catastrophic (double free, use after free, buffer underflow/overflow, etc.). That forced attention to memory management can lead to elegantly optimized solutions, as the code structure aligns more closely with the problem. Or it can lead to spaghetti code with memory bugs if attention is lacking.

Rust can be challenging for these memory optimization problems: it requires low-level knowledge of what is important to optimize and how to optimize it, plus the Rust-specific knowledge of how to implement those optimizations in a way that is still safe, correct, and ideally ergonomic.

My Rust experience and knowledge has been hugely beneficial in learning and using Zig, and now Zig is inspiring me to push Rust to do more.


  1. For more on arena allocators I recommend reading the bumpalo crate docs and the source code for Zig’s std.heap.ArenaAllocator; both of these implementations follow the same list of growing chunks approach.