Skip to main content
Blog

How to Speed Up the Emulating Process with Pydrofoil | Carl Friedrich and Matti Picus

The RISC-V Golden model, also called the Sail model, defines the instruction execution of the RISC-V architecture. As such, it is useful for evaluating the correct execution of RISC-V software. The RISC-V technical community has been working on a project called Pydrofoil that 

accelerates the process of generating emulators by taking a Sail model and producing RPython code. Improving the performance of the simulation by using Pydrofoil allows for larger and more complicated software programs to be accurately evaluated and debugged. 

We appreciate the time and effort that John Witulski, Martin Berger and Matti Picus have spent developing this project for the benefit of the entire RISC-V ecosystem. The authors would like to thank RISC-V International for the financial support we received to fund a part of this effort.

In this post we’ll dive into the purpose of Sail, describe how to use Pydrofoil, and show some early benchmark results demonstrating Pydrofoil’s effectiveness.

Sail and the Sail RISC-V model

The Sail language is a domain-specific language developed at the University of Cambridge. The goal of the language is to be able to specify the semantics of instruction set architectures (ISAs) in a very precise way to be able to formally reason about the behavior of machine code programs.

The language is a statically typed functional language with pattern matching and some support for liquid types. It contains a number of datatypes that are particularly suited for the task of describing the behavior of ISAs, such as bit vectors, arbitrary-width integers, and real numbers.

A number of ISAs have been specified in this language, but we’ll only talk about the one for RISC-V (Sail-RISCV) in this post. The Sail ISA specification for RISC-V is interesting because it is supposed to be considered the “Golden Model” for RISC-V, meaning it is the ground truth of how the architecture is supposed to behave.

Getting Started with the Sail RISC-V Model

To get started, we’ll go through an example of the Sail code for the ITYPE RISCV instruction. ITYPE instructions are arithmetic integer instructions with an immediate value. 

The first part of the Sail model is an “AST” declaration, which shows the information that is encoded in such an instruction:

/* the assembly abstract syntax tree (AST) clause for the ITYPE instructions */

union clause ast = ITYPE : (bits(12), regbits, regbits, iop)

The bits(12) type is a bit vector of 12 bits. Storing the immediate regbits is declared elsewhere to be bits(5). The two regbits components are for the source and the destination register. iop is an enumeration that lists the various ITYPE instruction cases:

enum iop = {RISCV_ADDI, RISCV_SLTI, RISCV_SLTIU,

            RISCV_XORI, RISCV_ORI, RISCV_ANDI}    /* immediate ops */

The following function describes a mapping between that iop enum and the bits used in the instruction encoding. This is a bidirectional function (it uses <->) that can be used to both turn the instruction bits into the enum (when decoding an instruction). It also works the other way around:

/* the encode/decode mapping between AST elements and 32-bit words */
mapping encdec_iop : iop <-> bits(3) = {

  RISCV_ADDI  <-> 0b000,

  RISCV_SLTI  <-> 0b010,

  RISCV_SLTIU <-> 0b011,

  RISCV_ANDI  <-> 0b111,

  RISCV_ORI   <-> 0b110,

  RISCV_XORI  <-> 0b100

}

Decoding of RISC-V ITTYPE Instructions 

The next step in the Sail RISC-V model is the actual decoding of ITYPE instructions. It describes how the instruction bits are combined to form an ITYPE, and how those parts relate to the information in the ITYPE AST:

mapping clause encdec = ITYPE(imm, rs1, rd, op) <-> imm @ rs1 @ encdec_iop(op) @ rd @ 0b0010011

Here the @ operator is bit vector concatenation. encdec is again a bidirectional function so it can be used to decode an instruction or also used to take an AST and turn it into one. The next function defines the actual execution semantics of the instructions:

 

/* the execution semantics for the ITYPE instructions */
function clause execute (ITYPE (imm, rs1, rd, op)) = {

  let rs1_val = X(rs1);

  let immext : xlenbits = EXTS(imm);

  let result : xlenbits = match op {

    RISCV_ADDI  => rs1_val + immext,

    RISCV_SLTI  => EXTZ(rs1_val <_s immext),

    RISCV_SLTIU => EXTZ(rs1_val <_u immext),

    RISCV_ANDI  => rs1_val & immext,

    RISCV_ORI   => rs1_val | immext,

    RISCV_XORI  => rs1_val ^ immext

  };

  X(rd) = result;

  true

}

The code reads the value of source register rs1, then sign-extends the immediate to xlenbits (which could be 32, 64 or 128 bits) using the EXTS function. Then it pattern-matches on the op to compute the result. That resulting value is then stored into the destination register rd.

Turning the Sail RISC-V ISA model into an emulator

One of the features of the Sail Domain Specific Language (DSL) is that it can take an ISA specification and turn it into an emulator for that ISA. To do this, it compiles the Sail code into C, adds a bit of hand-written C support and glue code, and uses a C compiler to produce a binary.

The Sail-RISC-V emulator produced in this way can execute the RISC-V test suite which checks that the instructions are implemented correctly. It is also complete enough to boot Linux up to a login prompt.

One downside of an emulator generated this way is that it is relatively slow and only emulates between approximately 1000-10,000 instructions per second -which is not especially great considering that hand-written emulators like Spike and QEMU can run hundreds of millions instructions per second under certain conditions. This slow emulation speed can be a problem when trying to test somewhat complex software in emulation. For example, booting the Linux kernel takes over an hour on the Sail-RISC-V emulator.

Pydrofoil’s Architecture

The idea of Pydrofoil is to generate emulators from a Sail model in a different way. Instead of generating C code, Pydrofoil takes a Sail model and produces RPython code. In many ways, the ideas of the RPython project and the Sail projects overlap.

Sail is a language to describe the semantics of an ISA in a high-level manner. Sail provides a common infrastructure, such as libraries support for formal proofs and a way to generate an emulator. RPython is another language to describe semantics of dynamic languages in a high-level way. RPython provides common infrastructure such as a reasonably good garbage collector and a reusable just-in-time compiler. Therefore combining the two projects is a natural fit.

Pydrofoil generates RPython code by parsing Sail’s JIB representation (JIB helps to build containers). The Sail JIB is an intermediate language that the Sail compiler uses to represent the input programs. When producing JIB the Sail programs have already been parsed, type-checked and been optimized by the Sail compiler. Therefore Pydrofoil does not have to do these tasks. Pydrofoil parses the JIB files, does some minor transformations and then produces RPython code from them.

This RPython code is then combined with custom support code written in RPython. Most of that support code can be shared between different ISAs though some needed to be hand-written for RISC-V.

The speedups come from the following sources:

  • RPython’s tracing JIT can be used to perform dynamic binary translation from the guest RISC-V instructions that are running on top of the generated emulator to host machine code at runtime.
  • More efficient dynamically typed bitvector/integer representation (We’ll cover this in a later post).

Downloading Pydrofoil and booting Linux

The RISC-V International technical community now offers pre-built pydrofoil-riscv emulators at link. These emulators are built according to the build documentation and are available for x86_64 Linux and macOS. They can be use as follows to boot Linux from the Sail-RISCV repo:

dtc < os-boot/rv64-64mb.dts > os-boot/rv64-64mb.dtb
./pydrofoil-riscv -b os-boot/rv64-64mb.dtb os-boot/rv64-linux-4.15.0-gcc-7.2.0-64mb.bbl -l 230000000

This command will run the Linux image that is part of the sail-riscv repo until the login prompt. The dtb file is a device tree blob that describes the emulated hardware to the operating system. It is generated from a human-readable input file with the dtc command. Booting Linux takes a bit less than 2 minutes on Pydrofoil on a Ryzen 9 3900X. You can try the equivalent command on the standard Sail emulator which takes roughly 35 minutes.:

 

./c_emulator/riscv_sim_RV64 -b os-boot/rv64-64mb.dtb os-boot/rv64-linux-4.15.0-gcc-7.2.0-64mb.bbl -l 230000000 -V

What’s Next 

It was a pleasure to collaborate on applying the RPython meta-JIT for a different area of application than a typical dynamically typed language. The model works quite well in this new area, as well as in Pydgin and PyMTL3 which are previous work in the same general area. The work on Pydrofoil also was part of the motivation for the recent Z3 JIT bug-finding work due to Pydrofoil’s role in pushing RPython’s JIT in less commonly used directions.

There are many features that we hope to implement in the near future. For example, currently floating point instructions aren’t supported in Pydrofoil. We also have many plans for further optimizations specific to RISC-V in Pydrofoil and also a number of generally useful ones for the JIT, which would benefit Python performance as well.