
TestRIG (Testing with Random Instruction Generation) is a testing framework for RISC-V implementations. The RISC-V community has standardized a formal model of the architecture in the Sail language, giving a human-readable specification that can also be used for simulation and verification. Ideally, a RISC-V implementor could formally prove equivalence between their implementation and the Sail model, but proof tools are not yet sufficiently automated to be routinely used on the whole- processor level. They instead focus on equivalence between combinational logic functions, with verification of a full out-of-order microarchitecture remaining an open problem. As a pragmatic compromise, we use TestRIG to check equivalence between the model and an implementation by generating random instruction sequences, executing the same sequences on the model and the implementation under test, and comparing execution traces (tandem execution).
This approach does not prove equivalence because it cannot cover the entire space of all possible permutations. However, this approach can demonstrate divergence, and is usable in all stages of development. The DII interface allows fine-grain control over test sequences allowing test case shrinkage, which is very helpful during test-driven development to quickly pinpoint bugs.
In this article, we will delve into the architecture of TestRIG, its test generation, and the trace format. Finally, we will show practical applications of the TestRIG framework resulting in rapid prototyping of CHERI implementations.
TestRIG

Figure 1: An illustrative example of the TestRIG ecosystem with a Verification Engine communicating with any two RISC-V implementations over sockets. The Verification Engine injects instruction sequences and compares the execution traces until it finds a divergence.
Figure 1 gives an overview of the modular TestRIG ecosystem. In TestRIG, an interactive Verification Engine (VEngine) stimulates RISC-V implementations over RVFI-DII interfaces, which we will look at later in this article. An RVFI-DII compatible RISC-V implementation can reset, consume instruction sequences, and report execution traces via its RVFI-DII interface. A VEngine can drive one or more RVFI-DII compatible implementations; a VEngine might have an internal RISC-V model, or could drive two independent implementations and compare their RVFI traces, as we have done with QCVEngine, which is discussed later. Apart from that, VEngine instruction sequences could be loaded from disk, generated randomly, or produced with interactive architecture-driven state-space exploration.
The RVFI-DII bytestream interface allows models and implementations written in various languages to communicate through widely supported networking sockets. The TestRIG project spans multiple different languages. QCVEngine, the main verification engine we used, is written in Haskell, and the Sail RISC-V model is written in the Sail domain-specific language (either interpreted by an OCaml program or compiled into C). Spike and QEMU are RISC-V simulators written in C and C++. TestRIG also supports a wide range of hardware implementations, spanning from 32-bit Ibex (SystemVerilog) up to 64-bit out-of-order superscalar Toooba (Bluespec).”
RVFI-DII
To participate in the TestRIG verification ecosystem, implementations must be extended with RVFI-DII instrumentation. To ease development, we provide implementations of data structures and libraries in several languages to facilitate RVFI-DII connections over TCP ports.
The RISC-V Formal Interface (RVFI) is an existing trace format originally intended for formal verification. RVFI exposes select architecturally significant signals such as the instruction encoding and any memory address or value, as well as the indices and values of the operand and writeback registers. We use the RVFI output to sample the architectural state after executing an instruction. This allows us to compare whether two implementations agree on the result of an instruction sequence.
TestRIG extends RVFI with Direct Instruction Injection (DII). DII is for instruction input, RVFI is for trace output, and RVFI-DII supports full interactive verification, which is discussed below. Interactive verification enables automated simplification and shrinking. Existing RISC-V cores that implement RVFI can be augmented to participate in the TestRIG ecosystem by implementing DII, and conversely RVFI-DII designs may benefit from RVFI formal verification tooling.
QuickCheck VEngine
Our TestRIG Verification Engine, QCVEngine, leverages Haskell’s QuickCheck library[1]. Due to the simplicity of Direct Instruction Injection execution, which decouples the instruction stream from control flow, QCVEngine can use unmodified QuickCheck utilities to generate, compare, and shrink instruction sequences.
QuickCheck receives a function with a pass/fail return value, and generates inputs in search of a failure. To facilitate this, we construct a function that receives a list of instructions, sends these over two DII sockets, collects RVFI traces back from these sockets, asserts that they match, and returns the result. We then provide a set of generators of arbitrary instruction sequences that are used by QuickCheck to produce inputs to this function.
We use convenience functions to define instructions in a syntax closely resembling the RISC-V ISA manual, and provide tailored generators for each instruction field to promote register reuse. QuickCheck automatically discovers and uses these generators through the type system and uses them to construct arbitrary instruction sequences. We also provide targeted generators for simple subsets of the instruction set, as well as generators that leverage templates of varying complexity to reach deeper states, including virtual memory mappings and cache conflicts.
While Direct Instruction Injection allows us to primarily rely on QuickCheck’s builtin shrinking strategies, we augmented these with smart shrinking functions that not only eliminate instructions, but intelligently transform them to simplify the sequence.
Once a counterexample is found by QCVEngine, QuickCheck uses a builtin list-shrinking function that removes sequences from the list and tests again, hoping to eliminate instructions with no relevance to the errant behavior. Illustratively, here is an initial counterexample found for an artificial hardware bug where the LSB of the add instruction’s result (but not addi’s) is stuck at zero:
addi x7, x4, 123 # Generate odd immediate
addi x5, x3, 42 # Generate even immediate
addi x6, x7, 0 # Move x7 to x6
xori x1, x5, 745 # Irrelevant
add x1, x5, x6 # Perform buggy add
Our augmented shrinking results in:
addi x7, x4, 123 # Generate odd immediate
add x1, x5, x7 # Perform buggy add
Note that this implies not only deleting instructions, but also renaming registers. In practice, our initial sequences contain thousands of instructions, and usually shrink down to ten or fewer!
We also add a library of simplifications to be used during shrinking. These eliminate esoteric instructions from the trace that perform mundane functions and distract from the root cause of the failure. For example, memory operations often trap; thus, we might attempt to simplify a memory operation to an ecall, an instruction that only traps, to make the error more obvious.
Instruction traces can be converted to (and from) a human-readable format both for terminal reporting, and for reading and writing trace files – individually or in bulk from a directory. This has enabled us to collect a library of regression tests to quickly check all previous counterexamples. Unlike hand-written tests with assertions, these do not require maintenance, as expected behavior updates naturally with the model as the instruction set evolves. We have also used this feature to replay recorded test-suite examples (including riscv-tests[2] and RISCV-DV[3]), adding full trace-equivalence checking with shrinking. This feature has also allowed us to capture traces of an operating system booting on the model implementation, which we could then use to aid bring-up of the same operating system on implementations, with instruction shrinking rapidly highlighting any problems.
Sequences can include assertions – e.g., that the value written by the previous instruction was non-zero. These make it possible to fail without a divergence. Unusually, sequences with assertions do not require tandem verification to discover a failure, and we have used these to test the limits of implementation-defined behavior.
Evaluation

Figure 2: Counterexample size complexity comparing QCVEngine to riscv-tests and riscv-dv.
Short counterexamples greatly simplify debugging efforts. Our archive of QCVEngine traces comprises thousands of shrunken counterexamples discovered during development of our CHERI processor extensions. Figure 2 shows the distribution of our archive of counterexample lengths versus riscv-test and RSICV-DV trace lengths, which do not allow shrinking. The median value is 3 instructions, and the third quartile is only 5 instructions. The median for riscv-tests 561 instructions, which is more than 3 times the maximum counterexample found by QCVEngine, and the median RSICV-DV sequence is 15339. QCVEngine’s small counterexample size is facilitated by Direction Instruction Injection and smart shrinking as described previously. Single digit counterexample length greatly accelerated our discovery of failures and development of fixes compared to even a traditional unit test suite.
TestRIG for CHERI
TestRIG is not only utilized for the base RISC-V ISA, but also for future extensions. We extensively used TestRIG for multiple generations of the RISC-V CHERI ISA leading to the current draft specification proposed by the CHERI TG[4].
Capability Hardware Enhanced RISC Instructions (CHERI) is a security extension of conventional Instruction Set Architectures that adds capabilities – unforgeable and bounded tokens. A capability is a fat pointer[5] containing the address and metadata including permissions and bounds information. Furthermore, validity of capabilities is ensured by a hidden tag. A capability authorizes access to a region of memory, and no data or instruction access is possible without a valid capability. Furthermore, all capability operations are monotonic and therefore cannot increase the privileges a capability grants. As a result, CHERI enforces spatial safety, enables temporal safety, and supports fine-grained software compartmentalization.
We extensively used TestRIG in the bring-up of various CHERI implementations, including implementations for the newly drafted RISC-V CHERI specification. For example, when developing the initial format for the compressed encoding of CHERI capabilities, we had a bug that unnecessarily cleared the tag of a pointer when setting an address that wrapped the address space. This bug was found with this shrunken counterexample:
cSetBoundsImmediate x3, x1, 1106 # Set a short bound
cIncOffsetImmediate x2, x4, -197 # Small negative integer
cSetAddr x4, x3, x2 # Set the integer as the address
cGetTag x1, x4 # XXX Untagged
While this case may have been covered in an extensive unit-test suite composed at significant effort, our TestRIG generator required only a list of CHERI instructions to produce a counterexample far more compact than most hand-written tests.
Conclusions
We have collated all the current TestRIG-compatible implementations and verification engines into the open-source TestRIG repository. This repository includes documentation that has been followed and improved multiple times by new users. TestRIG accelerates development at all stages, providing a tighter debugging loop than we have experienced in any other processor development paradigm. We expect TestRIG to lead the way to a standardized testing framework for RISC-V that leverages instrumentation of open implementations, to greatly simplify verification. Such a framework improves upon traditional instruction-set-level unit testing in every way, and subsumes specialized random test generators into a cohesive community of easy-to-use verification tools. TestRIG has paved the way for fast prototyping of new ISA-extensions like the proposed RISC-V CHERI extension.
Take a look at TestRIG and join our verification community!
Want To Showcase Your Work?
Share your project in the Learn Repository on GitHub! You might find future collaborators or an organization interested in working with you.
Need further instructions? Learn more here!
Meet the Authors

Franz Fuchs is finishing his PhD at the University of Cambridge, Department of Computer Science and Technology. His research interests span computer architecture, FPGA prototyping, automated testing, and security, with particular focus on speculative execution attacks and compartmentalization.

Peter Rugg is a Research Associate, continuing the work from his PhD at the University of Cambridge Department of Computer Science and Technology specializing in automated testing and efficient capability microarchitecture for RISC-V processors.

Simon Moore is a Professor of Computer Engineering at the University of Cambridge Department of Computer Science and Technology, where he conducts research and teaching in the general area of computer architecture with particular interests in secure and rigorously-engineered processors and subsystems.
Acknowledgements
We would like to thank Alexandre Joannou, Jonathan Woodruff, Marno van der Maas, Matthew Naylor, Michael Roe, Robert N. M. Watson, and Peter G. Neumann who co-authored the “Randomized Testing of RISC-V CPUs using Direct Instruction Injection” paper, which is the basis for this article[6]. Furthermore, we would like to acknowledge the efforts by the entire CHERI research group, which provided the foundation for the work presented in this article.
References
[1] Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. Acm sigplan notices, 46(4), 2011.
[2] riscv-tests. https://github.com/riscv-software-src/riscv-tests. February 2025.
[3] RISCV-DV. https://github.com/chipsalliance/riscv-dvm. February 2025.
[4] RISC-V Specification for CHERI Extensions. https://github.com/riscv/riscv-cheri. February 2025.
[5] Trevor Jim et al. Cyclone: A safe dialect of C. In ATEC 2002, Berkeley, CA, USA. USENIX.
[6] Alexandre Joannou, Peter Rugg, Jonathan Woodruff, Franz A. Fuchs, Marno van der Maas, Matthew Naylor, Michael Roe, Robert N. M. Watson, Peter G. Neumann, and Simon W. Moore. “Randomized testing of RISC-V CPUs using direct instruction injection”. In: IEEE Design & Test 41.1 (Feb. 2024), pp. 40–49. doi: 10.1109/MDAT.2023.3262741.