The design of modern hardware components such as processors and accelerators is a multidisciplinary effort at the intersection of hardware and software development. Hardware-software co-design is a challenging task that needs actionable data to identify bugs and bottlenecks. Renode, Antmicro’s open source simulation framework, enables pre-silicon HW/SW co-design for complete SoCs such as OpenTitan, with a fully controllable environment including cores and I/O blocks, capable of running binary-compatible software and providing complete insight into its execution.
Recently, in a joint effort with the Google AmbiML team, Antmicro added Renode support for trace-based simulation through a dedicated tool called Trace Based Model (TBM). This lets you measure how microarchitectural choices and your code perform together, so that design issues can be identified and addressed early. The goal behind the Renode-TBM integration was to generate TBM-compatible data from a Renode simulation to gather meaningful metrics and enable automated testing and benchmarking in a continuous integration pipeline, thus shortening the feedback loop for HW/SW co-design.
Benchmarking hardware using execution traces
Trace-based simulation is a methodology that uses execution traces to predict the performance of a system. In the case of TBM, the traces come from an external simulator such as Renode. After simulating your hardware design and executing the software, Renode produces a rich set of logs that can be fed into TBM to provide information about performance and potential bottlenecks.
Although Renode is a functional and not a cycle-accurate simulator, its deterministic and fully observable nature provides access to useful metrics about hardware usage that can be used to optimize execution. Renode’s execution tracing capabilities allow you to simulate a complete system of interconnected components such as CPUs, memory, and a wide range of buses, generating extensive execution traces of unmodified binaries without changing the behavior of the simulation.
To enable the generation of TBM-compatible tracing data, we added memory access tracing to Renode, which is essential for determining the type of memory access hits. In addition, TBM has driven efforts to extend the existing support for RISC-V vector instructions in Renode to capture additional information and enable benchmarking of Renode’s RVV implementation.
Generating TBM-compatible traces in Renode
The work described here spanned a long period where Renode was being used in the real-world development of an upcoming SoC. The initial implementation of TBM, which was used in the earlier stages of the project, required you to use the gentrace-renode Python script to convert Renode traces to a TBM-compatible format, but now we have integrated TBM trace generation directly into Renode to allow a more automated workflow.
To generate TBM-compatible traces, run the following commands in the Renode Monitor:
(machine-0) cpu CreateExecutionTracing "tracer" @renode.trace TraceBasedModel true
(machine-0) tracer TrackMemoryAccesses
(machine-0) tracer TrackVectorConfiguration
When the simulation finishes, the traces are saved to the file specified in the command (`renode.trace` in this example). Assuming you have all the project requirements installed, you can now feed the execution tracing data to TBM to generate the report:
python3 tbm/tbm.py -u config/rvv-simple.yaml --print-trace detailed --report-dont-include-cfg --report out_renode_tbm_report.txt --verbose ~/renode-portable/renode.trace
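In a scripted setup, the same invocation can be assembled programmatically so that the configuration, report, and trace paths become parameters. The following is a minimal sketch and not part of TBM: the helper name `build_tbm_command` and its parameters are hypothetical, and it only uses the flags that appear in the command above.

```python
import os


def build_tbm_command(trace,
                      config="config/rvv-simple.yaml",
                      report="out_renode_tbm_report.txt"):
    """Return the argv list for the TBM invocation shown above.

    Hypothetical helper: it mirrors the flags used in this article
    and does not introduce any other TBM options.
    """
    return [
        "python3", "tbm/tbm.py",
        "-u", config,
        "--print-trace", "detailed",
        "--report-dont-include-cfg",
        "--report", report,
        "--verbose",
        os.path.expanduser(trace),  # resolve a leading "~" if present
    ]


cmd = build_tbm_command("~/renode-portable/renode.trace")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd=..., check=True)`, with `cwd` pointing at your TBM checkout, so that a failing TBM run also fails the calling script.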
The report generated by TBM will contain the following information:
*** cycles: 1171
*** retired instructions per cycle: 0.85 (1000)
*** retired / fetched instructions: 0.25
*** branch count: 330
*** scalar load/store stall rate: 1.50 stalls per-instruction
*** stall cycles:
    SC: 14% (172)
    FE: 14% (172)
*** instructions per cycle:
    lsu0.eiq: 0.28 (327)
    lsu0.pipe: 0.28 (327)
    lsu0.wbq: 0.00 (0)
    alu0.eiq: 0.29 (338)
    alu0.pipe: 0.29 (341)
    alu0.wbq: 0.29 (341)
    branch0.eiq: 0.28 (328)
    branch0.pipe: 0.28 (330)
    branch0.wbq: 0.00 (1)
    csr0.eiq: 0.00 (0)
    csr0.pipe: 0.00 (2)
    csr0.wbq: 0.00 (2)
    S: 0.85 (1000)
    V: 0.00 (0)
    FE: 3.37 (3952)
*** utilization:
    lsu0.eiq: 38% (327)
    lsu0.pipe: 56% (327)
    lsu0.wbq: 0% (0)
    alu0.eiq: 37% (338)
    alu0.pipe: 29% (341)
    alu0.wbq: 15% (341)
    branch0.eiq: 98% (328)
    branch0.pipe: 28% (330)
    branch0.wbq: 0% (1)
    csr0.eiq: 0% (0)
    csr0.pipe: 0% (2)
    csr0.wbq: 0% (2)
    S: 53% (1000)
    V: 0% (0)
    FE: 60% (3952)
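To use these numbers in scripts, the top-level metrics can be pulled out of the report text. The sketch below is an illustration rather than a TBM feature: it assumes the report file places each top-level metric on its own line starting with `***`, and the names `METRIC_LINE` and `parse_summary` are hypothetical. As a sanity check, it recomputes the instructions-per-cycle figure from the retired instruction count and the cycle count reported above.

```python
import re

# Excerpt of the report above, assumed to be laid out one metric per line.
SAMPLE_REPORT = """\
*** cycles: 1171
*** retired instructions per cycle: 0.85 (1000)
*** retired / fetched instructions: 0.25
*** branch count: 330
"""

# Matches "*** <name>: <number>" with an optional "(<count>)" suffix.
METRIC_LINE = re.compile(
    r"^\*\*\*\s+(?P<name>[^:]+):\s+(?P<value>[\d.]+)(?:\s+\((?P<count>\d+)\))?"
)


def parse_summary(text):
    """Collect the top-level numeric metrics into a dict keyed by name."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match["name"].strip()] = {
                "value": float(match["value"]),
                "count": int(match["count"]) if match["count"] else None,
            }
    return metrics


summary = parse_summary(SAMPLE_REPORT)
cycles = summary["cycles"]["value"]
retired = summary["retired instructions per cycle"]["count"]

# Cross-check: retired instructions / cycles should match the reported IPC.
ipc = retired / cycles
print(f"recomputed IPC: {ipc:.2f}")  # 1000 / 1171, prints 0.85
```

Section headers without a value on the same line, such as `*** stall cycles:`, are skipped by this pattern; the per-unit breakdowns would need a separate pass.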
Automated HW/SW co-design benchmarking
This workflow can be further automated with a CI pipeline that simulates your hardware using a Renode script and immediately generates a report from the execution:
apt update
apt install -y python3 python3-pip git wget cmake
cd ~
git clone https://github.com/google/flatbuffers.git
cd flatbuffers/
cmake -G "Unix Makefiles"
make -j
cd ~
wget https://dl.antmicro.com/projects/renode/builds/renode-latest.linux-portable.tar.gz
mkdir -p renode-portable
tar -zxf renode-latest.linux-portable.tar.gz -C ~/renode-portable --strip-components=1
~/renode-portable/renode \
    --console \
    --disable-xwt \
    -e 'i @scripts/single-node/hifive_unmatched-tbm.resc' \
    -e 'emulation RunFor "0.00001"' \
    -e 'q'
git clone https://github.com/AmbiML/trace-based-model.git
cd ~/trace-based-model/
pip install -r requirements.txt
~/flatbuffers/flatc -o tbm --python config/instruction.fbs
python3 tbm/tbm.py -u config/rvv-simple.yaml --print-trace detailed --report-dont-include-cfg --report out_renode_tbm_report.txt --verbose ~/renode-portable/renode.trace
cat out_renode_tbm_report.txt
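Printing the report is enough for manual inspection, but a pipeline usually also needs a pass/fail decision. One possible approach, sketched below under stated assumptions, is to compare a few metrics against a stored baseline and fail the job when they get worse by more than a tolerance. The function `find_regressions`, the `HIGHER_IS_BETTER` policy table, and the 5% tolerance are all hypothetical. The baseline values come from the report above, while the `current` values are made-up numbers used only to exercise the check.

```python
# Hypothetical policy: which direction counts as an improvement per metric.
HIGHER_IS_BETTER = {
    "retired instructions per cycle": True,
    "cycles": False,
}


def find_regressions(baseline, current, tolerance=0.05):
    """Return messages for metrics that got worse by more than `tolerance`."""
    regressions = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        old, new = baseline[name], current[name]
        # Relative change, signed so that a positive value means "worse".
        if higher_is_better:
            change = (old - new) / old
        else:
            change = (new - old) / old
        if change > tolerance:
            regressions.append(f"{name}: {old} -> {new} ({change:.1%} worse)")
    return regressions


# Baseline taken from the report above; "current" is illustrative only.
baseline = {"cycles": 1171, "retired instructions per cycle": 0.85}
current = {"cycles": 1300, "retired instructions per cycle": 0.77}

problems = find_regressions(baseline, current)
for line in problems:
    print(line)
# In CI, a non-empty list would typically map to a non-zero exit code,
# e.g. sys.exit(1 if problems else 0).
```

Using a relative tolerance rather than exact equality leaves room for small, intentional changes in the software under test without failing every run.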
Renode’s advanced tracing capabilities
Renode’s tracing capabilities are not limited to TBM-compatible data. Some features provide additional data during the simulation itself, e.g., logging executed function names with `cpu LogFunctionNames true` or logging peripheral accesses with `sysbus LogPeripheralAccess <peripheral-name> true`. In pre-silicon development, you can generate Renode traces compatible with the RISC-V DV framework for SoC design and verification. You can also use Renode’s built-in Metrics Analyzer to view metrics such as executed instructions, memory accesses, or exceptions in the form of an easy-to-read graph.
The main execution tracing functionality in Renode, enabled with `EnableExecutionTracing`, supports several modes that let you track information such as program counter values or executed opcodes. Opcodes can be converted to human-readable instruction names, including all their arguments, using Renode’s built-in LLVM-based disassembler.
Automate HW/SW benchmarking with Renode
Renode is a flexible tool that can be easily integrated into an existing workflow to develop both hardware and software in a repeatable, simulated environment. If you are interested in simulating your next hardware design and testing its performance in an automated CI pipeline, contact Antmicro at contact@antmicro.com.