The landscape of Machine Learning software libraries and models is evolving rapidly, and to satisfy the ever-increasing demand for memory and compute while managing latency, power and security considerations, hardware must be developed in an iterative process alongside the workloads it is meant to run.
With its open architecture, custom instruction support and flexible vector extensions, the RISC-V ISA offers unprecedented capacity for such co-design. By energizing the open hardware ecosystem, RISC-V has also supercharged research and innovation into improving chipmaking itself to better leverage the methods and suit the needs of software. Initiatives such as Google's OpenMPW Shuttle show how a more open and software-focused approach to building hardware (one that Antmicro firmly believes in and helps Google push forward) is key to enabling a new wave of more powerful and transparent ML-focused solutions.
A RISC-V-based ML accelerator with a HW/SW co-design flow
In the past months, Antmicro has joined forces with Google Research to work on a silicon project that is not only interesting in itself but can also serve as a template for efficient hardware-software co-design. For their secure ML solution, the Google Research team, supported by Antmicro, has been developing a completely open source, rapid pre-silicon ML development flow using Renode, Antmicro's open source simulation framework.
This builds on the results of our cooperation from last year, in which Antmicro implemented Renode support for the RISC-V Vector extensions used in the Google team's RISC-V based ML accelerator, codenamed Springbok. To enable a more well-rounded developer experience, as part of the project we are also improving support for the underlying SoC, along with a large number of user-oriented features such as OS-aware debugging, performance optimizations, payload profiling and performance measurement capabilities.
Springbok is part of Google's AmbiML project, which aims to create an open source ML development ecosystem centered on privacy and security. By using the RISC-V Vector extensions, the Google Research team has a standard but flexible way to parallelize the matrix multiply and accumulate operations that are universal in ML payloads. Thanks to Renode, the team can make informed choices about how exactly to leverage RISC-V's flexibility, analyzing the tradeoffs between speed, complexity and specialization in a practical, iterative fashion. The data generated by Renode, combined with its text-based configuration capabilities, lets them experiment with hardware composition and functionality in a matter of minutes, not days.
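To give an idea of what that text-based flow looks like, Renode describes a platform in a simple .repl file; the snippet below is a minimal, hypothetical sketch of a vector-capable RISC-V platform (the addresses, names and ISA string are illustrative, not the actual Springbok configuration):

```
// platform.repl - a hypothetical minimal vector-capable platform;
// addresses, names and the ISA string are illustrative, not Springbok's
cpu: CPU.RiscV32 @ sysbus
    cpuType: "rv32imacv"
    timeProvider: clint

clint: IRQControllers.CoreLevelInterruptor @ sysbus 0x02000000
    frequency: 66000000
    [0, 1] -> cpu@[3, 7]

ram: Memory.MappedMemory @ sysbus 0x80000000
    size: 0x10000000
```

Changing the ISA string, swapping a peripheral or resizing memory is a one-line edit to such a file, which is what makes the minutes-not-days iteration loop possible.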
On the ML software side, the ecosystem revolves around IREE – Google’s research project developing an open source ML compiler and runtime for constrained devices, based on LLVM MLIR.
IREE allows you to load models from typical ML frameworks such as TensorFlow or TensorFlow Lite and convert them to an intermediate representation (MLIR), which then goes through graph-level optimizations and an LLVM compilation flow to produce a runtime best fitted to a specific target. When it comes to deploying models on target devices, IREE provides the following runtime APIs (a short sketch of the Python flow follows the list):
- C API
- Python API
- TFLite C API – a C API that follows the same conventions as TFLite for model loading, tensor management and inference invocation
Using the above-mentioned runtimes, the model can be deployed, tested, debugged, benchmarked and executed on a target device or in a simulation environment like Renode.
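As a rough sketch of the compile-and-run loop using the Python API (names follow IREE's Python packages at the time of writing and may have shifted since; the MLIR input and entry function here are made up for illustration, while a real flow would import them from TensorFlow or TFLite):

```python
import iree.compiler as ireec
import iree.runtime as ireert
import numpy as np

# Illustrative MLIR input - a real flow would obtain this from a
# TensorFlow/TFLite importer rather than writing it by hand.
MLIR_MODULE = """
func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

# Compile for a CPU target; other backends generate code for other targets.
flatbuffer = ireec.compile_str(MLIR_MODULE, target_backends=["llvm-cpu"])

# Load the compiled module into the runtime and invoke the entry function.
config = ireert.Config("local-task")
ctx = ireert.SystemContext(config=config)
vm_module = ireert.VmModule.from_flatbuffer(ctx.instance, flatbuffer)
ctx.add_vm_module(vm_module)
simple_mul = ctx.modules.module["simple_mul"]
result = simple_mul(np.full(4, 2.0, np.float32), np.full(4, 3.0, np.float32))
print(result.to_host())  # expected: [6. 6. 6. 6.]
```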
Demoing the flow at Spring 2022 RISC-V Week
In the run-up to the Spring 2022 RISC-V Week in Paris, the first such large open hardware meeting in years, an initial version of the AmbiML bare metal ML flow was released as open source. This includes both the ability to run interactively and an example CI setup using our Renode GitHub Action, showing how such a workflow can be tested automatically on each commit. As a Google Cloud partner, we're currently working with Google Cloud to make Renode available for massive-scale CI testing and deployments in scenarios similar to this one.
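For reference, a minimal CI workflow along these lines could look roughly like the hypothetical sketch below (the action version, inputs and test path are illustrative; see the action's repository for its actual interface):

```yaml
# .github/workflows/test.yml - hypothetical example workflow
name: Renode tests
on: [push, pull_request]
jobs:
  renode:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run Robot tests in Renode
        uses: antmicro/renode-test-action@v3
        with:
          renode-version: '1.13.0'
          tests-to-run: 'tests/springbok.robot'
```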
In a joint talk by Google and Antmicro at the Paris event, we presented the software co-development flow, together with a demo of a heterogeneous multi-core solution, with one core running the AmbiML Springbok payload and another core running Zephyr.
In the presented scenario, the Springbok core, acting as an ML compute offload unit for the main CPU, executed inference on the MobileNet v1 network and reported the work done to the application core via a RISC-V custom instruction. Adding and modifying custom instructions is trivial in Renode: they can be defined with a single line of Python or C#, or even co-simulated in RTL.
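For instance, a custom instruction can be registered from the Renode monitor by giving a bit pattern (with `-` marking don't-care bits) and a Python handler; the sketch below is a made-up example using the RISC-V custom-0 opcode space, not the actual Springbok encoding:

```
# hypothetical custom instruction in the custom-0 (0001011) opcode space
(machine-0) sysbus.cpu InstallCustomInstructionHandlerFromString "0000000----------000-----0001011" "cpu.DebugLog('ML offload done, notifying the application core')"
```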
Renode helps ML developers and silicon designers not only run and test their solutions, but also learn more about what their software is actually doing. As part of the Paris demonstration, we showed how you can count executed instructions and measure how often specific opcodes are used to gauge how well your solution is performing. These features, accompanied by execution metrics analysis, executed-function logging and the recently developed execution trace generation, give you great insight into every detail of your emulated ML environment.
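These measurements are available directly from the Renode monitor; the commands below show the general idea (command names follow recent Renode releases and may differ slightly between versions):

```
# enable per-opcode counters on the core, then run the payload
(machine-0) sysbus.cpu EnableOpcodesCounting true
(machine-0) start
# total executed instructions and per-opcode statistics
(machine-0) sysbus.cpu ExecutedInstructions
(machine-0) sysbus.cpu GetAllOpcodesCounters
```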
These capabilities join the wide arsenal of hardware/software co-development solutions in Renode, such as RTL co-simulation, which we have been developing with Microchip, and support for verilated custom instructions, developed with another ML-focused Google team responsible for RISC-V Custom Function Units and also used in the EU-funded VEDLIoT project.
Future plans
This is just the beginning of a wider effort by the Google Research team we're working with to release software and hardware components, as well as tools supporting a collaborative co-design ecosystem for secure ML development, and we're excited to be participating in that journey. If you think Renode, RISC-V and co-development could help in building your next ML-focused product, reach out to us at contact@antmicro.com to see how we can assist you in adopting the open source components, flows and methods we're using here.