By: Charlie Cheng, Managing Director of Polyhedron LLC
Andes Technology Corp. was founded in 2004 and is headquartered in Taiwan, with a significant presence in North America. The company is publicly traded in both Taiwan and Europe and is a pure-play microprocessor licensor, similar to ARM or, previously, MIPS. A founding Premier Member of RISC-V International, Andes adopted the RISC-V instruction set. In 2018, the company went public on the strength of its RISC-V adoption and revenue momentum, and in 2019 it became the first to deploy a RISC-V processor with the Vector Extension, which gained a foothold in machine learning, initially at a hyperscaler. A hyperscaler is vertically integrated, so its private cloud is often the first mover in new applications such as ML.
Since 2019, Andes’ ML design-win momentum has grown steadily, and today one-third of Andes’ revenue comes from machine learning. To put that in perspective, this is about twice the company’s total revenue in 2018, the year it went public. It reflects the scale of growth in the machine learning business since 2019, which now spans the full gamut of ML applications, from social media at hyperscalers in the cloud all the way down to keyword detection in earbuds.
As large language models become increasingly popular, the hyperscaler side of the business will likely continue to grow, with a corresponding increase in the number of design wins. According to research published since 2021 for ARK ETF Trust’s ARKK, an ETF that tracks key technologies, the traditional server market may not grow much in the cloud. Instead, the computing growth will come on the accelerator side of the business, which is forecast to compound at 20 percent annually over 10 years.
Today, machine learning already comprises about 40 percent of the cloud computing workload, but most of this computing is still done on x86 servers, with accelerators handling only 5 to 10 percent of the machine learning workload, according to estimates by SemiAnalysis. The next seven years of growth in this area will be fascinating, and large language models will likely be its biggest driver. To put this in perspective, OpenAI spent about $400 million to create the GPT-3.5 model behind ChatGPT alone. Once deployed, a ChatGPT query costs about 1.5 cents, whereas a Google search currently costs about 0.16 cents, so a ChatGPT search costs roughly 10 times more than a regular one. To make advertising cost-effective, I believe computing and acceleration will need to perform the workload at 10 times lower cost for search companies and other AI applications to be profitable. This indicates how much more computing capacity is needed, and how many opportunities for acceleration and cost reduction are available to hardware.
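The back-of-envelope arithmetic behind that multiple, using the per-query figures above:

```latex
\frac{1.5\ \text{cents per ChatGPT query}}{0.16\ \text{cents per Google query}} \approx 9.4 \approx 10\times
```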
Where does RISC-V fit into this? Most licensees choose RISC-V for its efficiency compared to traditional microprocessors. Look at the pyramid of computing: caches respond in nanoseconds, DRAM in hundreds of nanoseconds, NAND flash in tens to hundreds of microseconds, and disk or network storage in milliseconds or more. A typical 4 GHz server core operates on a 250-picosecond cycle time while burning considerable power to get there, yet much of the ML data sits milliseconds away, leaving the cores idle waiting for data. Therefore, licensees want more efficient processors, not just higher-performance ones. When we compare RISC-V to ARM and x86, computing performance per watt is about 3 times better. A large part of this advantage comes from the instruction set architecture, and another part comes from the microarchitecture and the execution of the instruction set.
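A round-number illustration of the idle-core problem (the 1 ms access is an assumed figure for data sitting in remote storage, not a measurement):

```latex
t_{\text{cycle}} = \frac{1}{4\ \text{GHz}} = 0.25\ \text{ns},
\qquad
\frac{1\ \text{ms}}{0.25\ \text{ns}} = 4\times10^{6}\ \text{cycles stalled per access}
```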
We believe that the 3X advantage may grow into a 4X to 5X advantage as ARM and x86 continue to rely on traditional server computing architecture, while RISC-V becomes even more efficient for machine learning workloads. But as much as efficiency is the big reason licensees choose RISC-V, what makes them stay, and gets them excited, is that the RISC-V instruction set allows users to add instructions. Andes can attest to this with actual user data: 100 percent of our machine learning licensees add instructions to their baseline RISC-V processors.
To understand why, we need to look at machine learning today. Besides proprietary stacks such as NVIDIA’s CUDA, there are tens of frameworks for developers; TensorFlow was the most notable five years ago, and PyTorch and ONNX are the most popular now. Models developed in these frameworks eventually get mapped onto hardware platforms, from traditional x86 servers to exotics such as neuromorphic and optical computing. This maze is very complex, and it explains why x86 is so dominant in the cloud and why ARM is the platform of choice for mobile devices: it is hard for any model developer to cover all of these frameworks and computing models.
In a short time, various participants in the machine learning community have come up with solutions that reduce this complex maze of ML frameworks and hardware platforms, such as Glow and TVM. These machine learning compilers are designed as backends for high-level frameworks like PyTorch and TensorFlow. They translate framework-level models into an intermediate representation, and then use traditional compilers or native code generators to target specific hardware platforms, as the sketch below illustrates. This has reduced the amount of work design teams need to do to take an AI model from framework to hardware.
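As a concrete illustration, here is a minimal, representative sketch (not actual Glow or TVM output) of what a framework-level matrix-multiply node looks like after lowering: a plain loop nest that the backend code generator can then map onto vector or accelerator instructions for a specific target.

```c
// Representative lowered kernel: a framework matmul node reduced to a
// plain C loop nest. C[m x n] = A[m x k] * B[k x n], row-major layout.
void matmul_lowered(const float *A, const float *B, float *C,
                    int m, int n, int k) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}
```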
However, we can still improve, because many of these new hardware platforms embed RISC-V cores. So the question becomes: how can RISC-V encapsulate these accelerator functions in a way that makes it easy for ML compilers to generate machine code and optimize for them? What we have been working on and delivering is a layer of abstraction for this hardware, expressing each accelerator function as a RISC-V instruction using the user-defined opcode space in the RISC-V ISA. This way, software such as Glow and TVM doesn’t have to know the specific underlying hardware. It sees only RISC-V instructions, including some created by the hardware vendors using RISC-V’s instruction extensions. These added instructions are often intermixed with the baseline RISC-V Vector Extension (RVV) instructions. Andes provides the high-performance baseline processors, built specifically for machine learning applications. The sketch below shows how such an instruction mix might look.
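This is a minimal sketch, not Andes’ actual interface: the inner loop runs on standard RVV intrinsics, and one hypothetical accelerator operation is exposed as an instruction in the RISC-V custom-0 opcode space via the GNU assembler’s `.insn` directive, so a compiler backend sees nothing but RISC-V instructions. The `acc_submit` name and its encoding (funct3 = 0, funct7 = 0) are illustrative assumptions.

```c
#include <riscv_vector.h>   // RVV 1.0 intrinsics (compile with -march=rv64gcv)
#include <stddef.h>

// Hypothetical accelerator operation exposed as a user-defined instruction
// in the custom-0 opcode space (0x0B). rs1 carries a descriptor pointer;
// rd is x0 because the instruction returns nothing.
static inline void acc_submit(const void *desc) {
    __asm__ volatile(".insn r 0x0B, 0, 0, x0, %0, x0" :: "r"(desc) : "memory");
}

// Dot product on standard RVV instructions, followed by a hand-off to the
// accelerator through the custom instruction above.
float dot_then_submit(const float *a, const float *b, size_t n,
                      const void *desc) {
    size_t vlmax = __riscv_vsetvlmax_e32m1();
    vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(0.0f, vlmax);
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a + i, vl);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + i, vl);
        // Tail-undisturbed multiply-accumulate keeps inactive lanes intact.
        acc = __riscv_vfmacc_vv_f32m1_tu(acc, va, vb, vl);
        i += vl;
    }
    vfloat32m1_t zero = __riscv_vfmv_s_f_f32m1(0.0f, vlmax);
    vfloat32m1_t sum = __riscv_vfredusum_vs_f32m1_f32m1(acc, zero, vlmax);
    acc_submit(desc);               // custom instruction intermixed with RVV
    return __riscv_vfmv_f_s_f32m1_f32(sum);
}
```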
To facilitate and ease this development, Andes automates the addition of instructions, and, uniquely, of ML accelerator instructions, which often involve memory movement. This allows the underlying hardware developers to easily generate instructions that link to accelerator hardware, from traditional designs all the way to the most novel ones, such as neuromorphic computing.
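A sketch of what such a generated memory-movement instruction might look like on the software side; the instruction itself, its custom-1 encoding, and the descriptor layout are all hypothetical:

```c
// Hypothetical generated wrapper: a custom-1 (0x2B) instruction that moves a
// tile of data between main memory and accelerator-local memory in a single
// instruction, instead of a sequence of MMIO writes. Encoding is illustrative.
static inline void acc_tile_move(const void *src, unsigned long dst_desc) {
    __asm__ volatile(".insn r 0x2B, 0, 0, x0, %0, %1"
                     :: "r"(src), "r"(dst_desc)
                     : "memory");
}
```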
In parallel, as an ongoing process, Andes continues to drive its roadmap to optimize the interaction so that the RISC-V core and the accelerators work together seamlessly. For example, as model developers experiment with data representations, data must often be translated back and forth between the accelerator and the RISC-V RVV processor. Andes has made that translation native to how its RISC-V custom instructions work. I believe these improvements are why RISC-V will, over time, become a critical part of machine learning: it eases the development of these highly specialized, highly efficient accelerators.
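As one example of the kind of representation translation involved, here is a minimal sketch that narrows fp32 activations to the fp16 format an accelerator might consume, using standard RVV narrowing-conversion instructions; it assumes a core implementing the Zvfh half-precision vector extension.

```c
#include <riscv_vector.h>   // requires the Zvfh extension for fp16 vectors
#include <stddef.h>

// Narrow fp32 data to fp16 on the RVV side before handing it to an
// accelerator that computes in half precision.
void f32_to_f16(const float *src, _Float16 *dst, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m2(n - i);
        vfloat32m2_t v32 = __riscv_vle32_v_f32m2(src + i, vl);
        vfloat16m1_t v16 = __riscv_vfncvt_f_f_w_f16m1(v32, vl); // fp32 -> fp16
        __riscv_vse16_v_f16m1(dst + i, v16, vl);
        i += vl;
    }
}
```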
Machine learning AI is going to have a huge footprint in the future, not just in the cloud but also at the edge, down to something as small as an earbud. There is an insatiable appetite for performance and power efficiency to improve.
The tool vendors, software and middleware providers, and AI operators will continue to refine and evolve their framework compilers and tools. RISC-V is an incredibly good microprocessor for bridging accelerators and these machine learning frameworks and compilers. Andes is very focused on enabling the most efficient and productive RISC-V platforms for these applications.