Spanish RISC-V IP developer Semidynamics has benchmarked the performance of its Tensor Unit running the Llama 2 7B-parameter Large Language Model (LLM) on its 'All-In-One' RISC-V AI IP core.
Semidynamics ran the full Llama 2 7B-parameter model (BF16 weights) on its All-In-One element, using its ONNX Runtime Execution Provider, and measured the utilization of the Tensor Unit for all the matrix-multiplication layers in the model.
The benchmarking demonstrates the combination of the Tensor Unit with the Gazzillion streaming data-management IP. This pairing is key for LLMs, whose transformer networks are memory bound. The results show Tensor Unit utilization above 80% in most use cases, including sparse networks and irregular matrix shapes, regardless of matrix size, in sharp contrast with other architectures.
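To see why transformer inference is memory bound and what "utilization" measures here, a back-of-the-envelope roofline calculation helps. The sketch below is illustrative only: the matrix sizes are generic Llama 2-style dimensions, not Semidynamics' measured figures, and the utilization formula is the standard achieved-vs-peak-throughput ratio, not a description of their internal methodology.

```python
# Illustrative roofline arithmetic for LLM matmul layers.
# All numbers are hypothetical examples, not Semidynamics benchmark data.

def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matrix multiply."""
    return 2 * m * n * k

def matmul_bytes(m: int, n: int, k: int, bytes_per_elem: int = 2) -> int:
    """Bytes moved if each operand and the result cross memory once.
    BF16 weights -> 2 bytes per element."""
    return bytes_per_elem * (m * k + k * n + m * n)

def arithmetic_intensity(m: int, n: int, k: int) -> float:
    """FLOPs per byte of memory traffic: low values mean memory bound."""
    return matmul_flops(m, n, k) / matmul_bytes(m, n, k)

def utilization(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Fraction of the tensor unit's peak throughput actually achieved."""
    return achieved_flops_per_s / peak_flops_per_s

# Generating one token at a time turns each layer into a matrix-vector
# product (m = 1), so arithmetic intensity is ~1 FLOP/byte: the kernel
# is limited by how fast weights stream from memory, which is why
# latency-hiding data movement (what Gazzillion targets) matters.
print(arithmetic_intensity(1, 4096, 4096))      # ~1.0 FLOP/byte
# A large square matmul (e.g. prompt prefill) is compute bound instead:
print(arithmetic_intensity(4096, 4096, 4096))   # ~1365 FLOP/byte
```

Under this model, sustaining above-80% utilization on the m = 1 shapes is the hard part: the compute units can only stay busy if the memory system keeps delivering weight data without stalls.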