Enhancing Commercial Software Adaptation with XuanTie Optimized Computing Libraries

March 28, 2025

By Yunfei Zhou, Alibaba DAMO Academy

1. Introduction

The RISC-V architecture has matured steadily. Its open, flexible, and extensible design shows great promise in areas such as graphics, imaging, audio and video encoding/decoding, scientific computing, and machine learning, and its software ecosystem continues to grow. As more commercial software is ported to RISC-V, that software places higher performance demands on the underlying libraries and the hardware they run on. The XuanTie team combines the features of the XuanTie RISC-V CPU microarchitecture with the RISC-V vector extension instructions, and analyzes performance in terms of code size, memory access, and instruction pipelining. This work has produced a series of XuanTie optimized computing libraries.

2. RISC-V Optimization Overview

RISC-V is an open instruction set architecture (ISA) built on reduced instruction set (RISC) principles. This design simplifies hardware implementation and leaves many opportunities for software optimization.
One key advantage is the RISC-V Vector extension (RVV). Unlike fixed-width SIMD instruction sets, RVV is vector-length agnostic: code requests a vector length at run time with the vsetvli instruction and processes however many elements the hardware grants. As a result, software libraries optimized with RVV can run unmodified on implementations with different vector register widths.
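To make the vector-length-agnostic idea concrete, here is a minimal sketch in portable C of the strip-mining loop pattern that RVV code follows. It does not use real RVV instructions: `simulated_setvl` is a hypothetical stand-in for `vsetvli` that simply grants min(remaining, vlmax) elements (real hardware is allowed to split the last two passes differently), and the inner scalar loop stands in for a vector load, add, and store. On actual RVV hardware the corresponding intrinsics would be `__riscv_vsetvl_e32m1`, `__riscv_vle32_v_f32m1`, `__riscv_vfadd_vv_f32m1`, and `__riscv_vse32_v_f32m1`.

```c
#include <stddef.h>

/* Hypothetical stand-in for RVV's vsetvli: grant at most vlmax elements.
 * On real hardware vlmax is fixed by the implementation's VLEN; here it is
 * a parameter so one loop can be exercised with several vector widths. */
static size_t simulated_setvl(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* Strip-mined, vector-length-agnostic add: c[i] = a[i] + b[i].
 * Each pass handles vl elements, so no separate scalar tail loop is needed
 * and the code never hard-codes a vector width. */
void vla_add_f32(const float *a, const float *b, float *c,
                 size_t n, size_t vlmax) {
    while (n > 0) {
        size_t vl = simulated_setvl(n, vlmax); /* cf. vsetvli vl, n, e32, m1 */
        for (size_t i = 0; i < vl; i++)        /* cf. vle32 / vfadd / vse32 */
            c[i] = a[i] + b[i];
        a += vl;
        b += vl;
        c += vl;
        n -= vl;
    }
}
```

Calling `vla_add_f32(a, b, c, 10, 4)` and `vla_add_f32(a, b, c, 10, 8)` produces identical results; only the number of passes changes. That is the property that lets one binary run on cores with different vector lengths.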

The XuanTie team focuses on several optimization aspects:

  • Performance Analysis: We analyze different application scenarios and libraries to identify bottlenecks and develop optimization strategies.
  • Computational Optimization: We identify compute-intensive hotspots and, where possible, vectorize them with RVV instructions.
  • Memory Access Optimization: We study memory bottlenecks, reduce cache miss rates, and apply address alignment and data prefetching.
  • Front-end Optimization: We improve branch prediction, reduce the number of branches, align branch targets, and reorder instructions to suit the pipeline.
  • Compiler Options Optimization: We tune compiler options, from optimization levels such as -O2 and -O3 to flags that control auto-vectorization, loop unrolling, and code alignment.
  • RISC-V Extension Instructions: The architecture offers many optional extensions. For example, the Zc extensions reduce code size, while the Zb (bit-manipulation) extensions speed up bitwise operations.
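As an illustration of the front-end item above (reducing branch counts), the sketch below contrasts a branchy clamp to the 0-255 pixel range with a version written as min/max. This is a generic technique, not code from the XuanTie libraries, and the function names are hypothetical. Whether the min/max form actually compiles to branch-free code, and whether it uses the Zbb `min`/`max` instructions, depends on the compiler and the target flags (for example, a `-march` string that includes `zbb`).

```c
#include <stdint.h>

/* Branchy version: up to two data-dependent branches per pixel, which
 * tend to mispredict on noisy image data. */
uint8_t clamp_u8_branchy(int32_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Branch-reduced version expressed as max then min. Optimizing compilers
 * can lower these selects without branches; with Zbb enabled they may use
 * its max/min instructions. */
static inline int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }
static inline int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }

uint8_t clamp_u8_minmax(int32_t v) {
    return (uint8_t)min_i32(max_i32(v, 0), 255);
}

/* Apply the clamp to a row; the loop body has no control flow, which
 * also makes it a candidate for auto-vectorization. */
void clamp_row(const int32_t *in, uint8_t *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = clamp_u8_minmax(in[i]);
}
```

Both functions return the same values; the difference lies only in the instruction sequence the compiler can generate.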

3. XuanTie Optimized Computing Libraries Overview

The XuanTie optimized computing libraries are a collection built for the XuanTie RISC-V CPU. They are divided into five categories:

  • Basic Libraries
  • Graphics and Image Libraries
  • Compression Libraries
  • Encoding and Decoding Libraries
  • Scientific Computing Libraries

Each category offers efficient and stable solutions that fully leverage the XuanTie CPU’s performance.

3.1 Basic Libraries

For the basic C library, XuanTie has deeply optimized the string-handling and math functions inherited from the existing Bionic library, improving both their speed and efficiency. For resource-constrained hardware, the team also developed Minilib, a low-footprint C library.
Minilib supports most standard C functions, such as string operations, memory management, and standard I/O, while maintaining strong performance. In code size, Minilib shows clear advantages over newlib-nano in several function categories: typical programs such as “helloworld” and Dhrystone are significantly smaller after linking.
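The string-function optimizations mentioned above are not detailed here. One widely used libc technique, shown as a hypothetical portable sketch rather than XuanTie's Bionic code, is word-at-a-time scanning: `strlen` tests eight bytes per iteration with a bit trick instead of one byte at a time. Like the generic C versions found in libc implementations, it reads whole aligned words that may extend past the terminator; an aligned word never crosses into an unmapped page, but strictly speaking ISO C does not define this and memory sanitizers may flag it. An RVV version would typically scan many bytes at once with fault-only-first loads (`vle8ff.v`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time strlen. For a 64-bit word w,
 * (w - 0x01..01) & ~w & 0x80..80 is nonzero iff some byte of w is zero. */
size_t strlen_word(const char *s) {
    const char *p = s;
    /* Advance byte by byte until p is 8-byte aligned, so the word loads
       below never straddle a page boundary. */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0') return (size_t)(p - s);
        p++;
    }
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);      /* aligned 8-byte load */
        if ((w - ones) & ~w & highs)  /* some byte in w is zero */
            break;
        p += 8;
    }
    /* Locate the exact zero byte inside the final word. */
    while (*p != '\0') p++;
    return (size_t)(p - s);
}
```

For short strings the alignment prologue dominates; the gain comes from long strings, where the loop inspects eight bytes per load instead of one.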

3.2 Graphics and Image Libraries

In the graphics and imaging domain, XuanTie analyzed and optimized graphics computation libraries and the related image encoding/decoding libraries. The team offers optimized versions of several libraries:

  • Skia: An open-source 2D graphics acceleration library. XuanTie has optimized its RGB-related functions.
  • CSI-2D: A proprietary 2D graphics acceleration library developed by XuanTie. It supports layer blending, color filling, and graphic transformations.
  • libjpeg: An open-source JPEG image library. XuanTie enhanced both its encoding and decoding performance.
  • libpng: An open-source PNG image library. XuanTie improved its decoding performance.

Performance data shows that libpng decoding is about 50% faster, while libjpeg encoding performance more than doubles and libjpeg decoding performance improves by around 50%.
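Layer blending, one of the CSI-2D features listed above, reduces to an independent per-pixel computation, which is why it vectorizes well. The following sketch shows 8-bit "source-over" blending with a per-pixel alpha. It is a generic illustration that assumes packed 3-byte RGB pixels, and the function names are hypothetical, not Skia or CSI-2D APIs. Division by 255 is replaced with an exact add-and-shift sequence, a common trick because integer division is typically slow on both scalar and vector units.

```c
#include <stdint.h>

/* Exact rounded division by 255 for 0 <= x <= 255*255, using only
 * adds and shifts. */
static inline uint32_t div255_round(uint32_t x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

/* Blend one 8-bit channel: round((src*alpha + dst*(255-alpha)) / 255). */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha) {
    uint32_t x = (uint32_t)src * alpha + (uint32_t)dst * (255u - alpha);
    return (uint8_t)div255_round(x);
}

/* Blend a row of packed RGB pixels (3 bytes each) onto dst in place, using
 * one alpha value per pixel. Every iteration is independent, so the loop
 * maps naturally onto vector lanes. */
void blend_rgb_row(const uint8_t *src, const uint8_t *alpha,
                   uint8_t *dst, int npixels) {
    for (int i = 0; i < npixels; i++)
        for (int c = 0; c < 3; c++)
            dst[3 * i + c] = blend_channel(src[3 * i + c], dst[3 * i + c],
                                           alpha[i]);
}
```

With alpha 255 the source pixel replaces the destination, with alpha 0 the destination is unchanged, and `blend_channel(255, 0, 128)` yields 128.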

3.3 Compression Libraries

For compression and decompression, XuanTie optimized popular libraries such as zlib and zstd, making effective use of RISC-V vector instructions and parallel processing.

For zlib, the optimized version shows significant performance gains. The improvement is especially notable in the no-compression (store-only) mode, where data is only packaged into the zlib format and memory performance dominates. When compressing, parallel processing accelerates hash calculations. Overall, decompression performance increases by more than 70%.
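The source does not say which zlib kernels were vectorized. As one example of a data-parallel hot spot in zlib, this sketch implements the Adler-32 checksum that the zlib format appends to the stream and that decompression verifies (RFC 1950). It follows the same deferred-modulo structure as zlib's own scalar code: the costly `% 65521` is applied once per 5552-byte block, the largest block for which the 32-bit sums cannot overflow. `adler32_ref` is a hypothetical name chosen to avoid clashing with zlib's real `adler32()`.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u  /* max bytes per block before the sums could overflow */

/* Adler-32 (RFC 1950), updating a running checksum. Start with adler = 1.
 * The modulo is deferred to once per NMAX-byte block; the inner loop is a
 * plain running sum, which is the shape that SIMD/RVV versions exploit. */
uint32_t adler32_ref(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xffffu;
    uint32_t b = adler >> 16;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_MOD;
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For example, `adler32_ref(1, (const unsigned char *)"Wikipedia", 9)` returns `0x11E60398`. Because the function takes the running value as input, a buffer can also be checksummed in pieces.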

3.4 Encoding and Decoding Libraries

XuanTie also optimized encoding and decoding libraries for audio and video. The team analyzed popular formats in depth and improved both the encoding and decoding processes.

  • For audio, XuanTie accelerated decoding for formats such as MP3, AAC, Ogg, and AMR. Because audio libraries often run on resource-limited hardware, XuanTie applied DSP acceleration instructions, greatly improving efficiency and real-time performance. On the XuanTie E906 platform, the CPU load of every audio decoder stays below 50 MHz.
  • For video, particularly H.264 decoding, XuanTie combines RISC-V vector instructions, parallel processing, and algorithm-specific optimizations. In normalized performance comparisons across different resolutions, H.264 decoding performance increases by 50% to 85%.
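To show what a vectorizable decoder kernel looks like, here is a scalar sketch of H.264 luma half-sample interpolation: it applies the standard's 6-tap filter (1, -5, 20, 20, -5, 1), rounds with (x + 16) >> 5, and clips to 8 bits. It is not XuanTie's implementation, and the function names are hypothetical; it only illustrates why such kernels suit RVV, since every output sample is computed independently from a sliding window of inputs.

```c
#include <stdint.h>

/* Clip to the 8-bit sample range [0, 255]. */
static inline uint8_t clip_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Half-sample value between p[2] and p[3] using the 6-tap filter
 * (1, -5, 20, 20, -5, 1), then (x + 16) >> 5 and clipping. */
static inline uint8_t h264_halfpel(const uint8_t *p) {
    int x = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    x += 16;
    if (x < 0) return 0;   /* avoid right-shifting a negative value */
    return clip_u8(x >> 5);
}

/* Horizontal half-sample row: out[i] lies between src[i] and src[i + 1].
 * src must be readable from src - 2 through src + width + 2. */
void h264_halfpel_row(const uint8_t *src, uint8_t *out, int width) {
    for (int i = 0; i < width; i++)
        out[i] = h264_halfpel(src + i - 2);
}
```

On a linear ramp 0, 32, 64, 96, 128, 160 the filter returns 80, the midpoint of 64 and 96; on a flat region it reproduces the input value.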

3.5 Scientific Computing Libraries

In scientific computing, XuanTie examined various mathematical operations, linear algebra, and numerical simulations. Their optimized libraries include:

  • OpenBLAS: A high-performance BLAS library used for matrix and linear algebra computations. Matrix operators are heavily optimized.
  • Eigen: A fast, modular C++ library for linear algebra, matrix/vector operations, numerical solutions, and geometry. Matrix operations are a key optimization target.
  • OpenCV: An open-source computer vision library used for image processing, video analysis, machine learning, and computer vision. Some commonly used operators have been accelerated.

Matrix multiplication routines from the BLAS interface, such as sgemm (single precision) and dgemm (double precision), are standard benchmarks. XuanTie's comparison data for these benchmarks clearly illustrates Eigen's performance improvements.
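For reference, sgemm computes C = alpha * A * B + beta * C. The sketch below is a plain C version of that operation, simplified to row-major, non-transposed, densely packed matrices; the real BLAS interface (for example `cblas_sgemm`) also takes a storage layout, transpose flags, and leading dimensions. `sgemm_ref` is a hypothetical name, and optimized libraries such as OpenBLAS layer cache blocking, data packing, and vectorized micro-kernels on top of this basic loop nest.

```c
#include <stddef.h>

/* Reference single-precision GEMM: C = alpha * A * B + beta * C.
 * A is m x k, B is k x n, C is m x n; all are row-major and dense.
 * The i-k-j loop order keeps the innermost accesses to B and C
 * contiguous, the usual starting point for vectorization. */
void sgemm_ref(size_t m, size_t n, size_t k, float alpha,
               const float *A, const float *B, float beta, float *C) {
    for (size_t i = 0; i < m; i++) {
        float *c_row = C + i * n;
        /* BLAS convention: beta == 0 means C is overwritten, not scaled. */
        for (size_t j = 0; j < n; j++)
            c_row[j] = (beta == 0.0f) ? 0.0f : beta * c_row[j];
        for (size_t p = 0; p < k; p++) {
            float a_ip = alpha * A[i * k + p];
            const float *b_row = B + p * n;
            for (size_t j = 0; j < n; j++)
                c_row[j] += a_ip * b_row[j];
        }
    }
}
```

A dgemm version is identical with `double` in place of `float`.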

4. Conclusion

Through continuous performance analysis and optimization on the RISC-V architecture, the XuanTie team has built a series of efficient computing libraries for various applications. The optimized libraries deliver significant performance gains and meet the high efficiency requirements of commercial software adaptation. Going forward, XuanTie will continue to address market needs by releasing more high-performance RISC-V computing library solutions, further strengthening the RISC-V ecosystem.