
Chromium Performance Optimization on XuanTie RISC-V Processors

January 8, 2025

Yang Li, Alibaba DAMO Academy

Chromium, the most widely adopted open-source browser engine, serves as the foundation for numerous mainstream applications, including Chrome, Electron, VSCode, and WebView. As a critical component of modern desktop and mobile systems, its performance has a profound impact on the overall user experience.

The rapidly advancing RISC-V architecture is a natural fit for Chromium. Developing RISC-V-compatible Chromium builds not only accelerates the convergence of open-source software and hardware but also takes advantage of RISC-V’s inherent flexibility and extensibility. These attributes enable developers to implement architecture-specific optimizations that significantly improve Chromium’s performance and efficiency. Such advancements contribute to the broader adoption and ecosystem growth of RISC-V in modern computing environments.

Since 2018, there have been efforts to add RISC-V support to Chromium. However, Chromium still does not officially support the RISC-V architecture. Community-built RISC-V versions of Chromium face several challenges, including slow webpage rendering, poor cloud application performance, unresponsive user interactions, and stuttering during video playback. These issues make it hard for such builds to meet production requirements.

Optimizing Chromium’s performance on RISC-V is therefore a critical goal, and a significant challenge on every end-user device platform. Achieving it plays a key role in driving the growth of the RISC-V ecosystem and is essential for making RISC-V-based systems more competitive.

The XuanTie team has introduced several improvements to Chromium. These efforts address issues related to builds and performance. As a result, users can now access a version with significantly better performance.

Analyzing the Chromium Architecture, Exploring Key Performance Optimization Paths

Chromium processes web content through several key stages. It uses an HTML parser and the V8 JavaScript engine to interpret webpage code. The Blink module handles rendering and drawing. The Viz compositor then combines these elements into a final output. Finally, the display component presents the rendered content to the user. Developers can write JavaScript code to add controls and animations to web pages. This makes it easy to create flexible and dynamic web applications.

As shown in the architecture diagram, Chromium encapsulates the core V8 JavaScript engine and the rendering pipeline at its lower levels, while at higher levels it provides extensive APIs to support a wide range of applications. This makes the JavaScript engine and the rendering pipeline the critical paths that determine Chromium’s overall performance. In addition, over 90% of Chromium’s core codebase is written in C++, which makes the choice of toolchain another key factor in its performance.

The XuanTie team conducted a detailed analysis of Chromium’s architecture and identified three critical modules within the system. This helped us define a clear path for performance optimization.

Chromium Performance Optimization Practices

We focused on optimizing the Clang toolchain, the V8 JavaScript engine, and the rendering pipeline. We made a series of adjustments and practical changes, resulting in significant improvements in performance tests.

1. Clang Toolchain Optimization

As shown in Figure 2, the team optimized the Clang toolchain by providing multiple compilation optimization options and adding support for the XuanTie extended instruction set, which improves the quality of the generated Chromium binary code. We also analyzed profiling data from typical Chromium use cases to identify performance bottlenecks in both the standard and extended instruction sets. We then enhanced the extended instruction set to address these bottlenecks, creating a continuous optimization cycle for Chromium and other software.
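The profiling step in this cycle starts with ranking where execution time goes. Below is a minimal sketch of that ranking, assuming a simplified, hypothetical sample format (a `Sample` struct holding only the name of the function that was running); real profiles, for example from `perf`, also carry call stacks and hardware-event counts, and this is not the XuanTie team's actual tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One profiler sample: the function that was on the CPU when the sample fired.
// Hypothetical format for illustration only.
struct Sample {
  std::string function;
};

struct Hotspot {
  std::string function;
  std::uint64_t samples;
  double percent;  // share of all samples, 0 to 100
};

// Aggregate samples per function and return the `n` hottest functions,
// sorted by sample count (ties broken by name so the output is deterministic).
std::vector<Hotspot> TopHotspots(const std::vector<Sample>& samples, std::size_t n) {
  std::map<std::string, std::uint64_t> counts;
  for (const Sample& s : samples) ++counts[s.function];

  std::vector<Hotspot> result;
  for (const auto& [name, count] : counts) {
    const double pct =
        100.0 * static_cast<double>(count) / static_cast<double>(samples.size());
    result.push_back({name, count, pct});
  }
  std::sort(result.begin(), result.end(), [](const Hotspot& a, const Hotspot& b) {
    return a.samples != b.samples ? a.samples > b.samples : a.function < b.function;
  });
  if (result.size() > n) result.resize(n);
  return result;
}
```

The functions at the top of such a list are the candidates to inspect at the instruction level, which is where gaps in the standard or extended instruction sets show up.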

We have also worked closely with the RISC-V community and are actively promoting the growth of the application ecosystem for both the generic RISC-V instruction sets and the extended instruction sets.

2. V8 JavaScript Engine Optimization

Early browsers could only render static HTML pages, which made for a rigid and unengaging user experience. The introduction of dynamically executed JavaScript (JS) revolutionized web browsers. JS engines such as V8 are responsible for executing these scripts. V8 combines an interpreter with a just-in-time (JIT) compiler, which compiles frequently executed JS code into machine code at runtime.

2.1 Optimization Plan

The interpreter first translates JS code into bytecode. It then uses profiling data to decide whether a JS function should be compiled into machine code or executed directly as bytecode. This profiling is performed dynamically at runtime. At startup, all functions are “cold.” The interpreter maps each bytecode to a static built-in function for execution. Over time, frequently called functions become “hot.” Once a function’s “temperature” exceeds a threshold, the profiling module flags it as a hotspot function. It then sends the function to the TurboFan compiler to generate optimized machine code. TurboFan applies techniques such as graph optimization, inlining, and instruction fusion to create efficient instructions and improve program performance.
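The cold-to-hot tier-up described above can be modeled with a per-function counter and a threshold. This is a deliberately simplified sketch: the class name, the `kHotThreshold` value, and the one-call-one-degree "temperature" rule are all hypothetical, and V8's real heuristics (based on execution budgets and type feedback) are more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Execution tiers in this simplified model.
enum class Tier { kInterpreted, kOptimized };

// Toy hotness-based tier-up: each call "warms" a function, and once its
// counter reaches kHotThreshold the function is flagged as a hotspot and
// (conceptually) handed to the optimizing compiler.
class TieringModel {
 public:
  static constexpr std::uint32_t kHotThreshold = 1000;  // hypothetical value

  // Record one call and return the tier the function runs in afterwards.
  Tier OnCall(const std::string& function) {
    State& st = functions_[function];
    if (st.tier == Tier::kInterpreted && ++st.temperature >= kHotThreshold) {
      st.tier = Tier::kOptimized;  // stand-in for "send to TurboFan"
    }
    return st.tier;
  }

  Tier TierOf(const std::string& function) const {
    auto it = functions_.find(function);
    return it == functions_.end() ? Tier::kInterpreted : it->second.tier;
  }

 private:
  struct State {
    std::uint32_t temperature = 0;  // all functions start "cold"
    Tier tier = Tier::kInterpreted;
  };
  std::unordered_map<std::string, State> functions_;
};
```

Until a function crosses the threshold, every call goes through bytecode handlers, which is why the interpreter's built-ins matter even in an engine with an optimizing JIT.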

The quality of the machine code generated by the JS engine directly determines the execution speed of JS code. To optimize this process on RISC-V, our team made several improvements to the TurboFan compiler backend. These efforts focused on four key areas:

  • Instruction Selection Optimization:

The team optimized instruction selection and fusion logic for the RV64 instruction set. These changes enabled the generation of more concise intermediate code on the RV64 architecture.

  • Instruction Execution Efficiency Optimization:

The team enhanced the RISC-V backend with support for the XuanTie extended instruction set. As a result, certain operations that previously required multiple standard RISC-V instructions can now be completed with a single extended instruction, which improves execution efficiency and reduces code size.

  • Interpreter Efficiency Optimization:

Non-hotspot functions in the V8 engine are translated into bytecode for execution. Each bytecode maps to a static built-in function. Some of these functions are generated by TurboFan, while others are manually written in assembly code. The team optimized both types of functions, improving the execution efficiency of built-in functions and accelerating the bytecode execution process.

  • Trampoline Mechanism Optimization:

The V8 engine uses a trampoline mechanism to share built-in functions across different browser processes. This mechanism lets code in heap-allocated regions transfer control to the shared built-in function regions. On RISC-V, however, it generates many indirect jump instructions, which leads to high branch miss rates and poor performance. The team redesigned the mechanism for RISC-V to replace most indirect jumps with direct jumps, improving both branch prediction accuracy and jump efficiency.
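To make instruction fusion and extended instructions concrete, here is a sketch of an instruction selector that fuses `x + (y << k)` into one instruction. The post does not say which patterns or XuanTie instructions the team used; as an illustration only, this assumes T-Head's XTheadBa instruction `th.addsl rd, rs1, rs2, imm2`, which computes `rs1 + (rs2 << imm2)` (the standard Zba extension has equivalent `sh1add`/`sh2add`/`sh3add` instructions). The IR types, helper names, and register assignment are hypothetical and are not V8's.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// A tiny expression graph, loosely in the spirit of a compiler's
// machine-level IR. All names here are hypothetical.
struct Node {
  enum class Op { kParam, kConst, kShl, kAdd } op;
  long value;                 // parameter index (kParam) or constant (kConst)
  std::shared_ptr<Node> lhs;  // first operand (null for leaves)
  std::shared_ptr<Node> rhs;  // second operand (null for leaves)
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Param(long i) { return std::make_shared<Node>(Node{Node::Op::kParam, i, nullptr, nullptr}); }
NodePtr Const(long v) { return std::make_shared<Node>(Node{Node::Op::kConst, v, nullptr, nullptr}); }
NodePtr Shl(NodePtr a, NodePtr b) { return std::make_shared<Node>(Node{Node::Op::kShl, 0, a, b}); }
NodePtr Add(NodePtr a, NodePtr b) { return std::make_shared<Node>(Node{Node::Op::kAdd, 0, a, b}); }

// Parameter i is assumed to live in argument register a<i>.
std::string Reg(const Node& n) {
  assert(n.op == Node::Op::kParam);
  return "a" + std::to_string(n.value);
}

// Select instructions for an Add node, writing the result to a0.
// Add(x, Shl(y, k)) with 1 <= k <= 3 becomes a single th.addsl when XTheadBa
// is available; otherwise it needs slli into a temporary followed by add.
std::vector<std::string> SelectAdd(const Node& add, bool has_xtheadba) {
  assert(add.op == Node::Op::kAdd);
  const std::string rx = Reg(*add.lhs);
  const Node& rhs = *add.rhs;
  if (rhs.op == Node::Op::kShl && rhs.rhs->op == Node::Op::kConst) {
    const std::string ry = Reg(*rhs.lhs);
    const long k = rhs.rhs->value;
    if (has_xtheadba && k >= 1 && k <= 3) {
      return {"th.addsl a0, " + rx + ", " + ry + ", " + std::to_string(k)};
    }
    return {"slli t0, " + ry + ", " + std::to_string(k), "add a0, " + rx + ", t0"};
  }
  return {"add a0, " + rx + ", " + Reg(rhs)};
}
```

For `Add(Param(1), Shl(Param(2), Const(3)))`, a typical array-element address computation, the selector emits one instruction with the extension and two without it, which is the kind of saving in instruction count and code size described above.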
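The statement that each bytecode maps to a static built-in function can be pictured as a dispatch table indexed by opcode. The toy interpreter below is only an illustration: its bytecodes, handler names, and operand layout are invented, whereas V8's real handlers are built-ins that are either generated by its own compiler or hand-written in assembly, as described above. The point it shows is that every executed bytecode runs exactly one handler, so making handlers faster speeds up all bytecode execution.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bytecodes for a tiny accumulator machine (not V8's bytecode set).
// Stream layout: an opcode followed by one operand, except kReturn.
enum Bytecode : std::int64_t { kLdaConst, kAddConst, kMulConst, kReturn, kBytecodeCount };

struct Frame {
  const std::vector<std::int64_t>& code;
  std::size_t pc = 0;    // index of the current opcode
  std::int64_t acc = 0;  // accumulator register
  bool done = false;
};

// One handler ("built-in") per bytecode.
void LdaConst(Frame& f) { f.acc = f.code[f.pc + 1]; f.pc += 2; }
void AddConst(Frame& f) { f.acc += f.code[f.pc + 1]; f.pc += 2; }
void MulConst(Frame& f) { f.acc *= f.code[f.pc + 1]; f.pc += 2; }
void Return(Frame& f) { f.done = true; }

using Handler = void (*)(Frame&);

// Dispatch table: bytecode value -> handler.
const std::array<Handler, static_cast<std::size_t>(kBytecodeCount)> kHandlers = {
    LdaConst, AddConst, MulConst, Return};

// Fetch each opcode and jump to its handler until kReturn is reached.
std::int64_t Run(const std::vector<std::int64_t>& code) {
  Frame f{code};
  while (!f.done) {
    const std::int64_t op = f.code[f.pc];
    assert(op >= 0 && op < kBytecodeCount);
    kHandlers[static_cast<std::size_t>(op)](f);
  }
  return f.acc;
}
```

For example, `Run({kLdaConst, 5, kAddConst, 3, kMulConst, 4, kReturn})` evaluates `(5 + 3) * 4` by running four handlers.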
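The post does not describe how the trampoline was redesigned. One constraint that any such design on RISC-V has to work around is reach: the direct `jal` instruction only covers offsets of about +/-1 MiB, so a jump whose target may lie anywhere tends to fall back to loading an address into a register and jumping through it. The sketch below assumes a hypothetical code generator choosing among three sequences by distance; the enum and function names are illustrative and not V8's. Note that `auipc` + `jalr` still ends in a `jalr`, but its target is a constant fixed at code-generation time rather than a value loaded from memory.

```cpp
#include <cassert>
#include <cstdint>

// Ways to transfer control from `pc` to `target` on RV64 (simplified).
enum class JumpKind {
  kJal,          // direct jump: 21-bit signed PC-relative immediate (about +/-1 MiB)
  kAuipcJalr,    // PC-relative pair with a fixed target (about +/-2 GiB)
  kLoadAndJalr,  // target loaded into a register from memory: a true indirect jump
};

// Pick the cheapest sequence that can reach `target`. Offsets are assumed to
// be 2-byte aligned, as code addresses are when the C extension is enabled.
JumpKind SelectJump(std::int64_t pc, std::int64_t target) {
  const std::int64_t offset = target - pc;
  // jal encodes imm[20:1]: even offsets in [-2^20, 2^20 - 2].
  if (offset >= -(std::int64_t{1} << 20) && offset < (std::int64_t{1} << 20) &&
      offset % 2 == 0) {
    return JumpKind::kJal;
  }
  // auipc adds a signed 20-bit hi20 << 12; jalr then adds a sign-extended
  // 12-bit lo12. Adding 0x800 before the (arithmetic) shift rounds hi20 so
  // that the remaining lo12 fits in [-2048, 2047].
  const std::int64_t hi20 = (offset + 0x800) >> 12;
  if (hi20 >= -(std::int64_t{1} << 19) && hi20 < (std::int64_t{1} << 19)) {
    return JumpKind::kAuipcJalr;
  }
  return JumpKind::kLoadAndJalr;
}
```

Under this model, keeping trampoline targets within direct-jump reach of their callers is what allows most indirect jumps to be replaced.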

2.2 Optimization Results

These targeted optimizations significantly enhanced the performance and efficiency of the V8 JS engine on RISC-V.

3. Rendering Pipeline Optimization

After a web page is parsed, the display pipeline renders the DOM tree and composites it into the final visual output. The speed of this process plays a crucial role in browser performance.

3.1 Optimization Plan

We focused on optimizing two key areas in the display pipeline: compositing and video decoding.

  • Compositing Optimization:

In the final rendering stage, Chromium uses the GPU to composite UI elements with the desktop background. On high-resolution screens, this increases GPU usage and causes significant display latency. This delay can lead to dropped frames when real-time rendering requirements are not met. The XuanTie team resolved this issue by leveraging the X-Video (XV Overlay) extension on the Xorg desktop. We redirected the UI directly to the XV Overlay, bypassing GPU compositing. This approach reduced GPU usage by 20% to 40%, lowered display latency, and ensured real-time performance.

  • Video Decoding Optimization:

Chromium also supports embedded video playback on webpages. By default, video decoding requires copying decoded frames to the GPU for rendering. This process adds extra workload to both the CPU and GPU. The XuanTie team optimized the video decoding pipeline by removing the memory copy between CPU and GPU. This enabled zero-copy video decoding throughout the entire pipeline.
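To give a sense of scale for the copy that zero-copy decoding removes, the arithmetic below estimates the bytes copied per second for 4K playback. It assumes 8-bit 4:2:0 decoder output such as NV12 (1.5 bytes per pixel); the post does not state the pixel format, so treat the result as an order-of-magnitude estimate.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one decoded 8-bit 4:2:0 frame (for example NV12): a full-size
// luma plane plus two chroma planes at half width and half height,
// i.e. 1.5 bytes per pixel. Width and height are assumed to be even.
constexpr std::uint64_t FrameBytes420(std::uint64_t width, std::uint64_t height) {
  return width * height + 2 * ((width / 2) * (height / 2));
}

// Bytes that must be copied per second if every decoded frame is copied once
// before it can be displayed. A copy also reads the source, so the actual
// memory traffic is roughly twice this figure.
constexpr std::uint64_t CopiedBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t fps) {
  return FrameBytes420(width, height) * fps;
}

// 4K (3840x2160) at 60 fps: about 12.4 MB per frame, about 746 MB/s copied.
static_assert(FrameBytes420(3840, 2160) == 12'441'600);
static_assert(CopiedBytesPerSecond(3840, 2160, 60) == 746'496'000);
```

Eliminating that copy frees roughly 1.5 GB/s of memory traffic (read plus write) under these assumptions, bandwidth that the CPU and GPU otherwise compete for.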

3.2 Optimization Results

Community versions of Chromium often struggle with heavy display pipeline workloads. Video playback, especially at high resolutions, tends to stutter, with both the CPU and GPU running at full capacity. The XuanTie team’s optimizations significantly reduced this workload. As a result, Chromium can now achieve smooth 4K video playback at 60 fps.
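Playing at 60 fps means each frame has roughly 16.7 ms to be decoded, composited, and displayed; a frame that takes longer misses its display slot, which the viewer sees as stutter. The sketch below counts such misses for a list of per-frame times. It is a simplified model under stated assumptions: real pipelines overlap stages across frames, and the example frame times in the test are made up, not measurements from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Time available to produce each frame at a given frame rate, in milliseconds.
constexpr double FrameBudgetMs(double fps) { return 1000.0 / fps; }

// Count frames whose end-to-end time (decode + composite + display) exceeds
// the per-frame budget. In this simplified model, each such frame misses its
// display slot and shows up as a dropped or late frame.
std::size_t CountMissedFrames(const std::vector<double>& frame_times_ms, double fps) {
  const double budget = FrameBudgetMs(fps);
  std::size_t missed = 0;
  for (double t : frame_times_ms) {
    if (t > budget) ++missed;
  }
  return missed;
}
```

Lowering GPU compositing cost and removing memory copies both shrink per-frame time, which is what keeps more frames under the budget.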

Conclusion

The journey of Chromium on RISC-V is still in its early stages. There are many areas that require further exploration and refinement. The XuanTie team will continue to focus on these challenges. Future efforts will include optimizing the V8 JIT compiler, improving backend performance, and enhancing audio and video libraries. The team will also actively engage with the RISC-V community. By collaborating with developers worldwide, they aim to build a stronger and more open RISC-V ecosystem.