
XuanTie Co-Processor Interface Solution: A Bridge to Efficient Collaborative Computing

December 13, 2024 | December 19, 2024

Weitong Su, Alibaba DAMO Academy

Co-processors are specialized units designed to assist the main processor in executing specific tasks, such as graphics processing and signal transmission. By working in parallel with the main processor, they offload certain workloads, thereby significantly enhancing overall system performance. With the rapid advancement of AI technologies, coprocessors have become integral as acceleration units for machine learning, neural network training, and other AI-related computations. These units not only improve the efficiency of parallel processing but also help balance performance against power consumption.

The inherent scalability of RISC-V enables designers to implement extensions for coprocessors. Typically, such extensions connect the coprocessor to the main processor’s peripheral bus via a defined protocol and use bus-based read and write operations to manage accelerated computations. However, the peripheral bus, which supports various devices, operates at a relatively low frequency, which limits data transfer performance between the coprocessor and the core. To make the coprocessor operate efficiently, it is essential to establish a direct interaction mechanism with the core. This can be achieved through specialized interfaces, such as the RoCC interface or the EAI interface, which couple the extended coprocessor tightly to the processor core.

The XuanTie team has designed a universal Co-processor Interface solution to optimize collaboration efficiency and communication speed between processors and co-processors. The XuanTie Co-processor Interface is designed for a wide range of collaborative computing scenarios. It supports existing co-processor technologies and incorporates a customizable interface design that ensures flexibility for future co-processor innovations, delivering significant performance improvements.

Supports Multiple Interconnects for Enhanced Performance

The XuanTie Co-processor Interface addresses interconnect challenges in single-core and multi-core systems involving co-processors. Scenarios include single-core to single co-processor, single-core to multiple co-processors, multi-core sharing a single co-processor, and multi-core sharing multiple co-processors. Key features of XuanTie Co-processor Interface include:

1. The coprocessor sequentially receives 32-bit or 64-bit instructions from one or more main processors. It can read data from up to two general-purpose registers in a single operation, or perform multi-cycle reads from one or more vector registers. The coprocessor can also write data back to general-purpose registers or vector registers, as depicted in Figure 1.

2. The interface supports multiple interconnection architectures:

  • In high-performance scenarios, the main processor is directly connected to the coprocessor to achieve optimal performance, as shown in Figure 2-1.
  • In more complex topological configurations, communication is facilitated through a credit-based mechanism, which ensures enhanced performance, as shown in Figure 2-2.
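As a rough illustration of the first feature, the request a core hands to a coprocessor could be modeled as a small record. This is only a sketch: the class name `CoprocRequest` and all of its field names are hypothetical and are not taken from the XuanTie specification. Only the 32-bit or 64-bit instruction width and the limit of two general-purpose register operands per operation come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoprocRequest:
    """Hypothetical model of one request sent from a core to a coprocessor."""
    core_id: int              # which main processor issued the request
    instruction: int          # raw instruction bits
    width: int = 32           # instruction width in bits: 32 or 64
    gpr_operands: tuple = ()  # values read from general-purpose registers

    def __post_init__(self):
        # Validate the constraints stated in the text; names are assumptions.
        if self.width not in (32, 64):
            raise ValueError("instruction width must be 32 or 64 bits")
        if not 0 <= self.instruction < (1 << self.width):
            raise ValueError("instruction does not fit in the declared width")
        if len(self.gpr_operands) > 2:
            raise ValueError("at most two general-purpose registers per operation")


# Example: a 32-bit instruction carrying two register operands.
req = CoprocRequest(core_id=0, instruction=0x0000000B, width=32, gpr_operands=(7, 9))
```

Vector-register reads, which the text describes as multi-cycle, are left out of this sketch.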

Modular Design of Signal Channels to Enhance Flexibility

The XuanTie Co-processor Interface solution adopts a modular approach, defining multiple sets of signal channels that can be combined for different use cases. Selecting the channel combination that matches the characteristics of a given coprocessor not only optimizes system performance but also enhances flexibility and scalability.

The signal channels defined by this solution primarily include request channels, completion response channels, speculation channels, and interaction channels. These channels serve distinct functions, such as transmitting instructions, returning computation results, supporting instruction speculation, and enabling the credit-based interaction mechanism. Figure 3 illustrates an example of the signal connections.

For coprocessors that support speculative execution, additional spec_ and frsp_ channels are included, as illustrated in Figure 4.
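The idea of picking channel sets per coprocessor can be sketched as a small selection function. This is illustrative only: the function name and its parameters are hypothetical, and treating the request and completion response channels as always present is an assumption rather than something the article states. The `spec_` and `frsp_` prefixes are the ones named above.

```python
def select_channels(credit_based, speculative):
    """Return the channel sets a coprocessor would connect (illustrative only)."""
    channels = ["request", "completion response"]  # assumed to be always present
    if credit_based:
        channels.append("interaction")             # carries the credit handshake
    if speculative:
        channels += ["spec_", "frsp_"]             # extra channels for speculation
    return channels
```

For example, `select_channels(credit_based=False, speculative=True)` would describe a directly connected coprocessor that supports speculative execution.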

Design of Multiple Mechanisms to Adapt to Various Co-Processor Connection Scenarios

1. Credit Communication Mechanism

The use of a credit-based communication mechanism facilitates the simultaneous transmission of multiple instructions by multiple cores, making it easier to implement flow control. In structures such as the one shown in Figure 5, where the coprocessor and the main processor are physically distant from each other, this approach can also achieve better interface timing.

The implementation process is shown in Figure 6:

  • During initialization, the main processor does not possess any credit value. When the main processor is ready to dispatch instructions, it first issues an instruction accompanied by a “retry” indication. 
  • If the coprocessor has sufficient resources at that time, it accepts the instruction and returns a credit value to the main processor.
  • However, if the coprocessor lacks available resources, it records the instruction and, once resources become available, provides the corresponding credit value to the main processor. At this point, the main processor retransmits the instruction without the “retry” indication.
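The handshake above can be sketched as a small behavioral model. This is a toy simulation under stated assumptions, not the XuanTie implementation: the class and method names, the `slots` resource count, and the idea that a granted credit reserves one slot for the retransmission are all hypothetical. The article also does not say how a credit returned on immediate acceptance is used afterwards, so the sketch simply reports acceptance in that case.

```python
from collections import deque


class Coprocessor:
    """Toy model; `slots` is a hypothetical count of instruction buffers."""

    def __init__(self, slots):
        self.free_slots = slots
        self.reserved = {}       # core_id -> slots held for granted credits
        self.recorded = deque()  # cores whose "retry" request could not be taken
        self.accepted = []       # (core_id, instruction) in acceptance order

    def request(self, core_id, instr, retry):
        """Return True if the instruction is accepted."""
        if not retry:
            # Retransmission: spend the credit granted earlier to this core.
            if self.reserved.get(core_id, 0) == 0:
                raise ValueError("a request without 'retry' needs a credit")
            self.reserved[core_id] -= 1
            self.accepted.append((core_id, instr))
            return True
        if self.free_slots > 0:
            # Resources available: accept immediately.
            self.free_slots -= 1
            self.accepted.append((core_id, instr))
            return True
        self.recorded.append(core_id)  # no resources: remember who is waiting
        return False

    def complete(self):
        """One instruction finishes; return the core that gets a credit, if any."""
        if self.recorded:
            core_id = self.recorded.popleft()
            # The freed slot stays reserved for the credited core.
            self.reserved[core_id] = self.reserved.get(core_id, 0) + 1
            return core_id
        self.free_slots += 1
        return None


cop = Coprocessor(slots=1)
first = cop.request("core0", "inst_a", retry=True)    # accepted at once
second = cop.request("core1", "inst_b", retry=True)   # no resources: recorded
granted = cop.complete()                              # inst_a done, credit to core1
resent = cop.request(granted, "inst_b", retry=False)  # retransmitted without "retry"
```

Reserving the freed slot for the credited core is a design choice of this sketch: it keeps another core from taking the slot before the retransmission arrives.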

2. Speculative Execution Mechanism


The XuanTie Co-processor Interface supports speculative execution of instructions in co-processors. The process is shown in Figure 7:

  • The main processor can send multiple speculative instructions to the coprocessor through the request channel, allowing the coprocessor to execute these instructions in advance.
  • When the main processor determines that a speculative instruction should be executed, it sends the corresponding instruction ID along with the spec_keep signal via the speculation channel. Upon receiving this signal, the coprocessor processes the instruction and returns the corresponding computation result once the instruction is completed.
  • When the main processor determines that a speculative instruction needs to be cleared, it sends the corresponding instruction ID along with the spec_kill signal via the speculation channel. Upon receiving this signal, the coprocessor clears the specified instruction along with any subsequent instructions dependent on it.
  • After clearing the instruction, the co-processor sends an frsp_ok signal to notify the main processor.
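The keep/kill flow above can be sketched as a small behavioral model. It is a simplified illustration, not the actual interface: the class name, the method signatures, and the explicit `depends_on` bookkeeping are hypothetical, and only the signal names spec_keep, spec_kill, and frsp_ok come from the description.

```python
class SpecCoprocessor:
    """Toy model of speculative execution; all names are illustrative."""

    def __init__(self):
        # inst_id -> (result, IDs it depends on); dicts keep issue order.
        self.in_flight = {}

    def request(self, inst_id, work, depends_on=()):
        # Execute in advance and hold the result until keep or kill arrives.
        self.in_flight[inst_id] = (work(), frozenset(depends_on))

    def spec_keep(self, inst_id):
        # The instruction is confirmed: return its result to the core.
        result, _ = self.in_flight.pop(inst_id)
        return inst_id, result

    def spec_kill(self, inst_id):
        # Clear the instruction and every later instruction that depends on it,
        # directly or through another cleared instruction.
        killed = [inst_id]
        for other, (_, deps) in self.in_flight.items():
            if other != inst_id and deps.intersection(killed):
                killed.append(other)
        for k in killed:
            del self.in_flight[k]
        return "frsp_ok", killed


cop = SpecCoprocessor()
cop.request(1, lambda: 2 + 3)
cop.request(2, lambda: 10 * 4)
cop.request(3, lambda: 1 + 1, depends_on=[2])
cop.request(4, lambda: 7 - 1)
kept = cop.spec_keep(1)    # (1, 5)
ack = cop.spec_kill(2)     # ("frsp_ok", [2, 3])
kept4 = cop.spec_keep(4)   # (4, 6)
```

In this trace, instruction 4 survives the kill because it does not depend on instruction 2.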

Conclusion

 

The XuanTie Co-processor Interface solution is designed to support a wide range of interconnection architectures, offering exceptional flexibility to accommodate various coprocessor integration scenarios. When paired with XuanTie processors, this solution facilitates efficient parallel computing and optimized resource utilization. Its modular design, incorporating multiple configurable signal channel sets, further enhances its adaptability and scalability, making it more user-friendly.

Through this customizable interface solution, XuanTie RISC-V processors not only achieve significant advancements in extensibility but also provide a robust, efficient, and highly scalable computing platform. This platform is well-suited to an increasingly diverse array of applications, including big data processing, machine learning, artificial intelligence, and other high-performance computing domains, delivering outstanding performance and unparalleled flexibility.

For more information about XuanTie processors, please visit: https://www.xrvm.com