Today’s data center architectures are struggling to keep up with explosive bandwidth and data growth requirements. In many applications, such as machine learning, sequential databases and computational storage, we commonly run into practical limits dictated by the maximum available size of main memory and the scale of storage capacity. Main memory is currently under the control of the central processing unit (CPU), while storage is constrained by legacy implementations that do not scale effectively. This article focuses on the main memory challenges, but first we will briefly touch on future storage solutions.
Zoned Storage Addresses Storage Scale
Storage for data centers today consists of hard disk drives (HDDs) and flash-based solid state drives (SSDs). These devices evolved from early computer architecture interfaces such as SCSI (Small Computer System Interface), SAS (Serial Attached SCSI) and SATA (Serial Advanced Technology Attachment). HDDs appeared to the host as contiguous sets of blocks when, in fact, data was organized in zones and mapped to various physical sectors. When SSDs were first introduced, they were so much faster than HDDs that the controller inside the SSD could allow the host to write data with virtually no restrictions; inside the SSD, the controller managed erasing, organizing, moving and writing the data.
As the growth of data surged, so did the demands on storage. The existing implementations do not scale because of the overhead that HDDs and SSDs are required to handle. Enabling storage devices to scale effectively requires a new framework for data centers and cloud providers.
One approach is Zoned Storage, an open-source, standards-based initiative introduced by Western Digital to enable data centers to scale efficiently for the zettabyte storage capacity era. For both HDDs and SSDs, Zoned Storage comprises open standards: ZBC (Zoned Block Commands) and ZAC (Zoned ATA Command Set) for SMR (Shingled Magnetic Recording) HDDs, and Zoned Namespaces (ZNS) for NVMe™ SSDs. At a high level, these standards expose the device’s physical zone layout to the host, and host software organizes data into zones before it is stored. By leveraging Zoned Storage, both HDDs and SSDs can support higher densities, increase endurance and lower total cost of ownership (TCO) for data centers and cloud providers.
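To make the zone model concrete, here is a minimal conceptual sketch in C. The names and fields are illustrative, not the actual ZBC/ZAC or ZNS command structures: each zone tracks a write pointer, writes are accepted only sequentially at that pointer, and the host explicitly resets a zone to reclaim it.

```c
#include <stdint.h>
#include <stdbool.h>

/* Conceptual model of a host-managed zone; field names are
 * illustrative, not the ZBC/ZAC or ZNS on-wire structures. */
typedef enum { ZONE_EMPTY, ZONE_OPEN, ZONE_FULL } zone_state_t;

typedef struct {
    uint64_t start_lba;   /* first logical block of the zone */
    uint64_t capacity;    /* writable blocks in the zone     */
    uint64_t write_ptr;   /* next block that may be written  */
    zone_state_t state;
} zone_t;

/* The host may only write at the zone's write pointer; random
 * writes inside a zone are rejected. */
bool zone_write(zone_t *z, uint64_t lba, uint64_t nblocks)
{
    if (z->state == ZONE_FULL || lba != z->write_ptr)
        return false;                        /* must be sequential  */
    if (z->write_ptr + nblocks > z->start_lba + z->capacity)
        return false;                        /* would overflow zone */
    z->write_ptr += nblocks;
    z->state = (z->write_ptr == z->start_lba + z->capacity)
                   ? ZONE_FULL : ZONE_OPEN;
    return true;
}

/* Reclaiming space is explicit: the host resets the whole zone. */
void zone_reset(zone_t *z)
{
    z->write_ptr = z->start_lba;
    z->state = ZONE_EMPTY;
}
```

Because the host guarantees sequential writes, the device no longer needs a large indirection layer to hide random writes, which is where the density and endurance gains come from. Next, let’s focus on the main memory challenge of the existing architectures.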
Next Generation Memory Architecture – OmniXtend™
Because main memory is controlled by the CPU, today’s system architecture is required to conform to its interfaces. This effectively fixes the ratio of memory-to-compute in any practical system, which is an impediment to scaling many memory-centric applications.
There have been various attempts to circumvent this limitation, but they all have drawbacks. For example, Remote Direct Memory Access (RDMA) architectures require software to manage moving bits from non-volatile storage into and out of main memory, as well as additional software to keep the distant copies synchronized, that is, to provide coherence to the programmer. The software and network infrastructure needed is burdensome and costly.
Several new technologies are enabling architects to rethink memory-centric computing. The first is the emergence of higher-density, byte-addressable non-volatile memories, which are quickly becoming cost-competitive with dynamic random access memory (DRAM) and can serve as a new type of main memory. The second is the growth of the P4 programming language and its use in dataplane-programmable Ethernet switches. This new level of flexibility allows architects to run completely new protocols on low-cost Ethernet hardware. The third is the acceptance and openness of RISC-V, an open instruction set architecture that has spawned numerous processor microarchitectures. Many of these implementations are open source, including the buses and messaging required for multiple CPUs to share cache and main memory. The cache coherency bus ensures that all hosts see a synchronized view of the main memory they share.
Enabling a cache-coherent, memory-centric architecture requires sharing the cache coherency bus among all existing and future devices that access main memory. In existing proprietary ecosystems, such as x86 and Arm®, the cache coherency bus is closed. With RISC-V, however, there are open implementations of on-chip cache coherency buses. With the bus specification open and unencumbered, main memory can be unleashed and shared between heterogeneous system components. See an example design in figure 1.
Figure 1
Given their new levels of dataplane programmability, P4 Ethernet switches are a logical medium for transporting cache coherency messages. A thorough re-architecting of a compute-and-storage system can now take full advantage of these new technologies and enable continued scaling into the future.
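As a rough illustration of what the switch’s dataplane does, the C sketch below models the classification step: parse the frame’s Ethertype and steer coherence traffic into a separate pipeline instead of ordinary L2 forwarding. The Ethertype value and function names are hypothetical; a real deployment expresses this logic in P4 and compiles it onto the switch ASIC.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical Ethertype for coherence frames (illustrative only). */
#define ETHERTYPE_COHERENCE 0xAAAAu

/* Stub pipeline stages standing in for the switch's real ones. */
static void coherence_pipeline(const uint8_t *frame, size_t len)
{
    (void)frame;
    printf("coherence message, %zu bytes\n", len);
}

static void l2_forward(const uint8_t *frame, size_t len)
{
    (void)frame;
    printf("ordinary L2 frame, %zu bytes\n", len);
}

/* A P4 program performs this match-and-dispatch at line rate in the
 * switch ASIC; this C function only models the decision. Bytes 12-13
 * of an Ethernet frame hold the Ethertype, in network byte order. */
static void ingress(const uint8_t *frame, size_t len)
{
    if (len < 14)              /* shorter than an Ethernet header */
        return;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype == ETHERTYPE_COHERENCE)
        coherence_pipeline(frame, len);   /* coherence traffic   */
    else
        l2_forward(frame, len);           /* normal L2 switching */
}

int main(void)
{
    uint8_t frame[64] = {0};
    frame[12] = 0xAA; frame[13] = 0xAA;   /* mark as coherence */
    ingress(frame, sizeof(frame));
    return 0;
}
```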
To that end, Western Digital developed OmniXtend and introduced it to the open-source community in 2019. OmniXtend is a cache coherence protocol that encapsulates coherence traffic in Ethernet frames. It was motivated by the desire to unshackle main memory from the CPU and to address the RISC-V ecosystem’s urgent need for a common scale-out protocol. A system diagram can be seen in figure 2.
Figure 2
OmniXtend builds upon the TileLink coherence protocol, which originated in the RISC-V academic community, and scales it beyond the processor chip. OmniXtend uses the programmability of modern Ethernet switches to let processors’ caches exchange coherence messages directly over an Ethernet fabric. This allows large numbers of RISC-V and other CPUs, GPUs, machine learning accelerators and other components to connect to a shared, coherent memory pool. Figure 3 shows a high-level block diagram.
Figure 3
The header format of OmniXtend packets includes the fields required for coherence protocol operations. Together, these header fields encode in every message the information necessary for coherence: the operation type, permission, memory address and data. OmniXtend messages are encoded into Ethernet packets, along with a standard preamble followed by a start frame delimiter. OmniXtend keeps the standard 802.3 L1 framing, replacing Ethernet header fields with coherence message fields, so it interoperates with Intel® Barefoot Tofino™ and future programmable switches. The OmniXtend protocol manages coherence through a series of permission transfer operations: a master agent (i.e., a cache controller) must first obtain the necessary permissions on a specific memory block before performing read and/or write operations. An agent can hold one of three permissions on a copy of a memory block: None, Read or Read+Write. The protocol initially supports a MESI (Modified, Exclusive, Shared, Invalid) cache state machine model.
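To make the message and permission model concrete, the sketch below renders it in C. The field names and enum values are illustrative, drawn from the fields named above and from TileLink’s Acquire/Probe/Grant/Release message vocabulary; the actual OmniXtend header layout and field widths are defined by the specification, not by this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative message fields based on the ones named above; the
 * real OmniXtend header layout is defined by the specification. */
typedef enum { OP_ACQUIRE, OP_PROBE, OP_GRANT, OP_RELEASE } ox_op_t;
typedef enum { PERM_NONE, PERM_READ, PERM_READ_WRITE } ox_perm_t;

typedef struct {
    ox_op_t   op;        /* operation type              */
    ox_perm_t perm;      /* permission being transferred */
    uint64_t  address;   /* memory block address        */
    uint8_t   data[64];  /* payload, e.g. a cache line  */
} ox_msg_t;

/* MESI cache states and the permission each one implies. */
typedef enum { MESI_M, MESI_E, MESI_S, MESI_I } mesi_t;

ox_perm_t mesi_permission(mesi_t s)
{
    switch (s) {
    case MESI_M:                           /* Modified: dirty copy */
    case MESI_E: return PERM_READ_WRITE;   /* Exclusive: clean     */
    case MESI_S: return PERM_READ;         /* Shared: read-only    */
    default:     return PERM_NONE;         /* Invalid: no copy     */
    }
}

/* Before writing a block, an agent must hold Read+Write; if it
 * does not, it first issues an Acquire for the needed permission. */
bool can_write(mesi_t s) { return mesi_permission(s) == PERM_READ_WRITE; }
bool can_read(mesi_t s)  { return mesi_permission(s) != PERM_NONE; }
```

The mapping is the familiar MESI one: Modified and Exclusive imply Read+Write, Shared implies Read, and Invalid implies None, so a write to a Shared or Invalid block must first trigger a permission transfer over the fabric.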
OmniXtend has already been implemented on FPGA boards and with an Intel Barefoot Tofino switch.
A Xilinx VCU118 FPGA evaluation board has been configured to run a SiFive RISC-V U54-MC Standard Core with the OmniXtend protocol. Two of these FPGA boards are connected to a top-of-rack (ToR) Intel Barefoot Tofino switch via SFP+ connectors. The RISC-V cores on each VCU118 board issue random read and write requests, in coherent mode, to memory blocks of various sizes hosted on the other board. The OmniXtend protocol ensures that each processor reads the memory locations coherently. See the demonstration in figure 4.
Figure 4
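As a rough model of the traffic each board generates, the sketch below issues random reads and writes of varying block sizes and checks the read-back. It uses a local array as a stand-in for the remote board’s coherently mapped memory and mirrors only the demo’s access pattern; the actual test runs on the RISC-V cores, with OmniXtend hardware providing the coherent path.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the remote board's memory. In the demo this region
 * lives on the other VCU118 board and is reached coherently over
 * OmniXtend; here it is a local array so the sketch runs anywhere. */
#define REMOTE_SIZE (1u << 20)            /* 1 MiB, for the example */
static volatile uint8_t remote_mem[REMOTE_SIZE];

/* Issue random reads and writes of varying block sizes, mirroring
 * the traffic pattern described for the demo. */
static int exercise(unsigned iterations)
{
    static const size_t sizes[] = { 64, 256, 1024, 4096 };
    for (unsigned i = 0; i < iterations; i++) {
        size_t   len = sizes[rand() % 4];
        uint64_t off = (uint64_t)(rand() % (int)(REMOTE_SIZE - len));
        uint8_t  pat = (uint8_t)rand();

        /* Coherent write: the protocol must first acquire Read+Write
         * permission on each block the stores touch. */
        for (size_t j = 0; j < len; j++)
            remote_mem[off + j] = pat;

        /* Read-back: other cached copies were invalidated by the
         * writes, so we must observe our own data. */
        for (size_t j = 0; j < len; j++) {
            if (remote_mem[off + j] != pat) {
                fprintf(stderr, "coherence miscompare at offset %llu\n",
                        (unsigned long long)(off + j));
                return 1;
            }
        }
    }
    puts("all coherent read-backs matched");
    return 0;
}

int main(void)
{
    srand(1);               /* fixed seed for reproducibility */
    return exercise(1000);
}
```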
OmniXtend is currently being further developed in the open-source hardware group CHIPS Alliance, an open organization that will develop both the technical implementation and the open standard for all to use. OmniXtend is the first cache-coherent memory technology to provide open-standard interfaces for memory access and data sharing across a wide variety of processors, FPGAs, GPUs, machine learning accelerators and other components. Moreover, the programmability of OmniXtend-capable switches allows any desired modifications to coherence domains or protocols to be deployed immediately in the field, without requiring new system software or new application-specific integrated circuits (ASICs). OmniXtend will accelerate innovation in data center architectures, purpose-built compute acceleration and CPU microarchitectures.
To learn more about OmniXtend or Zoned Storage, join the upcoming meetup on July 21 at 5:30 p.m. PT. https://www.meetup.com/Bay-Area-Zoned-Storage-ZNS-SMR-etc-Meetup-Group/events/271313921/
Ted Marena
Senior Director, RISC-V Ecosystem & ML Business Development
Western Digital
Ted Marena is responsible for evangelizing RISC-V, accelerating the build out of the RISC-V ecosystem and marketing machine learning solutions as well as Zoned Storage. Marena is the marketing chair for the RISC-V Foundation and is also the managing director for the open source hardware development organization, CHIPS Alliance.