Addition of Single Precision Floating Point (F) extension in NucleusRV: RISC-V based RV32-IMC Core

June 22, 2023

Linux Foundation Mentorship Spring 2023 at Micro Electronics Research Lab (MERL) sponsored by RISC-V International

Abstract

The goal of the project is to improve the NucleusRV core’s capabilities by incorporating the F extension from the RISC-V standard. NucleusRV is a fully parameterized, five-stage pipelined RV32-IMC core. It is used in the SoCNow SoC Generator as the base core to generate an SoC with a selectable set of extensions and devices. The F extension enables the core to perform single-precision floating-point arithmetic operations that comply with IEEE 754-2008. The project uses the Berkeley HardFloat library, a dependable library for floating-point operations, to build the floating-point arithmetic unit.

Project Methodology

Instantiation

NucleusRV is a parameterized core that already supported the M and C extensions. In this project, the F extension was added. To instantiate a NucleusRV core, select the desired extensions via boolean values in the config variable in the Top file.

Data Path

Fetch Stage

The Fetch Stage operates normally, i.e. the program counter (PC) selects the instruction to be fetched and the fetched instruction is forwarded to the Decode Stage. Stalls for hazards and for some of the floating-point operations are raised in later stages, which signal the PC to stall.

Decode Stage

Three new modules were created for floating-point instructions:

  1. Floating Point Decoder (FPDecoder)
  2. Floating Point Register File (FPRegFile)
  3. Floating Point Control Unit (FPControlUnit)

The decoded bits such as fmt and funct5 from the FPDecoder were used in the FPControlUnit to control the floating-point data path, which included:

  • Controlling the floating-point operations in the ALU.
  • Detecting the data hazard caused by a floating-point load instruction, as well as structural hazards.
  • Selecting floating-point register values instead of integer register values.
  • Differentiating the forwarded rs1 and rs2 integer data from floating-point data for the Branch Unit and Jump Unit.
  • Selecting the FCSR register for dynamic rounding mode.

The FPControlUnit decides whether the data to be written back goes to the integer register file or the floating-point register file. The ALU result or the data memory load data is therefore accompanied by an FPControlUnit control signal through the pipeline, so that it is written to the correct register file.
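The field extraction described above can be sketched in Python. This is only an illustration based on the standard RISC-V OP-FP encoding, not the NucleusRV FPDecoder itself; the function names `decode_fp` and `writes_fp_regfile` are hypothetical.

```python
OP_FP = 0b1010011  # major opcode shared by most F-extension arithmetic instructions

# funct5 values of OP-FP instructions whose result goes to the integer register file
INT_RESULT_FUNCT5 = {
    0b10100,  # FEQ.S / FLT.S / FLE.S
    0b11000,  # FCVT.W.S / FCVT.WU.S
    0b11100,  # FMV.X.W / FCLASS.S
}

def decode_fp(instr: int) -> dict:
    """Split a 32-bit OP-FP instruction word into its fields."""
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "rm": (instr >> 12) & 0x7,       # rounding mode (funct3 for some instructions)
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "fmt": (instr >> 25) & 0x3,      # 00 = single precision
        "funct5": (instr >> 27) & 0x1F,  # selects the operation
    }

def writes_fp_regfile(fields: dict) -> bool:
    """Hypothetical control signal: True if rd names a floating-point register."""
    return fields["opcode"] == OP_FP and fields["funct5"] not in INT_RESULT_FUNCT5

# fadd.s f3, f1, f2 with rm = 111 (dynamic rounding)
fadd = (0b00000 << 27) | (0b00 << 25) | (2 << 20) | (1 << 15) | (0b111 << 12) | (3 << 7) | OP_FP
print(decode_fp(fadd), writes_fp_regfile(decode_fp(fadd)))
```

Fused multiply-add instructions use separate major opcodes and place rs3 in bits 31:27, so a full decoder would need an extra case for them.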

Execute Stage

A dedicated Floating-Point Unit (FPU) was attached inside the ALU for floating-point operations. Our implementation of the FPU utilized Berkeley’s HardFloat library.

HardFloat

Berkeley HardFloat is a hardware implementation of binary floating-point that complies with the IEEE Standard for Floating-Point Arithmetic. HardFloat supports a wide variety of floating-point formats, with the widths of the exponent and significand fields set independently through module parameters. The supported formats include the common 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, and 128-bit quadruple-precision formats.

HardFloat prefers to convert IEEE-encoded data into equivalent recoded formats and act on those alternative representations rather than working directly on the standard formats. The fundamental goal of the recoding is to normalise subnormals and align them with other floating-point values, lowering the complexity of floating-point operations.
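The idea of normalising subnormals can be illustrated with a small Python sketch. It does not reproduce HardFloat's actual recoded encoding; it only shows that, once the exponent range is allowed to extend below -126, a subnormal can be written with the same leading-1 significand as a normal number. The function name `normalize_single` is hypothetical.

```python
import math
import struct

def normalize_single(bits: int):
    """Return (sign, unbiased exponent, 24-bit significand with leading 1)
    for a finite, non-zero single-precision bit pattern."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF or (exp == 0 and frac == 0):
        raise ValueError("only finite, non-zero values are handled in this sketch")
    if exp == 0:
        # Subnormal: shift the fraction left until its top set bit reaches bit 23,
        # and lower the exponent by the same amount.
        shift = 24 - frac.bit_length()
        return sign, -126 - shift, frac << shift
    # Normal: restore the implicit leading 1.
    return sign, exp - 127, (1 << 23) | frac

def to_value(sign: int, exponent: int, significand: int) -> float:
    """Rebuild the numeric value from the normalised triple."""
    magnitude = math.ldexp(significand, exponent - 23)
    return -magnitude if sign else magnitude

smallest_subnormal = 0x00000001
print(normalize_single(smallest_subnormal))
```

After this step, normal and subnormal inputs look the same to the arithmetic logic, which is the simplification the recoding aims for.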

HardFloat supports the following floating-point operations.

  • Addition and Subtraction
  • Multiplication
  • Fused Multiply-Add
  • Division and Square Root
  • Comparisons

ALU

A conversion module between integer and floating-point values was separately implemented.

All operations except division and square root were implemented as combinational logic and hence completed in a single clock cycle. The division and square root operations used sequential logic and took a variable number of clock cycles. This raised the need for stalls, which were controlled by a master-slave interface. The PC is signaled to stall until the division or square root operation completes.

The Forwarding Unit now requires an additional forwarding signal for the third operand of the floating-point Fused Multiply-Add operation. The same control signals used to differentiate the forwarded data values in the Branch Unit and Jump Unit were used in the Forwarding Unit, so that integer or floating-point data is forwarded only when the ALU is operating on the same type of data.
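A minimal Python sketch of such a forwarding decision is given below. It assumes a conventional scheme with two forwarding sources (EX/MEM and MEM/WB); the class `Producer`, the function `forward_select`, and the returned labels are hypothetical names, not taken from NucleusRV.

```python
from dataclasses import dataclass

@dataclass
class Producer:
    """Destination info of an older instruction still in the pipeline."""
    reg_write: bool  # the instruction writes a register
    rd: int          # destination register index
    is_fp: bool      # True if rd is a floating-point register

def forward_select(src: int, src_is_fp: bool, ex_mem: Producer, mem_wb: Producer) -> str:
    """Pick the data source for one operand (rs1, rs2, or rs3)."""
    # The younger producer (EX/MEM) takes priority over the older one (MEM/WB).
    for name, producer in (("EX_MEM", ex_mem), ("MEM_WB", mem_wb)):
        if not producer.reg_write:
            continue
        if producer.is_fp != src_is_fp:
            continue  # x3 and f3 share an index but are different registers
        if not producer.is_fp and producer.rd == 0:
            continue  # x0 is hard-wired to zero; f0 is an ordinary register
        if producer.rd == src:
            return name
    return "REGFILE"

# fmadd.s f4, f1, f2, f3 while an older instruction in EX/MEM writes f3
ex_mem = Producer(reg_write=True, rd=3, is_fp=True)
mem_wb = Producer(reg_write=False, rd=0, is_fp=False)
for operand in (1, 2, 3):
    print(operand, forward_select(operand, True, ex_mem, mem_wb))
```

The `is_fp` comparison plays the role of the control signal that separates integer and floating-point forwarding paths.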

Memory Stage

There were no changes in the Memory Stage, as the floating-point load and store instructions, namely flw and fsw respectively, use the same addressing scheme as their integer counterparts (lw and sw). Selecting the floating-point value to be stored in the data memory was handled in the Decode Stage.
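The shared addressing scheme can be shown with a short Python sketch based on the standard RISC-V I-type and S-type encodings: the effective address is the base register plus a sign-extended 12-bit immediate, regardless of whether the access is integer or floating-point. The function names are hypothetical.

```python
LOAD, LOAD_FP = 0b0000011, 0b0000111    # lw, flw major opcodes
STORE, STORE_FP = 0b0100011, 0b0100111  # sw, fsw major opcodes

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def effective_address(instr: int, rs1_value: int) -> int:
    """Compute the 32-bit memory address of a load or store instruction."""
    opcode = instr & 0x7F
    if opcode in (LOAD, LOAD_FP):
        imm = sign_extend(instr >> 20, 12)  # I-type: imm[11:0] in bits 31:20
    elif opcode in (STORE, STORE_FP):
        # S-type: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7
        imm = sign_extend(((instr >> 25) << 5) | ((instr >> 7) & 0x1F), 12)
    else:
        raise ValueError("not a load or store instruction")
    return (rs1_value + imm) & 0xFFFFFFFF

# lw x1, -4(x2) and flw f1, -4(x2) differ only in the opcode field
lw = (0xFFC << 20) | (2 << 15) | (0b010 << 12) | (1 << 7) | LOAD
flw = (0xFFC << 20) | (2 << 15) | (0b010 << 12) | (1 << 7) | LOAD_FP
print(hex(effective_address(lw, 0x100)), hex(effective_address(flw, 0x100)))
```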

Write Back Stage

No changes were made in the Write Back Stage, as the control inputs for integer and floating-point register file writes are propagated through the three stages that precede it.

All the arithmetic floating-point modules were first verified with Python-generated random tests. Compliance tests were then run successfully on NucleusRV after these modules were integrated.
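As an illustration of what a Python-generated random test can look like, the sketch below produces random single-precision operand pairs together with a reference sum. It is not the generator used in the project; the function names are hypothetical, and the reference assumes round-to-nearest-even, computed by adding in double precision and rounding the result to single precision.

```python
import math
import random
import struct

CANONICAL_NAN = 0x7FC00000  # canonical quiet NaN pattern defined by RISC-V

def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value: float) -> int:
    """Round a Python float to single precision and return its bit pattern."""
    if math.isnan(value):
        return CANONICAL_NAN
    try:
        return struct.unpack("<I", struct.pack("<f", value))[0]
    except OverflowError:
        # A finite value too large for single precision rounds to infinity.
        return 0xFF800000 if value < 0 else 0x7F800000

def reference_fadd(a_bits: int, b_bits: int) -> int:
    """Expected fadd.s result bits under round-to-nearest-even."""
    return float_to_bits(bits_to_float(a_bits) + bits_to_float(b_bits))

def make_vectors(count: int, seed: int = 0):
    """Return (a, b, expected) triples of 32-bit patterns."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    vectors = []
    for _ in range(count):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        vectors.append((a, b, reference_fadd(a, b)))
    return vectors

for a, b, expected in make_vectors(3):
    print(f"{a:08x} {b:08x} {expected:08x}")
```

Uniformly random bit patterns also cover NaNs, infinities, and subnormals, though directed cases for those corners are usually added on top.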

FCSR

The F extension requires a new control and status register (CSR), referred to as FCSR. The FCSR serves a dual purpose: selecting the dynamic rounding mode and holding the accrued exception flags. It was integrated into the existing CSR framework of NucleusRV.

The FCSR comprises two distinct fields, namely frm (Floating Rounding Mode) and fflags (Floating Accrued Flags). Each field possesses its own distinct address, thereby enabling independent access and manipulation as required.

FCSR layout (32 bits):

Bits    Field
31:8    Reserved
7:5     Rounding Mode (frm)
4:0     Accrued Exceptions (fflags): NV (bit 4), DZ (bit 3), OF (bit 2), UF (bit 1), NX (bit 0)
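The layout and the separate field addresses can be modelled in a few lines of Python. The CSR addresses (fflags at 0x001, frm at 0x002, fcsr at 0x003) come from the RISC-V specification; the class `Fcsr` and its method names are hypothetical and only sketch the behaviour, not the NucleusRV CSR module.

```python
FFLAGS_ADDR, FRM_ADDR, FCSR_ADDR = 0x001, 0x002, 0x003

# Bit masks of the accrued exception flags inside fflags
NX, UF, OF, DZ, NV = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4

class Fcsr:
    """Behavioural model: bits 7:5 hold frm, bits 4:0 hold fflags."""

    def __init__(self) -> None:
        self.value = 0

    def read(self, addr: int) -> int:
        if addr == FFLAGS_ADDR:
            return self.value & 0x1F
        if addr == FRM_ADDR:
            return (self.value >> 5) & 0x7
        if addr == FCSR_ADDR:
            return self.value & 0xFF
        raise ValueError("not a floating-point CSR address")

    def write(self, addr: int, data: int) -> None:
        if addr == FFLAGS_ADDR:
            self.value = (self.value & 0xE0) | (data & 0x1F)
        elif addr == FRM_ADDR:
            self.value = (self.value & 0x1F) | ((data & 0x7) << 5)
        elif addr == FCSR_ADDR:
            self.value = data & 0xFF  # upper bits are reserved
        else:
            raise ValueError("not a floating-point CSR address")

    def accrue(self, flags: int) -> None:
        """OR in flags raised by an instruction; software must clear them."""
        self.value |= flags & 0x1F

csr = Fcsr()
csr.write(FRM_ADDR, 0b001)  # select RTZ as the dynamic rounding mode
csr.accrue(DZ)              # e.g. a division by zero
csr.accrue(NX)              # e.g. a later inexact result
print(f"fcsr = {csr.read(FCSR_ADDR):#010b}")
```

All three addresses are views onto the same storage, so a write through one is visible through the others.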

Floating-point operations use either a static or a dynamic rounding mode, as determined by the rm field present in some instructions. If rm is set to 111, the instruction uses the rounding mode held in frm.

Rounding Mode   Mnemonic   Meaning
000             RNE        Round to Nearest, ties to Even
001             RTZ        Round Towards Zero
010             RDN        Round DowN (towards -∞)
011             RUP        Round UP (towards +∞)
100             RMM        Round to nearest, ties to Max Magnitude
111             DYN        DYNamic mode
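The following Python sketch shows how the effective rounding mode could be resolved from rm and frm, and how the five modes differ when rounding an exact value to an integer. It is an illustration only, not the HardFloat or NucleusRV logic; the function names are hypothetical.

```python
import math
from fractions import Fraction

MODES = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
DYN = 0b111

def resolve_rounding_mode(rm: int, frm: int) -> str:
    """Use the instruction's rm field unless it requests the dynamic mode."""
    selected = frm if rm == DYN else rm
    if selected not in MODES:
        raise ValueError("reserved rounding mode encoding")
    return MODES[selected]

def round_to_int(x: Fraction, mode: str) -> int:
    """Round an exact rational value to an integer under the given mode."""
    low, high = math.floor(x), math.ceil(x)
    if mode == "RTZ":
        return math.trunc(x)
    if mode == "RDN":
        return low
    if mode == "RUP":
        return high
    distance = x - low  # distance from the lower neighbour, in [0, 1)
    if distance < Fraction(1, 2):
        return low
    if distance > Fraction(1, 2):
        return high
    # Exact tie between low and high
    if mode == "RNE":
        return low if low % 2 == 0 else high
    if mode == "RMM":
        return high if x > 0 else low
    raise ValueError("unknown rounding mode")

for value in (Fraction(5, 2), Fraction(-5, 2)):
    print(value, {mode: round_to_int(value, mode) for mode in MODES.values()})
```

The tie cases 2.5 and -2.5 are chosen because they are the inputs on which the two round-to-nearest modes disagree.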

The Accrued Exception Flags serve to keep track of the exception conditions that may have occurred during the execution of a floating-point arithmetic instruction.

Flag Mnemonic   Flag Meaning
NV              Invalid Operation
DZ              Divide by Zero
OF              OverFlow
UF              UnderFlow
NX              Inexact

Each flag is set when the corresponding exception condition occurs and stays set until software clears it.

Why Chisel?

Chisel is a high-level hardware description language (HDL) embedded in Scala, a programming language. It enables concise and expressive hardware design by allowing designers to write circuit descriptions using familiar programming structures.

When it comes to CPU implementations, Chisel provides a powerful framework for designing and implementing processor architectures. It offers a higher level of abstraction compared to traditional HDLs, enabling designers to focus on the architectural aspects of the CPU rather than low-level implementation details. This allows for faster development cycles and easier experimentation with different CPU designs.

Chisel’s flexibility also allows for parameterized CPU designs, making it easier to create variations of a processor architecture by modifying parameters such as instruction set extensions in our case. This helps in exploring different design trade-offs and optimizing the CPU for specific applications or performance targets.

Acknowledgement

Our mentors Talha Ahmed, Shahzaib Kashif, and Usman Zain deserve our gratitude for supporting and advising us during the program. We also want to thank MERL and RISC-V for giving us the opportunity to contribute to this project, which helped us increase our knowledge and gain experience in the RISC-V domain.

Bios

Shayan Hassan Baig is a third-year Computer Science undergraduate at Usman Institute of Technology (UIT). He contributed to the open source RISC-V core, NucleusRV, as a mentee in the Linux Foundation Mentorship program.

Aldo Valentin Balsamo Reyes is a third-year Computer Science undergraduate at National Dong Hwa University. He supported the implementation of the Floating-Point Unit in NucleusRV, also as a mentee in the Linux Foundation Mentorship program.

Mentors:
Talha Ahmed, Usman Zain, Shahzaib Kashif