Addition of Single Precision Floating Point (F) extension in NucleusRV: RISC-V based RV32-IMC Core

Linux Foundation Mentorship Spring 2023 at Micro Electronics Research Lab (MERL) sponsored by RISC-V International

Abstract

The goal of the project is to improve the NucleusRV core’s capabilities by incorporating the F-Extension from the RISC-V standard. NucleusRV is a fully parameterized, five-stage pipelined RV32-IMC core. It is used in the SoCNow SoC Generator as the base core to generate an SoC with a selectable set of extensions and devices. The F-Extension enables the core to perform single-precision floating-point arithmetic that is compliant with IEEE 754-2008. The project uses the Berkeley HardFloat library, a dependable library for floating-point operations, to develop the floating-point arithmetic unit.

Project Methodology

Instantiation

NucleusRV is a parameterized core that already supported the M and C extensions; this project added the F extension. To instantiate a NucleusRV core, select the desired extensions via boolean values in the config variable in the Top file.

Data Path

Fetch Stage

The Fetch Stage operates as before: the program counter (PC) selects the instruction to be fetched, and the fetched instruction is forwarded to the Decode Stage. Stalls caused by hazards and by some floating-point operations are detected in later stages, which then signal the PC to stall.

Decode Stage

Three new modules were created for floating-point instructions:

Floating Point Decoder (FPDecoder)
Floating Point Register File (FPRegFile)
Floating Point Control Unit (FPControlUnit)

The decoded fields, such as fmt and funct5 from the FPDecoder, are used by the FPControlUnit to control the floating-point data path, which includes:

Controlling the floating-point operations in the ALU.
Detecting the data hazard caused by a floating-point load instruction, as well as structural hazards.
Selecting floating-point register values instead of integer register values.
Differentiating the forwarded rs1 and rs2 integer data from floating-point data for the Branch Unit and Jump Unit.
Selecting the FCSR register for dynamic rounding mode.
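To make the role of fields such as fmt and funct5 concrete, here is a minimal Python sketch of how a 32-bit R-type floating-point instruction splits into fields. It follows the standard RISC-V encoding; the names FPFields and decode_fp are hypothetical and are not taken from the NucleusRV FPDecoder.

```python
from typing import NamedTuple

OPCODE_OP_FP = 0b1010011  # major opcode shared by most F-extension arithmetic


class FPFields(NamedTuple):
    """Illustrative container; not the NucleusRV FPDecoder interface."""
    opcode: int
    rd: int
    rm: int      # rounding-mode field (funct3 position)
    rs1: int
    rs2: int
    fmt: int     # 0b00 selects single precision
    funct5: int  # selects the operation (0b00000 is fadd)


def decode_fp(instr: int) -> FPFields:
    """Split a 32-bit R-type floating-point instruction into its fields."""
    return FPFields(
        opcode=instr & 0x7F,
        rd=(instr >> 7) & 0x1F,
        rm=(instr >> 12) & 0x7,
        rs1=(instr >> 15) & 0x1F,
        rs2=(instr >> 20) & 0x1F,
        fmt=(instr >> 25) & 0x3,
        funct5=(instr >> 27) & 0x1F,
    )


# fadd.s f3, f1, f2 with the dynamic rounding mode (rm = 0b111)
fields = decode_fp(0x0020F1D3)
```

A control unit would then look at funct5 and fmt to pick the operation and at rm to decide whether the rounding mode comes from the instruction or from the FCSR.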

The FPControlUnit decides whether a result is written back to the integer register file or the floating-point register file. The ALU result or the data loaded from data memory is therefore accompanied by a control signal from the FPControlUnit, which steers the write to the correct register file.
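The register-file choice can be sketched in Python from the standard RISC-V encodings alone. This is an illustration under that assumption, not the FPControlUnit logic itself; the function name writes_fp_regfile is hypothetical.

```python
OPCODE_LOAD_FP = 0b0000111  # flw
OPCODE_OP_FP = 0b1010011    # fadd.s, fcvt.w.s, feq.s, ...
# fmadd.s, fmsub.s, fnmsub.s, fnmadd.s
FMA_OPCODES = {0b1000011, 0b1000111, 0b1001011, 0b1001111}

# OP-FP funct5 values whose result goes to an integer register
INT_RESULT_FUNCT5 = {
    0b10100,  # feq.s / flt.s / fle.s
    0b11000,  # fcvt.w.s / fcvt.wu.s
    0b11100,  # fmv.x.w / fclass.s
}


def writes_fp_regfile(opcode: int, funct5: int = 0) -> bool:
    """Return True if the instruction's destination is an f register."""
    if opcode == OPCODE_LOAD_FP or opcode in FMA_OPCODES:
        return True
    if opcode == OPCODE_OP_FP:
        return funct5 not in INT_RESULT_FUNCT5
    return False  # every other instruction writes the integer file, if any
```

For example, fcvt.w.s produces an integer and so returns False, while fcvt.s.w (funct5 0b11010) produces a float and returns True.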

Execute Stage

A dedicated Floating-Point Unit (FPU) was attached inside the ALU for floating-point operations. Our implementation of the FPU utilized Berkeley’s HardFloat library.

HardFloat

Berkeley HardFloat is a hardware implementation of binary floating-point that complies with the IEEE Standard for Floating-Point Arithmetic. HardFloat supports a wide variety of floating-point formats, with the widths of the exponent and significand fields set independently through module parameters. The common formats of 16-bit half-precision, 32-bit single-precision, 64-bit double-precision, and 128-bit quadruple-precision are among the possible formats.

HardFloat prefers to convert IEEE-encoded data into equivalent recoded formats and act on those alternative representations rather than working directly on the standard formats. The fundamental goal of the recoding is to normalise subnormals and align them with other floating-point values, lowering the complexity of floating-point operations.
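The idea of normalising subnormals can be illustrated with a short Python sketch. It does not reproduce HardFloat's actual recoded bit layout; it only shows, under that caveat, how a subnormal single-precision value can be rewritten with an explicit leading one and a wider exponent so that it looks like any other finite value. The function names are hypothetical.

```python
import struct


def f32_value(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def normalize_f32(bits: int):
    """Return (sign, exponent, significand) with bit 23 of significand set.

    The value equals (-1)**sign * significand * 2**(exponent - 23).
    Only finite, nonzero inputs are handled in this sketch.
    """
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp_field == 0xFF or (exp_field == 0 and frac == 0):
        raise ValueError("zero, infinity, and NaN need separate handling")
    if exp_field != 0:
        # Normal number: restore the implicit leading one.
        return sign, exp_field - 127, frac | (1 << 23)
    # Subnormal: shift left until the leading one reaches bit 23,
    # lowering the exponent below the normal minimum of -126.
    exponent, significand = -126, frac
    while significand < (1 << 23):
        significand <<= 1
        exponent -= 1
    return sign, exponent, significand
```

After this step the arithmetic logic no longer needs a special path for subnormal inputs, which is the simplification the recoding aims for.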

HardFloat supports the following floating-point operations.

Addition and Subtraction
Multiplication
Fused Multiply-Add
Division and Square Root
Comparisons

ALU

A conversion module between integer and floating-point values was separately implemented.

All operations except division and square root are combinational and therefore complete in a single clock cycle. Division and square root use sequential logic and take a variable number of clock cycles. This raised the need for stalls, which are controlled by a master-slave interface: the PC is signaled to stall until the division or square root operation completes.

The Forwarding Unit now requires an additional forwarding signal for the third operand of the Fused Multiply-Add operation. The same control signals that differentiate forwarded integer and floating-point values in the Branch Unit and Jump Unit are used in the Forwarding Unit, so that data is forwarded only when the producing instruction and the consuming operand use the same type of register.
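A minimal Python sketch of this forwarding decision follows, assuming a classic five-stage pipeline with EX/MEM and MEM/WB forwarding sources. The names WriteBack, forward_select, and forward_fma are hypothetical and do not come from the NucleusRV source.

```python
from typing import NamedTuple, Optional


class WriteBack(NamedTuple):
    """Destination info carried by an instruction further down the pipeline."""
    rd: int
    is_fp: bool      # True if rd names a floating-point register
    reg_write: bool


def forward_select(src: int, src_is_fp: bool,
                   ex_mem: Optional[WriteBack],
                   mem_wb: Optional[WriteBack]) -> str:
    """Pick the source of one operand: 'EX_MEM', 'MEM_WB', or 'REGFILE'."""
    # EX/MEM is checked first because it holds the most recent result.
    for name, stage in (("EX_MEM", ex_mem), ("MEM_WB", mem_wb)):
        if stage is None or not stage.reg_write:
            continue
        # x5 and f5 share an index but are different registers.
        if stage.is_fp != src_is_fp or stage.rd != src:
            continue
        if not src_is_fp and src == 0:
            continue  # x0 is hard-wired to zero; f0 is an ordinary register
        return name
    return "REGFILE"


def forward_fma(rs1: int, rs2: int, rs3: int,
                ex_mem: Optional[WriteBack],
                mem_wb: Optional[WriteBack]):
    """Fused multiply-add reads three floating-point operands."""
    return tuple(forward_select(r, True, ex_mem, mem_wb)
                 for r in (rs1, rs2, rs3))
```

With an instruction writing f5 in EX/MEM and one writing x6 in MEM/WB, forward_fma(5, 6, 7, ...) forwards only the first operand, because the x6 result must not be mistaken for f6.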

Memory Stage

There were no changes in the memory stage, as the floating-point load and store instructions, namely flw and fsw, use the same addressing scheme as their integer counterparts (lw and sw). Selecting the floating-point value to be stored in the data memory is handled in the decode stage.
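The shared addressing scheme can be shown with a small Python sketch based on the standard RISC-V I-type and S-type immediate layouts. The helper names are hypothetical; the instruction words are hand-assembled examples.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ mask) - mask


def i_type_imm(instr: int) -> int:
    """Immediate of a load (lw or flw): instruction bits 31:20."""
    return sign_extend(instr >> 20, 12)


def s_type_imm(instr: int) -> int:
    """Immediate of a store (sw or fsw): bits 31:25 and 11:7 joined."""
    imm = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
    return sign_extend(imm, 12)


def effective_address(base: int, imm: int) -> int:
    """Address = rs1 + sign-extended immediate, wrapped to 32 bits."""
    return (base + imm) & 0xFFFFFFFF


LW_X1_8_X2 = 0x00812083    # lw  x1, 8(x2)
FLW_F1_8_X2 = 0x00812087   # flw f1, 8(x2)
SW_X1_M4_X2 = 0xFE112E23   # sw  x1, -4(x2)
FSW_F1_M4_X2 = 0xFE112E27  # fsw f1, -4(x2)
```

Each integer and floating-point pair differs only in the opcode bits, so the immediate extraction and the address adder are identical for both.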

Write Back Stage

No changes were made in the write back stage, as the control inputs for integer and floating-point register file writes are propagated through the three preceding stages.

All the floating-point arithmetic modules were first verified with Python-generated random tests. Compliance tests were then run successfully on NucleusRV after these modules were integrated.
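The project's actual test generator is not shown here. As a rough sketch of what Python-generated random tests could look like, the following produces random single-precision operand pairs for addition together with a round-to-nearest-even reference result. All names, and the choice to restrict the exponent range, are assumptions made for illustration.

```python
import random
import struct


def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(value: float) -> int:
    # Packing as "<f" rounds the Python double to single precision.
    return struct.unpack("<I", struct.pack("<f", value))[0]


def random_f32_bits(rng: random.Random) -> int:
    """Random normal value with a moderate exponent, so sums stay finite."""
    sign = rng.getrandbits(1)
    exp_field = rng.randint(100, 150)
    frac = rng.getrandbits(23)
    return (sign << 31) | (exp_field << 23) | frac


def make_add_vectors(count: int, seed: int = 0):
    """Return a list of (a_bits, b_bits, expected_sum_bits) tuples."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(count):
        a, b = random_f32_bits(rng), random_f32_bits(rng)
        expected = f32_to_bits(bits_to_f32(a) + bits_to_f32(b))
        vectors.append((a, b, expected))
    return vectors
```

The vectors can be written out as hexadecimal words and compared against the hardware result for each pair. A fuller generator would also need to cover subnormals, infinities, NaNs, the other rounding modes, and the expected exception flags.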

FCSR

The F Extension requires a new control and status register (CSR), referred to as FCSR. The FCSR serves a dual purpose: it selects the dynamic rounding mode and holds the accrued exception flags. It was integrated into the CSR framework that NucleusRV already had.

The FCSR comprises two distinct fields, namely frm (Floating Rounding Mode) and fflags (Floating Accrued Flags). Each field possesses its own distinct address, thereby enabling independent access and manipulation as required.

Bits	Field
31:8	Reserved
7:5	Rounding Mode (frm)
4:0	Accrued Exceptions (fflags): NV (bit 4), DZ (bit 3), OF (bit 2), UF (bit 1), NX (bit 0)
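The relationship between the full register and its two independently addressable fields can be sketched in Python. The CSR addresses below (0x001, 0x002, 0x003) are the standard RISC-V ones; the Fcsr class itself is a hypothetical model, not the NucleusRV CSR implementation.

```python
CSR_FFLAGS = 0x001  # accrued exception flags, fcsr bits 4:0
CSR_FRM = 0x002     # dynamic rounding mode, fcsr bits 7:5
CSR_FCSR = 0x003    # full register: frm and fflags together


class Fcsr:
    """Illustrative model: one 8-bit state seen through three addresses."""

    def __init__(self) -> None:
        self.value = 0  # the upper 24 bits are reserved and read as zero

    def read(self, addr: int) -> int:
        if addr == CSR_FFLAGS:
            return self.value & 0x1F
        if addr == CSR_FRM:
            return (self.value >> 5) & 0x7
        if addr == CSR_FCSR:
            return self.value & 0xFF
        raise KeyError(f"not a floating-point CSR: {addr:#x}")

    def write(self, addr: int, data: int) -> None:
        if addr == CSR_FFLAGS:
            self.value = (self.value & 0xE0) | (data & 0x1F)
        elif addr == CSR_FRM:
            self.value = (self.value & 0x1F) | ((data & 0x7) << 5)
        elif addr == CSR_FCSR:
            self.value = data & 0xFF
        else:
            raise KeyError(f"not a floating-point CSR: {addr:#x}")
```

Writing frm leaves fflags untouched and vice versa, while a write to the full register updates both fields at once.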

Floating-point operations use either a static or a dynamic rounding mode, as determined by the rm field present in some instructions. If rm is set to 111, the operation uses the rounding mode held in frm; otherwise it uses the mode encoded in rm itself.

Rounding Mode	Mnemonic	Meaning
000	RNE	Round to Nearest, ties to Even
001	RTZ	Round Towards Zero
010	RDN	Round DowN (towards -∞)
011	RUP	Round UP (towards +∞)
100	RMM	Round to nearest, ties to Max Magnitude
111	DYN	DYNamic mode
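The selection rule can be written as a few lines of Python. This is a sketch of the rule in the RISC-V specification, with hypothetical names, rather than the core's own logic; raising an error stands in for the illegal-instruction behaviour associated with reserved encodings.

```python
RM_NAMES = {
    0b000: "RNE",
    0b001: "RTZ",
    0b010: "RDN",
    0b011: "RUP",
    0b100: "RMM",
}
RM_DYN = 0b111


def effective_rounding_mode(instr_rm: int, frm: int) -> str:
    """Resolve the rounding mode an operation actually uses."""
    mode = frm if instr_rm == RM_DYN else instr_rm
    if mode not in RM_NAMES:
        # 0b101 and 0b110 are reserved, and frm must not itself hold 0b111.
        raise ValueError(f"reserved rounding mode {mode:#05b}")
    return RM_NAMES[mode]
```

For instance, an instruction with rm = 001 always rounds towards zero, while one with rm = 111 follows whatever software last wrote to frm.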

The Accrued Exception Flags serve to keep track of the exception conditions that may have occurred during the execution of a floating-point arithmetic instruction.

Flag Mnemonic	Flag Meaning
NV	Invalid operation
DZ	Divide by Zero
OF	OverFlow
UF	UnderFlow
NX	Inexact

Each flag is set when the corresponding exception condition occurs.
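"Accrued" means the flags are sticky: an operation can set a flag, but only an explicit CSR write by software clears it. A minimal Python sketch of that behaviour, using the standard bit positions and hypothetical names:

```python
from enum import IntFlag


class FFlags(IntFlag):
    NX = 1 << 0  # inexact
    UF = 1 << 1  # underflow
    OF = 1 << 2  # overflow
    DZ = 1 << 3  # divide by zero
    NV = 1 << 4  # invalid operation


def accrue(fflags: FFlags, raised: FFlags) -> FFlags:
    """OR the flags raised by one operation into the accrued set."""
    return fflags | raised


flags = FFlags(0)
flags = accrue(flags, FFlags.DZ)  # e.g. a finite nonzero value divided by zero
flags = accrue(flags, FFlags(0))  # an exact operation raises nothing
flags = accrue(flags, FFlags.NX)  # e.g. a result that had to be rounded
```

After this sequence both DZ and NX remain set, even though the operation in between raised no exception.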

Why CHISEL?

Chisel is a high-level hardware description language (HDL) embedded in Scala, a programming language. It enables concise and expressive hardware design by allowing designers to write circuit descriptions using familiar programming structures.

When it comes to CPU implementations, Chisel provides a powerful framework for designing and implementing processor architectures. It offers a higher level of abstraction compared to traditional HDLs, enabling designers to focus on the architectural aspects of the CPU rather than low-level implementation details. This allows for faster development cycles and easier experimentation with different CPU designs.

Chisel’s flexibility also allows for parameterized CPU designs, making it easier to create variations of a processor architecture by modifying parameters such as instruction set extensions in our case. This helps in exploring different design trade-offs and optimizing the CPU for specific applications or performance targets.

Acknowledgement

Our mentors Talha Ahmed, Shazaib Kashif, and Usman Zain deserve our gratitude for supporting and advising us during the program. We also want to express our gratitude to MERL and RISC-V for providing us the opportunity to contribute to this project, enabling us to increase our knowledge and gain experience in the RISC-V domain.

Bios

Shayan Hassan Baig is a third-year Computer Science undergraduate at Usman Institute of Technology (UIT). He contributed to the open source RISC-V core, NucleusRV, as a mentee in the Linux Foundation Mentorship program.

Aldo Valentin Balsamo Reyes is a third-year Computer Science undergraduate at National Dong Hwa University. He supported the implementation of the Floating-Point Unit in NucleusRV, also as a mentee in the Linux Foundation Mentorship program.

Mentors:
Talha Ahmed, Usman Zain, Shahzaib Kashif
