In the world of advanced technology and exploration, some missions take us where ordinary microprocessors cannot dare to go. Whether exploring the depths of space or operating in avionics, the presence of intense radiation poses a grave threat to electronic components. To overcome this challenge, engineers have devised a solution: radiation-hardening. Radiation-hardened – or rad-hard – devices are designed to endure harsh radiation-rich environments, ensuring reliability and performance in mission-critical scenarios. In this article, we will uncover some techniques behind the creation of these cutting-edge microprocessors, which can be identified in three stages: radiation hardening, fault tolerance, and radiation testing.
Radiation Hardening Techniques
The general goal of radiation hardening is to ensure a device can maintain proper operation in the presence of ionizing radiation, which can cause transient and permanent faults in electronic components. Some key technology techniques that are used at the process level to design radiation-hardened electronic systems include:
- Tightly Controlled Doping: Careful doping of the silicon is necessary to create transistors that are less sensitive to radiation-induced charge effects. The doping process is optimized to minimize the chances of free charge carriers being trapped and causing faults.
- Silicon-on-Insulator (SOI): SOI technology is used in commercial semiconductor manufacturing and involves placing a layer of insulating material (e.g., silicon dioxide) between the silicon substrate and the active layer. This approach reduces the susceptibility of transistors to latch-up and single-event effects caused by radiation-induced charge build-up.
Examples of techniques applied at the layout and circuit level are:
- Radiation Hardening By Design (RHBD): RHBD involves designing the system with specific layout and circuitry techniques that minimize the impact of radiation-induced faults. This includes hardened latches, interlocks, and redundant elements.
- Guard Rings: Guard rings are structures placed around sensitive nodes in the flip-flop to mitigate the effects of radiation-induced charge deposition. These rings help to redirect and dissipate the charge, reducing the likelihood of Single Event Latchups(SELs).
- Bit Interleaved Memories: This concept revolves around distributing a data word across multiple memory modules in a way that allows simultaneous access to multiple bits of data at once. This prevents a single radiation hit from causing multiple errors on the same data word, which would require stronger error correction codes (ECC) that are more complex and expensive to implement.
- Error Correction Codes (ECCs): ECCs involve adding redundant bits to data to detect and correct errors that may occur due to radiation.
A common technique for adding fault tolerance is to apply Triple Modular Redundancy (TMR): TMR is a technique where three identical units run in parallel, and a voting mechanism is used to determine the correct output. If one unit experiences an error due to radiation, the other two can outvote it and produce the correct result. TMR can be applied to fundamental elements such as registers in a design and at higher levels up to complete computers. One difference here is that triplication at register level can, by voting, typically remove the error after a single clock cycle, while triplication at a higher level in the system requires a unit to be restarted and has a longer recovery time during which the system would be susceptible to additional errors.
Generally speaking, a combination of these techniques is implemented to design a single System-on-Chip (SoC). Technology platforms such as the ST Microelectronics 28 nm FDSOI (Fully Depleted Silicon On Insulator) GEO P2 or the IMEC DARE180 offer modern radiation-hardened Application Specific Integrated Circuit (ASIC) standard cell libraries that allow designing space-grade microelectronics.
Designing a device with radiation resilience and fault tolerance entails trade-offs, including diminished performance and increased power consumption in comparison to devices that are not rad-hard. While the temptation of designing a robust system through full triplication may be strong, it typically proves less efficient when compared to the utilization of hardened design elements.
Fault Tolerance and the NOEL-V
The primary goal of fault tolerance is to ensure that a system can gracefully handle radiation-induced faults and avoid catastrophic consequences or service interruptions in space. Error correction codes are an important feature, but the story is much more complex. — for example, what happens when an error is detected? In complex systems such as deep-pipeline processors, where many multiple actions are taken within every single clock cycle, how to ensure that errors are not propagated out of the system? And how can a designer make certain that the execution can safely resume after correcting the error? Further, how to ensure that the error-handling functionalities are completed in a timely manner that does not affect the performance in a way that could be destructive for a critical system?
All of these questions (and more) must be considered to successfully develop a reliable space processor. One example of an effective design is the NOEL-V, a RISC-V processor developed by Frontgrade Gaisler that offers industry-leading fault tolerant features ensuring correct execution, minimal performance degradation, and fault isolation. For example, if an error is detected, the correction is implemented in hardware where it is handled transparently, without the need for software intervention or extra memory accesses.
Space-grade microprocessors must be validated through accelerated ground testing, demonstrating their performance and functionality in relevant operational environments. Radiation tests are a series of assessments conducted on electronic components to evaluate their performance and resilience in the presence of ionizing radiation. Radiation tests are typically conducted in specialized facilities, equipped with the necessary tools, equipment, and radiation sources to simulate the radiation-rich environments that electronic components may encounter in their intended operational conditions.
The locations and types of radiation test facilities vary, depending on the specific particle beams required and the industries being served. Some common types of radiation test facilities include:
- Proton and Heavy Ion Facilities: These facilities use particle accelerators to generate and direct energized protons or heavy ions at electronic components or systems.
- Gamma Irradiation Facilities: Gamma irradiation facilities use gamma ray sources (e.g., Cobalt-60 or Cesium-137) to expose components and systems to ionizing radiation.
- Neutron Irradiation Facilities: These facilities use neutron sources (e.g., research reactors) to expose components to neutrons, simulating neutron-rich environments.
The GR765 and its Technology Demonstrator
One of the steps involved in the development of space-grade microprocessors is the creation of a technology demonstrator. These devices are a scaled-down implementation of the final SoC, whose purpose is to demonstrate the capabilities of the underlying ASIC technology and of its fault tolerance features. With the GR765, a RISC-V and SPARC octa-core space-grade microprocessor currently under development, the technology demonstrator is known as SQUAL4. SQUAL4 functionalities have been validated through two radiation test campaigns — one under heavy ions and another under protons. The results demonstrated that the target technology’s hardness and the effectiveness of the processors’ fault tolerance features in handling errors. The complete test results will be presented at the main European conference on Radiation and its Effects on Components and Systems (RADECS) in September 2023.