This blog was originally published on the Codasip blog.
The more complex a processor core, the larger the area and power consumption. But increasing complexity is not a single dimension as processors can be more complex in different ways. In selecting a processor IP core, it is important to choose the right sort of complexity for your project.
Some ways of thinking about complexity include:
- Word length
- Execution units
- Virtual memory
- Security features
Generally, the smaller the word length, the smaller the core and the lower the power, however this is not always the case. An 8-bit core, such as the 8051, is comparable in gatecount to the smallest 32-bit cores, but power consumption is usually worse. An 8-bit core requires more memory accesses due to less computation per clock cycle requiring more cycles. The net impact is that it requires more power to complete a computation.
Processor cores vary considerably in the complexity of their execution units. The simplest are basic single ALUs requiring many common operations to be implemented by the simple instructions – for example using shift and add to implement a multiplication. It is therefore commonplace for cores to have a hardware multiplier and divider. In the event of needing good floating-point performance, adding a hardware floating point unit will provide significantly better performance. This option is available for Codasip’s Bk3 and Bk5 RISC-V cores but at the price of roughly doubling the core size.
So far, we have assumed a single computational thread and scalar processing units which execute one instruction at a time. Superscalar architectures have instruction-level parallelism able to fetch multiple instructions and despatch them to different execution units. For example, the Western Digital EH1 and EH2 SweRV Cores have two execution units. A dual-issue core processing one thread can theoretically have up to double the performance of a single-issue core. However, a thread can stall making both execution units temporarily inactive. If there are two hardware threads (harts), then if one thread stalls, the other can continue execution.
Processors can vary considerably in pipeline depth and there is a direct relationship between this depth and latency. Some applications can tolerate high latency, with the consequence being slower response to interrupts, in return for high clock frequencies and throughput. Other applications require rapid responses to interrupts so need shorter pipelines.
Another area of complexity is privilege modes – the more modes, the more complex the core logic. Many embedded applications run in machine mode, which means that the code has full access to the core – like root privilege in Linux. Such code must be completely trusted to avoid negative consequences. In more sophisticated applications, a range of privileges such as machine, supervisor and user may be offered. Normal applications will run in user mode with the greatest amount of protection and some software requiring greater privilege will use supervisor mode. Linux requires all three modes, which is why Codasip developed the Bk7 core that supports them and is Linux-capable.
Virtual memory also requires additional processor resources such as a memory management unit (MUU) and translation lookaside buffer (TLB) to handle translating virtual memory addresses to physical addresses. This brings additional costs in terms of area and power dissipation without improving processor throughput. Nevertheless, virtual memory is necessary for using rich operating systems such as Linux which enable more complex software to be used.
So, when choosing a processor core, work out what sort of execution units, memory management, privilege and security you need. That combination will determine the complexity of the core.