
A dual-core TEE security solution based on E902

Jiacheng Yang, Yahui Teng, Bingquan Huang

Guangdong University of Technology

In the China Postgraduate IC Innovation Competition this August, T-Head sponsored the design challenge "Building the Dual-core TEE security solution based on Xuantie Open E902". The competition produced many excellent designs, and several teams have chosen to share theirs; team Reshaker described their design in TEE SoC Based on RISC-V. In this blog, another team, from Guangdong University of Technology and winner of the second prize, introduces its design.

Preface

The rapid development of IoT has given rise to many information security issues, and a trusted execution environment (TEE) is an effective means of ensuring security. In this article, we discuss why building TEE capability on RISC-V (RV) processors matters for achieving such security.

Overall Solution

We will design a minimal SOC system with TEE support using E902 as the processor. To improve the security of the system, we start from the following aspects:

  1. A dual-core design, with one secure core and one non-secure core, physically isolates the CPUs. This ensures that the non-secure core cannot access resources in the secure world, and that each core has its own independent memory area for running programs.
  2. The cryptographic modules are implemented in hardware: AES/SM4 provides fast data encryption and decryption, SM3/SHA256 provides hashing (data compression) and integrity verification, and RSA provides signature verification. Anti-attack design strengthens these hardware algorithms while keeping the root of trust secure.
  3. IOPMP modules restrict the non-secure core's access to resources in the secure world, such as the Crypto encryption and decryption modules.
  4. A secure boot process is implemented: the non-secure core runs only after the secure core has configured the PMP registers, i.e., after the non-secure core's access to certain resources has been restricted.

In summary, the overall security solution is shown in Figure 2-1.

Figure 2-1 Secure boot process

1. Before the secure boot process, we make the following preparations.

(a) Randomly generate the RootKey required for SM4/AES encryption and the PublicKey required for RSA, and hard-code them into the BootRom of the secure core; the RSA private key is kept by the administrator. In addition, a small piece of power-on self-boot code is hard-coded into the secure core's BootRom. Its main function is to load the SPL (Second Stage Program Loader) from Flash, perform administrator authentication and data integrity verification, and, once decryption is complete, copy the program a second time.

(b) All program code stored in Flash is encrypted with SM4/AES, using the same RootKey that is stored in the secure core's BootRom. Alongside the encrypted program, Flash also stores its hash value signed with RSA.
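For illustration only, the Flash contents described in (b) could be organized as records like the following. The header layout, the `TEE0` magic value, and the field widths are hypothetical; the post does not specify the on-Flash format.

```python
import struct
from typing import Tuple

# Hypothetical record header: magic, program length, signature length (little-endian u32s).
HEADER = struct.Struct("<4sII")
MAGIC = b"TEE0"  # hypothetical magic value, not from the original design


def pack_record(encrypted_program: bytes, signature: bytes) -> bytes:
    """Build one Flash record: header, then the encrypted program, then the signed hash."""
    header = HEADER.pack(MAGIC, len(encrypted_program), len(signature))
    return header + encrypted_program + signature


def unpack_record(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a record into (encrypted_program, signature), as a BootRom loader might."""
    magic, prog_len, sig_len = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad record magic")
    start = HEADER.size
    program = blob[start:start + prog_len]
    signature = blob[start + prog_len:start + prog_len + sig_len]
    if len(program) != prog_len or len(signature) != sig_len:
        raise ValueError("truncated record")
    return program, signature
```

Length fields let the loader read exactly the bytes it needs over SPI and reject a truncated image before any cryptographic work starts.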

(c) A small power-on self-boot program is also hard-coded into the BootRom of the non-secure core. Its main function is to poll the Secure Configuration Register for a change in its value.

2. The secure boot process is roughly divided into the following steps.

(a) When the secure core is powered on and reset, it starts executing from address 0x00 in its BootRom. The secure core BootRom then accesses the Flash through the secure SPI interface and reads the signature, the hash value, and the encrypted SPL program. Meanwhile, the non-secure core is also powered on and reset, and begins polling the Secure Configuration Register value.

(b) As shown in Figure 2-2, the secure core calls the Crypto encryption and decryption module. First, it runs the RSA algorithm with the PublicKey hard-coded in its BootRom to verify the signature and recover the reference hash value. Second, it runs the SM3/SHA256 module over the received SPL program to compute a new hash value. If the two hash values match, identity verification and data integrity verification have passed. Finally, once both checks have passed, the SPL program is decrypted with AES/SM4 using the RootKey, the RootKey and PublicKey are copied to SRAM0, and the secure core jumps to the start entry of the valid code in SRAM0.
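The order of operations in (b), verify first and decrypt only on success, can be sketched in Python. This is a minimal model under several assumptions: Python's `hashlib` SHA-256 stands in for the SM3/SHA256 engine, a toy textbook-RSA key (`N`, `E`, `D`) stands in for the real PublicKey and the administrator's private key, and `decrypt` is a placeholder for the AES/SM4 engine with the RootKey. None of these names come from the original design, and a real implementation would use a full-size key with a padded signature scheme.

```python
import hashlib
from typing import Callable

# Toy textbook-RSA key (hypothetical and far too small for real use):
# n = 61 * 53 = 3233, e = 17, d = 2753, since 17 * 2753 = 1 mod 3120.
N, E, D = 3233, 17, 2753


def digest_mod_n(data: bytes, n: int) -> int:
    """SHA-256 of data, reduced mod n so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(image: bytes) -> int:
    """Administrator side: sign the hash with the private exponent D (kept off-chip)."""
    return pow(digest_mod_n(image, N), D, N)


def verify_then_decrypt(image: bytes, signature: int,
                        decrypt: Callable[[bytes], bytes]) -> bytes:
    """Secure-core flow: authenticate and check integrity first, decrypt only on success."""
    expected = pow(signature, E, N)   # RSA with the BootRom PublicKey recovers the hash
    actual = digest_mod_n(image, N)   # hash recomputed over the received image
    if expected != actual:
        raise ValueError("authentication or integrity check failed")
    return decrypt(image)             # stands in for AES/SM4 with the RootKey
```

Checking before decrypting means a tampered or unsigned image is rejected without ever being decrypted or executed.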

(c) The SPL program contains a data-copy routine. Acting as the second-stage bootloader, it loads the encrypted TEE initialization and Runtime image, the Normal World's REE initialization and Runtime image, and their signatures and hash values. After authentication and data integrity verification pass, the images are decrypted: the TEE initialization and Runtime program is copied to SRAM1, and the REE initialization and Runtime program is copied to the storage area of the Normal World. The secure core then jumps to SRAM1, which belongs to the secure world, and begins the security configuration of the TEE OS, i.e., configuring the PMP registers to restrict the non-secure core's access to secure resources.

(d) After the secure core completes the security configuration, it writes a value to the Secure Configuration Register. When the non-secure core detects the change in this register's value, it jumps to the entry of the REE initialization and Runtime and starts running the non-secure world. This guarantees that the non-secure core starts only after the security configuration is in place.
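The handshake in (d) amounts to a one-way release flag. The sketch below models the two cores as Python threads; `SecureConfigRegister` and the `BOOT_DONE` value are hypothetical stand-ins for the hardware register and whatever value the secure core actually writes.

```python
import threading
import time

BOOT_DONE = 0xA5  # hypothetical "security configuration complete" value


class SecureConfigRegister:
    """Hypothetical model of the Secure Configuration Register shared by both cores."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def write(self, value: int) -> None:
        with self._lock:
            self._value = value

    def read(self) -> int:
        with self._lock:
            return self._value


def secure_core(scr: SecureConfigRegister, log: list) -> None:
    log.append("secure: PMP configured")  # stands in for the TEE OS security setup
    scr.write(BOOT_DONE)                  # only now is the non-secure core released


def non_secure_core(scr: SecureConfigRegister, log: list) -> None:
    while scr.read() != BOOT_DONE:        # BootRom polling loop
        time.sleep(0.001)
    log.append("non-secure: jump to REE entry")


def run_boot() -> list:
    scr, log = SecureConfigRegister(), []
    ns = threading.Thread(target=non_secure_core, args=(scr, log), daemon=True)
    ns.start()              # the non-secure core starts polling right after reset
    secure_core(scr, log)   # the secure core finishes its configuration
    ns.join(timeout=5)
    return log
```

However the two threads are scheduled, the non-secure entry is logged only after the secure configuration, which is the ordering guarantee the boot flow relies on.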

(e) From then on, REE and TEE communicate through the Mailbox interrupt mechanism. The REE core writes data and commands to shared memory, then triggers an interrupt that delivers the storage address to the TEE core. The TEE core responds to the interrupt according to the request; if it accepts the request, it fetches the data and commands from the given address, processes the data according to the command type, and responds to the REE core.

Figure 2-2 Security verification and decryption process

Building a hardware SOC system

The original Wujian 100 platform is shown in Figure 3-1.

Figure 3-1 Wujian 100 SOC platform architecture

The final hardware SOC platform architecture is shown in Figure 3-2.

Figure 3-2 Dual-core TEE security SOC platform architecture

 

1. Dual-core

So that each core can run its own software independently, each core is given its own storage area. The original ISRAM is removed and an ICACHE is added inside each CPU, tightly coupled to that core. Because the ICACHE connects directly to the instruction bus without passing through the bus matrix, instruction fetches are faster and the CPU runs faster.

2. AES/SM4

AES-128/192/256 and SM4 use a parallel, reconfigurable design for key expansion and round transformation, merging the four algorithm modes into one IP. The S-boxes of AES and SM4 use a reconfigurable design over the composite field GF(((2^2)^2)^2) with a normal basis, and reconfigurable S-boxes based on threshold secret sharing resist side-channel analysis attacks such as CPA, DPA, and DFA. In addition, we design AES and SM4 with random masks to resist power analysis attacks, design an error detection scheme for the circuits based on error detection sub-modules, and optimize the whole design using reconfigurable techniques.

    1. Reconfigurable design of AES&SM4:
  • Sharing the interface layer: bus data input, control register allocation, result storage, and operation status output.
  • Sharing the control layer: all jump states can be merged into the same state, so multiple state machines become one.

Figure 3-3(a) AES structure schematic
Figure 3-3(b) SM4 structure schematic

 

  • Sharing the sub-operation layer:

(a) The algorithm principles and hardware architectures of AES and SM4 show that the S-box is the core unit for encryption, decryption, and key expansion, and also the module that consumes the most resources. Analysis of the S-box operations shows that a composite-field design can merge three different S-boxes into one reconfigurable S-box. The S-box can be computed algebraically in three main steps: a pre-affine transformation, a multiplicative inverse in GF(2^8), and a post-affine transformation. The structure of the composite-field-based AES&SM4 S-box circuit is shown in Figure 3-4.

Figure 3-4 Composite-field-based S-box circuit structure
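The three-step algebraic view of the S-box can be checked in a few lines for AES. This sketch computes the inverse directly in GF(2^8) by exponentiation (a^254) rather than through the composite field GF(((2^2)^2)^2) used in the hardware; the composite-field circuit computes the same inverse via a field isomorphism. Per the text above, SM4's S-box follows the same pre-affine, inverse, post-affine pattern with its own constants, which are not reproduced here.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; this maps 0 to 0, as the S-box requires."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_post_affine(x: int) -> int:
    """AES affine transformation: x ^ rotl1 ^ rotl2 ^ rotl3 ^ rotl4 ^ 0x63."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


def aes_sbox(x: int) -> int:
    # Step 1 (pre-affine) is the identity for the AES forward S-box.
    # Step 2: inverse in GF(2^8).  Step 3: post-affine transformation.
    return aes_post_affine(gf_inv(x))
```

For example, `aes_sbox(0x53)` returns `0xED`, because the inverse of 0x53 is 0xCA and the affine step maps 0xCA to 0xED.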

(b) For the matrix multiplication over GF(2^8), an algorithm selection signal switches between column mixing and inverse column mixing. The encryption and decryption column-mixing circuits thus share hardware, which reduces the overhead of column mixing, as shown in Figure 3-5.

Figure 3-5 Encryption and decryption column-mixing circuits
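One way a single selection signal can switch a shared column-mixing circuit between encryption and decryption relies on the identity InvMixColumns = MixColumns ∘ (multiplication by 04·x^2 + 05 modulo x^4 + 1), which follows from multiplying the two polynomials out. The sketch below assumes this decomposition; the post does not say whether the authors' circuit uses it. The `decrypt` flag plays the role of the algorithm selection signal.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def mix_column_core(col):
    """Shared forward MixColumns datapath: circulant matrix with first row (02 03 01 01)."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def inv_pre_stage(col):
    """Multiply the column polynomial by 04*x^2 + 05 (mod x^4 + 1)."""
    return [gf_mul(col[i], 5) ^ gf_mul(col[(i + 2) % 4], 4) for i in range(4)]


def mix_columns(col, decrypt: bool):
    """decrypt acts as the selection signal: add the pre-stage, then reuse the core."""
    return mix_column_core(inv_pre_stage(col) if decrypt else list(col))
```

With this split, decryption costs only the small pre-stage in extra hardware, while the (02 03 01 01) core is shared by both directions.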

(c) ShiftRows and the linear transformation can be implemented by concatenation (wiring), and the reverse-order transformation is used only in the last round of SM4, so none of these are considered for reconfiguration.

3. RSA

We build the security chip on internationally standardized algorithms. RSA uses different keys for encryption and decryption; in this design it provides signature verification, completing a hardware-based protection mechanism.

For RSA, modular exponentiation uses the right-to-left (R-L) binary method combined with randomization and dummy operations, and modular multiplication uses a modified high-speed Montgomery method with a 256-bit radix. The computational flow of RSA is shown in Figure 3-6. With exponent masking and random power-balancing operations added at the modular exponentiation layer, the RSA implementation was tested on the OSR side-channel attack platform and resisted SPA/CPA/DPA/CFA attacks; the anti-attack scheme is shown in Figure 3-7.

Figure 3-6 The computational flow of RSA
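As a reference for the Montgomery step, here is textbook word-serial Montgomery multiplication that consumes the multiplier one 256-bit digit per iteration. We assume this is what the 256-bit radix refers to; the authors' "modified high-speed" variant is not described in the post, so this is the baseline algorithm only.

```python
def num_words(n: int, w: int = 256) -> int:
    return (n.bit_length() + w - 1) // w


def mont_mul(a: int, b: int, n: int, w: int = 256) -> int:
    """Word-serial Montgomery product a*b*R^-1 mod n, R = 2^(w*s); needs odd n, a, b < n."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    s = num_words(n, w)
    mask = (1 << w) - 1
    n0_inv = (-pow(n & mask, -1, 1 << w)) & mask   # -n^-1 mod 2^w
    t = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask                # next w-bit digit of b
        t += a * b_i
        m = ((t & mask) * n0_inv) & mask           # makes t + m*n divisible by 2^w
        t = (t + m * n) >> w                       # exact division by the radix
    return t - n if t >= n else t                  # t < 2n, so one subtraction suffices


def to_mont(x: int, n: int, w: int = 256) -> int:
    return (x << (w * num_words(n, w))) % n


def from_mont(x: int, n: int, w: int = 256) -> int:
    return mont_mul(x, 1, n, w)
```

Each iteration needs only a w-bit by w-bit product for `m` and a shift instead of a full-width division, which is what makes Montgomery multiplication attractive in hardware.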

4. Anti-attack design

Against various side-channel attacks, we use a second-order masking defense based on data randomization. It divides the state information into two parts, an initialization part and a running part. The two parts use plaintext randomization masking and exponent randomization masking, applied by transforming the mathematical operations and then inverting the transformation.

The RSA anti-attack design uses similar masks to randomize both the input plaintext and the exponent. It then introduces dummy operations into modular exponentiation to randomize the pattern of modular squarings and modular multiplications. This breaks the correlation between those operations and the key, effectively resisting SPA, DPA, and CPA attacks.
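The two masking steps and the dummy operations can be modeled as below. This is a sketch under stated assumptions: input blinding multiplies the input by r^e and removes r afterwards, and exponent blinding adds a random multiple of φ(n) to d. Both are standard techniques consistent with the description, not necessarily the authors' exact scheme. Python's `random` and `if`/`else` are neither cryptographically secure nor constant-time, so this only models the sequence of operations.

```python
import math
import random


def rl_modexp_dummy(base: int, exp: int, n: int) -> int:
    """Right-to-left binary exponentiation with a dummy multiply on every 0 bit."""
    result, dummy, sq = 1, 1, base % n
    while exp:
        if exp & 1:
            result = result * sq % n
        else:
            dummy = dummy * sq % n   # pseudo-operation: same work, result discarded
        sq = sq * sq % n             # squaring happens for every bit
        exp >>= 1
    return result


def protected_private_op(c: int, d: int, e: int, n: int, phi: int,
                         rng: random.Random) -> int:
    """Compute c^d mod n with input blinding and exponent blinding."""
    while True:
        r = rng.randrange(2, n - 1)
        if math.gcd(r, n) == 1:
            break
    c_blind = c * pow(r, e, n) % n                    # input mask: c * r^e
    d_blind = d + rng.randrange(1, 1 << 16) * phi     # exponent mask: d + k*phi(n)
    m_blind = rl_modexp_dummy(c_blind, d_blind, n)    # equals c^d * r mod n
    return m_blind * pow(r, -1, n) % n                # remove the input mask
```

Every bit now costs one squaring and one multiplication, and the exponent actually processed changes on every call, so neither the operation pattern nor the processed exponent is tied to a fixed d.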

For the symmetric algorithms, the main idea of the power-analysis countermeasure is to break the intrinsic link between the intermediate values of encryption and the actual power consumption. The scheme mainly uses full masking together with error detection. The random masks come from a true random number generator (TRNG) IP based on a ring oscillator. The TRNG, which produces fully non-deterministic random numbers, is connected to the CPU over the APB; the CPU reads the random numbers and passes them to the algorithm IP.

Figure 3-7 anti-attack scheme
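The full-mask idea for a table-based S-box can be illustrated with first-order Boolean masking by table recomputation, a common textbook technique. The post describes second-order masking and threshold-sharing S-boxes, which this simpler sketch does not implement. `fresh_masks` uses Python's `secrets` module as a stand-in for reading the ring-oscillator TRNG over APB.

```python
import secrets
from typing import List, Tuple


def recompute_masked_table(sbox: List[int], m_in: int, m_out: int) -> List[int]:
    """Build T with T[x ^ m_in] = sbox[x] ^ m_out for every byte x."""
    table = [0] * 256
    for x in range(256):
        table[x ^ m_in] = sbox[x] ^ m_out
    return table


def masked_sub_byte(masked_x: int, table: List[int]) -> int:
    """Input x ^ m_in, output S(x) ^ m_out; the unmasked x is never formed."""
    return table[masked_x]


def fresh_masks() -> Tuple[int, int]:
    # Stand-in for the TRNG: in the SoC the CPU reads random numbers over APB.
    return secrets.randbelow(256), secrets.randbelow(256)
```

Because the masks change on every run, the values that actually pass through the datapath are uncorrelated with the secret intermediate x, which is what breaks the link to power consumption.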

5. SM3/SHA256

SM3 and SHA256 are similar in message padding, message expansion, and compression, so this module implements both in a reconfigurable design that reuses circuits to save resources. The hardware implementation structure of SM3 and SHA256 is shown in Figure 3-8.

Figure 3-8 SM3&SHA256 hardware structure diagram

(1) Expansion circuit. Message expansion produces 64 (SHA256) or 68 (SM3) intermediate words. This module optimizes the message expansion circuit with a shift-register approach; the flow is shown in Figure 3-9. Only 16 registers, W0 to W15, are defined. Once 16 words have been loaded into W0 to W15, message expansion begins: each cycle, the 16 registers shift forward by one position, the value of W0 is discarded, and the newly computed word enters W15. This repeats until all 64/68 intermediate words have been computed. During compression, only the value of W15 needs to be read each cycle.

Figure 3-9 Message Expansion Circuit

(2) Compression function optimization. The compression function is the key step in a hardware implementation of a hash algorithm. It contains many iterated rounds, and each round performs numerous shift and addition operations of different kinds. Because the intermediate steps of SM3 and SHA256 partly coincide, this module reconfigures the two algorithms into a shared design to reduce hardware resource consumption. The iterative compression process is shown in Figure 3-10.

Figure 3-10 Structure of SM3&SHA256 iterative compression
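The 16-register expansion and the round function can be modeled for SHA256 and checked against Python's `hashlib`. In this sketch, `message_words` mirrors the shift-register scheme in (1): only W0 to W15 exist, the window shifts forward each cycle, and the newest word lands in W15. The round constants and initial hash values are derived from the first 64 and first 8 primes, as FIPS 180-4 defines them. SM3's 68-word expansion and its compression functions are not modeled, and neither is the shared SM3/SHA256 datapath.

```python
import hashlib
import math
from typing import Iterator, List

MASK32 = 0xFFFFFFFF


def first_primes(count: int) -> List[int]:
    primes: List[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(x: int) -> int:
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK32 for p in PRIMES]            # cube roots of first 64 primes
H0 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]  # square roots of first 8 primes


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK32


def message_words(block: List[int]) -> Iterator[int]:
    """Shift-register expansion: 16 registers W0..W15, one new word per cycle."""
    regs = list(block)
    for t in range(64):
        if t < 16:
            yield regs[t]
            continue
        s0 = rotr(regs[1], 7) ^ rotr(regs[1], 18) ^ (regs[1] >> 3)       # from W[t-15]
        s1 = rotr(regs[14], 17) ^ rotr(regs[14], 19) ^ (regs[14] >> 10)  # from W[t-2]
        new = (regs[0] + s0 + regs[9] + s1) & MASK32                     # + W[t-16], W[t-7]
        regs = regs[1:] + [new]   # shift forward, dropping the old W0
        yield regs[15]            # after the first 16 cycles, only W15 is read


def compress(state: List[int], block: List[int]) -> List[int]:
    a, b, c, d, e, f, g, h = state
    for k_t, w_t in zip(K, message_words(block)):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + k_t + w_t) & MASK32
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
    return [(x + y) & MASK32 for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message: bytes) -> bytes:
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + (8 * len(message)).to_bytes(8, "big"))
    state = list(H0)
    for off in range(0, len(padded), 64):
        block = [int.from_bytes(padded[off + 4 * i:off + 4 * i + 4], "big") for i in range(16)]
        state = compress(state, block)
    return b"".join(x.to_bytes(4, "big") for x in state)
```

Matching `hashlib` confirms that the 16-register window yields exactly the 64 words a full message-schedule array would.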

6. IOPMP

To improve platform security, accesses from bus masters to slave devices must be regulated, so that sensitive data stored in the slave devices is protected from intentional or unintentional disclosure or tampering.

PMP regulates the accesses made by the RISC-V core itself. To regulate the accesses of other bus masters, this design proposes an enhanced Input/Output Physical Memory Protection unit (IOPMP), as shown in Figure 3-11.

(1) The security platform relies on at least three components working together: PMP, IOPMP, and secure boot. Secure boot guarantees the integrity of the subsequent code that configures the other two components. PMP and IOPMP must receive a correct basic initialization; without secure boot, they cannot protect the platform.

(2) During initialization, PMP and IOPMP set up their highest-priority rules in advance for regions that only the TEE may access. They give sensitive regions basic protection by disabling unneeded masters in specific regions and blocking sensitive regions, and then lock those rules.
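For the core side, locking a highest-priority rule as in (2) comes down to writing a pmpaddr/pmpcfg pair. The sketch below uses the NAPOT encoding and pmpcfg bit positions from the RISC-V privileged specification; the example region (base 0x2000_0000, 4 KiB) is hypothetical, and E902-specific details such as the number of PMP entries and the region granularity are not modeled. A locked entry with no R/W/X bits denies all access to the region and cannot be changed until reset.

```python
from typing import Tuple

# pmpcfg field encodings from the RISC-V privileged specification.
PMP_R = 1 << 0
PMP_W = 1 << 1
PMP_X = 1 << 2
PMP_A_NAPOT = 3 << 3   # address-matching mode A = NAPOT
PMP_L = 1 << 7         # lock: entry also applies to M-mode and is frozen until reset


def napot_pmpaddr(base: int, size: int) -> int:
    """pmpaddr for a naturally aligned power-of-two region of at least 8 bytes."""
    if size < 8 or size & (size - 1) or base % size:
        raise ValueError("NAPOT needs a power-of-two size >= 8 and a size-aligned base")
    return (base >> 2) | ((size >> 3) - 1)   # trailing ones encode the size


def decode_napot(pmpaddr: int) -> Tuple[int, int]:
    """Inverse of napot_pmpaddr: count trailing ones to recover (base, size)."""
    ones = 0
    while (pmpaddr >> ones) & 1:
        ones += 1
    size = 1 << (ones + 3)
    base = (pmpaddr & ~((1 << ones) - 1)) << 2
    return base, size


def locked_no_access_entry(base: int, size: int) -> Tuple[int, int]:
    """Hypothetical TEE rule: match the region, grant no R/W/X, and lock it."""
    return PMP_L | PMP_A_NAPOT, napot_pmpaddr(base, size)
```

For the hypothetical 4 KiB region at 0x2000_0000, this yields pmpcfg 0x98 and pmpaddr 0x080001FF.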

Figure 3-11 PMP&IOPMP

In the figure above, the RISC-V cores are constrained by their own PMPs, which are correctly initialized during secure boot, and they use a MID of 0. For some slave devices at the bottom of the figure, a transaction may be legal or illegal before it reaches the device, so an IOPMP should be placed in front of each such device to block illegal transactions.
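An IOPMP check can be modeled as a priority-ordered rule list. The `IopmpRule` fields, the first-match-wins and default-deny policy, the addresses, and the MID assignments are all hypothetical illustrations, not the register layout of any particular IOPMP. Here the cores use MID 0, as in Figure 3-11, and rely on their PMPs for mutual isolation, while a DMA-style master with MID 2 is restricted by the IOPMP.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class IopmpRule:
    """Hypothetical IOPMP entry: masters it applies to, an address range, permissions."""
    mids: FrozenSet[int]
    base: int
    size: int
    read: bool
    write: bool

    def matches(self, mid: int, addr: int) -> bool:
        return mid in self.mids and self.base <= addr < self.base + self.size


def iopmp_check(rules: List[IopmpRule], mid: int, addr: int, is_write: bool) -> bool:
    """Rules are ordered by priority (index 0 first); the first match decides, else deny."""
    for rule in rules:
        if rule.matches(mid, addr):
            return rule.write if is_write else rule.read
    return False


CORE_MID, DMA_MID = 0, 2  # hypothetical master IDs
RULES = [
    IopmpRule(frozenset({CORE_MID}), 0x4000_0000, 0x1000, True, True),            # crypto registers
    IopmpRule(frozenset({CORE_MID, DMA_MID}), 0x2001_0000, 0x1000, True, False),  # read-only buffer
]
```

Default deny means any master or address not covered by a rule, such as the DMA touching the crypto registers, is blocked before the transaction reaches the slave.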

7. Mailbox

As shown in Figure 2-6, the TEE core and the REE core communicate through a mailbox. When the TEE sends data to the REE:

(1) The TEE core first writes the data to a specified buffer in shared memory, then triggers a mailbox interrupt and passes the buffer's address information to the REE.

(2) Using the address information delivered with the interrupt, the REE core reads the buffer that the TEE core wrote to shared memory.

(3) After the REE core has read the data, it triggers a mailbox interrupt back to the TEE core to acknowledge that the data has been received.

Sending data from the REE to the TEE works the same way, except that the roles of sender and receiver, and of the interrupting and responding cores, are swapped.

Figure 2-6 mailbox communication
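The three-step exchange can be modeled with shared memory and one interrupt queue per core. `Mailbox`, its message tuples, and the buffer offsets are hypothetical; the real mailbox registers and interrupt wiring are not described in the post. The same `send`/`handle_irq` pair covers both directions, matching the note that REE-to-TEE transfers simply swap roles.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Message = Tuple[str, str, int, int]  # (kind, sender, offset, length)


class Mailbox:
    """Hypothetical model: a shared-memory window plus one interrupt queue per core."""

    def __init__(self, shared_size: int = 256) -> None:
        self.shared = bytearray(shared_size)
        self.irq: Dict[str, Deque[Message]] = {"TEE": deque(), "REE": deque()}

    def send(self, src: str, dst: str, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.shared):
            raise ValueError("buffer outside shared memory")
        self.shared[offset:offset + len(data)] = data           # step 1: write the buffer
        self.irq[dst].append(("DATA", src, offset, len(data)))  # ...then interrupt dst

    def handle_irq(self, core: str) -> Optional[bytes]:
        kind, src, offset, length = self.irq[core].popleft()
        if kind == "DATA":
            data = bytes(self.shared[offset:offset + length])    # step 2: read via the address
            self.irq[src].append(("ACK", core, offset, length))  # step 3: acknowledge
            return data
        return None  # an ACK: the peer has consumed our buffer
```

Only the address and length travel with the interrupt; the payload itself stays in shared memory, which keeps the mailbox hardware small.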