With the high-performance Alibaba T-Head XuanTie processors coming to market, we believe that porting the Android OS to RISC-V will benefit the RISC-V industry. Over the past year, Alibaba T-Head has put considerable effort into porting Android 10 to the RISC-V instruction set architecture (ISA). The Android ecosystem is complicated, and we are still far from done, but we have made good progress on key components such as the Android NDK, Bionic, ART and Chrome. In this blog, we describe how we enabled those components on RISC-V. The work is currently being done on a XuanTie ICE development board, which has two XuanTie C910 RISC-V cores running at 1200MHz.
Details about Android Porting
Below is a diagram of the whole Android ecosystem, including the AOSP Android OS source code, the SoC Android BSP and the ODM/OEM Android BSP, along with some tools and SDKs. Our work focuses on low-level Android OS support, along with Android OS toolsets such as the NDK, emulator and VNDK.
The low-level Android OS port involves the following repos:
Android OS repo
$ repo init -u https://android.googlesource.com/platform/manifest -b android-10.0.0_r11
NDK repo
$ repo init -u https://android.googlesource.com/platform/manifest -b ndk-r20
$ repo sync
$ ./ndk/checkbuild.py
CLANG/LLVM repo
$ repo init -u https://android.googlesource.com/platform/manifest -b llvm-toolchain
$ repo sync
$ python toolchain/llvm_android/build.py
Emulator repo
$ repo init -u https://android.googlesource.com/platform/manifest -b emu-master-dev
$ repo sync
$ cd external/qemu/
$ ./android/rebuild.sh
GCC repo
$ git clone https://github.com/riscv/riscv-gnu-toolchain.git
$ cd riscv-gnu-toolchain
$ git submodule update --init --recursive
# change the target triplets
$ ./configure --prefix=/opt/riscv
$ make linux
Below are the basic steps to add RISC-V CPU architecture support into Android OS:
- Android OS
  - Bionic in Android OS repo
    - Update bionic syscall APIs and CPU arch Linux kernel header
    - Update new bionic headers into CLANG/LLVM Android repo
    - Update new bionic headers into NDK Android repo
  - Add RISCV64 build target and build configuration into Android OS build system
  - Add RISCV64 GCC version into Android OS build system
  - Add first version NDK and CLANG
  - Update Android OS make file to add riscv64 build target
  - Build out some libraries in Android OS repo with first version CLANG, and NDK.
  - Build out full Android OS image with full version CLANG, and NDK.
- GCC
  - Generate GCC version 8.1 with RISCV support
- CLANG/LLVM
  - Add RISCV support into CLANG/LLVM Android version
  - Build first version of CLANG toolchain
  - Build out full version of CLANG and updated to Android OS repo
- NDK
  - Add RISC-V build target into NDK
  - Update Android OS repo with NDK, CLANG and GCC
  - Update those libraries from the Android OS build into NDK repo to build out the full version NDK
Bionic
Bionic is Android’s C library, as shown in the figure below. One of its main functions is to let user-space applications trap into the kernel through system calls. Bionic also provides interfaces for dynamic loading, file access, memory allocation, and so on, along with standard C APIs such as “printf” and “malloc/free”. All of these are implemented in Bionic in compliance with the ISO C and POSIX standards, providing the native runtime libraries for Android environments.
The porting work includes the following parts:
System call
System calls are the general entry point through which user-mode programs access the system services and resources provided by the Linux kernel. The Linux kernel defines around 436 general system calls in the include/uapi/asm-generic/unistd.h header file. These calls cover file systems, process management, clocks, memory, and networking. Some architectures implement several additional system calls for compatibility, hardware-specific functions such as cache flushing, TLS configuration, legacy interfaces, and so on.
To add the RISC-V system calls, three files need to be updated: SYSCALLS.TXT, unistd.h and gensyscalls.py:
- SYSCALLS.TXT defines the list of system calls supported by the instruction architecture.
- unistd.h defines the system call configuration and the system call numbers.
- gensyscalls.py generates hundreds of separate assembly entry files from the system calls defined in the above two files.
ENTRY(syscall_name)
    li      a7, __NR_syscall_
    scall
    li      a7, -MAX_ERRNO
    bgtu    a0, a7, 1f;
    ret
1:
    neg     a0, a0
    j       __set_errno_internal
END(syscall_name)
Most system call assembly stubs follow the pattern shown above:
- The system call number is placed in the a7 register.
- Parameters need to be reordered or rearranged only if the C API definition does not match the kernel ABI definition. Because RISC-V’s ABI definition matches the standard C APIs, nothing needs to be reordered.
- The “scall” instruction (named “ecall” in current versions of the RISC-V specification) traps into S-mode, where the kernel services the system call.
- Once the system call has been serviced, the result is stored in a0, and the “sret” instruction returns execution to the instruction following the “scall”.
- The return value is compared against the error range bounded by MAX_ERRNO. If it falls in the error range, it is negated to obtain the positive error number, which __set_errno_internal stores in errno; otherwise the value of a0 is returned directly.
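The error check at the end of the stub can be sketched in C as follows. This is only an illustration of the steps above, not the bionic source: demo_finish_syscall and demo_syscall_usage are hypothetical names, and MAX_ERRNO is assumed to be 4095 as in the Linux kernel.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: MAX_ERRNO has the value 4095, as in the Linux kernel. */
#define MAX_ERRNO 4095

/* Hypothetical helper: C equivalent of the instructions after "scall". */
static long demo_finish_syscall(long raw)
{
    /* Unsigned compare, like "bgtu a0, a7, 1f" with a7 = -MAX_ERRNO. */
    if ((unsigned long)raw > (unsigned long)(-MAX_ERRNO)) {
        errno = (int)-raw;   /* "neg a0, a0", then store into errno */
        return -1;           /* value handed back to the C caller */
    }
    return raw;              /* a normal result is returned unchanged */
}

/* Usage: a successful result passes through, a failure sets errno. */
void demo_syscall_usage(void)
{
    errno = 0;
    assert(demo_finish_syscall(42) == 42 && errno == 0);
    assert(demo_finish_syscall(-ENOENT) == -1 && errno == ENOENT);
}
```

The unsigned comparison lets a single branch separate small negative error codes from every valid result, including large addresses returned by calls such as mmap.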
A few system calls follow different steps. Some, such as clone and fork, may return into the user mode of a different process. Those calls are implemented in libc/arch-riscv64/bionic, because their parameter preparation and return paths differ from those of the generic system calls. The set of system calls exported to bionic can be customized through the kernel source file “arch/include/uapi/asm/unistd.h”. For example, the following macro switches control the interface implementation of the rename and stat related calls.
#define __ARCH_WANT_RENAMEAT
#define __ARCH_WANT_NEW_STAT
These Linux Kernel header files under uapi define the behavior of user mode related interfaces, which can be updated to bionic through the generate_uapi_headers.sh script:
./libc/kernel/tools/generate_uapi_headers.sh --use-kernel-dir kernel_path
Thread local storage
Thread local storage (TLS) is an address area private to each thread, so a thread that modifies its TLS variables does not affect any other thread. As shown below, in the global-dynamic model the final address of a variable is found from TP and the tls_index data structure through a generated call to __tls_get_addr:
__thread int i __attribute__((tls_model("global-dynamic")));

typedef struct {
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

//assembly code example:
    la.tls.gd a0,i
    call __tls_get_addr@plt
    mv a5,a0
    lw t0,0(a5)
    addi t0,t0,1
    sw t0,0(a5)
Unlike some architectures, which have to obtain the TLS pointer through kernel-mode instructions, RISC-V keeps the TLS pointer in the x4 (tp) register, which makes access to thread variables much more efficient.
# define __get_tls() ({ void** __val; __asm__("mv %0, tp" : "=r"(__val)); __val; })

__LIBC_HIDDEN__ void __set_tls(void* tls) {
  asm("mv tp, %0" : : "r" (tls));
}
In Bionic, tp points to the bionic_tcb structure, and there are 10 slots for storing variables or pointers such as tp, thread id, opengl, and dtv:
#define MIN_TLS_SLOT            -10
#define TLS_SLOT_SELF           -10
#define TLS_SLOT_THREAD_ID      -9
#define TLS_SLOT_APP            -8
#define TLS_SLOT_OPENGL         -7
#define TLS_SLOT_OPENGL_API     -6
#define TLS_SLOT_STACK_GUARD    -5
#define TLS_SLOT_SANITIZER      -4
#define TLS_SLOT_ART_THREAD_SELF -3
#define TLS_SLOT_DTV            -2
#define TLS_SLOT_BIONIC_TLS     -1
#define MAX_TLS_SLOT            -1
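As a rough illustration of how negative slot numbers work, the sketch below models the slots as an array that ends exactly where tp points, so slot -1 is the last element and slot -10 the first. The demo_tcb type and the demo_* helpers are hypothetical stand-ins, not the bionic_tcb definition.

```c
#include <assert.h>
#include <stddef.h>

/* Slot numbers copied from the list above (a subset is enough here). */
#define MIN_TLS_SLOT        (-10)
#define TLS_SLOT_SELF       (-10)
#define TLS_SLOT_THREAD_ID  (-9)
#define TLS_SLOT_DTV        (-2)
#define MAX_TLS_SLOT        (-1)
#define DEMO_SLOT_COUNT     (MAX_TLS_SLOT - MIN_TLS_SLOT + 1)

/* Hypothetical stand-in for the TCB: just the raw slot array. */
struct demo_tcb {
    void *raw_slots[DEMO_SLOT_COUNT];
};

/* tp points to the end of the TCB, i.e. one past the last slot. */
static void **demo_tp(struct demo_tcb *tcb)
{
    return tcb->raw_slots + DEMO_SLOT_COUNT;
}

/* A slot is addressed with its negative index relative to tp. */
static void **demo_slot(struct demo_tcb *tcb, int slot)
{
    return &demo_tp(tcb)[slot];
}

/* Usage: store a value in the thread id slot and read it back. */
void demo_slot_usage(void)
{
    struct demo_tcb tcb = {{0}};
    int tid = 1234;
    *demo_slot(&tcb, TLS_SLOT_THREAD_ID) = &tid;
    assert(tcb.raw_slots[1] == &tid);
    assert(demo_slot(&tcb, TLS_SLOT_SELF) == &tcb.raw_slots[0]);
}
```

With this arrangement the slots sit below tp and the TLS variables can be placed above it, which matches the layout described next.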
The “RISC-V ELF psABI specification” describes the relocation logic and related data structures for the four TLS models: LE (Local Exec), IE (Initial Exec), LD (Local Dynamic), and GD (Global Dynamic). It does not define the TLS layout itself. Therefore, to support the RISC-V architecture in Android TLS, we reuse the layout of Glibc:
TP points to the end address of the Bionic TCB. The block after TP is used to store TLS variables. TP+0x800 points to DTV. When processing TLS-related relocations, the linker produces offsets that are biased by a fixed 0x800 relative to DTV. Therefore, when resolving TLS variables and TLS-related relocations, this offset needs to be added back to the TLS variable address:
@@ -319,7 +326,11 @@ extern "C" void* TLS_GET_ADDR(const TlsIndex* ti) TLS_GET_ADDR_CCONV {
   if (__predict_true(generation == dtv->generation)) {
     void* mod_ptr = dtv->modules[__tls_module_id_to_idx(ti->module_id)];
     if (__predict_true(mod_ptr != nullptr)) {
+#ifdef __riscv
+      return static_cast<char*>(mod_ptr) + ti->offset + 0x800;
+#else
       return static_cast<char*>(mod_ptr) + ti->offset;
+#endif
     }
   }
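The effect of the patch can be sketched with a simplified lookup in plain C. The demo_* types and functions and the DEMO_TLS_DTV_BIAS macro are hypothetical, and the sketch assumes that module ids start at 1 and that the linker has already subtracted 0x800 from the offset it stores.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TLS_DTV_BIAS 0x800u  /* the 0x800 from the text; name is hypothetical */

typedef struct { size_t module_id; size_t offset; } demo_tls_index;
typedef struct { size_t generation; void *modules[4]; } demo_dtv;

/* Offset a linker would store for a variable at byte `pos` of its TLS block
 * (assumption: the stored offset is biased by -0x800; size_t wraps safely). */
static size_t demo_biased_offset(size_t pos)
{
    return pos - DEMO_TLS_DTV_BIAS;
}

/* Fast path only: the real function falls back to a slow path when the
 * module has no block yet. Here NULL is returned instead. */
static void *demo_tls_get_addr(const demo_dtv *dtv, const demo_tls_index *ti)
{
    void *mod_ptr = dtv->modules[ti->module_id - 1]; /* ids assumed 1-based */
    if (mod_ptr == NULL)
        return NULL;
    /* Add the bias back before touching the pointer, so the pointer
     * arithmetic stays inside the module's block. */
    return (char *)mod_ptr + (ti->offset + DEMO_TLS_DTV_BIAS);
}

/* Usage: a variable at byte 16 of module 1 resolves to block + 16. */
void demo_dtv_usage(void)
{
    static char block[64];
    demo_dtv dtv = { 1, { block, NULL, NULL, NULL } };
    demo_tls_index ti = { 1, demo_biased_offset(16) };
    assert(demo_tls_get_addr(&dtv, &ti) == block + 16);
}
```

Without the added 0x800, the lookup would land 0x800 bytes before the variable, which is why the runtime side has to compensate.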
Floating point processing
The bionic library also includes the math library, which needs to handle the differences between CPU architectures in the hardware floating-point unit, such as floating-point exception flags, rounding modes, and status bits.
The hardware floating-point control register on RISC-V is FCSR, which contains two fields: exception accumulation and rounding mode:
- “exception accumulation” records the inexact, overflow, underflow, division by zero, and invalid operation conditions raised while the software runs. A program can inspect these flags to decide whether to recompute in software floating point, prepare its parameters again, or take some other action;
- “rounding mode” is often used for calculations in different algorithms or application scenarios.
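The two fields can be pictured with a small decoder. The bit positions follow the RISC-V ISA specification (flags in bits 4:0, rounding mode in bits 7:5); the function and constant names are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Accrued exception flags, fcsr bits 4:0. */
enum {
    DEMO_FFLAG_NX = 1u << 0,  /* inexact */
    DEMO_FFLAG_UF = 1u << 1,  /* underflow */
    DEMO_FFLAG_OF = 1u << 2,  /* overflow */
    DEMO_FFLAG_DZ = 1u << 3,  /* divide by zero */
    DEMO_FFLAG_NV = 1u << 4   /* invalid operation */
};

/* Rounding modes, fcsr bits 7:5. */
enum {
    DEMO_FRM_RNE = 0,  /* to nearest, ties to even */
    DEMO_FRM_RTZ = 1,  /* towards zero */
    DEMO_FRM_RDN = 2,  /* down */
    DEMO_FRM_RUP = 3,  /* up */
    DEMO_FRM_RMM = 4   /* to nearest, ties to max magnitude */
};

static unsigned demo_fcsr_fflags(uint32_t fcsr)
{
    return fcsr & 0x1fu;
}

static unsigned demo_fcsr_frm(uint32_t fcsr)
{
    return (fcsr >> 5) & 0x7u;
}

/* Return fcsr with the rounding mode replaced and the flags untouched. */
static uint32_t demo_fcsr_with_frm(uint32_t fcsr, unsigned frm)
{
    return (fcsr & ~(0x7u << 5)) | ((frm & 0x7u) << 5);
}

/* Usage: decode a value holding "round towards zero" plus two flags. */
void demo_fcsr_usage(void)
{
    uint32_t fcsr = (DEMO_FRM_RTZ << 5) | DEMO_FFLAG_DZ | DEMO_FFLAG_NX;
    assert(demo_fcsr_frm(fcsr) == DEMO_FRM_RTZ);
    assert(demo_fcsr_fflags(fcsr) == (DEMO_FFLAG_DZ | DEMO_FFLAG_NX));
}
```

A fenv implementation built on the real register would read and write it with CSR instructions; the masks and shifts are the part this sketch is meant to show.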
Related operations are implemented in the libm/riscv64/fenv.c file, which uses the floating-point status register to implement the interfaces declared in the standard fenv.h header:
int feclearexcept(int __exceptions); //Clear exception state
int fegetexceptflag(fexcept_t* __flag_ptr, int __exceptions); //Acquire exception state
int feraiseexcept(int __exceptions); //Raise the given exceptions
int fesetexceptflag(const fexcept_t* __flag_ptr, int __exceptions); //Set exception state
int fetestexcept(int __exceptions); //Check exception state
int fegetround(void); //Acquire the "rounding mode"
int fesetround(int __rounding_mode); //Set the "rounding mode"
int fegetenv(fenv_t* __env); //Get the overall floating-point status and settings
int feholdexcept(fenv_t* __env); //Save the current floating-point environment and clear the exception flags
int fesetenv(const fenv_t* __env); //Set the overall floating-point status and settings
int feupdateenv(const fenv_t* __env); //Update the overall floating-point status and settings
There are also the floating-point accrued exception flags register (FFLAGS) and the floating-point dynamic rounding mode register (FRM), which expose the two FCSR fields individually. These two registers are used for status checking, for example in optimized floating-point algorithm libraries. The bionic library does not use them.
As RISC-V does not support trapping on floating-point exceptions, we created stub implementations of the two functions below:
int feenableexcept(int mask __unused) {
  return -1;
}

int fedisableexcept(int mask __unused) {
  return 0;
}
In addition to exception handling, some user/kernel context structures have to carry the floating-point registers. For example, ucontext contains __fpregs related fields, which need to be set in “setcontext”:
union __riscv_mc_fp_state {
  struct __riscv_mc_f_ext_state __f;
  struct __riscv_mc_d_ext_state __d;
  struct __riscv_mc_q_ext_state __q;
};

typedef union __riscv_mc_fp_state* fpregset_t;

typedef struct mcontext_t {
  __riscv_mc_gp_state __gregs;
  union __riscv_mc_fp_state __fpregs;
} mcontext_t;
Performance optimization
Among those hundreds of C library functions, memory and string related APIs are performance-critical functions in most cases. Using high-bandwidth variable-length vector instructions to implement these APIs can bring considerable improvements to the overall performance of the system. We provided a basic RISC-V implementation for those APIs.
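To give a feel for the loop shape such vector implementations usually take, here is a portable C sketch of a strip-mined copy. It is not the code in our port: the inner byte loop stands in for a single vector load/store pair, and DEMO_VL_MAX is a hypothetical fixed chunk size, whereas real vector hardware reports how many elements it will process in each iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: number of bytes one "vector" step can handle. */
#define DEMO_VL_MAX 16

/* Copy n bytes in chunks of at most DEMO_VL_MAX bytes. The last chunk is
 * simply shorter, so no separate tail loop is needed. */
static void *demo_stripmine_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n > 0) {
        size_t vl = n < DEMO_VL_MAX ? n : DEMO_VL_MAX; /* length of this step */
        for (size_t i = 0; i < vl; ++i)                /* stands in for one   */
            d[i] = s[i];                               /* vector load + store */
        d += vl;
        s += vl;
        n -= vl;
    }
    return dst;
}

/* Usage: copy a buffer whose size is not a multiple of the chunk size. */
void demo_copy_usage(void)
{
    char src[37], dst[37];
    for (size_t i = 0; i < sizeof src; ++i)
        src[i] = (char)i;
    assert(demo_stripmine_copy(dst, src, sizeof src) == dst);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

Because the chunk length is recomputed on every pass, the same loop handles any size, which is the property that makes variable-length vectors convenient for memory and string routines.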
NDK
There is an unbreakable triangular dependency among the NDK, the Android OS, and the CLANG compilation tools. The current Arm NDK depends on the previous version of the NDK and CLANG tools. Each new version depends on the one before it, and the chain goes all the way back to the Android 2.3 version of 10 years ago. It is impossible for us to port all of that software to RISC-V, so the way we generate the RISC-V NDK is quite different from the Arm NDK. The diagram below shows how the generation of the RISC-V NDK differs.
Emulator
The Android emulator is based on QEMU and is connected through an intermediate glue layer. The outermost layer provides functions such as virtual device management, image loading, GPU acceleration, camera simulation, network mapping and so on. It can also be compiled with different configurations for TVs, phones, wearables, tablets, car applications and so on.
Most of the features provided by the emulator are independent of the architecture, and the RISC-V related TCG instruction translation and C helper functions are already supported in the external/qemu/target/riscv directory. We therefore only need to add cmake compilation support, the architecture string of the emulator, and the goldfish virtual device configuration under external/qemu/hw/riscv/. The virtual device configuration includes memory configuration, device tree creation, interrupt controller initialization, virtual device creation, and firmware loading as shown below:
...
static const struct MemmapEntry {
    //memory region
...
static void riscv_ranchu_board_init(MachineState *machine) {
    ...
    /* Init memory region */
    memory_region_init_ram(main_mem, NULL, "riscv_ranchu_board.ram", machine->ram_size, &error_fatal);
    ...
    /* device tree creation */
    fdt = create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
    ...
    /* create and register the interrupt controller */
    s->plic = xxxx_plic_create(memmap[RANCHU_PLIC].base,
    ...
    /* create goldfish device */
    create_device(s, fdt, RANCHU_GOLDFISH_FB);
    ...
    /* load the opensbi */
    riscv_load_firmware (machine->firmware, memmap[RANCHU_DRAM].base, NULL);
    ...
Android build system
Android’s build system is defined in the build directory, which is divided into two parts: make and soong.
- The soong part maintains the function library path, compilation link options, ABI, architecture name string and other related content for the architecture.
- The make part mainly includes architecture name matching, tool chain paths and parameters, and general board-level related configurations.
The prebuilt VNDK and SDK projects depend on the build system, so the build framework and bionic must be supported before these two prebuilt projects can be generated. Adding a virtual RISC-V platform to the build system requires adding the following files:
core/combo/TARGET_linux-riscv64.mk             //ABI related compiling/linking option
core/combo/arch/riscv64/riscv64.mk             //instruction set, soc related build option
target/board/generic_riscv64/BoardConfig.mk    //Board and emulator setting
target/board/generic_riscv64/device.mk         //runtime configuration
target/board/generic_riscv64/system.prop       //property variable
target/product/aosp_riscv64.mk                 //packages definition, and product name definition
ART
An Android application is a dalvik bytecode program compiled from Java source code. The virtual machine’s job is to interpret and execute that bytecode.
The process can be described with the following figure.
In the ART execution flow shown in the figure above, three main parts are key to the RISC-V port:
- AOT compiler: the tool that generates the .oat files in the figure. An oat file contains executable binary code, and the AOT compiler’s job is to compile dex bytecode into oat files. Under the default configuration, dex2oat is invoked to do this at build time or at install time.
- JIT compiler: the purple part of the figure. While dex bytecode runs, ART records whether each executed method is a hot method and generates profiling information. The JIT compiler then compiles the hot methods based on that profiling information.
- Interpreter: the dex bytecode interpreter, used to execute Android’s dex bytecode. In addition, both the interpreter and the compilers use the assembler and disassembler.
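The hot-method idea behind the JIT compiler can be sketched with a per-method counter. This is a deliberately simplified model and not ART code: the demo_method structure, the demo_on_invoke function and the threshold of 3 are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: invocations before a method counts as hot. */
#define DEMO_HOT_THRESHOLD 3

struct demo_method {
    const char *name;
    uint32_t hotness;   /* profiling information: invocation count */
    bool compiled;      /* set once the method has been handed to the JIT */
};

/* Called on every interpreted invocation. Returns true exactly once, on
 * the invocation that makes the method hot and triggers compilation. */
static bool demo_on_invoke(struct demo_method *m)
{
    if (m->compiled)
        return false;               /* already running compiled code */
    if (++m->hotness >= DEMO_HOT_THRESHOLD) {
        m->compiled = true;         /* a real JIT would compile here */
        return true;
    }
    return false;                   /* keep interpreting */
}

/* Usage: the third call crosses the threshold. */
void demo_jit_usage(void)
{
    struct demo_method m = { "demo", 0, false };
    assert(!demo_on_invoke(&m));
    assert(!demo_on_invoke(&m));
    assert(demo_on_invoke(&m));
    assert(m.compiled);
}
```

Methods that never reach the threshold stay in the interpreter, so compilation time is spent only where it is likely to pay off.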
In the following content, we will give a brief introduction to the porting work from the aspects of assembler, interpreter, and compiler.
- Assembler: The assembler converts the instructions emitted by the compiler into machine code and is the basic building block of the compiler part of ART. As part of the port to the RISC-V architecture, assembler support for all of the RISC-V instruction sets has been completed.
- Interpreter: The interpreter interprets and executes the dex bytecode. For RISC-V support, this part of the work focuses on:
  - Translating dex bytecode into RISC-V instructions
  - Saving and restoring the context when switching between the C++ and Java domains
- Compiler: The AOT compiler and JIT compiler share the same compilation framework in ART and reuse the same implementation code. The purpose of both compilers is to compile dex bytecode into executable binary code.
The diagram below summarizes the RISC-V porting work in ART.
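To make the interpreter's job more concrete, the sketch below decodes and executes three dex instructions in a C loop. It is only a toy: the opcode values and bit layouts follow the published Dalvik bytecode format, but demo_interpret is a hypothetical function, and a production interpreter dispatches to hand-written handlers for the target architecture instead of a C switch.

```c
#include <assert.h>
#include <stdint.h>

/* Three opcodes from the Dalvik bytecode format. */
enum {
    DEMO_OP_RETURN         = 0x0f,  /* return vAA          */
    DEMO_OP_CONST_4        = 0x12,  /* const/4 vA, #+B     */
    DEMO_OP_ADD_INT_2ADDR  = 0xb0   /* add-int/2addr vA, vB */
};

/* Execute 16-bit code units until a return. `regs` must hold at least 16
 * virtual registers. Returns -1 for opcodes this sketch does not know. */
static int32_t demo_interpret(const uint16_t *insns, int32_t *regs)
{
    for (const uint16_t *pc = insns;; ++pc) {
        uint16_t inst = *pc;
        unsigned op = inst & 0xffu;       /* low byte: opcode     */
        unsigned a = (inst >> 8) & 0xfu;  /* bits 8..11: vA       */
        unsigned b = (inst >> 12) & 0xfu; /* bits 12..15: vB or B */

        switch (op) {
        case DEMO_OP_CONST_4:
            regs[a] = (int32_t)(b ^ 0x8u) - 8;  /* sign-extend 4 bits */
            break;
        case DEMO_OP_ADD_INT_2ADDR:
            /* unsigned add gives Java-style wrap-around */
            regs[a] = (int32_t)((uint32_t)regs[a] + (uint32_t)regs[b]);
            break;
        case DEMO_OP_RETURN:
            return regs[a];  /* sketch only handles registers v0..v15 */
        default:
            return -1;
        }
    }
}

/* Usage: const/4 v0,3; const/4 v1,-1; add-int/2addr v0,v1; return v0 */
void demo_interpreter_usage(void)
{
    const uint16_t prog[] = { 0x3012, 0xf112, 0x10b0, 0x000f };
    int32_t regs[16] = { 0 };
    assert(demo_interpret(prog, regs) == 2);
}
```

Each case in the switch corresponds to one bytecode handler, which is the unit of work when the interpreter is ported to a new instruction set.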
RISC-V Android Source code downloading and building
All our work for Android on RISC-V is available in the Git repository: https://github.com/T-head-Semi/aosp-riscv
Before downloading AOSP source code, please check your operating system. It’s suggested to have a Linux system (Ubuntu is preferred) with more than 200GB disk space and more than 8 CPU cores. You can then follow the instructions from the links below to set up the build environment:
https://source.android.com/setup/develop#installing-repo
https://source.android.com/setup/build/initializing
- Run reproduce.sh to download AOSP to the current directory and build AOSP for the RISC-V emulator.
- Use the following commands to start the emulator:
cd ${AOSP_RISCV_BUILD_TOP}
source ./build/envsetup.sh
lunch aosp_riscv64-eng
emulator -selinux permissive -qemu -smp 2 -m 3800M -bios ${AOSP_RISCV_BUILD_TOP}/prebuilts/qemu-kernel/riscv64/ranchu/fw_jump.bin