Abstract: Heterogeneous computers integrate general-purpose host processors with domain-specific accelerators to combine versatility with efficiency and high performance. To realize the full potential of heterogeneous computers, however, many hardware and software design challenges have to be overcome. While architectural and system simulators can be used to analyze heterogeneous computers, they are faced with unavoidable compromises between simulation speed and performance modeling accuracy. In this work we present HEROv2, an FPGA-based research platform that enables accurate and fast exploration of heterogeneous computers consisting of accelerators based on clusters of 32-bit RISC-V cores and an application-class 64-bit ARMv8 or RV64 host processor. HEROv2 allows to seamlessly share data between 64-bit hosts and 32-bit accelerators and comes with a fully open-source on-chip network, a unified heterogeneous programming interface, and a mixed-data-model, mixed-ISA heterogeneous compiler based on LLVM. We evaluate HEROv2 in four case studies from the application level over toolchain and system architecture down to accelerator microarchitecture. We demonstrate how HEROv2 enables effective research and development on the full stack of heterogeneous computing. For instance, the compiler can tile loops and infer data transfers to and from the accelerators, which leads to a speedup of up to 4.4x compared to the original program and in most cases is only 15 % slower than a handwritten implementation, which requires 2.6x more code.