A Hypervisor for Shared-Memory FPGA Platforms

Session: Virtualized acceleration--Don't keep it real!

Authors: Jiacheng Ma (University of Michigan); Gefei Zuo (University of Michigan); Kevin Loughlin (University of Michigan); Xiaohe Cheng (Hong Kong University of Science and Technology); Yanqiang Liu (Shanghai Jiao Tong University); Abel Eneyew (Addis Ababa Institute of Technology); Zhengwei Qi (Shanghai Jiao Tong University); Baris Kasikci (University of Michigan)

Cloud providers widely deploy FPGAs as application-specific accelerators for customer use. These providers seek to multiplex their FPGAs among customers via virtualization, thereby reducing operating costs. Unfortunately, most virtualization support is confined to FPGAs that expose a restrictive, host-centric programming model in which accelerators cannot issue direct memory accesses (DMAs). The host-centric model incurs high runtime overhead for workloads that exhibit pointer chasing. Thus, FPGAs are beginning to support a shared-memory programming model in which accelerators can issue DMAs. However, virtualization support for shared-memory FPGAs remains limited. This paper presents Optimus, the first hypervisor that supports scalable shared-memory FPGA virtualization. Optimus offers both spatial multiplexing and temporal multiplexing to provide efficient and flexible sharing of each accelerator on an FPGA. To share the FPGA-CPU interconnect at a high clock frequency, Optimus implements a multiplexer tree. To isolate each guest’s address space, Optimus introduces a hardware-software co-design technique called page table slicing. To support preemptive temporal multiplexing, Optimus provides an accelerator preemption interface. We show that Optimus supports eight physical accelerators on a single FPGA and improves the aggregate throughput of twelve real-world benchmarks by 1.98x-7x.
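The abstract does not spell out how page table slicing isolates guests, so the following is a minimal illustrative sketch of the general idea, not Optimus's actual interface: the shared virtual address space visible to the FPGA is divided into fixed-size per-guest slices, and guest-issued accelerator addresses are rebased into their slice so one guest's DMAs cannot reach another's memory. All names and parameters (SLICE_BITS, MAX_GUESTS, slice_translate) are assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>

#define SLICE_BITS  40ULL                  /* assumed per-guest VA width        */
#define SLICE_SIZE  (1ULL << SLICE_BITS)   /* bytes covered by one guest slice  */
#define MAX_GUESTS  8                      /* matches the 8 accelerators shown  */

/* Rebase a guest-issued accelerator virtual address into the shared VA space,
 * rejecting accesses that fall outside the guest's own slice. */
static int slice_translate(unsigned guest_id, uint64_t guest_va,
                           uint64_t *shared_va)
{
    if (guest_id >= MAX_GUESTS || guest_va >= SLICE_SIZE)
        return -1;                         /* out-of-slice access is rejected */
    *shared_va = (uint64_t)guest_id * SLICE_SIZE + guest_va;
    return 0;
}

int main(void)
{
    uint64_t va;
    if (slice_translate(3, 0x1000, &va) == 0)
        printf("guest 3 VA 0x1000 -> shared VA 0x%llx\n",
               (unsigned long long)va);
    return 0;
}
```

In this sketch the bounds check plays the role of hardware enforcement, while the hypervisor would only need to populate the page-table entries belonging to each guest's slice.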