HPC Meets Cloud: Opportunities and Challenges in Designing High-Performance MPI and Big Data Libraries on Virtualized InfiniBand Clusters

About VisorHPC

Although virtualization is common in server farms and cloud computing environments, it is still far from prevalent in high-performance computing (HPC). The reason is that, at least up to now, virtualization solutions such as virtual machines (VMs) have proven too heavyweight to be acceptable in scalable HPC systems. However, future exascale systems, which will offer far more computing power but also far more cores than today’s HPC systems, will demand resiliency and malleability features that may only be achievable by increasing the degree of virtualization within these systems. Moreover, recent advances, e.g. in container-based virtualization and in the hardware abstraction of high-performance interconnects, are propelling the idea of introducing more virtualization into HPC as well. Prominent examples of the added value that virtualization brings to HPC systems in terms of resiliency, usability and malleability are live migration and transparent checkpoint/restart, as well as the option to provide each user with an individual environment through customized VM images.

Keynote Abstract

Significant growth has been witnessed during the last few years in HPC clusters with multi-/many-core processors, accelerators, and high-performance interconnects (such as InfiniBand, Omni-Path, iWARP, and RoCE). To alleviate the cost burden, sharing HPC cluster resources with end users through virtualization, for both scientific computing and Big Data processing, is becoming more and more attractive. The recently introduced Single Root I/O Virtualization (SR-IOV) technique for InfiniBand and High-Speed Ethernet provides native I/O virtualization capabilities and is changing the landscape of HPC virtualization. However, SR-IOV lacks locality-aware communication support, which leads to performance overheads for inter-VM communication even within the same host. In this talk, we will first present our recent studies on the MVAPICH2-Virt MPI library over virtualized SR-IOV-enabled InfiniBand clusters, which can take full advantage of SR-IOV and IVShmem to deliver near-native performance for HPC applications in standalone, OpenStack, and container environments. In the second part, we will present a framework for extending SLURM with virtualization-oriented capabilities, such as dynamic virtual machine creation with SR-IOV and IVShmem resources, to effectively run MPI jobs over virtualized InfiniBand clusters. Next, we will demonstrate how high-performance solutions can be designed to run Big Data applications (like Hadoop) in HPC cloud environments. Finally, we will share our experiences running these designs on the Chameleon Cloud testbed.

The keynote will be delivered by Dhabaleswar K. (DK) Panda of The Ohio State University.

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 400 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group, are currently being used by more than 2,700 organizations worldwide (in 83 countries). The software has been downloaded more than 405,000 times from the project’s site. As of November 2016, it powers several InfiniBand clusters in the TOP500 list (including the 1st, 13th, 17th, and 40th ranked ones). The RDMA packages for Apache Spark, Apache Hadoop, Apache HBase, and Memcached, together with the OSU HiBD benchmarks from his group, are also publicly available. These libraries are currently being used by more than 200 organizations in 29 countries and have been downloaded more than 19,000 times. He is an IEEE Fellow.

More details about Prof. Panda are available at


Workshop Topics

All subjects concerning virtualization solutions in HPC, especially (but not limited to):

Call for Papers

Submission Guidelines

Preliminary Deadlines

Program Committee

Organizers and Contact