DistME: A Fast and Elastic Distributed Matrix Computation Engine using GPUs

Donghyoung Han, Yoon-Min Nam, Jihye Lee, Kyongseok Park, Hyunwoo Kim, Min-Soo Kim

Matrix computation, in particular, matrix multiplication is time-consuming, but essentially and widely used in a large number of applications in science and industry. The existing distributed matrix multiplication methods only focus on either low communication cost (i.e., high performance) with the risk of out of memory or large-scale processing with high communication overhead. We propose a distributed elastic matrix multiplication method called CuboidMM that achieves both high performance and large-scale processing. We also propose a GPU acceleration method that can be combined with CuboidMM. CuboidMM partitions matrices into cuboids for optimizing the network communication cost with considering memory usage per task, and the GPU acceleration method partitions a cuboid into subcuboids for optimizing the PCI-E communication cost with considering GPU memory usage. We implement a fast and elastic matrix computation engine called DistME by integrating CuboidMM with GPU acceleration on top of Apache Spark. Through extensive experiments, we have demonstrated that CuboidMM and DistME significantly outperform the state-of-the-art methods and systems, respectively, in terms of both performance and data size.

Stay in the loop.

Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.