Horovod: fast and easy distributed deep learning in TensorFlow.
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. Depending on the training library's API, the modification required may be either significant or minimal.
Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://github.com/uber/horovod
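To illustrate the "ring reduction" the abstract refers to, here is a toy simulation of ring-allreduce in plain Python. This is only a sketch of the communication pattern, not Horovod's actual implementation (which runs over MPI/NCCL on GPU tensors); the function name `ring_allreduce` and the per-step send lists are illustrative. Each "worker" holds a gradient vector split into one chunk per worker; a scatter-reduce phase sums the chunks around the ring, then an allgather phase circulates the reduced chunks so every worker ends with the full sum.

```python
def ring_allreduce(worker_grads):
    """Sum gradient vectors across workers using a ring pattern.

    worker_grads: list of equal-length lists, one per worker.
    For simplicity this sketch assumes each vector has exactly one
    element ("chunk") per worker.
    Returns the per-worker results; all entries equal the element-wise sum.
    """
    n = len(worker_grads)                     # number of workers in the ring
    chunks = [list(g) for g in worker_grads]  # working copy per worker

    # Phase 1: scatter-reduce. In step s, worker i sends chunk (i - s) mod n
    # to its ring neighbor (i + 1) mod n, which adds it in. Sends within a
    # step are collected first, so they happen "simultaneously".
    for s in range(n - 1):
        sends = [(i, (i - s) % n, chunks[i][(i - s) % n]) for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] += val
    # After n-1 steps, worker i holds the fully reduced chunk (i + 1) mod n.

    # Phase 2: allgather. Each worker forwards its fully reduced chunk around
    # the ring; receivers overwrite, so the sums propagate to everyone.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, chunks[i][(i + 1 - s) % n])
                 for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] = val
    return chunks
```

The appeal of the ring pattern is bandwidth optimality: each worker sends and receives roughly 2(n-1)/n times the gradient size regardless of the number of workers, instead of funneling all traffic through a central parameter server.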