3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
The performance and efficiency of distributed machine learning (ML) depend significantly on how long it takes for nodes to exchange state changes. Overly aggressive attempts to reduce communication often sacrifice final model accuracy and necessitate additional ML techniques to compensate for this loss, limiting their generality. Other attempts to reduce communication incur high computation overhead, which makes their performance benefits visible only over slow networks.
We present 3LC, a lossy compression scheme for state change traffic that strikes a balance among multiple goals: traffic reduction, accuracy, computation overhead, and generality. It combines three new techniques---3-value quantization with sparsity multiplication, quartic encoding, and zero-run encoding---to leverage the strengths of quantization and sparsification techniques while avoiding their drawbacks. It achieves a data compression ratio of up to 39--107X, almost the same test accuracy of trained models, and high compression speed. Distributed ML frameworks can employ 3LC without modifications to existing ML algorithms. Our experiments show that 3LC reduces wall-clock training time of ResNet-110--based image classifiers for CIFAR-10 on a 10-GPU cluster by up to 16--23X compared to TensorFlow's baseline design.
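To make the first two techniques concrete, the sketch below illustrates one plausible form of 3-value quantization with a sparsity multiplier (each element mapped to {-1, 0, +1} times a per-tensor scale, with a larger multiplier pushing more elements to zero) and a quartic-style packing of five 3-valued digits into one byte (3^5 = 243 <= 256). The function names, rounding rule, digit ordering, and residual handling are illustrative assumptions, not 3LC's actual implementation; zero-run encoding is omitted. The full algorithm is specified in the paper.

```python
import numpy as np

def three_value_quantize(tensor, multiplier=1.0):
    # Map each element to {-1, 0, +1} times a per-tensor scale M.
    # A multiplier >= 1 shrinks tensor / M, so more elements round to
    # zero and the quantized tensor becomes sparser.
    M = multiplier * np.max(np.abs(tensor))
    if M == 0.0:
        return np.zeros(tensor.shape, dtype=np.int8), 0.0
    quantized = np.rint(tensor / M).astype(np.int8)
    return quantized, M

def quartic_encode(quantized):
    # Pack five 3-valued digits into one byte (3**5 = 243 <= 256).
    digits = (quantized.ravel() + 1).astype(np.uint8)   # shift {-1,0,1} -> {0,1,2}
    pad = (-len(digits)) % 5                             # pad with digit 1 (= quantized zero)
    digits = np.concatenate([digits, np.ones(pad, dtype=np.uint8)])
    groups = digits.reshape(-1, 5)
    powers = np.array([81, 27, 9, 3, 1], dtype=np.uint8)
    return (groups * powers).sum(axis=1).astype(np.uint8)

# Example: quantize a small gradient-like tensor and encode it.
grad = np.array([0.8, -0.05, 0.3, -0.6, 0.02, 0.9], dtype=np.float32)
q, M = three_value_quantize(grad, multiplier=1.0)
encoded = quartic_encode(q)                  # compact byte stream to transmit
reconstructed = q.astype(np.float32) * M     # receiver-side reconstruction
residual = grad - reconstructed              # quantization error, which such
                                             # schemes typically carry over locally
```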