Smooth Loss Functions for Deep Top-k Classification.

Authors
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

The top-k error is a common measure of performance in machine learning and computer vision. In practice, top-k classification is typically performed with deep neural networks trained with the cross-entropy loss. Theoretical results indeed suggest that cross-entropy is an optimal learning objective for such a task in the limit of infinite data. In the context of limited and noisy data however, the use of a loss function that is specifically designed for top-k classification can bring significant improvements. Our empirical evidence suggests that the loss function must be smooth and have non-sparse gradients in order to work well with deep neural networks. Consequently, we introduce a family of smoothed loss functions that are suited to top-k optimization via deep learning. The widely used cross-entropy is a special case of our family. Evaluating our smooth loss functions is computationally challenging: a naïve algorithm would require $\mathcal{O}(\binom{n}{k})$ operations, where $n$ is the number of classes. Thanks to a connection to polynomial algebra and a divide-and-conquer approach, we provide an algorithm with a time complexity of $\mathcal{O}(kn)$. Furthermore, we present a novel approximation to obtain fast and stable algorithms on GPUs with single floating point precision. We compare the performance of the cross-entropy loss and our margin-based losses in various regimes of noise and data size, for the predominant use case of k=5. Our investigation reveals that our loss is more robust to noise and overfitting than cross-entropy.
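
The $\mathcal{O}(kn)$ evaluation mentioned in the abstract rests on the observation that a sum over all k-element subsets of classes can be written as an elementary symmetric polynomial of the exponentiated scores. The Python sketch below is an illustration of that idea only, not the authors' implementation (which uses a divide-and-conquer scheme and an additional approximation for single-precision GPU stability): it computes the polynomial in log-space with the standard O(kn) prefix recurrence. The function names, the NumPy dependency, and the temperature parameter tau are assumptions made for this example.

    import numpy as np

    def log_elementary_symmetric(log_x, k):
        """Return log e_k(x_1, ..., x_n) given log_x[i] = log x_i.

        Uses the O(k n) prefix recurrence
            e_j(x_1..x_i) = e_j(x_1..x_{i-1}) + x_i * e_{j-1}(x_1..x_{i-1}),
        carried out in log-space with logaddexp for numerical stability.
        """
        log_e = np.full(k + 1, -np.inf)  # log e_j over the empty prefix; e_0 = 1
        log_e[0] = 0.0
        for i in range(len(log_x)):
            # Iterate j downwards so log_e[j - 1] still holds the previous prefix value.
            for j in range(min(i + 1, k), 0, -1):
                log_e[j] = np.logaddexp(log_e[j], log_e[j - 1] + log_x[i])
        return log_e[k]

    def smooth_topk_sum(scores, k, tau):
        """Smooth surrogate for the sum of the k largest scores (illustrative).

        tau * log sum_{|A|=k} exp(sum_{i in A} scores[i] / tau)
        equals tau * log e_k(exp(scores / tau)) and tends to the exact
        top-k sum as tau -> 0.
        """
        return tau * log_elementary_symmetric(np.asarray(scores) / tau, k)

For example, with hypothetical scores [2.0, -1.0, 0.5, 3.0, 0.1] and k=2, smooth_topk_sum(scores, 2, tau=1.0) returns a smooth upper bound on the true top-2 sum of 5.0, and the value approaches 5.0 as tau is decreased toward 0. The paper's losses add margin and ground-truth terms on top of this kind of subset-sum computation; this sketch only shows why the evaluation can be done in O(kn) rather than $\mathcal{O}(\binom{n}{k})$ operations.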
