MoNet: Moments Embedding Network.
Bilinear pooling has been recently proposed as a feature encoding layer,which can be used after the convolutional layers of a deep network, to improveperformance in multiple vision tasks. Instead of conventional global averagepooling or fully connected layer, bilinear pooling gathers 2nd orderinformation in a translation invariant fashion. However, a serious drawback ofthis family of pooling layers is their dimensionality explosion. Approximatepooling methods with compact property have been explored towards resolving thisweakness. Additionally, recent results have shown that significant performancegains can be achieved by using matrix normalization to regularize unstablehigher order information. However, combining compact pooling with matrixnormalization has not been explored until now.
In this paper, we unify the bilinear pooling layer and the global Gaussianembedding layer through the empirical moment matrix. In addition, with aproposed novel sub-matrix square-root layer, one can normalize the output ofthe convolution layer directly and mitigate the dimensionality problem withoff-the-shelf compact pooling methods. Our experiments on three widely usedfine-grained classification datasets illustrate that our proposed architectureMoNet can achieve similar or better performance than G2DeNet . When combinedwith compact pooling technique, it obtains comparable performance with theencoded feature of 96% less dimensions.
Stay in the loop.
Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.