Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Yuanzhi Li, Tengyu Ma, Hongyang Zhang

We show that the (stochastic) gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U\in\mathbb{R}^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent approximately recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations. The results resolve the conjecture of Gunasekar et al. '17 under the restricted isometry property. The technique can be applied, with some technical modifications, to analyzing neural networks with quadratic activations.
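The matrix-sensing setup described above can be sketched numerically. The following is an illustrative toy implementation, not the paper's exact procedure: the dimensions, number of measurements, step size, and iteration count are assumptions chosen for a small demo rather than the paper's constants. It recovers a rank-$r$ PSD matrix $X^{\star}$ from random linear measurements $y_i = \langle A_i, X^{\star}\rangle$ by over-parameterizing $X = UU^\top$ with a full $d\times d$ factor $U$ and running gradient descent on the squared loss from a small random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 20, 2     # ambient dimension and true rank (illustrative)
m = 800          # number of Gaussian measurements (illustrative)

# Ground-truth rank-r PSD matrix X* = B B^T
B = rng.standard_normal((d, r))
X_star = B @ B.T

# Random Gaussian measurement matrices A_i and measurements y_i = <A_i, X*>
A = rng.standard_normal((m, d, d))
y = np.einsum('kij,ij->k', A, X_star)

# Over-parameterized factor U in R^{d x d}, small random initialization
U = 1e-3 * rng.standard_normal((d, d))

eta = 0.002 / m  # small constant step size (assumption, not tuned as in the paper)
for _ in range(3000):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y  # residuals <A_i, UU^T> - y_i
    G = np.einsum('k,kij->ij', resid, A)            # sum_i resid_i * A_i
    # Gradient of (1/2) sum_i resid_i^2 with respect to U is sum_i resid_i (A_i + A_i^T) U
    U -= eta * (G + G.T) @ U

rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {rel_err:.2e}")
```

Even though $U$ has $d^2$ parameters and is never constrained to be low-rank, gradient descent from the small initialization drives $UU^\top$ toward the rank-$r$ ground truth, which is the implicit regularization effect the abstract describes.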
