Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.

We show that the (stochastic) gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U \in \mathbb{R}^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent approximately recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations. These results resolve the conjecture of Gunasekar et al.'17 under the restricted isometry property. The technique can be applied to analyzing neural networks with quadratic activations with some technical modifications.
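The following is a minimal NumPy sketch of the matrix-sensing setup described above, not the authors' code: a rank-$r$ PSD matrix $X^{\star}$ is measured by random Gaussian linear measurements, and gradient descent is run on the squared loss over an over-parameterized factor $U \in \mathbb{R}^{d\times d}$ starting from a small initialization. The dimensions, number of measurements, step size, and iteration count are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of over-parameterized matrix sensing with gradient descent
# from a small initialization; all hyperparameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 2
m = 5 * d * r**2                      # measurements: fewer than d(d+1)/2 unknowns

# Ground-truth rank-r PSD matrix X* = B B^T (normalized so its eigenvalues are O(1)).
B = rng.standard_normal((d, r)) / np.sqrt(d)
X_star = B @ B.T

# Random Gaussian sensing matrices A_i and noiseless measurements y_i = <A_i, X*>.
A = rng.standard_normal((m, d, d))
A_sym = A + A.transpose(0, 2, 1)      # precompute A_i + A_i^T for the gradient
y = np.einsum('mij,ij->m', A, X_star)

# Over-parameterized factor U in R^{d x d}, small random initialization.
U = 1e-3 * rng.standard_normal((d, d))
eta = 0.1                             # step size (illustrative)

for _ in range(500):
    residual = np.einsum('mij,ij->m', A, U @ U.T) - y          # <A_i, UU^T> - y_i
    # Gradient of (1/2m) * sum_i residual_i^2 with respect to U:
    # (1/m) * sum_i residual_i * (A_i + A_i^T) U.
    grad = np.einsum('m,mij->ij', residual, A_sym) @ U / m
    U -= eta * grad

err = np.linalg.norm(U @ U.T - X_star, 'fro') / np.linalg.norm(X_star, 'fro')
print(f"relative recovery error: {err:.3e}")
```

In this regime the number of measurements is smaller than the $d(d+1)/2$ degrees of freedom of a symmetric matrix, so exact least-squares recovery of an arbitrary $UU^\top$ is not forced by the data; the small initialization is what biases gradient descent toward the low-rank solution, in the spirit of the implicit regularization effect analyzed in the paper.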