# On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition.

We establish connections between the problem of learning a two-layers neuralnetwork with good generalization error and tensor decomposition. We consider amodel with input $\boldsymbol x \in \mathbb R^d$, $r$ hidden units with weights$\{\boldsymbol w_i\}_{1\le i \le r}$ and output $y\in \mathbb R$, i.e.,$y=\sum_{i=1}^r \sigma(\left \langle \boldsymbol x, \boldsymbol w_i\right\rangle)$, where $\sigma$ denotes the activation function.
First, we show that, if we cannot learn the weights $\{\boldsymbolw_i\}_{1\le i \le r}$ accurately, then the neural network does not generalizewell. More specifically, the generalization error is close to that of a trivialpredictor with access only to the norm of the input. This result holds for anyactivation function, and it requires that the weights are roughly isotropic andthe input distribution is Gaussian, which is a typical assumption in thetheoretical literature. Then, we show that the problem of learning the weights$\{\boldsymbol w_i\}_{1\le i \le r}$ is at least as hard as the problem oftensor decomposition. This result holds for any input distribution and assumesthat the activation function is a polynomial whose degree is related to theorder of the tensor to be decomposed. By putting everything together, we provethat learning a two-layers neural network that generalizes well is at least ashard as tensor decomposition. It has been observed that neural network modelswith more parameters than training samples often generalize well, even if theproblem is highly underdetermined. This means that the learning algorithm doesnot estimate the weights accurately and yet is able to yield a goodgeneralization error. This paper shows that such a phenomenon cannot occur whenthe input distribution is Gaussian and the weights are roughly isotropic. Wealso provide numerical evidence supporting our theoretical findings.