Globally Consistent Algorithms for Mixture of Experts.

RSS Source
Ashok Vardhan Makkuva, Sreeram Kannan, Pramod Viswanath

Mixture-of-Experts (MoE) is a widely popular neural network architecture andis a basic building block of highly successful modern neural networks, forexample, Gated Recurrent Units (GRU) and Attention networks. However, despitethe empirical success, finding an efficient and provably consistent algorithmto learn the parameters remains a long standing open problem for more than twodecades. In this paper, we introduce the first algorithm that learns the trueparameters of a MoE model for a wide class of non-linearities with globalconsistency guarantees. Our algorithm relies on a novel combination of the EMalgorithm and the tensor method of moment techniques. We empirically validateour algorithm on both the synthetic and real data sets in a variety ofsettings, and show superior performance to standard baselines.

Stay in the loop.

Subscribe to our newsletter for a weekly update on the latest podcast, news, events, and jobs postings.