Gradient conjugate priors and deep neural networks.

Pavel Gurevich, Hannes Stuke

The paper deals with learning the probability distribution of the observed data by artificial neural networks. We suggest a so-called gradient conjugate prior (GCP) update appropriate for neural networks, which is a modification of the classical Bayesian update for conjugate priors. We establish a connection between the gradient conjugate prior update and the maximization of the log-likelihood of the predictive distribution. Unlike Bayesian neural networks, we do not impose a prior on the weights of the neural networks; rather, we assume that the ground truth distribution is normal with unknown mean and variance, and use neural networks to learn the parameters of a prior (a normal-gamma distribution) for this unknown mean and variance. The parameters are updated using the gradient that, at each step, points towards minimizing the Kullback-Leibler divergence from the prior to the posterior distribution (both being normal-gamma). We obtain a corresponding dynamical system for the prior's parameters and analyze its properties. In particular, we study the limiting behavior of all the prior's parameters and show how it differs from the case of the classical full Bayesian update. The results are validated on synthetic and real-world data sets.
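
For orientation, the classical Bayesian update that the GCP update modifies can be sketched in a few lines. This is a minimal illustration of the textbook normal-gamma conjugate update for a single observation, together with the log-density of the resulting Student's t predictive distribution; it is not the GCP update itself, whose exact form the abstract does not give. The parameter names (`m`, `nu`, `alpha`, `beta`), the function names, and the precision-based parametrization (mean given precision tau is normal with precision nu * tau, and tau is gamma with shape alpha and rate beta) are assumptions made for this sketch.

```python
import math


def bayes_update(m, nu, alpha, beta, y):
    """Classical conjugate update of a normal-gamma prior after observing y.

    Assumed parametrization (not taken from the paper):
    mu | tau ~ Normal(m, 1 / (nu * tau)), tau ~ Gamma(shape=alpha, rate=beta).
    """
    m_new = (nu * m + y) / (nu + 1.0)
    nu_new = nu + 1.0
    alpha_new = alpha + 0.5
    beta_new = beta + nu * (y - m) ** 2 / (2.0 * (nu + 1.0))
    return m_new, nu_new, alpha_new, beta_new


def predictive_logpdf(m, nu, alpha, beta, y):
    """Log-density of the predictive distribution at y.

    Under the parametrization above, the predictive distribution is a
    Student's t with 2 * alpha degrees of freedom, location m, and
    squared scale beta * (nu + 1) / (alpha * nu).
    """
    df = 2.0 * alpha
    scale = math.sqrt(beta * (nu + 1.0) / (alpha * nu))
    z = (y - m) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )


if __name__ == "__main__":
    prior = (0.0, 1.0, 1.0, 1.0)  # hypothetical starting parameters
    print(bayes_update(*prior, y=2.0))
    print(predictive_logpdf(*prior, y=2.0))
```

In this classical update, `nu` and `alpha` grow with every observation regardless of the data. The abstract states that the limiting behavior of the prior's parameters under the GCP update differs from this full Bayesian case.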
