An Information-Theoretic Optimality Principle for Deep Reinforcement Learning.

Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar

We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g., deep and double deep Q-networks) in terms of both game-play performance and sample complexity.
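As a rough illustration of the idea sketched in the abstract, the following is a minimal, hypothetical Python sketch (not the authors' implementation) of a Q-learning target in which a log-sum-exp "soft" maximum over next-state Q-values, scaled by a Lagrange multiplier `beta`, interpolates between the mean of the Q-values (small `beta`, a strongly penalized estimate) and the hard maximum used by deep Q-networks (large `beta`). The function name and signature are assumptions for illustration only.

```python
import numpy as np

def soft_q_target(q_next, reward, gamma, beta):
    """Hypothetical sketch: an information-theoretically penalized
    Q-learning target. The log-sum-exp "soft max" over next-state
    Q-values, scaled by a Lagrange multiplier beta, lies between the
    mean (beta -> 0) and the hard max (beta -> inf) of those values,
    so the ordinary deep Q-network target is recovered in the limit."""
    q_next = np.asarray(q_next, dtype=float)
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = q_next.max()
    soft_max = m + np.log(np.mean(np.exp(beta * (q_next - m)))) / beta
    return reward + gamma * soft_max
```

Scheduling `beta` over training, as the abstract proposes for the Lagrange multiplier, would then amount to gradually moving this target from a conservative (mean-like) estimate toward the greedy DQN target.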
