An Information-Theoretic Optimality Principle for Deep Reinforcement Learning.
We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal that encourages reduced Q-value estimates. The resulting algorithm encompasses a wide range of learning outcomes and contains deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari games, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity.
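To make the mechanism concrete, below is a minimal sketch of one way such a penalized Q-learning target could look. It assumes the information-theoretic penalty takes the common free-energy (log-sum-exp "soft maximum") form under a uniform action prior, with the Lagrange multiplier beta controlling the penalty strength; the function name soft_q_target and all parameter names are illustrative, not taken from the paper.

import numpy as np

# Illustrative sketch (assumed form, not the paper's exact penalty):
# a soft Bellman target whose "soft maximum" over next-state action
# values is controlled by a Lagrange multiplier beta. Smaller beta
# penalizes large Q-values; beta -> infinity recovers the hard max
# of the standard DQN target, matching DQN as a special case.
def soft_q_target(q_next, rewards, dones, beta, gamma=0.99):
    # Free energy under a uniform action prior:
    #   (1/beta) * log( mean_a exp(beta * Q(s', a)) )
    scaled = beta * q_next
    m = scaled.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    soft_max = (m[:, 0] + np.log(np.exp(scaled - m).mean(axis=1))) / beta
    return rewards + gamma * (1.0 - dones) * soft_max

# Toy usage: a batch of 4 transitions with 3 available actions each.
rng = np.random.default_rng(0)
q_next = rng.normal(size=(4, 3))
print(soft_q_target(q_next, rewards=np.ones(4), dones=np.zeros(4), beta=5.0))
print(soft_q_target(q_next, rewards=np.ones(4), dones=np.zeros(4), beta=100.0))
# At beta=100 the targets are close to the DQN targets
# r + gamma * max_a Q(s', a).

Under this assumed form, small beta pulls the target toward the expected Q-value and large beta toward the maximum, which is one plausible reading of how scheduling the Lagrange multiplier trades off pessimistic and standard Q-learning over the course of training.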