Clipped Action Policy Gradient.
Many continuous control tasks have bounded action spaces and clip out-of-bound actions before execution. Policy gradient methods often optimize policies as if actions were not clipped. We propose clipped action policy gradient (CAPG) as an alternative policy gradient estimator that exploits the knowledge that actions are clipped to reduce the variance in estimation. We prove that CAPG is unbiased and achieves lower variance than the original estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the original estimator, indicating its promise as a better policy gradient estimator for continuous control tasks.
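To make the idea concrete, below is a minimal PyTorch sketch of what such an estimator's log-probability term might look like for a one-dimensional Gaussian policy whose actions are clamped to [low, high]. The function name and exact form are illustrative assumptions based on the abstract (out-of-bound samples all execute the same boundary action, so their contribution can use the probability mass of the whole tail rather than the density at the raw sample); it is not the paper's verbatim implementation.

import torch
from torch.distributions import Normal


def capg_log_prob(mean, std, raw_action, low, high):
    """Log-probability term for a Gaussian policy whose sampled actions
    are clipped to [low, high] before execution.

    In-bound samples use the ordinary log pi(a|s). Samples falling at or
    beyond a bound instead use the log probability mass of the entire
    out-of-bound tail, reflecting that every such sample executes the
    same clipped action. A sketch of the idea, not the paper's code.
    """
    dist = Normal(mean, std)
    low = torch.as_tensor(low, dtype=mean.dtype)
    high = torch.as_tensor(high, dtype=mean.dtype)

    log_pdf = dist.log_prob(raw_action)                 # ordinary estimator
    # clamp_min guards against log(0) when a tail has negligible mass
    log_lower_tail = torch.log(dist.cdf(low).clamp_min(1e-12))        # Pr(a <= low)
    log_upper_tail = torch.log((1 - dist.cdf(high)).clamp_min(1e-12)) # Pr(a >= high)

    return torch.where(
        raw_action <= low,
        log_lower_tail,
        torch.where(raw_action >= high, log_upper_tail, log_pdf),
    )


# Example: one REINFORCE-style update term with a hypothetical advantage of 1.0
mean = torch.tensor([0.3], requires_grad=True)
std = torch.tensor([1.0])
raw = Normal(mean.detach(), std).sample()   # raw (pre-clipping) sample
env_action = raw.clamp(-1.0, 1.0)           # action actually executed
loss = -(1.0 * capg_log_prob(mean, std, raw, -1.0, 1.0)).sum()
loss.backward()

Note that the raw (pre-clipping) sample is what selects the branch; the environment only ever sees the clamped action.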