Comparative Evaluation of Reinforcement Learning with Scalar Rewards and Linear Regression with Multidimensional Feedback
This paper presents a comparative evaluation of two learning approaches. The first is a conventional reinforcement learning algorithm for direct policy search, which by definition uses scalar rewards. The second is a custom linear-regression-based algorithm that uses multidimensional feedback instead of a scalar reward. The two approaches are evaluated in simulation on a common benchmark problem: an aiming task in which the goal is to learn aiming parameters that result in hitting as close as possible to a given target. The comparative evaluation shows that multidimensional feedback provides a significant advantage over the scalar reward, yielding an order-of-magnitude speed-up in convergence. A real-world experiment with a humanoid robot confirms the simulation results and highlights the importance of multidimensional feedback for fast learning.
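To illustrate the distinction between the two feedback types, the following is a minimal sketch on a toy aiming task, not the paper's actual algorithms. The simulation model (`hit_point` with a fixed unknown bias), the random-search baseline, and the correction rule are all assumptions for illustration: a scalar reward collapses the miss into a single distance, so the learner must search blindly, whereas the full error vector gives the direction and magnitude of the correction in every dimension at once.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([3.0, -1.0])
bias = np.array([0.5, 0.2])  # hypothetical unknown offset of the simulated throw

def hit_point(params):
    # Toy aiming simulation (assumption): hit = params + fixed bias + noise.
    return params + bias + rng.normal(scale=0.01, size=2)

def scalar_reward_search(steps=200):
    # Scalar feedback: only the negative distance to the target is observed,
    # so the learner resorts to random perturbations, keeping improvements.
    params = np.zeros(2)
    best = -np.linalg.norm(hit_point(params) - target)
    for _ in range(steps):
        cand = params + rng.normal(scale=0.2, size=2)
        reward = -np.linalg.norm(hit_point(cand) - target)
        if reward > best:
            params, best = cand, reward
    return params

def vector_feedback_search(steps=5):
    # Multidimensional feedback: the full error vector directly indicates
    # how to correct each parameter, so very few trials are needed.
    params = np.zeros(2)
    for _ in range(steps):
        error = hit_point(params) - target  # vector-valued feedback
        params = params - error             # correct every dimension at once
    return params
```

With the linear toy model above, the vector-feedback learner lands within noise of the target after a handful of trials, while the scalar-reward search needs many more evaluations to reach comparable accuracy, mirroring the order-of-magnitude gap reported in the paper.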