Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion

YIN Chang-ming; WHANG Han-xing; ZHAO Fei

YIN Chang-ming, WHANG Han-xing, ZHAO Fei. Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion[J]. Applied Mathematics and Mechanics, 2007, 28(3): 369-378.

College of Computer and Communicational Engineering, Changsha University of Science and Technology, Changsha 410076, P. R. China

Received Date: 2006-02-20
Rev Recd Date: 2007-01-16
Publish Date: 2007-03-15

Abstract

A new algorithm is proposed that may sacrifice the optimality of control policies in order to obtain robust solutions. Robustness can become a very important property of a learning system when the theoretical model does not match the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. By replacing the usual optimal operator max (or min) with a generalized average operator, an important class of learning algorithms, namely dynamic programming algorithms, is studied, and their convergence is discussed from a theoretical point of view. The purpose is to improve the robustness of reinforcement learning algorithms on a theoretical basis.
- reinforcement learning,
- risk-sensitive,
- generalized average,
- algorithm,
- convergence
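The abstract describes replacing the max operator in dynamic programming with a generalized average operator but does not define that operator here. The sketch below assumes one common choice, the exponential (log-sum-exp) mean (1/beta) * log(mean(exp(beta * x))), and uses it in place of max over actions in value iteration. The operator used in the paper may differ. The names `generalized_average`, `generalized_value_iteration`, the parameter `beta`, and the toy two-state MDP are hypothetical illustrations, not taken from the source.

```python
import numpy as np


def generalized_average(x, beta):
    """Exponential (log-sum-exp) generalized mean of the entries of x.

    Assumed operator (hypothetical choice): beta -> +inf approaches max(x),
    beta -> -inf approaches min(x), and beta == 0 is the arithmetic mean.
    The result always lies in [min(x), max(x)], and adding a constant c to
    every entry adds exactly c to the result.
    """
    x = np.asarray(x, dtype=float)
    if beta == 0.0:
        return float(x.mean())
    # Shift by the extreme value so every exponent is <= 0 (avoids overflow).
    shift = x.max() if beta > 0 else x.min()
    return float(shift + np.log(np.mean(np.exp(beta * (x - shift)))) / beta)


def generalized_value_iteration(P, R, gamma, beta, tol=1e-10, max_iter=10_000):
    """Value iteration with max over actions replaced by generalized_average.

    P has shape (A, S, S) with P[a, s, t] = Pr(next state t | state s, action a);
    R has shape (S, A) with the expected immediate reward of action a in state s.
    Because the averaging operator is monotone and shift-equivariant, each update
    is a gamma-contraction in the sup norm, so the iteration converges for gamma < 1.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = np.array([generalized_average(Q[s], beta) for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical 2-state, 2-action MDP, used only to exercise the sketch.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
])
R = np.array([
    [1.0, 0.0],  # state 0: staying pays 1, switching pays 0
    [0.0, 2.0],  # state 1: staying pays 0, switching pays 2
])
gamma = 0.9

for beta in (-5.0, 0.0, 5.0, 1000.0):
    # Larger beta moves the values toward the max-based (optimal) values.
    print(beta, generalized_value_iteration(P, R, gamma, beta))
```

With beta = 0 the update averages the action values uniformly; as beta grows the result approaches ordinary max-based value iteration, whose values for this toy MDP are (10, 11). Choosing a finite beta gives up some optimality, which is in the spirit of the trade-off the abstract describes.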
