Citation: YIN Chang-ming, WHANG Han-xing, ZHAO Fei. Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion[J]. Applied Mathematics and Mechanics, 2007, 28(3): 369-378.

[1] Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
[2] Sutton R S. Open theoretical questions in reinforcement learning[A]. In: Proc of EuroCOLT'99 (Computational Learning Theory)[C]. Cambridge, MA: MIT Press, 1999, 11-17.
[3] Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Massachusetts: MIT Press, 1998, 20-300.
[4] Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[5] Watkins C J C H. Learning from delayed rewards[D]. England: University of Cambridge, 1989.
[6] Bertsekas D P, Tsitsiklis J N. Parallel and Distributed Computation: Numerical Methods[M]. Englewood Cliffs, New Jersey: Prentice-Hall, 1989, 10-109.
[7] YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. A Relative Value Iteration Q-learning Algorithm and its Convergence Based on Finite Samples[J]. Journal of Computer Research and Development, 2002, 39(9): 1064-1070.
[8] YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. Optimality cost relative value iteration Q-learning algorithm based on finite samples[J]. Journal of Computer Engineering and Applications, 2002, 38(11): 65-67.
[9] Wiering M, Schmidhuber J. Speeding up Q(λ)-learning[A]. In: Proc of the 10th European Conf on Machine Learning[C]. Germany: Springer-Verlag, 1998, 352-363.
[10] Singh S. Soft dynamic programming algorithms: convergence proofs[A]. In: Proceedings of Workshop on Computational Learning and Natural Learning (CLNL)[C]. Provincetown, Massachusetts: University of Massachusetts, 1993.
[11] Cavazos-Cadena R, Montes-de-Oca R. The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space[J]. Mathematics of Operations Research, 2003, 28(4): 752-776. doi: 10.1287/moor.28.4.752.20515
[12] Peng J, Williams R. Incremental multi-step Q-learning[J]. Machine Learning, 1996, 22(4): 283-290.
[13] Singh S. Reinforcement learning algorithms for average-payoff Markovian decision processes[A]. In: Proceedings of the 12th National Conference on Artificial Intelligence[C]. Tahoe City, CA: Morgan Kaufmann, 1994, 1: 700-705.