
Accession Number : ADA309171
Title : Multi-Agent Residual Advantage Learning with General Function Approximation.
Descriptive Note : Final rept.,
Corporate Author : WRIGHT LAB WRIGHT-PATTERSON AFB OH
Personal Author(s) : Harmon, Mance E. ; Baird, Leemon C., III
PDF Url : ADA309171
Report Date : 03 APR 1996
Pagination or Media Count : 16
Abstract : A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, while advantage updating requires two different types of updates, a learning update and a normalization update. The reinforcement learning system uses the residual form of advantage learning. An application of reinforcement learning to a Markov game is presented. The testbed has continuous states and nonlinear dynamics. The advantage function is stored in a single-hidden-layer sigmoidal network. Speed of learning is increased by a new algorithm, Incremental Delta-Delta (IDD), which extends Jacobs' (1988) Delta-Delta for use in incremental training, and differs from Sutton's Incremental Delta-Bar-Delta (1992) in that it does not require the use of a trace and is amenable for use with general function approximation systems. To our knowledge, this is the first time an approximate second-order method has been used with residual algorithms. Empirical results are presented comparing convergence rates with and without the use of IDD for the reinforcement learning testbed and for a supervised learning testbed.
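The advantage-learning backup summarized in the abstract can be illustrated with a minimal tabular sketch. The one-state MDP, the constants GAMMA, DT, and K, and the helper advantage_target below are illustrative assumptions, not the paper's continuous-state testbed or its residual-gradient network; the sketch only shows the general advantage-learning form, in which the best action's advantage equals the state value and suboptimal actions are penalized in proportion to 1/(K * DT).

```python
# Hedged sketch of the advantage-learning backup, not the paper's implementation.
# Tabular, one state, two actions; action 0 (reward 1.0) is optimal.
GAMMA = 0.9      # discount factor per time step (assumed value)
DT = 1.0         # time-step size (assumed value)
K = 0.5          # advantage scaling constant (assumed value)
REWARD = [1.0, 0.0]   # immediate reward for each action; both stay in state 0

def advantage_target(A, a):
    """Advantage-learning Bellman backup for action a in the single state."""
    v = max(A)        # max over actions plays the role of the state value V(x)
    v_next = max(A)   # next state is the same single state in this toy MDP
    return v + (REWARD[a] + (GAMMA ** DT) * v_next - v) / (K * DT)

A = [0.0, 0.0]
alpha = 0.5
for _ in range(2000):
    for a in (0, 1):
        A[a] += alpha * (advantage_target(A, a) - A[a])

print(A)  # at the fixed point, max(A) equals V(x) = 1/(1 - 0.9) = 10
```

At convergence the optimal action's advantage equals the state value (10 here), while the suboptimal action's advantage sits below it by (V - (r + GAMMA * V)) / (K * DT); with K = 1 and DT = 1 the backup reduces to ordinary Q-learning.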
Descriptors : *ALGORITHMS, *LEARNING, *REINFORCEMENT(STRUCTURES), VELOCITY, FUNCTIONS, TEST BEDS, TRAINING, DYNAMICS, RATES, NONLINEAR SYSTEMS, RESIDUALS, CONVERGENCE.
Subject Categories : Psychology
Numerical Mathematics
Distribution Statement : APPROVED FOR PUBLIC RELEASE