Accession Number : ADA309171

Title :   Multi-Agent Residual Advantage Learning with General Function Approximation.

Descriptive Note : Final rept.,

Corporate Author : WRIGHT LAB WRIGHT-PATTERSON AFB OH

Personal Author(s) : Harmon, Mance E. ; Baird, Leemon C., III

PDF Url : ADA309171

Report Date : 03 APR 1996

Pagination or Media Count : 16

Abstract : A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that only a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, whereas advantage updating requires two different types of updates: a learning update and a normalization update. The reinforcement learning system uses the residual form of advantage learning. An application of reinforcement learning to a Markov game is presented. The test-bed has continuous states and nonlinear dynamics. The advantage function is stored in a single-hidden-layer sigmoidal network. Speed of learning is increased by a new algorithm, Incremental Delta-Delta (IDD), which extends Jacobs' (1988) Delta-Delta rule for use in incremental training. IDD differs from Sutton's Incremental Delta-Bar-Delta (1992) in that it does not require a trace and is amenable to use with general function approximation systems. To our knowledge, this is the first time an approximate second-order method has been used with residual algorithms. Empirical results are presented comparing convergence rates with and without IDD, both for the reinforcement learning test-bed and for a supervised learning test-bed.
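The abstract names the residual form of advantage learning but gives no update rule. The following is a minimal sketch, not the report's implementation, that assumes the commonly cited advantage learning target A(x,u) <- (1/(K dt)) [r + gamma^dt V(x')] + (1 - 1/(K dt)) V(x), with V(x) = max_u A(x,u). Several simplifications are assumptions made here: a linear function approximator in place of the report's sigmoidal network, a single maximizing agent rather than a two-player Markov game, a pure residual-gradient step rather than a weighted residual algorithm, and no IDD step-size adaptation. All function names and parameter values are hypothetical.

```python
def features(state, action, n_actions):
    """Hypothetical linear features: (state, bias) copied into the slot for `action`."""
    base = list(state) + [1.0]
    phi = [0.0] * (len(base) * n_actions)
    start = action * len(base)
    phi[start:start + len(base)] = base
    return phi


def advantage(w, state, action, n_actions):
    """Linear advantage estimate A(x, u) = w . phi(x, u)."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action, n_actions)))


def greedy(w, state, n_actions):
    """Return (argmax action, V(x)) where V(x) = max_u A(x, u)."""
    values = [advantage(w, state, u, n_actions) for u in range(n_actions)]
    best = max(range(n_actions), key=lambda u: values[u])
    return best, values[best]


def residual_advantage_step(w, x, u, r, x_next, n_actions,
                            gamma=0.9, dt=1.0, k=0.5, lr=0.05):
    """One residual-gradient step on 0.5 * e^2 for the advantage learning residual e.

    Returns the updated weights and the residual e measured before the update.
    """
    scale = 1.0 / (k * dt)
    u_star, v = greedy(w, x, n_actions)
    u_next, v_next = greedy(w, x_next, n_actions)
    # Advantage learning target; with k * dt == 1 it reduces to the Q-learning target.
    target = scale * (r + gamma ** dt * v_next) + (1.0 - scale) * v
    error = target - advantage(w, x, u, n_actions)
    # Residual gradient: differentiate through the target terms as well as A(x, u).
    # For a linear approximator each gradient is a feature vector; the max is
    # differentiated through its argmax action.
    g_next = features(x_next, u_next, n_actions)
    g_v = features(x, u_star, n_actions)
    g_a = features(x, u, n_actions)
    grad_e = [scale * gamma ** dt * a + (1.0 - scale) * b - c
              for a, b, c in zip(g_next, g_v, g_a)]
    # Gradient descent on 0.5 * e^2: w <- w - lr * e * de/dw
    new_w = [wi - lr * error * gi for wi, gi in zip(w, grad_e)]
    return new_w, error


# Usage: repeated steps on one hypothetical transition shrink the residual.
w = [0.0] * 4  # one state variable plus bias, two actions
for _ in range(3):
    w, e = residual_advantage_step(w, x=[1.0], u=0, r=1.0, x_next=[0.5], n_actions=2)
    print(round(e, 4))
```

Differentiating through V(x') and V(x), rather than treating the target as a constant, is what distinguishes the residual-gradient form from a direct update. The sketch uses K = 0.5, so the 1/(K dt) factor amplifies differences between actions. That amplification is the motivation usually given for advantage learning over Q-learning when dt is small.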

Descriptors :   *ALGORITHMS, *LEARNING, *REINFORCEMENT(STRUCTURES), VELOCITY, FUNCTIONS, TEST BEDS, TRAINING, DYNAMICS, RATES, NONLINEAR SYSTEMS, RESIDUALS, CONVERGENCE.

Subject Categories : Psychology
      Numerical Mathematics

Distribution Statement : APPROVED FOR PUBLIC RELEASE