Accession Number : ADA318671
Title : Convergence Behavior of Temporal Difference Learning.
Descriptive Note : Final rept.,
Corporate Author : WRIGHT LAB WRIGHT-PATTERSON AFB OH AVIONICS DIRECTORATE
Personal Author(s) : Malhotra, Raj P.
PDF Url : ADA318671
Report Date : MAY 1996
Pagination or Media Count : 9
Abstract : Temporal difference learning is an important class of incremental learning procedures that learn to predict outcomes of sequential processes through experience. Although these algorithms have been used in a variety of well-known intelligent systems, such as Samuel's checker player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper provides a brief summary of the theoretical basis for these algorithms and documents observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
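For readers unfamiliar with the method the abstract summarizes, the following is a minimal illustrative sketch of tabular TD(0) prediction, the simplest member of the temporal difference family. It is not code from the report; the function name `td0`, the state encoding, and all parameter values are assumptions made for illustration.

```python
def td0(episodes, alpha=0.1, gamma=1.0, n_states=5):
    """Tabular TD(0) prediction.

    Each episode is a list of (state, reward, next_state) transitions,
    with next_state = None marking termination. The update rule is
        V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)),
    i.e. the value estimate is moved toward the bootstrapped target.
    """
    V = [0.0] * n_states
    for episode in episodes:
        for state, reward, next_state in episode:
            # Terminal states contribute no future value.
            bootstrap = V[next_state] if next_state is not None else 0.0
            target = reward + gamma * bootstrap
            V[state] += alpha * (target - V[state])
    return V


# Toy usage: state 0 always transitions to terminal with reward 1,
# so V[0] should approach 1.0 as updates accumulate.
values = td0([[(0, 1.0, None)]] * 100)
```

Convergence behavior, the subject of the report, depends on how the step size `alpha` and the sampled transitions interact over many such updates.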
Descriptors : *ALGORITHMS, *LEARNING MACHINES, *ARTIFICIAL INTELLIGENCE, OPTIMIZATION, CONVERGENCE, HEURISTIC METHODS, SYSTEMS ANALYSIS, DYNAMIC PROGRAMMING, MARKOV PROCESSES, GAME THEORY, CONDITIONING(LEARNING).
Subject Categories : Cybernetics
Distribution Statement : APPROVED FOR PUBLIC RELEASE