Accession Number : ADA318671

Title :   Convergence Behavior of Temporal Difference Learning.

Descriptive Note : Final rept.,

Corporate Author : WRIGHT LAB WRIGHT-PATTERSON AFB OH AVIONICS DIRECTORATE

Personal Author(s) : Malhotra, Raj P.

PDF Url : ADA318671

Report Date : MAY 1996

Pagination or Media Count : 9

Abstract : Temporal difference learning is an important class of incremental learning procedures that learn to predict outcomes of sequential processes through experience. Although these algorithms have been used in a variety of well-known intelligent systems, such as Samuel's checker-player and Tesauro's backgammon program, their convergence properties remain poorly understood. This paper provides a brief summary of the theoretical basis for these algorithms and documents observed convergence performance in a variety of experiments. The implications of these results are also briefly discussed.
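
The prediction task described in the abstract can be illustrated with a minimal TD(0) sketch; the random-walk environment, state layout, and parameter values below are illustrative assumptions, not taken from the report itself:

```python
import random

def td0_random_walk(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """TD(0) value prediction on a 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Reaching
    state 6 yields reward 1, all other transitions yield 0, so the
    true value of each state is its probability of exiting right.
    """
    rng = random.Random(seed)
    V = [0.0] * 7  # V[0] and V[6] stay 0 (terminal states)
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))       # step left or right
            r = 1.0 if s2 == 6 else 0.0
            # TD(0) update: nudge V[s] toward the bootstrapped target
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V[1:6]

values = td0_random_walk()
```

For this walk the true values are 1/6, 2/6, ..., 5/6, so the learned estimates can be checked directly against them, which is the kind of convergence measurement the experiments in the report are concerned with.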

Descriptors :   *ALGORITHMS, *LEARNING MACHINES, *ARTIFICIAL INTELLIGENCE, OPTIMIZATION, CONVERGENCE, HEURISTIC METHODS, SYSTEMS ANALYSIS, DYNAMIC PROGRAMMING, MARKOV PROCESSES, GAME THEORY, CONDITIONING(LEARNING).

Subject Categories : Cybernetics
      Operations Research

Distribution Statement : APPROVED FOR PUBLIC RELEASE