Accession Number : ADA323936
Title : Learning in Embedded Systems.
Descriptive Note : Doctoral thesis,
Corporate Author : STANFORD UNIV CA DEPT OF COMPUTER SCIENCE
Personal Author(s) : Kaelbling, Leslie P.
PDF Url : ADA323936
Report Date : JUN 1990
Pagination or Media Count : 208
Abstract : This dissertation addresses the problem of designing algorithms for learning in embedded systems. This problem differs from the traditional supervised learning problem. An agent, finding itself in a particular input situation must generate an action. It then receives a reinforcement value from the environment, indicating how valuable the current state of the environment is for the agent. The agent cannot, however, deduce the reinforcement value that would have resulted from executing any of its other actions. A number of algorithms for learning action strategies from reinforcement values are presented and compared empirically with existing reinforcement-learning algorithms. The interval-estimation algorithm uses the statistical notion of confidence intervals to guide its generation of actions in the world, trading off acting to gain information against acting to gain reinforcement. It performs well in simple domains but does not exhibit any generalization and is computationally complex. The cascade algorithm is a structural credit-assignment method that allows an action strategy with many output bits to be learned by a collection of reinforcement- learning modules that learn Boolean functions. This method represents an improvement in computational complexity and often in learning rate. Two algorithms for learning Boolean functions in k-DNF are described. They both perform well and have tractable complexity. A generate-and-test reinforcement-learning algorithm is presented. It allows symbolic representations of Boolean functions to be constructed incrementally and tested in the environment. It is highly parametrized and can be tuned to learn a broad range of function classes. Low-complexity functions can be learned very efficiently even in the presence of large numbers of irrelevant input bits.
Descriptors : *ALGORITHMS, *LEARNING MACHINES, *ARTIFICIAL INTELLIGENCE, OPTIMIZATION, ADAPTIVE CONTROL SYSTEMS, ROBOTS, THESES, INPUT OUTPUT PROCESSING, CONVERGENCE, HEURISTIC METHODS, SYSTEMS ANALYSIS, CONTROL THEORY, SPECIAL FUNCTIONS(MATHEMATICS), AUTOMATA, RANDOM WALK.
Subject Categories : Cybernetics
Distribution Statement : APPROVED FOR PUBLIC RELEASE