Accession Number : ADA272625

Title :   Synthesizing Regularity Exposing Attributes in Large Protein Databases.

Descriptive Note : Technical rept.,

Corporate Author : MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB

Personal Author(s) : De La Maza, Michael

Report Date : SEP 1993

Pagination or Media Count : 89

Abstract : This thesis describes a system that synthesizes regularity-exposing attributes from large protein databases. After processing primary and secondary structure data, the system discovers an amino acid representation that captures what are thought to be the three most important amino acid characteristics for tertiary structure prediction: size, charge, and hydrophobicity. A neural network trained with this 16-bit representation achieves secondary structure prediction accuracy comparable to that of a neural network trained with the standard 24-bit amino acid representation. In addition, the thesis describes bounds on secondary structure prediction accuracy, derived using an optimal learning algorithm and the probably approximately correct (PAC) model.
      Keywords: Representation reformulation, neural networks, genetic algorithms, clustering algorithm, decision tree system, secondary structure prediction.
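The record does not state the form of the thesis's PAC bounds. As a hedged illustration only, the standard textbook sample-complexity bound for a consistent learner (zero training error, target assumed to lie in H) over a finite hypothesis class H says that m >= (ln|H| + ln(1/delta)) / epsilon examples suffice for true error at most epsilon with probability at least 1 - delta. The sketch below evaluates that bound and its inverse. The function names and the example class size are hypothetical, and this is not the derivation used in the thesis.

```python
import math


def pac_sample_size(log_h_size: float, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite class H.

    Textbook bound: m >= (ln|H| + ln(1/delta)) / epsilon.
    log_h_size is ln|H|, passed directly so very large classes do not overflow.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil((log_h_size + math.log(1 / delta)) / epsilon)


def pac_error_bound(log_h_size: float, m: int, delta: float) -> float:
    """Inverse form: upper bound on true error (w.p. >= 1 - delta) after m examples."""
    if m <= 0 or not (0 < delta < 1):
        raise ValueError("m must be positive and delta must lie in (0, 1)")
    return (log_h_size + math.log(1 / delta)) / m


if __name__ == "__main__":
    # Hypothetical example: |H| = 2**10 hypotheses, epsilon = 0.1, delta = 0.05.
    ln_h = 10 * math.log(2)
    print(pac_sample_size(ln_h, 0.1, 0.05))
    print(round(pac_error_bound(ln_h, 1000, 0.05), 4))
```

Because an error bound of epsilon corresponds to an accuracy of at least 1 - epsilon, the inverse form is the one that speaks to accuracy limits for a fixed training-set size.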

Descriptors :   *ALGORITHMS, *DATA BASES, *NEURAL NETS, *ARTIFICIAL INTELLIGENCE, *COMPUTER NETWORKS, ACCURACY, ACIDS, ADDITION, AMINO ACIDS, CLUSTERING, GENETICS, LEARNING, MODELS, NETWORKS, PREDICTIONS, PROCESSING, PROTEINS, SECONDARY, STANDARDS, STRUCTURES, THESES, TREES, CONTROL SYSTEMS.

Subject Categories : Psychology
      Theoretical Mathematics
      Computer Systems

Distribution Statement : APPROVED FOR PUBLIC RELEASE