Accession Number : AD0267901

Title : The Construction of an Empirically Based Mathematically Derived Classification System

Corporate Author : SYSTEM DEVELOPMENT CORP SANTA MONICA CALIF

Personal Author(s) : BORKO,HAROLD

Report Date : 26 OCT 1961

Pagination or Media Count : 1

Abstract : A method for developing an empirically based, computer-derived classification system is described. The library of documents chosen for experimentation consisted of 618 psychological abstracts which were coded for computer processing. The total text consisted of approximately 50,000 words; nearly 6,800 were unique. Words were arranged in order of frequency of occurrence. From the list of words which occurred 20 or more times, excluding syntactical terms such as and, but, of, etc., the investigator selected 90 for use as index terms. These were arranged in a data matrix with the terms on the horizontal and the document number on the vertical axis. The cells contained the number of times the term was used in the document. A correlation matrix, 90x90 in size, was computed which showed the relationship of each term to every other term. The matrix was factor analyzed and 53 eigenvectors were obtained. Three groups of these vectors--the first four, the first 10, and the first 18--were selected as factors and were rotated for meaning. All factors were interpreted and the set of 10 proved to be the most meaningful as classification categories. These factors were compared with, and shown to be compatible but not identical to, the classification system used by the American Psychological Association. The results demonstrate the feasibility of an empirically derived classification system and establish the value of factor analysis as a technique in language data processing. (Author)
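
Illustrative sketch (not part of the original report): the pipeline described in the abstract, a term count matrix followed by a term-by-term correlation matrix and its eigenvectors, can be outlined in modern terms as below. The use of Python and NumPy, the toy documents, the four index terms, and the function names term_document_matrix and principal_factors are all assumptions made for illustration; the 1961 study used 618 abstracts and 90 terms, and its actual program is not described here. The rotation step mentioned in the abstract is omitted.

```python
import numpy as np

def term_document_matrix(docs, terms):
    """Rows are documents, columns are index terms; cells hold term counts."""
    index = {t: j for j, t in enumerate(terms)}
    counts = np.zeros((len(docs), len(terms)))
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            j = index.get(word)
            if j is not None:
                counts[i, j] += 1
    return counts

def principal_factors(counts, n_factors):
    """Correlate terms with one another and keep the leading eigenvectors.

    Returns the term-by-term correlation matrix and unrotated loadings
    (eigenvectors scaled by the square roots of their eigenvalues).
    """
    corr = np.corrcoef(counts, rowvar=False)        # terms x terms
    eigvals, eigvecs = np.linalg.eigh(corr)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]   # largest first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return corr, eigvecs[:, order] * scale

# Hypothetical toy data standing in for the coded abstracts.
docs = [
    "learning memory learning test",
    "memory learning recall",
    "test score test validity",
    "validity score test",
]
terms = ["learning", "memory", "test", "score"]

counts = term_document_matrix(docs, terms)
corr, loadings = principal_factors(counts, n_factors=2)
```

In this sketch each column of loadings plays the role of one candidate factor; terms with large loadings on the same column would be read together as a possible classification category.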

Descriptors : CLASSIFICATION, CODING, CORRELATION TECHNIQUES, DOCUMENTS, FACTOR ANALYSIS, INDEXES, UNITED STATES GOVERNMENT

Distribution Statement : APPROVED FOR PUBLIC RELEASE