Accession Number : AD0281909

Title : STATISTICAL SEMANTICS

Corporate Author : SYSTEM DEVELOPMENT CORP SANTA MONICA CALIF

Personal Author(s) : Doyle, Lauren B.

Report Date : 11 JUL 1962

Pagination or Media Count : 5

Abstract : Three small libraries in physics, in European current events, and in information retrieval are represented by three groups of 100 lists, each list simulating the output of a computer program that determines the 12 most frequent content words of a document. Homographs of words that occur in any two of the three libraries are inventoried to ascertain how cleanly the homographs are separated as a consequence of separating the libraries from each other. Three kinds of homograph separation are specified: doubtful, partial, and clean-cut. The last was found to predominate in this study, as a result of the variegation and small size of the libraries. It is hypothesized that for statistically separable libraries that are somewhat closer in subject matter and/or larger, lower percentages of clean-cut separations should occur, but that there are countertrends which could make these effects less important. (Author)
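
The two mechanical steps named in the abstract (listing the 12 most frequent content words of a document, and inventorying words shared by any two libraries) can be sketched roughly as follows. This is an illustration only, not the report's program: the stop-word list, the tokenization, and the names `top_content_words` and `shared_words` are assumptions, and the judgment of whether a shared word is a homograph with doubtful, partial, or clean-cut separation is left to a human reader.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical, deliberately tiny stop-word list. The report's actual
# criterion for "content words" is not given in this record.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "is", "to", "for", "on", "by", "with"}


def top_content_words(text, n=12):
    """Return the n most frequent non-stop words of one document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def shared_words(libraries):
    """Map each word found in two or more libraries to those libraries.

    `libraries` maps a library name to its word lists (one list per
    document). The result only names candidates for homograph review.
    """
    vocab = {name: set().union(*lists) for name, lists in libraries.items()}
    shared = {}
    for a, b in combinations(sorted(vocab), 2):
        for word in vocab[a] & vocab[b]:
            shared.setdefault(word, set()).update((a, b))
    return {word: sorted(names) for word, names in shared.items()}
```

With three libraries of 100 lists each, `shared_words` would return the words that appear in at least two libraries, which is the inventory a reviewer would then inspect by hand.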

Descriptors : *CATALOGS, *DATA PROCESSING, *MATHEMATICAL ANALYSIS, COMPUTER LOGIC, LIBRARIES, MATHEMATICS

Distribution Statement : APPROVED FOR PUBLIC RELEASE