Accession Number : AD0751407

Title : A Method for the Removal of Redundancy in Printed Text.

Descriptive Note : Doctoral thesis

Corporate Author : ILLINOIS UNIV URBANA COORDINATED SCIENCE LAB

Personal Author(s) : Cullum, Robert Donald

Report Date : SEP 1972

Pagination or Media Count : 113

Abstract : A class of methods for removing redundancy from printed text, called ID-methods, was developed. ID-methods take into account only the statistics associated with word occurrences in printed text. Nevertheless, models show that these methods can encode English text at a cost as low as 1.5 binary digits per character. This figure compares favorably with Shannon's upper bound of 1.3 bits per character on the entropy of printed English, which was determined by an experiment that implicitly took into account the syntactic structure and the semantics of English. An encoding experiment was performed, which verified the cost predictions and assessed the complexity of using ID-methods. It was found that text could be encoded at a rate on the order of a few thousand characters per second. An analysis indicates that text encoded using an ID-method could be decoded at a rate of 250,000 characters per second on a computer such as the IBM 360/75. (Author)
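
Illustrative note : The abstract states coding cost in binary digits per character derived from word-occurrence statistics alone. The sketch below shows how such a figure can be estimated for a sample text. It is not the ID-method of the thesis, whose details are not given in this record; it assumes a plain word-frequency model, charges each word one extra character for its separator, and the function name word_model_bits_per_char is hypothetical.

```python
import math
from collections import Counter


def word_model_bits_per_char(text: str) -> float:
    """Estimate coding cost in bits per character from word frequencies only.

    Assumption: words are drawn independently from their empirical
    distribution, and each word costs one extra character for its separator.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")

    counts = Counter(words)
    total = len(words)

    # Shannon entropy of the empirical word distribution, in bits per word.
    bits_per_word = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )

    # Average characters per word, counting one separator per word.
    chars_per_word = sum(len(w) + 1 for w in words) / total

    return bits_per_word / chars_per_word
```

For the toy input "a b a b" the two words are equally likely (1 bit per word) and each occupies 2 characters including its separator, so the estimate is 0.5 bits per character. An estimate made on the same text that is being encoded is optimistic, because it ignores the cost of transmitting the word list itself.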

Descriptors : (*CODING, ENGLISH LANGUAGE), BINARY ARITHMETIC, ENTROPY, DECODING, INFORMATION RETRIEVAL, COMPUTER PROGRAMMING, MATHEMATICAL LOGIC, THESES, DATA STORAGE SYSTEMS

Subject Categories : Linguistics, Cybernetics

Distribution Statement : APPROVED FOR PUBLIC RELEASE