Accession Number : ADA334570

Title : Developing a Corpus Specific Stoplist Using Quantitative Comparison

Descriptive Note : Master's thesis

Corporate Author : AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH SCHOOL OF ENGINEERING

Personal Author(s) : Berg, Craig N.

Report Date : DEC 1997

Pagination or Media Count : 138

Abstract : We have become overwhelmed with electronic information, and it seems our situation is not going to improve. It is becoming increasingly common for people to work with information on a daily basis. We seem to spend more and more time looking for information, and it is taking longer because more information is available. This thesis will look at how we can provide faster access to the information we want to find. Today's requirements are closely related to searching for information using queries. At the heart of the query process is the removal of search terms having little or no significance to the search being performed. Words considered to have little searching power, called stopwords, are compiled in a stoplist. Stoplists are usually constructed from commonly occurring words in the English language. This approach is acceptable for systems handling broad categories of information. We will build a stoplist for a specific area of interest based on a specific body of linguistic data, or corpus. A stoplist developed from an Air Force corpus will be tested to see whether it is more effective than a stoplist created from a general-use corpus.

Descriptors : *DATA BASES, *INTERNET, *TEXT PROCESSING, DATA MANAGEMENT, INFORMATION EXCHANGE, COMPUTER COMMUNICATIONS, THESES, INFORMATION RETRIEVAL, COMPUTATIONAL LINGUISTICS, WORD RECOGNITION.

Subject Categories : Computer Systems

Distribution Statement : APPROVED FOR PUBLIC RELEASE