Accession Number : ADA327553

Title : Large-Scale Topic Detection and Language Model Adaptation.

Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE

Personal Author(s) : Seymore, Kristie ; Rosenfeld, Ronald


Report Date : JUN 1997

Pagination or Media Count : 19

Abstract : The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic-specific language models built from those clusters to rescore N-best lists. Using this adaptation, we achieve a 15% reduction in perplexity and a small improvement in word error rate. We also investigate the use of a topic tree, which allows the amount of training data for a specific topic to be judiciously increased when the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. Our system fine-tunes topic adaptation by interpolating models chosen from thousands of topics, which allows it to adapt to unique, previously unseen combinations of subjects.
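The adaptation loop described in the abstract (select similar topics, build topic models with a topic-tree fallback for sparse clusters, interpolate, rescore N-best lists) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity over raw word counts stands in for whatever topic-similarity measure the report uses, add-one smoothed unigram models stand in for the topic-specific n-gram models, the topic tree is a hypothetical parent map whose ancestor counts are pooled into a sparse cluster, and the chosen topic models are mixed with a general model using a fixed weight. All function names (cosine, training_counts, adapt, rescore), parameters (k, min_tokens, lam, lm_weight), and the toy data are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def training_counts(topic, nodes, parent, min_tokens):
    """Pool counts from ancestor nodes of a (hypothetical) topic tree until the
    training data reaches min_tokens or the root is reached."""
    counts = Counter(nodes[topic])
    node = topic
    while sum(counts.values()) < min_tokens and node in parent:
        node = parent[node]
        counts.update(nodes[node])
    return counts

def unigram(counts, vocab):
    """Add-one smoothed unigram model (a stand-in for a real n-gram model)."""
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def adapt(history, nodes, parent, general, vocab, k=1, min_tokens=5, lam=0.5):
    """Pick the k leaf topics most similar to the history and return them
    together with an interpolated word-probability function."""
    internal = set(parent.values())
    leaves = [t for t in nodes if t not in internal]
    hist = Counter(history)
    chosen = sorted(leaves, key=lambda t: cosine(hist, nodes[t]), reverse=True)[:k]
    topic_models = [unigram(training_counts(t, nodes, parent, min_tokens), vocab)
                    for t in chosen]
    background = unigram(general, vocab)

    def prob(w):
        # Equal weights across the chosen topics, then mix with the general model.
        topic_p = sum(m[w] for m in topic_models) / len(topic_models)
        return lam * topic_p + (1 - lam) * background[w]

    return chosen, prob

def perplexity(words, prob):
    """Perplexity of a word sequence under a unigram probability function."""
    return math.exp(-sum(math.log(prob(w)) for w in words) / len(words))

def rescore(nbest, prob, lm_weight=1.0):
    """nbest: list of (words, acoustic_log_score); return the best word list."""
    def total(hyp):
        words, acoustic = hyp
        return acoustic + lm_weight * sum(math.log(prob(w)) for w in words)
    return max(nbest, key=total)[0]

# Toy usage with made-up data: "baseball" is too sparse, so it borrows from "sports".
nodes = {
    "sports": Counter("game team score win game".split()),
    "baseball": Counter("pitcher inning".split()),
    "finance": Counter("stock market price stock trade".split()),
}
parent = {"baseball": "sports"}
general = Counter("the a game stock of".split())
vocab = set(general) | {w for c in nodes.values() for w in c}

history = ["pitcher", "inning", "game"]
chosen, prob = adapt(history, nodes, parent, general, vocab)
_, baseline = adapt(history, nodes, parent, general, vocab, lam=0.0)  # general model only
test = ["game", "team", "inning"]
print(chosen)                                                  # ['baseball']
print(perplexity(test, prob) < perplexity(test, baseline))     # True
print(rescore([(["the", "price"], -10.0), (["the", "pitcher"], -10.0)], prob))
```

With equal acoustic scores, the adapted model prefers the hypothesis containing the topical word "pitcher". Averaging topic models with equal weights is a simplification chosen for brevity; a real system would more likely estimate the interpolation weights, for example on held-out data.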

Descriptors : *ADAPTIVE SYSTEMS, *SPEECH RECOGNITION, *TEXT PROCESSING, DATA BASES, MATHEMATICAL MODELS, WORDS(LANGUAGE), MARKOV PROCESSES, ERROR CORRECTION CODES, WORD RECOGNITION.

Subject Categories : Cybernetics

Distribution Statement : APPROVED FOR PUBLIC RELEASE