Accession Number : ADA289344

Title : Isolated Digit Recognition Without Time Alignment.

Descriptive Note : Master's thesis

Corporate Author : AIR FORCE INST OF TECH WRIGHT-PATTERSON AFB OH

Personal Author(s) : Gay, Jeffrey M.

Report Date : DEC 1994

Pagination or Media Count : 148

Abstract : This thesis examines methods for isolated digit recognition that do not use time alignment. Resource requirements for isolated word recognizers that use time alignment can become prohibitively large as the vocabulary to be classified grows. Methods are therefore needed that achieve recognition rates comparable to those of current time-alignment techniques without performing the alignment. The goals of this research are to find feature sets for speech recognition that perform well without time alignment, and to identify classifiers that provide good performance with these features. Using the digits from the TI46 database, baseline speaker-independent recognition rates of 95.2% for the complete speaker set and 98.1% for the male speaker set are established using dynamic time warping (DTW). The work begins with features derived from spectrograms of each digit. These critical band energy features, based on a critical band frequency scale covering the telephone bandwidth (300-3000 Hz), are classified alone and in combination with several other feature sets, using several different classifiers. With this method, there is one "short" feature vector per word. For speaker-independent recognition using the complete speaker set and a multi-layer perceptron (MLP) classifier, a recognition rate of 92.4% is achieved. For the same classifier with the male speaker set, a recognition rate of 97.1% is achieved. For the male speaker set, there is no statistically significant difference between the results using DTW and those using the MLP without time alignment. This shows that there are feature sets that may provide high recognition rates for isolated word recognition without the need for time alignment.

Descriptors : *SPEECH RECOGNITION, *PATTERN RECOGNITION, *WORD RECOGNITION, DATA BASES, REQUIREMENTS, DIGITAL SYSTEMS, NEURAL NETS, SIGNAL TO NOISE RATIO, ISOLATION, THESES, ALIGNMENT, FREQUENCY BANDS, BASE LINES, ARTIFICIAL INTELLIGENCE, SPEECH, VOCABULARY, BANDWIDTH, TELEPHONE SYSTEMS, DYNAMIC PROGRAMMING, MARKOV PROCESSES, ENERGY BANDS, SPECTROGRAPHY, STATISTICAL PROCESSES.

Subject Categories : Cybernetics, Voice Communications

Distribution Statement : APPROVED FOR PUBLIC RELEASE