Accession Number : ADA309887
Title : Finding Acoustic Regularities in Speech: Applications to Phonetic Recognition.
Descriptive Note : Technical rept.
Corporate Author : MASSACHUSETTS INST OF TECH CAMBRIDGE RESEARCH LAB OF ELECTRONICS
Personal Author(s) : Glass, James R.
PDF Url : ADA309887
Report Date : DEC 1988
Pagination or Media Count : 157
Abstract : Phonetic recognition can be viewed as a process through which the acoustic signal is mapped to a set of phonological units used to represent a lexicon. Researchers have traditionally prescribed an intermediate phonetic description to account for coarticulation. This thesis presents an alternative approach whereby this phonetic-level description is bypassed in favor of directly relating the acoustic realizations to the underlying phonemic forms. In this approach, the speech signal is transformed into a set of segments which are described completely in acoustic terms. Next, these acoustic segments are related to the phonemes by a grammar which is determined using automated procedures operating on a set of training data. Thus, important acoustic regularities that describe contextual variations are discovered without the need to specify a set of preconceived units such as allophones. The viability of this approach depends critically on the ability to detect important acoustic landmarks in the speech signal and to describe these events in terms of an inventory of labels that captures the regularity of phonetic variations. In the present implementation, the signal is first transformed into a representation based on an auditory model developed by Seneff. Next, important acoustic landmarks are located and embedded in a multi-level structure called a dendrogram, in which information ranging from coarse to fine is represented in a unified framework. Each acoustic region in the dendrogram is then described by a set of acoustic labels determined through a hierarchical clustering algorithm. An analysis of the multi-level structure on a set of 500 utterances recorded from 100 different talkers indicates that over 96% of important acoustic-phonetic events are located, with an insertion rate of less than 5%.
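Illustrative note : The multi-level structure described in the abstract can be pictured as a binary tree built over time-ordered acoustic regions. The sketch below is only one plausible reading, not the procedure from the report: it assumes that landmark positions are already known, that each region is summarized by its mean feature vector, and that the most similar pair of adjacent regions (by Euclidean distance) is merged at each step. The names `Region` and `build_dendrogram` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Region:
    start: int                        # first frame index (inclusive)
    end: int                          # last frame index (exclusive)
    mean: List[float]                 # mean feature vector over the region
    left: Optional["Region"] = None   # earlier child region, if merged
    right: Optional["Region"] = None  # later child region, if merged
    height: float = 0.0               # distance at which the children merged


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance; the choice of metric is an assumption.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def _merge(a: Region, b: Region) -> Region:
    # Combine two adjacent regions; the new mean is weighted by duration.
    na, nb = a.end - a.start, b.end - b.start
    mean = [(na * x + nb * y) / (na + nb) for x, y in zip(a.mean, b.mean)]
    return Region(a.start, b.end, mean, a, b, _distance(a.mean, b.mean))


def build_dendrogram(frames: Sequence[Sequence[float]],
                     boundaries: Sequence[int]) -> Region:
    """Build a tree over the regions delimited by `boundaries`.

    `boundaries` must be strictly increasing frame indices that start
    at 0 and end at len(frames), so every region is non-empty.
    """
    regions = []
    for s, e in zip(boundaries, boundaries[1:]):
        seg = frames[s:e]
        mean = [sum(col) / len(seg) for col in zip(*seg)]
        regions.append(Region(s, e, mean))

    # Only temporally adjacent regions may merge, so time order is preserved.
    while len(regions) > 1:
        i = min(range(len(regions) - 1),
                key=lambda k: _distance(regions[k].mean, regions[k + 1].mean))
        regions[i:i + 2] = [_merge(regions[i], regions[i + 1])]
    return regions[0]


if __name__ == "__main__":
    frames = [[0.0], [0.1], [5.0], [5.1], [5.2]]
    root = build_dendrogram(frames, [0, 2, 3, 5])
    print(root.start, root.end, root.left.end)
```

In this sketch, nodes near the leaves correspond to fine acoustic regions and nodes near the root to coarse ones, which mirrors the coarse-to-fine description in the abstract. The acoustic labeling step would then be a separate clustering over the region vectors.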
Descriptors : *ACOUSTIC SIGNALS, *SPEECH RECOGNITION, DATA BASES, MATHEMATICAL MODELS, ALGORITHMS, SIGNAL PROCESSING, STOCHASTIC PROCESSES, AUTOMATION, INPUT OUTPUT PROCESSING, SOUND TRANSMISSION, CLASSIFICATION, PATTERN RECOGNITION, SPEECH ANALYSIS, ACOUSTIC FILTERS, ACOUSTIC DATA, SOUND PRESSURE, VOICE COMMUNICATIONS, SOUND PITCH, SPEECH ARTICULATION, AUDITORY SIGNALS, AUDITORY PERCEPTION, DECODING, PHONETICS, ACOUSTIC RESONANCE, PHRASE STRUCTURE GRAMMARS, CONTEXT SENSITIVE GRAMMARS, PHONEMES.
Subject Categories : Acoustics
Distribution Statement : APPROVED FOR PUBLIC RELEASE