Accession Number : ADA293714

Title : A Hypothesis-Testing Approach to Discriminant Analysis with Mixed Categorical and Continuous Variables When Data are Missing.

Corporate Author : SOUTHERN METHODIST UNIV DALLAS TX DEPT OF STATISTICAL SCIENCE

Personal Author(s) : Miller, J. W. ; Woodward, W. A. ; Gray, H. L. ; McCartor, G. D.

Report Date : JUL 1994

Pagination or Media Count : 41

Abstract : In this report we consider the problem of discriminant analysis with discrete (categorical) and continuous variables when data are missing at random. We use a hypothesis-testing approach based on the generalized likelihood ratio, as proposed by Baek et al., and use bootstrapping to determine critical values in order to control the Type I error rate. We present three algorithms for this case, each assuming a different model for the data. The INDICATOR algorithm replaces categorical variables with indicator variables and treats these as if they were continuous. The FULL algorithm assumes a multinomial distribution for the discrete part and, as the conditional distribution of the continuous part given the discrete part, a multivariate normal distribution whose mean and covariances depend on the discrete part. The COMMON algorithm assumes a multinomial distribution for the discrete part and a conditional multivariate normal distribution in which only the means depend on the discrete part; that is, a common covariance matrix is assumed across all multinomial cells. The performance of these algorithms is compared through a simulation study. While the INDICATOR algorithm seems to have the highest power, it also tends to display a higher Type I error rate than desired. The FULL and COMMON algorithms have very similar power, but the COMMON algorithm appears to control the Type I error rate most effectively and is least susceptible to problems that occur when some multinomial cells are sparsely represented.
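
Illustrative note : The bootstrap step mentioned in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the report's procedure: it uses a parametric bootstrap from a single fitted multivariate normal, complete continuous data only, and a squared Mahalanobis distance as a stand-in for the generalized likelihood ratio statistic. The function names mahalanobis_stat and bootstrap_critical_value are hypothetical.

```python
import numpy as np


def mahalanobis_stat(train, x):
    """Squared Mahalanobis distance of x from the training sample.

    Assumption: this is only a stand-in test statistic, not the
    generalized likelihood ratio used in the report.
    """
    mean = train.mean(axis=0)
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))


def bootstrap_critical_value(train, alpha=0.05, n_boot=500, seed=0):
    """Estimate the (1 - alpha) quantile of the statistic under H0.

    H0 here: the new observation comes from the same population as
    the training data. Each replicate simulates a fresh training set
    plus one new observation from the fitted normal model.
    """
    rng = np.random.default_rng(seed)
    n = train.shape[0]
    mean = train.mean(axis=0)
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.multivariate_normal(mean, cov, size=n + 1)
        # First n rows play the training set, the last row the new point.
        stats[b] = mahalanobis_stat(sample[:-1], sample[-1])
    return float(np.quantile(stats, 1.0 - alpha))


if __name__ == "__main__":
    # Synthetic data for illustration only.
    rng = np.random.default_rng(1)
    train = rng.normal(size=(200, 3))
    crit = bootstrap_critical_value(train, alpha=0.05)
    new_obs = np.array([3.0, -3.0, 3.0])
    print("reject H0:", mahalanobis_stat(train, new_obs) > crit)
```

Rejecting when the observed statistic exceeds the simulated quantile holds the Type I error rate near alpha only to the extent that the fitted model is adequate. Handling categorical variables and missing values, as the three algorithms in the report do, would need a different model for the simulated replicates.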

Descriptors : *MAXIMUM LIKELIHOOD ESTIMATION, *HYPOTHESES, *DISCRIMINATE ANALYSIS, MATHEMATICAL MODELS, ALGORITHMS, COMPUTATIONS, PARAMETERS, MULTIVARIATE ANALYSIS, STATISTICAL TESTS, PROBABILITY DISTRIBUTION FUNCTIONS, RANDOM VARIABLES, MATRICES(MATHEMATICS), COMPARISON, STATISTICAL DATA, ERRORS, DISCRETE DISTRIBUTION, COVARIANCE, NORMAL DISTRIBUTION.

Subject Categories : Statistics and Probability

Distribution Statement : APPROVED FOR PUBLIC RELEASE