Accession Number : ADA313545
Title : High Dimensional Clustering Using Parallel Coordinates and the Grand Tour.
Descriptive Note : Technical rept.
Corporate Author : GEORGE MASON UNIV FAIRFAX VA CENTER FOR COMPUTATIONAL STATISTICS
Personal Author(s) : Wegman, Edward J. ; Luo, Qiang
Report Date : APR 1996
Pagination or Media Count : 24
Abstract : In this paper, we present some graphical techniques for cluster analysis of high-dimensional data. Parallel coordinate plots and parallel coordinate density plots are graphical techniques that map multivariate data into a two-dimensional display. The method has elegant duality properties with ordinary Cartesian plots, so that higher-dimensional mathematical structures can be analyzed. Our high-interaction software allows for rapid editing of data to remove outliers and isolate clusters by brushing. Our brushing techniques allow not only for hue adjustment, but also for saturation adjustment. Saturation adjustment allows for the handling of comparatively massive data sets by using the alpha-channel of the Silicon Graphics workstation to compensate for heavy overplotting. The grand tour is a generalized rotation of coordinate axes in a high-dimensional space. Coupled with the full-dimensional plots allowed by the parallel coordinate display, these techniques allow the data analyst to explore data that are both high-dimensional and massive in size. In this paper we give a description of both techniques and illustrate their use for inverse regression and clustering. We have used these techniques to analyze data on the order of 250,000 observations in 8 dimensions. Because the analysis requires the use of color graphics, in the present paper we illustrate the methods with a more modest data set of 3,848 observations. Other illustrations are available on our web page.
Descriptors : *MULTIVARIATE ANALYSIS, *STATISTICAL DATA, *STATISTICAL DECISION THEORY, DATA BASES, ALGORITHMS, SOFTWARE ENGINEERING, DATA MANAGEMENT, MATRICES(MATHEMATICS), MATHEMATICAL PROGRAMMING, REGRESSION ANALYSIS, GRIDS(COORDINATES), CLUSTERING, COMPUTER GRAPHICS, DATA DISPLAYS, COVARIANCE, CARTESIAN COORDINATES.
Subject Categories : Statistics and Probability
Distribution Statement : APPROVED FOR PUBLIC RELEASE
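Illustrative Note : The abstract mentions two ideas that a short sketch may make more concrete: the duality between Cartesian plots and parallel coordinate plots, and the grand tour as a generalized rotation. The Python below is a minimal sketch under stated assumptions, not the report's software. All function names (segment_height, dual_point, givens, tour_rotation, rotate), the axis spacing parameter, and the choice of rotation speeds are hypothetical. The rotation is built as a product of plane rotations whose angles advance with time, which is one common way to construct such a tour; whether the report uses this construction is not stated in the record.

```python
import math


def segment_height(x1, x2, t, spacing=1.0):
    """Height at horizontal position t of the line that represents the
    point (x1, x2) when the two parallel axes sit at 0 and `spacing`."""
    return x1 + (t / spacing) * (x2 - x1)


def dual_point(m, b, spacing=1.0):
    """Point shared by the parallel-coordinate lines of every point on the
    Cartesian line x2 = m*x1 + b (assumes m != 1)."""
    if m == 1:
        raise ValueError("slope 1 gives parallel lines, so no finite dual point")
    return spacing / (1.0 - m), b / (1.0 - m)


def identity(dim):
    return [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def givens(dim, i, j, angle):
    """dim x dim rotation by `angle` in the (i, j) coordinate plane."""
    r = identity(dim)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    r[i][i], r[i][j] = cos_a, -sin_a
    r[j][i], r[j][j] = sin_a, cos_a
    return r


def tour_rotation(dim, t, speeds):
    """Rotation at time t: one plane rotation per axis pair, multiplied
    together. `speeds` holds one (hypothetical) angular speed per pair."""
    q = identity(dim)
    pairs = [(i, j) for i in range(dim) for j in range(i + 1, dim)]
    for (i, j), speed in zip(pairs, speeds):
        q = matmul(q, givens(dim, i, j, speed * t))
    return q


def rotate(q, x):
    """Apply the rotation q to one observation x."""
    return [sum(q[i][k] * x[k] for k in range(len(x))) for i in range(len(q))]
```

A possible use, again only as an assumption about how one might drive it: call dual_point(m, b) to find where the parallel-coordinate lines of collinear points cross, and call tour_rotation(dim, t, speeds) for a sequence of increasing t, applying rotate to each observation before redrawing the parallel coordinate display. Because each factor is a rotation, the product is orthogonal, so distances between observations are preserved at every step of the tour.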