Accession Number : ADA291384

Title :   Huge Data Sets and the Frontiers of Computational Feasibility.

Descriptive Note : Technical rept.

Corporate Author : GEORGE MASON UNIV FAIRFAX VA CENTER FOR COMPUTATIONAL STATISTICS

Personal Author(s) : Wegman, Edward J.

Report Date : NOV 1994

Pagination or Media Count : 23

Abstract : Recently, Huber offered a taxonomy of data set sizes ranging from tiny (10^2 bytes) to huge (10^10 bytes). This taxonomy is particularly appealing because it quantifies the meaning of tiny, small, medium, large, and huge. Indeed, some investigators consider 300 small and 10,000 large, while others consider 10,000 small. In Huber's taxonomy, most statistical and visualization techniques are computationally feasible with tiny data sets. With larger data sets, however, computers run out of computational horsepower and graphics displays run out of resolution fairly quickly. In this paper, we discuss aspects of data set size and computational feasibility for general classes of algorithms in the context of CPU performance, memory size, hard disk capacity, screen resolution, and massively parallel architectures. We discuss some strategies, such as recursive formulations, that mitigate the impact of size. We also discuss the potential for scalable parallelization, which will mitigate the effects of computational complexity. (AN)
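
As an illustration of the kind of recursive formulation the abstract alludes to (a minimal sketch of the general technique, not code from the report itself), summary statistics such as the mean and variance can be updated one observation at a time in constant memory, so even a data set at the huge end of Huber's taxonomy can be summarized in a single pass:

```python
# Sketch of a recursive (one-pass) formulation: the sample mean and
# variance are updated incrementally (Welford's method), so no earlier
# observations need to be retained, regardless of data set size.
class RunningStats:
    def __init__(self):
        self.n = 0        # observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Recursive update: new mean and m2 are computed from the old
        # values plus the single new observation x.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance (divisor n - 1); 0.0 until two observations arrive.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        stats.update(x)
    print(stats.n, stats.mean, stats.variance())  # -> 8 5.0 4.571428571428571
```

The recursive update trades a second pass over the data for a constant amount of retained state, which is the essence of the size-mitigation strategy the abstract describes.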

Descriptors :   *ALGORITHMS, *COMPUTATIONS, *DATA MANAGEMENT, DATA BASES, INFORMATION TRANSFER, DISTRIBUTED DATA PROCESSING, CAPACITY(QUANTITY), PARAMETERS, STATISTICAL DATA, RESOLUTION, COMPUTER ARCHITECTURE, PARALLEL PROCESSING, NONPARAMETRIC STATISTICS, FEASIBILITY STUDIES, LIMITATIONS, RECURSIVE FUNCTIONS, COMPUTER GRAPHICS, DATA DISPLAYS, STATISTICAL PROCESSES, OPTICAL STORAGE, BUFFER STORAGE, FLOATING POINT OPERATION, TAXONOMY.

Subject Categories : Operations Research
      Statistics and Probability

Distribution Statement : APPROVED FOR PUBLIC RELEASE