Accession Number : ADA547540

Title :   Clustering Systems with Kolmogorov Complexity and MapReduce

Descriptive Note : Technical rept.

Corporate Author : NAVAL ACADEMY ANNAPOLIS MD DEPT OF COMPUTER SCIENCE

Personal Author(s) : Troisi, Louis R.

PDF Url : ADA547540

Report Date : 02 JUN 2011

Pagination or Media Count : 21

Abstract : In the field of value management, an important problem is quantifying the processes and capabilities of an organization's network and the machines within. When the organization is large, ever-changing, and responding to new demands, it is difficult to know at any given time exactly what is being run on the machines. Accordingly, one could lose track of approved or, worse, unapproved or even malicious software as the machines become employed for various tasks. Moreover, the level of utilization of the machines may affect the maintenance and upkeep of the network. Our goal is to develop a tool that can cluster the machines on a network in a meaningful way, using different attributes or features, and that does so autonomously in an efficient and scalable system. The solution developed implements, at its core, a streaming algorithm that, in real time, takes meaningful operating data from a network, compresses it, and sends it to a MapReduce clustering algorithm. The clustering algorithm uses a normalized compression distance to measure the similarity of two machines. The goal for this project was to implement the solution and measure the overall effectiveness of the clusters. The implementation was successful in creating a software tool that can compress the data, determine the normalized compression distance, and cluster the machines. More work, however, needs to be done in using our system to extract more quantitative meaning from the clusters generated.
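
Illustrative note : The normalized compression distance mentioned in the abstract is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size of its argument. The following is a minimal sketch of that definition only, not the report's implementation. It assumes zlib as the compressor (the abstract does not name one), and the function names and sample process listings are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical operating data for two machines (invented for illustration).
machine_a = b"sshd cron nginx postgres " * 40
machine_c = b"matlab octave gcc make ld " * 40

same = ncd(machine_a, machine_a)       # a machine compared with itself
different = ncd(machine_a, machine_c)  # machines running unrelated workloads
```

Under these assumptions, a smaller value indicates more similar data, so `same` comes out lower than `different`; a clustering step could then group machines whose pairwise distances are small.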

Descriptors :   *ALGORITHMS, *COMPUTER PROGRAMS, *CLUSTERING, SOFTWARE TOOLS, PARALLEL PROCESSING, SCALING FACTOR, HIERARCHIES, ORGANIZATIONS, QUANTITATIVE ANALYSIS, COMPRESSION, NETWORKS

Subject Categories : Numerical Mathematics
      Computer Programming and Software

Distribution Statement : APPROVED FOR PUBLIC RELEASE