Accession Number : ADA304271

Title :   Randomization Testing of Machine Induced Rules.

Descriptive Note : Master's thesis

Corporate Author : NAVAL POSTGRADUATE SCHOOL MONTEREY CA

Personal Author(s) : Berry, Eric D.

PDF Url : ADA304271

Report Date : SEP 1995

Pagination or Media Count : 90

Abstract : The Department of Defense (DOD) possesses tremendous amounts of data stored in many large databases. Given the size of these databases, large-scale data analysis tools are required to find previously unknown and interesting patterns. Data mining tools that produce output in the form of production rules, i.e., 'If x, Then y', are preferred because the generated rules are understandable by humans and readily support decision-making processes. This thesis investigates the problems associated with the statistical testing of rules generated by data mining systems. Such testing is required to ensure that the generated rules are based on valid statistical relationships and are not the result of random variation in the underlying data. A testing strategy based on a non-parametric test known as the randomization test is implemented for rules from a prototype data mining system.
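
Illustrative note : The abstract does not describe the thesis's actual procedure, so the following is only a minimal sketch of how a randomization test might be applied to a single 'If x, Then y' rule. It assumes the rule is scored by its confidence (the fraction of records satisfying x that also satisfy y) and that the null hypothesis of no relationship is simulated by shuffling the y column. The function names, the choice of test statistic, and the number of permutations are all hypothetical and are not taken from the thesis.

```python
import random


def rule_confidence(antecedent, consequent):
    """Fraction of records satisfying x that also satisfy y."""
    covered = [y for x, y in zip(antecedent, consequent) if x]
    return sum(covered) / len(covered) if covered else 0.0


def randomization_test(antecedent, consequent, n_permutations=2000, seed=0):
    """Estimate how often shuffled data scores the rule at least as well.

    antecedent, consequent: equal-length lists of booleans, one per record.
    Returns (observed_confidence, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = rule_confidence(antecedent, consequent)

    shuffled = list(consequent)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real link between x and y while
        # preserving how often each of them occurs in the data.
        rng.shuffle(shuffled)
        if rule_confidence(antecedent, shuffled) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value suggests the rule's confidence is unlikely to arise from random variation alone; a large one suggests the rule may be an artifact of the data. Because no distributional form is assumed for the statistic, this kind of test is non-parametric.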

Descriptors :   *DATA BASES, *DATA MANAGEMENT, *SIZES(DIMENSIONS), *STATISTICAL TESTS, *OPERATIONAL EFFECTIVENESS, TEST AND EVALUATION, DEPARTMENT OF DEFENSE, DECISION MAKING, PRODUCTION, INFORMATION SYSTEMS, HUMANS, TOOLS, STATISTICS, THESES, PROTOTYPES, VARIATIONS, DATA STORAGE SYSTEMS, HYPOTHESES, INTELLIGIBILITY.

Subject Categories : Statistics and Probability
      Computer Systems

Distribution Statement : APPROVED FOR PUBLIC RELEASE