Accession Number : ADA332313

Title : Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling

Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE

Personal Author(s) : Mowry, Todd C. ; Luk, Chi-Keung

PDF Url : ADA332313

Report Date : SEP 1997

Pagination or Media Count : 27

Abstract : Software-based latency tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in non-numeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in non-numeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state of the art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency, or else inserts overhead that is unnecessary. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling. Our experimental results demonstrate that roughly half of the 22 non-numeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns, and control-flow correlation succeeds because cache outcomes are often call-chain dependent. We also demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than s
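
Illustrative sketch (not taken from the report): the abstract contrasts summary profiling, which keeps one miss ratio per static reference, with self correlation, which relates a reference's next cache outcome to its own recent outcomes. The Python below is a minimal sketch of that contrast under stated assumptions. The function names, the `history_len` parameter, and the trace are hypothetical, and the report's actual profiling mechanism may differ.

```python
from collections import defaultdict

def summary_miss_ratio(outcomes):
    """Summary profile: one overall miss ratio (True = miss) for a static reference."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def self_correlation_profile(outcomes, history_len=2):
    """Miss ratio conditioned on the reference's previous `history_len` outcomes.

    Assumption: a fixed-length history of the same reference's hit/miss
    outcomes is used as the correlation key.
    """
    counts = defaultdict(lambda: [0, 0])  # history -> [misses, total]
    for i in range(history_len, len(outcomes)):
        history = tuple(outcomes[i - history_len:i])
        counts[history][0] += int(outcomes[i])
        counts[history][1] += 1
    return {h: misses / total for h, (misses, total) in counts.items()}

# Hypothetical trace with a repeated pattern: miss, hit, hit, miss, hit, hit, ...
trace = [True, False, False] * 4

summary = summary_miss_ratio(trace)        # intermediate ratio of about one third
profile = self_correlation_profile(trace)  # per-history miss ratios
for history, ratio in sorted(profile.items()):
    print(history, ratio)
```

On this made-up trace the summary ratio is intermediate, while each two-outcome history predicts the next outcome exactly, which is the kind of repeated pattern the abstract credits for the success of self correlation.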

Descriptors : *DATA STORAGE SYSTEMS, SOFTWARE ENGINEERING, DATA MANAGEMENT, PERFORMANCE(ENGINEERING), OPERATIONAL EFFECTIVENESS, CORRELATION, CASE STUDIES, COMPILERS.

Subject Categories : Computer Hardware

Distribution Statement : APPROVED FOR PUBLIC RELEASE