Accession Number : ADA130821

Title : Hard CPU Related Failures and System Activity: Measurement and Modelling.

Descriptive Note : Interim technical report


Personal Author(s) : Iyer,Ravishankar K ; Rossetti,David J

Report Date : May 1983

Pagination or Media Count : 42

Abstract : This paper describes the measurement and analysis of hard CPU and memory errors, and of system activity, at the Stanford Linear Accelerator Center (SLAC) computational facility. Nearly 25 percent of the errors were estimated to be permanent. The occurrence of a failure was found to be strongly correlated with the level and type of workload preceding it. For example, it is shown that the risk of a permanent error increases nonlinearly with the amount of interactive processing. The observed tendency is present in three years of load data. This observation is significant because a load-failure relationship found at the CPU level must, in our view, be considered fundamental. In addition, the fact that most of the errors are permanent provides new information on these error types, namely their load-dependent behavior. Our analysis procedure, used on the SLAC data, has been validated on an artificially created database seeded with failures. (Author)

Descriptors : *Systems analysis, *Failure(Electronics), *Central processing units, Memory devices, Error analysis, Mathematical models, Statistical analysis, Loading(Electronics), Workload, Reliability(Electronics), Measurement

Subject Categories : Administration and Management ; Computer Hardware

Distribution Statement : APPROVED FOR PUBLIC RELEASE