Title : Workload, Performance, and Reliability of Digital Computing Systems.
Corporate Author : CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE
Personal Author(s) : Castillo, Xavier
Report Date : 01 Dec 1980
Abstract : This paper summarizes a new modeling methodology for characterizing failure processes in time-sharing systems caused by hardware transients and software errors. The basic assumption is that the instantaneous failure rate of a system resource can be approximated by a deterministic function of time plus a zero-mean stationary Gaussian process, both depending on the usage of the resource considered. The probability density function of the time to failure obtained under this assumption has a decreasing hazard function, which partially explains why other densities with decreasing hazard functions, such as the Weibull, fit experimental data so well. Furthermore, by treating the kernel of the operating system as a system resource, this methodology sets the basis for independent methods of evaluating the contribution of software to system unreliability and gives some nonobvious hints about how system reliability could be improved. A real system has been characterized according to this methodology, and an extremely good fit between predicted and observed behavior has been found. The system behavior predicted by this methodology is also compared with the predictions of other models, such as the exponential, Weibull, and periodic failure rate models. (Author)
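
The link between a Gaussian-perturbed failure rate and a decreasing hazard function can be illustrated with a minimal sketch. It is not taken from the report and rests on simplifying assumptions: the deterministic part of the rate is a constant m, and the zero-mean stationary Gaussian part has an exponential covariance R(u) = sigma2 * exp(-u / tau). Averaging exp(-integral of the rate) over the Gaussian process then gives the survival function S(t) = exp(-m*t + V(t)/2), where V(t) is the variance of the integrated process, and the hazard is h(t) = m - integral_0^t R(u) du. The names `hazard`, `survival`, `m`, `sigma2`, and `tau` are hypothetical and chosen only for this illustration.

```python
import math


def hazard(t, m, sigma2, tau):
    """Hazard h(t) = m - integral_0^t R(u) du, with R(u) = sigma2 * exp(-u / tau).

    Assumes a constant deterministic rate m (an illustrative simplification).
    """
    return m - sigma2 * tau * (1.0 - math.exp(-t / tau))


def survival(t, m, sigma2, tau):
    """Survival S(t) = exp(-m*t + V(t)/2) for the same assumed model."""
    # V(t) = 2 * integral_0^t (t - u) R(u) du: variance of the integrated process.
    var = 2.0 * sigma2 * tau * (t - tau * (1.0 - math.exp(-t / tau)))
    return math.exp(-m * t + 0.5 * var)


if __name__ == "__main__":
    # Illustrative parameters only; m > sigma2 * tau keeps the hazard positive.
    m, sigma2, tau = 0.01, 1e-5, 100.0
    for t in (0.0, 50.0, 200.0, 1000.0):
        print(f"t={t:7.1f}  h(t)={hazard(t, m, sigma2, tau):.6f}  "
              f"S(t)={survival(t, m, sigma2, tau):.6f}")
```

Under these assumptions the hazard starts at m and decays toward m - sigma2 * tau, so any positive correlation in the random part of the rate produces a decreasing hazard, which is qualitatively the behavior a Weibull density with shape parameter below 1 also exhibits.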
