Title : Analysis and Modelling of Transient Errors in Digital Computers.

Personal Author(s) : McConnel, Stephen Roy

Report Date : JUN 1981

Abstract : Experimental data on transient errors from several digital computer systems is presented and analyzed. This is the first large-scale public study of the statistical distribution of transient errors. The systems for which data has been collected are the DEC PDP-10 series computers, the Cm* multiprocessor, and the C.vmp fault-tolerant microprocessor. Statistical tests indicate that transient errors follow a decreasing hazard rate distribution. This is at variance with the standard assumption of constant hazard rates (the exponential distribution) used in reliability modeling, and it requires more complex models for accurate results. Models of common fault-tolerant redundant structures are developed using the Weibull distribution, which has a time-varying hazard rate. Both analytical and simulation models are used to analyze the differences between the reliabilities predicted by Weibull-based transient error models and those predicted by exponential-based models. The analysis indicates a significant difference between the models based on the exponential distribution and those based on the decreasing hazard rate Weibull distribution. For Weibull parameters equivalent to the measured system behavior, the model results show reliability differences ranging from -0.10 to +0.20 and Mission Time Improvement differing by factors greater than 2.0. System designers should be aware of these differences. (Author)
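To illustrate why a decreasing hazard rate changes reliability predictions, here is a minimal Python sketch. It is not the author's model. It assumes the common Weibull parameterization R(t) = exp(-(lam*t)**alpha), whose hazard h(t) = alpha*lam*(lam*t)**(alpha-1) is constant when alpha = 1 (the exponential case) and decreasing when alpha < 1. It also assumes a triple modular redundancy (TMR) structure with a perfect voter, R_TMR = 3R^2 - 2R^3. The function names, the 1000-hour mean, and the shape alpha = 0.7 are hypothetical values chosen for illustration, not figures from the report. Both distributions are scaled to the same mean time to failure so the comparison isolates the effect of the hazard shape.

```python
import math


def weibull_reliability(t, lam, alpha):
    """Assumed parameterization: R(t) = exp(-(lam*t)**alpha); alpha == 1 is exponential."""
    return math.exp(-((lam * t) ** alpha))


def weibull_hazard(t, lam, alpha):
    """h(t) = alpha*lam*(lam*t)**(alpha-1) for t > 0; decreasing in t when alpha < 1."""
    return alpha * lam * (lam * t) ** (alpha - 1)


def weibull_scale_for_mean(mean, alpha):
    """Scale lam giving the requested mean, since the mean is Gamma(1 + 1/alpha) / lam."""
    return math.gamma(1.0 + 1.0 / alpha) / mean


def tmr_reliability(r):
    """2-of-3 majority voting with a perfect voter and independent modules."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    mean_hours = 1000.0  # hypothetical mean time between transient errors
    alpha = 0.7          # hypothetical shape parameter (< 1: decreasing hazard)
    lam_exp = weibull_scale_for_mean(mean_hours, 1.0)
    lam_wb = weibull_scale_for_mean(mean_hours, alpha)
    print(f"{'t (h)':>7} {'TMR exp':>9} {'TMR Weibull':>12} {'difference':>11}")
    for t in (10, 100, 1000, 5000):
        r_exp = tmr_reliability(weibull_reliability(t, lam_exp, 1.0))
        r_wb = tmr_reliability(weibull_reliability(t, lam_wb, alpha))
        print(f"{t:>7} {r_exp:>9.4f} {r_wb:>12.4f} {r_wb - r_exp:>+11.4f}")
```

With equal means, the decreasing-hazard model predicts lower reliability early in a mission and higher reliability later. The sign of the difference therefore depends on mission length, which is consistent with the abstract reporting differences of both signs.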

