System Reliability Theory. Marvin Rausand
3.5.2 Proximate Causes and Root Causes
The term root cause is often used in analyses of failures that have occurred. The term is defined in several standards, and each standard seems to have its own particular definition. Before giving our preferred definition, we define the term proximate cause, which is an immediate and (often) readily seen cause of a failure.
Definition 3.5 (Proximate cause)
An event that occurred, or a condition that existed immediately before the failure occurred, and, if eliminated or modified, would have prevented the failure.
A proximate cause is also known as a direct cause. A proximate cause is often not the real (or root) cause of a failure, as illustrated in Example 3.10.
Example 3.10
A flashlight is part of the safety equipment in a plant. During an emergency, the flashlight is switched on, but does not give any light. A proximate (or direct) cause is that the battery is dead. If we have access to the flashlight and the battery after the emergency is over, it is straightforward to verify whether or not this was the true proximate cause.
Any battery will sooner or later go dead, and if the flashlight is an essential piece of safety equipment, it is part of the maintenance duties to test and, if necessary, replace the batteries at regular intervals. “The battery has not been tested/replaced at prescribed intervals” is therefore a cause of the proximate cause. By asking “why?” several times, we may get to the root cause of the failure.
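The repeated “why?” questioning in Example 3.10 can be pictured as a walk backwards through recorded causes until no further cause is found. The Python sketch below is a minimal illustration, assuming the causes are stored in a plain dictionary that maps each event or condition to the causes that produced it. The name `caused_by`, the function `ask_why`, and the two causes below the untested battery are hypothetical; they are added only to show that the walk can end in several candidate root causes.

```python
# Hypothetical cause map for Example 3.10: each key is an event or condition,
# and its value lists the causes that produced it (the answers to "why?").
caused_by = {
    "flashlight gives no light": ["battery is dead"],  # proximate cause
    "battery is dead": ["battery not tested/replaced at prescribed intervals"],
    # The two entries below are hypothetical deeper causes, for illustration only.
    "battery not tested/replaced at prescribed intervals": [
        "battery test missing from maintenance program",
        "inadequate training of maintenance staff",
    ],
}


def ask_why(failure, causes):
    """Keep asking "why?" from `failure` until no further cause is recorded.

    Every cause with no recorded cause of its own is returned as a candidate
    root cause, in the order it is reached. The failure itself is never returned.
    """
    roots = []
    stack = [failure]
    seen = set()
    while stack:
        event = stack.pop()
        if event in seen:  # guard against cycles in a badly built map
            continue
        seen.add(event)
        parents = causes.get(event, [])
        if not parents and event != failure:
            roots.append(event)
        # Reverse so that causes are visited in the order they are listed.
        stack.extend(reversed(parents))
    return roots


print(ask_why("flashlight gives no light", caused_by))
```

Here the walk stops only because the hypothetical map stops; how far to keep asking “why?” in a real analysis is a judgment made by the analysts.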
For the purpose of this book, we define a root cause as:
Definition 3.6 (Root cause)
One of multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent failure and, if eliminated or modified, would have prevented the failure.
For some failure modes, it may be possible to identify a single root cause, but most failure modes will have several contributing causes. All too often, failures are attributed to a proximate cause, such as human error or technical failure. These are often merely symptoms, and not the root causes of the failure. Very often, the root causes turn out to be more fundamental, such as (i) process or program deficiencies, (ii) system or organizational deficiencies, (iii) inadequate or ambiguous work instructions, and/or (iv) inadequate training.
Identifying and rectifying the root causes of failures is important for any system in the operational phase. Correcting only the proximate cause when a failure has occurred (such as replacing the battery of the flashlight in Example 3.10) is not sufficient, because the same failure may then recur many times. If, on the other hand, the root cause is rectified, the failure may never recur. Root cause analysis is briefly discussed in Section 3.7.
3.5.3 Hierarchy of Causes
The functions of a system can usually be split into subfunctions. Failure modes at one level in the hierarchy may be caused by failure modes at the next lower level. It is important to link failure modes on lower levels to the main top-level responses, in order to provide traceability to the essential system responses as the functional structure is refined. This is shown in Figure 3.6 for a hardware structure breakdown. Figure 3.6 is further discussed in Section 3.6.5.
Figure 3.6 Relationship between failure cause, failure mode, and failure effect.
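One way to keep this traceability is to record, for each failure mode, the failure mode on the next higher level that it can cause, and then follow these links upwards. The Python sketch below assumes a hypothetical three-level hardware breakdown (system, subsystem, component) that reuses the shutdown valve and valve seal appearing in Section 3.6 as illustrative items; the class `FailureMode`, the function `trace_to_top`, and the system-level failure mode are assumptions and are not taken from Figure 3.6.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureMode:
    """A failure mode of one item at one level of the hardware breakdown."""
    level: str        # e.g. "component", "subsystem", "system"
    item: str         # the item whose function is not fulfilled
    description: str  # the failure as seen from outside the item
    # Failure mode on the next higher level that this failure mode can cause.
    leads_to: Optional["FailureMode"] = None


def trace_to_top(mode: FailureMode) -> List[str]:
    """Follow the links upwards and list every failure mode on the way."""
    path = []
    current: Optional[FailureMode] = mode
    while current is not None:
        path.append(f"{current.level}: {current.item} - {current.description}")
        current = current.leads_to
    return path


# Hypothetical breakdown, for illustration only.
system_mode = FailureMode("system", "process shutdown system", "fails to stop flow")
valve_mode = FailureMode("subsystem", "shutdown valve", "internal leakage", leads_to=system_mode)
seal_mode = FailureMode("component", "valve seal", "fails to seal", leads_to=valve_mode)

for step in trace_to_top(seal_mode):
    print(step)
```

Read from the bottom up, each failure mode acts as a cause of the failure mode one level above it, which is the link the hierarchy of causes relies on.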
3.6 Classification of Failures and Failure Modes
It is important to realize that a failure mode is a manifestation of the failure as seen from the outside, that is, the nonfulfillment of one or more functions. “Internal leakage” is thus a failure mode of a shutdown valve because the valve loses its required function to “close flow,” whereas wear of the valve seal represents a cause of failure and is hence not a failure mode of the valve.
Failures and failure modes may be classified according to many different criteria. We briefly mention some of these classifications.
3.6.1 Classification According to Local Consequence
Blache and Shrivastava (1994) classify failures according to the completeness of the failure.
1 Intermittent failure. Failure that results in the loss of a required function only for a very short period of time. The item reverts to its fully operational standard immediately after the failure.
2 Extended failure. Failure that results in the loss of a required function that will continue until some part of the item is replaced or repaired. An extended failure may be further classified as:
   - Complete failure. Failure that causes complete loss of a required function.
   - Partial failure. Failure that leads to a deviation from accepted item performance but does not cause a complete loss of the required function.
   Both complete failures and partial failures may be further classified as:
   - Sudden failure. Failure that could not be forecast by prior testing or examination.
   - Gradual failure. Failure that could be forecast by testing or examination. A gradual failure represents a gradual “drifting out” of the specified range of performance values. The recognition of a gradual failure requires comparison of actual item performance with a performance requirement, and may in some cases be a difficult task.
   Extended failures may thus be split into four categories; two of these are given specific names:
   - Catastrophic failure. A failure that is both sudden and complete.
   - Degraded failure. A failure that is both partial and gradual (such as the wear of the tires on a car).
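Recognizing a gradual failure requires comparing actual item performance with a performance requirement. A minimal Python sketch of that comparison follows, assuming the performance is sampled as a sequence of numbers and the requirement is a fixed [lower, upper] range. The function name, the tread-depth data, and the 1.6 mm limit are hypothetical, chosen only to echo the tire-wear example.

```python
from typing import Optional, Sequence


def first_out_of_range(values: Sequence[float], lower: float, upper: float) -> Optional[int]:
    """Return the index of the first measurement outside [lower, upper], or None.

    A gradual failure shows up as values drifting towards a limit before they
    cross it; this function reports only the crossing itself.
    """
    for i, value in enumerate(values):
        if not lower <= value <= upper:
            return i
    return None


# Hypothetical tire tread depths (mm) at successive inspections, checked against
# an assumed minimum requirement of 1.6 mm and no upper limit of interest.
tread_depth = [8.0, 6.9, 5.7, 4.4, 3.0, 1.9, 1.4]
print(first_out_of_range(tread_depth, 1.6, float("inf")))  # -> 6
```

Because the values decrease steadily, the earlier inspections already show the drift towards the limit; that possibility of forecasting is what separates a gradual failure from a sudden one.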
The failure classification described above is shown in Figure 3.7, which is adapted from Blache and Shrivastava (1994).
Figure 3.7 Failure classification.
Source: Adapted from Blache and Shrivastava (1994).
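The classification in Figure 3.7 can be read as a small decision procedure on three properties of a failure: intermittent or extended, complete or partial, and sudden or gradual. The Python sketch below encodes that reading of the text; the function and parameter names are hypothetical, and the two extended categories without specific names are simply described by their two properties.

```python
def classify_failure(extended: bool, complete: bool = False, sudden: bool = False) -> str:
    """Classify a failure following the Blache and Shrivastava (1994) scheme summarized in this section.

    `complete` and `sudden` are only used for extended failures.
    """
    if not extended:
        # Loss of function for a very short time; the item reverts immediately.
        return "intermittent failure"
    if complete and sudden:
        return "catastrophic failure"
    if not complete and not sudden:
        return "degraded failure"
    # The remaining two extended categories have no specific name.
    completeness = "complete" if complete else "partial"
    onset = "sudden" if sudden else "gradual"
    return f"extended failure ({completeness}, {onset})"


print(classify_failure(extended=True, complete=True, sudden=True))    # catastrophic failure
print(classify_failure(extended=True, complete=False, sudden=False))  # degraded failure
print(classify_failure(extended=True, complete=True, sudden=False))   # extended failure (complete, gradual)
```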
3.6.2 Classification According to Cause
Failures may be classified according to their causes as follows.
Primary Failures