System Reliability Theory. Marvin Rausand
Figure 3.8 A primary failure leading to an item fault.
Secondary Failures
A secondary failure, also called an overstress or overload failure, is a failure caused by excessive stresses outside the intended operating context of the item. Typical stresses include shocks from thermal, mechanical, electrical, chemical, magnetic, or radioactive energy sources, or erroneous operating procedures. The stresses may be caused by neighboring items, by the environment, or by users, system operators, or plant personnel. Environmental stresses, such as lightning, earthquakes, and falling objects, are sometimes called threats to the item. We may, for example, say that lightning is a threat to a computer system and that heavy snowfall and storms are threats to an electric power grid. The overstress event leads to a secondary failure with some probability.
A secondary failure usually leads to an item fault, and a repair action is usually necessary to return the item to a functioning state. The structure of a secondary failure is shown in Figure 3.9. Secondary failures are generally random events, but it is the overstress event that is the main contributor to the randomness.
Figure 3.9 A secondary failure, caused by an overstress event, leading to an item fault.
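Because the overstress event leads to a secondary failure only with some probability, the failure mechanism can be viewed as a two-stage chance experiment. The following sketch illustrates this; the two probability values are illustrative assumptions, not figures from the book:

```python
import random

# Illustrative (assumed) parameters, chosen only for the sketch.
P_OVERSTRESS = 0.05        # probability of an overstress event per period
P_FAIL_GIVEN_STRESS = 0.4  # probability the item fails, given overstress

def secondary_failure_occurs(rng: random.Random) -> bool:
    """Two-stage model: an overstress event must occur, and the item
    must then actually fail under that stress."""
    overstressed = rng.random() < P_OVERSTRESS
    return overstressed and rng.random() < P_FAIL_GIVEN_STRESS

rng = random.Random(42)
n = 100_000
failures = sum(secondary_failure_occurs(rng) for _ in range(n))

# The analytic failure probability is the product of the two stages:
# Pr(secondary failure) = Pr(overstress) * Pr(failure | overstress).
print(failures / n)                       # close to 0.05 * 0.4 = 0.02
print(P_OVERSTRESS * P_FAIL_GIVEN_STRESS)
```

The sketch makes the point in the text concrete: most of the randomness enters through the overstress event itself, while the conditional failure probability describes how vulnerable the item is to that stress.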
Systematic Failures
A systematic failure is a failure due to a systematic cause that may be attributed to a human error or misjudgment in the specification, design, manufacture, installation, operation, or maintenance of the item. A software bug is a typical example of a systematic fault. After the error is made, the systematic cause remains dormant and hidden in the item. Examples of systematic causes are given in Example 3.12.
A systematic failure occurs when a certain trigger or activation condition occurs. The trigger can be a transient event that activates the systematic cause, but can also be a long‐lasting state such as environmental conditions, as illustrated in Example 3.14. The trigger event is often a random event, but may also be deterministic.
A systematic failure can be reproduced by deliberately applying the same trigger. The term systematic means that the same failure will occur whenever the identified trigger or activation condition is present, and for all identical copies of the item. A systematic cause can only be eliminated by a modification of the design, the manufacturing process, the operational procedures, or other relevant factors (IEC 61508 2010). A systematic fault leading to a systematic failure with the "help" of a trigger is shown in Figure 3.10. Systematic failures are often, but not always, random events; when they are, it is the trigger that is random, whereas the item failure is a deterministic consequence of the trigger event.
Figure 3.10 A systematic fault leading to a systematic failure.
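The defining property of a systematic failure can be contrasted with a primary failure in a small sketch: given the dormant fault and its trigger, the failure is deterministic and identical across all copies of the item, whereas a primary failure is random. The function names and the probability value below are illustrative assumptions:

```python
import random

def primary_fails(rng: random.Random, p: float = 0.01) -> bool:
    """Primary (random hardware) failure: a chance event,
    not reproducible on demand."""
    return rng.random() < p

def systematic_fails(has_systematic_fault: bool, trigger_present: bool) -> bool:
    """Systematic failure: deterministic once the dormant cause
    and its trigger or activation condition coincide."""
    return has_systematic_fault and trigger_present

# Every identical copy of a flawed design fails under the same trigger ...
copies = [systematic_fails(has_systematic_fault=True, trigger_present=True)
          for _ in range(5)]
print(copies)  # [True, True, True, True, True]

# ... and no copy fails while the trigger is absent, however long it runs.
print(systematic_fails(True, trigger_present=False))  # False
```

This is why applying the same trigger reproduces the failure, and why eliminating the systematic cause requires a design or process modification rather than a repair of individual items.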
Example 3.11 (Airbag system in a car)
A new car model was launched and a person driving such a car crashed into another car. The airbags did not operate as intended and the driver was critically injured. After the accident, it was found that the airbag system was not correctly installed. Later, it was found that the same error was made for all cars of the same type. The airbag failure was due to a systematic cause and all the cars of the same type had the same systematic fault. All these cars had to be recalled for repair and modification. There was nothing wrong with the airbag system as such and the airbag system manufacturer could not be blamed for the accident (unless the installation instructions were misleading or ambiguous). The car manufacturer had to cover the consequences of the failure. For drivers and passengers, the cause of the failure does not matter. A systematic failure has the same consequences as a primary (random hardware) failure.
Example 3.12 (Failure causes of a gas detection system)
A heavy (i.e. heavier than air) and dangerous gas is used in a chemical process. If a gas leakage occurs, it is important to raise an alarm and shut down the process as fast as possible. For this purpose, a safety-instrumented system (SIS) is installed, with one or more gas detectors. The SIS has three main parts: (i) gas detectors; (ii) a logic solver that receives, interprets, and transmits signals; and (iii) a set of actuating items (e.g. alarms, shutdown valves, and door-closing mechanisms). The purpose of the SIS is to give an automatic and rapid response to a gas leakage. Many more details about SIS may be found in Chapter 13.
Assume that a gas leak has occurred without any response from the SIS. Possible causes of the failure may include the following:
A primary (i.e. random hardware) failure of the SIS.
The installed gas detectors are not sensitive to this particular type of gas, or have been mis-calibrated.
The gas detectors have been installed high up on walls or in the ceiling (remember that the gas is heavier than air).
The gas detectors have been installed close to a fan (no gas will reach them).
The gas detectors have been inhibited during maintenance (and the inhibits have not been removed).
A gas detector does not raise an alarm due to a software bug. (Most modern gas detectors have software-based self-testing features.)
A gas detector has been damaged by, for example, sand-blasting. (This has happened several times in the offshore oil and gas industry.)
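The three-part SIS structure in Example 3.12 can be sketched as a simple signal chain from detectors, through the logic solver, to the actuating items. The 1oo2 voting rule ("trip if at least one of two detectors alarms") and the actuator names below are illustrative assumptions, not the book's design:

```python
def logic_solver(detector_signals: list[bool], vote_k: int = 1) -> bool:
    """k-out-of-n voting: trip if at least vote_k detectors report gas.
    (1oo2, i.e. vote_k=1 with two detectors, is an assumed configuration.)"""
    return sum(detector_signals) >= vote_k

def sis_response(detector_signals: list[bool]) -> list[str]:
    """Actuating items commanded when the logic solver confirms a leak."""
    if logic_solver(detector_signals):
        return ["raise_alarm", "close_shutdown_valves", "close_doors"]
    return []

print(sis_response([True, False]))   # one alarming detector trips 1oo2
print(sis_response([False, False]))  # no detection, no response
```

The sketch also shows why the failure causes listed above are so serious: a mis-calibrated, badly placed, inhibited, or damaged detector simply delivers a False signal, so the logic solver never trips and the whole chain stays silent during a real leak.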
Security Failures
A security failure is a failure caused by a deliberate human action. Many systems are exposed to a number of threats. The threats may be related to physical actions or