Processor Watchdog Supervisor Performance Evaluation
This procedure is intended to verify that the system's supervisor circuit was correctly implemented and is effective at recognizing faults and initiating corrective action. Digital microprocessor devices use a “dead man's switch”‐like supervision circuit (also known as a watchdog or COP, computer operating properly) to monitor for the continued presence of a state‐of‐health (SOH) indicator signal. To ensure that disruptions and faults can be rapidly detected and corrected, the supervisor circuit monitors pulses that the microprocessor is programmed to send within specified time intervals, typically as the result of handshaking between timer interrupt routines and the main program loop. If the supervisor is not toggled in time, the processor is assumed to be hung or executing an endless loop. The supervisor then generates a pulse to the processor to warn that a fault has occurred; typically, this directly or indirectly triggers a system reset, which also increments a diagnostics counter that records the number of COP‐triggered resets over a specified number of system power‐up cycles.
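The handshaking pattern described above can be illustrated with a short firmware sketch. The register addresses, the kick key value, and the diagnostics counter below are hypothetical placeholders rather than any specific vendor's API; the point is simply that the watchdog is refreshed only when both the timer interrupt and the main loop demonstrate progress.

```c
/* Minimal sketch of main-loop / interrupt handshaking with a watchdog (COP)
 * supervisor.  Register addresses, the kick key, and the diagnostics counter
 * are hypothetical placeholders, not any specific vendor's API. */

#include <stdint.h>
#include <stdbool.h>

#define WDT_KICK     (*(volatile uint16_t *)0x40001000u) /* assumed kick register   */
#define WDT_KICK_KEY 0x5A5Au                             /* assumed refresh value   */
#define RESET_CAUSE  (*(volatile uint16_t *)0x40001004u) /* assumed status register */
#define RESET_BY_COP (1u << 3)                           /* assumed COP reset flag  */

static volatile bool tick_seen = false; /* set by the timer interrupt            */
static uint16_t cop_reset_count;        /* would reside in nonvolatile memory    */

/* Periodic timer interrupt: proves the interrupt subsystem is still running. */
void timer_isr(void)
{
    tick_seen = true;
}

/* Called once per pass of the main loop.  The watchdog is refreshed only when
 * BOTH the main loop and the timer interrupt have made progress, so a hang in
 * either one lets the supervisor time out and force a reset. */
static void service_watchdog(void)
{
    if (tick_seen) {
        tick_seen = false;
        WDT_KICK = WDT_KICK_KEY;
    }
}

int main(void)
{
    /* On start-up, record whether this boot was caused by the supervisor so
     * the number of COP-triggered resets can be reported as a diagnostic. */
    if (RESET_CAUSE & RESET_BY_COP) {
        cop_reset_count++;
    }

    for (;;) {
        /* ... application tasks ... */
        service_watchdog();
    }
}
```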
Fault Injection Testing
Fault injection testing consists of a systematic series of evaluations in which hardware and/or software elements are deliberately disrupted or disabled to test, and improve, the robustness of the whole system in dealing with abnormalities and exception faults. The goal is to verify that a device is tolerant of potential system abnormalities. The fault injection procedures focus on functional stability during abnormalities, which requires that:
1 The device will not be physically damaged by an abnormal input or output.
2 The program can recognize fault conditions and abnormal I/O and automatically compensate through alternative or gracefully degraded operating modes, remaining stable and ensuring safe system operation to the highest degree possible while issuing fault alerts and logging appropriate diagnostic fault codes.
3 If the abnormality or disruption is removed, the device resumes its normal operating mode.
Before performing this procedure, a mechanization review of the device's internal and external hardware and software is required to organize the device into logical functional subsystems of related inputs and outputs and to identify the types of fault conditions appropriate to each I/O. This data is then used to develop a detailed fault injection test script. When function‐critical parameters come from digital values delivered over a data link, denial or disruption of this data should be included in the fault tolerance evaluation plan.
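One way to capture the output of the mechanization review is as a table-driven test script. The sketch below is a hypothetical illustration, not a standard format: the subsystem names, signal names, fault types, and diagnostic trouble codes are invented placeholders showing how each I/O can be paired with the fault conditions and expected responses identified during the review.

```c
/* Illustrative table-driven fault injection script.  Subsystems, signals,
 * fault types, and expected diagnostic codes are hypothetical examples. */

#include <stdio.h>

typedef enum {
    FAULT_OPEN_CIRCUIT,
    FAULT_SHORT_TO_GROUND,
    FAULT_SHORT_TO_SUPPLY,
    FAULT_OUT_OF_RANGE,
    FAULT_DATA_LINK_LOSS
} fault_type_t;

typedef struct {
    const char  *subsystem;       /* logical functional subsystem under test */
    const char  *signal;          /* specific input or output to disturb     */
    fault_type_t fault;           /* fault condition to inject               */
    const char  *expected_dtc;    /* diagnostic code the device should log   */
    int          expect_degraded; /* 1 = degraded mode expected              */
    int          expect_recovery; /* device resumes normal mode when cleared */
} fault_case_t;

static const fault_case_t script[] = {
    { "Temperature sensing", "NTC input",      FAULT_OPEN_CIRCUIT,    "DTC_T001", 1, 1 },
    { "Temperature sensing", "NTC input",      FAULT_SHORT_TO_GROUND, "DTC_T002", 1, 1 },
    { "Actuator drive",      "PWM output",     FAULT_SHORT_TO_SUPPLY, "DTC_A010", 1, 1 },
    { "Communications",      "CAN status msg", FAULT_DATA_LINK_LOSS,  "DTC_C100", 1, 1 },
};

int main(void)
{
    /* In a real evaluation each entry would drive bench equipment, and the
     * observed behavior would be judged against the three requirements above
     * (no damage, compensation with fault logging, recovery). */
    for (size_t i = 0; i < sizeof script / sizeof script[0]; i++) {
        printf("Case %zu: %s / %s -> expect %s\n",
               i + 1, script[i].subsystem, script[i].signal,
               script[i].expected_dtc);
    }
    return 0;
}
```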
Usage Stress Testing
In usage stress testing, the previously defined usage cases should be applied to the device in increasingly faster sequences to evaluate the capability of the software response timing to keep up with highly dynamic usage conditions. The objective of the SW stress test is to determine the robustness of software by testing it well beyond the limits of normal operation. Stress testing is particularly important for mission‐critical software but is used for all types of software. Software stress tests emphasize robustness, availability, and error handling under a heavier load than the system would experience under normal circumstances. Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it. The primary purpose is to make sure that the software, when overloaded, fails and recovers gracefully – this ability is known as recoverability.
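A minimal sketch of such a harness follows, assuming a bench fixture that can replay usage cases and observe the device under test; the dut_* hooks are stand-in stubs, not a real fixture interface. It simply replays the defined usage cases at progressively shorter intervals and checks both responsiveness and recovery after the overload is removed.

```c
/* Sketch of a usage stress harness: replay defined usage cases at
 * progressively shorter intervals, count late responses, and verify the
 * device under test (DUT) recovers once the stress is removed.
 * The dut_* functions are hypothetical stubs for a bench fixture. */

#include <stdbool.h>
#include <stdio.h>

static void dut_send_usage_case(int case_id) { (void)case_id; } /* stub */
static bool dut_responded_in_time(void)      { return true; }   /* stub */
static bool dut_recovered_after_idle(void)   { return true; }   /* stub */

int main(void)
{
    const int usage_cases[] = { 1, 2, 3, 4 };   /* previously defined cases */
    const int num_cases = sizeof usage_cases / sizeof usage_cases[0];

    /* Halve the inter-case delay each pass so the sequence runs ever faster. */
    for (int delay_ms = 1000; delay_ms >= 1; delay_ms /= 2) {
        int missed = 0;

        for (int i = 0; i < num_cases; i++) {
            dut_send_usage_case(usage_cases[i]);
            /* (platform-specific sleep of delay_ms would go here) */
            if (!dut_responded_in_time())
                missed++;
        }

        printf("delay %4d ms: %d of %d responses late\n",
               delay_ms, missed, num_cases);

        /* Recoverability check: after the overload, the DUT must return to
         * normal operation once the stress is removed. */
        if (missed > 0 && !dut_recovered_after_idle()) {
            printf("FAIL: DUT did not recover after overload\n");
            return 1;
        }
    }
    return 0;
}
```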
Worst‐Case Testing
Worst‐case testing evaluates the software's ability to deal with variation, tolerance stackup, and environmental drift in the device's I/O, hardware, and circuits that fall short of the total failure situations evaluated during fault injection testing. Worst‐case software testing requires a detailed understanding of the design of the device and its software structure. It is the responsibility of the system engineers and software architects to define the specific requirements, test plan scripts, and acceptance criteria for the device. The plan should be reviewed with the product team and the results reported after the testing. The evaluation procedure may reveal that the system requires additional operating state inputs, such as temperature sensor inputs, to identify the extremes of worst‐case operating conditions.
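A common way to structure such a plan is to enumerate the corner combinations of the influencing quantities. The sketch below is only an illustration under assumed values: supply voltage and ambient temperature are hypothetical example parameters, and a real plan would use the device-specific quantities identified by the system engineers.

```c
/* Minimal sketch of worst-case corner enumeration: each influencing quantity
 * (here, hypothetical supply voltage and ambient temperature) is taken at its
 * minimum, nominal, and maximum, and every combination is exercised. */

#include <stdio.h>

int main(void)
{
    const double supply_v[]  = { 4.5, 5.0, 5.5 };      /* assumed min/nom/max */
    const double ambient_c[] = { -40.0, 25.0, 85.0 };  /* assumed min/nom/max */

    int corner = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            /* A real test would set these conditions on the bench, run the
             * usage cases, and compare device outputs against the acceptance
             * criteria defined in the worst-case test plan. */
            printf("Corner %d: %.1f V, %+.1f degC\n",
                   ++corner, supply_v[i], ambient_c[j]);
        }
    }
    return 0;
}
```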
2.4 Reliability Data
Science is an ongoing race between our inventing ways to fool ourselves, and our inventing ways to avoid fooling ourselves.
Regina Nuzzo (2015), “How scientists fool themselves – And how they can stop”
This quote comes from a fascinating article Nature published in 2015 about how scientists deceive themselves and what they can do to prevent it. Nature was responding to a rash of articles decrying bias, poor reproducibility, and inaccuracy in published journal studies. Although the original focus of the articles concerned the fields of psychology and medicine, the topic is directly applicable to the electronics field and especially relevant to professionals performing failure analysis and reliability research. Reliability data has always been extremely sensitive both within and between companies. You'll rarely see reliability data unless the results are overwhelmingly positive or resulted from a catastrophic event. Furthermore, the industry focuses more on how to organize and analyze data and less on the best way to select or generate that data in the first place. Can you truly rely on the reliability data you see and generate?
Relevant bias recognition and prevention lessons should be learned and shared. For example, how many times have you been asked to analyze data only to be told the expected conclusion or desired outcome before you start? The term bias has many definitions, both inside and outside of scientific research. The definition we prefer is that bias is any deviation of results or inferences from the truth (reality) or the processes that lead to the deviation (Porta 2014). An Information Week article sums up the impact of data bias on industry well, stating: “Flawed data analysis leads to faulty conclusions and bad business outcomes” (Morgan 2015). That's something we all want to avoid. Biases and cognitive fallacies include:
Confirmation bias: A wish to prove a certain hypothesis, assumption, or opinion; intentional or unintentional
Selection bias: Selecting non‐random or non‐objective data that doesn't represent the population
Outlier bias: Ignoring or discarding extreme data values
Overfitting and underfitting bias: Creating either overly complex or overly simplistic models for data
Confounding variable bias: Failure to consider other variables that may impact cause and effect relationships
Non‐normality bias: Using statistics that assume a normal distribution for non‐normal data
Another particularly useful definition comes from the US government's Generally Accepted Government Auditing Standards. They use the concept of data reliability, which is defined as “a state that exists when data is sufficiently complete and error‐free to be convincing for its purpose and context.” Data reliability refers to the accuracy and completeness of data for a specific intended use, but it doesn't mean that the data is error‐free. Errors may be found,