Maintaining Mission Critical Systems in a 24/7 Environment. Peter M. Curtis


      Figure 3.1 “Seven steps” is a continuous cycle of evaluation, implementation, preparation, and maintenance

      (Source: Courtesy of PMC Group One, LLC)

% Uptime/Reliability Level    Downtime per Year
99%                           87.6 hours
99.9%                         8.76 hours
99.99%                        52 minutes
99.999%                       5.25 minutes
99.9999%                      32 seconds
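
      The downtime figures in the table follow from simple arithmetic: the unavailable fraction of the year multiplied by the hours in a year. The sketch below is a minimal illustration, assuming a 365‐day year (8,760 hours); the function name is hypothetical and does not come from the book.

```python
# Assumption: a non-leap year of 365 days, i.e. 8,760 hours.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent):
    """Return the yearly downtime, in hours, implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


# Print the downtime for each level of "nines" listed in the table.
for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.4f} hours = {hours * 60:.2f} minutes = {hours * 3600:.1f} seconds")
```

      The results agree with the table to within rounding; the table's entries for 99.99% and above are rounded to convenient values.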

      To design a building with the appropriate level of reliability, a company must first assess the cost of downtime and determine its associated risk tolerance. Because recovery time is now a significant component of downtime, downtime can no longer be equated with simple power availability, measured on a scale from one nine (90%) to six nines (99.9999%). Today, recovery typically takes many times longer than the outage itself, because operations have become much more complex. A shut‐down IT infrastructure backbone must be restored in a specific sequence so that IT equipment comes back online quickly and with limited communication conflicts. Simply turning the equipment back on does not work with today's complex IT systems. Is a 32‐second outage really only 32 seconds? Or is it perhaps 2 hours, or 2 days? The real question is: How long does it take to fully recover from the 32‐second outage and return to normal operational status? Although measuring in nines has its limitations, it remains a useful metric that every facility needs to establish. For a 24/7 facility, the table above lists the yearly downtime that each level of nines allows.

      In new 24/7 facilities, it is imperative not only to design and integrate the most reliable systems, but also to keep them simple. When there is a problem, the facilities manager is under enormous pressure to isolate the faulty system without disrupting any critical electrical loads, and does not have the luxury of time for complex switching procedures during a critical event. An overly complex system can be a quick recipe for failure through human error if the key personnel who understand how the system functions are unavailable. When designing a critical facility, it is important that the building design does not outsmart the facilities manager. Companies can also maximize profits and minimize cost by using the simplest design approach possible, or by integrating automatic, “self‐healing” controls that recover from a failure without operator intervention. One prevalent example is the Static Transfer Switch (STS), discussed in a later chapter. An STS switches critical equipment from one power source to another automatically, within milliseconds.

      (Source: Data from Information Technology Intelligence Consulting).

Industry                    Average Cost per Hour in 2017
Energy                      $22,321,000
Brokerage                   $9,300,000
Media                       $9,000,000
Manufacturing               $8,500,000
Health Care                 $6,900,000
Retail                      $6,600,000
Telecommunications          $4,800,000
Credit Card Operations      $3,100,000
Human Life                  “Priceless”

      * Prepared by a disaster‐planning consultant of Contingency Planning Research
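
      The hourly figures above can be combined with the recovery‐time question raised earlier to estimate what a short outage really costs. The sketch below is an illustration only: it assumes cost accrues at a constant hourly rate for the whole time the facility is not fully operational, and the function name and the 2‐hour recovery time are hypothetical inputs, not data from the book.

```python
def outage_cost(cost_per_hour, outage_seconds, recovery_hours=0.0):
    """Estimate the cost of an outage plus the time needed to recover from it.

    Assumption: losses accrue linearly at cost_per_hour until normal
    operations are restored.
    """
    total_hours = outage_seconds / 3600 + recovery_hours
    return cost_per_hour * total_hours


# Telecommunications figure from the table above: $4,800,000 per hour.
TELECOM_COST_PER_HOUR = 4_800_000

# A 32-second outage counted on its own, versus the same outage
# followed by a hypothetical 2-hour recovery.
outage_only = outage_cost(TELECOM_COST_PER_HOUR, outage_seconds=32)
with_recovery = outage_cost(TELECOM_COST_PER_HOUR, outage_seconds=32, recovery_hours=2)

print(f"Outage only:   ${outage_only:,.0f}")
print(f"With recovery: ${with_recovery:,.0f}")
```

      Under these assumptions the recovery period, not the 32‐second outage itself, accounts for nearly all of the cost.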

      Imagine that you are the manager responsible for a major data center that approves checks and other on‐line electronic transactions for American Express, MasterCard, and Visa. On the biggest shopping day of the year, the day after Thanksgiving, you find out that the data center has lost its utility service. Your first reaction is that the data center has a UPS and a standby generator, so there is no problem, right? However, the standby generator fails to start because of a fuel problem, and the data center will shut down in 15 minutes, the length of time the UPS batteries can supply power at full load. The penalty for not being proactive is loss of revenue, the potential loss of major clients, and, if the problem is large enough, the risk of financial collapse for your business. You, the manager, could have avoided this nightmare scenario by exercising the standby generator for 30 minutes every week: the proverbial ounce of prevention.

      There are about ten times as many UPS systems in use today as there were 10 years ago, and many more companies are still discovering their worth after losing data during a power line disturbance. Do you want electrical outages to be scheduled or unscheduled? Serious facilities engineers use comprehensive preventive maintenance procedures to avoid being caught off guard.

      Mission critical facilities cannot be susceptible to an outage at any time, including during maintenance of their subsystems. Therefore, careful consideration must be given to evaluating and implementing redundancy in the systems design. Redundancy is commonly classified as (N+1) or (N+2), meaning one or two units beyond the N units required to carry the load, and is normally applied to the systems below:

       Utilities service

       Power distribution

       UPS

       Emergency generator

       Fuel system supplying emergency generator

       Mechanical
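
      The difference between (N+1) and (N+2) can be illustrated with a simple capacity check. The sketch below is a minimal example, assuming identical units where N of them are needed to carry the critical load; the function name and the unit counts are hypothetical.

```python
def load_is_carried(required_units, spare_units, units_out_of_service):
    """Return True if the units still in service can carry the critical load.

    required_units       -- N, the number of units needed for the load
    spare_units          -- 1 for an (N+1) design, 2 for an (N+2) design
    units_out_of_service -- units down for maintenance or because of a failure
    """
    installed = required_units + spare_units
    available = installed - units_out_of_service
    return available >= required_units


# Hypothetical system that needs N = 3 units to carry the load.
N = 3

# (N+1): one unit out for maintenance is tolerated ...
print(load_is_carried(N, spare_units=1, units_out_of_service=1))
# ... but a failure during that maintenance window drops the load.
print(load_is_carried(N, spare_units=1, units_out_of_service=2))
# (N+2): the load survives maintenance on one unit plus one failure.
print(load_is_carried(N, spare_units=2, units_out_of_service=2))
```

      This is why a facility that must remain online during maintenance may need more than a single spare unit.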
