Maintaining Mission Critical Systems in a 24/7 Environment. Peter M. Curtis
Power Utilities
1 Do you have a working and ongoing relationship with your electric power utility?
2 Do you know who in your organization currently has a relationship with your electric power utility (e.g., facilities management or accounts payable)?
3 Do you understand your electric power utility’s “Electric Service Priority” (ESP) protocols?
4 Do you understand your electric power utility’s restoration plan?
5 Are you involved with your electric power utility’s crisis management/disaster recovery tests?
6 Have you identified regulatory guidelines or business continuity requirements that necessitate planning with your electric power utility?
7 What is the relationship between the regional source power grid and the local distribution systems?
8 What are the redundancies and the related recovery capacity for both the source grid and local distribution networks?
9 What is the process of restoration for source grid outages?
10 What is the process of restoration for local network distribution outages?
11 How many network areas are there in your city?
12 What are the inter‐relationships between each network segment and the source feeds?
13 Does your infrastructure meet basic standard contingency requirements for route grid design?
14 What are the recovery time objectives for restoring impacted operations in any given area?
15 What are recovery time objectives for restoring impacted operations in any given network?
16 What are the restoration priorities to customers – both business and residential?
17 What are the criteria for rating in terms of service restoration?
18 Where does your industry rank in the priority restoration scheme?
19 How do you currently inform clients of a service interruption and the estimated time for restoration?
20 What are the types of service disruptions, planned or unplanned, that your location could possibly experience?
21 Could you provide a list of outages, type of outage and length of disruption that have affected your location during the last 12 months?
22 What are the Reliability Indices and who uses them?
23 During an outage, would you be willing to pass along information regarding the scope of interruptions to a central industry source, e.g., an industry business continuity command center?
24 Are the local and regional power utilities cooperating in terms of providing emergency service? If so, in what way? If not, what are the concerns surrounding the lack of cooperation?
25 Would you be willing to provide schematics to select individuals and/or organizations on a non‐disclosure basis?
26 Could you share your lessons learned from the events of 9/11 and the Northeast outage of 8/14/03?
27 Are you familiar with the “Critical Infrastructure Assurance Guidelines for Municipal Governments” document written by the Washington Military Department Emergency Management Division? If so, would you describe where (specify city) stands in regard to the guidelines set forth in that document?
28 Independent of the utility’s capability to restore power to its customers, can you summarize your internal business continuity plans, including preparedness for natural and manmade disasters (including but not limited to weather‐related events, pandemics, and terrorism)?
3 Mission Critical Engineering with an Overview of Green Technologies
“As a leader … your principal job is to create an operating environment where others can do great things.”
Richard Teerlink
3.1 Introduction
Businesses motivated to plug into the Information Age require reliability and flexibility, whether they are large Fortune 500 corporations or small companies serving global customers. This is the reality of conducting business today. Whatever their line of business, many organizations have realized that a 24/7 operation is imperative. An hour of downtime can wreak havoc on project schedules or cause the loss of critical information, resulting in hours lost re‐keying electronic data, not to mention the potential loss of millions of dollars.
Twenty‐five years ago, the facilities manager (FM) was responsible for the integrity of the building. As long as the electrical equipment worked 95% of the time, the FM was doing a good job. When downtime occurred, the cause was usually a computer fault. As hardware and software technology improved, information technology (IT) departments began to design their systems with redundancy, including dual‐corded equipment (either an A or a B power source can fully carry the IT equipment load). As a result of IT’s efforts, computer systems have become so reliable that they are down only during scheduled upgrades.
Today the major reasons for downtime are human error or utility failures: poor power quality, power distribution failures, incorrect switching of equipment or accidental emergency power off (EPO) initiation, and environmental system failures (although that percentage remains small). When a problem does occur, the facilities manager is usually the one in the hot seat. Problems are not limited to power quality; staff may also not have been properly trained to handle certain situations. Further complicating matters, recruiting qualified inside staff and outside consultants can be difficult, as facilities management, protection equipment manufacturers, and consulting firms are all competing for the same talent pool to support the mission critical industry. The sharp increase in data center construction around the world has only exacerbated the situation.
Minimizing unplanned downtime reduces risk, but unfortunately the most common approach is reactive: spending time and resources to repair a faulty piece of equipment after it has failed. Strategic planning can identify internal risks and provide a prioritized plan for reliability improvements. Only when both sides fully understand the potential risk of outages, including recovery time, can they fund and implement an effective plan. Because the costs associated with reliability enhancement are significant, sound decisions can only be made by quantifying the performance benefits and weighing the options against their respective risks.
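One way to make this weighing concrete is to put each option on a common annual‐cost basis. The following is a minimal sketch, not a method taken from this book: it assumes each option can be summarized by a hypothetical annualized cost and a hypothetical expected availability, and that downtime carries a single hourly cost. The names `Option`, `expected_downtime_hours`, `total_annual_cost`, and `rank_options` are illustrative only, and every figure supplied to them would be the reader's own estimate.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760  # 365 days x 24 hours


@dataclass
class Option:
    """A reliability option under consideration (all values are assumptions)."""
    name: str
    annual_cost: float    # hypothetical annualized capital plus maintenance cost
    availability: float   # hypothetical expected availability, between 0 and 1


def expected_downtime_hours(availability: float) -> float:
    """Convert an availability fraction into expected downtime hours per year."""
    return (1.0 - availability) * HOURS_PER_YEAR


def total_annual_cost(option: Option, cost_per_downtime_hour: float) -> float:
    """Annual cost of the option plus the expected annual cost of its downtime."""
    downtime_cost = expected_downtime_hours(option.availability) * cost_per_downtime_hour
    return option.annual_cost + downtime_cost


def rank_options(options: list[Option], cost_per_downtime_hour: float) -> list[Option]:
    """Return the options ordered from lowest to highest total annual cost."""
    return sorted(options, key=lambda o: total_annual_cost(o, cost_per_downtime_hour))
```

For example, `expected_downtime_hours(0.95)` works out to roughly 438 hours per year, which shows how much downtime a 95% target actually tolerates. A real assessment would also weigh recovery time and consequences that a single hourly cost does not capture, such as danger to life safety.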
Planning and careful implementation will minimize disruptions while making the business case to fund capital improvements and maintenance strategies. When the business case for additional redundancies, consultants, and ongoing training reaches the boardroom, the entire organization can be galvanized to prevent catastrophic data losses, damage to capital equipment, and even threats to life safety.