Overcoming the Causes of Data Center Outages

Executive Summary

A data center’s main function is to provide constant uptime for the mission-critical applications it houses. Unplanned outages still happen, however, and data center operators must be proactive in preventing them. According to the Ponemon Institute’s 2016 survey, Cost of Data Center Outages, UPS system failure remains the top cause of unplanned data center outages. Understanding the causes of outages and addressing them is crucial to preventing business disruption that can lead to customer churn and a damaged reputation.

What’s Causing Data Center Outages?

According to the Uptime Institute’s Data Center Industry Survey of 1,000 global data center operators and IT practitioners, between 25% and 46% of the respondents experienced a business-impacting outage in 2014.

NUMBER OF BUSINESS-IMPACTING OUTAGES AT YOUR DATA CENTER IN THE PAST 12 MONTHS, BY INDUSTRY

Uptime Institute: 2014 Data Center Industry Survey
https://journal.uptimeinstitute.com/2014-data-center-industry-survey/

A study commissioned by Emerson detailed the causes of data center outages and cited battery failure, cybercrime, and human error as the top three reasons. UPS system failure, which accounts for one-quarter of all root causes, remains the top cause of unplanned data center outages.

Sponsored by Emerson Network Power 
Independently conducted by Ponemon Institute LLC 
http://www.emersonnetworkpower.com/en-US/Resources/Market/Data-Center/Latest-Thinking/Ponemon/Pages/2016-Cost-of-DataCenter-Outages-Report.aspx

The Uptime Institute, citing the Emerson work, adjusted the results to total 100%, showing that battery failure and human error are together responsible for more than 50% of all outages.

Primary root causes of reported unplanned outages.

These unplanned outages have a significant impact on mission-critical applications and cost businesses millions of dollars each year.

Uptime Institute: Data Center Outages, Incidents, and Industry Transparency
https://journal.uptimeinstitute.com/data-center-outages-incidents-industry-transparency/

How Much Does an Unplanned Outage Cost?

According to the Electric Power Research Institute (EPRI), 98% of power outages last less than 10 seconds. But imagine how much it could cost your business if your mission-critical applications failed for a mere 10 seconds.

Ponemon Institute’s 2016 survey showed that the overall average cost of a data center outage is $740,357, a 38% increase since 2010.

Sponsored by Emerson Network Power 
Independently conducted by Ponemon Institute LLC 
http://www.emersonnetworkpower.com/en-US/Resources/Market/Data-Center/Latest-Thinking/Ponemon/Pages/2016-Cost-of-DataCenter-Outages-Report.aspx

An unplanned outage has serious business consequences in terms of lost revenue, lost customers, and lost brand loyalty.

“The complexity of today’s data centers continues to create challenges for organizations as they architect and manage their IT infrastructure to reduce costly interruptions.”

Steve Hassell, President, Data Center Solutions, Emerson Network Power, 2016

What Can You Do to Prevent Data Center Outages?

The root causes of data center outages can be prevented through these proactive strategies:

UPS system failure

Emerson’s white paper on battery management recommends regularly monitoring the ambient temperature and cell voltages of UPS batteries to keep track of their status. Perform capacity testing on a regular schedule, following battery maintenance best practices.
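The monitoring recommendation above can be illustrated with a minimal Python sketch. It assumes readings are already being collected from the battery strings. The threshold values, the `BatteryReading` type, and the `check_reading` function are hypothetical names and numbers chosen for illustration; they are not taken from the Emerson white paper, and real limits should come from the battery manufacturer.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits for illustration only; use the manufacturer's values.
MAX_AMBIENT_C = 25.0   # assumed upper limit for ambient temperature
MIN_CELL_V = 2.20      # assumed lower limit for cell voltage
MAX_CELL_V = 2.30      # assumed upper limit for cell voltage


@dataclass
class BatteryReading:
    """One monitoring sample for a single battery string."""
    string_id: str
    ambient_c: float
    cell_voltages: List[float]


def check_reading(reading: BatteryReading) -> List[str]:
    """Return a list of alert messages for values outside the assumed limits."""
    alerts = []
    if reading.ambient_c > MAX_AMBIENT_C:
        alerts.append(
            f"{reading.string_id}: ambient {reading.ambient_c:.1f} C "
            f"exceeds {MAX_AMBIENT_C:.1f} C"
        )
    for index, volts in enumerate(reading.cell_voltages, start=1):
        if not MIN_CELL_V <= volts <= MAX_CELL_V:
            alerts.append(
                f"{reading.string_id}: cell {index} at {volts:.2f} V is outside "
                f"{MIN_CELL_V:.2f}-{MAX_CELL_V:.2f} V"
            )
    return alerts


if __name__ == "__main__":
    sample = BatteryReading("string-A", 27.5, [2.25, 2.10, 2.26])
    for message in check_reading(sample):
        print(message)
```

Running a check like this on every sample, rather than only during scheduled maintenance, is one way to notice a drifting cell before it affects the whole string.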

Cybercrime (DDoS)

Security must be addressed at every level, especially since cybercrime has become the second leading cause of unplanned outages. To increase resistance against attacks, data center operators should perform regular system audits and keep their compliance certifications up to date. They can also use DDoS security solutions to help defend against sophisticated attacks. Automating security management to simplify patching and provide early detection of attacks can further help prevent unplanned outages due to cybercrime.

Accidental/human error

Accidental/human error still accounts for 22% of unplanned outages, unchanged from Ponemon’s previous survey in 2013. In other words, there has been no significant progress in mitigating what should be an avoidable cause of unplanned outages.

Conducting regular and comprehensive training for data center staff should be a top priority. You can also document methods of procedure (MOPs) for performing complex actions to minimize errors and ensure desired outcomes. Operators can reduce downtime by making sure that only experienced professionals are monitoring, maintaining, and managing the power and infrastructure 24/7.

Water, heat or CRAC failure

Using N+1 redundancy, establishing good load management, and performing regular preventive maintenance can help mitigate outages due to water, heat, or computer room air conditioner (CRAC) failures. Optimizing the airflow in your data center by adopting a cold-aisle strategy will also help.

Weather related

Natural disasters are inevitable, but taking precautionary measures ahead of time can minimize the impact of an outage. Ensure that your disaster recovery plan and backup diesel generators are tested regularly.

Generator failure

Though generator failures account for just 6% of outages, it is still important to test generators and switchgear regularly. Using N+1 redundancy and performing preventive maintenance should be a priority.

IT equipment failure

Perform daily physical inspections to ensure that all systems are in excellent working condition. Utilizing N+1 redundancy in your data center can also mitigate infrastructure failures.
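Several of the strategies above recommend N+1 redundancy, meaning one more unit than the load strictly requires. The following Python sketch shows that sizing rule and a check that the load is still covered after a single unit fails. It assumes identical units; the function names and the 500 kW and 1,200 kW figures are hypothetical and used only for illustration.

```python
import math


def units_needed_n_plus_1(unit_capacity_kw: float, load_kw: float) -> int:
    """Units required to carry the load (N), plus one spare."""
    if unit_capacity_kw <= 0 or load_kw < 0:
        raise ValueError("capacity must be positive and load non-negative")
    n = math.ceil(load_kw / unit_capacity_kw)
    return n + 1


def survives_single_failure(unit_capacity_kw: float, unit_count: int,
                            load_kw: float) -> bool:
    """True if the remaining units still cover the load after one unit fails."""
    if unit_count < 1:
        raise ValueError("unit_count must be at least 1")
    remaining_capacity_kw = unit_capacity_kw * (unit_count - 1)
    return remaining_capacity_kw >= load_kw


if __name__ == "__main__":
    # Hypothetical example: 500 kW units serving a 1,200 kW load.
    print(units_needed_n_plus_1(500, 1200))       # 4
    print(survives_single_failure(500, 4, 1200))  # True
    print(survives_single_failure(500, 3, 1200))  # False
```

The same check applies to UPS modules, cooling units, or generators, as long as capacity and load are expressed in the same unit.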

Businesses with mission-critical applications and sensitive enterprise data should consider colocating with a fully redundant and compliant data center that has an excellent uptime track record. Today’s leading colocation facilities are designed with resilient critical systems, redundant battery backup and cooling systems, and experienced professional data center managers.

365 Data Centers offers colocation services that have provided 100% uptime for more than ten years.

How Has 365 Data Centers Avoided Outages?

It’s not the design. It’s the attention to detail.

With a long, reliable operating history, 365 Data Centers has maintained 100% uptime through 100 power outages, 56 lightning storms, 5 hurricanes, 563 hailstorms, and 389 floods, among other incidents. Our experienced site managers and technicians pride themselves on maintaining uptime at the sites they manage. We perform a rigorous daily inspection of all power distribution and cooling components. We regularly test our equipment and proactively prepare for major weather events. We’ve been able to avoid customer-impacting outages during 9/11, Hurricane Sandy, Hurricane Katrina, and the Northeast Blackout of 2003.

Why Colocate with 365 Data Centers?

Using a combination of sophisticated systems, state-of-the-art infrastructure, processes, and experienced technicians, 365 Data Centers has provided 100% uptime for more than ten years.

Our facilities are highly redundant with N+1 uninterruptible power systems (UPS), automatic transfer switches, and on-site backup generators. All 10 of our colocation facilities and processes are compliant with HIPAA, PCI, SSAE 16, and ISAE 3402.

Conclusion

Expensive and complex designs do not guarantee higher availability. In fact, increased complexity can lead to more downtime if the infrastructure is not professionally and rigorously managed, monitored and maintained.

A disciplined schedule of daily inspection and proactive maintenance is critical to maintaining uptime. With the help of an experienced technical staff and detailed methods and procedures, you can help ensure that your business operations avoid disruption.

References

http://www.emersonnetworkpower.com/en-US/About/NewsRoom/NewsReleases/Pages/Emerson-Network-Power-Study-Says-Unplanned-Data-Center-Outages-CostCompanies-Nearly-9000-Per-Minute-.aspx
http://www.emersonnetworkpower.com/documentation/en-US/Resources/Market/Data-Center/Library/White-Papers/Documents/ProactiveBattMon.pdf
http://www.emersonnetworkpower.com/en-US/Resources/Market/Data-Center/Library/infographics/Pages/Cost-of-Downtime.aspx
http://www.emersonnetworkpower.com/en-US/Resources/Market/Data-Center/Latest-Thinking/Ponemon/Pages/2016-Cost-of-Data-Center-Outages-Report.aspx
https://journal.uptimeinstitute.com/2014-data-center-industry-survey/
https://journal.uptimeinstitute.com/data-center-outages-incidents-industry-transparency/
https://www.a10networks.com/sites/default/files/A10-SB-19140-EN.pdf
https://journal.uptimeinstitute.com/the-making-of-a-good-method-of-procedure/
