Real Resiliency is Network Aware Disaster Recovery

When it comes to disaster recovery, there is an awful truth most vendors don’t like to discuss. When businesses really need their disaster recovery solutions, they often don’t actually work or, at least, don’t work according to plan. While many factors are at play, in my experience, network issues are most frequently at the core of the problems. And it is not difficult to understand why.

We often imagine that there are just two major systems engaged in disaster protection: the data center that needs protection and the data center that provides recovery. What we forget is that a third system is equally important: the network that connects the two data centers. It is only courtesy of that network that the recovery data center can keep its recovery systems up to date, and it is only courtesy of that network that you can access them. At the same time, that network can stand between you and the disaster protection for which you have paid. Unless it is fully integrated into your disaster recovery planning, you may be at risk.

How the network can be a problem:

After years of troubleshooting disaster recovery solutions, I have come to understand that network issues are often at the center of the challenges customers face. Here are seven ways in which the network can disrupt even the best-laid recovery plans.

  1. Network failure: Failure of the network is among the most common causes of “data center outage”.[1] Not only does the network fail all too often, but the “last mile” data link is also frequently a single point of failure. Unless your user base can access them, the recovery systems for which you pay so dearly won’t provide any value.
  2. Network mismatched to your needs: Not all networks are created equal. Your particular applications may be sensitive to latency, bandwidth, or packet loss. It is vital that you select the right networking technology (e.g., VPN, MPLS, or EVPLS) for your unique needs.
  3. Failure of the network to adapt: During different phases of disaster protection you have different priorities. During the initial “seeding” of your recovery data center, you need as much bandwidth as possible. When normal replication is proceeding, you need to ensure that more replication bandwidth is reserved for your critical applications while not interfering with production Internet traffic. When you have failed over to your recovery data center, you need to ensure that latency and bandwidth are consistent with the needs of your user community. Because your needs change, your network should as well.
  4. Poor bandwidth management: Because WAN bandwidth is almost always a small fraction of what you have within your production data center, even small inefficiencies can have a big impact. Well managed replication that leverages caching, compression, traffic shaping, and de-duplication is vital to ensuring that you have the disaster protection you need.
  5. Stale DNS look-ups: Network providers typically cache DNS lookups for many hours. Under normal circumstances this behavior is helpful, but when you have failed over to a recovery data center, it can present a major problem. DNS caching can mean that clients won’t get redirected to your recovery systems until as much as 24 hours after your recovery systems launch.
  6. Need for network awareness during tests: In a real failover scenario, you want to redirect client traffic to the recovery data center. In a test of failover, you want the exact opposite behavior. Unless you are able to isolate your recovery data center when you need to, testing your DR solution may be a big problem. Even if you can isolate your recovery data center, unless your recovery DNS is network aware, you can inadvertently redirect client traffic to the recovery data center when you reconnect it after a test.
  7. Network fee structure: Last but certainly not least, it is important that you purchase network service from a provider with a fee structure specifically designed for DR workloads. WAN bandwidth is expensive and, therefore, always a precious resource.  DR workloads, under normal circumstances, have high traffic into the recovery data center and very little out from it. Your network fees should be designed with this in mind.
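
To put rough numbers on items 3 through 5, here is a minimal Python sketch of the arithmetic involved. It is only a back-of-the-envelope estimate: the function names, the 10 TB data set, the link speeds, the reduction ratio, and the TTL values are all illustrative assumptions, not figures from this article or from any particular provider.

```python
def transfer_hours(data_gb: float, link_mbps: float, reduction_ratio: float = 1.0) -> float:
    """Estimate the hours needed to push data_gb across a WAN link.

    reduction_ratio is an assumed combined effect of compression and
    de-duplication (2.0 means only half the bytes cross the link).
    Protocol overhead and competing traffic are ignored.
    """
    if data_gb < 0 or link_mbps <= 0 or reduction_ratio <= 0:
        raise ValueError("data_gb must be >= 0; link_mbps and reduction_ratio must be > 0")
    megabits = data_gb * 8 * 1000          # decimal GB -> megabits
    seconds = megabits / reduction_ratio / link_mbps
    return seconds / 3600


def worst_case_redirect_hours(ttl_seconds: int) -> float:
    """Upper bound on how long a resolver that honors the TTL may keep
    handing out the old (production) address after a failover."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be >= 0")
    return ttl_seconds / 3600


if __name__ == "__main__":
    # Hypothetical seeding job: 10 TB over a 100 Mbps link.
    print(f"Seeding, no optimization:  {transfer_hours(10_000, 100):.1f} h")       # 222.2 h
    print(f"Seeding, 2x data reduction: {transfer_hours(10_000, 100, 2.0):.1f} h")  # 111.1 h

    # Hypothetical DNS records: a 24-hour TTL versus a 5-minute TTL.
    print(f"Redirect delay, TTL 86400 s: {worst_case_redirect_hours(86_400):.1f} h")  # 24.0 h
    print(f"Redirect delay, TTL 300 s:   {worst_case_redirect_hours(300) * 60:.0f} min")  # 5 min
```

Under these assumptions, seeding alone takes more than nine days on the unoptimized link, which is why bandwidth that adapts to the phase of protection and well-managed replication matter. The DNS figures show the same point from the other direction: the TTL on your records sets the ceiling on how long well-behaved resolvers keep sending clients to the failed site.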

Because CenturyLink is one of the world’s largest Internet Service Providers, it can directly incorporate network awareness into its disaster and resiliency services portfolio. CenturyLink offers a full suite of networking technologies (VPN, MPLS, and EVPLS), so you can pick the data link and routing technology best suited to your unique needs. Further, CenturyLink offers adaptive traffic shaping, compression, and additional forms of bandwidth optimization within its disaster protection and resiliency products.

With dozens of cloud and managed hosting data centers around the world, CenturyLink can help you pick a recovery data center that will have the right transport latency to your user community. Most importantly, CenturyLink has designed its network fee structure specifically to help reduce the costs for you to maintain your disaster protection environment.  CenturyLink meters only traffic out of its recovery data centers, so you pay nothing for all the normal replication traffic that you drive into them. To learn more about CenturyLink network-aware disaster protection and resiliency services, check out our website.

  1. Forrester, The State of Business Technology Resiliency, Q2 2014.
