
Disaster recovery and business continuity

In this e-guide:

Investing in technologies and processes that can safeguard an enterprise and its operations in the face of downtime should be a must for any business, as end-users can be remarkably unforgiving when unable to access the services they need during work and at play.

Not only can a solid business continuity strategy protect organisations from reputational damage and lost trade, but for those operating in regulated industries it can also prevent firms being hit with downtime-related enforcement action.

But even the most diligently prepared disaster recovery plan should be subject to review from time to time to ensure it delivers the expected results.

In this guide, we take a look at the steps enterprises can and should take to ensure, should their infrastructure fail, they can continue to trade and operate, and why it pays to regularly test the robustness of their disaster recovery processes.

Caroline Donnelly, Datacentre Editor

Contents:
- Disaster recovery: Risk assessment and business impact analysis
- Disaster recovery training and staffing strategies
- Coming up with a new configuration and change management plan
- Disaster recovery awareness and testing require training, strategic plans
- Evaluating BC/DR program performance
- Case study: Cloud collaboration boosts Cumbria County Council's disaster response abilities

Disaster recovery: Risk assessment and business impact analysis

Paul Kirvan, Guest Contributor

Disaster recovery risk assessment and business impact analysis (BIA) are crucial steps in the development of a disaster recovery plan. But, before we look at them in detail, we need to locate disaster recovery risk assessment and business impact assessment in the overall planning process.

To do that, let us remind ourselves of the overall goals of disaster recovery planning, which are to provide strategies and procedures that can help return IT operations to an acceptable level of performance as quickly as possible following a disruptive event. The speed at which IT assets can be returned to normal or near-normal performance will impact how quickly the organisation can return to business as usual or an acceptable interim state of operations.

Having established our mission, and assuming we have management approval and funding for a disaster recovery initiative, we can establish a project plan.

A disaster recovery project has a fairly consistent structure, which makes it easy to organise and conduct plan development activity.

Figure: The IT Disaster Recovery Lifecycle. Adapted with permission from the BCM Lifecycle developed by the Business Continuity Institute.

As you can see from The IT Disaster Recovery Lifecycle illustration, the IT disaster recovery process has a standard process flow. In this, the BIA is typically conducted before risk assessment. The BIA identifies the most important business functions and the IT systems and assets that support them. Next, the risk assessment examines the internal and external threats and vulnerabilities that could negatively impact IT assets.

Following the BIA and risk assessment, the next steps are to define, build and test detailed disaster recovery plans that can be invoked in cases [...] critical IT assets. Such plans provide a step-by-step process for responding to a disruptive event with steps designed to provide an easy-to-use and repeatable process for recovering damaged IT assets to normal operation as quickly as possible.

Detailed response planning and the other key parts of disaster recovery planning, such as plan maintenance, are, however, outside the scope of this article so let us get back to looking at disaster recovery risk assessment and business impact assessment in detail.

Disaster recovery risk assessment

In the IT disaster recovery world, we typically focus on one or more of the following four risk scenarios, the loss of which would have a negative impact [...]

- Loss of access to premises
- Loss of data
- Loss of IT function
- Loss of skills

Risk assessments focus on the risks that can lead to these outcomes.

Peter Barnes, FBCI, managing director of London-based 2C Consulting said, [...] the impact on the business if delivery of critical applications and services were to be denied as a result of a fire or server failure, for example, and to assess the risks [...]

"A key aspect is to know what services run on which parts of the infrastructure," said Andrew Hiles, FBCI, managing director of Oxfordshire-based Kingswell International [...] company had grown by acquisi[...]

One easy way to create a risk assessment is illustrated by this table.

Working with IT managers and members of your building facilities staff as well as risk management staff if you have them, you can identify the events that could potentially impact data centre operations.

Based on experience and available statistics, you can estimate the likelihood of specific events occurring on a scale of 0 to 1 (0.0 will never occur, and 1.0 will always occur). You can do the same with the impact of the event, using a 0 to 1 range (0.0 no impact at all, and 1.0 total loss of operations). The final column lists the product of likelihood x impact, and this becomes your risk factor. Those events with the highest risk factor are the ones your disaster recovery plan should primarily aim to address.

Another way to capture and display risk information is with a risk matrix. Entries in each part of the above table can be plotted on a four-quadrant matrix, as shown here.

A risk matrix, adapted with permission from "Principles and Practice of Business Continuity: Tools and Techniques," by Jim Burtles, copyright 2007 by Rothstein Associates; ISBN 1-931332-39-8
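The likelihood x impact calculation and the four-quadrant placement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event names and scores are hypothetical, the 0.5 cut-off between "low" and "high" is an arbitrary choice for the example, and the names RiskEvent, rank_by_risk and quadrant are invented here rather than taken from the article or any standard.

```python
from dataclasses import dataclass


@dataclass
class RiskEvent:
    name: str
    likelihood: float  # 0.0 (will never occur) to 1.0 (will always occur)
    impact: float      # 0.0 (no impact at all) to 1.0 (total loss of operations)

    def __post_init__(self):
        # Reject scores outside the 0-to-1 scale used in the risk table.
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{label} must be between 0 and 1, got {value}")

    @property
    def risk_factor(self) -> float:
        # Risk factor is the product of likelihood and impact.
        return self.likelihood * self.impact

    def quadrant(self, threshold: float = 0.5) -> str:
        # Assumption: scores at or above the threshold count as "high".
        likelihood = "high" if self.likelihood >= threshold else "low"
        impact = "high" if self.impact >= threshold else "low"
        return f"{likelihood}-likelihood/{impact}-impact"


def rank_by_risk(events):
    """Return events ordered from highest to lowest risk factor."""
    return sorted(events, key=lambda e: e.risk_factor, reverse=True)


# Hypothetical scores, for illustration only.
events = [
    RiskEvent("Server hardware failure", 0.6, 0.7),
    RiskEvent("Flood in data centre", 0.1, 0.9),
    RiskEvent("Brief power dip", 0.8, 0.2),
]
ranked = rank_by_risk(events)
for event in ranked:
    print(f"{event.name}: risk factor {event.risk_factor:.2f} ({event.quadrant()})")
```

Sorting by risk factor gives the order in which the plan should address events, while the quadrant label shows where each event would sit on the matrix. Note that the two views can disagree: a rare but severe event may rank low on risk factor yet still fall in a quadrant that deserves planning.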

In terms of how we treat these risks, we can use the following categorisation:

- Prevent: High-probability/high-impact events (actively work to mitigate these)
- Accept: Low-probability/low-impact events (maintain vigilance)
- Contain: High-probability/low-impact events (minimize likelihood of occurrence)
- Plan: Low-probability/high-impact events (plan steps to take if this occurs)

Types of risks to consider

In the previous section we described a basic disaster recovery risk assessment. But, there are many types of risk, so what are some of the key ones that should be addressed from a UK IT perspective?

Supply chain disruptions present a key risk, said Susan Young, MBCI, a risk management professional with a London [...] an IT standpoint, reliance on outsourced providers not only presents a pure IT risk but also a supply chain risk. For example, in the Lloyd's insurance market in London, all businesses depend on a firm called Xchanging to provide premiums and claims processing. This is a huge dependency with [...]

Hardware failure is another key danger to UK organisations. Kingswell [...] report on UK email downtime showed hardware failure (server and SAN), connectivity loss and database corruption (in that order) as the main causes of downtime. A 2010 SunGard report said the most common cause of UK invocations was hardware, followed by power and [...]

Water damage is a key risk to organisations in the UK, and sometimes the [...] area may be [...] when taps are left running in the toilets two floors above when everyone [...]

The BIA

A BIA attempts to relate specific risks to their potential impact on things such as business operations, financial performance, reputation, employees and supply chains. The table below depicts the relationship between specific risks and business factors.

Risks can affect the entire company or just small parts of it. Operational and financial losses may be significant, and the impact of these events could [...]

BIAs are built on a series of questions that should be posed to key members of each operating unit in the company, including IT. Questions should address the following issues, as a minimum:

- Understanding how each business unit operates
- Identification of critical business unit processes that depend on IT
- Financial value of critical business processes (for example, revenues generated per hour)
- Dependencies on internal organisations
- Dependencies on external organisations
- Data requirements
- Minimum time needed to recover data to its previous state of use
- System requirements
- Minimum time needed to return to normal or near-normal operations following an incident
- Minimum number of staff needed to conduct business
- Minimum technology needed to conduct business

BIA outputs should present a clear picture of the actual impacts on the business, both in terms of potential problems and probable costs. The results of the BIA should help determine which areas require which levels of protection, the extent to which the business can tolerate disruptions and the minimum IT service levels needed by the business.

[...] to define the [...] the tolerances to an outage for critical applications or infrastructure and reduce the risk of service loss, such that you can provide service to the business in an acceptable timeframe.
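Two of the BIA questions above, the financial value of a process per hour and the minimum time needed to return to near-normal operations, can be combined into a rough first estimate of what an outage might cost. The Python sketch below is a simplification and not a method taken from the article: it assumes lost revenue scales linearly with outage time and ignores reputational, contractual and regulatory costs. The process names and figures are hypothetical.

```python
def estimated_outage_loss(revenue_per_hour: float, hours_to_recover: float) -> float:
    """Rough revenue at risk: hourly value of the process times hours of outage."""
    if revenue_per_hour < 0 or hours_to_recover < 0:
        raise ValueError("revenue_per_hour and hours_to_recover must be non-negative")
    return revenue_per_hour * hours_to_recover


# Hypothetical BIA answers: process -> (revenue per hour, hours to recover).
bia_answers = {
    "Online ordering": (12000.0, 4.0),
    "Payroll": (500.0, 24.0),
}

losses = {
    process: estimated_outage_loss(revenue, hours)
    for process, (revenue, hours) in bia_answers.items()
}

# List the processes with the largest estimated loss first.
for process, loss in sorted(losses.items(), key=lambda item: item[1], reverse=True):
    print(f"{process}: estimated loss {loss:,.0f}")
```

An estimate like this is only a starting point for deciding which areas need which levels of protection; the other BIA answers, such as dependencies and staffing minimums, still need to be weighed alongside it.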

Disaster recovery training and staffing strategies

Paul Kirvan, Guest Contributor

What are some steps companies can take to mitigate downtime resulting from a lack of trained IT staff in the aftermath of a disaster? Obviously, one answer is "Train additional IT staff members to perform IT tasks," but how realistic is that? And what if those staffers are unable to respond following a disaster as well?

Business continuity plans and disaster recovery training plans should examine the staffing issue initially as part of the business impact analysis (BIA) and risk assessment (RA) phases. These initiatives should identify staffing issues that need to be addressed. From a budget perspective, adding staff may not be an option. If that's the case, cross-training of existing IT staff is highly recommended, as is rotating the alternate staff in and out of production assignments, if possible, to ensure their skills are current.

If your organization has only one data center and your budget cannot underwrite a second data center, consider one of the many hosted data center options currently available. These can be found under such headings as Software as a Service (SaaS), Infrastructure as a Service (IaaS) or Data Center as a Service (DCaaS). You can subscribe to as many (or as few) resources as your budget can handle. You'll also be contracting with trained IT professionals, who should be able (with advance training, knowledge and suitable documentation) to step in and support your production systems if your existing staff is unavailable.

If your recovery time objectives (RTOs) are aggressive, it may be necessary to arrange for data backup and recovery services, in addition to other managed IT services, to ensure that interruptions to your production systems will be minimal. Of course, if your organization has more than one data center, and if the data centers are sufficiently distant from each other (e.g., at least 20-30 miles), you could replicate data from one data center to the other and mitigate the impact of a staffing loss by spreading your IT staff across sites and ensuring there is plenty of cross-training of all employees.

Coming up with a new configuration and change management plan

Alex Barrett, Guest Contributor

In the context of information technology, change management -- and its kissing cousin, configuration management -- are usually thought of as subsets of IT service management, or ITSM. They require configuration data about an organization's IT infrastructure and the services running on it.

They say the only constant is change, and nowhere is that more true than in the data center. Despite all our practice dealing with change, doing so gracefully and efficiently is still one of the most challenging aspects of IT operations.

Change management helps IT operations professionals follow established procedures for making changes to an environment -- or discover the changes that cause a service to go awry, said Rob England, an IT consultant and blogger known as The IT Skeptic based in Wellington, New Zealand.

According to England, these tools and processes can help IT departments answer two central questions: "How fast and how accurately can you assess the impact [of a change] to your organization?" and "Does the cost of downtime exceed the cost of adding more processes and tools?"

Indeed, no one does change management for the hell of it. IT organizations follow established practices and procedures in the hopes of minimizing outages and maximizing service levels (the metric by which many of them are judged). But while we all want more uptime and the better outcomes that change management promises, the number of organizations that have effective processes in place is small.

The CMDB letdown

Part of the change management problem is of the industry's own making. Not so long ago, IT management vendors and practitioners got it in their heads that the first step toward change and configuration management was to implement an IT Infrastructure Library (ITIL)-inspired configuration management database (CMDB).

At its core, a CMDB is simply a database that stores so-called configuration items (CIs). CIs describe and track individual assets, how they are configured, and their relationships to one another. That data is often used in support of other IT management tools such as a service desk and incident management.

This sounds straightforward enough, but depending on whom you ask, adoption of CMDBs has been somewhere between modest and downright disappointing. While CMDBs are commonplace in the Fortune 1,000, the number of implementations trails off for smaller organizations, said Ronni Colville, an IT operations management analyst at Gartner.

Among the problems that organizations have cited are high costs for software and consulting, difficulty in populating the database, intergroup politics, and inflated expectations about CMDB capabilities.

"A CMDB sounds like a good idea in theory. In practice, if you try and implement every little nuance, it's like driving pins in your eyes," said Brian de Haaff, Citrix Systems' senior product line director for GoToAssist, the company's IT service management offering.

Indeed, in the early days of CMDBs, many organizations undertook initiatives without properly analyzing the work involved or the business justification, said Gartner's Colville. As a result, she said, "there were a lot of false [...] 'It doesn't solve world hunger. It's not making dinner. What the heck?'"

England calls shops that need a CMDB "The 5% Club."

"There are 5% of organizations that are so complex that they need a CMDB -- and have the resources to actually do it," he said. But for the remaining 95%, implementing such a project is rarely worth the cost, time or effort, England said.

"The main reason you would do a CMDB project is for impact assessment," England noted. "If people can answer questions about the impact of a change fast enough, then you don't need to invest in a CMDB."

For that 5% of shops that have paid their dues implementing a CMDB, however, it can be a beautiful thing.

In part two of this article, see how a large packaged foods corporation is using CMDB to pinpoint problems to keep production flowing in its warehouses.
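To make the idea of configuration items, their relationships and impact assessment more concrete, here is a minimal Python sketch. It is not the schema of any CMDB product and not the approach of the organizations quoted above; the CI names are hypothetical, and the relationships are reduced to a single "depends on" link stored in a plain dictionary.

```python
from collections import deque

# Hypothetical configuration items. Each key is a CI; each value lists the
# CIs that depend directly on it.
dependents = {
    "san-01": ["db-server-01"],
    "db-server-01": ["order-app"],
    "order-app": ["warehouse-portal"],
    "mail-server-01": ["email-service"],
}


def impacted_by(ci, dependents):
    """Return every CI that directly or indirectly depends on the given CI."""
    seen = set()
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        # Walk outward through the CIs that rely on the current one.
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Which services could be affected by a change to the SAN?
print(sorted(impacted_by("san-01", dependents)))
```

The breadth-first walk answers England's question of how fast the impact of a change can be assessed, but only as accurately as the relationship data allows. Keeping that data populated and current is the hard part the article describes.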

Disaster recovery awareness and testing require training, strategic plans

Paul Kirvan, Guest Contributor

Once you have drawn up a detailed disaster recovery plan, the next stages in the project are twofold: to prepare and deliver disaster recovery awareness and training programmes so all employees are prepared to respond as required by the plan in an emergency, and to carry out disaster recovery testing to ensure the plan works properly and that DR teams know their roles and responsibilities.

- ISO/IEC 27031:2010, Information technology -- Security techniques -- Guidelines for information and communication technology readiness for business continuity

This is the global standard for IT disaster recovery as it applies to end users. Another ISO standard, ISO/IEC 24762, addresses information and communications technology disaster recovery from a service provider perspective. Both these standards can help you develop and implement ICT disaster recovery programmes.

Disaster recovery awareness and training strategies

[...] implemented to ensure that processes are in place to regularly promote ICT DR awareness in general, as well as assess and enhance competency of all relevant personnel key to the successful implementation of ICT DR [...]

Perhaps the most important strategy in raising disaster recovery awareness is to secure senior management support and funding for DR programmes. Visible and frequent endorsements from senior management will help raise awareness of and increase participation in the programme.

The next key strategy is to engage your human resources (HR) organisation in the process. They have the expertise to help you organise and conduct awareness activities, such as department briefings and messages on employee bulletin boards. You can also encourage HR to incorporate briefings on DR as well as business continuity into new employee induction programmes.

Another important strategy is to leverage the Internet. If your organisation has an intranet, launch a DR page that describes what your programme does; answers FAQs; and provides links to forms and services, schedules, and other relevant materials.

Be sure that any awareness activities are approved by management and HR, as well as your own IT management. Your messages should be informative [...] activities.

Building an awareness and training plan

Here are additional activities for successful disaster recovery awareness and training programmes:

- Conduct an awareness and training needs analysis.
- Assess existing staff competencies regarding roles in DR plans.
- Establish an ongoing awareness and training programme.
- Establish record-keeping of staff training and awareness activities.
- Establish competency levels for IT staff and how they should be maintained.
- Conduct staff performance assessments post-disaster and reevaluate training.

As part of these activities, you should develop and conduct training on:

- Technical recovery activities
- Emergency response activities, for example, situation assessment and evacuation
- Specialised recovery, such as recovering to hot sites or cold sites or third-party managed DR services
- Return-to-normal activities
- Restoration of business systems and processes

Since you will be working with a variety of vendors and specialised service providers, examine their training programmes to see if they can be leveraged into your internally developed training activities.

Disaster recovery testing strategies

The most important strategy in disaster recovery testing is simply to test, test and test again. Your organisation depends on the availability of IT [...] operational but that they can survive an unplanned outage. Disaster recovery testing will ensure that all your efforts to provide recovery and resilience will indeed protect critical IT assets.

[...] instances, the whole set of IRBC [ICT readiness for business continuity] elements and processes, including ICT recovery, cannot be proven in one [...] that continually addresses the entire spectrum of operational and administrative activities that an ICT organisation faces.

Based on the size and complexity of your IT infrastructure, disaster recovery testing activities should address recovery of hardware, software, data and databases, network services, data centre facilities, people (for example, relocation of staff to an alternate site), and the business. For each of these factors, critical information will be identified in the business impact analysis, or BIA.

Types of tests

ISO 27031 makes some key points with regard to disaster recovery testing:

[...] should not expose the organisation to an unacceptable level of risk. The test and exercise programme should define how the risk of individual exercise is addressed. Top-management sign-off on the programme should be obtained and a clear explanation of the ass[...]

[...] wider business continuity management scope and objectives and complementary to the organisation's broader exercise programme. Each test and exercise should have both business objectives (even where there is no business involvement) and defined technical objectives to test or validate [...]

Since there are many aspects of an IT environment to be tested, there are different kinds of tests to be initiated. This figure shows the three basic IT DR tests.

Figure: Types of IT disaster recovery tests

Basic disaster recovery testing begins with a desktop walk-through activity, in which DR team members review DR plans step by step to see if they make sense and to fully understand their roles and responsibilities in a disaster.

The next kind of test, a simulated recovery, impacts specific systems and infrastructure elements. Specifically, tests such as failover and failback of critical servers are among the most frequently conducted. These tests verify not only the recoverability of primary and backup servers but also the network infrastructure that supports the failover/failback and the specialised applications that effect failover and failback.

Operational exercises extend the simulated recovery test to a wider scale, typically testing end-to-end recovery of multiple systems, both internal and external, the associated network infrastructures that support connectivity of tho
