Disaster Recovery – What would you do?
Hurricane Florence, the recent storm that rocked the Carolinas, underscored the need for many businesses to evaluate their disaster recovery contingency planning. Implementing and successfully testing such plans can spell the difference between business continuity and expensive, extensive downtime during and after a disaster. Disasters can take the form of power outages, system outages, natural events (hurricanes, earthquakes, tornadoes, floods), manmade events (terrorism, bomb threats, or sabotage), and fires or gas explosions that disrupt or shut down business operations.
Business continuity focuses on the entire enterprise: people, buildings, work, and computing resources. Disaster recovery focuses on the ability to restore data processing following a disruptive event.
While an overall Business Continuity Plan (BCP) is crucial to the survival of a business, this article focuses on Disaster Recovery Planning as a subset of an overall BCP.
Having a Disaster Recovery Plan in place is a great start, but the key to a successful program is testing the plan periodically, before an actual emergency requires its activation. A successful Disaster Recovery Program also depends on buy-in from senior management, the business community, IT staff, internal audit, and all employees, as well as adequate budgeting for projected expenditures.
Once buy-in and committed support are obtained from senior management and the business community, the following steps need to take place.
Discovery Phase
Business Impact Analysis
Conduct a Business Impact Analysis (BIA) to identify all key business processes. Based on the BIA, identify and record all critical systems and supporting infrastructure (e.g., servers and databases).
- Prioritize those systems based on criticality and what you cannot live without.
- Identify the key personnel who manage those systems, as well as the resource owners.
Compile any relevant historical disaster recovery information.
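One way to capture the output of the Business Impact Analysis is a simple inventory that can be sorted by criticality. The sketch below is illustrative only: the `CriticalSystem` fields, the 1 to 5 criticality scale, and the sample systems are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalSystem:
    """One entry in a hypothetical BIA inventory."""
    name: str
    business_process: str
    criticality: int  # assumed scale: 1 = cannot live without, 5 = can wait
    system_manager: str
    resource_owner: str


def prioritize(systems):
    """Return systems ordered from most to least critical (ties broken by name)."""
    return sorted(systems, key=lambda s: (s.criticality, s.name))


# Made-up sample data for illustration only.
inventory = [
    CriticalSystem("Intranet wiki", "Internal communications", 4, "J. Doe", "HR"),
    CriticalSystem("Order database", "Order processing", 1, "A. Smith", "Sales"),
    CriticalSystem("Email server", "Customer support", 2, "B. Jones", "IT"),
]

for system in prioritize(inventory):
    print(f"{system.criticality}  {system.name:<15} owner: {system.resource_owner}")
```

Keeping the manager and owner on the same record as the criticality rating means the people to call are already known when a system is selected for recovery.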
Note: If some of your critical systems are vendor hosted or cloud based, the service provider should have an existing plan available for review (this should be addressed during a Vendor Security Assessment). Such systems are often left out of a Disaster Recovery Program, but they should be included if they are critical.
As a last step in the discovery phase, identify and document your data backup and storage solution. Ideally, incremental backups should be performed daily and full backups weekly. Backups should be stored offsite in a secure location.
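The daily incremental and weekly full schedule can be expressed as a small rule, which also shows which backups a restore would need. This is a minimal sketch: the choice of Sunday for the full backup and the names `backup_type_for` and `restore_chain` are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: the weekly full backup runs on Sunday.
# date.weekday() numbers days Monday=0 through Sunday=6.
FULL_BACKUP_WEEKDAY = 6


def backup_type_for(day: date) -> str:
    """Return 'full' on the weekly full-backup day, otherwise 'incremental'."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> list:
    """List the backup dates needed to restore to `target`: the most recent
    full backup plus every daily incremental taken after it."""
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    last_full = target - timedelta(days=days_since_full)
    return [last_full + timedelta(days=i) for i in range(days_since_full + 1)]


if __name__ == "__main__":
    target = date(2018, 9, 19)  # a Wednesday
    for day in restore_chain(target):
        print(day.isoformat(), backup_type_for(day))
```

The length of the restore chain is worth documenting alongside the backup solution, since a restore late in the week depends on more backup sets being intact.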
Disaster Recovery policy and plan (within the context of the overall Business Continuity Plan)
At a minimum, the following steps need to be taken when developing an overall Disaster Recovery Plan.
- Identify the Disaster Recovery coordination team and executive sponsor.
- Identify who can declare a disaster (usually an executive).
- Record criticality ratings for your applications (these should be based on the Business Impact Analysis).
- Incorporate RTO (Recovery Time Objective), RTA (Recovery Time Actual, recorded after a disaster recovery exercise), and RPO (Recovery Point Objective), based on criticality ratings.
- Determine and document how tests will be conducted (e.g., structured walkthrough, simulation, table top, actual test).
- Determine whether individual tests or site tests will be performed.
- Determine the frequency of tests based on criticality ratings.
- Document what constitutes a successful test and what constitutes a failed test.
- Document how testing results are reported.
- Document how evidence is maintained.
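A lightweight way to record the objectives and test frequency is a table keyed by criticality rating. The sketch below assumes three hypothetical ratings, and every number in it (RTO, RPO, test interval) is a placeholder to be replaced with values from your own Business Impact Analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    rto: timedelta            # Recovery Time Objective
    rpo: timedelta            # Recovery Point Objective
    test_interval: timedelta  # how often the application should be tested


# Placeholder values only; derive real ones from the BIA.
TARGETS_BY_RATING = {
    "high": RecoveryTargets(timedelta(hours=4), timedelta(hours=1), timedelta(days=90)),
    "medium": RecoveryTargets(timedelta(hours=24), timedelta(hours=24), timedelta(days=180)),
    "low": RecoveryTargets(timedelta(hours=72), timedelta(days=7), timedelta(days=365)),
}


def backup_meets_rpo(rating: str, backup_interval: timedelta) -> bool:
    """Backups taken every `backup_interval` can lose up to that much data,
    so the interval must not exceed the RPO for the rating."""
    return backup_interval <= TARGETS_BY_RATING[rating].rpo


# Daily backups checked against the placeholder RPOs above.
print(backup_meets_rpo("high", timedelta(days=1)))
print(backup_meets_rpo("medium", timedelta(days=1)))
```

With these placeholder numbers, a daily backup satisfies the "medium" rating but not the "high" one, which is the kind of mismatch this step of the plan is meant to surface.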
Critical applications require individual testing plans:
- Testing plans should identify the roles and responsibilities of the team (network, DBA, and infrastructure teams, testers (everyday users), application owners, business representatives, and coordinators).
- Determine whether testing will be site specific or application specific. In my experience, it is better to start small and take incremental steps:
  - Start with a table top exercise.
  - Then try a simulation or walkthrough.
  - Then try to swing the network over and test connectivity.
  - If network connectivity is successful, then test some applications.
  - If all of the above are successful, then consider a site test that encompasses multiple applications/resources within one test.
- Testing plans should identify what constitutes a test.
- Individual team responsibilities should be listed along with the amount of time needed to complete each step.
- Validation steps are needed at each juncture (each validator should have a documented desired outcome).
- The test steps should include the testing protocol to follow while in disaster recovery mode and the testing protocol to follow once the network is swung back after the test. Each tester should have a documented desired outcome.
- Plans should include what the testers are testing and how evidence is captured (e.g., time and date stamps). Testers should never be allowed to execute live transactions.
- Plans should also identify where and to whom the evidence is provided.
- Change records and tickets should be created for the actual test and referenced in the testing documentation. A backout plan should be included.
- Testing documentation should include every step taken, when it was taken, how long it took, and who performed it.
- All problems and issues should be documented and covered in a post-test Disaster Recovery meeting. Note any differences between RTO (Recovery Time Objective) and RTA (Recovery Time Actual), and either address the gap or adjust the plan accordingly. All lessons learned should be maintained and applied to future tests.
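If each test step is logged with who performed it and when it started and finished, the RTA and its gap against the RTO fall out of simple arithmetic. The following is a minimal sketch under that assumption; the `RecoveryStep` record, the sample steps, and the 4-hour RTO are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryStep:
    description: str
    performed_by: str
    started: datetime
    finished: datetime

    @property
    def duration(self) -> timedelta:
        return self.finished - self.started


def recovery_time_actual(steps) -> timedelta:
    """RTA measured from the start of the earliest step to the end of the latest."""
    return max(s.finished for s in steps) - min(s.started for s in steps)


def rto_gap(rto: timedelta, steps) -> timedelta:
    """A positive result means the test overran the RTO by that amount."""
    return recovery_time_actual(steps) - rto


# Made-up test log for illustration only.
steps = [
    RecoveryStep("Swing network to recovery site", "Network team",
                 datetime(2018, 9, 20, 9, 0), datetime(2018, 9, 20, 10, 30)),
    RecoveryStep("Restore database", "DBA",
                 datetime(2018, 9, 20, 10, 30), datetime(2018, 9, 20, 13, 0)),
    RecoveryStep("Validate application", "Testers",
                 datetime(2018, 9, 20, 13, 0), datetime(2018, 9, 20, 14, 15)),
]

rto = timedelta(hours=4)  # hypothetical objective
print("RTA:", recovery_time_actual(steps))
print("Gap against RTO:", rto_gap(rto, steps))
```

In this made-up log the RTA is 5 hours 15 minutes against a 4-hour RTO, so the post-test meeting would need to either shorten the slowest steps or revisit the objective.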
Scorecarding and reporting:
- An overall scorecard should be maintained for each critical application, showing its testing status (green, yellow, red) and when the next test is due.
- Consider utilizing a tracking tool.
- A Disaster Recovery calendar of events should also be in place and reviewed at Change Control meetings.
- Schedule ongoing meetings with the Disaster Recovery team, management, business groups, and Internal Audit.
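The green, yellow, and red statuses on the scorecard can be derived from the last test result and the next due date. The colour rules below are one possible convention, not a standard: the 30-day warning window, the function name `scorecard_status`, and the sample applications are assumptions.

```python
from datetime import date, timedelta


def scorecard_status(last_test_passed: bool, next_test_due: date, today: date,
                     warning_window: timedelta = timedelta(days=30)) -> str:
    """Assumed colour rules:
    red    - the last test failed, or the next test is overdue
    yellow - the next test falls due within the warning window
    green  - anything else
    """
    if not last_test_passed or next_test_due < today:
        return "red"
    if next_test_due - today <= warning_window:
        return "yellow"
    return "green"


# Made-up scorecard rows: (application, last test passed, next test due).
rows = [
    ("Order database", True, date(2019, 1, 15)),
    ("Email server", True, date(2018, 10, 20)),
    ("Intranet wiki", False, date(2019, 3, 1)),
]

today = date(2018, 10, 1)  # fixed date so the example is repeatable
for app, passed, due in rows:
    print(f"{app:<15} {scorecard_status(passed, due, today):<7} next test due {due}")
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the rule easy to test and to rerun for a past reporting date.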
At the end of the day, having an ongoing and tested Disaster Recovery plan in place can help ensure the continued survival of a business.
Finding a credible expert with the appropriate background, expertise, and credentials can be difficult. CompliancePoint is here to help.