Business Resiliency (BCP & DRP)



Business resilience is the ability of an organization to adapt when disruptions, disasters and other negative incidents occur. It is put into practice through an action plan that describes the steps to maintain continuous operations and protect the organization’s assets during disruptions. An asset is anything that has value to the organization. There are many types of assets, such as information assets (data files, databases), software assets (system software, application software), physical assets (computer equipment and communication equipment), service assets (telecom, power), people assets, paper assets, etc. Even a company’s image is an asset. Business resilience comprises BCP and DRP.

The possible disasters or emergencies are: 

  • Denial of access
  • Failure of critical suppliers
  • Human error
  • Technical error
  • Fraud, Sabotage, Extortion, Espionage
  • Industrial action
  • Natural disasters
  • Viruses and other security breaches 

Before looking at what BCP and DRP are, let us understand the difference between recovery and restoration.

Recovery and restoration:

Note that recovery and restoration are separate concepts. In this context, recovery involves bringing business operations and processes back to a working state, while restoration involves bringing a business facility and environment back to a workable state.

DRP and BCP:

Most organizations have some degree of DRP (disaster recovery plan) in place for recovering IT infrastructure, critical systems and associated data. However, many organizations don't have a BCP (Business Continuity Plan), which is simply the adaptation plan for their key business units and business processes during a period of IT disruption. In the absence of a BCP, organizations don't know how a particular business unit will function until systems and IT infrastructure recover from the disaster or disruption under the DRP.

DRP: 

Disaster recovery planning (DRP) describes the plans a company puts in place to respond to a disaster or other critical event. Such events include natural disasters, fire, data loss, cyberattacks, terrorism, accidents, active shooters and other incidents that can hamper the business’s operations. Disaster recovery plans guide the organization in its response to the incident or event and on returning to normal operations safely.

Personnel and communications are an important part of DRP. A disaster recovery plan should contain a list of personnel to contact in the event of a disaster. Usually, this includes key members of the DRP team as well as personnel who execute critical disaster recovery tasks throughout the organization. This response checklist should include alternate means of contact (that is, pager numbers, mobile phone numbers and so on) as well as backup contacts for each role in case the primary contact becomes unavailable.
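To make the idea of backup contacts concrete, here is a minimal Python sketch of a call list where each role has a primary contact and ordered backups. The names Contact, Role and who_to_call, as well as the sample people and phone numbers, are hypothetical; a real plan would keep this list in whatever system the organization already uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    mobile: str
    pager: Optional[str] = None
    available: bool = True  # set to False when the person cannot be reached


@dataclass
class Role:
    title: str
    primary: Contact
    backups: List[Contact] = field(default_factory=list)


def who_to_call(role: Role) -> Optional[Contact]:
    """Return the primary contact if available, otherwise the first available backup."""
    for contact in [role.primary, *role.backups]:
        if contact.available:
            return contact
    return None  # nobody reachable: escalate as the plan describes


# Hypothetical example: the primary is unreachable, so the first backup is chosen.
db_lead = Role(
    title="Database recovery lead",
    primary=Contact("Asha", mobile="555-0101", available=False),
    backups=[Contact("Ben", mobile="555-0102", pager="4411")],
)
print(who_to_call(db_lead).name)  # Ben
```

Keeping backups in an ordered list means the escalation order is part of the data itself, so it can be reviewed along with the rest of the plan.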

BCP:

In simple words, a BCP (Business Continuity Plan) is concerned with keeping a business's processes operational during and after a disaster. This includes contingency planning for how the company will operate, who will carry out which roles, where the business will operate from, and what effects this will have on normal business operations.

To summarize, both BCP and DRP should be in place and work together during disruptions. Both should be aligned with the organization's goals and risk tolerance. Risk tolerance is the degree of risk or uncertainty that is acceptable to an organization. Minimum and maximum risk tolerance limits are usually set by the committee that oversees the organization's risk management strategy and are then approved by leadership. A high risk tolerance means that an organization is willing to take on a lot of risk, while a low risk tolerance means it is not.

Which comes first? BCP or DRP?

Note that the top priority of both BCP and DRP is always people. The primary concern is to get people out of harm’s way; only then can you address IT recovery and restoration issues. It is also important to understand the distinction between business continuity planning and disaster recovery planning. One easy way to remember the difference is that BCP comes first, and if BCP efforts fail, DRP steps in to fill the gap. Business continuity (BCP) focuses on keeping the business operational during a disaster, while disaster recovery (DRP) focuses on restoring data access and IT infrastructure after a disaster.

BCP planning:

The goal of BCP planners is to implement a combination of policies, procedures, and processes such that a potentially disruptive event has as little impact on the business as possible. BCP focuses on maintaining business operations with reduced or restricted infrastructure capabilities or resources. As long as the continuity of the organization’s ability to perform its mission-critical work tasks is maintained, BCP can be used to manage and restore the environment. If the continuity is broken, then business processes have stopped and the organization is in disaster mode; thus, disaster recovery planning (DRP) takes over.

The overall goal of BCP is to provide a quick, calm and efficient response in the event of an emergency and to enhance a company’s ability to recover promptly from a disruptive event. According to ISC2 (International Information System Security Certification Consortium), the BCP process has four main steps:

  • Project scope and planning
  • Business impact assessment
  • Continuity planning
  • Approval and implementation

When developing a BCP, performing a business impact analysis (BIA) is a critical step. A BIA is used to evaluate the critical processes (and the IT components supporting them) and to determine time frames, priorities, resources and interdependencies. A BIA requires a high level of senior management support/sponsorship and extensive involvement of IT and end-user personnel. The criticality of the information resources (e.g., applications, data, networks, system software, facilities) that support an organization’s business processes must be approved by senior management. For the BIA, it is important to include all types of information resources and to look beyond traditional information resources (e.g., database servers).
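As a rough illustration of how BIA results can turn into priorities, the sketch below ranks processes by the acceptable downtime recorded for each, shortest first. The process names and hour values are made up, and this simple sort ignores the interdependencies mentioned above, which a real BIA would also need to account for.

```python
from typing import List, NamedTuple


class Process(NamedTuple):
    name: str
    acceptable_downtime_hours: float  # as agreed during the BIA


def recovery_priority(processes: List[Process]) -> List[str]:
    """Order processes so the one that tolerates the least downtime comes first."""
    ranked = sorted(processes, key=lambda p: p.acceptable_downtime_hours)
    return [p.name for p in ranked]


# Hypothetical BIA results
bia = [
    Process("Payroll", 72),
    Process("Online payments", 2),
    Process("Internal email", 24),
]
print(recovery_priority(bia))  # ['Online payments', 'Internal email', 'Payroll']
```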

There are several different approaches to performing a BIA:

One popular approach is the questionnaire approach, which involves developing a detailed questionnaire and circulating it to key users in IT and end-user areas. The information gathered is tabulated and analyzed, and if additional information is required, the BIA team contacts the relevant users. Another popular approach is to interview groups of key users; the information gathered during these sessions is tabulated and analyzed to develop a detailed BIA plan and strategy. A third approach is to bring relevant IT personnel and end users together in a room to reach a conclusion on the potential business impact of various levels of disruption. This last method may be used after all the data has been collected, and such a mixed group can quickly decide on acceptable downtime and vital resources. Wherever possible, the BCP team should analyze past transaction volumes when determining the impact on the business if the system were unavailable for an extended period of time. This helps substantiate the findings of the interviews the BCP team conducts for the BIA.
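The suggestion to use past transaction volumes can be sketched as a back-of-the-envelope calculation: average hourly volume from history, times an assumed average value per transaction, times the length of the outage. The figures and the flat-rate assumption are hypothetical; real volumes vary by hour and season, so treat the result as a starting point for the BIA discussion rather than a forecast.

```python
from statistics import mean
from typing import Sequence


def estimated_outage_impact(
    hourly_volumes: Sequence[int], avg_value_per_txn: float, outage_hours: float
) -> float:
    """Estimate business value lost if the system is down for `outage_hours`."""
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    avg_per_hour = mean(hourly_volumes)  # typical transactions per hour
    return avg_per_hour * avg_value_per_txn * outage_hours


history = [120, 80, 100, 100]  # hypothetical transactions per hour
for hours in (1, 4, 24):
    print(hours, estimated_outage_impact(history, 50.0, hours))
```

Running the estimate for several outage lengths gives the mixed IT and end-user group concrete numbers to compare when agreeing on acceptable downtime.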

Both BCP and DRP incur costs for the organization:

Recovery cost is the total amount of money spent to restore business operations or systems after a disruption. A business continuity strategy that targets a longer recovery time is less expensive than one with a more stringent requirement, but it may leave the organization more exposed to downtime costs spiraling out of control. Normally, the shorter the target recovery time, the higher the fixed cost. The organization pays for planning and implementation even if no disaster takes place. If a disaster does occur, variable costs increase significantly: for example, a warm site contract may consist of a flat annual fee plus a daily fee for actual occupation, and extra staff, overtime, transportation and other logistics (e.g., staff per diem, new communication lines) also need to be considered. Variable costs depend on the strategy implemented.
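The split between fixed and variable costs can be written as a simple model. This sketch assumes the warm site contract described above (a flat annual fee plus a daily occupation fee) and lumps extra staff, overtime and logistics into one additional figure; the function name and all numbers are hypothetical.

```python
def recovery_cost(
    flat_annual_fee: float,
    daily_occupation_fee: float,
    days_occupied: int = 0,
    other_variable_costs: float = 0.0,
) -> dict:
    """Return the fixed, variable and total recovery cost for one year."""
    fixed = flat_annual_fee  # paid even if no disaster happens
    variable = daily_occupation_fee * days_occupied + other_variable_costs
    return {"fixed": fixed, "variable": variable, "total": fixed + variable}


# No disaster this year: only the fixed cost is paid.
print(recovery_cost(60_000, 2_000))
# A 10-day disaster with 15,000 in extra staff and logistics costs.
print(recovery_cost(60_000, 2_000, days_occupied=10, other_variable_costs=15_000))
```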

BCP and DRP differences in detail: 

(1) Definition: 

BCP is a business's level of readiness to maintain critical functions after an emergency or disruption. These events can include security breaches, natural disasters and power outages. DRP, on the other hand, is an organization's ability to restore access and functionality to IT infrastructure after a disaster event, whether natural or caused by human action (or error).

(2) Organizational Priorities:

BCP ensures that the business remains operational during a crisis and keeps disruptions to a minimum, while DRP attempts to limit system failures and restore systems as quickly as possible.

(3) Scope of actions:

BCP covers all the business functions needed to keep the organization running. It often includes non-IT aspects of the business and involves alternate personnel, equipment and facilities. DRP focuses more narrowly on IT systems and data storage.

(4) When to start:

BCP implementation starts as soon as decision makers learn about a critical situation. DRP starts after the incident response.

(5) When to conclude:

BCP concludes when the business returns to normal at the end of the disaster. DRP concludes when IT systems and infrastructure return to their pre-disaster state.

(6) The plan: 

When creating a business continuity plan, you should take the following general steps: 

  • Form a continuity planning team 
  • Perform a business impact analysis 
  • Design and implement your plan 
  • Train and educate your employees 
  • Regularly assess and evaluate your plan

When creating a disaster recovery plan, you'll likely take the following general steps: 

  • Form a disaster recovery team 
  • Identify critical functions and potential disaster risks 
  • Design and implement a disaster recovery plan 
  • Create backup procedures (in case of cyber attack) 
  • Train personnel 
  • Regularly test and maintain the plan

(7) Hurricane example:

In the event of a hurricane, BCP will include the following actions: 

  • Alerting all stakeholders to the threat 
  • Advising employees on emergency procedures and points of contact 
  • Transitioning to alternative operations, whether that's a backup workspace or remote work 
  • Maintaining internal network infrastructure 
  • Checking in with all employees to ensure safety and administer assistance, if necessary 
  • Adjusting supply chains if vendors or partners have been affected 
  • Communicating any changes with customers and other stakeholders.

While the DRP actions will include:

  • Assisting any employees who have been directly affected by the storm 
  • Rebuilding or restoring any damaged company property 
  • Restoring or recovering any lost data or company systems 
  • Welcoming employees back into the workplace once it's safe 
  • Bringing production levels back up to normal.

Disaster Recovery (DRP) Site Types:

(i) Cold Site - 

This is the simplest type of disaster recovery site. A cold site provides power, networking capability and cooling. Cold sites are standby facilities large enough to handle the processing load of an organization and equipped with appropriate electrical and environmental support systems. They may be large warehouses, empty office buildings or other similar structures. However, they do not include other hardware elements such as servers and storage. They have no computing facilities (hardware or software) preinstalled and no active broadband communications links. Many cold sites do have at least a few copper telephone lines, and some sites may have standby links that can be activated with minimal notification. Using a cold site is very limiting for a business because, before it can be used, backup data and some additional hardware must be sent to the site and installed.

(ii) Hot Site - 

A hot site is the exact opposite of the cold site. In this configuration, a backup facility is maintained in constant working order, with a full complement of servers, workstations, and communications links ready to assume primary operations responsibilities. The servers and workstations are all preconfigured and loaded with appropriate operating system and application software. The data on the primary site servers is periodically or continuously replicated to corresponding servers at the hot site, ensuring that the hot site has up-to-date data. Depending on the bandwidth available between the sites, hot site data may be replicated instantaneously. 

(iii) Warm Site - 

A warm site contains all the elements of a cold site and adds further elements, including storage hardware such as tape or disk drives, servers and switches. Warm sites are "ready to go" in one sense, but data still has to be transported to them before recovery can begin should a disaster occur. They don't have live data instantly available.

Many organizations now turn to cloud computing as their preferred disaster recovery option. Infrastructure as a Service (IaaS) providers, such as Amazon Web Services, Microsoft Azure and Google Cloud, offer on-demand service at low cost. Companies wishing to maintain their own data centers may choose to use these IaaS options as backup service providers. Storing ready-to-run images with cloud providers is often quite cost effective and lets the organization avoid most of the operating cost until the cloud site is activated in a disaster.

What is RTO (Recovery Time Objective)?

The recovery time objective (RTO) is the maximum acceptable time that an application, computer, network or system can be down after an unexpected disaster, failure or comparable event takes place. RTO captures the maximum allowable time between the unexpected failure or disaster and the restoration of normal service levels and resumption of typical operations. RTO defines a turning point after which the consequences of the interruption become unacceptable.
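A minimal sketch of checking an actual outage against the RTO: measure the time from the failure to the restoration of normal service and compare it with the objective. The function name, timestamps and four-hour RTO are hypothetical.

```python
from datetime import datetime, timedelta


def rto_met(failure_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True if service was restored within the recovery time objective."""
    if restored_at < failure_at:
        raise ValueError("restored_at cannot be earlier than failure_at")
    downtime = restored_at - failure_at
    return downtime <= rto


failure = datetime(2024, 3, 1, 2, 0)
restored = datetime(2024, 3, 1, 5, 30)  # 3.5 hours later
print(rto_met(failure, restored, rto=timedelta(hours=4)))  # True
```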

What is RPO (Recovery Point Objective)?

Recovery point objective (RPO) is defined as the maximum amount of data, measured in time, that can be lost after a recovery from a disaster, failure or comparable event before data loss exceeds what is acceptable to the organization. The RPO therefore determines the maximum age that the data or files in backup storage can have if a network or computer system failure occurs.

Difference between RPO and RTO: 

Although these two terms are related, it is important to understand the difference between them. RTO is about downtime (how long a system can be down), while RPO is about data loss (how much data, measured in time, can be lost). Every BCP sets a maximum allowable tolerance or threshold for data loss during a disruption. The recovery point objective (RPO) describes the amount of time that can pass during an event before data loss exceeds that tolerance.

Example: an outage occurs. If the RPO for this business is 12 hours and the last good copy of data available is from 10 hours ago, we are still within the RPO’s parameters for this business continuity plan. In other words, the RPO specifies how far back in time the last usable copy of data may be while still allowing tolerable business recovery, given how much data will be lost during that interval.
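The example above can be checked in a few lines: the data loss window is the time between the last good copy and the outage, and it must not exceed the RPO. The function name and the outage timestamp are illustrative; the 12-hour RPO and the 10-hour-old copy come from the example.

```python
from datetime import datetime, timedelta


def within_rpo(outage_at: datetime, last_good_copy_at: datetime, rpo: timedelta) -> bool:
    """True if the data lost (time since the last good copy) is within the RPO."""
    data_loss_window = outage_at - last_good_copy_at
    return data_loss_window <= rpo


outage = datetime(2024, 3, 1, 22, 0)  # hypothetical outage time
rpo = timedelta(hours=12)
print(within_rpo(outage, outage - timedelta(hours=10), rpo))  # True: 10 h of loss <= 12 h
print(within_rpo(outage, outage - timedelta(hours=14), rpo))  # False: 14 h of loss > 12 h
```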

What is a DR drill?

Moving operations from the data center (DC) site to the disaster recovery (DR) site for testing is called a DR drill. Disaster recovery drills are used to simulate fault scenarios, formulate recovery plans, and verify whether the plans are applicable and effective. Drills should be planned so that services are not affected. When a real fault occurs, you can then use the plans to recover services quickly, improving service continuity.

Business continuity plans should be tested regularly to ensure that they are up to date and effective. Any new departments or units should be included in the BCP during reassessment. Where applicable, redundant information systems should be tested to ensure that failover from one component to another works as intended.

It is recommended to run a partial drill every month and a full drill every six months.
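A drill calendar following that recommendation could be generated as in this sketch. It assumes (the text does not say this) that in a month with a full drill, the full drill takes the place of that month's partial drill, and it simply schedules each drill on the first day of the month. The function name drill_calendar is hypothetical.

```python
from datetime import date
from typing import List, Tuple


def drill_calendar(start: date, months: int = 12, full_every: int = 6) -> List[Tuple[date, str]]:
    """Monthly partial drills, with every `full_every`-th month upgraded to a full drill."""
    schedule = []
    for i in range(months):
        # Step forward i months from the start, rolling over into the next year as needed.
        year = start.year + (start.month - 1 + i) // 12
        month = (start.month - 1 + i) % 12 + 1
        kind = "full" if (i + 1) % full_every == 0 else "partial"
        schedule.append((date(year, month, 1), kind))
    return schedule


for day, kind in drill_calendar(date(2024, 1, 1)):
    print(day.isoformat(), kind)
```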

The DRP process document:

An organization’s disaster recovery plan is one of the most important documents under the purview of security professionals. It should provide guidance to the personnel responsible for ensuring the continuity of operations in the face of disaster. The DRP document provides an orderly sequence of events designed to activate alternate processing sites while simultaneously restoring the primary site to operational status. Once you’ve successfully developed your DRP, you must train personnel on its use, ensure that you maintain accurate documentation, and conduct periodic tests to keep the plan fresh in the minds of responders.

The DRP process document mainly contains the following sections:

  • Purpose of document
  • Definitions and acronyms
  • DRP process flowchart
  • Switchover approval process
  • Escalation points
  • Crisis management team details
  • Disaster declaration criteria
  • DR scenarios and use cases
  • RPO and RTO expectations
  • Infrastructure and architecture
  • Steps for DR process
  • Validation of switchover
  • DR drill calendar and trainings
  • Critical systems to take on priority 
  • Personnel and Communications plan
  • Asset details excel sheet
  • Step by step procedures 
  • Accountable persons

Note that an annual management review of the DR program must be conducted to check that the resources (people, technology, facilities and funding) for BCP and DRP are adequate. Disaster recovery planning is critical to a comprehensive information security program. No matter how comprehensive your business continuity plan is, the day may come when your business is interrupted by a disaster and you face the task of restoring operations to the primary site quickly and efficiently.
