Striving for Operational Resilience

After completing this reading, you should be able to:

  • Compare operational resilience to traditional business continuity and disaster recovery approaches.
  • Describe elements of an effective operational resilience framework and its potential benefits.

Comparison of Operational Resilience with Traditional Business Continuity (BC) and Disaster Recovery (DR) Approaches

Operational Resilience

Rico Brandenburg et al. define operational resilience as the ability of an organization to continue providing business services in the event of adverse operational events by anticipating, preventing, recovering from, and adapting to such events.

Operational Resilience Versus Traditional Business Continuity (BC) and Disaster Recovery (DR) Approaches

The following are the characteristics of both operational resilience and the traditional BC-DR approach divided into several categories.

Governance

Operational resilience approach

  • The roles, responsibilities, and accountability of the board and senior managers are clearly defined.
  • Resilience is incorporated into risk appetite statements and metrics across operational risk types.
  • Comprehensive and actionable reporting acts as an impetus to continuous improvement.

Traditional approach (BC/DR)

  • The roles of the board and senior management are limited to post-event response.
  • Unlike the operational resilience approach, resilience is not an explicit consideration in risk appetite statements and metrics under the traditional approach.
  • Reporting is not continuous; updates are given only after exercises.

Organizational Focus

Operational resilience approach

  • The focus is on end-to-end vital business services.
  • It considers the broader economic impact of disruption, in addition to the firm-specific impact.

Traditional approach (BC/DR)

  • The traditional approach focuses on individual business units or specific technology assets.
  • This approach considers only the firm-specific impact of disruption.

Integration

Operational resilience approach

  • There is a comprehensive view of the dependencies of critical business services across the value chain, including systems, data, and people, among others.
  • Resilience considerations are put in place in the upfront design of business services and organizational assets.

Traditional approach (BC/DR)

  • The view of dependencies is mostly limited to the business unit or directly linked technology assets.
  • Continuity and recovery capabilities are put in place to satisfy requirements.

Measurement

Operational resilience approach

  • Stress scenarios related to the business are tailored to each critical service based on an aligned and forward-looking risk assessment.
  • Impact tolerances are embedded based on bespoke scenarios.

Traditional approach (BC/DR)

  • The focus is on a standard set of disruption scenarios applied across all business units.
  • There are standard impact tolerances for all scenarios; that is, recovery time/point objectives.
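
The difference between the two measurement approaches can be illustrated with a small sketch. The service names, the tolerance values, and the four-hour standard recovery time objective are hypothetical and chosen purely for illustration; they do not come from the reading.

```python
from dataclasses import dataclass

# Hypothetical standard recovery time objective (hours) applied to every
# service under the traditional BC/DR approach.
STANDARD_RTO_HOURS = 4.0


@dataclass(frozen=True)
class Service:
    name: str
    impact_tolerance_hours: float  # assumed bespoke tolerance for this service


def breaches(outage_hours: float, limit_hours: float) -> bool:
    """Return True when an outage lasts longer than the allowed limit."""
    return outage_hours > limit_hours


def assess(service: Service, outage_hours: float) -> dict:
    """Compare one outage against the standard RTO and the bespoke tolerance."""
    return {
        "service": service.name,
        "breaches_standard_rto": breaches(outage_hours, STANDARD_RTO_HOURS),
        "breaches_impact_tolerance": breaches(
            outage_hours, service.impact_tolerance_hours
        ),
    }


# Illustrative services with assumed tolerances.
payments = Service("retail payments", impact_tolerance_hours=2.0)
reporting = Service("monthly reporting", impact_tolerance_hours=24.0)

payments_result = assess(payments, outage_hours=3.0)
reporting_result = assess(reporting, outage_hours=6.0)
```

In this sketch, a three-hour payments outage satisfies the standard objective yet breaches the bespoke tolerance, while a six-hour reporting outage does the opposite. A single standard threshold can therefore both understate and overstate the impact on individual services.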

Preparedness

Operational resilience approach

  • All incident types get the same incident response regime; that is, there is a unified incident command for all the incident types.
  • Resilience plans and capabilities are continuously monitored, tested, and adapted.
  • There is more emphasis on ensuring trust within the crisis management team for an effective response.

Traditional approach (BC/DR)

  • Plans and capabilities are tested infrequently.
  • The incident response regimes are distinct for different incident types; this negatively impacts response times.
  • There is minimal attention paid to the dynamics of the crisis management team.

Disadvantages of the Traditional BC and DR Approaches:

  • Although firms have many dependencies for service delivery, traditional BC and DR approaches focus on assets in silos, ignoring critical components of the end-to-end service delivery.
  • Traditional BC and DR approaches are reactive rather than proactive, and thus they are focused on recovery, which makes firms slow to adapt.
  • Traditional approaches focus on a standard set of disruption scenarios. This gives a firm a false sense of comfort that it is prepared for all scenarios.

Building a clearly defined operational resilience function requires a firm to have the following prerequisites for achieving increased resilience:

  1. Defined governance: All the functions should have clear roles and responsibilities assigned across the organization.
  2. Ensured oversight: The operational resilience function should sit at the center of the firm’s front-end and its control functions. It should define frameworks that are vital for achieving resilience. A central function should be set up to oversee the key processes and ensure that they are not managed in the traditional “siloed” way.
  3. Structured communication across functions: In the traditional business continuity (BC) and disaster recovery (DR) approaches, processes are managed in silos in such a way that different functions work on their own, and there is no uniform communication. The operational resilience function must ensure that there is efficient communication among the different functions and ensure that the information is not distorted. Additionally, the function must ensure that the information reaches the right stakeholders in good time during periods of stress.

Elements of an Effective Operational Resilience Framework and its Potential Importance

The following are the elements of an operational resilience framework:

  1. Digital resilience: This ensures that the firm’s digital processes and the systems in the value chain of its key products and services keep running. It also ensures that newly introduced systems do not disrupt these vital technologies, and thus, digital resilience can act as a driver behind the digitalization of processes with the successive digitalization of systems. In this era of digital transformation, digital resilience is key to maintaining a competitive advantage.
  2. Data resilience: Most firms today are data intensive. If a firm wants to be resilient, it is essential to ensure the quality and availability of data. Data resilience also involves knowing what data is critical to provide the key services and products to the client. Data inventories and data flows become indispensable. To achieve data resilience, an organization needs to have a fully functional chief data officer (CDO) who is responsible for ensuring completeness, accuracy, validity, and restricted access of the data supporting the firm’s critical processes.
  3. Sourcing and external dependencies: These ensure that vital services to the clients are maintained. Understanding these dependencies is very important as a firm is required to manage the respective risks appropriately.
  4. Cyber resilience: With the current increase in cyber-attacks, there is a need for the firm to put in place a cyber resilience mechanism to help prevent, detect, respond to, and recover from cyber-related threats. Cyber resilience contributes to the recovery capabilities of the firm in case of such attacks.
  5. Incident management: There is a need to put in place incident response processes that will be responsible for identifying, classifying, and helping ensure appropriate, measured responses to aid in recovering from any stress with the minimum impact on clients.
  6. Business strategy: The business or corporate strategy defines the commercial objectives and social license to operate. It identifies the most important processes of the business and drives strategic decisions on operational resilience investments and risk appetite levels.
  7. Crisis management: Firms should always try to anticipate risks and stress scenarios to avoid them. Sometimes, however, crises happen due to unexpected events; in this case, it is essential to ensure proper management of the situation to restore business operations in the shortest period possible.
  8. Operational risk framework: This is used to understand, classify, and map business risks across the value chain of the core products and services. The firm needs to understand and quantify the exposure of its key processes with key risk indicators to remediate resilience gaps as well as interdependencies.
  9. Appropriate level of sponsorship: The executive management and the board should be on board to invest in facilitating the achievement of strong operational resilience.
  10. Dedicated function: The organization should look for expertise and knowledge from all the elements of operational resilience; this includes processes, people, technology, premises, data, and third parties. The operational resilience function can be structured to report to different existing functions of an organization depending on its culture. A shift in the market away from a model with operational resilience built into a different line of defense should also be considered. The three models are explained below:

    Model A: Direct reporting line to the Chief Executive Officer (CEO):

    This is the most appropriate choice when the firm is considering the strategic importance of its operational resilience. If the focus is on the products and services, and there is a need to ensure that operational resilience becomes an essential part of the firm’s strategy, then this will be the appropriate governance decision for the firm.

    Model B: Direct reporting line to the Chief Operations Officer (COO):

    Operational resilience is directly tied to an organization’s operations management. The COO knows best how the firm’s systems, processes, and people interact in the right way to offer the products and services to the clients. If the operational resilience function is positioned under the COO, then the right knowledge on how the firm works is readily available for use.

    Model C: Direct reporting line to the Chief Risk Officer (CRO):

    Operational resilience is all about the readiness of the firm to face disruptions caused by changes in the business environment. Thus, it is intimately interconnected with the risk management function. Connections exist naturally between any risk management framework and the operational resilience frameworks. Therefore, if the firm’s focus is more on managing risks, this would be the best option.

  11. Dedicated program: There should be transparent governance and deliverables to drive operational resilience holistically.
  12. Communication and change management: Well-governed and documented change processes that are in place and fully understood by the organization ensure that resilience is embedded in change control and software development life cycle (SDLC) activity. Moreover, there must be clear internal communication about why operational resilience is required, what should be done, and how to do it.
  13. Physical security and facility management: An organization must ensure that its key locations and facilities are safeguarded from any external events. Actionable measures should be established so that, if such events occur, the availability and usability of those locations and facilities are not compromised.

Benefits of Operational Resilience

Organizations that establish an effective operational resilience program realize the following benefits of better resilience:

  • There is low exposure to risk: This can be attributed to improved visibility into risks, effective monitoring, a proactive approach to controls, as well as the ability to deliver services even after a stress scenario.
  • Better focus by the organization: Based on lessons from the continuous testing and monitoring that operational resilience entails (e.g., identification of critical business services), a firm is driven to invest in the critical areas.
  • Support of the innovation agenda: The firm can carry out faster innovation cycles without compromising on risk management; this is made possible by ensuring that the firm is adaptable and resilient.
  • The firm is more effective and efficient: A clear understanding of critical service delivery can help to reduce costs, e.g., by optimizing outsourcing relationships. Processes can also be streamlined, e.g., by introducing tools and automation, and efficacy can be enhanced, e.g., by identifying and remediating the steps that cause problems.

Primary Drivers of Firms’ Move towards Operational Resilience

Many organizations are moving towards operational resilience. The following are five primary drivers behind this trend:

Higher customer expectations: In most industries, customers expect to get services on a 24/7 basis. Customer expectations include sturdy delivery of services and responsiveness in the event of stressful scenarios. This also helps to build and cement the trust between organizations and customers.

Increased cyber threats: Organizations have widely benefited from technological innovations. However, technology has made it possible for masterminds to create cheap but effective cyber weapons; the use of these weapons has unpredictable consequences.

Severe natural disasters and extreme weather events: Climate change is associated with extreme natural events. If either an organization or its clients are global, then it is most likely going to be affected by such events. This raises the question of whether firms are prepared to serve their clients under such conditions.

Higher risk linked to internal change failures: Firms are often switching to more advanced systems as technology advances. This brings about changes that need to be responded to. More advanced systems would lead to an elevated risk and relative potential impact, either financial or operational, in case of internal change failure.

Increased regulatory scrutiny: The 2008-2009 financial crisis has resulted in the financial services sector evolving into a highly regulated landscape to protect the client. A firm that is not operationally resilient may find itself unable to respond accordingly to the demanding regulatory landscape.

Operational resilience is more than just cyber resilience and IT infrastructure. It has an impact on the organization’s Profit and Loss account. Companies tend to focus only on the IT aspect of resilience, disregarding an equally important component: the processes and people that are essential in the delivery of the final product or service.

Operational resilience should entail all the areas of a business to achieve success. Failing to include all the processes and people would result in a fragmented approach that is limited to one or just a few specific functions. This is not sufficient to support the final goal of having resilience covering a complete value chain.

Investment in resilience can positively contribute to a firm’s Profit and Loss account. It improves customers’ trust by ensuring that the firm remains operational even in times of difficulty posed by environmental changes. The positive change in the Profit and Loss account can be attributed to the following:

Positive impact on costs: Operational resilience leads to avoidance of unexpected financial losses resulting from adverse events.

Increased revenue: This results from serving the client relentlessly. Also, the increase in stakeholders’ and investors’ trust can bring in even more financial investments and opportunities.

Avoided potential fees and losses: By responding effectively and efficiently to regulations, a resilient firm avoids potential fees and even losses.

Resilience is evidently playing an essential part in improving the financial performance of firms. Financial service institutions are not the only ones relying on old and siloed systems that result in limited resilience capabilities. Many firms are facing big or small resilience problems.

Steps to Achieving Resilience

Know your clients: The first step in building resilience is to identify the products and services that are essential to the clients. However, before that can be judged, there is a more important question to ask: who are the clients of the organization and what do they need?

Identify the products and services that are essential to the clients: Once the key products and services have been identified, the focus should be on the value chain that creates them. The key processes leading to that output are identified. All the products in complex organizations are the result of several processes and interactions. The key processes are the ones that impact the success of the output or the organization. They ensure the competitiveness of a firm.

Identify the major processes and staff linked to the core business and identify dependencies either existing or in the design phase: After identifying the key products and services, the focus now is on creating them. But first, there is a need to identify the key processes leading to their output.

Identify digital dependencies: As a result of the reverse-engineering, the list of IT systems and dependencies that are part of the value chain of the products and services is obtained. Questions to ask include:

  • What software is used?
  • What networks are leveraged?
  • What is the digital infrastructure utilized?
  • What databases are used?
  • Where are the servers located?

The questions in this regard are quite varied and cover every step that a product passes through before it is delivered to a client.

Map third-party dependencies: Processes in the value chain of key products and services have already been identified in the second step. Now, after identifying the digital dependencies, it is essential to know the critical third parties. It is also important to understand all the interdependencies with other processes. This point should be investigated in detail to ensure that all the third parties involved in all internal functions providing services are also identified.
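
The dependency-mapping steps above can be sketched as a walk over a dependency map, starting from a key service and collecting everything it relies on, directly or indirectly. The map below is entirely hypothetical, and the "(third party)" suffix is just an illustrative labeling convention assumed for this example.

```python
from collections import deque

# Hypothetical dependency map: each key depends directly on the listed items.
DEPENDENCIES = {
    "online payments": ["payment process", "customer support process"],
    "payment process": ["core banking system", "card network (third party)"],
    "customer support process": ["call centre vendor (third party)"],
    "core banking system": ["customer database", "primary data centre"],
    "customer database": ["primary data centre"],
}


def all_dependencies(service: str, graph: dict) -> list:
    """Return every direct and indirect dependency of a service (breadth-first)."""
    seen = set()
    order = []
    queue = deque(graph.get(service, []))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # shared dependencies are reported only once
        seen.add(item)
        order.append(item)
        queue.extend(graph.get(item, []))
    return order


deps = all_dependencies("online payments", DEPENDENCIES)
third_parties = [d for d in deps if d.endswith("(third party)")]
```

Here `deps` holds the processes, systems, and third parties behind the service, and `third_parties` isolates the external providers, including one that is only reached through an internal support process.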

Define possible threat scenarios: In this stage, there is a need to identify which services and products need to be maintained in stress conditions and, more importantly, the key processes/staff/IT systems/third parties that deliver or help deliver those products and services. An important question in this stage would be, what can go wrong with the identified value chain components? It is necessary to identify potential risk scenarios that impact the entire value chain, rather than single, isolated events.

Map risks to the value chain: At this stage, all the risks should be linked to the value chain of the essential products and services. All the interdependencies of the threats and risks to the value chain should be considered when defining mitigation measures. In order to implement proper mitigation techniques, it is essential to identify all of the risks and threats before their occurrence.
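
As a rough sketch of linking threat scenarios to the value chain, the example below records which components each scenario would disrupt and derives the key services exposed to it. All service names, components, and scenarios are hypothetical.

```python
# Hypothetical value chain: components each key service relies on.
VALUE_CHAIN = {
    "online payments": {"core banking system", "card network", "primary data centre"},
    "mortgage servicing": {"core banking system", "document vendor"},
}

# Hypothetical threat scenarios and the components each one would disrupt.
SCENARIOS = {
    "data centre flood": {"primary data centre"},
    "ransomware attack": {"core banking system", "document vendor"},
    "office closure": {"head office"},
}


def services_at_risk(scenarios: dict, value_chain: dict) -> dict:
    """Map each scenario to the services sharing at least one affected component."""
    result = {}
    for scenario, hit in scenarios.items():
        result[scenario] = sorted(
            service for service, parts in value_chain.items() if parts & hit
        )
    return result


exposure = services_at_risk(SCENARIOS, VALUE_CHAIN)
```

A scenario that touches a shared component, such as the assumed ransomware attack on the core banking system, shows up against several services at once. This is the kind of interdependency that an isolated, asset-by-asset view would miss.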

Learn from the past: There is a need to ensure that the lessons learned in previous crisis management are used to define better strategies and measures for the key processes and infrastructure. Once the firm is hit by a stress event that it did not anticipate, the event should be added to the threats and risks to be anticipated in the future. Measures to recover from this event should also be put in place for future use.

Monitor your risk exposure: All big organizations should have key risk indicators in place. The organization’s risk exposure remains relatively low when it comes to risk management. In the context of operational resilience, however, the organization should always ensure that the processes and systems in the value chain for its key products and services are always working. Indications of a failure to provide services efficiently and continuously that are not captured in the firm’s key risk indicators signal a need to reconsider the indicators. The questions that arise are whether the indicators are measuring the right parameters and whether the correct thresholds are applied. The key risk indicators should reflect the exposure and weaknesses of resilience capabilities, which include both proactive and reactive measures. The firm should strive to understand and quantify the exposure of the critical processes and to monitor it effectively.
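
The monitoring logic described above can be sketched as a threshold check on each key risk indicator, plus a flag that is raised when a service failure occurs while every indicator still looks healthy. The indicator names, thresholds, and observed values are hypothetical, and the green/amber/red scale is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRiskIndicator:
    name: str
    amber: float  # value at or above which the indicator needs attention
    red: float    # value at or above which the indicator is breached


def status(kri: KeyRiskIndicator, value: float) -> str:
    """Classify an observed value against the indicator's thresholds."""
    if value >= kri.red:
        return "red"
    if value >= kri.amber:
        return "amber"
    return "green"


def indicators_need_review(statuses: list, service_failed: bool) -> bool:
    """Flag the indicator set when a failure occurred but nothing signalled it."""
    return service_failed and all(s == "green" for s in statuses)


# Hypothetical indicators and observations.
kris = [
    KeyRiskIndicator("failed payment rate (%)", amber=1.0, red=3.0),
    KeyRiskIndicator("system downtime (minutes per week)", amber=30.0, red=120.0),
]
observed = [0.4, 12.0]

statuses = [status(k, v) for k, v in zip(kris, observed)]
review = indicators_need_review(statuses, service_failed=True)
```

In this example both indicators are green even though a service failure is assumed to have occurred, so `review` is set. That corresponds to the case where the firm should ask whether it is measuring the right parameters with the right thresholds.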

Framework for Operational Resilience

A firm needs to put in place the following pillars to achieve operational resilience. They also define the level of maturity of a firm’s operational resilience.

  1. Define the framework to achieve operational resilience: The framework has to be current, communicated, and understood by the organization. It should be in place across the entire organization, with a clear definition and accountability for the different aspects of resilience.
  2. Embed operational resilience in the governance structure: The board and senior management should actively oversee the firm’s resilience framework with respect to the firm’s strategy and risk appetite; this empowers them to make the correct investment and risk decisions.
  3. Ensure effective capacity management: Through testing and monitoring, organizations can demonstrate the effectiveness of capacity management.
  4. Strengthen the management of own risks: Resilience calls for adequate management of risks to reduce the impact on customers in case of a stress scenario. Managing the risks of a stress scenario entails splitting responsibilities into separate lines of defense and ensuring that all lines contain elements of resilience.
  5. Enhance resilience capability and agility: The organization should have sufficient skills, resources, agility, and a clear understanding of roles and responsibilities to deliver and help guarantee operational resilience.
  6. Promote a culture of continuous learning and improving: The organization should not only anticipate but also learn from adverse events affecting the organization or the industry at large.
