The Colonial Pipeline downtime created by the cybersecurity attack on the operator’s information technology (IT) systems underscores the need for resilient, redundant, and responsible processes that enable operations to persist in the face of digital disruptions. The cyberattack is a reminder that preparing for the unexpected requires a review of both technology and processes, and that this review is essential to mitigating the risks of data loss, operational disruption, and loss of critical infrastructure.
Background on Colonial Pipeline cyberattack
On May 6, 2021, the company that operates the Colonial Pipeline suffered an email phishing attack that resulted in a ransomware incident involving roughly 100 GB of data extracted through a customer portal. In response, the company temporarily shut down the pipeline. The cyberattack underscores the vulnerability of critical infrastructure in the United States and the need for resilient business processes and systems. Ransomware attacks on critical infrastructure have increased more than fivefold since 2018.
The Colonial Pipeline runs from the southeast to the northeast of the country, carrying refined petroleum products. It is considered the jugular of the nation’s refined petroleum transportation infrastructure. The company’s operational technology (OT) infrastructure had been isolated in a controlled way from its IT infrastructure, yet operational processes and IT transaction systems remained critically entangled through business processes. Many states in the southeast were beginning to see fuel shortages after seven days. Fortunately, Colonial resumed operations. In the meantime, the national average price for gasoline reached its highest level since 2014.
The FBI and other authorities attributed the Colonial Pipeline cyberattack to the Russia-based cybercriminal group DarkSide. The group stole more than 100 GB of data to back its ransom demand and has claimed over 40 other victims in the last nine months. In its own code of ethics, the group claims to be interested only in making money and not to be politically motivated. Industry sources revealed that Colonial paid a ransom of $5 million to decrypt its systems and engaged FireEye Mandiant to lead the investigation.
Four lessons from the attack
The news about the Colonial Pipeline cyberattack isn’t about the ransomware attack. Ransomware can happen to any organization, big or small, industrial or consumer. The news illustrates the importance of business resilience that extends beyond technology. Technology should enable and support resilience in processes. Process-based fail-safes and redundancies must be accounted for in risk assessment. While isolation of IT and OT systems protected the pipeline infrastructure from direct threat, the interdependencies and lack of resiliency in the process resulted in disruption to operations. There are a few key learnings from the incident as they relate to the need for resilience as a core business strategy:
1. The ransomware attack was driven through the enterprise business systems, not through operations. The operational technology (OT) systems that control the pipeline were neither the target of the attack nor directly compromised. Still, process interdependencies resulted in the pipeline being shut down. To be resilient, pipeline operators in the United States need to understand all of their cybersecurity risks, not just those related to operational systems. Many companies tend to view cybersecurity and resilience measures as a cost center that affects the bottom line. Yet the risk of disruption to operations threatens revenue in addition to public health and safety. The Federal Energy Regulatory Commission (FERC) has mandatory cybersecurity standards for grid operators but has no comparable standard for the network of pipelines. It is up to each organization to frame cybersecurity as a matter of business resilience and think holistically about processes and their resultant impact.
2. Colonial’s OT infrastructure had controlled isolation from its IT infrastructure. This has been the modus operandi for operations for many years. The notion that IT/OT convergence requires seamless, two-way communication is a fallacy. The incident highlights the efficacy of network isolation and raises the issue of implementing network segmentation on the operations network in case future incidents occur on the OT network. Salient data can be exchanged without compromising operations, and resiliency in the network means being capable of mitigating spread and impact where a network is penetrated by bad actors.
3. Operations was still capable of delivering but was starved of orders because the business systems were cut off. Those who spin this incident as a lesson in OT security are doing a dangerous disservice to the reality of the breach. The reality of OT cybersecurity is that the biggest risks come from human vectors. In this case, it was an employee falling victim to a phishing effort. Nevertheless, the attack was a call to action for pipeline operators and other managers of critical infrastructure to understand the consequences of all of their cybersecurity risks, regardless of whether they are directed at OT systems or only IT systems.
4. Within two days, Colonial was already partially operating through manual intervention. The operator itself decided to shut down the pipeline out of an abundance of caution, and that decision could have been overridden had there been a national fuel emergency. The broader fuel supply system responded effectively through the mobilization of ships and trucks. Much of the resiliency of the response was not dependent on technology, an important reminder of how we must adapt in a world where we cannot depend on digital systems.
This is not only a reinforcement of the discipline needed in IT security but also a lesson in resilience and ecosystems. Having all ecosystem partners aligned and moving in the same direction to turn a potential national economic disaster into a weeklong disruption shows resilience. Even though a weeklong disruption pales in comparison to what could have happened, a better and more secure decision-making process might have made this a day-long event. Lack of accurate and timely information is the archenemy of resilience. Looking to the future of operations in the industrial world means preparing for the worst while hoping for the best. A resilient decision-making process that brings all the data, knowledge, and information together allows any potential disruption to be pre-empted or addressed as quickly as possible.
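The process interdependency described in the first and third lessons can be illustrated with a small dependency check. The following is a minimal sketch in Python, not a description of Colonial's actual systems: the process names (`enterprise_it`, `order_intake`, `invoicing`, `pipeline_control`, `product_delivery`) and the `impacted_by` helper are hypothetical, chosen only to show how a risk assessment might trace which business processes stop when one system is taken offline.

```python
from collections import deque

# Hypothetical dependency map: each process lists what it directly relies on.
# The names are illustrative assumptions, not Colonial's real architecture.
DEPENDS_ON = {
    "enterprise_it": [],
    "pipeline_control": [],  # OT, isolated from IT in this sketch
    "order_intake": ["enterprise_it"],
    "invoicing": ["enterprise_it"],
    "product_delivery": ["pipeline_control", "order_intake", "invoicing"],
}


def impacted_by(failed, depends_on):
    """Return every process that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a system to the processes that use it.
    dependents = {}
    for process, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(process)

    # Breadth-first walk over the inverted map.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for process in dependents.get(current, []):
            if process not in impacted:
                impacted.add(process)
                queue.append(process)
    return impacted


if __name__ == "__main__":
    # OT is untouched, yet delivery still appears in the impacted set.
    print(sorted(impacted_by("enterprise_it", DEPENDS_ON)))
```

In this toy model, taking `enterprise_it` offline leaves `pipeline_control` unaffected but still halts `product_delivery`, which mirrors the point that isolation protected the OT systems while the business process around them remained a single point of failure.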