Operational technology (OT) has become a prime target for cyberattacks, and the need to address OT cyber risk has never been greater. New threats emerge every day, both targeted attacks and untargeted collateral damage. According to IBM, manufacturing and energy are now the second and third most targeted industries, respectively, up from eighth and ninth last year.
Why? As Willie Sutton famously said when asked why he robbed banks, “Because that’s where the money is.” Operational technology is critical to keeping industrial operations running, and downtime is expensive. As a result, ransomware groups, whether private or government-supported, have discovered the financial opportunity in targeting industrial operators large and small.
Industrial organizations are now fighting a war, whether they know it or not. We have coined an acronym for the coming challenges – AIR-RAID:
- Attackers are increasingly targeting industrial processes.
- IT is now both a source of risk, as connectivity grows, and a key participant in the design and implementation of security controls in OT.
- Regulators have increased their focus and requirements on industrial organizations, e.g., the recent TSA regulations on pipelines and the coming ones in rail and aerospace.
- Resource constraints are growing, due not only to COVID but also to the long-term trend of retirements and a growing number of ICS cybersecurity vulnerabilities.
- Access is on the rise, with more remote access and more direct connectivity to vulnerable devices.
- Insurers are increasing their reporting and security requirements for OT as they pay out greater sums in ransomware fees and incident response.
- Directors of industrial organizations are raising more questions about OT security and are beginning to require that OT demonstrate the same level of security they are used to seeing in IT or other more mature security organizations.
Taken together, these seven drivers are dramatically shifting the requirements for OT security. Gone are the days when simply monitoring the perimeter firewalls for anomalous network traffic was enough. “Visibility” is just the beginning. To address the increase in attacks as well as requirements from insurers, regulators, and directors, organizations must start managing OT systems with the same rigor as IT systems, something we call OT systems management.
One of the most critical elements of this new set of OT security requirements is to manage and defend the endpoint. Organizations need endpoint security and protection to stop ransomware in its tracks, but also to demonstrate improvement and secure baselines to various stakeholders.
We recognize this is not an easy task. There are many challenges in OT endpoint risk analysis and remediation.
OT endpoint security challenges in risk identification and remediation:
- Capturing basic inventory is difficult given the various device types, protocols, network architectures, etc.
- Old systems that can’t be updated to address known vulnerabilities without significant capital expenditure to upgrade an entire control system
- Critical systems that cannot be patched (because patching requires reboots), whether for operational risk and performance reasons or because of regulatory change management requirements
- Wide range of endpoints that cannot be scanned safely with traditional vulnerability scanning tools
- Many of the modern endpoint detection and response tools require internet access – which is often impossible in segmented OT networks
- Vendors tend to limit choices to only “approved” software, forcing companies to use multiple endpoint solutions, none of which are best in breed
While these challenges are real, organizations do not need to accept the conventional wisdom that they can only monitor asset counts and detect potential threats through anomaly detection. Nor do they need to delegate their OT system security to each of the dozens of original equipment manufacturer (OEM) vendors in their environment.
There are ways to achieve efficient OT endpoint security, protection, and overall management without disrupting control systems. Such a program has to include several key components:
Information technology service management (ITSM) best practices to leverage for OT risk management include:
- Full, accurate, and up-to-date software inventories
- Accurate patch status (not just what the OEM vendors provide as approved or which OS version the device is running, but full visibility into all available patches across all application software on the endpoint)
- Updated information on antivirus signature status or application whitelisting status
- Information on whether the device has a recent backup and whether that backup was successful
- Firewall configuration strength for the network protection that is supposed to be defending the asset
- User and account status as to whether the device has shared passwords or accounts, dormant accounts, etc.
- Asset criticality both to the operational process as well as to the network communications
- Efficient tools to harden assets or network architectures with no risk to operations
- XDR to bring together multiple forms of endpoint and network telemetry to detect AND respond to threats rapidly (if not automatically).
We have seen several companies successfully take a true endpoint risk management approach to their cyber defense efforts. They have followed these steps for success:
Step 1: Create 360-degree risk scores and profiles for each asset
This process begins with technology that enables deep, vendor-agnostic endpoint visibility, including: 100% software inventories; full patch status for all application software as well as the OS; detailed and regular information on configuration settings, passwords, and user accounts; the status of defensive tools such as A/V and whitelisting; network configuration rules and settings, to understand network defenses; and asset criticality based on process and network.
This “360-degree” view of risk allows the organization to define the most effective and efficient means of remediating risks and securing a given endpoint. For instance, we cannot deploy antivirus on a programmable logic controller (PLC), but we can still protect that asset through upstream compensating controls, such as locking down its workstation or placing a firewall in front of the device, or by hardening the device’s configuration to stop the spread of a potential threat. Similarly, we may find two assets that are equally vulnerable, but one has multiple compensating protective controls such as application whitelisting and hardened configurations. This view allows the operator to make trade-offs on priorities and actions.
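To make the trade-off concrete, here is a minimal sketch of how compensating controls might lower an asset's residual risk. The `Asset` class, the 0 to 1 scales, and the reduction weights are all hypothetical values chosen for illustration, not figures from any standard or product.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the share of exposure each compensating control removes.
# These numbers are illustrative assumptions only.
CONTROL_REDUCTION = {
    "application_whitelisting": 0.30,
    "hardened_configuration": 0.20,
    "upstream_firewall": 0.25,
    "locked_down_workstation": 0.15,
}


@dataclass
class Asset:
    name: str
    vulnerability: float   # 0.0 (none known) to 1.0 (severe and exploitable)
    criticality: float     # 0.0 (low) to 1.0 (essential to the process)
    controls: set[str] = field(default_factory=set)


def residual_risk(asset: Asset) -> float:
    """Scale raw exposure down by each compensating control that is present."""
    exposure = asset.vulnerability
    for control in asset.controls:
        exposure *= 1.0 - CONTROL_REDUCTION.get(control, 0.0)
    return round(exposure * asset.criticality, 3)


# Two equally vulnerable, equally critical assets; only the PLC has controls.
plc = Asset("line-3-plc", vulnerability=0.8, criticality=0.9,
            controls={"upstream_firewall", "locked_down_workstation"})
hmi = Asset("line-3-hmi", vulnerability=0.8, criticality=0.9)

# Highest residual risk first, so the unprotected asset is handled first.
ranked = sorted([plc, hmi], key=residual_risk, reverse=True)
for asset in ranked:
    print(asset.name, residual_risk(asset))
```

In this sketch the unprotected HMI ranks ahead of the PLC even though both carry the same raw vulnerability, which is the kind of prioritization the 360-degree view is meant to support.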
Step 2: Execute remediation plans based on the feasibility of different approaches
Too often, organizations start with a tool without a robust endpoint security remediation plan. While these tools may be helpful, the remediation plan allows the organization to step through a sequenced roadmap of actions – and technologies – that drive a consistent improvement in the endpoint security management of the enterprise. Success requires a strategy that prioritizes the right type of endpoint security for each of the risks identified.
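A feasibility-driven plan can be sketched in a few lines. The example below is an assumption-laden illustration: the `Finding` fields, the three action labels, and the decision order are hypothetical, and a real plan would reflect each site's own constraints.

```python
from dataclasses import dataclass

# Hypothetical action labels used only in this example.
PATCH = "patch"
HARDEN = "harden configuration or apply whitelisting"
COMPENSATE = "upstream compensating control"


@dataclass
class Finding:
    asset: str
    issue: str
    risk: float            # hypothetical 0-1 scale, higher is more urgent
    patch_available: bool
    can_reboot: bool       # whether operations allow a maintenance restart
    supports_agent: bool   # False for embedded devices such as PLCs


def choose_remediation(finding: Finding) -> str:
    """Pick the most direct fix that is feasible for this asset."""
    if finding.patch_available and finding.can_reboot:
        return PATCH
    if finding.supports_agent:
        return HARDEN
    return COMPENSATE


def build_plan(findings: list[Finding]) -> list[tuple[str, str, str]]:
    """Order findings by risk and attach a feasible action to each."""
    ordered = sorted(findings, key=lambda f: f.risk, reverse=True)
    return [(f.asset, f.issue, choose_remediation(f)) for f in ordered]


findings = [
    Finding("eng-workstation-01", "missing OS patches", 0.6, True, True, True),
    Finding("line-3-plc", "vulnerable firmware", 0.9, False, False, False),
    Finding("scada-server", "missing OS patches", 0.8, True, False, True),
]

plan = build_plan(findings)
for asset, issue, action in plan:
    print(f"{asset}: {issue} -> {action}")
```

The point of the sketch is the sequencing: the riskiest finding comes first, and each one receives an action that the asset can actually tolerate rather than a single tool applied everywhere.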
Step 3: Implement vendor-agnostic, but OT-safe endpoint security management technology
Perhaps the largest OT security challenge comes from dependence on each OEM vendor to deploy their tool of choice on its systems. This leads to complexity, insecurity, and inefficiency. Successful organizations deploy an enterprise standard for endpoint security management that safely operates across vendor systems and enables centralized management functionality. To be clear, these solutions do not try to disintermediate the OT operator.
Verve has been in the industrial controls industry for almost 30 years. We understand how critical it is to keep OT operators involved in any changes to their systems. By creating a centralized view of endpoint security, organizations can “Think Global, but Act Local”: endpoint detections, alerts, and risks flow to a central team for analysis and response planning, while technology enables the OT operator, who understands his or her system best, to be involved in approving and perhaps testing any security response. We understand that to someone in IT this may sound crazy, because this extra step of including a “man in the middle” of the response action could slow response. Yes, it can. But it avoids the “Type II” error of stopping critical processes that may affect the safety of the overall system.
As stated above, insurers, regulators, directors, and others are beginning to require a clear demonstration of security improvement. Industrial operators will need to show how they have moved from “red” to “green” in security, how updated their patch or backup or AV status is, whether they have dormant accounts that create risk, etc. This kind of centralized, vendor-agnostic system allows for improved tracking, reporting, and auditing on an ongoing basis.
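As a rough illustration of this kind of reporting, the sketch below turns a few endpoint facts into a color status per control. The thresholds, the field names, and the intermediate "amber" band are our assumptions for the example; real limits would come from site policy.

```python
from datetime import date

# Hypothetical maximum ages in days for each control.
LIMITS = {"patch": 90, "backup": 7, "av_signatures": 14}


def control_status(last_updated: date, limit_days: int, today: date) -> str:
    """Green within the limit, amber within twice the limit, red beyond that."""
    age = (today - last_updated).days
    if age <= limit_days:
        return "green"
    if age <= 2 * limit_days:
        return "amber"
    return "red"


def asset_report(asset: dict, today: date) -> dict:
    """Build a per-control status map for one asset."""
    report = {
        name: control_status(asset[name], limit, today)
        for name, limit in LIMITS.items()
    }
    # Any dormant account is treated as a red finding in this sketch.
    report["dormant_accounts"] = "red" if asset["dormant_accounts"] else "green"
    return report


today = date(2024, 6, 30)
asset = {
    "patch": date(2023, 11, 1),         # last patched long ago
    "backup": date(2024, 6, 28),        # backed up two days ago
    "av_signatures": date(2024, 6, 10), # signatures 20 days old
    "dormant_accounts": 2,
}

report = asset_report(asset, today)
print(report)
```

Storing a report like this for each asset over time gives auditors and directors a simple record of which controls have moved from red to green.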
Step 4: XDR for OT cybersecurity
“XDR” (extended detection and response) is often thought of as pertaining to cloud or hybrid environments. Successful industrial organizations apply the same concept to OT. Because traditional EDR (endpoint detection and response) may not be effective on embedded OT devices or even, in purely automatic response mode, on OS-based devices in critical control systems, industrial security requires a wide range of telemetry and response options to be effective.
The “X” may be different in OT than in the cloud. It may refer to traditional telemetry such as endpoint logs, network traffic alerts, AV alerts, etc. But in OT, it should also include device performance metrics, physical alarm data, etc. By bringing these various forms of telemetry together, the endpoint detection becomes much more robust than if we just monitor packets for anomalous traffic.
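One simple way to picture this is to flag an asset only when several different kinds of telemetry fire close together in time. The sketch below assumes a hypothetical `Event` record, made-up source names, and an arbitrary five-minute window; it is not a description of how any particular XDR product correlates data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    asset: str
    source: str      # e.g. "endpoint_log", "network_alert", "av_alert", "process_alarm"
    timestamp: int   # seconds since an arbitrary start, to keep the example simple


def correlated_assets(events: list[Event], window: int = 300,
                      min_sources: int = 2) -> dict[str, set[str]]:
    """Return assets where several distinct telemetry sources fire within one window."""
    by_asset: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_asset[event.asset].append(event)

    flagged: dict[str, set[str]] = {}
    for asset, asset_events in by_asset.items():
        asset_events.sort(key=lambda e: e.timestamp)
        for i, first in enumerate(asset_events):
            # Distinct sources seen within `window` seconds of this event.
            sources = {e.source for e in asset_events[i:]
                       if e.timestamp - first.timestamp <= window}
            if len(sources) >= min_sources:
                flagged[asset] = sources
                break
    return flagged


events = [
    Event("hmi-01", "network_alert", 1000),
    Event("hmi-01", "process_alarm", 1120),
    Event("hmi-01", "endpoint_log", 1200),
    Event("eng-ws-02", "av_alert", 500),
    Event("eng-ws-02", "av_alert", 4000),
    Event("historian", "network_alert", 100),
    Event("historian", "endpoint_log", 2000),
]

result = correlated_assets(events)
print(result)
```

Here only the HMI is flagged, because a network alert, a physical process alarm, and an endpoint log entry all occur within the same window; a repeated alert from a single source, or alerts far apart in time, do not qualify.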
Similarly, the “R,” or response, in XDR needs to be tuned for OT. The answer to every alert cannot be to shut down the plant. We need to adopt a mindset we call “least disruptive response”: in any event, security should take the action that has the least impact on operations. This requires the deep endpoint visibility discussed in Step 1 and the ability to take endpoint actions discussed in Step 2. Together these enable security personnel to identify the threat, gather endpoint information about the affected asset and the other assets in the attack path, and then take a very specific action at the endpoint to stop that particular attack path: for instance, removing a compromised account, patching a vulnerability that is being exploited, removing a piece of risky software, or adjusting whitelisting rules.
Step 5: Establish a set of OT systems management guidelines and procedures
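The selection logic behind a least disruptive response can be sketched as follows. The action catalog, the technique names, and the 0 to 10 impact scale are hypothetical; in practice the OT operator would still review and approve whatever action is proposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseAction:
    name: str
    stops: frozenset[str]     # attack techniques this action interrupts
    operational_impact: int   # hypothetical scale: 0 (none) to 10 (plant shutdown)


# Illustrative catalog; names and impact scores are assumptions.
ACTIONS = [
    ResponseAction("disable compromised account",
                   frozenset({"stolen_credentials"}), 1),
    ResponseAction("tighten whitelisting rules",
                   frozenset({"unauthorized_software"}), 2),
    ResponseAction("remove risky software",
                   frozenset({"unauthorized_software"}), 3),
    ResponseAction("patch exploited vulnerability",
                   frozenset({"known_exploit"}), 5),
    ResponseAction("isolate network segment",
                   frozenset({"stolen_credentials", "known_exploit",
                              "unauthorized_software"}), 8),
]


def least_disruptive(technique: str, actions=ACTIONS):
    """Return the lowest-impact action that stops the technique, or None."""
    candidates = [a for a in actions if technique in a.stops]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.operational_impact)


choice = least_disruptive("stolen_credentials")
print(choice.name if choice else "escalate to operator")
```

Isolating the network segment would also stop a credential-based attack, but the sketch prefers disabling the one compromised account because it interrupts the same attack path with far less operational impact.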
Last – but perhaps first in many ways – industrial organizations need to set their north star, their overall objective of security, as well as their expectations of maturity. This direction can flow down into policies, guidelines, and procedures to follow in implementing their endpoint security management. Different assets are likely to require different levels of security based on criticality, redundancy, etc. We have seen clients successfully prioritize these assets at a site level and all the way down to individual assets in a plant and then design different security targets for each one. These policies also help define the kind of response time expected for the “XDR” for different types of attacks and assets.
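Policies of this kind can be expressed as data so that targets are applied consistently. The tiers, thresholds, and target values below are purely hypothetical placeholders to show the shape of such a policy, not recommended numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityTarget:
    max_patch_age_days: int
    backup_interval_days: int
    response_time_minutes: int


# Hypothetical targets per tier; each organization would set its own.
TARGETS = {
    "high": SecurityTarget(30, 1, 15),
    "medium": SecurityTarget(90, 7, 60),
    "low": SecurityTarget(180, 30, 240),
}


def tier_for(criticality: float, has_redundancy: bool) -> str:
    """Map criticality (0-1) to a tier, relaxing one tier when a redundant unit exists."""
    tiers = ["low", "medium", "high"]
    index = 2 if criticality >= 0.7 else 1 if criticality >= 0.4 else 0
    if has_redundancy and index > 0:
        index -= 1
    return tiers[index]


target = TARGETS[tier_for(criticality=0.9, has_redundancy=True)]
print(target)
```

A highly critical asset with a redundant partner lands in the medium tier here, reflecting the idea that criticality and redundancy together, not criticality alone, set the security target.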
This five-step approach has led to significant, rapid, and demonstrable improvements in industrial organizations’ OT cybersecurity maturity. Further, it is a way to get ahead of what’s coming: increased attacks, decreased resources, and greater reporting and auditing requirements.