Current industrial automation and control system (IACS) technology uses a blend of traditional control hardware and software with commercial off-the-shelf (COTS) information technology (IT) components (see Figure 1). This allows system designers to leverage an increasing array of COTS hardware and software choices when designing, building, maintaining and upgrading control systems. However, there are challenges when integrating traditional IT hardware and software into roles for an operational technology (OT) system.
Selecting and integrating security components for an OT system is one of these challenges. Gone are the days of an air-gapped control system running on proprietary control hardware. OT-specific software is now deployed on PC/server hardware, operating systems and the networking/communication components shared with the traditional IT world (see Figure 2). Because IT and OT systems are now built on much of the same underlying infrastructure, with common hardware and software components, it is tempting to treat security for OT and IT systems the same way.
However, there are distinct differences between IT and OT systems that designers and operators need to understand and appreciate to properly craft a security and system design. A lack of understanding can lead to designs that improperly commingle IT and OT systems. This can happen at the rack/equipment level, the virtual local area network (VLAN) level or even the virtual machine level, among others.
The fourth industrial revolution, also called Industry 4.0, has provided data tools and advanced analytics for optimizing tangible processes such as manufacturing. However, it does this in a way that is much more interconnected with outside systems. This further blurs the division between IT and OT systems and creates new challenges from a security design standpoint (see Figure 3).
This article contrasts important differences between IT and OT. At the same time, it acknowledges that some IT security practices and technologies are vital in designing and maintaining a secure OT system. After all, cybersecurity management is an organizational responsibility for the business as a whole, not something IT and OT should handle as entirely separate concerns.
A formal cybersecurity management system
A cyberattack can cause major financial and business continuity disruptions and erode confidence and public trust, which are critical factors in today’s business landscape for private and public institutions. The 2020 CISA Economic Analysis of Key Per-Incident Loss Estimates shows losses from cyber-intrusion varied from as little as $16,849 to greater than $1,000,000,000 per incident.
Just as on the IT side, it is important to have a framework of formalized guidelines, policies and procedures as part of a successful OT security program. An organization-wide approach to the continuous process of assessing and addressing risks is part of the definition of cybersecurity management systems (CSMSs) in ISA/IEC 62443. It is not a simple to-do list, but rather a process involving multiple parts of the organization such as legal, procurement, human resources, engineering, operations and more. Cybersecurity management requires support from top leadership, both in funding and in organizational commitment. Vendors and contractors are often critical parts of a successful CSMS implementation as well.
As IT and OT system objectives differ, so do the risks and results of a security breach. In an IT system breach, organizations usually risk compromised data or a loss of services. OT cyberattack consequences go beyond data loss, potentially endangering bystanders, harming equipment, affecting the environment and impacting consumers through unregulated finished products and services. Given that the roles of IT and OT systems are different, and the risks are different, there should be distinct differences in the overall CSMS (including policies and procedures) between IT and OT systems.
IT versus OT data security priorities: The CIA triad
Threat remediation in the IT world is almost universally to shut everything down and disconnect it. OT systems require a finer balancing act between isolating infected devices and strategically disabling certain network routes so that at least some system operation is maintained. Data security priorities are often broken out into three main considerations: confidentiality, integrity and availability (the CIA triad). Figure 4 identifies how the IT and OT domains generally prioritize these characteristics.
In an IT environment, confidentiality is the top priority. This affects architecture, policy, procedure, testing and maintenance efforts on the system. Availability is important, but if, for instance, an intrusion prevention system (IPS) proactively shuts down a core service because of a perceived threat, confidentiality is preserved at the expense of availability. The situation is different in the OT environment: if an IPS stops network traffic in an IACS, the top OT priority of availability is most likely degraded. In this case, the action of an IPS can be a significant danger.
The differences in data security priorities need to cascade into every part of OT system architecture design, management policy and operational procedures.
Detection dilemmas, update uncertainties
Much of the focus of IT threat detection and network traffic characterization is less applicable to OT systems because email, web browsing and other variable traffic are not common there. Detecting anomalies in an OT system, which relies on very well understood and predictable traffic patterns, is much less of a challenge than its equivalent in a dynamic IT system. OT networks tend to be good candidates for intrusion detection system (IDS) solutions, which warn users of trouble but do not take action the way an IPS does. An IDS suits OT systems because the network traffic and protocols used are comparatively consistent and the assets attached to the network are largely static.
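Because OT traffic is so predictable, the detection idea can be illustrated with a simple allowlist. The following is a minimal sketch only, not a description of any particular IDS product: the host names, the `Flow` record and the helper functions are hypothetical, and a real deployment would learn its baseline from captured traffic rather than a hand-written list. The only fixed fact used is that Modbus TCP conventionally runs on port 502.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed network conversation (hypothetical, simplified record)."""
    src: str
    dst: str
    protocol: str
    dst_port: int


def learn_baseline(observed_flows):
    """Build an allowlist from flows seen during a known-good period."""
    return set(observed_flows)


def detect_anomalies(baseline, flows):
    """Return flows missing from the baseline. IDS behavior: report, never block."""
    return [flow for flow in flows if flow not in baseline]


# Hypothetical known-good traffic for a small control network.
baseline = learn_baseline([
    Flow("hmi-01", "plc-01", "modbus-tcp", 502),
    Flow("historian-01", "plc-01", "modbus-tcp", 502),
])

# Live traffic: one expected flow and one from an unknown laptop.
live = [
    Flow("hmi-01", "plc-01", "modbus-tcp", 502),
    Flow("laptop-77", "plc-01", "modbus-tcp", 502),
]

alerts = detect_anomalies(baseline, live)
for flow in alerts:
    print(f"ALERT: unexpected flow {flow.src} -> {flow.dst} "
          f"({flow.protocol}/{flow.dst_port})")
```

The same approach would be far noisier on an IT network, where the set of legitimate conversations changes constantly.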
Implementing an IPS instead of just an IDS requires more consideration on the OT side than on the IT side. Consider a traffic anomaly that does not represent a malicious attack but causes an IPS to shut down specific traffic. What if the traffic anomaly is an unusual message on a safety system? It may make sense to shut down questionable traffic on an IT network, but having that as a default reaction on an OT system can result in real-world impacts.
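One way to picture the difference is as a response policy that defaults to alerting on the OT side. The sketch below is illustrative only and rests on assumptions: the asset names, the `SAFETY_CRITICAL` and `BLOCK_REVIEWED` sets and the `choose_action` function are hypothetical, and the rule that OT traffic is blocked only on explicitly reviewed paths is one possible policy, not a requirement taken from any standard.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"  # IDS-style: notify operators, leave traffic alone
    BLOCK = "block"  # IPS-style: drop the traffic automatically


# Hypothetical asset tags, assumed to come from an asset inventory.
SAFETY_CRITICAL = {"sis-01", "plc-01"}
# Hypothetical destinations where automatic blocking was reviewed and accepted.
BLOCK_REVIEWED = {"dmz-jump-01"}


def choose_action(src, dst, domain):
    """Pick a response to anomalous traffic based on domain priorities."""
    if domain == "IT":
        # Confidentiality first: stopping questionable traffic is acceptable.
        return Action.BLOCK
    # OT: availability first. Never auto-block traffic touching safety assets.
    if src in SAFETY_CRITICAL or dst in SAFETY_CRITICAL:
        return Action.ALERT
    # Block only on paths where the operational impact has been reviewed.
    if dst in BLOCK_REVIEWED:
        return Action.BLOCK
    return Action.ALERT


print(choose_action("ws-12", "file-01", "IT"))
print(choose_action("hmi-01", "sis-01", "OT"))
print(choose_action("laptop-77", "dmz-jump-01", "OT"))
```

The design choice here is that blocking on the OT side is opt-in per path, so an unusual but legitimate safety message produces an alert for a person to evaluate rather than an automatic shutdown.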
Patches and software updates also create challenges. A patch represents a change to an operational system, and every change can represent a risk to system availability. Even within windows of planned downtime, patching can pose a longer-term risk to production systems. Since high availability is the linchpin for OT systems, care must be taken when patching systems. Some patches cannot be installed at all because they are incompatible with the hardware or the IACS vendor software (see Figure 5).
Testing must be performed before applying a software update or patch to ensure all equipment and systems function normally with the update in place. Without adequate testing, it is possible to miss a service that malfunctions because of an update. This may be a serious yet difficult-to-detect problem, such as the loss of alarm annunciation or improper historization of time-series data for compliance reporting (see Figure 6).
Update testing is best performed in a “sandbox” test environment. If that’s not an option, there are other avenues to manage the risk, and the right choice depends on the specifics of the system. For example, virtualization and operating system rollbacks, paired with system redundancy, make it possible to create a sandbox-like environment in which to test the new update deployment. This must be performed by highly qualified, experienced professionals who know the limitations of the system and the software. A plan to validate updates is better than no risk mitigation at all, and marching into “Patch Tuesday” with no testing or rollback capability is asking for trouble.
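The test-then-roll-back pattern can be sketched in a few lines. This is a simplified illustration under stated assumptions: the system is modeled as a plain dictionary, `copy.deepcopy` stands in for a virtual machine snapshot, and `validate_update`, `faulty_patch` and the check names are hypothetical. Real validation would exercise the actual alarm and historian services.

```python
import copy


def validate_update(system, apply_update, checks):
    """Apply an update to a test system, run checks and roll back on failure.

    Returns the resulting system state and the names of any failed checks.
    """
    snapshot = copy.deepcopy(system)  # stand-in for a VM or OS snapshot
    apply_update(system)
    failed = [name for name, check in checks.items() if not check(system)]
    if failed:
        return snapshot, failed  # roll back to the pre-update state
    return system, failed


# Hypothetical sandbox state.
sandbox = {"version": "1.0", "alarms_enabled": True, "historian_logging": True}


def faulty_patch(system):
    """Simulated update that silently disables alarm annunciation."""
    system["version"] = "1.1"
    system["alarms_enabled"] = False


# Functional checks for the hard-to-notice failures described above.
checks = {
    "alarm annunciation": lambda s: s["alarms_enabled"],
    "time-series historization": lambda s: s["historian_logging"],
}

result, failed = validate_update(sandbox, faulty_patch, checks)
print(result["version"], failed)
```

Running the checks automatically after every update is what catches a regression such as lost alarm annunciation, which operators might otherwise not notice until an alarm fails to appear.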
Standards-based security design and framework
It is important to establish organizational policies and procedures and select a framework for network infrastructure and security. Several frameworks are available, such as the ISA/IEC 62443 set of standards and the NIST Framework for Improving Critical Infrastructure Cybersecurity. Some industries have standards, mandates or guidelines for security policy as well. Businesses should apply these policies across the organization, and they must take steps to modify, remove and add infrastructure in support of their policies.
With a security framework and standards established, the system design has a guiding philosophy for its cyber defense, and the policies adopted from the framework provide guidance and support. There are many other design steps requiring cybersecurity consideration, such as development of a proper zone and conduit model, but the result should be defensible by well-established and industry-recognized standards.
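A zone and conduit model can be thought of as a map of which groups of assets may communicate with which. The sketch below is a deliberately simplified illustration, not the ISA/IEC 62443 definition in full: the zone names, asset names and the `is_allowed` helper are hypothetical, and real conduits also carry protocol, direction and security-level requirements.

```python
# Hypothetical asset-to-zone assignments.
ZONES = {
    "hmi-01": "control",
    "plc-01": "control",
    "historian-01": "dmz",
    "erp-01": "enterprise",
}

# Hypothetical conduits: the only zone pairs allowed to communicate.
CONDUITS = {
    frozenset({"control", "dmz"}),
    frozenset({"dmz", "enterprise"}),
}


def is_allowed(src, dst):
    """Allow traffic within a zone or across a defined conduit only."""
    src_zone, dst_zone = ZONES[src], ZONES[dst]
    if src_zone == dst_zone:
        return True
    return frozenset({src_zone, dst_zone}) in CONDUITS


print(is_allowed("hmi-01", "plc-01"))        # same zone
print(is_allowed("historian-01", "plc-01"))  # control <-> dmz conduit
print(is_allowed("erp-01", "plc-01"))        # no direct enterprise <-> control conduit
```

In this sketch, enterprise systems can reach control assets only through the DMZ, which is the kind of separation that keeps IT and OT from being improperly commingled.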
Network hardening unlocks operations and business benefits
To create and upgrade automation systems with suitable cybersecurity provisions, the team of designers, engineers and security professionals must establish a unified management approach that recognizes the differences between OT and IT networks. This approach must cover the entire system lifecycle, from design through system integration to long-term operations and maintenance.
Beginning down the path to secure OT systems is far from easy, but it is necessary for long-term organizational success. To help navigate the effort, manufacturers and agencies should partner with system integrators experienced in cybersecurity design and implementation best practices (see Figure 7).
Because the OT threat landscape is constantly changing, especially in today’s connected production systems, network hardening is a continuous process with no final destination and no single perfect solution. But the benefits are worth the effort. Defined processes, resiliency plans and OT cyber-physical protection mechanisms empower organizations to enjoy safer remote connectivity, cloud-based analytics engines for operations optimization, always-on alerting capabilities and other advantages.