With the recent deluge of ransomware event articles discussing risks, likelihood, payment options and proposed solutions, it’s a good idea to take a step back to see where one stands with regard to preparedness, response and recovery.
If an organization has had a risk or vulnerability assessment in the last several years, it was likely advised to take steps to help prevent and prepare for a large-scale malware or ransomware event. Managing cyber risk or being “prepared” is much more than writing documents or installing technology. Fundamentally, it is the result of operationalizing all activity that involves people, processes or technology (PPT). This equates to effective risk management and reduced impact should an event occur. For example:
Imagine it is Monday morning. As the plant begins to execute a startup after a weekend shutdown, a flurry of tech support tickets and escalations begins to stack up. According to the last audit, there is an ad hoc process to restore a backup and recover from a single system failure, but this seems out of the ordinary, and someone intuitively begins to suspect the worst. What now?
The questions pile up: Are the backups any good? How can the spread be stopped? How can the holes be plugged and control of assets and their users regained? What are the assets? Who needs to be called? Who will help resolve the issue? There are too many questions and too little time.
The awareness and training aspects of being prepared can be overwhelming, especially when aiming for the most realistic simulation possible. For an initial smoke test, however, a rudimentary skit can be devised to illustrate gaps in an organization’s processes, resources, training and even technology. Tabletop exercises do not need to be “hacker” oriented, don’t require elaborate props or expensive third-party trainers and platforms, and needn’t be limited to just the security team. With a little time and effort, they can be made effective and accessible to a wider audience of stakeholders.
Executing a low-cost ransomware event or cyber event tabletop (TTX) or paper-based training has several benefits:
- It raises awareness within the organization about the current state of maturity and incident/event preparedness.
- It satisfies a compliance or framework check box.
- Such training offers a low-cost, high-reward way to illuminate gaps that could threaten the organization’s overall event response.
- The exercises can be devised internally by individuals who understand how the facilities actually run.
- It brings all parties to the table and settles disputes over who owns what.
- Communications driven by the tabletops often facilitate organizational change and foster improved interdomain trust.
Unlike technical simulations built from real data, such as the S4 ICS Detection Challenges, a skit that explores a straightforward event can be reduced to a few principal components:
- Frame: Devising a relevant scenario that could affect the organization.
- Composure: Describing the scenario playing out using the organization’s systems, processes, technology, personnel and the attack itself. This is the largest component and multiple pathways for the attack/response should be considered.
- Implement: Pulling all of the pieces together based on the “frame” and composure elements means you will need to have scripts, supporting material, roles/responsibilities, processes/playbooks and everything aligned to represent the realities of the organization and ransomware event. This can be simulated standalone without technologies or console access, for example.
- Execute: Running the event includes scheduling required resources, facilitating the event, distributing material and recording results and observations.
- Next steps: Summarizing all of the event’s learnings and acting on any identified gaps is a critical component of the tabletop exercise.
Using those phases, let’s start by creating and facilitating awareness events that include technical and non-technical participants.
Framing the ransomware event
This step entails the creation of a summary scenario that outlines the whole exercise. This can be crafted by an individual or by a team with relevant understanding of the critical functions of the organization and their overall technology and security posture. The framing is an outline that describes an initial hypothesis and activities for the event within scope. For example, a frame may be built using the following elements:
- Survey “environment” high-level data (similar to assets that are in scope for red-teaming)
- Research the domain and the business/site itself (generate context)
- Determine high-value targets and end-games that would define a “bad day” at the organization.
- Describe the kind of noise (extraneous details) that would likely be present.
- Evaluate level of anonymization required (estimate sensitivity).
- Decide whether deep anonymization is needed and what it would look like (optional to some extent).
- Construct a compelling and realistic scenario (playwright scenario).
- Describe a high-level attack in one sentence based on the data, company details and attack vector.
- Define the objective of the exercise.
- Consider judging, event direction and outputs in the context of stakeholders, internal audit requirements and the inputs and desires of management.
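One way to keep track of these framing elements is a simple checklist. The sketch below is only an illustration: the `ScenarioFrame` class, its field names and the sample values are hypothetical and are not part of any standard or tool. It reports which elements are still blank.

```python
from dataclasses import dataclass, fields


@dataclass
class ScenarioFrame:
    """Hypothetical container for the framing elements listed above."""
    environment_summary: str = ""
    business_context: str = ""
    high_value_targets: str = ""
    expected_noise: str = ""
    anonymization_level: str = ""
    one_sentence_attack: str = ""
    objective: str = ""
    judging_and_outputs: str = ""


def missing_elements(frame: ScenarioFrame) -> list[str]:
    """Return the names of framing elements that are still blank."""
    return [f.name for f in fields(frame) if not getattr(frame, f.name).strip()]


# Illustrative values only; a real frame would use the organization's own details.
frame = ScenarioFrame(
    business_context="Maritime port operator; profit depends on cargo flow",
    one_sentence_attack="Ransomware spreads from the business network "
                        "to an OT process hosted on IT infrastructure",
    objective="Test whether the site can isolate, communicate and recover",
)

print(missing_elements(frame))
```

An empty result would suggest the frame is ready to hand over to the composing phase.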
If people can flesh out all of these areas, they’re on their way to creating an informative exercise. To illustrate framing with an example, consider a large-scale asset owner (MarineCo) operating a maritime port.
A risk manager (RM) from MarineCo has been watching the news and heard about the Maersk ransomware event. The RM’s company is a $100M company with profits tightly correlated to the organization running smoothly. Any disruption to product moving in or out of the facility has a strong impact on both the company’s bottom line and on the local economy. The RM knows that the team is aware of this risk from several audit findings, but he wants to know if the others in the organization are prepared for a massive outage that would likely occur in the wake of a ransomware attack. The RM is also aware that many MarineCo systems run on antiquated software, leverage end-of-life operating systems and suffer from subpar network and user management.
The RM frames the incident using these statements:
- “A malicious party enters ABC site through common attack vectors in the business network. The attacker then moves toward XYZ critical system as the target for ransomware with the goal of disabling operations by disrupting an OT process hosted on IT infrastructure.”
- “Without operations, cargo cannot be moved, transported, loaded or unloaded, and the organization will burn $123 per hour until the situation is resolved.”
- “The systems affected would be ABC and XYZ. These reside here, and they are supported by these particular groups who are reported to follow and have a variety of processes at hand.”
Then the RM arrives at this scenario:
“If the organization faced an aggressive ransomware incident that was spreading quickly from a vulnerable and compromised system, could we manage it as I’ve been promised, communicate to our customers during a disruption, and recover efficiently – even to a degraded state?”
The RM would then move to the next phase: The scenario.
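The hourly loss figure in the RM’s framing statements lends itself to a quick back-of-the-envelope estimate. The sketch below assumes a simple linear model (full loss during the outage, partial loss while running degraded). The function name, parameters, and the 8-hour and 16-hour durations are illustrative assumptions, and the $123 rate is the placeholder from the framing statement, not a real figure.

```python
def downtime_loss(hourly_loss, outage_hours, degraded_hours=0.0, degraded_capacity=1.0):
    """Estimate loss: full loss during the outage, partial loss while degraded.

    degraded_capacity is the fraction of normal throughput available
    during degraded operation (1.0 means fully recovered).
    """
    if not 0.0 <= degraded_capacity <= 1.0:
        raise ValueError("degraded_capacity must be between 0 and 1")
    full = hourly_loss * outage_hours
    partial = hourly_loss * (1.0 - degraded_capacity) * degraded_hours
    return full + partial


HOURLY_LOSS = 123.0  # placeholder rate taken from the framing statement

# Assumed durations: 8 hours fully down, then 16 hours at half capacity.
print(downtime_loss(HOURLY_LOSS, outage_hours=8))
print(downtime_loss(HOURLY_LOSS, outage_hours=8, degraded_hours=16, degraded_capacity=0.5))
```

Even a rough number like this gives participants a shared sense of what each hour of delay costs during the exercise.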
Composing the ransomware event scenario
Beginning with the framing materials, people need to scope out the scenario much in the way an author or playwright defines their story. First, people need to know:
- Where would a plausible compromise originate from? IT systems used for accounts receivables or order confirmations, likely through an email infrastructure or a compromised VPN account. The facility or organization is not particularly special but would certainly face a similar risk exposure profile to other organizations that have been ransomed already.
- How would an incident occur, so that likelihood can be established and the event has a common footing? These systems are generally multi-purpose: users watch YouTube and open a variety of emails and corporate attachments. Users of these systems have a fairly high phishing fail rate during anonymous awareness testing campaigns. Certainly these systems are under corporate control, but their policies and controls are not concrete due to union or personnel policy complications. Isolation and eradication would be difficult, and an attack here would be likely. Given that the plant network is often improperly segmented due to its age and design, malware can spread into OT quite easily. Once in OT, if compromised logistics servers or even the AD servers were targeted, operations would likely grind to a halt.
- Who would be the participants, and what roles would they play in the detection, identification, response and remediation of the event? E.g., analyst, local operator, local administrator, facility manager, corporate security manager, technical, director, executive, legal, etc.
- What supporting systems, infrastructure and evidence would be used in the scenario? At a minimum, something that resembles screenshots and data from an email client, Windows workstations, SIEM/SOC services, corporate servers for IT or OT, VPN/local user accounts, AD servers, networking infrastructure and related logs, backup servers, asset inventory information, etc. Company processes, procedures, incident/response playbooks and best practices should also be on hand.
- And ultimately, why do this? If the goal is to test processes, knowledge and OTSM maturity, then the story needs to align.
People need to draft the scenario in a play-by-play manner. It can be linear or multi-pathed, similar to a choose-your-own-adventure book. The simplest of the two options is a linear storyline, but often reality likes to add its own dose of surprises, so it’s best to have multiple paths considered and a few predefined complications to add at times.
- Analyst A in MarineCo’s SOC has had a busy day. She sees the alert and decides to close the ticket because it looks like business as usual. The malware continues to spread.
- Analyst A has just started her shift and has “fresh” eyes. She recognizes the alert as one that requires investigation and decides to escalate once a convergence of alarms is observed. This limits infection to just a handful of systems.
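A multi-pathed storyline like the two Analyst A branches above can be captured as a small decision graph. The following is a minimal sketch, assuming a plain dictionary representation. The node names, the `walk` helper and the `dangling_targets` check are hypothetical and only illustrate the choose-your-own-adventure idea.

```python
# Each node has narrative text and a mapping of decisions to the next node.
SCENARIO = {
    "alert": {
        "text": "SOC alert fires on a business-network workstation.",
        "choices": {"close": "spread", "escalate": "contained"},
    },
    "spread": {
        "text": "Ticket closed as business as usual; the malware continues to spread.",
        "choices": {},
    },
    "contained": {
        "text": "Analyst escalates after alarms converge; a handful of systems are infected.",
        "choices": {},
    },
}


def walk(scenario, start, decisions):
    """Follow a list of decisions from `start` and return the node names visited."""
    path = [start]
    node = start
    for decision in decisions:
        choices = scenario[node]["choices"]
        if decision not in choices:
            raise ValueError(f"{decision!r} is not an option at {node!r}")
        node = choices[decision]
        path.append(node)
    return path


def dangling_targets(scenario):
    """List choice targets that point at nodes the script never defines."""
    return sorted(
        {t for n in scenario.values() for t in n["choices"].values() if t not in scenario}
    )


print(walk(SCENARIO, "alert", ["escalate"]))
print(dangling_targets(SCENARIO))
```

Checking for dangling targets before the session helps avoid a branch that leads nowhere when a participant makes an unexpected choice.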
The overall event of course needs to be eventually scripted (this is in the implementation phase), but during the composure phase, an approach that looks similar to the table of contents for a technical manual might suffice:
- Chapter 1 – Prepping the event
  - Outline the background, roles, responsibilities and scenario.
- Chapter 2 – Begin the event by starting with detection and identification.
  - Walk through the initial event with “evidence”
  - Begin the infection story starting with the ticket and assign Analyst A role.
  - … continue script
- Chapter 3 – Response
  - Plant manager is made aware and coordinates activities to ensure safe operation under a watch condition
  - Response teams review evidence that points to a few predefined entry points and malware based on predefined evidence/props (e.g., screenshots and logs)
  - Response teams quickly attempt to isolate those systems, but one of the wrenches is thrown into the mix.
    - The response team is not affected
    - The response team is negatively affected, and the incident continues to escalate to C-suite roles
    - Insurance company does not pay the ransom
  - Response team escalates to a high severity event:
    - C-suite is notified
    - Legal comes into play
    - Media and communications need to be drafted
    - Entire network goes down, email and messaging included
    - IT shuts down systems critical to OT or makes changes
    - Consequences of those changes or loss of connectivity without proper OT support increase delays
    - Losses grow by the minute
  - Script continues
- Chapter 4 – Recovery
  - Assuming the response teams were able to manage the response and isolation
  - Suddenly a wrench is thrown into the mix: not enough bandwidth, failed backup/RAID, missing credentials, malware re-emerges and spreads, etc.
  - Teams watch their watches, and the media needs to be informed
  - Script continues
- Chapter 5 – Remediation
The idea is to ensure that processes that are in scope will be enacted at some point, all roles are impacted, escalation paths are noted and even fringe activities are covered during the envisioned scenario. For a first attempt, it’s best to keep things simple and have a manageable group of perhaps seven or eight individuals.
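A quick way to confirm that every in-scope role and process is touched by the outline is a set comparison. The sketch below is an assumption-laden illustration: the role names, process names and step tags are hypothetical and would be replaced with the organization’s own.

```python
# Hypothetical in-scope roles and processes for the exercise.
IN_SCOPE_ROLES = {"analyst", "plant manager", "response team", "legal", "c-suite", "communications"}
IN_SCOPE_PROCESSES = {"triage", "isolation", "escalation", "backup restore", "media statement"}

# Each scripted step is tagged with the roles and processes it exercises.
SCRIPT_STEPS = [
    {"chapter": 2, "roles": {"analyst"}, "processes": {"triage"}},
    {"chapter": 3, "roles": {"plant manager", "response team"}, "processes": {"isolation"}},
    {"chapter": 3, "roles": {"c-suite", "legal"}, "processes": {"escalation"}},
    {"chapter": 4, "roles": {"response team"}, "processes": {"backup restore"}},
]


def uncovered(in_scope, steps, key):
    """Return in-scope items that no scripted step exercises."""
    touched = set().union(*(step[key] for step in steps))
    return sorted(in_scope - touched)


print(uncovered(IN_SCOPE_ROLES, SCRIPT_STEPS, "roles"))
print(uncovered(IN_SCOPE_PROCESSES, SCRIPT_STEPS, "processes"))
```

In this made-up outline the communications role and the media statement process are never exercised, which would prompt adding a step or narrowing the scope.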
Implementing the ransomware event scenario
Once the scenario has been outlined, it’s time to implement the exercise in its entirety. In addition to being the playwright, that person will also be serving as producer, director, prop master and observer. The script may cover all of the roles, but it is the responsibility of the producer to set up the scenario with sufficient context to capture the audience’s attention or deliver a message. Whatever the effect, a good scenario needs believable data.
Implementing a scenario has a time factor that cannot be overstated. It’s one thing to find enough time to get everyone in the room, another to cross all scenarios or gaps, and yet another to raise awareness for gaps early on while everyone involved is intently focused on the task at hand.
It’s great to test the processes end to end and have varying amounts of realistic data, but if someone cannot implement a plausible scenario within a reasonable time frame, the impact of such a training exercise will be limited.
As an example, during the S4x19 S4 ICS Detection Challenge, we created a gigantic set of data under strict NDA to mimic a large faux mining facility located in Eastern Canada. The attack had plenty of noise, some real attacks and an endgame. The data set was over 130 GB in network traffic and the participants were well versed in OT cybersecurity. But despite the lead time for the participants, the main objective of the attackers was missed.
My goal was to see where the participants and their tools would fall short. I wanted to challenge their confidence in their tools and send them chasing rabbits through a labyrinth because during most incidents, defenders have too much data, not enough time or are generally dealing with the consequences after the fact. To be fair, all parties fared reasonably well, but the point was to explain that even with the greatest tools and minds, the limited detection surface would have only been a single piece of the puzzle.
In the MarineCo scenario, people need to keep in mind all of the pieces. There will likely be conversations, processes to be found, responsibilities assigned and challenges to be added. But the event needs to be flexible enough for reuse, if possible, and completable within a single sitting.
The final script needs to contain all the elements: believable roles, relevant screenshots or simulated tooling, props and organization artifacts at hand, OT facts such as shared passwords or other common behaviors, and clear start and end points.
Executing the ransomware event tabletop exercise
Now that there is a frame, composed elements and the implemented attack all in one, the next step is to execute. Being in the same room helps establish trust and bring light to groups that often do not interact. Sometimes, it’s beneficial to ask them to switch sides.
Regardless, the execution phase is primarily about:
- Getting the right people in the room and assigned to their appropriate roles
- Distributing the props, evidence and supporting materials at the correct times. This should include referencing OT asset inventories for the specific sites in question.
- Mediating and facilitating the scenario sufficiently to keep the execution of the script smooth.
- Observing and recording times, responses and questions from participants, especially when the complications are introduced.
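Recording times during the session can be as simple as a timestamped log of injects and responses. The sketch below assumes a hypothetical `Observation` record and a `response_delays` helper. The inject names and minute values are invented for illustration and are not results from any real exercise.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    minute: int      # minutes since the exercise started
    kind: str        # "inject" (facilitator adds a prop or complication) or "response"
    inject_id: str   # which inject this entry relates to
    note: str


def response_delays(log):
    """Minutes from each inject to its first recorded response (None if none)."""
    inject_at = {}
    delays = {}
    for obs in sorted(log, key=lambda o: o.minute):
        if obs.kind == "inject":
            inject_at[obs.inject_id] = obs.minute
            delays[obs.inject_id] = None
        elif obs.kind == "response" and delays.get(obs.inject_id, 0) is None:
            # Only the first response after the inject is counted.
            delays[obs.inject_id] = obs.minute - inject_at[obs.inject_id]
    return delays


# Invented sample log for illustration.
LOG = [
    Observation(0, "inject", "alert", "SOC ticket handed to Analyst A"),
    Observation(6, "response", "alert", "Analyst escalates"),
    Observation(20, "inject", "isolation-fails", "Complication: isolation attempt fails"),
    Observation(41, "response", "isolation-fails", "Team asks for the OT asset inventory"),
    Observation(45, "response", "isolation-fails", "Plant manager informed"),
    Observation(50, "inject", "backup-fails", "Complication: backup restore fails"),
]

print(response_delays(LOG))
```

Injects that end the session with no response at all are often the most useful items to raise in the debrief.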
Beyond the execution steps themselves, it is important to keep in mind the role of the mediator and the handling of any recordings. There may be organizational policies and concerns about privacy or other situational factors such as sensitive data.
Regardless of any of the actions, frustrations or even the observations, it is important to consider exercises such as simulating a wide-spread ransomware attack as a training tool. It is not a “finger-pointing” exercise or criteria for someone to be disciplined or removed. Rather, it should be viewed as guidance to help individuals, groups and the company at large fare better during an incident.
Summarizing and acting upon any observations from the exercise
The last piece of an exercise involves the collection of insights into how the organization is performing in terms of preparedness. Generally, most organizations rate themselves on cybersecurity maturity via a matrix of controls, but rarely are those controls truly tested end to end. The summation of the exercise often results in a number of surprises. At this final stage, it is important to:
- Observe all of the participants and their roles for clarity, understanding, demeanor and competency.
- Monitor the time it takes for important events to be understood and acted upon.
- Look for gaps in training and process. If the plant operator has to go find the manual, how quickly can they find the playbook for isolating the network or recovering systems at scale? Alternatively, if there is an analyst, can they walk through a series of alerts and identify the correct one in a timely manner?
- Extract the learnings the organization and participants just witnessed. For example, someone may have just recovered all the Windows systems and can function, but they lost XYZ transport data and must now manually inspect 123 containers until verified because of the time lapse. This also results in ABC external repercussions such as an inability to deliver products as specified in contract.
- Begin initiatives that result in acquiring technology or resources where clear gaps were observed.
- Act on all of the findings with follow-up test runs of the ransomware event as part of your overall cybersecurity OT program.
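The findings from these steps can be kept in a small register so that nothing is lost between the exercise and the follow-up run. The sketch below is a hypothetical illustration: the sample findings, the people/process/technology tags and the owner field are assumptions, not results from an actual exercise.

```python
from collections import defaultdict

# Invented sample findings, tagged by people/process/technology (PPT).
FINDINGS = [
    {"gap": "Operator could not locate the network isolation playbook",
     "area": "process", "owner": "OT lead"},
    {"gap": "Analyst closed the initial alert as business as usual",
     "area": "people", "owner": None},
    {"gap": "Backup restore was never tested at scale",
     "area": "technology", "owner": "IT ops"},
    {"gap": "No out-of-band contact list when email is down",
     "area": "process", "owner": None},
]


def group_by_area(findings):
    """Group gap descriptions under people, process or technology."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["area"]].append(finding["gap"])
    return dict(grouped)


def unowned(findings):
    """Return gaps that nobody has been assigned to follow up on."""
    return [f["gap"] for f in findings if not f["owner"]]


for area, gaps in group_by_area(FINDINGS).items():
    print(area, len(gaps))
print(unowned(FINDINGS))
```

Gaps without an owner are the ones most likely to resurface unchanged in the next test run.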
Real-world incidents in IT or OT require all hands to be present. An event with limited scope, however, can be simulated with minimal investment and can quickly highlight the gaps in cybersecurity capability, particularly if the organization does not consistently apply cybersecurity basics. By simulating a ransomware event across people, processes and technology, organizations can improve their chances to defend against a ransomware attack, limit the impact, find value in their technology investments and create organizational change.
Staying ahead of today’s threats
Simulated attacks and tabletop exercises represent significant tools for reducing risk and bolstering defenses in modern ICS environments. But they’re not the only arrows in the defender’s quiver. Maximizing OT security maturity requires planning and practice coupled with exhaustive asset inventories, well-crafted policies, robust controls and a unified platform that offers 360-degree visibility into all aspects of ICS security assessment, defense, response and recovery.
– Verve Industrial Protection is a CFE Media content partner.