CrowdStrike IT Outage: Key Lessons for Cyber Resilience

The recent global IT outage triggered by CrowdStrike, a leading cybersecurity company, has raised significant concerns about cyber resilience and the stability of widely used security software. Despite its strong reputation and its role in investigating high-profile cyberattacks such as the SolarWinds hack, the company suffered a critical failure that affected millions of users worldwide. The incident, caused by a faulty sensor update on Windows systems, highlights the delicate balance between security and operational stability. It is a crucial reminder of the importance of robust testing, diverse tools and clear communication strategies in maintaining cyber resilience.

CrowdStrike’s Role in the Cybersecurity Landscape

CrowdStrike, a US cybersecurity company, has thousands of customers worldwide, more than 500 of which are large enterprises such as Visa, Intel and Ernst & Young.

With a market share of around 23% in an endpoint-protection market of 36 competitors, CrowdStrike has also helped investigate major cyberattacks, such as the Sony Pictures hack in 2014, the Democratic National Committee hack in 2015 and 2016, and the SolarWinds hack in 2021.

The company was at the center of a global IT outage that affected millions of people, businesses and organizations on Friday, July 19, 2024.

What Led to the Outage?

The company released an update for its Falcon sensor on Windows systems. The sensor is a lightweight agent that works in conjunction with CrowdStrike’s cloud servers, so customers do not need to install and manage extra equipment.

Such updates are released frequently to improve the program, but this time the update did not go as planned. A coding mistake, described as a “logic error”, caused crashes only on Windows systems, and millions of users saw the “Blue Screen of Death” on their devices. According to Microsoft, around 8.5 million Windows devices were affected; Mac and Linux machines were not.

Although the outage was not a cyberattack, airlines, TV broadcasters, banks and other essential services came to a standstill. Many services were restored the same day, but affected companies still had to deal with delays and cancelled flights, in some cases for several days.

Recovery time depends on the size and resources of a company’s IT team and on the number of affected devices, because many crashed machines had to be repaired manually, one at a time.
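To make that dependence concrete, here is a back-of-the-envelope sketch in Python. It assumes every crashed machine has to be repaired by hand and that each technician fixes one machine at a time; the function name and all the figures are hypothetical, not data from the incident.

```python
import math


def estimate_recovery_hours(devices: int, technicians: int, minutes_per_device: float) -> float:
    """Rough wall-clock time to repair crashed machines by hand.

    Hypothetical model: technicians work in parallel, each fixing one
    machine at a time, and every machine takes the same number of minutes.
    """
    if technicians < 1:
        raise ValueError("at least one technician is required")
    # Number of back-to-back "rounds" the busiest technician must work through.
    rounds = math.ceil(devices / technicians)
    return rounds * minutes_per_device / 60


# Hypothetical scenario: 2,000 crashed laptops at 15 minutes each.
for team_size in (10, 50):
    hours = estimate_recovery_hours(2000, team_size, 15)
    print(f"{team_size} technicians: {hours:.1f} hours")
```

With these made-up numbers, a team of 10 needs about 50 hours and a team of 50 about 10. Real per-device times vary widely, for example when BitLocker recovery keys have to be looked up first, so the result is only an order-of-magnitude guide.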

Key Lessons Learned for Cyber-Resilience

Cyber resilience is an organization’s ability to respond effectively to, and recover from, cyberattacks or outages while continuing to operate normally. Examples of cyber-resilience practices include regular data backups, an incident response plan, continuous monitoring of the network and assets for unusual activity and potential threats, and employee training and awareness.

The incident underscores the need for robust testing procedures and highlights the risks of automatic updates for security software, especially in enterprise environments, where widely deployed software must balance security against stability.
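One common way to limit the damage of a faulty automatic update is a staged (ring-based) rollout: the update reaches a small canary group first, and each wider ring receives it only if the previous ring stays healthy. The Python sketch below illustrates the idea; the ring names, the 1% crash-rate threshold and the `deploy`/`crash_rate` callbacks are hypothetical placeholders, not a description of CrowdStrike’s own release process.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    completed_rings: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None


def staged_rollout(
    rings: List[str],
    deploy: Callable[[str], None],
    crash_rate: Callable[[str], float],
    max_crash_rate: float = 0.01,
) -> RolloutResult:
    """Deploy an update ring by ring, halting at the first unhealthy ring.

    `deploy` and `crash_rate` are hypothetical hooks into a fleet-management
    system; the 1% threshold is an arbitrary example value.
    """
    result = RolloutResult()
    for ring in rings:
        deploy(ring)
        # Health gate: later rings only get the update if this one stays healthy.
        if crash_rate(ring) > max_crash_rate:
            result.halted_at = ring
            return result
        result.completed_rings.append(ring)
    return result


# Example: the update crashes a quarter of the "early" ring, so "broad" never receives it.
deployed: List[str] = []
observed = {"canary": 0.0, "early": 0.25, "broad": 0.0}
print(staged_rollout(["canary", "early", "broad"], deployed.append, lambda ring: observed[ring]))
print(deployed)
```

The trade-off is speed: gating each ring delays protection for the wider fleet, which matters for security content that responds to active threats, so the ring sizes and waiting times have to be tuned to the risk.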

1. Leveraging Linux/Mac for Recovery

Having alternative tools or systems can help maintain operational continuity. When a Windows machine cannot be rebooted normally because of an issue such as the Blue Screen of Death (BSOD), having a Linux or Mac machine on hand is valuable: booting the affected computer from Linux media, or attaching its drive to another machine, gives access to the Windows file system so that important data can be backed up before further troubleshooting.
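A minimal Python sketch of the backup step, run from a Linux live environment. It assumes the Windows partition has already been mounted (ideally read-only) at a path such as `/mnt/windows` and that a backup drive is mounted elsewhere; both paths and the list of file types are hypothetical. If the drive is BitLocker-encrypted, it has to be unlocked with its recovery key first, which this sketch does not cover.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Hypothetical selection of file types worth rescuing.
DEFAULT_SUFFIXES = (".docx", ".xlsx", ".pptx", ".pdf")


def back_up_user_files(
    windows_root: Path, dest: Path, suffixes: Iterable[str] = DEFAULT_SUFFIXES
) -> List[Path]:
    """Copy matching files under <windows_root>/Users to dest, keeping relative paths."""
    wanted = {s.lower() for s in suffixes}
    copied: List[Path] = []
    # sorted() collects the file list up front, so new copies never feed back into the walk.
    for src in sorted((windows_root / "Users").rglob("*")):
        if src.is_file() and src.suffix.lower() in wanted:
            target = dest / src.relative_to(windows_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

For example, `back_up_user_files(Path("/mnt/windows"), Path("/mnt/backup"))` would copy matching documents from every user profile into the same folder layout on the backup drive.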

2. Reducing Dependence on a Single Software Provider

A software monoculture is the widespread use of a single product across many systems and organizations. A flaw in that product can become a widespread issue, affecting every system that runs it. This incident highlights how heavily modern infrastructure depends on a single piece of software.

In the early 2000s, many organizations and individuals relied heavily on Windows XP and Internet Explorer. This widespread reliance on a single operating system and browser amplified the impact of their security vulnerabilities. Exploits targeting Windows XP and Internet Explorer became major threats because the dominance of these systems made them attractive targets for malware and cyberattacks.
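The risk of a monoculture can be expressed as a simple “blast radius” calculation: for each vendor, the share of a fleet that would go down if that vendor shipped a faulty update. The Python sketch below assumes each machine runs exactly one endpoint-security product; the host names and vendor labels are hypothetical.

```python
from collections import Counter
from typing import Dict


def blast_radius(fleet: Dict[str, str]) -> Dict[str, float]:
    """Map each vendor to the fraction of machines that depend on it."""
    if not fleet:
        return {}
    counts = Counter(fleet.values())
    return {vendor: n / len(fleet) for vendor, n in counts.items()}


# Hypothetical fleet: host name -> endpoint-security vendor.
fleet = {
    "web-01": "VendorA",
    "web-02": "VendorA",
    "db-01": "VendorA",
    "ops-01": "VendorB",
}
print(blast_radius(fleet))
```

Here one vendor covers 75% of the machines, so a single bad update could take down three quarters of this fleet. Spreading critical systems across providers lowers that worst case, at the cost of managing more tools.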

3. Turning Crisis Into Opportunity Through Strong Communication

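After the incident, CrowdStrike responded quickly, rolling back the faulty configuration update in less than an hour and a half and replacing the flawed file. The rollback stopped additional machines from crashing, but devices that had already crashed still had to be fixed manually.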

Clear communication from the CEO helped manage the situation by informing stakeholders about the nature of the issue, the steps being taken to resolve it, and the expected timelines for recovery.

After the company’s announcement, CrowdStrike’s stock fell sharply. The shares later rebounded as confidence in the company’s ability to manage the situation, and in its overall market position, was restored.

The event also underscored CrowdStrike’s prominence and influence in the cybersecurity industry: the scale of the attention and the impact showed how widely its product is relied on, even in a crowded market of security tools.

The Consequences

The CrowdStrike outage has far-reaching consequences, affecting not only the company’s reputation but also exposing vulnerabilities in critical systems worldwide. It has given cybercriminals an opening to exploit the situation and placed a heavy burden on IT teams managing recovery. The summoning of the CEO to testify before Congress also highlights the gravity of the situation and the need for better preparedness against unexpected, high-impact events in the cybersecurity landscape.

1. New opportunity arises for cyber criminals

Threat actors are using the outage as an opportunity to target affected users with phishing scams and malware disguised as updates. Emails from fake domains are sent to trick users into visiting fake websites.
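A basic defensive check against such lookalike domains is to accept a link only if its host is exactly a trusted domain or a subdomain of it, rather than merely containing the vendor’s name. The Python sketch below treats `crowdstrike.com` as the only trusted domain (an assumption for illustration); the other URLs are made-up examples, and real e-mail filtering also relies on sender authentication such as SPF, DKIM and DMARC, which this does not cover.

```python
from urllib.parse import urlparse

# Assumption for illustration: the only domain allowed to host update links.
TRUSTED_DOMAINS = {"crowdstrike.com"}


def is_trusted_link(url: str) -> bool:
    """True only if the URL's host is a trusted domain or one of its subdomains."""
    # urlparse().hostname is already lower-cased; strip a trailing dot if present.
    host = (urlparse(url).hostname or "").rstrip(".")
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


for url in (
    "https://www.crowdstrike.com/blog/",         # genuine subdomain
    "https://crowdstrike-fix-update.example/",   # contains the name, wrong domain
    "https://crowdstrike.com.example/download",  # trusted name used as a prefix
):
    print(url, is_trusted_link(url))
```

Only the first link passes; the other two merely contain the vendor’s name, which is exactly the trick lookalike domains rely on.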

In these update-themed attacks, users receive a notification or message that appears to come from a trustworthy source and says that a software update is available. The message usually contains a link or an attachment: the link may lead to a fake website designed to steal credentials, or the attachment may contain malicious software. When users download and install the so-called update, they are actually installing malware, which may be designed to steal data, monitor activity or cause other harm.
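One safeguard against a fake update is to compare the downloaded file’s SHA-256 hash with a checksum obtained independently from the vendor’s official site (reached by typing the address, not through the link in the message), since an attacker can include a matching checksum in their own e-mail. The Python sketch below shows only the comparison; the file name and checksum are placeholders, and real vendors typically also rely on code signing, which this does not check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to handle large installers."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the file's digest with a checksum obtained from a trusted channel."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, `matches_published_hash(Path("update.exe"), "<checksum from the vendor's site>")` returning `False` is a clear reason not to install the file.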

2. Hard work for IT people

In response to the outage, some users have tried to remove the CrowdStrike sensor or to manage updates manually, and IT teams in large companies face a heavier workload managing updates. This additional burden can divert resources from other critical tasks, potentially weakening overall IT performance and security posture.

3. CEO summoned to testify

The U.S. House of Representatives Homeland Security Committee asked the CEO to testify about the outage, stating in its letter that it could not ignore the largest IT outage in history. The testimony of CrowdStrike CEO George Kurtz before the committee may lead to several significant outcomes, such as a positive or negative industry response and criticism from shareholders and customers.

Calling the CEO to testify before Congress may affect the company’s reputation and market value. If the fault is found to be due to negligence or insufficient testing processes, the company could face legal and financial penalties. The CEO’s performance and statements can also shape the company’s public reputation.

A strong performance can enhance the company’s reputation, while a poor one can damage it. For example, Tesla CEO Elon Musk’s charisma, combined with the company’s strong performance in 2020, and Spotify CEO Daniel Ek’s answers led to rapid recoveries in their companies’ stock prices and reputations.

The CrowdStrike incident exemplifies a “Black Swan” event, in which a rare and unforeseen disruption causes significant damage. Such disruptions are difficult to predict but have massive impacts on businesses and national security. In hindsight, the code issue may seem identifiable, but anticipating such events and their outcomes beforehand is nearly impossible. Both the incident and the Black Swan concept highlight the unpredictable nature of cybersecurity failures and the importance of being prepared for unexpected, high-impact risks.

The large-scale outage triggered by CrowdStrike underscores the importance of NATO’s cyber defense strategies. NATO’s Integrated Cyber Defence Centre facilitates the integration, alignment and cohesion of Information and Communications Technology (ICT) systems across NATO.

It also leads efforts to enhance NATO’s cybersecurity posture, increase awareness and manage cyber incidents. The NATO Cyber Security Centre (NCSC) in Mons, Belgium, is responsible for providing technical cybersecurity services and responding to any cyber incidents affecting NATO.

As illustrated by the CrowdStrike incident, complex and unforeseen disruptions highlight the critical need for robust cybersecurity strategies.

Here are the steps NATO should consider to prepare for situations like the CrowdStrike incident:

  1. Regular stress testing and simulations,
  2. Improve patch management and testing procedures,
  3. Diversify cyber defense tools,
  4. Clear communication with all partners,
  5. Increase information sharing among partners and cybersecurity agencies,
  6. Provide continuous training for NATO personnel on cyber resilience practices,
  7. Review and update NATO’s cyber defense policies regularly,
  8. Utilize advanced technologies such as AI (Artificial Intelligence) and ML (Machine Learning) for threat detection and incident response.
