Nexthink Stops MS Outage From Hurting a Leading Consumer Goods Company
While individual blue screen errors are frustrating, the recent global system crashes caused by a CrowdStrike update incompatible with Microsoft Windows have wreaked havoc across entire industries since early Friday morning. Companies across the airline, media, and banking industries have faced significant disruptions, with thousands of customer-facing devices showing blue screens, leading to widespread travel delays and chaos.
A leading global consumer goods company detected the MS outage with Nexthink well before it made headlines. Nexthink’s platform promptly alerted the IT team to an unusual spike in system crashes. By creating proactive tickets in the IT Service Management (ITSM) system, the issue quickly received the necessary attention.
The first step was to determine the cause of the crashes. Using Nexthink’s real-time device timeline view, the company’s IT team quickly identified that CrowdStrike had been updated on the affected devices right before the system crashes began.
Nexthink’s diagnosis further revealed that most devices were crashing due to a specific error code, indicating a conflict between a component of CrowdStrike and Windows.
Ten percent of the organization’s devices were affected immediately and required a boot into safe mode and the deletion of a system file to restore functionality. Nexthink made it simple to pinpoint the impacted devices, ensuring that the appropriate support could be promptly dispatched to those users.
To prevent further issues on the remaining devices, proactive measures were needed. The customer implemented a series of preventive strategies to ensure stability and avoid additional system crashes. This included halting all Windows and application updates temporarily and verifying that each device had the correct version of CrowdStrike installed.
In response to the emerging issue, Nexthink’s engineers quickly created a dedicated library pack designed to help customers proactively address and prevent CrowdStrike-related system crashes. The library pack also contains an automation that gathers the timestamp of the CrowdStrike .sys file located in C:\Windows\System32\drivers\CrowdStrike\ and matching the pattern C-00000291*.sys. If the timestamp matches 04:09 UTC on the 19th of July 2024, the automation attempts to remove the file. By importing this library pack, IT teams can quickly understand the scale of impact in their environment, fix impacted devices, and implement preventive measures so that their devices remain stable and operational.
Please note: CrowdStrike uses an anti-tampering mechanism that prevents the deletion of files. Therefore, for this automation to work, the CrowdStrike administrator must briefly disable anti-tampering while the automation runs on the device, and then re-enable it once the remediation is complete.
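For readers who want to see the logic of such a check spelled out, here is a minimal Python sketch of the steps described above. It is not the code shipped in the Nexthink library pack. It assumes that "timestamp" means the file's last-modified time and that the match is made at minute granularity; the function names and the dry_run switch are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

# Folder and file pattern as named in the text above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"
# 04:09 UTC on 19 July 2024, as stated in the text above.
TARGET = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)


def matches_target_minute(path: Path) -> bool:
    """Return True if the file was last modified during the 04:09 UTC minute.

    Assumption: the relevant timestamp is the last-modified time.
    """
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return mtime.replace(second=0, microsecond=0) == TARGET


def find_matching_files(directory: Path = DRIVER_DIR) -> list[Path]:
    """List files in the folder that match both the pattern and the timestamp."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(PATTERN) if matches_target_minute(p))


def remediate(directory: Path = DRIVER_DIR, dry_run: bool = True) -> list[Path]:
    """Report matching files and delete them only when dry_run is False."""
    matches = find_matching_files(directory)
    if not dry_run:
        for path in matches:
            # Fails with PermissionError while anti-tampering is still enabled.
            path.unlink()
    return matches
```

Calling remediate() with the default dry_run=True only lists the files that would be removed, which gives a sense of the scale of impact before anything is deleted. Passing dry_run=False performs the deletion and, as noted above, only succeeds while anti-tampering is disabled.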
Leveraging Nexthink’s advanced capabilities, the organization’s IT team swiftly identified the issue in its nascent stage. This early detection allowed them to diagnose the root cause with remarkable speed and precision. As a result, the team was able to contain the disruption to a small fraction of devices, effectively preventing what could have been a much more damaging and widespread outage.
By acting quickly and efficiently, the organization minimized downtime and maintained operational continuity, showcasing the critical role of proactive monitoring and rapid response in IT management.
For more information on the CrowdStrike Troubleshooting Library pack, click here! And if you’re interested in learning more about Nexthink, contact us.