In the growing DEX industry, we advocate for a predictive approach to digital workplace management. Build processes and systems around the goal of a seamless employee experience, and you’ll deal with fewer IT challenges as a result. However, even the most well-designed system cannot escape the impact of technology's greatest foe: human error, as one of our customers, a global technology leader, recently discovered.
This company is dedicated to helping organizations streamline operations and enhance efficiency. With over 100,000 employees across many offices worldwide, its global operations rely on smooth IT performance to keep employees productive and customers satisfied.
At this scale, even minor IT disruptions can have widespread consequences, making uninterrupted operations not just a priority but a necessity. When critical IT issues arise, the company needs swift, scalable solutions to minimize impact and keep employees focused on delivering exceptional service to their customers.
Read on to discover how a case of human error derailed the employee experience for 5,000 employees, and how the IT team used workflow automations to get everything back on track.
The Problem
As standard practice, the IT department at this global technology leader rolled out new Microsoft patches to a beta group every month. This allowed the team to test the patches and discover potential bugs safely before rolling them out company-wide. However, a simple human error resulted in the accidental rollout of a recent patch group to all machines.
Unfortunately, this version contained bugs, causing approximately 5,000 devices to become stuck in a continuous reboot loop.
The scale of the issue had a massive impact on operations across the business, overwhelming the support team with a surge of calls as employees struggled with confusion and frustration over their endlessly rebooting devices. With thousands of machines affected, the continuous reboots led to significant downtime, severely hindering productivity as 5,000 employees were left unable to work.
The Approach
Given the severity of the issue, the company needed a solution that could scale effectively without extensive manual intervention from its already overwhelmed support team.
The team implemented a workflow in Nexthink Flow to assess each machine's issues and determine the appropriate remediation steps. Several factors influenced the process, including what had been installed, whether the machine had been rebooted, and the status of any patches. To ensure smooth execution and clear communication with end users, they collaborated closely with the patch deployment management team throughout the process.
The first step was to prevent further disruptions. A pop-up notification was sent to all users running the affected OS version, instructing them not to reboot their computers. The team then collaborated with Microsoft and the patch deployment team to identify which devices were impacted. From there, the workflow specifically targeted machines running the Windows version with the faulty patch.
Once the affected devices were identified, a remote action was triggered to check for a hotfix on these machines. If the hotfix was present, the workflow determined whether the patch had been fully installed or only partially applied. Based on this assessment, the IT team established the appropriate next steps to fix each machine.
The final stage focused on guiding employees based on their fix status. To ensure the appropriate remediation, devices were categorized into two groups:
Group 1 (machines that were fine after rebooting): Employees were prompted with a message saying, “Reboot your computer, you’re safe.”
Group 2 (machines that needed manual intervention): Employees were directed to a Knowledge Base article with detailed instructions on how to fix the issue. An automated pop-up message was then scheduled to appear a few hours later, telling them whether the automated fix had worked or whether further action was required.
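The decision logic described above can be sketched in a few lines of Python. This is only an illustration, not Nexthink Flow's API or the customer's actual workflow: the `DeviceState` fields, the `Action` names, and the `triage` function are all hypothetical. The case study also does not spell out exactly which check results led to which group, so the mapping below (hotfix present and installation complete means it is safe to reboot, anything else needs manual steps) is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NO_ACTION = "device is not running the affected Windows version"
    SAFE_TO_REBOOT = "Group 1: tell the user it is safe to reboot"
    MANUAL_FIX = "Group 2: send Knowledge Base article and schedule a follow-up pop-up"


@dataclass
class DeviceState:
    # Hypothetical fields standing in for what a remote action might report.
    name: str
    on_affected_os: bool
    hotfix_present: bool
    install_complete: bool


def triage(device: DeviceState) -> Action:
    """Pick a next step for one device (assumed mapping, see lead-in)."""
    if not device.on_affected_os:
        # The workflow only targeted the Windows version with the faulty patch.
        return Action.NO_ACTION
    if device.hotfix_present and device.install_complete:
        return Action.SAFE_TO_REBOOT
    # Hotfix missing or only partially applied: route to manual remediation.
    return Action.MANUAL_FIX


if __name__ == "__main__":
    fleet = [
        DeviceState("PC-001", True, True, True),
        DeviceState("PC-002", True, True, False),
        DeviceState("PC-003", False, False, False),
    ]
    for device in fleet:
        print(device.name, "->", triage(device).name)
```

Keeping the triage as a single function over a small state record means the same rules apply to every device, which is what lets this kind of remediation scale to thousands of machines without per-device support calls.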
The Impact: 5,000 Hours Reclaimed
By leveraging automated workflows to detect and resolve the issue, the company eliminated the need for manual intervention on thousands of machines, saving over 5,000 hours of employee time. Each affected device would have required at least an hour of IT support, but automation meant that time could be reclaimed, freeing up the support teams as well.
Beyond the operational benefits for IT, the outcomes for employees were just as positive. The frustration of repeated reboots was quickly eliminated, downtime was minimized, and as a result, their workday became smoother and more productive.
From Reactive Fixes to Proactive Solutions
In this scenario, the team had to take a reactive approach to resolve a problem caused by human error. Once the issue was under control, the team shifted to a proactive strategy. They implemented a workflow to save registry data so that if the machines encountered a similar issue in the future, they could be reset properly without losing any registry data in the process.
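As a rough illustration of what a registry backup step might look like, the sketch below builds a command line for the Windows `reg export` tool, which writes a registry key to a `.reg` file. The case study does not say which keys were saved or how the workflow stored them, so the key name, backup folder, device name, and the `build_export_command` helper are all hypothetical. The function only constructs the command; actually running it would require a Windows machine with suitable permissions.

```python
from pathlib import PureWindowsPath


def build_export_command(key: str, backup_dir: str, device: str) -> list[str]:
    """Return a `reg export` command that saves one registry key to a file.

    The file naming scheme here is an assumption made for the example.
    """
    # Turn the key path into a file-name-safe string.
    safe_key = key.replace("\\", "_")
    target = PureWindowsPath(backup_dir) / f"{device}_{safe_key}.reg"
    # /y overwrites an existing backup file without prompting.
    return ["reg", "export", key, str(target), "/y"]


if __name__ == "__main__":
    # Hypothetical key and folder, for illustration only.
    command = build_export_command(r"HKLM\SOFTWARE\Example", r"C:\Backups", "PC-001")
    print(" ".join(command))
```

On a Windows device, the returned list could be passed to `subprocess.run` to perform the export before a reset.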
This unexpected issue demonstrated the importance of thinking proactively, even about problems that can’t always be predicted. By anticipating future challenges, the team is better equipped to handle unforeseen issues quickly and effectively.
Learn more about the full-scale workplace orchestration capabilities offered by Nexthink Flow.