20 July 2024
SafeAeon Inc.
Many people, not just those who work in cybersecurity, were shocked by the CrowdStrike outage on July 19, 2024. Companies around the world rely on CrowdStrike's endpoint protection as a first line of defense. The outage affected businesses, critical infrastructure, and the world economy on an unprecedented scale. This case study examines those effects in depth.
We will look at what went wrong technically, and at the decisions that led to the outage and its effects on so many people. We will also examine CrowdStrike's recovery efforts: its reaction time, its communication, and the steps it took to restore normal operation. By understanding the details of this event, we aim to draw useful lessons for cybersecurity companies, IT professionals, and businesses that depend on third-party security solutions.
What Kinds of Services Were Affected?
The logic error in the CrowdStrike update affected about 8.5 million Windows devices. That is less than 1% of all Windows devices in use around the world. Although the percentage seems small, the affected systems were critical to many operations. The outage caused major problems in a number of areas, including:
Flights and Airports
The outage grounded thousands of flights around the world and caused major delays and cancellations. More than 10,000 flights worldwide were affected. US airlines, including Delta, United, and American Airlines, cancelled many flights until their systems were restored. Airlines and airports elsewhere were also affected, including KLM, Porter Airlines, Toronto Pearson International Airport, Zurich Airport, and Amsterdam Schiphol Airport.
Public Transportation
Many cities, including Chicago, Cincinnati, Minneapolis, New York City, and Washington, D.C., had problems with their public transportation systems. The outage slowed these transit networks and caused delays.
Medical Care
Hospitals and health services worldwide reported major problems. Appointments were delayed or cancelled, which affected patient care. 911 emergency lines were also down in several US states, including Alaska, Indiana, and New Hampshire.
Financial Services
The outage caused problems for financial institutions and online banking systems around the world. Several payment systems were directly affected, which delayed transactions and paychecks.
Media and Broadcasting
Many media and television outlets were knocked off the air by the outage. Some services, including the British channel Sky News, were interrupted.
Why Apple and Linux Systems Were Not Affected
CrowdStrike's software runs on several operating systems, including Linux, Microsoft Windows, and Apple's macOS. But the July outage affected only Windows computers, because it was caused by a faulty update to the sensor configuration. This update, known as "channel file 291," applied only to Windows systems. It did not affect macOS or Linux.
On Windows, the Falcon sensor is integrated as a kernel process, which differs from how it is integrated on macOS or Linux. The update concerned named pipe execution, which is a Windows-specific mechanism. Because macOS and Linux have different integration points, computers running those systems were not affected.
In a separate incident in June, Linux vendor Red Hat reported that the Falcon sensor caused a kernel panic on Linux systems. That problem was fixed, and Red Hat did not report any major impact.
How Long Will It Take for Businesses to Recover?
Within 79 minutes, CrowdStrike identified a fix and deployed it. Even though CrowdStrike responded quickly, getting businesses back to normal has been slow and difficult. The faulty update caused a Blue Screen of Death (BSOD) on Windows, which stopped computers from booting and meant they had to be repaired by hand.
IT administrators had to boot affected systems into Safe Mode or the Windows Recovery Environment, delete the faulty channel file 291, and then restart the systems normally. This repair process is labor-intensive, especially for businesses with many affected devices. In some cases, administrators needed physical access to each machine, which added to the time and effort required.
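The file-removal step described above can be sketched as follows. This is a minimal illustration, not a recovery tool: the file name pattern and the driver directory are not given in this article and are taken from CrowdStrike's publicly posted workaround, and the function names are hypothetical. In practice the step was usually carried out from a recovery command prompt, where Python is normally not available, so the code only shows the logic. It defaults to a dry run that lists matches without deleting anything.

```python
from pathlib import Path

# Assumption: file name pattern from CrowdStrike's published workaround,
# not from this article.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_channel_files(driver_dir: Path) -> list[Path]:
    """Return files in driver_dir whose names match the channel file 291 pattern."""
    if not driver_dir.is_dir():
        return []
    return sorted(p for p in driver_dir.glob(CHANNEL_FILE_PATTERN) if p.is_file())


def remove_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Return the matching files, deleting them only when dry_run is False."""
    matches = find_channel_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Assumption: default sensor directory on the system drive.
    target = Path(r"C:\Windows\System32\drivers\CrowdStrike")
    for path in remove_channel_files(target, dry_run=True):
        print(f"would delete: {path}")
```

Keeping the dry run as the default means an administrator sees exactly which files match before anything is deleted.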
Some companies fixed the problem within a few days, but others struggled, especially those with large device fleets or encrypted drives. Microsoft Windows BitLocker encryption made recovery harder, because a BitLocker recovery key was needed to unlock each encrypted system before it could be repaired and restarted.
What Can Businesses Do to Be Better Prepared for Tech Problems?
The CrowdStrike outage on Windows showed how vulnerable businesses are when they depend too heavily on technology. System backups and automated processes are very important, but documented manual procedures make it much easier to keep the business running when technology goes down.
Here are some important steps businesses can take to be better prepared:
1. Test all updates before putting them into production
Automatic updates have long been considered good practice for keeping computers current. But the CrowdStrike incident showed that this approach carries risk. For mission-critical systems, it is wise to test updates first or roll them out through a staging environment. This lowers the risk that an update disrupts operations by accident.
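One common way to apply this idea is a ring-based rollout, where an update goes to a small test group first and only proceeds if that group stays healthy. The sketch below is one possible shape for such a gate, not a description of how any vendor deploys updates. The `Ring` class, the `staged_rollout` function, and the two callbacks are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    name: str
    hosts: list[str]
    max_failure_rate: float  # stop the rollout if the ring exceeds this rate


def staged_rollout(
    rings: list[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Update one ring at a time and stop at the first ring that fails its gate.

    Returns the names of the rings that passed their health gate.
    """
    completed = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            apply_update(host)
            if not is_healthy(host):
                failures += 1
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > ring.max_failure_rate:
            break  # later rings, including production, are never touched
        completed.append(ring.name)
    return completed
```

With rings ordered from a test lab to a pilot group to production, a failure in the pilot group stops the rollout before it reaches the mission-critical systems.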
2. Develop and document manual workarounds
When technology fails, important business tasks must still be carried out manually. However common digital solutions are, documenting and practicing manual processes pays off during an outage. This preparation lets companies keep operating and serving customers even when their technology is down.
3. Plan for disaster recovery and business continuity
Outages can happen for many reasons, which is why strong business continuity and disaster recovery plans matter. To keep downtime to a minimum, these plans should include backup systems and equipment. Putting backup systems in place and making sure they are ready to take over important tasks can make outages much less disruptive to operations.
Lessons and Plans for the Future
The CrowdStrike outage on Microsoft Windows systems is a strong reminder of how hard it is to manage IT systems. The event offers several important lessons for future preparedness:
1. Thorough testing
Changes must be thoroughly tested in a variety of environments before they are released. This keeps operations running smoothly and ensures that changes do not harm system stability by accident.
2. Work Together with Vendors
It is very important for vendors, IT workers, and end users to be able to talk to each other openly and clearly. Working together can make it easier to handle and lessen the effects of possible problems.
3. Backups and redundancy
To keep running when IT fails, businesses need to set up strong backup systems and alternative plans. When you have a good backup plan, you can keep important functions running even if your main systems have problems.
4. Managing the cloud
It is very important to handle cloud environments well. To keep problems to a minimum, it's important to understand the unique challenges of cloud technology and make good backup plans.
5. Managing BitLocker recovery keys
It is very important to keep BitLocker recovery keys both secure and readily accessible. Striking a good balance between security and quick recovery makes it easier to respond to IT incidents.
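One way to check that recovery keys are actually available before an incident is to compare the device inventory against the key escrow. The sketch below assumes a hypothetical export format: a mapping from hostname to the ID of that device's recovery key, and a set of key IDs known to be stored in the escrow (for example a directory service). The function name and both data shapes are illustrative assumptions, not part of any Microsoft tool.

```python
def find_unrecoverable_devices(
    encrypted_devices: dict[str, str],
    escrowed_key_ids: set[str],
) -> list[str]:
    """Return hostnames whose recovery key ID is missing from the escrow.

    encrypted_devices maps hostname -> recovery key ID (hypothetical export).
    escrowed_key_ids holds the key IDs known to be stored centrally.
    """
    return sorted(
        host
        for host, key_id in encrypted_devices.items()
        if key_id not in escrowed_key_ids
    )


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    inventory = {"desk-01": "key-a", "desk-02": "key-b", "kiosk-07": "key-c"}
    escrow = {"key-a", "key-c"}
    for host in find_unrecoverable_devices(inventory, escrow):
        print(f"no escrowed recovery key for {host}")
```

Running a check like this on a schedule surfaces devices that would be hard to recover before an outage forces the question.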
The global IT outage caused by a CrowdStrike update disrupted important services, showing how important strong protection and good IT management are. Businesses and government services rely on digital infrastructure more and more, which makes thorough testing, good communication, and strong backup systems more important than ever. In light of this event, businesses should review and improve their IT strategies to make them more resilient to future problems.
Conclusion
The CrowdStrike outage on July 19, 2024, is a stark warning of how central cybersecurity is to running a business today. The widespread effects of this incident make it clear why businesses need strong incident response plans, backups, and a variety of security strategies. Businesses need to put business continuity planning at the top of their priorities and consider multiple layers of security to lower the risks that come with single points of failure.
As the world of cybersecurity changes, companies need to assess their security continuously and look for better ways to do things. SafeAeon can add another layer of protection against new threats, complementing existing endpoint security. Companies can improve their security by using more than one solution and by managing risk proactively.
FAQs
1. What was the main cause of the CrowdStrike outage on July 19, 2024?
A complete answer will have to wait for the final report of the investigation. Initial reports, however, point to a faulty software update that caused many systems to fail.
2. Which sectors were most affected by the CrowdStrike outage?
Many sectors were affected by the outage, including government, healthcare, banking, and transportation. Because they depend on strong cybersecurity, critical infrastructure sectors were especially at risk.
3. What did CrowdStrike do when the service went down?
The company has acknowledged the severity of the incident and has initiated a thorough investigation into the root cause of the issue. While the immediate impact of the outage has been mitigated, the long-term consequences and the effectiveness of CrowdStrike's response will continue to be closely monitored by customers, industry experts, and regulators.
4. What can be learned from the CrowdStrike outage?
The most important lessons for businesses and cybersecurity service providers concern disaster recovery planning, incident response, and third-party risk management.