Not sure the devs are the ones to blame when there are statements like:
Kurtz therefore has the possibly unique and almost-certainly-unwanted distinction of having presided over two major global outage events caused by bad software updates.
So, I’m guessing it’s the business that’s not supporting good dev->test->release practices.
But, I agree with your point; their overall software quality is terrible.
True, true. If the general business pressures are not conducive to proper software release practices, no amount of programming skill can help them.
BaalInvoker@lemmy.eco.br
on 22 Jul 2024 11:48
Difference between open source software and closed source software:
CrowdStrike's bad code makes Linux crash -> the sysadmin has control over the system and can rapidly fix the issue by disabling the CrowdStrike module -> downtime is limited
CrowdStrike's bad code makes Windows crash -> the sysadmin has limited control over the system and has to rely on Windows/CrowdStrike people to fix the issue -> demand is too high because the issue hit many computers around the world at the same time -> huge downtime while a few people at Microsoft and/or CrowdStrike fix the issue one by one, manually
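As a rough illustration of the Linux half of that comparison: “disabling the CrowdStrike module” mostly comes down to stopping the sensor service. A minimal sketch, assuming the sensor is installed as the usual falcon-sensor systemd unit and the commands run as root:

```python
# Minimal sketch: stop and disable the CrowdStrike Falcon sensor on a Linux host.
# Assumptions: the sensor runs as the 'falcon-sensor' systemd unit and this
# script is executed with root privileges (e.g. via sudo).
import subprocess

def disable_falcon_sensor() -> None:
    # Stop the running sensor immediately...
    subprocess.run(["systemctl", "stop", "falcon-sensor"], check=True)
    # ...and keep it from starting again on the next boot.
    subprocess.run(["systemctl", "disable", "falcon-sensor"], check=True)

if __name__ == "__main__":
    disable_falcon_sensor()
```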
Rentlar@lemmy.ca
on 22 Jul 2024 15:51
I’ve had to make this point repeatedly, every time someone writes “It’s not a Microsoft/closed-source problem, it happened to Linux too”.
Bitrot@lemmy.sdf.org
on 22 Jul 2024 15:56
This is a laughably bad take.
You do realize sysadmins were fixing the Windows issue and not just waiting on Microsoft and CrowdStrike - right? They just had to delete a file.
BaalInvoker@lemmy.eco.br
on 22 Jul 2024 16:02
Oh! So that’s why the outage took so long to recover from! Just deleting a file takes that long!
I’m glad you said it!
Bitrot@lemmy.sdf.org
on 22 Jul 2024 16:12
Uh, yes. Physically touching thousands of computers to boot them into safe mode and delete a file is time consuming. It turns out physically touching thousands of machines is time consuming anywhere, especially when it is all of them at once.
Which is why your take is laughably bad. Stick to the tech and not zealotry next time, and maybe not CNN for tech news.
superkret@feddit.org
on 22 Jul 2024 16:13
You have no idea what you’re talking about.
The fix is to boot into safe or recovery mode, delete a file, reboot. That’s it.
The reason it takes so long is that millions of PCs are affected, which are usually administered remotely.
So sysadmins have to drive to multiple places, while their usual workloads wait.
On top of that, you need the encryption recovery keys for each PC to boot into safe mode.
Those are often stored centrally on a server - which may also be encrypted and affected.
Or on an Azure file share, which had an outage at the same time.
Maybe some of the recovery keys are missing. Then you have to reinstall the PC and re-configure every application that was running on it.
And when all of that is over, the admins have to get back on top of all the tasks that were sidelined, which may take weeks.
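To make the fix described above concrete: the widely reported remediation was to delete the bad channel file(s) under C:\Windows\System32\drivers\CrowdStrike once the machine was in safe mode or the recovery environment. A minimal sketch of that step, assuming the publicly reported C-00000291* filename pattern and that some scripting environment is available on the box:

```python
# Minimal sketch of the manual cleanup step described above.
# Assumptions: the machine has been booted into safe mode / recovery with the
# system volume mounted as C:, and the offending channel files match the
# publicly reported C-00000291*.sys pattern.
import glob
import os

CHANNEL_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
BAD_PATTERN = "C-00000291*.sys"

for path in glob.glob(os.path.join(CHANNEL_DIR, BAD_PATTERN)):
    print(f"removing {path}")
    os.remove(path)

# Reboot normally afterwards; the sensor should pick up a corrected channel file.
```

And as noted above, on a BitLocker-protected machine the volume has to be unlocked first (e.g. manage-bde -unlock C: -RecoveryPassword <key>), which is exactly where the missing-recovery-key problem bites.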
JWBananas@lemmy.world
on 22 Jul 2024 16:00
Sysadmin here. Wtf are you talking about? All we did was “rapidly fix the issue by disabling the CrowdStrike module.” Or really, just delete the one bad file. We were back online before most people even woke up.
What do you think Crowdstrike can do from their end to stop a boot loop?
SquigglyEmpire@lemmy.world
on 22 Jul 2024 21:36
…what?
A busted kernel module/driver/plug-in/whatever that triggers a bootloop is going to require intervention on any platform no matter whether the code happens to be published somewhere out on the internet or not. On top of that, Windows allows you to control/remove 3rd party kernel drivers just like on Linux, which is exactly what many of us have been stuck doing on endless devices for the last three days.
I fully advocate for open-source software and use it where I can, but I also think we should do that by talking about its actual advantages instead of just making up nonsense that will make experienced sysadmins spit out their coffee.
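On the driver point above: Windows does let admins enumerate and remove third-party driver packages, for example via pnputil. A rough sketch, assuming a recent Windows 10/11 build where pnputil supports /enum-drivers and the script runs from an elevated prompt:

```python
# Rough sketch: list third-party (OEM) driver packages on Windows via pnputil.
# Assumptions: Windows 10/11 with pnputil's /enum-drivers switch available,
# run with administrator rights.
import subprocess

def list_third_party_drivers() -> str:
    result = subprocess.run(
        ["pnputil", "/enum-drivers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(list_third_party_drivers())
    # An individual package can then be removed with:
    #   pnputil /delete-driver oemNN.inf /uninstall
```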
MangoPenguin@lemmy.blahaj.zone
on 23 Jul 2024 00:26
The fix on Windows was just removing the bad file; there was no reliance on CrowdStrike to fix the initial issue that I know of.
Ofc it is. And you can’t do any updates because CrowdStrike doesn’t support newer kernels. Apparently security means running out-of-date packages. 🤡
That first issue was triggered by Falcon, but it was legitimately a bug in Red Hat’s kernel, triggered via BPF.
They seem extremely competent at writing bad software.
Line must go up
That line isn’t going to recover for a while now
But the publicity
“The most secure system is a system that’s not live. Crowdstrike, bringing you the best-in-class security.”
“I don’t test often but when I do it is in production”
Nobody:
Crowdstrike:
[image: https://lemmy.world/pictrs/image/ac8007ea-1a98-4cbe-900d-c8bf5ec52d1c.jpeg]