CrowdStrike downtime apparently caused by update that replaced a file with 42kb of zeroes

CrowdStrike downtime apparently caused by update that replaced a file with 42kb of zeroes (twiiit.com)
from Aatube@kbin.melroy.org to technology@lemmy.world on 19 Jul 2024 17:33
https://kbin.melroy.org/m/technology@lemmy.world/t/364020

…according to a Twitter post by the Chief Information Security Officer of Grand Canyon Education.

So, does anyone else find it odd that the file that caused everything CrowdStrike to freak out, C-00000291-00000000-00000032.sys was 42KB of blank/null values, while the replacement file C-00000291-00000000-00000.033.sys was 35KB and looked like a normal, if not obfuscated sys/.conf file?

Also, apparently CrowdStrike had at least 5 hours to work on the problem between the time it was discovered and the time it was fixed.

#crowdstrike #cybersec #cybersecurity #downtime #hack #technology #windows

bjoern_tantau@swg-empire.de on 19 Jul 2024 17:42 next collapse

Ah, a classic off by 43,008 zeroes error.

TropicalDingdong@lemmy.world on 19 Jul 2024 18:30 collapse

If you listen closely, you can hear this file.

Gork@lemm.ee on 19 Jul 2024 18:09 next collapse

How can all of those zeroes cause a major OS crash?

MajinBlayze@lemmy.world on 19 Jul 2024 18:12 next collapse

Because it’s supposed to be something else

jared@mander.xyz on 19 Jul 2024 18:17 next collapse

At least a few 1’s I imagine.

Iheartcheese@lemmy.world on 19 Jul 2024 18:25 collapse

What if we put in a 2

kinkles@sh.itjust.works on 19 Jul 2024 18:36 next collapse

Society isn’t ready for that

NaibofTabr@infosec.pub on 19 Jul 2024 18:38 collapse

thurstylark@lemm.ee on 19 Jul 2024 20:01 collapse

Well, you see, the front fell off.

tiramichu@lemm.ee on 19 Jul 2024 18:22 next collapse

If I send you on stage at the Olympic Games opening ceremony with a sealed envelope

And I say “This contains your script, just open it and read it”

And then when you open it, the script is blank

You’re gonna freak out

Gork@lemm.ee on 19 Jul 2024 18:24 next collapse

Ah, makes sense. I guess a driver would completely freak out if that file gave no instructions and was just like “…”

[deleted] on 19 Jul 2024 21:38 collapse

planish@sh.itjust.works on 19 Jul 2024 21:56 next collapse

That’s what the BSOD is. It tries to bring the system back to a nice safe freshly-booted state where e.g. the fans are running and the GPU is not happily drawing several kilowatts and trying to catch fire.

TimeSquirrel@kbin.melroy.org on 19 Jul 2024 22:33 collapse

No try-catch, no early exit condition checking and return, just nuke the system and start over?

Aatube@kbin.melroy.org on 20 Jul 2024 00:08 next collapse

what do you propose, run faulty code that could maybe actually nuke your system, not just memory but storage as well?

kogasa@programming.dev on 20 Jul 2024 04:26 next collapse

Catch and then what? Return to what?

ChairmanMeow@programming.dev on 20 Jul 2024 09:45 next collapse

Windows assumes that you installed that AV for a reason. If it suddenly faults, who’s to say it’s a bug and not some virus going ham on the AV? A BSOD is the most graceful exit you could do, ignoring and booting a potentially compromised system is a fairly big no-no (especially in systems that feel the need to install AV like this in the first place).

reddit_sux@lemmy.world on 20 Jul 2024 11:09 next collapse

BSOD is the ultimate catch statement of the OS. It will gracefully close all open data streams and exit. Of course it is not the usual exit so it gives a graphic representation of what might have gone wrong.

If it were nuking the system, it wouldn’t show anything.

Morphit@feddit.uk on 20 Jul 2024 11:24 collapse

A page fault can be what triggers a catch, but you can’t unwind what a loaded module (the Crowdstrike driver) did before it crashed. It could have messed with Windows kernel internals and left them in a state that is not safe to continue. Rather than potentially damage the system, Windows stops with a BSOD. The only solution would be to not allow code to be loaded into the kernel at all, but that would make hardware drivers basically impossible.

Kaboom@reddthat.com on 19 Jul 2024 23:22 collapse

For most things, yes. But if someone were to compromise the file, stopping when they see it invalid is probably a good idea for security

sigmaklimgrindset@sopuli.xyz on 19 Jul 2024 19:46 next collapse

Great layman’s explanation.

Imgonnatrythis@sh.itjust.works on 19 Jul 2024 20:30 next collapse

Maybe. But I’d like to think I’d just say something clever like, “says here that this year the pommel horse will be replaced by yours truly!”

Hazzia@infosec.pub on 19 Jul 2024 20:46 next collapse

I’m gonna take from this that we should have AI doing disaster recovery on all deployments. Tech CEOs have been hyping AI up so much, what could possibly go wrong?

Couldbealeotard@lemmy.world on 20 Jul 2024 02:53 collapse

What are the chances that Crowdstrike started using AI to do their update deployments, and they just won’t admit it?

Takios@discuss.tchncs.de on 19 Jul 2024 22:54 collapse

Problem is that software cannot deal with unexpected situations like a human brain can. Computers do exactly what a programmer tells them to do, nothing more, nothing less. So if a situation arises that the programmer hasn’t written code for, then there will be a crash.

deadbeef79000@lemmy.nz on 20 Jul 2024 00:23 collapse

Poorly written code can’t.

In this case:

Load config data
If data is valid:
1. Use config data
If data is invalid:
1. Crash entire OS

Is just poor code.

5C5C5C@programming.dev on 20 Jul 2024 01:40 next collapse

When talking about the driver level, you can’t always just proceed to the next thing when an error happens.

Imagine if you went in for open heart surgery but the doctor forgot to put in the new valve while he was in there. He can’t just stitch you up and tell you to get on with it, you’ll be bleeding away inside.

In this specific case we’re talking about security for business devices and critical infrastructure. If a security driver is compromised, in a lot of cases it may legitimately be better for the computer to not run at all, because a security compromise could mean it’s open season for hackers on your sensitive device. We’ve seen hospitals held ransom, we’ve seen customer data swiped from major businesses. A day of downtime is arguably better than those outcomes.

The real answer here is crowdstrike needs a more reliable CI/CD pipeline. A failure of this magnitude is inexcusable and represents a major systemic failure in their development process. But the OS crashing as a result of that systemic failure may actually be the most reasonable desirable outcome compared to any other possible outcome.

deadbeef79000@lemmy.nz on 20 Jul 2024 01:42 next collapse

But the OS crashing as a result of that systemic failure may actually be the most reasonable desirable outcome compared to any other possible outcome.

In which case this should’ve been documented behaviour and probably configurable.

Morphit@feddit.uk on 20 Jul 2024 11:37 next collapse

This error isn’t intentionally crashing because of a security risk, though that could happen. It’s a null pointer exception, so there are no static or runtime checks that could have prevented or handled this more gracefully. This was presumably a bug in the driver for a long time, then a faulty config file came and triggered the crashes. Better static analysis and testing of the kernel driver is one aspect, how these live config updates are deployed and monitored is another.

CeeBee_Eh@lemmy.world on 21 Jul 2024 17:21 collapse

That’s a bad analogy. CrowdStrike’s driver encountering an error isn’t the same as not having disk IO or a memory corruption. If CrowdStrike’s driver ~~didn’t load at all~~ wasn’t installed the system could still boot.

It should absolutely be expected that if the CrowdStrike driver itself encounters an error, there should be a process that allows the system to gracefully recover. The issue is that CrowdStrike likely thought of their code as not being able to crash as they likely only ever tested with good configs, and thus never considered a graceful failure of their driver.

5C5C5C@programming.dev on 21 Jul 2024 17:35 collapse

I don’t doubt that in this case it’s both silly and unacceptable that their driver was having this catastrophic failure, and it was probably caused by systemic failure at the company, likely driven by hubris and/or cost-cutting measures.

Although I wouldn’t take it as a given that the system should be allowed to continue if the anti-virus doesn’t load properly more generally.

For an enterprise business system, it’s entirely plausible that if a crucial anti-virus driver can’t load properly then the system itself may be compromised by malware, or at the very least the system may be unacceptably vulnerable to malware if it’s allowed to finish booting. At that point the risk of harm that may come from allowing the system to continue booting could outweigh the cost of demanding manual intervention.

In this specific case, given the scale and fallout of the failure, it probably would’ve been preferable to let the system continue booting to a point where it could receive a new update, but all I’m saying is that I’m not surprised more generally that an OS just goes ahead and treats an anti-virus driver failure as BSOD-worthy.

Takios@discuss.tchncs.de on 20 Jul 2024 07:49 next collapse

I agree that the code is probably poor but I doubt it was a conscious decision to crash the OS.

The code is probably just:

1. Load config data
2. Do something with data

And 2 fails unexpectedly because the data is garbage and wasn’t checked if it’s valid.
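A minimal Python sketch of the difference between "load, then use" and "load, check, then use". It assumes a made-up file format that starts with a 4-byte magic header; `MAGIC`, `ConfigError`, `parse_config`, and `load` are hypothetical names for illustration only and say nothing about CrowdStrike's real file format or driver code.

```python
class ConfigError(Exception):
    """Raised when a config blob fails validation."""

MAGIC = b"CFG1"  # hypothetical 4-byte header, not a real format

def parse_config(blob: bytes) -> bytes:
    """Validate a config blob before anything else touches it."""
    if len(blob) <= len(MAGIC):
        raise ConfigError("file too short")
    if not any(blob):  # every byte is 0x00
        raise ConfigError("file is all zeroes")
    if not blob.startswith(MAGIC):
        raise ConfigError("bad magic header")
    return blob[len(MAGIC):]  # payload that passed the checks

def load(blob: bytes) -> str:
    """Step 1 with a check: report bad data instead of failing in step 2."""
    try:
        payload = parse_config(blob)
    except ConfigError as exc:
        return f"rejected: {exc}"
    return f"loaded {len(payload)} payload bytes"
```

Calling `load(bytes(42 * 1024))` takes the "all zeroes" branch rather than handing garbage to later code. What the caller should do after a rejection (halt, skip the file, report home) is exactly what the rest of the thread argues about.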

Morphit@feddit.uk on 20 Jul 2024 11:32 next collapse

You can still catch the error at runtime and do something appropriate. That might be to say this update might have been tampered with and refuse to boot, but more likely it’d be to just send an error report back to the developers that an unexpected condition is being hit and just continuing without loading that one faulty definition file.

CeeBee_Eh@lemmy.world on 21 Jul 2024 13:23 collapse

If there’s an error, use last known good config. So many systems do this.
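A rough Python sketch of the "last known good" idea, assuming the previous config is kept around as bytes and that some validity check exists. `load_with_fallback` and `not_all_zero` are hypothetical helpers, and the toy check stands in for whatever real validation a product would use.

```python
from typing import Callable, Optional

def load_with_fallback(
    candidate: bytes,
    last_known_good: Optional[bytes],
    is_valid: Callable[[bytes], bool],
) -> tuple[bytes, str]:
    """Return (config, source); prefer the new file, fall back to the old one."""
    if is_valid(candidate):
        return candidate, "new"
    # The fallback is validated too, so a damaged backup is not trusted blindly.
    if last_known_good is not None and is_valid(last_known_good):
        return last_known_good, "last-known-good"
    raise RuntimeError("no valid config available")

def not_all_zero(blob: bytes) -> bool:
    """Toy validity check: non-empty and not entirely 0x00 bytes."""
    return len(blob) > 0 and any(blob)
```

When neither copy validates, the function raises, which leaves the halt-or-continue decision to the caller.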

ToyDork@preserve.games on 21 Jul 2024 19:49 collapse

Unfortunately, an OS that covers such cases is a lost monetization opportunity, fuck the system, use a Linux distro, you get the idea. Microsoft makes money off of tech support for people too unversed in computers to fix it themselves.

ChairmanMeow@programming.dev on 20 Jul 2024 09:47 collapse

If AV suddenly stops working, it could mean the AV is compromised. A BSOD is a desirable outcome in that case. Booting a compromised system anyway is bad code.

CeeBee_Eh@lemmy.world on 21 Jul 2024 13:22 collapse

You know there’s a whole other scenario where the system can simply boot the last known good config.

ChairmanMeow@programming.dev on 21 Jul 2024 16:40 collapse

And what guarantees that that “last known good config” is available, not compromised and there’s no malicious actor trying to force the system to use a config that has a vulnerability?

CeeBee_Eh@lemmy.world on 21 Jul 2024 17:13 collapse

The following:

An internal backup of previous configs
Encrypted copies
Massive warnings in the system that current loaded config has failed integrity check

There’s a load of other checks that could be employed. This is literally no different than securing the OS itself.

This is essentially a solved problem, but even then it’s impossible to make any system 100% secure. As the person you replied to said: “this is poor code”

Edit: just to add, failure for the system to boot should NEVER be the desired outcome. Especially when the party implementing that is a 3rd party service. The people who setup these servers are expecting them to operate for things to work. Nothing is gained from a non-booting critical system and literally EVERYTHING to lose. If it’s critical then it must be operational.

ChairmanMeow@programming.dev on 21 Jul 2024 19:24 collapse

The 3rd party service is AV. You do not want to boot a potentially compromised or insecure system that is unable to start its AV properly, and have it potentially access other critical systems. That’s a recipe for a perhaps more local but also more painful disaster. It makes sense that a critical enterprise system does not boot if something is off. No AV means the system is a security risk and should not boot and connect to other critical/sensitive systems, period.

These sorts of errors should be alleviated through backup systems and prevented by not auto-updating these sorts of systems.

Sure, for a personal PC I would not necessarily want a BSOD, I’d prefer if it just booted and alerted the user. But for enterprise servers? Best not.

CeeBee_Eh@lemmy.world on 24 Jul 2024 13:36 collapse

Sure, for a personal PC I would not necessarily want a BSOD, I’d prefer if it just booted and alerted the user. But for enterprise servers? Best not.

You have that backwards. I work as a dev and system admin for a medium sized company. You absolutely do not want any server to ever not boot. You absolutely want to know immediately that there’s an issue that needs to be addressed ASAP, but a loss of service generally means loss of revenue and, even worse, a loss of reputation. If your server is briefly at a lower protection level that’s not an issue unless you’re actively being targeted and attacked. But if that’s the case then getting notified of an issue can get some people to deal with it immediately.

ChairmanMeow@programming.dev on 24 Jul 2024 14:05 collapse

A single server not booting should not usually lead to a loss of service as you should always run some sort of redundancy.

I’m a dev for a medium-sized PSP that due to our customers does occasionally get targeted by malicious actors, including state actors. We build our services to be highly available, e.g. a server not booting would automatically do a failover to another one, and if that fails several alerts will go off so that the sysadmins can investigate.

Temporary loss of service does lead to reputational damage, but if contained most of our customers tend to be understanding. However, if a malicious actor could gain entry to our systems the damage could be incredibly severe (depending on what they manage to access of course), so much so that we prefer the service to stop rather than continue in a potentially compromised state. What’s worse: service disrupted for an hour or tons of personal data leaked?

Of course, your threat model might be different and a compromised server might not lead to severe damage. But Crowdstrike/Microsoft/whatever may not know that, and thus opt for the most “secure” option, which is to stop the boot process.

deadbeef79000@lemmy.nz on 20 Jul 2024 00:27 next collapse

Except “freak out” could have various manifestations.

In this case it was “burn down the venue”.

It should have been “I’m sorry, there’s been an issue, let’s move on to the next speaker”

tiramichu@lemm.ee on 20 Jul 2024 00:34 next collapse

You’re right of course and that should be on Microsoft to better implement their driver loading. But yes.

Morphit@feddit.uk on 20 Jul 2024 11:14 collapse

The driver is in kernel mode. If it crashes, the kernel has no idea if any internal structures have been left in an inconsistent state. If it doesn’t halt then it has the potential to cause all sorts of damage.

the_crotch@sh.itjust.works on 20 Jul 2024 00:52 next collapse

In this case it was “burn down the venue”.

It was more like “barricade the doors until a swat team sniper gets a clear shot at you”.

deadbeef79000@lemmy.nz on 20 Jul 2024 01:01 collapse

Hmmmm.

More like standing there and loudly shitting your pants and spreading it around the stage.

Strykker@programming.dev on 20 Jul 2024 02:37 next collapse

Except since it was an antivirus software the system is basically told “I must be running for you to finish booting”, which does make sense as it means the antivirus can watch the system before any malicious code can get its hooks into things.

Morphit@feddit.uk on 20 Jul 2024 11:09 collapse

I don’t think the kernel could continue like that. The driver runs in kernel mode and took a null pointer exception. The kernel can’t know how badly it’s been screwed by that, the only feasible option is to BSOD.

The driver itself is where the error handling should take place. First off it ought to have static checks to prove it can’t have trivial memory errors like this. Secondly, if a configuration file fails to load, it should make a determination about whether it’s safe to continue or halt the system to prevent a potential exploit. You know, instead of shitting its pants and letting Windows handle it.

Thann@lemmy.ml on 20 Jul 2024 14:26 next collapse

The envelope contains a barrel of diesel and a lit flare

OozingPositron@feddit.cl on 20 Jul 2024 18:51 collapse

Computers have social anxiety.

digdilem@lemmy.ml on 20 Jul 2024 07:47 next collapse

Nice analogy, except you’d check the script before you tried to use it. Computers are really good at crc/hash checking files to verify their integrity, and that’s exactly what a privileged process like antivirus should do with every source of information.
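A minimal Python sketch of the hash check described above, using only the standard library. It assumes the expected SHA-256 digest is published alongside the file by some trusted channel; `sha256_hex` and `verify` are illustrative helper names.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """True only if data hashes to the digest published alongside it."""
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(sha256_hex(data), expected_hex)
```

A check like this catches a file that was corrupted or swapped after the digest was computed. It would not catch a file that was already wrong when it was hashed.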

Cocodapuf@lemmy.world on 20 Jul 2024 10:56 next collapse

I’m nominating this for the “best metaphor of the day” award.

Well done!

JasonDJ@lemmy.zip on 20 Jul 2024 14:50 next collapse

The funny bit is, I’m sure more than a few people at Crowdstrike are preparing 3 envelopes right now.

crystalmerchant@lemmy.world on 20 Jul 2024 21:55 next collapse

This guy ELI5s

CeeBee_Eh@lemmy.world on 21 Jul 2024 13:21 collapse

Ah yes. So Windows is the screaming in terror version and other systems are the “oh, sorry everyone, looks like there’s an error. Let’s just move on to the next bit” version.

driving_crooner@lemmy.eco.br on 19 Jul 2024 18:31 next collapse

The file is used to store values to use as denominators on some divisions down the process. Being all zeros caused a division-by-zero error. Pretty rookie mistake, you should do IFERROR(;0) when using divisions to avoid that.

sugar_in_your_tea@sh.itjust.works on 19 Jul 2024 20:05 next collapse

I disagree. I’d rather things crash than silently succeed or change the computation. They should have done better input and output validation, and gracefully fail into a recoverable state that sends a message to an admin to correct. A divide by zero doesn’t crash a system, it’s a recoverable error they should 100% detect and handle, not sweep under the rug.
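Taking the parent comment's premise at face value, a small Python sketch of "detect and handle" as opposed to substituting 0 and carrying on. `InvalidInput`, `ratios`, and `run` are hypothetical names, and the "alert admin" string stands in for a real notification.

```python
class InvalidInput(Exception):
    """Raised when input data cannot be used safely."""

def ratios(numerators: list[float], denominators: list[float]) -> list[float]:
    """Divide pairwise, rejecting bad input up front instead of hiding it."""
    if len(numerators) != len(denominators):
        raise InvalidInput("length mismatch")
    bad = [i for i, d in enumerate(denominators) if d == 0]
    if bad:
        raise InvalidInput(f"zero denominator at index {bad[0]}")
    return [n / d for n, d in zip(numerators, denominators)]

def run(numerators: list[float], denominators: list[float]) -> str:
    """Fail into a recoverable state with a message an admin can act on."""
    try:
        return f"ok: {ratios(numerators, denominators)}"
    except InvalidInput as exc:
        return f"alert admin: {exc}"
```

The error is neither ignored nor turned into a made-up result; the computation stops and the cause is reported.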

driving_crooner@lemmy.eco.br on 19 Jul 2024 20:36 collapse

Life pro tip: if you’re a python programmer you should use try: func() except: continue every time you run a function, that way ypu would never have errors on your code.

sugar_in_your_tea@sh.itjust.works on 19 Jul 2024 22:17 next collapse

Lol.

CeeBee_Eh@lemmy.world on 21 Jul 2024 17:26 collapse

that way ypu would never have errors on your code.

🤔

Morphit@feddit.uk on 20 Jul 2024 11:40 collapse

IFERROR(;0)

Maybe they should use a more appropriate development tool for their critical security platform than Excel.

urquell@lemm.ee on 19 Jul 2024 18:36 next collapse

Well, the file shouldn’t be zeroes

lastjunkieonearth@lemdro.id on 20 Jul 2024 03:57 collapse

The front of the file fell off

LodeMike@lemmy.today on 19 Jul 2024 19:53 collapse

Windows

diffusive@lemmy.world on 19 Jul 2024 18:24 next collapse

If I had to bet my money, a bad machine with corrupted memory pushed the file at a very final stage of the release.

The astonishing fact is that for security software I would expect all files to be verified against a signature (that would have prevented this issue and some kinds of attacks).

jlh@lemmy.jlh.name on 19 Jul 2024 19:45 next collapse

Windows kernel drivers are signed by Microsoft. They must have rubber stamped this for this to go through, though.

diffusive@lemmy.world on 19 Jul 2024 19:49 next collapse

This was not the driver, it was a config file or something read by the driver. Now having a driver in kernel space depending on a config on a regular path is another fuck up

jlh@lemmy.jlh.name on 19 Jul 2024 20:12 collapse

isn’t .sys a driver?

Jakeroxs@sh.itjust.works on 19 Jul 2024 20:30 next collapse

Not just drivers, no fileinfo.com/extension/sys

Evilcoleslaw@lemmy.world on 20 Jul 2024 19:35 collapse

So yes, by convention on Windows, .sys is for a kernel mode driver. However, Crowdstrike specifically uses .sys for non-driver files and this specifically was not a driver.

PythagreousTitties@lemm.ee on 19 Jul 2024 20:31 collapse

What about the Mac and Linux PCs? Did Microsoft sign those too?

jlh@lemmy.jlh.name on 19 Jul 2024 21:05 next collapse

Not sure about Mac, but on Linux, they’re signed by the distro maintainer or with the computer’s secure boot key.

wiki.ubuntu.com/UEFI/SecureBoot

PythagreousTitties@lemm.ee on 19 Jul 2024 21:08 collapse

So… Microsoft couldn’t have “rubber-stamped” anything to do with the outage.

feannag@sh.itjust.works on 19 Jul 2024 21:15 collapse

The outage only affected the Windows version of Falcon. OSX and Linux were not affected.

PythagreousTitties@lemm.ee on 19 Jul 2024 23:59 collapse

This time. Last time it did affect Linux. It doesn’t have anything to do with Microsoft.
Sorry to burst your bubble.

Aatube@kbin.melroy.org on 20 Jul 2024 00:07 next collapse

what are you on about? who suggested anything about microsoft?

PythagreousTitties@lemm.ee on 20 Jul 2024 00:12 collapse

Windows kernel drivers are signed by Microsoft. They must have rubber stamped this for this to go through, though.

Try to keep up.

witx@lemmy.sdf.org on 20 Jul 2024 14:02 collapse

You look so kewl if I were a child again I’d speak just like you

PythagreousTitties@lemm.ee on 20 Jul 2024 17:35 collapse

Quoting the comment that started this thread is speaking like a child to you?

blind3rdeye@lemm.ee on 20 Jul 2024 08:02 next collapse

In this thread we’re talking about the recent problem with CrowdStrike on Windows that brought down various services around the world. So I don’t know whose bubble you think you’re bursting by talking about something else.

PythagreousTitties@lemm.ee on 20 Jul 2024 14:56 collapse

You people have a horrible time following threads.

[deleted] on 20 Jul 2024 14:00 collapse

Aatube@kbin.melroy.org on 19 Jul 2024 21:09 collapse

only the Windows version was affected

LodeMike@lemmy.today on 19 Jul 2024 19:52 next collapse

Which is still unacceptable.

BossDj@lemm.ee on 19 Jul 2024 22:44 next collapse

So here’s my uneducated question: Don’t huge software companies like this usually do updates in “rollouts” to a small portion of users (companies) at a time?

[deleted] on 19 Jul 2024 23:45 next collapse

Dashi@lemmy.world on 20 Jul 2024 01:09 next collapse

I mean yes, but one of the issues with “state of the art av” is they are trying to roll out updates faster than bad actors can push out code to exploit discovered vulnerabilities.

The code/config/software push may have worked on some test systems but MS is always changing things too.

madcaesar@lemmy.world on 20 Jul 2024 13:08 collapse

Someone else said this wasn’t a case of this breaks on windows system version XXX with update YYY on a Tuesday at 12:24 pm when clock is set to eastern standard time. It literally breaks on ANY windows machine, instantly, on boot. There is no excuse for this.

echodot@feddit.uk on 20 Jul 2024 10:19 next collapse

Companies don’t like to be beta testers. Apparently the solution is to just not test anything and call it production ready.

JasonDJ@lemmy.zip on 20 Jul 2024 14:46 collapse

Every company has a full-scale test environment. Some companies are just lucky enough to have a separate prod environment.

Norgoroth@lemmy.world on 20 Jul 2024 15:40 collapse

Peak programmer humor

JasonDJ@lemmy.zip on 20 Jul 2024 23:38 collapse

I’m a bit rusty. I’d give it a C++.

expr@programming.dev on 20 Jul 2024 12:20 next collapse

That’s certainly what we do in my workplace. Shocked that they don’t.

deegeese@sopuli.xyz on 20 Jul 2024 19:36 collapse

When I worked at a different enterprise IT company, we published updates like this to our customers and strongly recommended they all have a dedicated pool of canary machines to test the update in their own environment first.

I wonder if CRWD advised their customers to do the same, or soft-pedaled the practice because it’s an admission there could be bugs in the updates.

I know the suggestion of keeping a stage environment was off putting to smaller customers.

Angry_Autist@lemmy.world on 20 Jul 2024 14:08 collapse

From my experience it was more likely to be an accidental overwrite from human error with recent policy changes that removed vetting steps.

rozodru@lemmy.ca on 20 Jul 2024 15:27 collapse

this is what I suspect also. I mean it’s easy to point fingers at George Kurtz as he was CTO at McAfee when they had their “little” snafu but…well…yeah. I strongly suspect many of his “policies” he had while CTO at McAfee carried over to Crowdstrike. dude isn’t exactly known for being a fan of testing or vetting processes. in fact he’s all about quick development/crunch.

Angry_Autist@lemmy.world on 20 Jul 2024 16:41 collapse

Quick development will probably spell the end of the internet once AI code creation hits its stride. It’ll be like the most topheavy SCRUM you’ve ever seen with the devs literally incapable of disagreeing.

I was thinking about his stint at McAfee, and I think you’re right. My real question is: will the next company he golden parachutes off to learn the lesson?

I’m going to bet not.

cupcakezealot@lemmy.blahaj.zone on 19 Jul 2024 18:36 next collapse

have they ruled out any possibility of a man in the middle attack by a foreign actor?

db2@lemmy.world on 19 Jul 2024 19:21 next collapse

Or it being an intentional proof of concept

simplejack@lemmy.world on 19 Jul 2024 19:27 next collapse

This was not a cyberattack.

crowdstrike.com/…/statement-on-falcon-content-upd…

I guess they could be lying, but if they were lying, I don’t know if their argument of “we’re incompetent” is instilling more trust in them.

xavier666@lemm.ee on 19 Jul 2024 20:25 collapse

“We are confident that only our engineers can fuck up so much, instead of our competitors”

Kazumara@discuss.tchncs.de on 19 Jul 2024 20:55 next collapse

In the middle of the download path of all the machines that got the update?

planish@sh.itjust.works on 19 Jul 2024 21:54 next collapse

Foreign to who?

kyle@lemm.ee on 19 Jul 2024 21:59 collapse

“Foreign” in this context just means “not Crowdstrike”, not like a foreign government.

floofloof@lemmy.ca on 19 Jul 2024 22:36 collapse

The CEO made a statement to the effect of “It’s not an attack, it’s just me and my company being shockingly incompetent.” He didn’t use exactly those words but that was the gist.

independantiste@sh.itjust.works on 19 Jul 2024 18:41 next collapse

Every affected company should be extremely thankful that this was an accidental bug, because if crowdstrike gets hacked, it means the bad actors could basically ransom I don’t know how many millions of computers overnight

Not to mention that crowdstrike will now be a massive target for hackers trying to do exactly this

Miaou@jlai.lu on 19 Jul 2024 18:53 next collapse

I’d assume state (or other serious) actors already know about these companies.

Evotech@lemmy.world on 19 Jul 2024 19:02 next collapse

Don’t Google solar winds

planish@sh.itjust.works on 19 Jul 2024 21:53 next collapse

Holy hell

SomethingBurger@jlai.lu on 19 Jul 2024 23:16 collapse

New vulnerability just dropped

peopleproblems@lemmy.world on 19 Jul 2024 23:07 next collapse

Oooooooo this one again thank you for reminding me

floofloof@lemmy.ca on 20 Jul 2024 21:13 next collapse

That one turns out to have been largely Microsoft’s fault for repeatedly ignoring warnings of a severe vulnerability relating to Active Directory. Microsoft were warned about it, acknowledged it and ignored it for years until it got used in the Solar Winds hack.

[deleted] on 21 Jul 2024 01:01 collapse

qprimed@lemmy.ml on 19 Jul 2024 19:32 next collapse

security as a service is about to cost the world a pretty penny.

Telorand@reddthat.com on 19 Jul 2024 19:55 next collapse

You mean it’s going to cost corporations a pretty penny. Which means they’ll pass those “costs of operation” on to the rest of us. Fuck.

qprimed@lemmy.ml on 19 Jul 2024 20:17 next collapse

well, the world does include the rest of us.

and it’s not just operational costs. what happens when an outage lasts 3+ days and affects all communication and travel? that’s another massive shock to the system.

they come faster and faster.

figjam@midwest.social on 19 Jul 2024 23:07 next collapse

Either that or cyber insurance

zbyte64@awful.systems on 20 Jul 2024 08:18 collapse

You did not just fall out of a coconut tree. You exist in a context of all that came before you.

Manifish_Destiny@lemmy.world on 19 Jul 2024 20:41 next collapse

Where’s my fuckin raise

littlewonder@lemmy.world on 20 Jul 2024 15:29 collapse

All the more reason for companies to ignore security until they’re affected personally. The companies I’ve worked for barely ever invested in future cost-savings.

helpImTrappedOnline@lemmy.world on 20 Jul 2024 02:28 next collapse

I’ve got a feeling crowdstrike won’t be as grand a target anymore. They’re sure to lose a lot of clients…at least until they spin up a new name and erase all traces of “crowdstrike”.

echodot@feddit.uk on 20 Jul 2024 10:17 next collapse

That trick doesn’t work for B2B as organizations tend to do their research before buying. Consumers tend not to.

reddit_sux@lemmy.world on 20 Jul 2024 10:36 collapse

I don’t think they will lose any big clients. I am sure they will have insurance to take care of compensations.

echodot@feddit.uk on 20 Jul 2024 10:16 next collapse

On Monday I will once again be raising the point of not automatically updating software. Just because it’s being updated does not mean it’s better and does not mean we should be running it on production servers.

Of course they won’t listen to me but at least it’s been brought up.

shield_gengar@sh.itjust.works on 20 Jul 2024 10:49 next collapse

I thought it was a security definition download; as in, there’s nothing short of not connecting to the Internet that you can do about it.

echodot@feddit.uk on 20 Jul 2024 12:00 collapse

Well I haven’t looked into it for this piece of software but essentially you can prevent automatic updates from applying to the network. Usually because the network is behind a firewall that you can use to block the update until you decide that you like it.

Also a lot of companies recognize that businesses like to check updates and so have more streamlined ways of doing it. For instance Apple have a whole dedicated update system for iOS devices that only businesses have access to where you can decide you don’t want the latest iOS and it’s easy you just don’t enable it and it doesn’t happen.

Regardless of the method, what should happen is you should download the update to a few testing computers (preferably also physically isolated from the main network) and run some basic checks to see if it works. In this case the testing computers would have blue screened instantly, and you would have known that this is not an update that you want on your system. Although it usually requires a little more investigation to find problems.
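A short Python sketch of the canary idea in general terms: apply an update to a small test group first and stop if anything there fails. `staged_rollout`, the host names, and the `apply_update` callback are all hypothetical; this only illustrates the practice and does not describe how any real product delivers updates.

```python
from typing import Callable, Iterable

def staged_rollout(
    stages: Iterable[list[str]],
    apply_update: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Apply an update stage by stage; stop at the first stage with a failure.

    Returns (hosts that were updated successfully, whether rollout completed).
    """
    updated: list[str] = []
    for hosts in stages:
        results = [(h, apply_update(h)) for h in hosts]
        updated.extend(h for h, ok in results if ok)
        if not all(ok for _, ok in results):
            return updated, False  # halt: later stages never see the update
    return updated, True
```

With stages such as `[["canary-1", "canary-2"], ["prod-1", "prod-2", "prod-3"]]`, an update that fails on the canaries never reaches the production hosts.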

Angry_Autist@lemmy.world on 20 Jul 2024 14:06 collapse

It makes me so fuckdamn angry that people make this assumption.

This Crowdstrike update was NOT pausable. You cannot disable updates without disabling the service as they get fingerprint files nearly every day.

lando55@lemmy.world on 20 Jul 2024 15:06 collapse

I hear you, but there’s no reason to be angry.

When I first learned of the issue, my first thought was, “Hey our update policy doesn’t pull the latest sensor to production servers.” After a little more research I came to the same conclusion you did, aside from disconnecting from the internet there’s nothing we really could have done.

There will always be armchair quarterbacks, use this as an opportunity to teach, life’s too short to be upset about such things.

Angry_Autist@lemmy.world on 20 Jul 2024 15:13 collapse

It doesn’t help that I’m medically angry 80% of the time for mostly no reason, but even without that this would incense me because I’ve had 40+ users shouting similar uneducated BS at me yesterday thinking that it was personally my fault that 40% of the world bluescreened. No I am not exaggerating.

I have written and spoken the phrase ‘No, we could not prevent this update’ so many times in the last 24 hours that it has become meaningless to me through semantic satiation.

ToyDork@preserve.games on 21 Jul 2024 20:04 collapse

Take it from me, reality is a prison. If your issues are as bad as mine, escapism is the only solution and social media is the polar opposite of escapism. I’m not saying “do drugs”, I’m saying “threaten to quit, and if they call your bluff, make an untraceable alteration to fuck the company over and quietly hand in your resignation” before taking a break to indulge hobbies while searching for another job.

And if you can’t afford to lose your job at all for any length, you now have a morally-acceptable reason to kill everyone in your workplace because better death than slavery to a system this corrupt.

expr@programming.dev on 20 Jul 2024 12:16 collapse

Thank God someone else said it. I was constantly in an existential battle with IT at my last job when they were constantly forcing updates, many of which did actually break systems we rely on because Apple loves introducing breaking changes in OS updates (like completely fucking up how dynamic libraries work).

Updates should be vetted. It’s a pain in the ass to do because companies never provide an easy way to rollback, but this really should be standard practice.

echodot@feddit.uk on 21 Jul 2024 07:05 collapse

You can use AirWatch to deal with Apple devices. Although it is a clunky program it does at least give you the ability to roll things back.

Angry_Autist@lemmy.world on 20 Jul 2024 14:04 next collapse

This is why I openly advocate for a diverse ecosystem of services, so not everyone is affected if the biggest gets targeted.

But unfortunately, capitalism favors only the frontrunner and everyone else can go spin, and we aren’t getting rid of capitalism anytime soon.

So basically, it is inevitable that crowdstrike WILL be hacked, and the next time will be much much worse.

driving_crooner@lemmy.eco.br on 20 Jul 2024 17:35 next collapse

Years ago I read a study about insurance companies and diversification of assets in Brazil. By regulation, each individual insurance company needs to have a diversified investment portfolio, but the insurance market as a whole does not. Summed across every individual company, the insurance market as a whole was still exposed, and the researchers found, iirc, like 3 banks that, if they failed, could cause a chain reaction that would take out the entire insurance market.

Don’t know why, but your comment reminded me of that.

Angry_Autist@lemmy.world on 20 Jul 2024 17:52 collapse

That’s kind of fascinating, never considered what the results of that kind of regulation can bring without anyone even noticing it at the time. Thanks for a good reading topic for lunch!

Cryophilia@lemmy.world on 20 Jul 2024 17:57 collapse

Properly regulated capitalism breaks up monopolies so new players can enter the market. What you’re seeing is dysfunctional capitalism - an economy of monopolies.

Angry_Autist@lemmy.world on 20 Jul 2024 19:20 collapse

Sorry no, capitalism is working exactly as intended. Concentration of wealth breaks regulation with unlimited political donations.

You call it unregulated, but that is the natural trend for when the only acceptable goal is the greater accumulation of wealth. There comes a time when that wealth is financially best spent buying politicians.

Until there are inherent mechanisms within capitalism to prevent special interest money from pushing policy and direct regulatory capture, capitalism will ALWAYS trend to deregulation.

Cryophilia@lemmy.world on 20 Jul 2024 20:40 next collapse

You call it unregulated, but that is the natural trend for when the only acceptable goal is the greater accumulation of wealth.

Yes…obviously.

And that IS dysfunctional capitalism.

Until there are inherent mechanisms within capitalism to prevent special interest money from pushing policy and direct regulatory capture

That’s exactly what I’m saying, dude.

This is NOT capitalism working as intended. This is broken capitalism. Runaway capitalism. Corrupt capitalism.

hglman@lemmy.ml on 21 Jul 2024 04:05 collapse

It’s like saying we just need good kings; no, it’s a bad system. Any capitalist system will devolve into corruption and monopoly. No regulations can survive the unavoidable regulatory capture and corruption.

Cryophilia@lemmy.world on 21 Jul 2024 04:50 collapse

No system is perfect. All systems require some form of keeping power from accruing to the few.

hglman@lemmy.ml on 21 Jul 2024 13:08 collapse

Yes, very insightful.

Aatube@kbin.melroy.org on 20 Jul 2024 21:22 next collapse

would you like an introduction to the almighty red rose?

Angry_Autist@lemmy.world on 20 Jul 2024 21:27 collapse

I know you are trying to be clever but I’m not really in a clever mood rn.

barsoap@lemm.ee on 20 Jul 2024 21:32 next collapse

You call it unregulated, but that is the natural trend for when the only acceptable goal is the greater accumulation of wealth.

Nah unregulated is the exact right word and that isn’t the kind of neolib you’re out for. Those would use “free” instead of unregulated, deliberately confusing unregulated markets with the theoretical model of the free market which allocates resources perfectly – if everyone is perfectly rational and acts on perfect information. Which obviously is not the case in the real world because real-world.

There’s a strain of liberalism which is pretty much the cornerstone of Europe’s economic model, is also generally compatible with socdem approaches, and it says precisely that regulation should be used to bring the real-world market closer to that theoretical ideal – they’re of course not going all-out, you’d need to do stuff like outlaw trade secrets to actually do that, have all advertisement done by an equitable and accountable committee and shit. But by and large regulation does take the edge off capitalism. If you want to see actually unregulated capitalism, have a look at Mexican cartels. Rule of thumb: If you see some market failure, regulate it away. Like make producers of cereal pay for the disposal costs of the packaging they use and suddenly they have an interest in making that packaging more sensible, can’t externalise the cost any more.

Defeating capitalism ultimately is another fight altogether, it’s nothing less than defeating greed – as in not the acquisition of things, but getting addicted to the process of acquisition: The trouble isn’t that people want shit the problem is that they aren’t satisfied once they’ve got what they wanted. Humanity is going to take some more time to learn to not do that, culturally, (and before tankies come along nah look at how corrupt all those ML states were and are same problem different coat of paint), in the meantime regulation, rule of law, democracy, even representative democracy, checks and balances, all that stuff, is indeed a good idea.

gremllin@lemmy.world on 20 Jul 2024 22:26 collapse

If you start regulating capitalism, that’s called something else. That would be saying that the markets cannot regulate themselves, which would expose one of the basics of capitalism as a myth.

So I, as well, think capitalism is working as intended, and it sure is based on greed.

Aatube@kbin.melroy.org on 20 Jul 2024 22:37 collapse

Something else, as in what? As long as the means of production is privately owned for profit, it's capitalism.

billwashere@lemmy.world on 20 Jul 2024 17:39 next collapse

Third parties being able to push updates to production machines without being tested first is a giant red flag for me. We’re human … we fuck up. I understand that. But that’s why you test things first.

I don’t trust myself without double checking, so why would we trust a third party so completely?

[deleted] on 21 Jul 2024 20:16 collapse

Pika@sh.itjust.works on 21 Jul 2024 20:17 collapse

Yeah the fact that this company calls it a feature that they can push an update anytime without site level intervention is scary to me. If they ever did get compromised, boom, every device running their program suddenly has kernel level malware essentially overnight.

EleventhHour@lemmy.world on 19 Jul 2024 19:11 next collapse

d'00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000!

jj4211@lemmy.world on 20 Jul 2024 01:56 next collapse

Damn it, your comment just crashed the rest of the computers that were still up.

VintageTech@sh.itjust.works on 20 Jul 2024 08:53 collapse

thank you for the visual representation ☺️

some_guy@lemmy.sdf.org on 19 Jul 2024 23:19 next collapse

If it had been all ones this could have been avoided.

jj4211@lemmy.world on 20 Jul 2024 01:54 collapse

Just needed to add 42k of ones to balance the data. Everyone knows that, like tires, you need to balance your data.

Killing_Spark@feddit.de on 20 Jul 2024 04:13 next collapse

Apply the ones in a star shape to distribute pressure evenly

rmuk@feddit.uk on 20 Jul 2024 10:08 next collapse

I mean, joking aside, isn’t that how parity calculations used to work? “Got more uppy bits than downy bits - that’s a paddlin’” or something.
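Roughly, yes. A minimal sketch of an even-parity check, where the parity bit records whether the count of 1 bits is odd. The function names are made up for illustration. Note that an all-zero block has a parity bit of 0, so simple parity alone would not flag a file of nothing but zeroes.

```python
def even_parity_bit(data: bytes) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    ones = sum(bin(b).count("1") for b in data)
    return ones % 2


def check_even_parity(data: bytes, parity_bit: int) -> bool:
    """True if the stored parity bit matches the data."""
    return even_parity_bit(data) == parity_bit


payload = b"\x07"                      # three 1 bits, so the parity bit is 1
stored = even_parity_bit(payload)
print(check_even_parity(payload, stored))   # intact data
print(check_even_parity(b"\x06", stored))   # one bit flipped in transit
print(even_parity_bit(bytes(42 * 1024)))    # 42 KB of zeroes
```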

echodot@feddit.uk on 20 Jul 2024 10:14 collapse

Assuming they were all calculations, which they won’t have been.

We will probably never know for sure, because the company will never actually release a postmortem, but I suspect that the file was essentially just treated as unreadable, and didn’t actually do anything. The problem will have been that important bits of code, that should have been in there, now no longer existed.

You would have thought they’d do some testing before releasing an update wouldn’t you. I’m sure their software developers have a bright future at Boeing ahead of them. Although in fairness to them, this will almost certainly have been a management decision.

werefreeatlast@lemmy.world on 20 Jul 2024 16:00 next collapse

I once had a very noisy computer… It was the disk! I just balanced the data and all problems went away.

xavier666@lemm.ee on 20 Jul 2024 21:37 collapse

Perfectly balanced

tofubl@discuss.tchncs.de on 20 Jul 2024 02:38 next collapse

This file compresses so well. 🤏
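For anyone who wants to see how well, a small sketch using Python's standard `zlib` module on 42 KB of null bytes. The file size is taken from the post; the exact compressed size depends on the zlib version and level, so none is quoted here.

```python
import zlib

blank = bytes(42 * 1024)        # 43,008 zero bytes, like the file in the post
packed = zlib.compress(blank)   # default compression level

print(len(blank), "->", len(packed), "bytes")

# Decompressing gives back exactly the same 42 KB of nothing.
assert zlib.decompress(packed) == blank
```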

bruhduh@lemmy.world on 20 Jul 2024 04:38 collapse

filister@lemmy.world on 20 Jul 2024 13:02 next collapse

Imagine the world if those companies were using Atomic distribution and the only thing you would need to do is to boot the previous good image.

JasonDJ@lemmy.zip on 20 Jul 2024 14:37 collapse

ohno_anyway.png

cheers_queers@lemm.ee on 20 Jul 2024 13:45 next collapse

school districts were also affected… at least mine was.

Semi_Hemi_Demigod@lemmy.world on 20 Jul 2024 21:46 collapse

I can’t imagine how much worse this would have been for global GDP if schools had to be closed for it.

Socsa@sh.itjust.works on 20 Jul 2024 18:52 next collapse

The fact that a single bad file can cause a kernel panic like this tells you everything you need to know about using this kind of integrated security product. Crowdstrike is apparently a rootkit, and windows apparently has zero execution integrity.

BeardedGingerWonder@feddit.uk on 20 Jul 2024 21:05 next collapse

Does anything running on an x86 processor from the last decade?

areyouevenreal@lemm.ee on 20 Jul 2024 21:56 next collapse

Yeah pretty much all security products need kernel level access unfortunately. The Linux ones including crowdstrike and also the Open Source tools SELinux and AppArmor all need some kind of kernel module in order to work.

uis@lemm.ee on 20 Jul 2024 22:08 next collapse

At least SELinux doesn’t crash on bad config file

areyouevenreal@lemm.ee on 20 Jul 2024 22:22 collapse

I am not praising crowdstrike here. They fucked up big time. I am saying that the concept of security software needing kernel access isn’t that unheard of, and is unfortunately necessary for a reason. There is only so much a security thing can do without that kernel level access.

reddithalation@sopuli.xyz on 20 Jul 2024 23:58 collapse

crowdstrike has caused issues like this with linux systems in the past, but sounds like they have now moved to eBPF user mode by default (I don’t know enough about low level linux to understand that though haha), and it now can’t crash the whole computer. source

areyouevenreal@lemm.ee on 21 Jul 2024 00:27 collapse

As explained in that source, eBPF code is still running in kernel space. The difference is it’s not Turing complete and has protections in place to make sure it can’t do anything too nasty. That being said I am sure you could still break something like networking or critical services on the system by applying the wrong eBPF code. It’s on the authors of the software to make sure they thoroughly test and review their software prior to release if it’s designed to work with the kernel, especially in enterprise environments. I am glad this is something they are doing though.

whoisearth@lemmy.ca on 20 Jul 2024 22:19 next collapse

I mean are we surprised by any of this?

GroundedGator@lemmy.world on 20 Jul 2024 22:41 next collapse

I don’t remember much about my OS courses from 20 years back, but I do recall something about walls between user space and kernel space. The fact that an update from the Internet could enter kernel space is insane to me.

Aatube@kbin.melroy.org on 20 Jul 2024 23:14 collapse

You mean you don’t update your kernel?

GroundedGator@lemmy.world on 20 Jul 2024 23:40 collapse

I do, but I do it on my terms when I know it is stable. I don’t allow anyone to push updates to my system.

Aatube@kbin.melroy.org on 20 Jul 2024 23:41 collapse

Agreed. Point is, I’m pretty sure programs in kernel space can still read stuff in user space, which can be easily updated.

OutsizedWalrus@lemmy.world on 20 Jul 2024 23:43 next collapse

I’m not sure why you think this statement is so profound.

CrowdStrike is expected to have kernel level access to operate correctly. Kernel level exceptions cause these types of errors.

Windows handles exceptions just fine when code is run in user space.

This is how nearly all computers operate.

hglman@lemmy.ml on 21 Jul 2024 00:17 next collapse

Tell me you don’t understand what it is by telling me you don’t understand what Crowdstrike does.

phx@lemmy.ca on 21 Jul 2024 00:34 next collapse

Security products of this nature need to be tight with the kernel in order to actually be effective (and prevent actual rootkits).

That said, the old mantra of “with great power” comes to mind…

arin@lemmy.world on 21 Jul 2024 04:57 collapse

with great power, don’t lay off the testing team (force return to office or get fired ultimatums)

tzrlk@lemmy.world on 21 Jul 2024 22:05 collapse

It’s fine, they’ve just switched to a crowd-sourced testing strategy.

Dark_Arc@social.packetloss.gg on 21 Jul 2024 04:18 collapse

This is a pretty hot take. A single bad file can topple pretty much any operating system depending on what the file is. That’s part of why it’s important to be able to detect file corruption in a mission critical system.

dgriffith@aussie.zone on 21 Jul 2024 09:17 collapse

This was a binary configuration file of some sort though?

Something along the lines of:

IF (config.parameter.read == garbage) {
     Dont_panic;
}

Would have helped greatly here.

Edit: oh it’s more like an unsigned binary blob that gets downloaded and directly executed. What could possibly go wrong with that approach?
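The "don't panic on garbage" idea from this comment could be sketched as below. This is only an illustration: the real file format is not described anywhere in this thread, so the `MAGIC` header, `InvalidConfig` and the function names are all hypothetical.

```python
MAGIC = b"CFG1"  # hypothetical header; the real file format is not known here


class InvalidConfig(Exception):
    """Raised when a downloaded file fails basic sanity checks."""


def load_config(blob: bytes) -> bytes:
    """Return the payload after the header, or raise InvalidConfig."""
    if not blob:
        raise InvalidConfig("empty file")
    if blob.count(0) == len(blob):
        raise InvalidConfig("file is nothing but null bytes")
    if not blob.startswith(MAGIC):
        raise InvalidConfig("unexpected header")
    return blob[len(MAGIC):]


def load_or_keep_previous(blob: bytes, previous: bytes) -> bytes:
    """Fall back to the last known-good payload instead of crashing."""
    try:
        return load_config(blob)
    except InvalidConfig:
        return previous


print(load_or_keep_previous(bytes(42 * 1024), b"old rules"))
print(load_or_keep_previous(MAGIC + b"new rules", b"old rules"))
```

The design choice is that a bad file is rejected and the previous known-good one stays in place, rather than the bad one being parsed at all.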

Aatube@kbin.melroy.org on 21 Jul 2024 16:05 collapse

We agree, but they were responding to “windows apparently has zero execution integrity”.

yokonzo@lemmy.world on 21 Jul 2024 05:55 next collapse

I’m not a dev, but don’t they have like a/b updates or at least test their updates in a sandbox before releasing them?

thermal_shock@lemmy.world on 21 Jul 2024 06:50 next collapse

one would think. apparently the world is their sandbox.

kalleboo@lemmy.world on 21 Jul 2024 13:11 collapse

It could have been the release process itself that was bugged. The actual update that was supposed to go out was tested and worked, then the upload was corrupted/failed. They need to add tests on the actual released version instead of a local copy.
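A rough sketch of what "test the actual released version" could look like: record a SHA-256 digest of the artifact that passed testing, then compare it against what the download server actually serves. The function names and sample bytes are made up for illustration, and whether anything like this applies to CrowdStrike's pipeline is pure speculation.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an artifact."""
    return hashlib.sha256(data).hexdigest()


def published_matches(tested_digest: str, published: bytes) -> bool:
    """True only if the published bytes are identical to what was tested."""
    return digest(published) == tested_digest


tested = b"contents of the build that passed testing"
tested_digest = digest(tested)            # recorded at test time

good_upload = tested                      # upload went fine
bad_upload = bytes(len(tested))           # upload replaced with zeroes

print(published_matches(tested_digest, good_upload))
print(published_matches(tested_digest, bad_upload))
```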

FiniteBanjo@lemmy.today on 21 Jul 2024 17:29 collapse

Could also be that the Windows versions they tested on weren’t as problematic as the updated drivers around the time they released.

[deleted] on 21 Jul 2024 06:21 collapse

Pika@sh.itjust.works on 21 Jul 2024 17:42 collapse

oh they’ll take it as a lesson all right, up until they get the quote to fix it. Suddenly the downtime becomes a non-issue as long as it “doesn’t happen again”