We finally know what caused the global tech outage - and how much it cost (www.cnn.com)
from WhatsHerBucket@lemmy.world to technology@lemmy.world on 24 Jul 2024 16:55
https://lemmy.world/post/17913749

#technology

Varyk@sh.itjust.works on 24 Jul 2024 17:27

And the stockades?

Any word on the stockades?

Bishma@discuss.tchncs.de on 24 Jul 2024 17:36

George Kurtz has only crashed the world twice so he has one strike to go, I guess.

c0smokram3r@midwest.social on 24 Jul 2024 17:41

Wowowow! This is insane! 😨🤯

Blue_Morpho@lemmy.world on 24 Jul 2024 17:58

You can only fail upwards at the executive level. He went from CTO to CEO on his last global crash. What’s next? Running for President?

No risk, All rewards.

OsaErisXero@kbin.run on 24 Jul 2024 18:26

Please no

dditty@lemm.ee on 24 Jul 2024 17:46

$5.4 Bn so far, not including lost worker productivity or damage to brand reputations, so that’s a very conservative estimate. And cybersecurity insurance will supposedly only cover up to 20% of that (but good luck getting even that much). What a clusterf***

Empricorn@feddit.nl on 24 Jul 2024 17:59

And that $5,400,000,000 loss estimate is only Fortune 500 companies!

11111one11111@lemmy.world on 25 Jul 2024 18:23

No, it’s all of them, because all the companies outside the 500 combined wouldn’t have a large enough net worth to move the needle. So technically they may not be included, but they’d be covered by whatever amount was rounded up to make an even 5.4b.

Empricorn@feddit.nl on 26 Jul 2024 01:18

All the CrowdStrike companies on earth minus the 500 biggest (American) ones? I have a hard time believing it’s as insignificant as you assume. I guess we’ll see…

flatlined@lemmy.dbzer0.com on 26 Jul 2024 10:48

It’s a variation on the old saw: “What’s the difference between a million and a billion? About a billion.” Once numbers get that big, it’s hard to grasp their relative sizes. That said, I’m also interested in a more comprehensive breakdown: who was impacted, by how much, and where.

11111one11111@lemmy.world on 27 Jul 2024 01:44

100% correct. I wasn’t implying that I knew the figures just that the size of the Fortune 500 is used as an economic index for this reason.

0x0@programming.dev on 24 Jul 2024 17:53

On Wednesday, CrowdStrike released a report outlining the initial results of its investigation into the incident, which involved a file that helps CrowdStrike’s security platform look for signs of malicious hacking on customer devices.

The company routinely tests its software updates before pushing them out to customers, CrowdStrike said in the report. But on July 19, a bug in CrowdStrike’s cloud-based testing system — specifically, the part that runs validation checks on new updates prior to release — ended up allowing the software to be pushed out “despite containing problematic content data.”

…

When Windows devices using CrowdStrike’s cybersecurity tools tried to access the flawed file, it caused an “out-of-bounds memory read” that “could not be gracefully handled, resulting in a Windows operating system crash,” CrowdStrike said.

Couldn’t it, though? 🤔

And CrowdStrike said it also plans to move to a staggered approach to releasing content updates so that not everyone receives the same update at once, and to give customers more fine-grained control over when the updates are installed.

I thought they were already supposed to be doing this?

Plopp@lemmy.world on 25 Jul 2024 06:30

Couldn’t it, though? 🤔

IANAD and AFAIU, not in kernel mode. Things like trying to read nonexistent memory in kernel mode are supposed to crash the system, because continuing could be worse.

0x0@programming.dev on 25 Jul 2024 16:32

I meant, couldn’t they test for a NULL pointer?

chaospatterns@lemmy.world on 25 Jul 2024 18:26

They could, and clearly they should have, but hindsight is 20/20. Software is complex, and there are a lot of places where invalid data could come in.
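For illustration only: a minimal sketch of the kind of defensive check being discussed, written in Python rather than kernel C. It assumes a hypothetical content record modeled as a list of fields and a hypothetical `read_field` helper. Since the report quoted above describes an out-of-bounds read rather than a NULL dereference, the sketch checks the index against the record length and returns `None` instead of reading past the end.

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return fields[index], or None if index is outside the record.

    Hypothetical helper: the caller treats None as "skip this record"
    instead of crashing. The explicit range check mirrors what C code
    must do by hand; it also stops Python's negative indexing from
    silently wrapping around to the end of the list.
    """
    if index < 0 or index >= len(fields):
        return None
    return fields[index]


record = ["a", "b", "c"]  # hypothetical record with three fields
print(read_field(record, 2))  # c
print(read_field(record, 3))  # None: one past the end, rejected
```

C does not bounds-check array reads, so in kernel code the equivalent comparison has to be written explicitly before every indexed access to untrusted data.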

whatwhatwhatwhat@lemmy.world on 25 Jul 2024 15:18

The fact that they weren’t already doing staggered releases is mind-boggling. I work for a company with a minuscule fraction of CrowdStrike’s user base / value, and even we do staggered releases.

foggenbooty@lemmy.world on 25 Jul 2024 19:26

They do have staggered releases, but it’s a bit more complicated. The client that you run does have versioning, and you can choose to lag behind the current build, but this was a bad definition update. Most people want the latest definitions to protect themselves from zero-days. The whole thing is complicated and a bit wonky, but the real issue here is CrowdStrike’s kernel driver not validating the content of the definition before loading it.
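A rough sketch of validating a definition update before loading it, under loudly stated assumptions: the update is modeled as a list of records, every record is assumed to need exactly `EXPECTED_FIELDS` fields, and `validate_definitions` and `load_update` are hypothetical names, not anything from the actual sensor. A malformed update is rejected as a whole, and the previously loaded definitions stay active.

```python
from typing import List

Record = List[str]

# Hypothetical schema: every record in a definition update carries 3 fields.
EXPECTED_FIELDS = 3


def validate_definitions(records: List[Record]) -> bool:
    """Accept an update only if it is non-empty and every record matches the schema."""
    return len(records) > 0 and all(len(r) == EXPECTED_FIELDS for r in records)


def load_update(current: List[Record], update: List[Record]) -> List[Record]:
    """Swap in the update if it validates; otherwise keep the current definitions."""
    if validate_definitions(update):
        return update
    return current


good = [["x", "y", "z"]]
bad = [["x", "y"]]  # one field short
active = load_update([], good)
active = load_update(active, bad)
print(active)  # [['x', 'y', 'z']]
```

Falling back to the last known-good definitions trades a few hours of slightly stale protection for not crashing the host, which is the same security-versus-stability trade-off raised in the reply below.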

whatwhatwhatwhat@lemmy.world on 26 Jul 2024 05:36

Makes sense that it was a definitions update that caused this, and I get why that’s not something you’d want to lag behind on like you could with the agent. (Putting aside that one of the selling points of next-gen AV/EDR tools is that they’re less reliant on definitions updates compared to traditional AV.) It’s just a bit wild that there isn’t more testing in place.

It’s like we’re always walking this fine line between “security at all costs” vs “stability, convenience, etc”. By pushing definitions as quickly as possible, you improve security, but you’re taking some level of risk too. In some alternate universe, CS didn’t push definitions quickly enough, and a bunch of companies got hit with a zero-day. I’d say it’s an impossible situation sometimes, but if I had to choose between outage or data breach, I’m choosing outage every time.

cheddar@programming.dev on 26 Jul 2024 06:15

The company routinely tests its software updates before pushing them out to customers, CrowdStrike said in the report. But on July 19, a bug in CrowdStrike’s cloud-based testing system — specifically, the part that runs validation checks on new updates prior to release — ended up allowing the software to be pushed out “despite containing problematic content data.”

It is time to write tests for tests!

Passerby6497@lemmy.world on 26 Jul 2024 11:48

My thought is to have a set of machines that must run the update for a while. If any single machine fails, further rollout halts; only if all of them pass does it move forward.
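One way to sketch that kind of gate, assuming hypothetical `install` and `healthy` hooks supplied by the caller and machines grouped into waves with canaries first. The "run it for a while" soak period is left out; the point is only that every machine in a wave has to pass before the next wave starts.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    waves: Sequence[Sequence[str]],
    install: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Install an update wave by wave; stop if any machine in a wave fails.

    Returns the machines that received the update.
    """
    updated: List[str] = []
    for wave in waves:
        for machine in wave:
            install(machine)
            updated.append(machine)
        # Gate: every machine in this wave must pass before the next wave starts.
        if not all(healthy(m) for m in wave):
            break
    return updated


waves = [["canary-1", "canary-2"], ["prod-1", "prod-2", "prod-3"]]
broken = {"canary-2"}  # pretend this canary crashes after the update
done = staged_rollout(waves, lambda m: None, lambda m: m not in broken)
print(done)  # ['canary-1', 'canary-2']
```

A real system would also need a timeout for machines that never report back at all, which is exactly what a host stuck in a boot loop looks like from the outside.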

AA5B@lemmy.world on 26 Jul 2024 11:09

a bug in CrowdStrike’s cloud-based testing system

Always blame the tests. There are so many dark patterns in this industry, including blaming QA for being the last group to touch a release, that I never believe “it’s the tests”.

There’s usually something more systemic going on: maybe project management and developers missed it, maybe they had a blind spot and assumed it would never happen, maybe there was a lack of communication or planning, maybe they outsourced testing to the cheapest offshore provider, or maybe everyone was under huge time pressure. But the answer is always “it’s the tests”.

Ok, maybe I’m not impartial, but when I’m doing a root cause on how something like this got out, my employer expects a better answer than “it’s the tests”

aStonedSanta@lemm.ee on 26 Jul 2024 11:47

There was probably one dude at CrowdStrike going, “Uh, hey guys???” 😆

Imgonnatrythis@sh.itjust.works on 24 Jul 2024 20:59

“CrowdStrike said it also plans to move to a staggered approach to releasing content updates so that not everyone receives the same update at once, and to give customers more fine-grained control over when the updates are installed.”

Hol up. So they like still get to exist? Microsoft and affected industries just gonna kinda move past this?

BakerBagel@midwest.social on 24 Jul 2024 21:08

Haven’t seen anything from the affected major players. Obviously Crowdstrike isn’t going to say they’re fucked long term; they have to act like this is just a little hiccup and move on. Lawsuits are absolutely incoming.

Ledivin@lemmy.world on 24 Jul 2024 23:30

We’ll see how fucked they are from SLA breaches/etc., and then we’ll see how many companies jump ship to an alternative. We won’t have the real fallout from this event for months or years.

LodeMike@lemmy.today on 25 Jul 2024 07:23

Companies using CrowdStrike and Windows aren’t really the type to be active about this sort of thing.

11111one11111@lemmy.world on 25 Jul 2024 18:19

What do you mean by this?

LodeMike@lemmy.today on 25 Jul 2024 18:57

The companies who use CrowdStrike (lazy fix) on Windows (garbage OS) aren’t really the type to want to switch away from it (will take effort)

best_username_ever@sh.itjust.works on 26 Jul 2024 10:49

I don’t understand the downvotes. You’re right on all points. If the task is too big, it can take years from testing another solution to using it for real.

Modern_medicine_isnt@lemmy.world on 25 Jul 2024 17:01

Newsflash, Solarwinds still exists too. Not sure I could name a company that screwed up so big and actually paid the price.

Imgonnatrythis@sh.itjust.works on 25 Jul 2024 17:14

Yeah, what was I thinking. United Airlines was bankrupt and literally beating people up on their planes, and it still got taxpayer payouts and is still around today paying investors dividends.

TheLimiter@lemmy.world on 26 Jul 2024 11:21

Two days ago my company sent out an all hands email that we’re going company wide with Crowdstrike.

JasonDJ@lemmy.zip on 26 Jul 2024 11:54

Now’s the time to sign up. They’ll slash prices and hopefully never fuck up this bad again.

Have we had an XaaS fuck up real, real bad twice yet?

JasonDJ@lemmy.zip on 26 Jul 2024 11:55

I wasn’t affected, but I bet a lot of admins, as pissed as they were, were thinking “I could easily fuck up this bad or worse”.

jeeva@lemmy.world on 26 Jul 2024 16:51

Yeah, what’s the jokey parable thing?

A CTO is at lunch when a call comes in. There’s been a huge outage, caused by a low level employee pressing the wrong button.
“Damn, you going to fire that guy?”
“Hell no, do you know how much I just spent on training him to never do that again?”

(</Blah>)

essteeyou@lemmy.world on 25 Jul 2024 00:14

Oh, finally, I have been waiting for so long.

Semi_Hemi_Demigod@lemmy.world on 25 Jul 2024 19:38

For the rest of history this sort of thing will mention Crowdstrike, or it might even be called a “crowdstrike.”

You can’t buy that kind of marketing

Wispy2891@lemmy.world on 26 Jul 2024 06:34

This CrowdStrike stuff seems like an expensive subscription.

I saw a lot of photos of crashed ad screens.

Why the hell are corps paying this much money for Windows + CrowdStrike on a glorified digital picture frame?? Wouldn’t it be 100x cheaper to do it with some embedded hardware instead of a full desktop computer running a full desktop OS???

sugar_in_your_tea@sh.itjust.works on 26 Jul 2024 13:46

Yeah, an RPi or similar with a screen would be more than plenty for this, and the Pi Zero is really small. Connect that to a central Linux server with a hot backup or two (through local DNS) and you’ll have a hard time crashing it.
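The comment suggests handling failover through local DNS; as a swapped-in alternative, the same idea can be sketched on the client side. `fetch_with_failover`, `fake_fetch`, and the `.lan` host names are hypothetical, and `fetch` stands for any function that raises `OSError` (or a subclass such as `ConnectionError`) when a server is unreachable.

```python
from typing import Callable, Optional, Sequence


def fetch_with_failover(
    servers: Sequence[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each server in order; return the first successful response, or None."""
    for server in servers:
        try:
            return fetch(server)
        except OSError:
            continue  # unreachable: fall through to the next backup
    return None


def fake_fetch(server: str) -> bytes:
    # Stand-in for a real network call: the primary is "down".
    if server == "signage-primary.lan":
        raise ConnectionError("primary down")
    return b"slide-1.png"


print(fetch_with_failover(["signage-primary.lan", "signage-backup.lan"], fake_fetch))
# b'slide-1.png'
```

Doing the failover in the client keeps it independent of DNS caching, at the cost of every display needing the full server list.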

unexpectedteapot@lemmy.ml on 26 Jul 2024 10:07

Do we actually know? We might know that Crowdstrike was the cause, but we don’t actually know what went wrong or how it happened. It is unfree, proprietary, closed-source software, so we just have to take their word for it, which for all intents and purposes is PR, given that it comes from a profit-driven organisation.

lightsblinken@lemmy.world on 26 Jul 2024 14:27

this is exactly the question that needs answering… the PIR is bullshit

riodoro1@lemmy.world on 26 Jul 2024 10:22

Ok. Can we get a solar storm next? I want Linux servers out this time too.

sugar_in_your_tea@sh.itjust.works on 26 Jul 2024 13:44

Best I can do is an xz vuln where half the Linux servers go down for maintenance.

JasonDJ@lemmy.zip on 26 Jul 2024 11:37

Pretty soon we are gonna have to start deciding if it’s safer for enterprise computers to run without AV or AMP.

bigFab@lemmy.world on 26 Jul 2024 20:27

Beautiful