Our first outage from LLM-written code (sketch.dev)
from skip0110@lemmy.zip to programming@programming.dev on 01 Aug 2025 02:00
https://lemmy.zip/post/45246325

cross-posted from: lemmy.bestiver.se/post/526967

Comments

skip0110@lemmy.zip on 01 Aug 2025 02:02

Why are we using refactoring tools that can’t parse the comments and the code syntactically?

spartanatreyu@programming.dev on 01 Aug 2025 03:53

The first problem is they’re letting AI touch their code.

The second problem is they’re relying on a human to pick up changes in moved code while using git’s built-in diff tools. There’s a whole bunch of studies that show how git’s diff algorithms are terrible, and how swapping to newer diff algos improves things considerably.

TL;DR on the studies:

- Only supporting add/remove/move operations is really bad.
- Adding syntax awareness, to understand whether differences in indentation should be brought to a reviewer’s attention, improves code and makes code reviews more accurate. (But this is hard because it’s language dependent.)
- Adding extra operations (indent/deindent/move/rename-symbol/comment/un-comment/etc…) makes code review easier, faster and more accurate. (But again, most of this requires syntax awareness.)
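
To make the "move" operation concrete, here is a rough Python sketch using only the standard library's difflib. It is not how git or any of the tools discussed here work. The function names (`changed_blocks`, `edits_inside_moves`), the 0.6 similarity threshold, and the sample `OLD`/`NEW` files are all made up for illustration. The idea: a plain line diff reports a moved function as one big delete plus one big insert, so a one-word edit inside it is easy to miss. Pairing each removed block with its most similar added block and diffing the two against each other surfaces only the lines that changed during the move.

```python
import difflib


def _normalize(block):
    """Strip indentation and drop blank lines so pure re-indents do not count."""
    return [line.strip() for line in block if line.strip()]


def changed_blocks(old, new):
    """Return (removed_blocks, added_blocks) from a plain line-based diff."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(old[i1:i2])
        if tag in ("insert", "replace"):
            added.append(new[j1:j2])
    return removed, added


def edits_inside_moves(old, new, threshold=0.6):
    """Report (before, after) line groups that differ inside likely-moved blocks.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    removed, added = changed_blocks(old, new)
    findings = []
    for block in removed:
        rem = _normalize(block)
        best, best_ratio = None, 0.0
        for candidate in added:
            cand = _normalize(candidate)
            ratio = difflib.SequenceMatcher(a=rem, b=cand, autojunk=False).ratio()
            if ratio > best_ratio:
                best, best_ratio = cand, ratio
        # Skip blocks with no plausible partner, and moves with no inner edits.
        if best is None or best_ratio < threshold or best_ratio == 1.0:
            continue
        inner = difflib.SequenceMatcher(a=rem, b=best, autojunk=False)
        for tag, i1, i2, j1, j2 in inner.get_opcodes():
            if tag != "equal":
                findings.append((rem[i1:i2], best[j1:j2]))
    return findings


# Hypothetical sample: drain() moves below report() and one keyword changes.
OLD = [
    "def drain(queue):",
    "    for job in queue:",
    "        if job.failed:",
    "            break",
    "        job.run()",
    "",
    "def report(jobs):",
    "    total = len(jobs)",
    "    failed = sum(j.failed for j in jobs)",
    "    print(total, failed)",
    "    return failed",
]
NEW = OLD[6:] + [""] + OLD[:3] + ["            continue"] + OLD[4:5]

if __name__ == "__main__":
    print(edits_inside_moves(OLD, NEW))  # [(['break'], ['continue'])]
```

A reviewer shown only that one pair has far less to scan than a reviewer shown eleven red and green lines. Real tools would need syntax awareness to do this reliably, which is the hard, language-dependent part mentioned above.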

There’s also a bunch of alternative diff algos you can use, but the best ones are paid, and the free ones have fewer features. See:

deegeese@lemmy.dbzer0.com on 01 Aug 2025 02:08

I gasped when I saw this:

“A bit of discussion indicated that the trigger for the CPU spikes both times was our CEO logging in. We re-deployed to get a clean start, permanently banned him from the service, and moved on.”

This is like finding a live grenade under your bed and putting it under the rug.

They found a way to reproduce a system-killing bug, and instead of taking the time to understand it, they threw away their test case.

BlazeDaley@lemmy.world on 01 Aug 2025 03:20

They contained the impact. Root-causing or “understanding” should come after impact mitigation. If needed, find a safe way to reproduce the bug without customer impact.

“We reverted the refactoring, deployed, un-banned the CEO, and set about analysis.”

FizzyOrange@programming.dev on 01 Aug 2025 06:48

Yeah, me too, but if you keep reading, they didn’t actually “move on” in the way that it sounds.

vhstape@lemmy.sdf.org on 01 Aug 2025 03:03

Well done. More and more companies are deploying LLM-written code in production environments. Might as well be honest about the results so we can learn what does and doesn’t work.

bookmeat@lemmynsfw.com on 01 Aug 2025 05:25

It’s obvious that the LLM didn’t understand the code at all. It chose to refactor the way it did because of a silly comment.

Awkwardparticle@programming.dev on 01 Aug 2025 21:11

It’s an inference model. It does not understand code, no matter how much context it has. It can, however, output the most probable solution based on the context it has.