BallShapedMan@lemmy.world
on 09 Aug 10:31
A lady I can’t stand who is condescending to everyone like the worst version of a grade school teacher resigned recently so she can market her GPT “models” full time.
Last week she used “Salem”, her GPT prompt BS, to read through a detailed document my department put together to see what was missing. She shared her screen with obvious GPT slop and started pointing out all the things we didn’t answer. The lady on my team who put the very specific and detailed document together just started reading the answers right from the document, showing it was clearly and precisely answered. The lady I can’t stand stopped sharing her screen and quit talking.
The moral of the story: GPT did me a solid by convincing this person, who’s clearly never heard of the Dunning-Kruger effect, that she needs to quit her well-paying job and stop being a pain in my ass.
Thank you GPT!! Best thing it’s ever done for me.
jubilationtcornpone@sh.itjust.works
on 09 Aug 11:49
Holy shit. I think you just found a valid use for LLMs.
OpenAI valuation intensifies
Software developer, here. (No, not a “vibe coder.” I actually know how to read and write my own code and what it does.)
Just had the opportunity to test GPT 5 as a coding assistant in Copilot for VS Code, which in my opinion is the only legitimately useful purpose for LLMs. (No, not to write everything for me, just to do some of the more tedious tasks faster.) The IDE itself can help keep them in line, because it detects when they screw up. Which is all the time, due to their nature. Even recent and relatively “good” models like Sonnet need constant babysitting.
GPT 5 failed spectacularly. So badly, in fact, that I’m glad I only set it to analysis tasks and not to any write tasks. I will not be using it for anything else any time soon.
Yeah right? I tried it yesterday to build a simple form for me. Told it to look at the structure of other forms for reference, which it did, and somehow it used NONE of the UI components and helpers from the other forms. It was bafflingly bad.
Despite the “official” coding score for GPT5 being higher, Claude Sonnet still seems to blow it out of the water. That seems to suggest they are training to the test and the test must not be a very good test. Or they are lying.
They’d never be lying! Look at these beautiful graphs from their presentation of GPT5. They’d never!
<img alt="" src="https://feddit.org/pictrs/image/a67bd6d6-bae3-43ee-8b06-191593f60059.png">
<img alt="" src="https://feddit.org/pictrs/image/672d9601-b41a-49d8-b08e-7735a88e5178.jpeg">
Source: theverge.com/…/openai-gpt-5-vibe-graphing-chart-c…
Wut…did GPT5 evaluate itself?
Now that we have vibe coding and all programmers have been sacked, they’re apparently trying out vibe presenting and vibe graphing. Management watch out, you’re obviously next!
Problem with the “benchmarks” is Goodhart’s Law: once a measure becomes a target, it ceases to be a good measure.
The AI companies’ obsession with these tests causes them to maniacally train on them, making them better at those tests, but that doesn’t necessarily map to actual real-world usefulness. Occasionally you’ll see a guy who interviews well but is pretty useless on the job. LLMs are basically that all the time, but at least useful because they are cheap and fast enough to be worth it for the super easy bits.
Passerby6497@lemmy.world
on 09 Aug 12:31
Yeah, LLMs are decent with coding tasks if you know what you’re doing and can properly guide them (and check their work!), but fuck if they don’t take a lot of effort to rein in. I will say they’re pretty damned good at debugging the shit I wrote. I’ve been working on an audit project for a few months and 4o/5 have helped me a good bit to find persistent errors in my execution logic that I just kept missing on rereads and debug runs.
But generating new code is painful. I had 5 generate a new function for me yesterday to do some issues recon and report generation, and I spent 20 minutes going back and forth with it dropping fields in the output repeatedly. Even on 5, it still struggles to not give you the same wrong answer more than once, or just waffles between wrong answers.
Dude, forgetting stuff has to be one of the most frustrating parts of the entire process. Like forgetting a column in a database or just an entire piece of a function you just pasted in… Or trying to change things you never asked it to touch. So freaking annoying. I had standing instructions in its memory to not leave out pieces or modify things I didn’t ask for, and will put that stuff in the prompt, and it just does not care lol.
I’ve used it a lot for coding because I’m not a real programmer (more a code hacker) and need to get things done for a website, but I know just enough to know it’s really stupid sometimes lol.
Dude, forgetting stuff has to be one of the most frustrating parts of the entire process. Like forgetting a column in a database or just an entire piece of a function you just pasted in
It was actually worse. I was pulling data out of local logs and processing events. I asked it to assess a couple columns that I was struggling to parse properly, and it got those ones in, but dropped some of my existing columns. I pointed out the error, it acknowledged the issue, then spat out code that reverted to the first output!
Though, that wasn’t nearly as bad as it telling me that a variable a couple hundred lines and multiple transformations in wasn’t being populated by an earlier variable, and I literally went in and just copied each declaration line and sent it back, like I was smacking an intern on the nose or something…
For a bot designed to read and analyze text, it is surprisingly bad at the whole ‘reading’ aspect. But maybe that’s just how human-like the intelligence is /s
Or trying to change things you never asked it to touch. So freaking annoying. I had standing instructions in its memory to not leave out pieces or modify things I didn’t ask for, and will put that stuff in the prompt, and it just does not care lol
OMFG this. I’ve had decent luck recently after setting up a project and explicitly laying out a number of global directives, because yeah, it was awful trying to figure out exactly what changed when I diff the input and output, and fucking everything is red because even the goddamned comments are changed. But even just trying to make it understand basic style requirements was a solid half hour of arguing with it (only partially because I forgot the proper names of casings) so it wouldn’t make me lint the whole goddamned script I just told it to analyze and fix one item.
Yessir I’ve basically run into all of that. It’s fucking infuriating. It really is like talking to a toddler at times. There seems to be a limit to the complexity of what it can process before it just starts messing everything up. Like once you hit its limit, it will not process the entire thing no matter how many times you fix it together like your example. You fix one problem and then it just forgets a different piece. FFFFFFFFFF.
Badabinski@kbin.earth
on 09 Aug 16:40
Out of curiosity, do you feel that you would have been able to write that new function without an LLM in less time than you spent fighting GPT5?
I could definitely write it, but probably not as fast, even with fighting it. The report I got in 25-30 minutes would normally take me closer to 45-60, with having to research what to analyze, figure out how to parse the different formats of logs, break them up and collate them, and give a pretty output.
LLMs are decent with coding tasks if you know what you’re doing
Only if the thing you are trying to do is commonly used and well documented, but in that case you could just read the documentation instead and learn a thing yourself, right?
The other day I tried to get some instructions on how to do something specific in a rather obscure and rather opaquely documented CLI tool that I need for work. I couldn’t quite make sense of the documentation, and I found the program’s behavior a bit erratic, so that’s why I turned to AI. It cheerfully and confidently told me (I’m paraphrasing): oh, to do “this specific thing” you have to use the --something-specific switch, and then it gave some command line examples using that switch that looked like they made complete sense.
So I thought: oh, did I overlook that switch? Could it be that easy? So I looked in the documentation and sure enough… the AI had been bullshitting me and that switch didn’t exist.
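A cheap sanity check for exactly that failure mode is to ask the tool itself whether the suggested flag shows up in its own help text before trying it. A minimal sketch below; the tool name and flag are placeholders, and it assumes the tool prints usable --help output to stdout.
```python
import subprocess

# Ask the tool itself whether a suggested flag appears in its --help output
# before trusting it. Tool and flag are placeholders; some tools print help
# to stderr, so a real check might look there too.
def flag_exists(tool: str, flag: str) -> bool:
    result = subprocess.run([tool, "--help"], capture_output=True, text=True)
    return flag in result.stdout

if __name__ == "__main__":
    print(flag_exists("python3", "--version"))  # True on a typical install
```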
Then there was the time when I asked it to generate an ARM template (again, poorly documented bullshit) to create some service in Azure with some specific parameters. It gave me something that looked like an ARM template, but sure as hell wasn’t a valid one. This one wasn’t completely useless though, at least I was able to cross reference with an existing template and with some trial-and-error, I was able to copy over some of the elements that I needed.
ThePowerOfGeek@lemmy.world
on 09 Aug 14:11
I’m no longer even confident in modern LLMs to do stuff like convert a table schema or JSON document into a POCO. I tried this the other day with a field list from a table creation script. So all it had to do was reformat the fields into a dumb C# model. Inexplicably, it did fine except for omitting a random field in the middle of the list. Kinda shakes your confidence in LLMs for even the most basic programming tasks.
More and more, for tasks like that I simply will not use an LLM at all. I’ll use a nice, predictable, deterministic script. Weirdly, LLMs are pretty decent at writing those.
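For the curious, that kind of deterministic script really is tiny. Here’s a rough sketch, with an invented class name, sample fields, and a simplified type map, that turns a flat JSON document into a C# POCO; unlike an LLM, it can’t silently drop a field in the middle of the list.
```python
import json

# Very small, very deterministic: turn a flat JSON document into a C# POCO.
# The type map is simplified and the sample fields are invented; a real script
# would also handle nulls, nested objects, and lists.
CSHARP_TYPES = {bool: "bool", int: "int", float: "double", str: "string"}

def to_pascal_case(name: str) -> str:
    return "".join(part.capitalize() for part in name.split("_"))

def json_to_poco(class_name: str, document: str) -> str:
    fields = json.loads(document)
    lines = [f"public class {class_name}", "{"]
    for key, value in fields.items():  # insertion order is preserved, nothing gets dropped
        csharp_type = CSHARP_TYPES.get(type(value), "object")
        lines.append(f"    public {csharp_type} {to_pascal_case(key)} {{ get; set; }}")
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = '{"order_id": 42, "customer_name": "Ada", "total": 19.99, "paid": true}'
    print(json_to_poco("Order", sample))
```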
brucethemoose@lemmy.world
on 09 Aug 14:21
Have you given Qwen or GLM 4.5 a shot?
Not yet. I’ll give them a shot if they promise never to say “you’re absolutely correct” or give me un-requested summaries about how awesome they are in the middle of an unfinished task.
Actually, I have to give GPT 5 credit on one thing: It’s actually sort of paying attention to the copilot-instructions.md file, because I put this snippet in it: “You don’t celebrate half-finished features, and your summaries of what you’ve accomplished are not only rare, they’re never more than five sentences long. You just get straight to the point.” And - surprise, surprise - it has strictly followed that instruction.
Fucks up everything else, though.
LiamMayfair@lemmy.sdf.org
on 09 Aug 14:21
I tried GPT-5 to write some code the other day and was quite unimpressed with how lazy it is. For every single thing, it needed nudging. I’m going back to Sonnet and Gemini. And even so, you’re right. As it stands, LLMs are useful at refactoring and writing boilerplate and repetitive code, which does save time. But they’re definitely shit at actually solving non-trivial problems in code and designing and planning implementation at a high level.
They’re basically a better IntelliSense and automated refactoring tool, but I wouldn’t trust them with proper software engineering tasks. All this vibe coding and especially agentic development bullshit that people (mainly uneducated users and the AI vendors themselves) are shilling these days, I’m not going anywhere near.
I work in a professional software development team in a business that is pushing the AI coding stuff really hard. So many of my coworkers use agentic development tools routinely now to do most (if not all) of their work for them. And guess what, every other PR that goes in, random features that had been built and working are removed entirely, so then we have to do extra work to literally build things again that had been ripped out by one of these AI agents. smh
In a similar situation. I’m even an AI proponent. I think it’s a great tool when used properly. I’ve had great success solving basically trivial problems with small scripts. And code review is helpful. Code completion is helpful. It makes me faster, but you have to know when and how to leverage it.
Even on tasks it isn’t good at, it often helps me frame my own thoughts. It can identify issues better than it can fix them. So if I say here is the current architecture, what is the best way to implement <feature> and explain why, it will give a plan. It may not be a great plan, but as it explains it, I can easily identify the stuff it has wrong. Sometimes it’s close to a workable plan. Other times it’s not. Other times it will confidently lead you down a rabbit hole. That’s the real time waster.
“Why won’t the context load for this unit test?”
You’re missing this annotation.
“Yeah that didn’t do it. What else.”
You need this plugin.
“Yeah it’s already there.”
You need this other annotation.
“Okay that got a different error message.”
You need another annotation
“That didn’t work either. You don’t actually know what the problem is do you?”
Sad computer beeps.
To just take the output and run with it is inviting disaster. It’ll bite you every time and the harder the code the worse it performs.
very_well_lost@lemmy.world
on 09 Aug 17:40
This has been my experience as well, only the company I work for has mandated that we must use AI tools every day (regardless of whether we want/need them) and is actively tracking our usage to make sure we comply.
My productivity has plummeted. The tool we use (Cursor) requires so much hand-holding that it’s like having a student dev with me at all times… only a real student would actually absorb information and learn over time, unlike this glorified Markov Chain. If I had a human junior dev, they could be a productive and semi-competent coder in 6 months. But 6 months from now, the LLM is still going to be making all of the same mistakes it is now.
It’s gotten to the point where I ask the LLM to solve a problem for me just so that I can hit the required usage metrics, but completely ignore its output. And it makes me die a little bit inside every time I consider how much water/energy I’m wasting for literally zero benefit.
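As an aside on the “glorified Markov Chain” jab: the comparison is to something like the toy sketch below, a word-level Markov chain that picks the next word purely from bigram counts seen once during “training” and never learns anything afterwards. The corpus and names here are made up for illustration; it is not a claim about how any particular model is built.
```python
import random
from collections import defaultdict

# Toy word-level Markov chain: the next word is picked purely from bigram
# counts gathered once, and nothing is ever learned after that.
def train(corpus: str) -> dict:
    words = corpus.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table: dict, start: str, length: int = 10) -> str:
    word, out = start, [start]
    for _ in range(length):
        options = table.get(word)
        if not options:
            break
        word = random.choice(options)
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    chain = train("the model drops the field then the model forgets the field again")
    print(generate(chain, "the"))
```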
That sounds horrific. Maybe you can ask the AI to write a plugin that automatically invokes the AI in the background and throws away the result.
We are strongly encouraged to use the tools, and copilot review is automatic, but that’s it. I’m actually about to accept a leadership position in another AI heavy company and hopefully I can leverage that position to guide a sensible AI policy.
But at the heart of it, I need curious minds that want to learn. Give me those and I can build a strong team with or without AI. Without them, all the AI in the world won’t help.
Steve@startrek.website
on 09 Aug 14:38
Not 5 minutes ago I asked gpt5 how to go back to gpt-4o.
GPT5 was spitting out some strange bs for simple coding prompts that 4o handles well.
jjjalljs@ttrpg.network
on 09 Aug 23:08
Even when it gets it right, you have to then check it carefully. It feels like a net loss of speed most of the time. Reading and checking someone else’s code is harder than writing your own
TheFogan@programming.dev
on 10 Aug 01:54
Have to agree on that. There’s the variation: it’s faster if you take its code verbatim, run it, and debug where there are obvious problems… but then you are vulnerable to unobvious problems, where a hacky way of doing it is weak to certain edge cases… and there’s no real way around that.
Reading its code, understanding it, and finding the problems at the core sounds as time consuming as writing the code.
On code completion, I think it can do like 2 or 3 lines in particular scenarios. You have to have an instinct for “are the next three lines so blatantly obvious it is actually worth reading the suggestion, or do I just ignore it because I know it’s going to screw up without even looking”.
Very very very rarely do I find prompt-driven coding to be useful, mostly for things that are very boilerplate but also very tedious. Like “allow the user to specify these three parameters in this CLI utility”, and poof, you get reasonable argv handling pretty reliably.
Rule of thumb is if a viable answer could be expected during an interview by a random junior code applicant, it’s worth giving the llm a shot. If it’s something that a junior developer could get right after learning on the job a bit, then forget it, the LLM will be useless.
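That argv boilerplate case is roughly the shape of the sketch below; the option names and the utility itself are invented for illustration, not taken from the comment.
```python
import argparse

# Roughly the argv boilerplate described above: three options for a
# hypothetical little CLI utility (all names are invented).
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Collate logs into a report.")
    parser.add_argument("--input", required=True, help="path to the log file to read")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="output format for the report")
    parser.add_argument("--verbose", action="store_true", help="print progress details")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.format, args.verbose)
```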
Just had the opportunity to test GPT 5 as a coding assistant in Copilot for VS Code, which in my opinion is the only legitimately useful purpose for LLMs.
The best use of LLMs, sadly, is on social media to spread disinformation. At that point making shit up isn’t a bug but a feature.
For coding I am still not sold on it. It seems to excel at tasks that have been done millions of times, like programming assignments at school, example/tutorial code, and interview questions.
For me, while it helps in some cases, I still have to go over the code and understand it, and very often it introduces subtle bugs, or I can write more concise code that fits my need. In those cases all the advantages it provided are nullified, and I suspect it might actually be slowing me down.
It feels to me that LLMs are a godsend to all the coders who previously copied code from Stack Overflow. It greatly streamlined the process and also included all the code published on GitHub.
Somehow the chatGPT US map generated a clean, correctly spelled French word for “Mondays” instead of Illinois.
Sorry, Illinois, Garfield hates your guts.
So is SamA.
galoisghost@aussie.zone
on 09 Aug 12:26
Such a punchable face
SugarCatDestroyer@lemmy.world
on 09 Aug 14:52
There is a certain hypocrisy in this face… Or rather, it can be felt.
brucethemoose@lemmy.world
on 09 Aug 14:21
ChatGPT Is Still a Bullshit Machine
Just like Sam Altman.
I will add that they aren’t even tackling basic issues like the randomness of sampling; all OpenAI models still need a highish temperature. It’s like the poster child of corporate scaling vs actually innovating.
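For anyone unfamiliar with the temperature knob being referenced, here is a toy illustration of how temperature reshapes a next-token distribution before sampling: lower values sharpen it toward the top choice, higher values keep more randomness. The tokens and logit values are invented and say nothing about any real model.
```python
import math
import random

# Toy temperature-scaled sampling over an invented next-token distribution.
def sample(logits: dict, temperature: float) -> str:
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    weights = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating point fallback

if __name__ == "__main__":
    logits = {"Vermont": 2.0, "Vermount": 0.5, "Vermond": 0.1}
    for t in (0.2, 1.0, 2.0):
        picks = [sample(logits, t) for _ in range(1000)]
        print(f"temperature {t}: 'Vermont' picked {picks.count('Vermont') / 10:.1f}% of the time")
```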
misterdoctor@lemmy.world
on 09 Aug 15:03
As a friend and AI defender recently put it
This feels like an oxymoron
A_norny_mousse@feddit.org
on 09 Aug 15:18
Sam Altman Is Still a Bullshit Machine
Just like all billionaire tech bros: totally removed from reality, living in their own bubbles of unimaginable wealth and loneliness. They convinced themselves that the constant grift is real and call it a utopia… please, someone take their dangerous toys (and most of their wealth) away. And since the government certainly isn’t going to step in
I ran the tests with the thinking model. It got them right. For these kinds of tasks, choosing the thinking model is key.
very_well_lost@lemmy.world
on 09 Aug 17:45
the thinking model
Ugh… can we all just stop for a moment to acknowledge how obnoxious this branding is? They’ve already corrupted the term “AI” to the point of being completely meaningless, are they going to remove all meaning from the word “thinking” now too?
They’ve already corrupted the term “AI” to the point of being completely meaningless
Did they? Afaik, LLMs are an application of AI that falls under natural language processing. It’s like calling a rhombus a geometric shape because that’s what it is. And this usage goes back decades to, for example, A* pathfinding algorithms and hard-coded decision trees for NPCs.
E: Downvotes for what, disagreeing? At least point out why I’m wrong if you think you know better.
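For context on that older usage of “AI”, the hard-coded NPC logic mentioned above is nothing more exotic than a tiny decision tree like the sketch below; the thresholds and actions are invented for illustration.
```python
# Hard-coded NPC "AI": a tiny decision tree, no learning involved.
def npc_action(health: int, player_distance: float, has_ammo: bool) -> str:
    if health < 20:
        return "flee"
    if player_distance < 5:
        return "melee_attack"
    if has_ammo:
        return "shoot"
    return "search_for_ammo"

if __name__ == "__main__":
    print(npc_action(health=80, player_distance=3.0, has_ammo=True))    # melee_attack
    print(npc_action(health=10, player_distance=30.0, has_ammo=False))  # flee
```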
I think the problem stems from how LLMs are marketed to, and perceived by the public. They are not marketed as: this is a specific application of this-or-that AI or ML technology. They are marketed as “WE HAVE AI NOW!”, and the general public who is not familiar with AI/ML technologies equates this to AGI, because that’s what they know from the movies. The promotional imagery that some of these companies put out, with humanoid robots that look like they came straight out of Ex Machina doesn’t help either.
And sure enough, upon first contact, an LLM looks like a duck and quacks like a duck … so people assume it is a duck, but they don’t realize that it’s a cardboard model of a duck with a tape recorder inside that plays back quacking sounds.
Reverendender@sh.itjust.works
on 09 Aug 16:44
I tried ChatGPT-5 for the first time this morning. I asked it to help me create an RSS list for some world news articles. For context I have never used RSS before. That was 90 minutes of my life I’ll never get back. Also, you can no longer choose any other models except for 5 and the 5-thinking model. Access to 4o is gone.
PalmTreeIsBestTree@lemmy.world
on 09 Aug 22:58
What did it do exactly? I haven’t kept up with AI because I’d rather not engage with it.
Reverendender@sh.itjust.works
on 10 Aug 15:22
It was consistently wrong about how to set up a group feed on every single website it suggested. I started out trying to work on my iPad, and it kept telling me where to find things, and it was wrong, and how to do something, and it was wrong, and it kept giving me supposed desktop instructions even after I told it I was on an iPad (an iPad is somehow different, it claimed). So I went to the desktop, and it was exactly the same as on the iPad, as I knew it would be, meaning its instructions were just wrong. When I called it out and asked it, on its final recommendation, to be absolutely sure that the process it was going to tell me was correct and up to date, and to read the website first before it gave me the information, it lied to me and said it was and that it would check first, and then didn’t do what it said it was going to do. Plus about 70% of the RSS links it gave me were bad. It was just a really frustrating experience. Essentially, going from version 4o to version 5 was a step backwards in AI evolution from what I can tell.
DeathToUS@lemmy.world
on 09 Aug 18:04
Just use DeepAI. I never bothered with ChatGPT because it’s so primitive; same goes for Gemini.
Ghyste@sh.itjust.works
on 09 Aug 19:23
Always has been.
NotASharkInAManSuit@lemmy.world
on 10 Aug 02:38
That’s because they’re not trying to make AI, they are just programming LLMs to be bullshit confirmation bias machines. They don’t want to create intelligence, they want to create unregulated revenue streams.
Nothing to see here, just another “token based LLMs can’t count letters in words” post
That’s what I thought but there’s slightly more than that.
The writer tried to trick ChatGPT 5, saying Vermont has no R in it. ChatGPT did say “wait, it does”. But then when pushed it said, “oh right there is no R in Vermont”.
I mean… the inability to know what it knows or not is a real problem for most use cases…
Yeah, the fact you can “gaslight” a chat is just as much a symptom of the problem as the usual mistakes. It shows that it doesn’t deal with facts, but with structurally sound content, which is correlated with facts, especially when context/RAG stuffs the prompt using more traditional approaches that actually do tend to get more factual material crammed in.
To all the people white knighting for the LLM, for the thousandth time: we know that it is useful, but its usefulness is only tenuously connected to the marketing reality. Making a mistake counting letters is less important than the fact that it “acts” like it can count when it can’t.
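The letter count itself is, of course, trivially deterministic; the gap the commenters are pointing at is that the model can’t tell when it should just defer to something like this:
```python
# The check the model waffled over, done deterministically.
state = "Vermont"
print(f"'r' appears {state.lower().count('r')} time(s) in {state}")  # 1
```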
some_guy@lemmy.sdf.org
on 10 Aug 19:44
Sam Altman
If you believe that Sam Altman is full of shit and the whole AI hype machine is a bubble (regardless of any real-world uses that do exist) built on lies about where this specific form of the technology can actually go, congratulations. You might enjoy listening to or reading the work of Ed Zitron. He has a podcast and a newsletter and he’s been pointing this out for over a year, among other topics.
ExLisper@lemmy.curiana.net
on 10 Aug 21:56
One thing I didn’t expect in AI evolution was all the “safety” features. Basically, since people are too stupid to use LLMs without falling in love or poisoning themselves, all the models have more and more guardrails, making them way less useful than normal search. I think it was fairly clear from the beginning that LLMs will always be bullshit machines, but I didn’t think they would become less and less useful bullshit machines.