rizzothesmall@sh.itjust.works
on 12 Jul 17:10
nextcollapse
Ai-only vibe coders. As a development manager I can tell you that AI-augmented actual developers who know how to write software and what good and bad code looks like are unquestionably faster. GitHub Copilot makes creating a suite of unit tests and documentation for a class take very little time.
sturlabragason@lemmy.world
on 12 Jul 17:44
nextcollapse
I read the article (not the study, only the abstract) and they were getting paid an hourly rate. It did not mention anything about whether or not they had experience in using LLMs to code. I feel there is a sweet spot; it has to do with context window size etc.
I was not consistently better a year and a half ago, but now I know the limits, caveats and methods.
I think this is a very difficult thing to quantify, but haters gonna latch on to this, same as the studies that said “ai makes you stupid” and “llms cant reason”… it’s a cool tool that has limits.
One interesting feature in this paper is that the programmers who used LLMs thought they were faster, they estimated it was saving about 20% of the time it would have taken without LLMs. I think that’s a clear sign that you shouldn’t trust your gut about how much time LLMs save you, you should definitely try to measure it.
sturlabragason@lemmy.world
on 12 Jul 19:55
collapse
The study did find a correlation between prior experience and performance. One of the developers who showed a positive speedup with AI was the one with the most previous experience using Cursor (over 50 hours).
rizzothesmall@sh.itjust.works
on 12 Jul 17:49
nextcollapse
I did, thank you. Terms therein like “they spend more time prompting the AI” genuinely do not apply to a code copilot, like the one provided by GitHub, because it infers its prompt based on what you’re doing and the context of the file and application and creates an autocomplete based on its chat completion, which you can accept or ignore like any autocomplete.
You can start writing test templates and it will fill them out for you, and then write the next tests based on the inputs of your methods and the imports in the test class.
You can write a whole class without any copilot usage and then start writing the xmldocs and it will autocomplete them for you based on work you already did.
Try it for yourself if you haven’t already, it’s pretty useful.
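For illustration, a hedged Python/pytest sketch of that workflow (the slugify function and the test rows are made up): you write the structure and the first case, and an autocomplete-style assistant tends to suggest the remaining, similar-looking cases, which you accept or reject.

```python
# Hypothetical sketch: the first row is written by hand; the commented rows are
# the kind of completions a copilot-style tool tends to propose next.
import pytest

def slugify(title: str) -> str:
    """Toy function under test, standing in for your own code."""
    return "-".join(title.lower().split())

@pytest.mark.parametrize(
    ("title", "expected"),
    [
        ("Hello World", "hello-world"),          # written by hand
        ("  Leading spaces", "leading-spaces"),  # the kind of rows an
        ("Already-slugged", "already-slugged"),  # autocomplete might offer
    ],
)
def test_slugify(title: str, expected: str) -> None:
    assert slugify(title) == expected
```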
We do not provide evidence that: AI systems do not currently speed up many or most software developers. We do not claim that our developers or repositories represent a majority or plurality of software development work.
The research shows that under their tested scenario and assumptions, devs were less productive.
The takeaway from this study is to measure and benchmark what’s important to your team. However, many development teams have been doing that, albeit not in a formal study format, and finding AI improves productivity. It is not (only) “vibe productivity”.
And certainly I agree with the person you replied to: anecdotally, AI makes my devs more productive by cutting out the most grindy parts, like writing mocks for tests or getting that last missing coverage corner. So we have some measuring and validation to do.
The research explicitly showed that the anecdotes were flawed, and that actual measured productivity was the inverse of what the users imagined. That’s the entire point. You’re just saying “nuh uh, muh anecdotes.”
I said it needs to be measured. But few teams are going to do that, they’re building products not case studies.
This study is catnip for the people who put “AI” in scare quotes and expect those of us who use it to suddenly realize that we’ve only been generating hallucination slop. This has not been the lived experience of those of us in software development. In my own case I’ve seen teams stop hiring because they are getting the same amount of work done in less time. But those are anecdotes, so it doesn’t count.
We do measure metrics around sprint productivity, code quality, reworks, and more. However, we do not build case studies for clickbait articles. This isn’t a productive conversation as you just want to ridicule those using AI. Blocked.
monkeyman512@lemmy.world
on 12 Jul 17:52
nextcollapse
It would be interesting to see another study focusing on cognitive load. Maybe the AI lets you offload some amount of thinking so you can reserve that energy for things it’s bad at. But I could see how that would potentially be a wash, as you need to clearly specify your requirements in the prompt, which is a different cognitive load.
I seem to recall a separate study showing that it just encouraged users to think more lazily, not more critically.
daniskarma@lemmy.dbzer0.com
on 12 Jul 18:00
nextcollapse
The study was centered on bugfixing large established projects. This task is not really the one that AI helpers excel at.
Also, the small number of participants (16), the fact that the participants were familiar with the code base, and that all the tasks seem to be fairly short in completion time can skew the results.
Thus the divergence between the study results and many people’s personal experience: they would see an increase in productivity because they are doing different tasks in a different scenario.
6nk06@sh.itjust.works
on 12 Jul 18:45
nextcollapse
The study was centered on bugfixing large established projects. This task is not really the one that AI helpers excel at.
“AI is good for Hello World projects written in javascript.”
Managers will still fire real engineers though.
daniskarma@lemmy.dbzer0.com
on 12 Jul 18:50
collapse
I find it more useful doing large language transformations and delving into unknown patterns, languages or environments.
If I know a source head to toe, and I’m proficient with that environment, it’s going to offer little help. Especially if it’s a highly specialized problem.
Since the SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here, performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make it work on isolated files, or code manually and rely on the auto complete.
justastranger@sh.itjust.works
on 13 Jul 11:57
collapse
So far I’ve only found it useful when describing bite-sized tasks in order to get suggestions on which functions are useful from the library/API I’m using. And only when those functions have documentation available on the Internet.
Call me crazy but I think developers should understand what they’re working on, and using LLM tools doesn’t provide a shortcut there.
daniskarma@lemmy.dbzer0.com
on 12 Jul 19:07
collapse
You have to get familiar with the codebase at some point. When you are unfamiliar, in my experience, LLMs can provide help understanding it: copy large portions of code you don’t really understand and ask for an analysis and explanation.
Not so long ago I used it on assembly code. It would have taken ages to decipher what it was doing by myself. The AI sped up the process.
But once you are very familiar with an established project you have worked a lot with, I don’t even bother asking LLMs anything, as in my experience, I come up with better answers quicker.
At the end of the day we must understand that an LLM is more or less a statistical autocomplete trained on a large dataset. If your solution is not in the dataset, the thing is not really going to come up with a creative solution. And the thing is not going to run a debugger on your code either, afaik.
When I use it the question I ask myself the most before bothering is “is the solution likely to be on the training dataset?” or “is it a task that can be solved as a language problem?”
resipsaloquitur@lemmy.world
on 12 Jul 18:35
nextcollapse
Someone told me the best use of AI was writing unit tests and I died on the inside.
FizzyOrange@programming.dev
on 13 Jul 12:28
collapse
Why? That is a great use for AI. I’m guessing you are imagining that people are just blindly asking for unit tests and not even reading the results? Obviously don’t do that.
resipsaloquitur@lemmy.world
on 13 Jul 14:24
collapse
Of course that’s what they’re doing. That’s the whole point. Generate a bunch of plausible-looking BS and move on.
Writing one UT (actually writing, not pressing tab) gives you ideas for other tests.
And unit tests are not some boring chore. When doing TDD, they help inform and guide the design. If the LLM is doing that thinking for you, too, you’re just flying blind. “Yeah, that looks about right.”
Can’t wait for this shit to show up in medical devices.
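To make the point about tests informing design concrete, a hedged TDD-style sketch (parse_duration and the tests are made up): the first hand-written test forces a design decision (what should empty input do?) and naturally suggests the next tests.

```python
# Hypothetical example: writing the first test by hand surfaces the design
# question and immediately suggests the follow-up tests.
import pytest

def parse_duration(text: str) -> int:
    """Parse strings like '5m' or '30s' into seconds."""
    if not text:
        raise ValueError("empty duration")
    value, unit = int(text[:-1]), text[-1]
    return value * 60 if unit == "m" else value

def test_minutes() -> None:
    assert parse_duration("5m") == 300

def test_seconds() -> None:                  # follows naturally from the first
    assert parse_duration("30s") == 30

def test_empty_input_is_rejected() -> None:  # the design question the test surfaced
    with pytest.raises(ValueError):
        parse_duration("")
```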
FizzyOrange@programming.dev
on 13 Jul 14:32
nextcollapse
Damn there are so many AI critics who have clearly not seriously tried it. It’s like the smartphone naysayers of 2007 but much much worse.
resipsaloquitur@lemmy.world
on 13 Jul 14:36
collapse
That is like claiming people are directly copying from university books and implementing whatever they get without checking.
Of course there are nitwits like that, but they are few and far between.
Anyone seriously using LLM prompts double checks their work.
resipsaloquitur@lemmy.world
on 14 Jul 13:37
collapse
I’ve caught “professionals” pasting code from forums and StackOverflow. Of course people are just blindly using LLMs the same way. Incredibly naive to think people aren’t already and won’t do so more in the future.
I feel this – we had a junior dev on our project who started using AI for coding, without management approval BTW (it was a small company and we didn’t yet have a policy specifically for it. Alas.)
I got the fun task, months later, of going through an entire component that I’m almost certain was ‘vibe coded’ – it “worked” the first time the main APIs were called, but leaked and crashed on subsequent calls. It used double- and even triple-pointers to data structures, which the API vendor’s documentation upon some casual reading indicated could all be declared statically and re-used (this was an embedded system); needless arguments; mallocs and frees everywhere for no good reason (again due to all of the un-needed dynamic storage involving said double/triple pointers to stuff). It was a horrible mess.
It should have never gotten through code review, but the senior devs were themselves overloaded with work (another, separate problem) …
I took two days and cleaned it all up, much simpler, no mem leaks, and could actually be, you know, used more than once.
Fucking mess, and LLMs (don’t call it “AI”) just allow those who are lazy and/or inexperienced to skate through short-term tasks, leaving huge technical debt for those that have to clean up after.
If you’re doing job interviews, ensure the interviewee is not connected to LLMs in any way and make them do the code themselves. No exceptions. Consider blocking LLMs from your corp network as well and ban locally-installed things like Ollama.
It should have never gotten through code review, but the senior devs were themselves overloaded with work
Ngl, as much as I dislike AI, I think this is really the bigger issue. Hiring a junior and then merging his contributions without code reviewing is a disaster waiting to happen with or without AI.
Ek-Hou-Van-Braai@piefed.social
on 12 Jul 19:02
nextcollapse
I wish AI was never invented, but surely this isn't true.
I've been able to solve coding issues that usually took me hours in minutes.
Wish it wasn't so, but it's been my reality
Traister101@lemmy.today
on 12 Jul 19:22
nextcollapse
LLMs making you code faster means you're slow, not that LLMs are fast.
Ek-Hou-Van-Braai@piefed.social
on 12 Jul 21:41
collapse
I doubt anyone can write a complex regex in ~30 seconds; LLMs can.
Clent@lemmy.dbzer0.com
on 12 Jul 22:41
nextcollapse
You have definitely never worked with a regex guru.
Ek-Hou-Van-Braai@piefed.social
on 13 Jul 06:56
collapse
No, but not everyone is a regex guru.
If AI can write code half as good and fast as a regex guru, it's going to increase the average dev's productivity a lot
vrighter@discuss.tchncs.de
on 13 Jul 12:01
nextcollapse
does the regex search for what you wanted to? Does it work in all cases? Can I be confident that it will find all instances i care about, or will I still have to comb the code manually?
zygo_histo_morpheus@programming.dev
on 14 Jul 07:29
collapse
If you find yourself writing regexes often enough that speeding up that process would increase your productivity by “a lot”, then you should get good at writing them yourself, which means practicing without an LLM. If it’s something that you don’t do often enough to warrant getting good at, then the productivity increase is negligible.
I think the main benefit here isn’t really productivity but developer comfort by saving them from having to step out of their comfort zone.
staircase@programming.dev
on 13 Jul 01:45
nextcollapse
I’m not trusting a regex written by AI
Ek-Hou-Van-Braai@piefed.social
on 13 Jul 06:55
nextcollapse
That's why you write tests
vrighter@discuss.tchncs.de
on 13 Jul 11:59
collapse
tests can never prove correctness of code. All they can prove is “the thing hasn’t failed yet”. Proper reasoning is always needed if you want a guarantee.
If you had the llm write the regex for you, I can practically guarantee that you won’t think of, and write tests for, all the edge cases.
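A small illustration of that argument, with a made-up “plausible-looking” pattern and the kind of tests people actually write for it: everything passes, yet the pattern still accepts invalid addresses like 256.1.1.1 or 999.0.0.1, and no test here would ever notice.

```python
# The tests only prove "it hasn't failed yet" for the cases someone thought of.
import re

IPV4 = re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")  # looks right, is too loose

def is_ipv4(text: str) -> bool:
    return IPV4.match(text) is not None

def test_accepts_ordinary_address() -> None:
    assert is_ipv4("192.168.0.1")

def test_rejects_words() -> None:
    assert not is_ipv4("not an address")

def test_rejects_missing_octet() -> None:
    assert not is_ipv4("10.0.0")

# Edge cases nobody wrote tests for: "256.1.1.1", "999.0.0.1", "01.1.1.1"...
```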
FizzyOrange@programming.dev
on 13 Jul 20:31
collapse
Proper reasoning is always needed if you want a guarantee.
No, which is why I avoid regexes for most production code and also why I would never use one written by a pathological liar and always guessing coder like an LLM.
LLM is great when you’re coding in a pure functional programming language like Elm and are using lots of custom types to make impossible states unrepresentable, and the function you’re writing could have been derived by the Haskell compiler, so mathematically the only possible way you could write it wrong is to use the wrong constructor; then it’s usually right, and when it’s wrong either it doesn’t compile or you can see it’s chosen the wrong path.
The rest of the time it will make shit up, and when you challenge it, it will happily rewrite it for you, but there’s no particular reason why it wouldn’t make up more nonsense.
Regexes are far easier to write than to debug, which is exactly why they’re poison for a maintainable code base and a really bad use case for an LLM.
I also wouldn’t use an LLM for languages in which there are lots and lots of ways to go wrong. That’s exactly when you need an experienced developer, not someone who guesses based on what they read online and no understanding, never learning anything, because, my young padawan, that’s exactly what an LLM is, every day.
Watch your LLM like a hawk.
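The comment above is about Elm/Haskell; as a loose Python analogue of “making impossible states unrepresentable” with custom types (all names here invented for illustration), the idea looks roughly like this:

```python
# A request is exactly one of three shapes, so "loading but also has data"
# simply cannot be constructed.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Loading:
    pass

@dataclass(frozen=True)
class Loaded:
    payload: str

@dataclass(frozen=True)
class Failed:
    reason: str

RequestState = Union[Loading, Loaded, Failed]

def describe(state: RequestState) -> str:
    # With exhaustive matching, a type checker can flag a forgotten branch,
    # which is the compile-time safety net the commenter is relying on.
    match state:
        case Loading():
            return "still loading"
        case Loaded(payload):
            return f"got {len(payload)} bytes"
        case Failed(reason):
            return f"failed: {reason}"

print(describe(Loaded("hello")))  # got 5 bytes
```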
FizzyOrange@programming.dev
on 13 Jul 12:31
collapse
You don’t have to. You can read it.
vrighter@discuss.tchncs.de
on 13 Jul 08:09
nextcollapse
Yes they can. I regularly do. Regexes aren’t hard to write; their logic is quite simple. They’re hard to read, yes, but they are almost always one-offs (e.g., substitutions in nvim).
FizzyOrange@programming.dev
on 13 Jul 12:30
collapse
Regexes aren’t hard to write, their logic is quite simple.
He did say complex regex. A complex regex is not simple.
vrighter@discuss.tchncs.de
on 13 Jul 13:26
collapse
yes, “complex” regexes are quite simple too. Complex regexes are long, not difficult. They appear complex because you have to “inline” everything. They really are not that hard.
FizzyOrange@programming.dev
on 13 Jul 14:33
collapse
This is stupid pedantry. By that logic literally nothing is complex because everything is made up of simple parts.
vrighter@discuss.tchncs.de
on 13 Jul 14:49
collapse
cryptic != complex. Are they cryptic? yes. Are they complex? not really, if you can understand “one or more” or “zero or more” and some other really simple concepts like “one of these” or “not one of these” or “this is optional”. You could explain these to a child. It’s only because they look cryptic that people think they are complex. Unless you start using backreferences and advanced concepts like those (which are not usually needed in most cases) they are very simple. long != complex
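Spelled out in standard Python re syntax, those building blocks look like this (the pattern itself is just an illustrative example that matches strings like "colours: ff00aa"):

```python
import re

pattern = re.compile(
    r"colou?r"       # the 'u' is optional         -> "this is optional"
    r"s*"            # zero or more 's'            -> "zero or more"
    r":\s+"          # a colon, then whitespace     -> "one or more"
    r"[0-9a-fA-F]+"  # only hex digit characters    -> "one of these"
)

print(bool(pattern.search("colors:  ff00aa")))  # True
print(bool(pattern.search("colour: 1E90FF")))   # True
print(bool(pattern.search("color=ff00aa")))     # False
```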
FizzyOrange@programming.dev
on 13 Jul 17:13
collapse
Ok I can see you haven’t actually come across any complex regexes yet…
(Which is probably a good thing tbh - if you’re writing complex regexes you’re doing it wrong.)
vrighter@discuss.tchncs.de
on 13 Jul 17:18
collapse
I haven’t come across many. But I have written a lot.
Depends on what you need to match. Regex is just another programming language. It’s more declarative than traditional languages though (it’s basically pattern matching).
Pattern matching is something I already do a lot of in my code, so regexes aren’t that much different.
Regardless, the syntax sucks. It takes some time to get familiar with it, but once you get past that, it’s really simple.
The experienced developers in the study believed they were 20% faster. There’s a chance you also measured your efficiency more subjectively than you think you did.
I suspect that unless you were considerably more rigorous in testing your efficiency than they were, you might just be in a time flies when you’re having fun kind of situation.
Reading the paper, AI did a lot better than I would expect. It showed experienced devs working on a familiar code base got 19% slower.
It’s telling that they thought they had been more productive, but the result was not that bad tbh.
I wish we had similar research for experienced devs on unfamiliar code bases, or for inexperienced devs, but those would probably be much harder to measure.
staircase@programming.dev
on 13 Jul 01:43
nextcollapse
I don’t understand your point. How is it good that the developers thought they were faster? Does that imply anything at all in LLMs’ favour? IMO that makes the situation worse because we’re not only fighting inefficiency, but delusion.
20% slower is substantial. Imagine the effect on the economy if 20% of all output was discarded (or more accurately, spent using electricity).
I’m not saying it’s good, I’m saying I expected it to be even worse.
FizzyOrange@programming.dev
on 13 Jul 20:30
collapse
Does that imply anything at all in LLMs’ favour?
Yes, it suggests lower cognitive load.
vrighter@discuss.tchncs.de
on 13 Jul 07:29
collapse
1% slowdown is pretty bad. You’d still do better just not using it. 19% is huge!
_cnt0@sh.itjust.works
on 12 Jul 22:46
nextcollapse
I’ll quote myself from some time ago:
The entire article is based on the flawed premise that “AI” would improve the performance of developers. From my daily observation, the only people increasing their throughput with “AI” are inexperienced and/or bad developers. So, create terrible code faster with “AI”. Suggestions by Copilot are >95% garbage (even for trivial stuff), just slowing me down in writing proper code (obviously I disabled it precisely for that reason). And I spend more time on PRs to filter out the “AI” garbage inserted by juniors and idiots. “AI” is killing the productivity of the best developers even if they don’t use it themselves, decreases code quality leading to more bugs (more time wasted) and reduces maintainability (more time wasted). At this point I assume ignorance and incompetence of everybody talking about benefits of “AI” for software development. Oh, you have 15 years of experience in the field and “AI” has improved your workflow? You sucked at what you’ve been doing for 15 years, and “AI” increases the damage you are doing, which later has to be fixed by people who are more competent.
Kissaki@programming.dev
on 14 Jul 08:47
nextcollapse
from some time ago
It’s a fair statement and personal experience, but a question is, does this change with tool changes and user experience? Which makes studies like OP important.
Your >95% garbage claim may very well be an isolated issue due to tech or lib or LLM usage patterns or whatnot. And it may change over time, with different models or tooling.
BrianTheeBiscuiteer@lemmy.world
on 12 Jul 23:28
nextcollapse
The only time it really helps me is when I’m following a pretty clear pattern and the auto-complete spares me from copy-pasting or just retyping the same thing over and over. Otherwise I’m double-checking everything it wrote, and I have to understand it to test it, and that probably takes most of my time. Furthermore, it usually doesn’t take the entire codebase into account so it looks like it was written by someone who didn’t know our team or company standards as well as our proprietary code.
Scrath@lemmy.dbzer0.com
on 13 Jul 07:36
nextcollapse
I talked to Microsoft Copilot 3 times for work related reasons because I couldn’t find something in documentation. I was lied to 3 times. It either made stuff up about how the thing I asked about works or even invented entirely new configuration settings
Claude AI does this ALL the time too. It NEEDS to give a solution; it rarely can say “I don’t know”, so it will just completely make up a solution that it thinks is right without actually checking to see if the solution exists. It will make/dream up programs or libraries that don’t and have never existed, OR it will tell you something can do something when it has never been able to do that thing ever.
And that’s just how all these LLMs have been built. They MUST provide a solution, so they all lie. They’ve been programmed this way to ensure maximum profits. GitHub Copilot is a bit better because it’s with me in my code, so its suggestions, most of the time, actually work, because it can see the context and what’s around it. Claude is absolute garbage, MS Copilot is about the same caliber if not worse than Claude, and ChatGPT is only good for content writing or bouncing ideas off of.
Croquette@sh.itjust.works
on 13 Jul 12:06
nextcollapse
LLMs are just sophisticated text-prediction engines. They don’t know anything, so they can’t produce an “I don’t know”, because they can always generate a text prediction, and they can’t think.
Cyberflunk@lemmy.world
on 13 Jul 14:44
nextcollapse
Tool use, reasoning, chain of thought: those are the things that set LLM systems apart. While you are correct in the most basic sense, it’s like saying a car is only a platform with wheels; it’s reductive of the capabilities.
Croquette@sh.itjust.works
on 13 Jul 16:49
collapse
LLMs are prediction engines. They don’t have knowledge; they only chain together words related to your topic.
They don’t know they are wrong because they just don’t know anything period.
They could be programmed to do some double/triple checking, and return “I don’t know” when the checks are negative.
I guess that would compromise the appearance of an oracle that their parent companies seem to quietly push onto them.
They don’t check. You’ve got to think in statistical terms.
Based on the previously input words (tokens actually, but I’ll use words for the sake of simplicity), which are the system prompt + user prompt, the LLM generates a list of the most plausible next words, then picks one from the top few. How far it goes down the list to less likely words is based on the temperature configuration. Then the next word, and the next, etc., each time looking back.
I haven’t checked what that step actually does in the reasoning models, but I assume it just expands the user prompt to fill in stuff that the LLM thinks the user was too lazy to input, then works on the final answer.
So basically it’s like tapping the next-word prediction on your phone keyboard.
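A toy sketch of that sampling step (not any particular model’s implementation; the candidate words and scores are made up): raw scores for possible next words become a probability distribution, and temperature controls how far down the list the pick may wander.

```python
import math
import random

def sample_next_word(scores: dict[str, float], temperature: float) -> str:
    # Higher temperature flattens the distribution, so lower-ranked words
    # get picked more often.
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores), weights=weights, k=1)[0]

# Hypothetical scores for words that could follow "I walked to the".
candidates = {"park": 2.5, "store": 1.8, "moon": 0.2}
print(sample_next_word(candidates, temperature=0.2))  # almost always "park"
print(sample_next_word(candidates, temperature=1.5))  # now and then "moon"
```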
That the script could incorporate some checking mechanisms and implement an “I don’t know” for when the LLM’s answers fail some tests.
They already do some of that, but for other purposes, like censoring, or, as in recent news, Grok looks up Musk’s opinions before answering questions, or, to make more accurate math calculations, they actually call a normal calculator, and so on…
They could make the LLM produce an answer A, then look up the question on Google and ask that LLM to “compare” answer A with the main Google results, looking for inconsistencies, and then return “I don’t know” if it’s too inconsistent. It’s not a rigorous test, but it’s something, and I’m sure the actual devs of those chatbots could make something much better than my half-baked idea.
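Sketched as very rough Python pseudocode, with ask_llm and web_search as hypothetical stand-ins rather than real APIs; only the shape of the check matters here.

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a chat-completion call

def web_search(query: str) -> list[str]:
    raise NotImplementedError  # stand-in for a search API returning snippets

def answer_with_doubt(question: str) -> str:
    draft = ask_llm(question)
    evidence = "\n".join(web_search(question)[:5])
    verdict = ask_llm(
        "Do these search results contradict the draft answer? Answer YES or NO.\n"
        f"Draft: {draft}\nResults:\n{evidence}"
    )
    # Not rigorous, as the commenter says - just a cheap sanity gate.
    return "I don't know." if verdict.strip().upper().startswith("YES") else draft
```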
Are you using Claude web chat or Claude Code? Because my experience with it is vastly different even when using the same underlying model. Claude Code isn’t perfect and gets stuff wrong, but it can run the project, check the output, and realize its mistake and fix it in many cases. It doesn’t fix logic flaws, but it can fix hallucinations of library methods that don’t exist.
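For what it’s worth, that generate/run/fix loop can be sketched like this (all helper names hypothetical, pytest used only as an example check; this is not how any particular tool is actually implemented):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for the underlying model call

def apply_patch(patch: str) -> None:
    raise NotImplementedError  # stand-in for writing the edit to the repo

def fix_until_green(task: str, max_rounds: int = 3) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        apply_patch(ask_llm(task + feedback))
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True   # hallucinated methods usually die at this gate
        feedback = "\nThe tests failed with:\n" + result.stdout + result.stderr
    return False          # logic flaws can still survive, as noted above
```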
In fairness the msdn documentation is prone to this also.
By “this” I mean having what looks like a comprehensive section about the thing you want but the actual information you need isn’t there, but you need to read the whole thing to find out.
Cyberflunk@lemmy.world
on 13 Jul 14:42
nextcollapse
My velocity has taken an unreasonable rocket trajectory. Deploying internal tooling, agent creation, automation. I have teams/swarms that tackle so many things, and do it well. I understand there are issues, but learning how to use the tools is critical to improving performance, blindly expecting the tools to be sci-fi super coders is unrealistic.
SugarCatDestroyer@lemmy.world
on 13 Jul 14:50
nextcollapse
It’s hard to even call them specialists; they are at the level of cashiers, for whom the computer does everything, and sometimes they do something at the level of communicating with clients, and that’s all. I’m certainly not a professional, but I think the main message is clear.
People are a bad judge of their own skill and over-rely on tools and assistants when present. See also: car ADAS systems making drivers less skillful. More news at 11.
See also: car ADAS systems making drivers less skillful.
But also making traffic safer
Think we need to introduce a mandatory period where you need to drive an old car with no ABS when you’ve just gotten your license. I mean for me that was called being a broke-ass student, but nowadays cars with no ABS are starting to cost more than cars with ABS, traction control and even ESP, because the 80s and early 90s cars where these things were optional, are now classics, whereas you can get a BMW or Audi that was made this century for like 500-800 euros if you’re brave or just want to move in to your garage full time.
HaraldvonBlauzahn@feddit.org
on 14 Jul 05:49
collapse
Now the interesting question is what it really means when less experienced programmers think they are 100% faster.
threaded - newest
Ai-only vibe coders. As a development manager I can tell you that AI-augmented actual developers who know how to write software and what good and bad code looks like are unquestionably faster. GitHub Copilot makes creating a suite of unit tests and documentation for a class take very little time.
Try reading the article.
I read the article (not the study, only the abstract) and they were getting paid an hourly rate. It did not mention anything about whether or not they had experience in using LLMs to code. I feel there is a sweet spot; it has to do with context window size etc.
I was not consistently better a year and a half ago, but now I know the limits, caveats and methods.
I think this is a very difficult thing to quantify, but haters gonna latch on to this, same as the studies that said “ai makes you stupid” and “llms cant reason”… it’s a cool tool that has limits.
One interesting feature in this paper is that the programmers who used LLMs thought they were faster, they estimated it was saving about 20% of the time it would have taken without LLMs. I think that’s a clear sign that you shouldn’t trust your gut about how much time LLMs save you, you should definitely try to measure it.
The study did find a correlation between prior experience and performance. One of the developers who showed a positive speedup with AI was the one with the most previous experience using Cursor (over 50 hours).
I did, thank you. Terms therein like “they spend more time prompting the AI” genuinely do not apply to a code copilot, like the one provided by GitHub, because it infers its prompt based on what you’re doing and the context of the file and application and creates an autocomplete based on its chat completion, which you can accept or ignore like any autocomplete.
You can start writing test templates and it will fill them out for you, and then write the next tests based on the inputs of your methods and the imports in the test class. You can write a whole class without any copilot usage and then start writing the xmldocs and it will autocomplete them for you based on work you already did. Try it for yourself if you haven’t already, it’s pretty useful.
The article is a blog post summarizing the actual research. The researchers’ summary says:
The research shows that under their tested scenario and assumptions, devs were less productive.
The takeaway from this study is to measure and benchmark what’s important to your team. However, many development teams have been doing that, albeit not in a formal study format, and finding AI improves productivity. It is not (only) “vibe productivity”.
And certainly I agree with the person you replied to: anecdotally, AI makes my devs more productive by cutting out the most grindy parts, like writing mocks for tests or getting that last missing coverage corner. So we have some measuring and validation to do.
The research explicitly showed that the anecdotes were flawed, and that actual measured productivity was the inverse of what the users imagined. That’s the entire point. You’re just saying “nuh uh, muh anecdotes.”
I said it needs to be measured. But few teams are going to do that, they’re building products not case studies.
This study is catnip for the people who put “AI” in scare quotes and expect those of us who use it to suddenly realize that we’ve only been generating hallucination slop. This has not been the lived experience of those of us in software development. In my own case I’ve seen teams stop hiring because they are getting the same amount of work done in less time. But those are anecdotes, so it doesn’t count.
It’s entirely possible to measure metrics.
Enjoy your slopware.
We do measure metrics around sprint productivity, code quality, reworks, and more. However, we do not build case studies for clickbait articles. This isn’t a productive conversation as you just want to ridicule those using AI. Blocked.
LOL. Doing a good job of that all by yourself, mate.
Don’t give yourselves to these unnatural men - machine men with machine minds and machine hearts! You are not machines! You are men!
You are not cattle!
You are men!
You have the love of humanity in your hearts!
You don’t hate!
Only the unloved hate - the unloved and the unnatural!
Soldiers!
Don’t fight for slavery! Fight for liberty!
In the 17th Chapter of St Luke it is written: “the Kingdom of God is within man” - not one man nor a group of men, but in all men!
In you!
I love the old Melody Sheep version.
You know, I think I might make a cut of this, but… more somber, tense music, … more relevant modern references.
If you do, post a link, I’ll watch it!
does lemmy have some kind of a !remindme type thing to pester me into actually doing this, lol?
Paolo Nutini’s song ‘Iron Sky’ samples this same speech
en.wikipedia.org/wiki/Iron_Sky_(song)?wprov=sfla1
It would be interesting to see another study focusing on cognitive load. Maybe the AI lets you offload some amount of thinking so you can reserve that energy for things it’s bad at. But I could see how that would potentially be a wash, as you need to clearly specify your requirements in the prompt, which is a different cognitive load.
I seem to recall a separate study showing that it just encouraged users to think more lazily, not more critically.
The study was centered on bugfixing large established projects. This task is not really the one that AI helpers excel at.
Also, the small number of participants (16), the fact that the participants were familiar with the code base, and that all the tasks seem to be fairly short in completion time can skew the results.
Thus the divergence between the study results and many people’s personal experience: they would see an increase in productivity because they are doing different tasks in a different scenario.
“AI is good for Hello World projects written in javascript.”
Managers will still fire real engineers though.
I find it more useful doing large language transformations and delving into unknown patterns, languages or environments.
If I know a source head to toe, and I’m proficient with that environment, it’s going to offer little help. Especially if it’s a highly specialized problem.
Since the SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here, performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make it work on isolated files, or code manually and rely on the auto complete.
So far I’ve only found it useful when describing bite-sized tasks in order to get suggestions on which functions are useful from the library/API I’m using. And only when those functions have documentation available on the Internet.
Call me crazy but I think developers should understand what they’re working on, and using LLM tools doesn’t provide a shortcut there.
You have to get familiar with the codebase at some point. When you are unfamiliar, in my experience, LLMs can provide help understanding it: copy large portions of code you don’t really understand and ask for an analysis and explanation.
Not so long ago I used it on assembly code. It would have taken ages to decipher what it was doing by myself. The AI sped up the process.
But once you are very familiar with an established project you have worked a lot with, I don’t even bother asking LLMs anything, as in my experience, I come up with better answers quicker.
At the end of the day we must understand that an LLM is more or less a statistical autocomplete trained on a large dataset. If your solution is not in the dataset, the thing is not really going to come up with a creative solution. And the thing is not going to run a debugger on your code either, afaik.
When I use it the question I ask myself the most before bothering is “is the solution likely to be on the training dataset?” or “is it a task that can be solved as a language problem?”
Someone told me the best use of AI was writing unit tests and I died on the inside.
Why? That is a great use for AI. I’m guessing you are imagining that people are just blindly asking for unit tests and not even reading the results? Obviously don’t do that.
Of course that’s what they’re doing. That’s the whole point. Generate a bunch of plausible-looking BS and move on.
Writing one UT (actually writing, not pressing tab) gives you ideas for other tests.
And unit tests are not some boring chore. When doing TDD, they help inform and guide the design. If the LLM is doing that thinking for you, too, you’re just flying blind. “Yeah, that looks about right.”
Can’t wait for this shit to show up in medical devices.
Damn there are so many AI critics who have clearly not seriously tried it. It’s like the smartphone naysayers of 2007 but much much worse.
Ad hominem.
That is like claiming people are directly copying from university books and implementing whatever they get without checking.
Of course there are nitwits like that, but they are few and far between.
Anyone seriously using LLM prompts double checks their work.
I’ve caught “professionals” pasting code from forums and StackOverflow. Of course people are just blindly using LLMs the same way. Incredibly naive to think people aren’t already and won’t do so more in the future.
I feel this – we had a junior dev on our project who started using AI for coding, without management approval BTW (it was a small company and we didn’t yet have a policy specifically for it. Alas.)
I got the fun task, months later, of going through an entire component that I’m almost certain was ‘vibe coded’ – it “worked” the first time the main APIs were called, but leaked and crashed on subsequent calls. It used double- and even triple-pointers to data structures, which the API vendor’s documentation upon some casual reading indicated could all be declared statically and re-used (this was an embedded system); needless arguments; mallocs and frees everywhere for no good reason (again due to all of the un-needed dynamic storage involving said double/triple pointers to stuff). It was a horrible mess.
It should have never gotten through code review, but the senior devs were themselves overloaded with work (another, separate problem) …
I took two days and cleaned it all up, much simpler, no mem leaks, and could actually be, you know, used more than once.
Fucking mess, and LLMs (don’t call it “AI”) just allow those who are lazy and/or inexperienced to skate through short-term tasks, leaving huge technical debt for those that have to clean up after.
If you’re doing job interviews, ensure the interviewee is not connected to LLMs in any way and make them do the code themselves. No exceptions. Consider blocking LLMs from your corp network as well and ban locally-installed things like Ollama.
(old song, to the tune of My Favourite Things)
🎶 “Pointers to pointers to pointers to strings,
this code does some rather unusual things…!” 🎶
Ngl, as much as I dislike AI, I think this is really the bigger issue. Hiring a junior and then merging his contributions without code reviewing is a disaster waiting to happen with or without AI.
I wish AI was never invented, but surely this isn't true.
I've been able to solve coding issues that usually took me hours in minutes.
Wish it wasn't so, but it's been my reality
LLMs making you code faster means you're slow, not that LLMs are fast.
I doubt anyone can write a complex regex in ~30 seconds; LLMs can.
You have definitely never worked with a regex guru.
No, but not everyone is a regex guru.
If AI can write code half as good and fast as a regex guru, it's going to increase the average dev's productivity a lot
does the regex search for what you wanted to? Does it work in all cases? Can I be confident that it will find all instances i care about, or will I still have to comb the code manually?
If you find yourself writing regexes often enough that speeding up that process would increase your productivity by “a lot”, then you should get good at writing them yourself, which means practicing without an LLM. If it’s something that you don’t do often enough to warrant getting good at, then the productivity increase is negligible.
I think the main benefit here isn’t really productivity but developer comfort by saving them from having to step out of their comfort zone.
I’m not trusting a regex written by AI
That's why you write tests
tests can never prove correctness of code. All they can prove is “the thing hasn’t failed yet”. Proper reasoning is always needed if you want a guarantee.
If you had the llm write the regex for you, I can practically guarantee that you won’t think of, and write tests for, all the edge cases.
You formally verify your regexes? Doubtful.
No, which is why I avoid regexes for most production code and also why I would never use one written by a pathological liar and always guessing coder like an LLM.
LLM is great when you’re coding in a pure functional programming language like Elm and are using lots of custom types to make impossible states unrepresentable, and the function you’re writing could have been derived by the Haskell compiler, so mathematically the only possible way you could write it wrong is to use the wrong constructor; then it’s usually right, and when it’s wrong either it doesn’t compile or you can see it’s chosen the wrong path.
The rest of the time it will make shit up, and when you challenge it, it will happily rewrite it for you, but there’s no particular reason why it wouldn’t make up more nonsense.
Regexes are far easier to write than to debug, which is exactly why they’re poison for a maintainable code base and a really bad use case for an LLM.
I also wouldn’t use an LLM for languages in which there are lots and lots of ways to go wrong. That’s exactly when you need an experienced developer, not someone who guesses based on what they read online and no understanding, never learning anything, because, my young padawan, that’s exactly what an LLM is, every day.
Watch your LLM like a hawk.
You don’t have to. You can read it.
Yes they can. I regularly do. Regexes aren’t hard to write; their logic is quite simple. They’re hard to read, yes, but they are almost always one-offs (e.g., substitutions in nvim).
He did say complex regex. A complex regex is not simple.
yes, “complex” regexes are quite simple too. Complex regexes are long, not difficult. They appear complex because you have to “inline” everything. They really are not that hard.
This is stupid pedantry. By that logic literally nothing is complex because everything is made up of simple parts.
cryptic != complex. Are they cryptic? yes. Are they complex? not really, if you can understand “one or more” or “zero or more” and some other really simple concepts like “one of these” or “not one of these” or “this is optional”. You could explain these to a child. It’s only because they look cryptic that people think they are complex. Unless you start using backreferences and advanced concepts like those (which are not usually needed in most cases) they are very simple. long != complex
Ok I can see you haven’t actually come across any complex regexes yet…
(Which is probably a good thing tbh - if you’re writing complex regexes you’re doing it wrong.)
I haven’t come across many. But I have written a lot.
Depends on what you need to match. Regex is just another programming language. It’s more declarative than traditional languages though (it’s basically pattern matching).
Pattern matching is something I already do a lot of in my code, so regexes aren’t that much different.
Regardless, the syntax sucks. It takes some time to get familiar with it, but once you get past that, it’s really simple.
The experienced developers in the study believed they were 20% faster. There’s a chance you also measured your efficiency more subjectively than you think you did.
I suspect that unless you were considerably more rigorous in testing your efficiency than they were, you might just be in a time flies when you’re having fun kind of situation.
Reading the paper, AI did a lot better than I would expect. It showed experienced devs working on a familiar code base got 19% slower. It’s telling that they thought they had been more productive, but the result was not that bad tbh.
I wish we had similar research for experienced devs on unfamiliar code bases, or for inexperienced devs, but those would probably be much harder to measure.
I don’t understand your point. How is it good that the developers thought they were faster? Does that imply anything at all in LLMs’ favour? IMO that makes the situation worse because we’re not only fighting inefficiency, but delusion.
20% slower is substantial. Imagine the effect on the economy if 20% of all output was discarded (or more accurately, spent using electricity).
I’m not saying it’s good, I’m saying I expected it to be even worse.
Yes, it suggests lower cognitive load.
1% slowdown is pretty bad. You’d still do better just not using it. 19% is huge!
I’ll quote myself from some time ago:
It’s a fair statement and personal experience, but a question is, does this change with tool changes and user experience? Which makes studies like OP important.
Your >95% garbage claim may very well be an isolated issue due to tech or lib or LLM usage patterns or whatnot. And it may change over time, with different models or tooling.
The only time it really helps me is when I’m following a pretty clear pattern and the auto-complete spares me from copy-pasting or just retyping the same thing over and over. Otherwise I’m double-checking everything it wrote, and I have to understand it to test it, and that probably takes most of my time. Furthermore, it usually doesn’t take the entire codebase into account so it looks like it was written by someone who didn’t know our team or company standards as well as our proprietary code.
I talked to Microsoft Copilot 3 times for work related reasons because I couldn’t find something in documentation. I was lied to 3 times. It either made stuff up about how the thing I asked about works or even invented entirely new configuration settings
Claude AI does this ALL the time too. It NEEDS to give a solution; it rarely can say “I don’t know”, so it will just completely make up a solution that it thinks is right without actually checking to see if the solution exists. It will make/dream up programs or libraries that don’t and have never existed, OR it will tell you something can do something when it has never been able to do that thing ever.
And that’s just how all these LLMs have been built. They MUST provide a solution, so they all lie. They’ve been programmed this way to ensure maximum profits. GitHub Copilot is a bit better because it’s with me in my code, so its suggestions, most of the time, actually work, because it can see the context and what’s around it. Claude is absolute garbage, MS Copilot is about the same caliber if not worse than Claude, and ChatGPT is only good for content writing or bouncing ideas off of.
LLMs are just sophisticated text-prediction engines. They don’t know anything, so they can’t produce an “I don’t know”, because they can always generate a text prediction, and they can’t think.
Tool use, reasoning, chain of thought: those are the things that set LLM systems apart. While you are correct in the most basic sense, it’s like saying a car is only a platform with wheels; it’s reductive of the capabilities.
LLMs are prediction engines. They don’t have knowledge; they only chain together words related to your topic.
They don’t know they are wrong because they just don’t know anything period.
They have a point: chatbots are built on top of LLMs; they aren’t just LLMs.
They could be programmed to do some double/triple checking, and return “I don’t know” when the checks are negative. I guess that would compromise the appearance of an oracle that their parent companies seem to quietly push onto them.
They don’t check. You’ve got to think in statistical terms.
Based on the previously input words (tokens actually, but I’ll use words for the sake of simplicity), which are the system prompt + user prompt, the LLM generates a list of the most plausible next words, then picks one from the top few. How far it goes down the list to less likely words is based on the temperature configuration. Then the next word, and the next, etc., each time looking back.
I haven’t checked what that step actually does in the reasoning models, but I assume it just expands the user prompt to fill in stuff that the LLM thinks the user was too lazy to input, then works on the final answer.
So basically it’s like tapping the next-word prediction on your phone keyboard.
The chatbots are not just LLMs though. They run scripts in which some steps are queries to an LLM.
ok… what are you trying to point out?
That the script could incorporate some checking mechanisms and implement an “I don’t know” for when the LLM’s answers fail some tests.
They already do some of that, but for other purposes, like censoring, or, as in recent news, Grok looks up Musk’s opinions before answering questions, or, to make more accurate math calculations, they actually call a normal calculator, and so on…
They could make the LLM produce an answer A, then look up the question on Google and ask that LLM to “compare” answer A with the main Google results, looking for inconsistencies, and then return “I don’t know” if it’s too inconsistent. It’s not a rigorous test, but it’s something, and I’m sure the actual devs of those chatbots could make something much better than my half-baked idea.
Are you using Claude web chat or Claude Code? Because my experience with it is vastly different even when using the same underlying model. Claude Code isn’t perfect and gets stuff wrong, but it can run the project, check the output, and realize its mistake and fix it in many cases. It doesn’t fix logic flaws, but it can fix hallucinations of library methods that don’t exist.
In fairness the msdn documentation is prone to this also.
By “this” I mean having what looks like a comprehensive section about the thing you want but the actual information you need isn’t there, but you need to read the whole thing to find out.
My velocity has taken an unreasonable rocket trajectory. Deploying internal tooling, agent creation, automation. I have teams/swarms that tackle so many things, and do it well. I understand there are issues, but learning how to use the tools is critical to improving performance, blindly expecting the tools to be sci-fi super coders is unrealistic.
It’s hard to even call them specialists; they are at the level of cashiers, for whom the computer does everything, and sometimes they do something at the level of communicating with clients, and that’s all. I’m certainly not a professional, but I think the main message is clear.
🎵Had a great day out,
Callin’ my name like Ferris Bueller,
Time to wrap this up,
I’m getting 19% slower! 🎵
Well that’s a strangely deep cut Ken Ashcorp ref
Men of culture i see
I am honestly shocked to see a reference in the wild to Ken Ashcorp.
I’m honestly shocked that multiple people got the reference.
Would AI coders even get faster over time, or just stay stagnant, since they aren’t learning anything about what they’re doing?
People are a bad judge of their own skill and over-rely on tools and assistants when present. See also: car ADAS systems making drivers less skillful. More news at 11.
But also making traffic safer
Think we need to introduce a mandatory period where you need to drive an old car with no ABS when you’ve just gotten your license. I mean for me that was called being a broke-ass student, but nowadays cars with no ABS are starting to cost more than cars with ABS, traction control and even ESP, because the 80s and early 90s cars where these things were optional, are now classics, whereas you can get a BMW or Audi that was made this century for like 500-800 euros if you’re brave or just want to move in to your garage full time.
Now the interesting question is what it really means when less experienced programmers think they are 100% faster.
youtu.be/i7aQig-wjYA