AI Coding Is Massively Overhyped, Report Finds (futurism.com)
from SparroHawc@lemmy.zip to technology@lemmy.world on 30 Sep 19:20
https://lemmy.zip/post/49954591

“No Duh,” say senior developers everywhere.

The article explains that vibe code is often close to functional, but not quite - requiring developers to go in and find where the problems are, resulting in a net slowdown of development rather than productivity gains.

#technology

threaded - newest

Sibshops@lemmy.myserv.one on 30 Sep 19:37 next collapse

I mean… At best it’s a stack overflow/google replacement.

Warl0k3@lemmy.world on 30 Sep 19:43 next collapse

There’s some real perks to using AI to code - it helps a ton with templatable or repetitive code, and setting up tedious tasks. I hate doing that stuff by hand so being able to pass it off to copilot is great. But we already had tools that gave us 90% of the functionality copilot adds there, so it’s not super novel, and I’ve never had it handle anything properly complicated at all successfully (asking GPT-5 to do your dynamic SQL calls is inviting disaster, for example. Requires hours of reworking just to get close.)

Sibshops@lemmy.myserv.one on 30 Sep 19:51 next collapse

Fair, I’ve used it recently to translate a translations.ts file to Spanish.

But for repetitive code, I feel like it is kind of a slowdown sometimes. I should have refactored instead.

pennomi@lemmy.world on 30 Sep 20:43 next collapse

Some code is boilerplate and can’t be distilled down more. It’s nice to point an AI to a database schema and say “write the Django models, admin, forms, and api for this schema, using these authentication permissions”. Yeah I’ll have to verify it’s done right, but that gets a lot of the boring typing out of the way.
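A rough sketch of the kind of boilerplate I mean, with a hypothetical Book schema (the model, fields, and file layout are purely illustrative):

# models.py - hypothetical schema; field names are illustrative
from django.db import models

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.CharField(max_length=100)
    published = models.DateField()

# admin.py - register the model so it appears in the Django admin
from django.contrib import admin
from .models import Book

admin.site.register(Book)

# forms.py - a ModelForm derived from the same schema
from django import forms

class BookForm(forms.ModelForm):
    class Meta:
        model = Book
        fields = ["title", "author", "published"]

None of that is hard, it's just typing against an existing schema - which is the part worth delegating and then verifying.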

Sibshops@lemmy.myserv.one on 30 Sep 20:55 next collapse

That’s fair.

panda_abyss@lemmy.ca on 01 Oct 10:53 collapse

I use it for writing code to call APIs and it's a huge boon.

Yeah, you have to check the results, but it’s way faster than me.

cam_i_am@lemmy.world on 01 Oct 11:43 collapse

This is a thing people miss. “Oh it can generate repetitive code.”

OK, now who’s going to maintain those thousands of lines of repetitive unit tests, let alone check them for correctness? Certainly not the developer who was too lazy to write their own tests and to think about how to refactor or abstract things to avoid the repetition.

If someone’s response to a repetitive task is copy-pasting poorly-written code over and over we call them a bad engineer. If they use an AI to do the copy-paste for them that’s supposed to be better somehow?

Feyd@programming.dev on 30 Sep 19:51 next collapse

But we already had tools that gave us 90%

More reliable ones.

MaggiWuerze@feddit.org on 01 Oct 05:59 collapse

Deterministic ones

UnderpantsWeevil@lemmy.world on 30 Sep 19:58 next collapse

So much of the AI hype has been pointing to ten year old technology repackaged in a slick new interface.

AI is the iPod to the Zune of yesteryear.

MangoCats@feddit.it on 30 Sep 21:27 collapse

Repackaging old technology in slick new interfaces is what we have been calling progress in computer software for 40+ years.

UnderpantsWeevil@lemmy.world on 30 Sep 21:32 collapse

I mean… I like to think we’ve done a bit more than that. FFS, file compression alone has made leaps and bounds since the 3.5" floppy days.

Also, as a T-SQL guy, I gotta say there’s a world of difference between SQL 2008 and SQL 2022.

But I’ll spot you that a lot of the last 10-15 years has produced herculean efforts in answering the question “How can we squeeze a few more ads into your GUI?”

MangoCats@feddit.it on 30 Sep 22:45 collapse

There have been a few “milestone moments” like map-reduce, Hadoop, etc. Still, there’s a whole lot of eye candy wrapped around the same old basic concepts.

otacon239@lemmy.world on 30 Sep 20:04 next collapse

I’ve had plenty of success using it to build things like docker compose yamls and the like, but for anything functional, it does often take a few tries to get it right. I never use its raw output for anything in production. Only as a jumping-off point to structure things.

MangoCats@feddit.it on 30 Sep 21:25 next collapse

(asking GPT-5 to do your dynamic SQL calls is inviting disaster, for example. Requires hours of reworking just to get close.)

Maybe it’s the dynamic SQL calls themselves that are inviting disaster?

Warl0k3@lemmy.world on 30 Sep 22:24 collapse

Dynamic SQL in and of itself isn’t an issue, but the consequences (exacerbated by SQL’s inherent irrecoverability from mistakes - hope you have backups) have stigmatized its use heavily. With an understanding of good practice, a proper development environment and a close eye on the junior devs, there’s no inherent issue to using it.

MangoCats@feddit.it on 30 Sep 22:43 collapse

With an understanding of good practice, a proper development environment and a close eye on the junior devs, there’s no inherent issue to using it.

My feelings about C/C++ are the same. I’m still switching to Rust, because that’s what the company wants.

Valmond@lemmy.world on 30 Sep 21:35 next collapse

For the missing 10%: the folder with copies of the code you’ve already written that does that.

Flamekebab@piefed.social on 30 Sep 22:57 collapse

Similarly, I find it very useful when I’ve written a tool script and really don’t want to write the command line interface for it.

“Here’s a well-documented function - write an argparser for it”

…then I fix its rubbish assumptions and mistakes. It’s probably not drastically quicker but it doesn’t require as much effort from me, meaning I can go harder on the actual function (rather than keeping some effort in reserve to get over the final hump).
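A rough sketch of that workflow (the function, flags, and defaults here are made up for illustration):

import argparse

def resize_images(input_dir: str, width: int, dry_run: bool = False) -> None:
    """Resize every image in input_dir to the given width."""
    ...  # the actual, well-documented tool function lives here

def main() -> None:
    # the boring flag plumbing, generated from the function's documentation
    parser = argparse.ArgumentParser(description=resize_images.__doc__)
    parser.add_argument("input_dir", help="directory containing images to resize")
    parser.add_argument("--width", type=int, default=800, help="target width in pixels")
    parser.add_argument("--dry-run", action="store_true", help="report changes without writing files")
    args = parser.parse_args()
    resize_images(args.input_dir, args.width, dry_run=args.dry_run)

if __name__ == "__main__":
    main()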

MangoCats@feddit.it on 30 Sep 21:22 next collapse

In the beginning there were manufacturer’s manuals, spec sheets, etc.

Then there were magazines, like Byte, InfoWorld, Compute! that showed you a bit more than just the specs

Then there were books, including the X for Dummies series that purported to teach you theory and practice

Then there was Google / Stack Overflow and friends

Somewhere along there, where depends a lot on your age, there were school / University courses

Now we have “AI mode”

Each step along that road has offered a significant speedup, connecting ideas to theory to practice.

I agree, all the “magic bullet” AI hype is far overblown. However, with AI, something new I can do is interactively develop a specification and a program. Throw out the code several times while the spec gets refined, re-implemented, tried in different languages with different libraries. It’s still only good for “small” projects, but less than a year ago “small” meant less than 1000 lines of code. These days I’m seeing 300 lines of specification turn into 1500-3000 lines of code and have it running successfully within half a day.

I don’t know if we’re going to face a Kurzweilian singularity where these things start improving themselves at exponential rates, or if we’ll hit another 30 year plateau like neural nets did back in the 1990s… As things are, Claude helps me make small projects several times faster than I could ever do with Google and Stack Overflow. And you can build significant systems out of cooperating small projects.

Steve@startrek.website on 01 Oct 00:51 next collapse

I found that it only does well if the task is already well covered by the usual sources. Ask for anything novel and it shits the bed.

snooggums@piefed.world on 01 Oct 03:40 collapse

That's because it doesn't understand anything and is just vomiting forth output based on the code that was fed into it.

timbuck2themoon@sh.itjust.works on 01 Oct 01:38 next collapse

At absolute best.

My experience is it’s the bottom stack overflow answers. Making up bullshit and nonexistent commands, etc.

mcv@lemmy.zip on 01 Oct 08:38 next collapse

If you know what you want, its automatic code completion can save you some typing in those cases where it gets it right (for repetitive or trivial code that doesn’t require much thought). It’s useful if you use it sparingly and can see through its bullshit.

For junior coders, though, it could be absolute poison.

whoisearth@lemmy.ca on 01 Oct 15:42 collapse

They should make it more like SO and have it chastise you for asking a stupid question you should already know the answer to lol

forrcaho@lemmy.world on 01 Oct 16:24 collapse

At least when I’m cleaning up after shit devs who used Stack Overflow, I can usually search using a fragment of their code and find where they swiped it from and get some clue what the hell they were thinking. Now that they’re all using AI chatbots, there’s no trace.

Lembot_0004@discuss.online on 30 Sep 19:37 next collapse

Industry? Yes, industry hires people who know how to do things needed by industry and who do nothing besides those things.

Programmers outside “industry” more often find themselves writing code with libraries they’re seeing for the first time and in languages they never thought they’d use. AI helps a lot here.

SparroHawc@lemmy.zip on 01 Oct 21:51 collapse

Except LLMs are absolutely terrible at working with a new, poorly documented library. Commonly-used, well-defined libraries? Sure! Working in an obscure language or an obscure framework? Good luck.

LLMs can surface information. It’s perhaps the one place they’re actually useful. They cannot reason in the same way a human programmer can, and all the big tech companies are trying to sell them on that basis.

Lembot_0004@discuss.online on 02 Oct 04:20 collapse

Well, don’t use it with new, poorly documented libraries. That is a common sense rule: use the tool where it is useful.

Somehow many LLM criticizers just claim that LLMs are shit because they can’t autonomously write code. Yes, they can’t. But they can do many other useful things.

simplejack@lemmy.world on 30 Sep 19:41 next collapse

Might be there someday, but right now it’s basically a substitute for me googling some shit.

If I let it go ham, and code everything, it mutates into insanity in a very short period of time.

degen@midwest.social on 30 Sep 20:11 collapse

I’m honestly doubting it will get there someday, at least with the current use of LLMs. There just isn’t true comprehension in them, no space for consideration in any novel dimension. If it takes incredible resources for companies to achieve sometimes-kinda-not-dogshit, I think we might need a new paradigm.

Windex007@lemmy.world on 30 Sep 22:31 next collapse

A crazy number of devs weren’t even using EXISTING code assistant tooling.

Enterprise grade IDEs already had tons of tooling to generate classes and perform refactoring in a sane and algorithmic way. In a way that was deterministic.

So many use cases people have tried to sell me on (boilerplate handling) and I’m like “you have that now and don’t even use it!”.

I think there is probably a way to use llms to try and extract intention and then call real dependable tools to actually perform the actions. This cult of purity where the llm must actually be generating the tokens themselves… why?

I’m all for coding tools. I love them. They have to actually work though. Paradigm is completely wrong right now. I don’t need it to “appear” good, i need it to BE good.

degen@midwest.social on 30 Sep 23:24 collapse

Exactly. We’re already bootstrapping, re-tooling, and improving the entire process of development to the best of our collective ability. Constantly. All through good, old fashioned, classical system design.

Like you said, a lot of people don’t even put that to use, and they remain very effective. Yet a tiny speck of AI tech and its marketing is convincing people we’re about to either become gods or be usurped.

It’s like we took decades of technical knowledge and abstraction from our Computing Canon and said “What if we didn’t use that anymore?”

Jason2357@lemmy.ca on 01 Oct 16:10 collapse

This is the smoking gun. If the AI hype boys really were getting that “10x engineer” out of AI agents, then regular developers would not be able to even come close to competing. Where are these 10x engineers? What have they made? They should be able to spin up whole new companies, with whole new major software products. Where are they?

Glitchvid@lemmy.world on 01 Oct 06:51 next collapse

I think we’ve tapped most of the mileage we can get from the current science. The AI bros conveniently forget there have been multiple AI winters; I suspect we’ll see at least one more before “AGI” (if we ever get there).

Jason2357@lemmy.ca on 01 Oct 16:06 collapse

They are statistical prediction machines. The more they output, the larger the portion of their “context window” (statistical prior) becomes the very output they generated. It’s a fundamental property of the current LLM design that the snake will eventually eat enough of its tail to puke garbage code.

MisterNeon@lemmy.world on 30 Sep 19:57 next collapse

I can’t even get Copilot to write Vitest files for React without making a mountain of junk code that describes drivel.

SpaceNoodle@lemmy.world on 30 Sep 22:39 collapse

Were you trying to say “drivel?”

MisterNeon@lemmy.world on 30 Sep 23:07 collapse

Yes. I’ll go fix that.

Feyd@programming.dev on 30 Sep 20:03 next collapse

It remains to be seen whether the advent of “agentic AIs,” designed to autonomously execute a series of tasks, will change the situation.

“Agentic AI is already reshaping the enterprise, and only those that move decisively — redesigning their architecture, teams, and ways of working — will unlock its full value,” the report reads.

“Devs are slower with and don’t trust LLM based tools. Surely, letting these tools off the leash will somehow manifest their value instead of exacerbating their problems.”

Absolute madness.

Draces@lemmy.world on 01 Oct 01:16 collapse

How are you interpreting it that way? Did you miss a sentence or something in the quote?

SparroHawc@lemmy.zip on 01 Oct 21:45 collapse

It’s not interpretation, it’s extrapolation.

Draces@lemmy.world on 01 Oct 22:17 collapse

There’s quotes.

BeigeAgenda@lemmy.ca on 30 Sep 20:06 next collapse

Sounds exactly like my experience with Vibe Coding.

gigachad@piefed.social on 30 Sep 20:09 next collapse

I always need to laugh when I read “Agentic AI”

NuXCOM_90Percent@lemmy.zip on 30 Sep 20:39 next collapse

LLMs/“Vibe Coding” is probably a little bit more useful than the average intern with some tasks bumping up to an early career hire (what would historically be a Junior Engineer before title inflation/stagnation).

As in: it can generate code that might do what you want. But you need (actual) senior engineers to review the code thoroughly. And… how do people get the experience they need to do that?

Which basically results in turning everyone into a manager. Except your reports aren’t humans and you don’t get more pay. Instead your reports are vscode plugins. Which… sounds like absolute hell but I can get why the (wannabe) management class loves that.

Olap@lemmy.world on 30 Sep 20:53 next collapse

Nowhere close to any junior ime. Grads learn very quickly. Interns’ only job is to understand. Code academy career switchers understand requirements and will ask questions. Subservient AI does fuck all of any of those things

They are more akin to yet another Rapid Application Development wave imo. Go see how the previous iterations have done. Lots are still with us (rails ftw!). I’ll bet most will outlive LLMs

Feyd@programming.dev on 30 Sep 20:58 collapse

Even that description is vastly overselling its usefulness. Every time someone says it’s like a junior dev I just sigh, because literally the only reason I like junior devs is because they turn into not junior devs. Never once has assigning something to a junior dev made my job easier. The entire goal is to train them to the point they make PRs that I don’t have to walk them through reworking.

vermaterc@lemmy.ml on 30 Sep 20:44 next collapse

Usefulness really comes down to which model is being used. I’ve noticed most developers choose GPT for Copilot because that’s what they are familiar with (or they often don’t have a choice due to company policy). I recommend to try Claude Sonnet. How it works is true magic.

But I agree, repetitive tasks are what it should be used for. Planning is still the programmer’s job

goatinspace@feddit.org on 30 Sep 20:49 next collapse

It amplifies a person. If you smart, it will help. If you dumb, it will make you dumber.

silasmariner@programming.dev on 30 Sep 21:18 collapse

I used to work with a pretty great coder.

Everything he’s done in the last 6 months has been trash

Big code assist energy

Flamekebab@piefed.social on 30 Sep 22:55 collapse

Yeah, one of my colleagues leans on it too hard and it’s really undermining his actual talent.

PetteriPano@lemmy.world on 30 Sep 21:23 next collapse

GPT has been quite hit and miss for me, but Claude is usually quite solid.

It needs micromanaging, otherwise it will make bad design decisions and go off on unrelated side quests. When micromanaged it’ll get you to that MVP very fast.

The trap is that you need to be able to find the errors it makes, or at least call them out immediately. Trying to have co-pilot fix its own mistakes is usually a never-ending prompt-cycle.

It can summarise big code bases fast, and find how things fit together a lot faster than me. It’s been very useful when being thrown in head first into a new project.

rozodru@piefed.social on 30 Sep 23:23 collapse

Claude Sonnet? So you enjoy being lied to on a daily basis, because that’s all you’re getting from Claude. Claude Code is alright, but using Claude.ai you might as well just throw a dart at a wall to get a solution on something.

sp3ctr4l@lemmy.dbzer0.com on 30 Sep 20:48 next collapse


Almost like it’s a desperate bid to blow another stock/asset bubble to keep ‘the economy’ going, from the C suite, who all knew the housing bubble was going to pop when this all started - and now it is.

Funniest thing in the world to me is high- and mid-level execs and managers who believe their own internal and external marketing.

The smarter people in the room realize their propaganda is in fact propaganda, and are rolling their eyes internally that their henchmen are so stupid as to be true believers.

WhatsHerBucket@lemmy.world on 30 Sep 21:47 next collapse

shocked_pikachu_face.jpg

Baguette@lemmy.blahaj.zone on 30 Sep 21:51 next collapse

I’d be inclined to try using it if it were smart enough to write my unit tests properly, but it’s great at double-inserting the same mock and having 0 working unit tests.

I might try using it to generate some javadoc though… then when my org inevitably starts polling how much ai I use I won’t be in the gutter lol

sugar_in_your_tea@sh.itjust.works on 30 Sep 21:57 next collapse

I personally think unit tests are the worst application of AI. Tests are there to ensure the code is correct, so ideally the dev would write the tests to verify that the AI-generated code is correct.

I personally don’t use AI to write code, since writing code is the easiest and quickest part of my job. I instead use it to generate examples of using a new library, give me comparisons of different options, etc, and then I write the code after that. Basically, I use it as a replacement for a search engine/blog posts.

Baguette@lemmy.blahaj.zone on 30 Sep 22:59 next collapse

To preface: I don’t actually use AI for anything at my job, which might be a bad metric, but my workflow is 10x slower if I even try using AI.

That said, I want AI to be able to do unit tests in the sense that I can write some starting ones, then have it infer which branches aren’t covered and help me fill in the rest.

Obviously it’s not smart enough, and honestly I highly doubt it ever will be because that’s the nature of LLMs, but my peeve with unit tests is that testing branches usually entails just copying the exact same test but changing one field to an invalid value, or making a dependency throw. It’s not hard, just tedious. Branching coverage is already enforced, so you should know when you forgot to test a case.

Edit: my vision would be an interactive version rather than my company’s current one, where it just generates whatever it wants instantly. I’d want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it. It eliminates the tedious work but still lets the dev know what they’re doing.

I also think you should treat ai code as a pull request and actually review what it writes. My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.

MangoCats@feddit.it on 30 Sep 23:24 next collapse

A software tester walks into a bar, he orders a beer.

He orders -1 beers.

He orders 0 beers.

He orders 843909245824 beers.

He orders duck beers.

AI can be trained to do that, but if you are in a not-well-trodden space, you’ll want to be defining your own edge cases in addition to whatever AI comes up with.

ganryuu@lemmy.ca on 01 Oct 07:08 collapse

Way I heard this joke, it continues with:

A real customer enters.

He asks where the toilets are.

The bar explodes.

sugar_in_your_tea@sh.itjust.works on 01 Oct 01:35 collapse

testing branches usually entail just copying the exact same test but changing one field to be an invalid value, or a dependency to throw

That’s what parameterization is for. In unit tests, most dependencies should be mocked, so expecting a dependency to throw shouldn’t really be a thing much of the time.

I’d want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it

You can get the first half with coverage tools. The second half should be fairly straightforward, assuming you wrote the code. If a branch is hard to hit (i.e. it happens if an OS or library function fails), either mock that part or don’t bother with the test. I ask my team to hit 70-80% code coverage because that last 20-30% tends to be extreme corner cases that are hard to hit.
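For that last case, a minimal sketch of forcing a hard-to-hit error branch by mocking the failing call (the function under test is hypothetical):

import os
import unittest
from unittest.mock import patch

def delete_report(path):
    """Hypothetical code under test: the error branch only runs when os.remove fails."""
    try:
        os.remove(path)
        return True
    except OSError:
        return False

class DeleteReportTests(unittest.TestCase):
    @patch("os.remove", side_effect=OSError("disk full"))
    def test_failure_branch(self, mock_remove):
        # forcing the OS call to fail is the practical way to cover the except branch
        self.assertFalse(delete_report("/tmp/report.txt"))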

My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.

And this is the problem. Reviewers only know so much about the overall context and often do a surface level review unless you’re touching something super important.

We can make conventions all we want, but people will be lazy and submit crap, especially when deadlines are close.

Baguette@lemmy.blahaj.zone on 01 Oct 03:21 collapse

The issue with my org is that the push to be CI/CD means 90% line and branch coverage, which ends up meaning you spend just as much time writing tests as actually developing the feature - which is already on an accelerated schedule, because my org makes promises that turn into ridiculous deadlines, like a 2-month project becoming a 1-month deadline

Mocking is easy, almost everything in my team’s codebase is designed to be mockable. The only stuff I can think of that isn’t mocked are usually just clocks, which you could mock but I actually like using fixed clocks for unit testing most of the time. But mocking is also tedious. Lots of mocks end up being:

  1. Change the test constant expected. Which usually ends up being almost the same input just with one changed field.
  2. Change the response answer from the mock
  3. Given the response, expect the result to be x or some exception y

Chances are, if you wrote it you should already know what branches are there. It’s just translating that to actual unit tests that’s a pain. Branching logic should be easy to read as well. If I read a nested if statement chances are there’s something that can be redesigned better.

I also think that 90% of actual testing should be done through integ tests. Unit tests to me helps to validate what you expect to happen, but expectations don’t necessarily equate to real dependencies and inputs. But that’s a preference, mostly because our design philosophy revolves around dependency injection.

sugar_in_your_tea@sh.itjust.works on 01 Oct 04:32 collapse

I also think that 90% of actual testing should be done through integ tests

I think both are essential, and they test different things. Unit tests verify that individual pieces do what you expect, whereas integration tests verify that those pieces are connected properly. Unit tests should be written by the devs and help them prove their solution works as intended, and integration tests should be written by QA to prove that user flows work as expected.

Integration test coverage should be measured in terms of features/capabilities, whereas unit tests are measured in terms of branches and lines. My target is 90% for features/capabilities (mostly miss the admin bits that end customers don’t use), and 70-80% for branches and lines (skip unlikely errors, simple data passing code like controllers, etc). Getting the last bit of testing for each is nice, but incredibly difficult and low value.

Lots of mocks end up being

I use Python, which allows runtime mocking of existing objects, so most of our mocks are like this:

@patch.object(Object, "method", return_value=value)

Most tests have one or two lines of this above the test function. It’s pretty simple and not very repetitive at all. If we need more complex mocks, that’s usually a sign we need to refactor the code.

dependency injection

I absolutely hate dependency injection, most of the time. 99% of the time, there are only two implementations of a dependency, the standard one and a mock.

If there’s a way to patch things at runtime (e.g. Python’s unittest.mock lib), dependency injection becomes a massive waste of time with all the boilerplate.

If there isn’t a way to patch things at runtime, I prefer a more functional approach that works off interfaces where dependencies are merely passed as needed as data. That way you avoid the boilerplate and still get the benefits of DI.
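A minimal sketch of what I mean (names are illustrative): the dependency is just a callable passed in, so tests can substitute it without any framework.

from typing import Callable

def send_welcome_email(user_email: str, send: Callable[[str, str], None]) -> None:
    # 'send' is whatever callable the caller provides: the real mailer in production, a fake in tests
    send(user_email, "Welcome aboard!")

# in a test, no injection container or mock library needed:
sent = []
send_welcome_email("a@example.com", lambda addr, body: sent.append((addr, body)))
assert sent == [("a@example.com", "Welcome aboard!")]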

That said, dependency injection has its place if a dependency has several implementations. I find that’s pretty rare, but maybe it’s more common in your domain.

FishFace@lemmy.world on 30 Sep 23:17 next collapse

The reason tests are a good candidate is that there is a lot of boilerplate and no complicated business logic. It can be quite a time saver. You probably know some untested code in some project - you could get an llm to write some tests that would at least poke some key code paths, which is better than nothing. If the tests are wrong, it’s barely worse than having no tests.

theolodis@feddit.org on 30 Sep 23:55 next collapse

Wrong tests will make you feel safe. And in the worst case, the next developer that is going to port the code will think that somebody wrote those tests with intention, and potentially create broken code to make the test green.

sugar_in_your_tea@sh.itjust.works on 01 Oct 00:03 next collapse

Exactly! I’ve seen plenty of tests where the test code was confidently wrong and it was obvious the dev just copied the output into the assertion instead of asserting what they expect the output to be. In fact, when I joined my current org, most of the tests were snapshot tests, which automated that process. I’ve pushed to replace them such with better tests, and we caught bugs in the process.

FishFace@lemmy.world on 01 Oct 09:10 collapse

Then write comments in the tests that say they haven’t been checked.

That is indeed the absolute worst case though, and most of the tests produced this way will add value, because checking a test is easier than checking the code (this is kind of the point of tests), so most will be correct.

The risk of regressions that the good tests would catch is higher than the risk of someone writing code to match the rare bad test you’ve marked as suspicious because you (for whatever reason) are not confident in your ability to check it.

sugar_in_your_tea@sh.itjust.works on 01 Oct 00:09 collapse

better than nothing

I disagree. I’d much rather have a lower coverage with high quality tests than high coverage with dubious tests.

If your tests are repetitive, you’re probably writing your tests wrong, or at least focusing on the wrong logic to test. Unit tests should prove the correctness of business logic and calculations. If there’s no significant business logic, there’s little priority for writing a test.

FishFace@lemmy.world on 01 Oct 09:05 collapse

The actual risk of those tests being wrong is low because you’re checking them.

If your tests aren’t repetitive, they’ve got no setup or mocking in them, so they don’t test very much.

sugar_in_your_tea@sh.itjust.works on 01 Oct 15:26 collapse

If your test code is repetitive, you’re not following DRY sufficiently, or the code under test is overly complicated. We’ll generally have a single mock or setup code for several tests, some of which are parameterized. For example, in Python:

from parameterized import parameterized

@parameterized.expand([
    (key, value, ExpectedException,),
    (other_key, other_value, OtherExpectedException,),
])
def test_exceptions(self, key, value, exception_class):
    obj = setup()
    setattr(obj, key, value)

    with self.assertRaises(exception_class):
        func_to_test(obj)

Mocks are similarly simple:

@unittest.mock.patch.object(Class, "method", return_value=...)

dynamic_mock = MagicMock(Class)
dynamic_mock...

How this looks will vary in practice, but the idea is to design code such that usage is simple. If you’re writing complex mocks frequently, there’s probably room for a refactor.

FishFace@lemmy.world on 01 Oct 16:58 collapse

I know how to use parametrised tests, but thanks.

Tests are still much more repetitive than application code. If you’re testing a wrapper around some API, each test may need you to mock a different underlying API call. (Mocking all of them at once would hide things). Each mock is different, so you can’t just extract it somewhere; but it is still repetitive.

If you need three tests, each of which requires a (real or mock) user, a certain directory structure to be present somewhere, and input data to be got from somewhere, that’s three things that, even if you streamline them, need to be done in each test. I have been involved in a project where we originally followed the principle of, “if you need a user object in more than one test, put it in setUp or in a shared fixture”, and the result was shared setup between tests that rapidly became unwieldy - and if you ever want to change one of those tests, you’d better hope you only need to add to it, not change what’s already there, otherwise you break all the other tests.

For this reason, zealous application of DRY is not a good idea with tests, and so they are a bit repetitive. That is an acceptable trade-off, but also a place where an LLM can save you some time.

If you’re writing complex mocks frequently, there’s probably room for a refactor.

Ah, the end of all coding discussions, “if this is a problem for you, your code sucks.” I mean, you’re not wrong, because all code sucks.

LLMs are like the junior dev. You have to review their output because they might have screwed up in some stupid way, but that doesn’t mean they’re not worth having.

sugar_in_your_tea@sh.itjust.works on 01 Oct 23:47 collapse

zealous application of DRY is not a good idea with tests

I absolutely agree. My point is that if you need complex setup, there’s a good chance you can reuse it and replace only the data that’s relevant for your test instead of constructing it every time.

But yes, there’s a limit here. We currently have a veritable mess because we populate the database with fixture data so we have enough data to not need setup logic for each test. Changing that fixture data causes a dozen tests to fail across suites. Since I started at this org, I’ve been pushing against that and introduced the repository pattern so we can easily mock db calls.

IMO, reused logic/structures should be limited to one test suite. But even then, rules are meant to be broken, just make sure you justify it.

also a place where an LLM can save you some time.

I’m still not convinced that’s the case though. A typical mock takes a minute or two to write, most of the time is spent thinking about which cases to hit or refactoring code to make testing easier. Working with the LLM takes at least that long, esp if you count reviewing the generated code and whatnot.

LLMs are like the junior dev

Right, and I don’t want a junior dev writing my tests. Junior devs are there to be trained with the expectation that they’ll learn from mistakes. LLMs don’t learn, they’re perennially junior.

That’s why I don’t use them for code gen and instead use them for research. Writing code is the easy part of my job, knowing what to write is what takes time, so I outsource as much of the latter as I can.

MangoCats@feddit.it on 30 Sep 23:18 next collapse

Ideally, there are requirements before anything, and some TDD types argue that the tests should come before the code as well.

Ideally, the customer is well represented during requirements development - ideally, not by the code developer.

Ideally, the code developer is not the same person that develops the unit tests.

Ideally, someone other than the test developer reviews the tests to assure that the tests do in-fact provide requirements coverage.

Ideally, the modules that come together to make the system function have similarly tight requirements and unit-tests and reviews, and the whole thing runs CI/CD to notify developers of any regressions/bugs within minutes of code check in.

In reality, some portion of that process (often, most of it) is short-cut for one or many reasons. Replacing the missing bits with AI is better than not having them at all.

themaninblack@lemmy.world on 30 Sep 23:25 next collapse

Saved this comment. No notes.

sugar_in_your_tea@sh.itjust.works on 30 Sep 23:58 next collapse

Ideally, the code developer is not the same person that develops the unit tests.

Why? The developer is exactly the person I want writing the tests.

There should also be integration tests written by a separate QA, but unit tests should 100% be the responsibility of the dev making the change.

Replacing the missing bits with AI is better than not having them at all.

I disagree. A bad test is worse than no test, because it gives you a false sense of security. I can identify missing tests with coverage reports, I can’t easily identify bad tests. If I’m working in a codebase with poor coverage, I’ll be extra careful to check for any downstream impacts of my change because I know the test suite won’t help me. If I’m working in a codebase with poor tests but high coverage, I may assume a test pass indicates that I didn’t break anything else.

If a company is going to rely heavily on AI for codegen, I’d expect tests to be manually written and have very high test coverage.

Nalivai@lemmy.world on 01 Oct 00:21 next collapse

Why? The developer is exactly the person I want writing the tests.

It’s better if it’s a different developer, so they don’t know the nuances of your implementation and test functionality only, avoids some mistakes. You’re correct on all the other points.

sugar_in_your_tea@sh.itjust.works on 01 Oct 01:42 next collapse

I really disagree here. If someone else is writing your unit tests, that means one of the following is true:

  • the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests
  • the tests are written before the code is worked on (TDD) - everything would take twice as long because each dev essentially needs to write the code again, and there’s no way you’re going to consistently cover everything the first time

Devs should write their tests, and reviewers should ensure the tests do a good job covering the logic. At the end of the day, the dev is responsible for the correctness of their code, so this makes the most sense to me.

Nalivai@lemmy.world on 01 Oct 15:56 collapse

the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests

I don’t really see how this follows. Why does the second one necessarily have to be lazy, and what stops the first one from being lazy as well?
The reason I like it to be different people is so there are two sets of eyes looking at the same problem without the need for doing a job twice. If you miss something while implementing, it’s easier for you to miss it during test writing. It’s very hard to switch to testing the concept and not the specific implementation, but if you weren’t the one implementing it, you’re not “married” to the code and it’s easier for you to spot the gaps.

sugar_in_your_tea@sh.itjust.works on 01 Oct 23:26 collapse

Devs are more invested in code they wrote themselves. When I’m writing tests for something I didn’t write, I’m less personally invested in it. Looking at PRs by other devs when we do pushes for improving coverage, I’m not alone here. That’s just human psychology, you care more about things you built than things you didn’t.

I think testing should be an integral part of the dev process. I don’t think any code should be merged until there are tests proving its correctness. Having someone else write the tests encourages handing tests to jr devs since they’re “lower priority.”

MangoCats@feddit.it on 01 Oct 03:05 collapse

I’m mixed on unit tests - there are some things the developer will know (white box) about edge cases etc. that others likely wouldn’t, and they should definitely have input on those tests. On the other hand, independence of review is a very important aspect of “harnessing the power of the team.” If you’ve got one guy who gathers the requirements, implements the code, writes the tests, and declares the requirements fulfilled, that better be one outstandingly brilliant guy with all the time on his hands he needs to do the jobs right. If you’re trying to leverage the talents of 20 people to make a better product, having them all be solo-virtuoso actors working independently alongside each other is more likely to create conflict, chaos, duplication, and massive holes of missed opportunities and unforeseen problems in the project.

Nalivai@lemmy.world on 01 Oct 16:15 collapse

independence of review is a very important aspect of “harnessing the power of the team.”

Yep, that’s basically my rationale

MangoCats@feddit.it on 01 Oct 03:01 collapse

but unit tests should 100% be the responsibility of the dev making the change.

True enough

A bad test is worse than no test

Also agree, if your org has trimmed to the point that you’re just making tests to say you have tests, with no review as to their efficacy, they will be getting what they deserve soon enough.

If a company is going to rely heavily on AI for anything I’d expect a significant traditional human employee backstop to the AI until it has a track record. Not “buckle up, we’re gonna try somethin’” track record, more like two or three full business cycles before starting to divest of the human capital that built the business to where it is today. Though, if your business is on the ropes and likely to tank anyway… why not try something new?

There was a story about IBM letting thousands of workers go, replacing them with AI… then hiring even more workers in other areas with the money saved from the AI retooling. Apparently they let a bunch of HR and other admin staff go and beefed up on sales and product development. There are some jobs where you want predictable algorithms rather than potentially biased people, and HR seems like an area that could have a lot of that.

Nalivai@lemmy.world on 01 Oct 00:22 collapse

Replacing the missing bits with AI is better than not having them at all.

Nah, bullshit tests that pretend to be tests but are essentially “if true == true then pass” is significantly worse than no test at all.

MangoCats@feddit.it on 01 Oct 03:12 collapse

bullshit tests that pretend to be tests but are essentially “if true == true then pass” is significantly worse than no test at all.

Sure. But, unsupervised developers who: write the code, write their own tests, change companies every 18 months, are even more likely to pull BS like that than AI is.

You can actually get some test validity oversight out of AI review of the requirements and tests, not perfect, but better than self-supervised new hires.

Nalivai@lemmy.world on 01 Oct 16:21 collapse

You can actually get some test validity oversight out of AI review

You also will get some bullshit out of it. If you’re in a situation when you can’t trust your developers because they’re changing companies every 18 months, and you can’t even supervise your untrustworthy developers, then you sure as shit can’t trust whatever LLM will generate you. At least your flock of developers will bullshit you predictably to save time and energy, with LLM you have zero ideas where lies will come from, and those will be inventive lies.

MangoCats@feddit.it on 01 Oct 18:15 collapse

I work in a “tight” industry where we check ALL our code. By contrast, a lot of places I have visited - including some you would think are fairly important like medical office management and gas pump card reader software makers - are not tight, not tight at all. It’s a matter of moving the needle, improving a bad situation. You’ll never achieve “perfect” on any dynamic non-trivial system, but if you can move closer to it for little or no cost?

Of course, when I interviewed with that office management software company, they turned me down - probably because they like their culture the way it is and they were afraid I’d change things with my history of working places for at least 2.5 years, sometimes up to 12, and making sure the code is right before it ships instead of giving their sales reps that “hands on, oooh I see why you don’t like that, I’ll have our people fix that right away - just for you” support culture.

Draces@lemmy.world on 01 Oct 01:19 collapse

What model are you using? I’ve had such a radically different experience but I’ve only bothered with the latest models. The old ones weren’t even worth trying with

sugar_in_your_tea@sh.itjust.works on 01 Oct 01:50 collapse

I’ll have to check, we have a few models hosted at our company and I forget the exact versions and whatnot. They’re relatively recent, but not the highest end since we need to host them locally.

But the issue here isn’t directly related to which model it is, but to the way LLMs work. They cannot reason, they can only give believable output. If the goal is code coverage, it’ll get coverage, but not necessarily be well designed.

If both the logic and the tests are automated, humans will be lazy and miss stuff. If only the logic is generated, humans can treat the code as a black box and write good tests that way. Humans will be lazy with whatever is automated, so if I have to pick one to be hand written, it’ll be the code that ensures the logic is correct.

wesley@yall.theatl.social on 01 Oct 02:09 collapse

We’re mandated to use it at my work. For unit tests it can really go wild and it’ll write thousands of lines of tests to cover a single file/class for instance whereas a developer would probably only write a fourth as much. You have to be specific to get any decent output from them like “write a test for this function and use inputs x and y and the expected output is z”

Personally I like writing tests too and I think through what test cases I need based on what the code is supposed to do. Maybe if there are annoying mocks that I need to create I’ll let the AI do that part or something.

sugar_in_your_tea@sh.itjust.works on 01 Oct 15:28 collapse

Generating tests like that would take longer than writing the tests myself…

Nobody is going to thoroughly review thousands of lines of test code.

jjjalljs@ttrpg.network on 30 Sep 22:20 next collapse

One of the guys at my old job submitted a PR with tests that basically just mocked everything, tested nothing. Like,

with patch("something.whatever", return_value=True):
  assert whatever(0) is True
  assert whatever(1) is True

Except it was a few dozen lines of that, with names that made it look like they were doing something useful.

He used AI to generate them, of course. Pretty useless.

DarkDarkHouse@lemmy.sdf.org on 30 Sep 23:19 next collapse

True, I do feel mocked by this code.

MangoCats@feddit.it on 30 Sep 23:29 collapse

We have had guys submit tests like that, long before AI was a thing.

SparroHawc@lemmy.zip on 01 Oct 21:36 collapse

At least in those situations, the person writing the tests knows they’re not testing anything…

MangoCats@feddit.it on 02 Oct 01:16 collapse

Some do, some don’t, but more importantly: most just don’t care.

I had a tester wander into a set of edge cases which weren’t 100% properly handled and their first reaction was “gee, maybe I didn’t see that, it sounds like I’m going to have a lot more work because I did.”

Flamekebab@piefed.social on 30 Sep 22:53 collapse

I’ve seen it generate working unit tests plenty. In the sense that they pass.

…they do not actually test the functionality.
Of course that function returns what you’re asserting - you overwrote its actual output and checked against that!
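Something like this (made-up example) - the test mocks the function under test, then asserts against the mocked value:

from unittest.mock import patch

def calculate_total(items):
    """The real function that was supposed to be tested."""
    return sum(items)

# the anti-pattern: replace the function's output, then check for the replaced output
with patch(f"{__name__}.calculate_total", return_value=42):
    assert calculate_total([1, 2, 3]) == 42  # always passes, proves nothing about the real logic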

peoplebeproblems@midwest.social on 30 Sep 22:09 next collapse

“No Duh,” say senior developers everywhere.

I’m so glad this was your first line in the post

Frozengyro@lemmy.world on 30 Sep 23:21 next collapse

No duh, says a layman who never wrote code in his life.

samus12345@sh.itjust.works on 30 Sep 23:26 next collapse

[image: bell curve meme - https://i.imgflip.com/a7pt0y.jpg]

dylanmorgan@slrpnk.net on 30 Sep 23:37 next collapse

Oddly enough, my grasp of coding is probably the same as the guy in the middle but I still know that LLM generated code is garbage.

samus12345@sh.itjust.works on 30 Sep 23:50 next collapse

Yeah, I actually considered putting the same text on all 3, but we gotta put the idiots that think it’s great somewhere! Maybe I should have put it with the dumbest guy instead.

dylanmorgan@slrpnk.net on 01 Oct 01:21 next collapse

Yeah, there’s definitely morons out there who never bothered to even read about the theory of good code design.

vividspecter@aussie.zone on 01 Oct 03:14 next collapse

Think this one needs a bimodal curve with the two peaks representing the “caught up in the hype” average coder and the realistic average coder.

samus12345@sh.itjust.works on 01 Oct 23:56 collapse

Agreed, that’s why it didn’t feel quite right when I made it.

theterrasque@infosec.pub on 02 Oct 05:04 collapse

I guess I’m one of the idiots then, but what do I know. I’ve only been coding since the 90s

Digit@lemmy.wtf on 02 Oct 00:41 next collapse

All the best garbage to learn from, to debug, debug, debug, sharpening those skills.

theterrasque@infosec.pub on 02 Oct 05:03 collapse

That’s kinda wrong though. I’ve seen LLMs write pretty good code, in some cases even doing something clever I hadn’t thought of.

You should treat it like any junior though, and read the code changes and give feedback if needed.

jj4211@lemmy.world on 01 Oct 16:59 next collapse

Thing is both statements can be true.

Used appropriately and in the right context, LLMs can accelerate some select work.

But the hype level is ‘human replacement is here (or imminent, depending on if the company thinks the audience is willing to believe yet or not)’. Recently Anthropic suggested someone could just type ‘make a slack clone’ and it’ll all be done and perfect.

Digit@lemmy.wtf on 02 Oct 00:39 collapse

Heh. That’s a fun chart. If that’s programming aptitude, I scored 80 on that part of the broad spectrum aptitude test I got a sneak-peek chance to do several parts of. Well now I know why I’m so easily in agreement with “senior coders”, if it is programming aptitude quotient. If it’s just iq, … pulls hood up to block the glare.

Daunting that there may be a middling bias getting apparent advantages. Evolution may not serve us well like that.

Digit@lemmy.wtf on 02 Oct 00:34 collapse

And many between “senior developers everywhere” and “a layman who never wrote code in his life”.

Like me, I’m saying it too. A big ol “No duh”.

Disbelieve the hype.

SparroHawc@lemmy.zip on 01 Oct 18:41 collapse

If not to editorialize, what else is the text box for? :)

z3rOR0ne@lemmy.ml on 30 Sep 22:35 next collapse

Even though this shit was apparent from day fucking 1, at least the Tech Billionaires were able to cause mass layoffs, destroy an entire generation of new programmers’ careers, introduce an endless amount of tech debt and security vulnerabilities, all while grifting investors/businesses and making billions off of all of it.

Sad excuses for sacks of shit, all of them.

Prove_your_argument@piefed.social on 01 Oct 01:30 collapse

Look on the bright side, in a couple of years they will come crawling back to us, desperate for new things to be built so their profit machines keep profiting.

Current ML techniques literally cannot replace developers for anything but the most rudimentary of tasks.

I wish we had true apprenticeships out there for development and other tech roles.

Dojan@pawb.social on 30 Sep 23:17 next collapse

I miss the days when machine learning was fun. Poking together useless RNN models with a small dataset to make a digital Trump that talked about banging his daughter, and endless nipples flowing into America. Exploring the latent space between concepts.

dylanmorgan@slrpnk.net on 30 Sep 23:36 next collapse

The most immediately understandable example I heard of this was from a senior developer who pointed out that LLM generated code will build a different code block every time it has to do the same thing. So if that function fails, you have to look at multiple incarnations of the same function, rather than saying “oh, let’s fix that function in the library we built.”
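Roughly the difference he was pointing at (hypothetical example): the same logic re-generated inline each time, versus one shared function you can actually fix.

# what repeated LLM generations tend to produce: the same parsing re-implemented inline, slightly differently
def handle_upload(raw):
    parts = raw.strip().split(",")
    return {"id": int(parts[0]), "name": parts[1].title()}

def handle_import(raw):
    fields = raw.split(",")  # subtly different: no strip(), no title-casing
    return {"id": int(fields[0]), "name": fields[1]}

# what a shared library function gives you: one place to fix when the format changes
def parse_record(raw):
    parts = raw.strip().split(",")
    return {"id": int(parts[0]), "name": parts[1].title()}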

kescusay@lemmy.world on 01 Oct 02:38 collapse

Yeah, code bloat with LLMs is fucking monstrous. If you use them, get used to immediately scouring your code for duplications.

jj4211@lemmy.world on 01 Oct 10:36 collapse

Yeah, if I use it and it generates more than 5 lines of code, I now just immediately cancel it, because I know it’s not worth even reading. It’s so bad at repeating itself and failing to reasonably break things down into logical pieces…

With that, I only have to read some of its suggestions, still throw out probably 80% entirely, fix up another 15%, and actually use 5% without modification.

kescusay@lemmy.world on 01 Oct 11:01 collapse

There are tricks to getting better output from it, especially if you’re using Copilot in VS Code and your employer is paying for access to models, but it’s still asking for trouble if you’re not extremely careful, extremely detailed, and extremely precise with your prompts.

And even then it absolutely will fuck up. If it actually succeeds at building something that technically works, you’ll spend considerable time afterwards going through its output and removing unnecessary crap it added, fixing duplications, securing insecure garbage, removing mocks (God… So many fucking mocks), and so on.

I think about what my employer is spending on it a lot. It can’t possibly be worth it.

COASTER1921@lemmy.ml on 01 Oct 00:29 next collapse

AI companies and investors are absolutely overhyping its capabilities, but if you haven’t tried it before I’d strongly recommend doing so. For simple bash scripts and Python it almost always gets something workable first try, genuinely saving time.

AI LLMs are pretty terrible for nearly every other task I’ve tried. I suspect it’s because the same amount of quality training data just doesn’t exist for other fields.

expr@programming.dev on 01 Oct 02:04 next collapse

Actually typing out code has literally never been the bottleneck. It’s a vanishingly small amount of what we do. An experienced engineer can type out bash or Python scripts without so much as blinking. And better yet, they can do it without completely fabricating commands and library functions.

The hard part is truly understanding what it is you’re trying to do in the first place, and that fundamentally requires a level of semantic comprehension that LLMs do not in any way possess.

It’s very much like the “no code” solutions of yesteryear. They sound great on paper until you’re faced with the reality of the buggy, unmaintainable nightmare pile of spaghetti code that they vomit into your repo.

LLMs are truly a complete joke for software development tasks. I remain among the top 3-4 developers in terms of speed and output at my workplace (and all of the fastest people refuse to use LLMs as well), and I don’t create MRs chock full of bullshit that has to be ripped out (fucking sick of telling people to delete absolutely useless tests that do nothing but slow down our CI pipeline). The slowest people are those that keep banging their head against the LLM for “efficiency” when it’s anything but.

It’s the fucking stupidest trend I’ve seen in my career and I can’t wait until people finally wake up and realize it’s both incredibly inefficient and incredibly wasteful.

Badabinski@kbin.earth on 01 Oct 02:49 next collapse

Oh god, please don't use it for Bash. LLM-generated Bash is such a fucking pot of horse shit bad practices. Regular people have a hard enough time writing good Bash, and something trained on all the fucking crap on StackOverflow and GitHub is inevitably going to be so bad...

Signed, a senior dev who is the "Bash guy" for a very large team.

flux@lemmy.ml on 01 Oct 09:07 collapse

The problem isn’t the tool, it’s the user: they don’t know if they’re getting good code or not, therefore they cannot make the prompt to improve it.

In my view the problems occur when using AI to do something you don’t already know how to do.

mcv@lemmy.zip on 01 Oct 08:56 collapse

I’ve found it’s pretty good at refactoring existing code to use a different but well-supported and well documented library. It’s absolutely terrible for a new and poorly documented library.

I recently tried using Copilot with Claude to implement something in a fairly young library, and did get the basics working, including a long repetitive string of “that doesn’t work, I’m getting error msg [error]”. Seven times of that, and suddenly it worked! I was quite amazed, though it failed me in many other ways with that library (imagining functions and options that don’t exist). But then redoing the same thing in the older, better supported library, it got it right on the first try.

But maybe the biggest advantage of AI coding is that it allows me to code when my brain isn’t fully engaged. Of course the risk there is that my brain might not fully engage because of the AI.

uncle_moustache@sh.itjust.works on 01 Oct 00:32 next collapse

The good news is: AI is a lot less impressive than it seemed at first.

The bad news is: so are a lot of jobs.

aesthelete@lemmy.world on 01 Oct 00:38 next collapse

It turns every prototyping exercise into a debugging exercise. Even talented coders often suck ass at debugging.

Somecall_metim@lemmy.dbzer0.com on 01 Oct 00:42 next collapse

I am jack’s complete lack of surprise.

DarkDarkHouse@lemmy.sdf.org on 01 Oct 00:53 next collapse

The biggest value I get from AI in this space is when I get handed a pile of spaghetti and ask for an initial overview.

jj4211@lemmy.world on 01 Oct 10:38 collapse

I thought that as well and got some code from someone that left the company and asked it to comment it.

It did the obvious “x= 5 // assign 5 to x” crap comments and then it got to the actually confusing part and just skipped that mess entirely…

favoredponcho@lemmy.zip on 01 Oct 01:51 next collapse

Glad someone paid a bunch of worthless McKinsey consultants for what I could’ve told you myself

StefanT@lemmy.world on 01 Oct 06:40 collapse

It is not worthless. My understanding is that management only trusts sources that are expensive.

jj4211@lemmy.world on 01 Oct 10:32 collapse

Yep, going through that at work. They hired several consulting companies and, as near as I can tell, they just asked employees how the company was screwing up; we largely said the same things we always say to executives, they repeated them verbatim, and executives are now praising the insight on how to fix our business…

M0oP0o@mander.xyz on 01 Oct 01:57 next collapse

Wait, it was hyped? Not just ridiculed?

jj4211@lemmy.world on 01 Oct 10:56 collapse

A VP and his taking head sycophants at my work has not shut up about it. They went through some trouble to automatically measure employee use of the AI and made it a performance measure. So now we have it generate code so we don’t get fired and mostly throw it away.

M0oP0o@mander.xyz on 01 Oct 15:00 collapse

Oh, I did not know hell was hiring.

popekingjoe@lemmy.world on 01 Oct 02:14 next collapse

Oh wow. No shit. Anyway!

kadaverin0@lemmy.dbzer0.com on 01 Oct 03:07 next collapse

Imagine if we did “vibe city infrastructure”. Just throw up a fucking suspension bridge and we’ll hire some temps to come in later to find the bad welds and missing cables.

vrighter@discuss.tchncs.de on 01 Oct 04:47 next collapse

“It’s slowing you down. The solution to that is to use it in even more places!”

Wtf was up with that conclusion?

poopkins@lemmy.world on 01 Oct 09:48 collapse

I don’t think it’s meant to be a conclusion. The article serves as a recap of several reports and studies about the effectiveness of LLMs for coding, and the final quote from Bain & Company was a counterpoint to the earlier ones asserting that productivity gains are minimal at best, while also noting that measuring productivity is a grey area.

RagingRobot@lemmy.world on 01 Oct 05:05 next collapse

I have been vibe coding a whole game in JavaScript to try it out. So far I have gotten a pretty ok game out of it. It’s just a simple match three bubble pop type of thing so nothing crazy but I made a design and I am trying to implement it using mostly vibe coding.

That being said the code is awful. So many bad choices and spaghetti code. It also took longer than if I had written it myself.

So now I have a game that’s kind of hard to modify haha. I may try to set up some unit tests and have it refactor using those.

mcv@lemmy.zip on 01 Oct 07:41 next collapse

Sounds like vibecoders will have to relearn the lessons of the past 40 years of software engineering.

CheeseNoodle@lemmy.world on 01 Oct 09:26 collapse

As with every profession, every generation… only this time on their own, because every company forgot what employee training is and expects everyone to be born with 5 years of experience.

jaykrown@lemmy.world on 01 Oct 09:32 collapse

Wait, are you blaming AI for this, or yourself?

RagingRobot@lemmy.world on 01 Oct 12:03 collapse

Blaming? I mean it wrote pretty much all of the code. I definitely wouldn’t tell people I wrote it that way haha.

Goldholz@lemmy.blahaj.zone on 01 Oct 05:07 next collapse

No shit sherlock!

elbiter@lemmy.world on 01 Oct 06:32 next collapse

AI coding is the stupidest thing I’ve seen since someone decided it was a good idea to measure code by the number of lines written.

ellohir@lemmy.world on 01 Oct 09:38 next collapse

More code is better, obviously! Why else would a website for viewing a restaurant menu be 80 MB? It’s all that good, excellent code.

Slotos@feddit.nl on 02 Oct 07:16 collapse

It did solve my impostor syndrome though. Turns out a bunch of people I saw as my betters were faking it all along.

MrScottyTay@sh.itjust.works on 01 Oct 07:21 next collapse

I use AI as an entryway to learning, or for finding the name of a technique I’m thinking of but can’t remember, so I can then look elsewhere for proper documentation. I would never have it just blindly write code.

Sadly, search engines getting shittier has sort of forced me to use it to replace them.

Then it’s also good to quickly parse an error for anything obviously wrong.

Fyrnyx@kbin.melroy.org on 01 Oct 09:18 next collapse

But will something be done about it?

NOooOoOoOoOoo. As long as it is still the new shiny toy for techbros and executive-bros to tinker with, it'll continue.

donalonzo@lemmy.world on 01 Oct 09:36 next collapse

LLMs work great for asking questions about tons of documentation and learning more about high-level concepts. It’s a good search engine.

The code they produce has basically always disappointed me.

jj4211@lemmy.world on 01 Oct 10:24 next collapse

I sometimes get up to five lines of viable code. Then upon occasion what should have been a one liner tries to vomit all over my codebase. The best feature about AI enabled IDE is the button to decline the mess that was just inflicted.

In the past week I had two cases I thought would be “vibe coding” fodder: blazingly obvious, just tedious. One time it totally screwed up and I had to scrap it all. The other time it generated about 4 functions in one go and was salvageable, though still off in weird ways. One of those functions was functional, just nonsensical: it was supposed to check whether a certain condition was present or not, but instead of returning a boolean, it took a pointer to a string and set the string to “” to indicate false… So damn bizarre, hard to follow, and needlessly more lines of code, which is another theme of LLM-generated code.

nightlily@leminal.space on 01 Oct 11:07 collapse

On proprietary products, they are awful. So many hallucinations that waste hours. A manager used one on a code review of mine and only admitted it after I spent the afternoon chasing it.

zaphod@sopuli.xyz on 01 Oct 14:54 next collapse

Not even proprietary, just niche things. In other words anything that’s rarely used in open source code, because there’s nothing to train the models on.

Jason2357@lemmy.ca on 01 Oct 15:46 collapse

Those happen so often. I’ve stopped calling them hallucinations (that’s anthropomorphising and over-selling what LLMs do, imho). They are statistical prediction machines, and either they hit their practical limits of predicting useful output, or we just call it broken.

I think the next 10 years are going to be all about learning what LLMs are actually good for, and what they are fundamentally limited at no matter how much GPU RAM we throw at them.

Feyd@programming.dev on 02 Oct 08:50 collapse

Hallucinations = bullshit.

jaykrown@lemmy.world on 01 Oct 09:39 next collapse

I’ve found success using more powerful LLMs to help me create applications using the Rust programming language. If you use a weak LLM and ask it to do something very difficult you’ll get bad results. You still need to have a fundamental understanding of good coding practices. Using an LLM to code doesn’t replace the decision making.

jj4211@lemmy.world on 01 Oct 10:27 collapse

Based on my experience with Claude Sonnet and GPT-4/5… it’s a little useful but generally annoying, and it fails more often than it works.

I do think moderate use still comes out ahead, as it saves a bunch of typing when it does work, but I still get annoyed at the blatantly stupid suggestions I keep having to decline.

jaykrown@lemmy.world on 01 Oct 21:40 collapse

I remember GPT 4 being useless and constantly giving wrong information. Now with newer models they’ve become significantly more useful, especially when prompted to be extremely careful and to always double check to ensure the best response.

skulkbane@lemmy.world on 01 Oct 11:14 next collapse

I’m not super surprised, but AI has been really useful for learning, or for giving me a direction when I need to look into something.

I’m not really an advocate for AI, but there are some really nice things it can do. And I like to test the code quality of the models I have access to.

I always ask for an FTP server and a DNS server to check what it can do, and they work surprisingly well most of the time.

drmoose@lemmy.world on 01 Oct 11:15 next collapse

I code with LLMs every day as a senior developer, but agents are mostly a big lie. LLMs are great as an information index and for rubber-duck chats, which is already the incredible feature of the century, but agents are fundamentally bad. Even for Python they are intern-level bad. I was just trying the new Claude and instead of using Python’s pathlib.Path it reinvented its own file system path utils, and pathlib is not even some new Python feature - it has been the de facto way to manage paths for at least 3 years now.
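To make it concrete, here's the flavor of the thing (an illustrative sketch, not the actual code it produced; the paths and names are made up):

```python
from pathlib import Path

# The style the agent kept hand-rolling: string splitting, os.path.join gymnastics,
# its own "ensure_dir" helpers. The stdlib already covers all of it:
src = Path("data/reports/2024/summary.csv")
out_dir = src.parent / "converted"             # sibling directory next to the source
out_dir.mkdir(parents=True, exist_ok=True)     # create it if missing
dst = out_dir / src.with_suffix(".json").name  # same filename, new extension
dst.write_text("{}")                           # placeholder payload for the sketch
```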

That being said, when prompted in great detail with exact instructions, agents can be useful, but that’s not what’s being sold here.

After so many iterations, it seems like agents still need a fundamental breakthrough in AI tech, as diminishing returns are hitting hard now.

umbraroze@slrpnk.net on 01 Oct 12:05 next collapse

Oh yes. The Great pathlib. The Blessed pathlib. Hallowed be it and all it does.

I’m a Ruby girl. A couple of years ago I was super worried about my decision to finally start learning Python seriously. But once I ran into pathlib, I knew for sure that everything would be fine. Take an everyday headache problem. Solve it forever. Boom. This is how standard libraries should be designed.

HugeNerd@lemmy.ca on 01 Oct 12:45 next collapse

I disagree. Take a routine problem and invent a new language for it. Then split it into various incompatible dialects, and make sure in all cases it requires computing power that no one really has.

namingthingsiseasy@programming.dev on 01 Oct 15:56 collapse

Pathlib is very nice indeed, but I can understand why a lot of languages don’t do similar things. There are major challenges implementing something like that. Cross-platform functionality is a big one, for example. File permissions between Unix systems and Windows do not map perfectly from one system to another which can be a maintenance burden.

But I do agree. As a user, it feels great to have. And yes, also in general, the things Python does with its standard library are definitely the way things should be done, from a user’s point of view at least.

Jason2357@lemmy.ca on 01 Oct 15:37 next collapse

If it wasn’t for all the AI hype that it’s going to do everyone’s job, LLMs would be widely considered an amazing advancement in computer-human interaction and human assistance. They are so much better than using a search engine to parse web forums and stack overflow, but that’s not going to pay for investing hundreds of billions into building them out. My experience is like yours - I use AI chat as a huge information index mainly, and helpful sounding board occasionally, but it isn’t much good beyond that.

Feyd@programming.dev on 02 Oct 08:43 collapse

They are so much better than using a search engine to parse web forums and stack overflow,

The hallucinations (more accurately, bullshitting), plus the fact that they need fresh training data while discouraging people from engaging in the very structures that produce it, make this highly debatable.

Jason2357@lemmy.ca on 02 Oct 13:48 collapse

I agree that it is certainly debatable. However, my experience has been that information extracted about, say, what may cause a strange error message from some R output has been at least as reliable as random Stack Overflow posts - however, I get that answer instantly rather than after significant effort with a search engine. It can often find actual links better than a search engine for esoteric problems as well. This, however, is merely a relative improvement, and not some world-changing event like AI boosters will claim, and it’s one of the only use cases where AI provides a clear advantage. Generating broken code isn’t useful to me.

jj4211@lemmy.world on 01 Oct 16:05 collapse

I will concur with the whole ‘LLM keeps suggesting to reinvent the wheel’ thing.

And poorly. Not only did it not use a pretty basic standard library to do something, its implementation was generally crap. For example, it offered up a solution that was hard-coded to IPv4 when the context was very IPv6-heavy.

JackbyDev@programming.dev on 01 Oct 16:50 collapse

I have a theory that it’s partly because a bunch of older StackOverflow answers have more votes than newer ones using new features. I’m referring more to it not using relatively new features as much as it should.

korazail@lemmy.myserv.one on 01 Oct 17:17 collapse

I’d wager that the votes are irrelevant. Stack Overflow is generously <50% good code and is mostly people saying ‘this code doesn’t work – why?’, and that is the corpus these models were trained on.

I’ve yet to see something like a vibe coding livestream where something got done. I can only find a lot of ‘tutorials’ that tell how to set up tools. Anyone want to provide one?

I could… possibly… imagine a place where someone took quality code from a variety of sources and generate a model that was specific to a single language, and that model was able to generate good code, but I don’t think we have that.

Vibe coders: Even if your code works and seems to be a success, do you know why it works, how it works? Does it handle edge cases you didn’t include in your prompt? Does it expose the database to someone smarter than the LLM? Does it grant an attacker access to the computer it’s running on, if they are smarter than the LLM? Have you asked your LLM how many 'r’s are in strawberry?

At the very least, we will have a cyber-security crisis due to vibe coding, especially since there seems to be a high likelihood of HR and Finance vibe coders who think they can do the traditional IT/Dev work without understanding what they are doing and how to do it safely.

HugeNerd@lemmy.ca on 01 Oct 12:00 next collapse

www.youtube.com/watch?v=VsE0BwQ3l8U&t=1492s

And the band plays on

Hastur@lemmy.ca on 01 Oct 15:42 next collapse

EVERYTHING about AI is overhyped. It’s people trying to cash in on the latest buzz/trend.

andros_rex@lemmy.world on 01 Oct 15:47 next collapse

So when the AI bubble burst, will there be coding jobs available to clean up the mess?

aidan@lemmy.world on 01 Oct 16:15 next collapse

For most of us, I hope so. But I feel like the tech sector was oversaturated because of all the hype about it being an easy get-rich-quick job. Which, for some people, it was.

Alaknar@sopuli.xyz on 01 Oct 17:51 collapse

There already are. People all over LinkedIn are changing their titles to “AI Code Cleanup Specialist”.

AmericanEconomicThinkTank@lemmy.world on 01 Oct 16:05 next collapse

I would say absolutely, in the general sense most people, and the salesmen, frame them in.

When I was invited to assist with the GDC development, I got a chance to partner with a few AI developers and see the development process firsthand, try my hand at it myself, and get my hands on a few low-parameter models for my own personal use. It’s really interesting just how capable some models are in their specific use cases. However, even high-parameter models easily become useless at the drop of a hat.

I found the best case, one that’s rarely done mind you, is integrating the model into a program that can call a known database. With a properly trained model that formats output in natural language and uses a given database for context calls and concrete information, the qualitative performance leaps ahead by bounds. The problem is that this requires so much customization it pretty much ends up being something a capable hobbyist would do; it’s just not economically sound for a business to adopt.
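For the curious, the shape of that integration is roughly this (a minimal sketch with made-up table names and a placeholder generate() call, not any particular product):

```python
import sqlite3

def answer_with_db_context(question: str, generate) -> str:
    # Pull concrete facts from a database we trust instead of letting the model guess.
    conn = sqlite3.connect("plant_records.db")  # hypothetical database
    rows = conn.execute(
        "SELECT name, value FROM measurements ORDER BY taken_at DESC LIMIT 20"
    ).fetchall()
    conn.close()

    context = "\n".join(f"{name}: {value}" for name, value in rows)
    prompt = (
        "Answer using only the data below. If the answer is not in the data, say so.\n"
        f"{context}\n\nQuestion: {question}"
    )
    return generate(prompt)  # generate() = whatever local model or API call you have
```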

Deflated0ne@lemmy.world on 01 Oct 16:07 next collapse

According to Deutsche Bank, the AI bubble is a pillar of our economy now.

So when it pops, I guess that’s kinda apocalyptic.

Edit - strikethrough

hroderic@lemmy.world on 01 Oct 16:37 collapse

Only for taxpayers ☝️

bitjunkie@lemmy.world on 01 Oct 16:15 next collapse

I’d much rather write my own bugs to have to waste hours fixing, thanks.

JackbyDev@programming.dev on 01 Oct 16:48 next collapse

The people talking about AI coding the most at my job are architects and it drives me insane.

ceiphas@feddit.org on 01 Oct 17:08 next collapse

I am a software architect, and mainly use it to refactor my own old code… But I am maybe not a typical architect…

JackbyDev@programming.dev on 01 Oct 20:30 collapse

I don’t really care if people use it, it’s more that it feels like a quarter of our architect meeting presentations are about something AI related. It’s just exhausting.

kattfisk@lemmy.dbzer0.com on 02 Oct 10:31 collapse

Software architects that don’t write code are worse than useless

arc99@lemmy.world on 01 Oct 17:10 next collapse

I have never seen AI-generated code that is correct. Not once. I’ve certainly seen it be broadly correct and used it for the gist of something. But normally it fucks something up - imports, dependencies, logic, API calls, or a combination of all of them.

I sure as hell wouldn’t trust to use it without reviewing it thoroughly. And anyone stupid enough to use it blindly through “vibe” programming deserves everything they get. And most likely that will be a massive bill and code which is horribly broken in some serious and subtle way.

hietsu@sopuli.xyz on 01 Oct 18:06 next collapse

How is it not correct if the code successfully does the very thing that was prompted?

For example, in my company we don’t have any real programmers, but we have built a handful of useful tools (approx. 400-1600 LOC, mainly Python) to do some data analysis, regex stuff to clean up some output files, index some files and analyze/check their contents for certain mistakes, dashboards to display certain data, etc.

Of course the apps may not have been perfect after the very first prompt, or even compiled, but after iterating an error or two, and explaining an edge case or two, they’ve started to perform flawlessly, saving tons of work hours per week. So how is this not useful? If the code creates results that are correct, doesn’t that make the app itself technically ”correct” too, albeit likely not nearly as optimized as equivalent human code would be.

LaMouette@jlai.lu on 01 Oct 18:24 next collapse

It’s not bad for your use case, but going beyond that without issues, and without actual developers to fix the vibe code, is not yet possible for LLMs.

arc99@lemmy.world on 01 Oct 18:52 next collapse

If the code doesn’t compile, or is badly mangled, or uses the wrong APIs / imports, or forgets something really important, then it’s broken. I can use AI to inform my opinion and sometimes make use of what it outputs, but critically, I know how to program and I know how to spot good and bad code.

I can’t speak for how you use it, but if you don’t have any real programmers and you’re iterating until something works, then you could be producing junk and not know it. Maybe it doesn’t matter in your case if it’s a bunch of throwaway scripts and helpers, but if you have actual code in production where money, lives, reputation, safety or security are at risk, then it absolutely does.

hietsu@sopuli.xyz on 04 Oct 08:10 collapse

I disagree on the junk part: I see it so that if the outputs of the program are correct, the logic must be flawless (just maybe not optimized when it comes to efficiency). Of course in our case the inputs are highly structured and it is easy for humans to spot errors in the output files, so this “iterate until outputs are perfect” approach has worked great and yielded huge savings in work hours. In our case none of the tools are exposed outside, so in the very worst case a user may just crash the app.

But yeah, I agree that building any public frontend or anything business-critical this way is likely the road to doom.

maskofdaisies@lemmy.dbzer0.com on 01 Oct 21:47 collapse

To add on to what others have said, vibe coding is ushering in a new golden age for black hat hackers. If someone relies entirely on AI to generate code, they likely don’t understand what the code they have is actually doing. This tends to lead to an app that works correctly for what the prompt specified but behaves badly the instant it has to handle anything outside of it, like a malformed request or data outside the prompted parameters. As a result these apps tend to be easy to exploit by malicious actors, often in ways the original prompter never thought of.
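The textbook case of what that looks like (illustrative Python, not pulled from any specific vibe-coded app):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # What tends to come out of a prompt: SQL built by string formatting, so an input
    # like  ' OR '1'='1  rewrites the whole query.
    # return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchall()

    # Parameterized query: user input never becomes part of the SQL text.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```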

korazail@lemmy.myserv.one on 02 Oct 01:23 collapse

I think this is what will kill vibe coding, but not before there’s significant damage done. Junior developers will be let go and senior devs will be told they have to use these tools instead and to be twice as efficient. At some point enough major companies will have had data breaches through AI-generated code that they all go back to using people, but there will be tons of vulnerable code everywhere. And letting Cursor touch your codebase for a year, even with oversight, will make it really tricky to find all the places it subtly fucked up.

ikirin@feddit.org on 01 Oct 19:15 next collapse

I’ve seen and used AI for snippets of code and it’s pretty decent at that.

With my colleagues I always compare it to a battery powered drill. It’s very powerful and can make shit a lot easier. But you’d not try to build furniture from scratch with only a battery powered drill.

You need the knowledge to use it - and also saws, screws, the proper bits for those screws and so on and so forth.

setsubyou@lemmy.world on 01 Oct 21:02 collapse

What bothers me the most is the amount of tech debt it adds by using outdated approaches.

For example, recently I used AI to create some Python scripts that use polars and altair to parse some data and draw charts. It kept insisting on bringing in pandas so it could convert the polars dataframes to pandas dataframes just for passing them to altair. When I told it that altair can use polars dataframes directly, that helped, but two or three prompts later it would try to solve problems by adding the conversion again.
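For context, the fix is a one-liner, which is what makes the detour so unnecessary (sketch assuming a recent Altair release that accepts polars frames directly):

```python
import altair as alt
import polars as pl

df = pl.DataFrame({"x": [1, 2, 3], "y": [4.2, 1.1, 7.5]})

# What the model kept generating: a pandas round-trip just to hand the data to Altair
# chart = alt.Chart(df.to_pandas()).mark_line().encode(x="x", y="y")

# Current Altair takes the polars frame as-is; no pandas dependency needed
chart = alt.Chart(df).mark_line().encode(x="x", y="y")
chart.save("chart.html")
```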

This makes sense too, because the training material, on average, is probably older than the change that enabled altair to use polars dataframes directly. And a lot of code out there just only uses pandas in the first place.

The result is that in all these cases, someone who doesn’t know this would probably be impressed that the scripts worked, and just not notice the extra tech debt from that unnecessary dependency on pandas.

It sounds like it’s not a big deal, but these things add up and eventually, our AI enhanced code bases will be full of additional dependencies, deprecated APIs, unnecessarily verbose or complicated code, etc.

I feel like this is one aspect that gets overlooked a bit when we talk about productivity gains. We don’t necessarily immediately realize how much of that extra LoC/time goes into outdated code and old fashioned verbosity. But it will eventually come back to bite us.

Auli@lemmy.ca on 01 Oct 19:35 next collapse

Eh, I had it write a program that finds my PC’s IP and sends it to the Unifi gateway to change a rule. Worked fine, but I guess technically it is mostly using the Go libraries written by someone else.

theterrasque@infosec.pub on 01 Oct 21:49 next collapse

I’ve used Claude code to fix some bugs and add some new features to some of my old, small programs and websites. Not things I can’t do myself, but things I can’t be arsed to sit down and actually do.

It’s actually gone really well, with clean and solid code: easily readable, correct, with error handling and even comments explaining things. It even took a GUI stream processing program I had and wrote a server / webapp with the same functionality, and was able to extend it with a few new features I’ve been thinking of adding.

These are not complex things, but a few of them were 20+ files big, and it managed to not only navigate the code, but understand it well enough to add features with changes touching multiple files (model, logic, and view layer, for example, or refactoring a too-big class and updating all references to use the new classes).

So it’s absolutely useful and capable of writing good code.

chicagohuman@lemmy.zip on 01 Oct 21:58 next collapse

This is the truth. It has tremendous value but it isn’t a solution – it’s a tool. And if you don’t know how to code or what good code looks like, then it is a tool you can’t use!

Corridor8031@lemmy.ml on 01 Oct 22:40 collapse

would you deploy this server?

bountygiver@lemmy.ml on 02 Oct 08:00 collapse

For me it typically doesn’t cause syntax errors, but the main thing it fucks up is what you specifically told it to do: the output straight up does not perform the way your specification requires. If it were just some syntax errors, at least the compiler could catch them; this you won’t even know about if you don’t bother testing the output.

OmegaMan@lemmings.world on 01 Oct 17:12 next collapse

Writing apps with AI seems pretty cooked. But I’ve had some great successes using GitHub copilot for some annoying scripting work.

Canconda@lemmy.ca on 01 Oct 17:21 next collapse

AI works well for mindless tasks. Data formatting, rough drafts, etc.

Once a task requires context and abstract thinking, AI can’t handle it.

OmegaMan@lemmings.world on 01 Oct 17:59 next collapse

Eh, I don’t know. As long as you can break it down into smaller sub-tasks, AI can do some complex stuff. Just have to figure out where the line is. I’ve nudged it along into reading multiple LENGTHY API documentation pages and writing some fairly complex scripting logic.

Feyd@programming.dev on 02 Oct 08:48 collapse

Trusting it to have not fucked with your data while formatting it is pretty bold.

NikkiDimes@lemmy.world on 01 Oct 17:29 collapse

I think it’s useful for writing mundane snippets I’ve written a million times or helping me with languages I’m less familiar with, but anything more complex becomes pretty spaghetti pretty quick.

badgermurphy@lemmy.world on 01 Oct 17:24 next collapse

I work adjacent to software developers, and I have been hearing a lot of the same sentiments. What I don’t understand, though, is the magnitude of this bubble then.

Typically, bubbles seem to form around some new market phenomenon or technology that threatens to upset the old paradigm and usher in a new boom. Those market phenomena then eventually take their place in the world based on their real value, which is nowhere near the level of the hype, but still substantial.

In this case, I am struggling to find examples of the real benefits of a lot of these AI assistant technologies. I know that there are a lot of successes in the AI realm, but not a single one I know of involves an LLM.

So, I guess my question is, “What specific LLM tools are generating profits or productivity at a substantial level well exceeding their operating costs?” If there really are none, or if the gains are only incremental, then my question becomes an incredulous, “Is this biggest in history tech bubble really composed entirely of unfounded hype?”

TipsyMcGee@lemmy.dbzer0.com on 01 Oct 17:56 next collapse

When the AI bubble bursts, even janitors and nurses will lose their jobs. Financial institutions will go bust.

SparroHawc@lemmy.zip on 01 Oct 18:25 next collapse

From what I’ve seen and heard, there are a few factors to this.

One is that the tech industry right now is built on venture capital. In order to survive, they need to act like they’re at the forefront of the Next Big Thing in order to keep bringing investment money in.

Another is that LLMs are uniquely suited to extending the honeymoon period.

The initial impression you get from an LLM chatbot is significant. This is a chatbot that actually talks like a person. A VC mogul sitting down to have a conversation with ChatGPT, when it was new, was a mind-blowing experience. This is a computer program that, at first blush, appears to be able to do most things humans can do, as long as those things primarily consist of reading things and typing things out - which a VC, and mid/upper management, does a lot of. This gives the impression that AI is capable of automating a lot of things that previously needed a live, thinking person - which means a lot of savings for companies who can shed expensive knowledge workers.

The problem is that the limits of LLMs are STILL poorly understood by most people. Despite constructing huge data centers and gobbling up vast amounts of electricity, LLMs still are bad at actually being reliable. This makes LLMs worse at practically any knowledge work than the lowest, greenest intern - because at least the intern can be taught to say they don’t know something instead of feeding you BS.

It was also assumed that bigger, hungrier LLMs would provide better results. Although they do, the gains are getting harder and harder to reach. There needs to be an efficiency breakthrough (and a training breakthrough) before the wonderful world of AI can actually come to pass because as it stands, prompts are still getting more expensive to run for higher-quality results. It took a while to make that discovery, so the hype train was able to continue to build steam for the last couple years.

Now, tech companies are doing their level best to hide these shortcomings from their customers (and possibly even themselves). The longer they keep the wool over everyone’s eyes, the more money continues to roll in. So, the bubble keeps building.

badgermurphy@lemmy.world on 01 Oct 21:24 collapse

The upshot of this and a lot of the other replies I see here and elsewhere seems to be that one big difference between this bubble and past ones is that so much of the global economy is now tied to the fate of this one that the entire financial world is colluding to delay the inevitable, given the expected severity of the consequences.

leastaction@lemmy.ca on 01 Oct 19:49 next collapse

AI is a financial scam. Basically companies that are already mature promise great future profits thanks to this new technological miracle, which makes their stock more valuable than it otherwise would be. Cory Doctorow has written eloquently about this.

JcbAzPx@lemmy.world on 01 Oct 22:31 next collapse

This taps into one of the greatest wishes of all corporations: a way to get work done without having to pay people for it.

brunchyvirus@fedia.io on 02 Oct 09:13 collapse

I think right now companies are competing until there are only 1 or 2 that clearly own the majority of the market.

Afterwards they will devolve back into the same thing search engines are now. A cesspool of sponsored ads and links to useless SEO blogs.

They'll just become gate keepers of information again and the only ones that will be heard are the ones who pay a fee or game the system.

Maybe not though, I'm usually pretty cynical when it comes to what the incentives of businesses are.

MIXEDUNIVERS@discuss.tchncs.de on 01 Oct 17:35 next collapse

I use it for programming Arduinos for my smart home. It’s pretty nice but also aggravating.

Blackmist@feddit.uk on 01 Oct 18:25 next collapse

Of course. Shareholders want results, and not just results for nVidia’s bottom line.

ChaoticEntropy@feddit.uk on 01 Oct 21:32 next collapse

Are you trying to tell me that the people wanting to sell me their universal panacea for all human endeavours were… lying…? Say it ain’t so.

SparroHawc@lemmy.zip on 01 Oct 21:38 collapse

I mean, originally they thought they had come upon a magic bullet. Turns out it wasn’t the case, and now they’re going to suffer for it.

Feyd@programming.dev on 02 Oct 08:29 collapse

You’re assuming honesty and they’ve earned the opposite posture.

Aljernon@lemmy.today on 01 Oct 22:04 next collapse

Senior Management in much of Corporate America is like a kind of modern Nobility in which looking and sounding the part is more important than strong competence in the field. It’s why buzzwords catch like wildfire.

Jankatarch@lemmy.world on 02 Oct 14:48 collapse

Lmao, calling it nobility would imply we can’t vote for our senior management, and yet it often ends up being whoever “the king” wants or one of the king’s children.
Wait…

Corridor8031@lemmy.ml on 01 Oct 22:50 next collapse

I think maybe a good comparison is to written papers/assignments. It can generate those just like it can generate code.

But it is not about the words themselves, it is about the content.

MrSulu@lemmy.ml on 01 Oct 22:50 next collapse

Perhaps it should read “All AI is overhyped, overdone, and we should be over it”.

melfie@lemy.lol on 01 Oct 22:57 next collapse

This article sums up a Stanford study of AI and developer productivity. TL;DR - the net productivity boost is a modest 15-20%, or anywhere from negative to 10% in complex, brownfield codebases. This tracks with my own experience as a dev.

linkedin.com/…/does-ai-actually-boost-developer-p…

sadness_nexus@lemmy.ml on 01 Oct 23:03 next collapse

I’m not a programmer in any sense. Recently, I made a project where I used python and raspberry pi and had to train some small models on a KITTI data set. I used AI to write the broad structure of the code, but in the end, it took me a lot of time going through python documentation as well as the documentation of the specific tools/modules I used to actually get the code working. Would an experienced programmer get the same work done in an afternoon? Probably. But the code AI output still had a lot of flaws. Someone who knows more than me would probably input better prompts and better follow up requirements and probably get a better structure from the AI, but I doubt they’ll get a complete code. In the end, even to use AI, you have to know what you’re doing to use AI efficiently and you still have to polish the code into something that actually works.

spicehoarder@lemmy.zip on 02 Oct 02:00 collapse

From my experience, AI just seems to be a lesson in overfitting. You can’t use it to do things nobody has done before. Furthermore, you only really get good responses from prompts related to JavaScript.

ready_for_qa@programming.dev on 02 Oct 00:03 next collapse

These types of articles always fail to mention how well trained the developers were on techniques and tools. In my experience that makes a big difference.

My employer mandates we use AI and provides us with any model, IDE, service we ask for. But where it falls short is providing training or direction on ways to use it. Most developers seem to go for results prompting and get a terrible experience.

I, on the other hand, provide a lot of context through documents and various MCP tooling. I talk about the existing patterns in the codebase and provide sources to other repositories as examples, then we come up with an implementation plan and execute on it with a task log to stay on track. I spend very little time fixing bad code because I spent the setup time nailing down context.

So if a developer is just prompting “Do XYZ”, it’s no wonder they’re spending more time untangling a random mess.

Another aspect is that everyone seems to always be working under the gun and they just don’t have the time to figure out all the best practices and techniques on their own.

I think this should be considered when we hear things like this.

korazail@lemmy.myserv.one on 02 Oct 01:09 collapse

I have 3 questions, and I’m coming from a heavily AI-skeptic position, but am open:

  1. Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc, allows the AI to both write better code and faster than if you just did it yourself? To me, this just seems like you have to re-write your technical documentation in prose each time you want to do something. You are saying this is better than ‘Do XYZ’, but how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it? I don’t currently do development on an existing codebase, but every time I try to get these tools to do something fairly simple from scratch, they just flail. Maybe I’m just not spending the hours to build my AI-parsable functional spec. Every time I’ve tried this, asking something as simple as (and paraphrased for brevity) “write an Asteroids clone using JavaScript and HTML 5 Canvas” results in a full failure, even with multiple retries chasing errors. I wrote something like that a few years ago to learn Javascript and it took me a day-ish to get something that mostly worked.

  2. Speaking of that context. Are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed? I’d imagine most sane companies are doing something to make their models local, but we see regular news articles about how ChatGPT is training on user input and leaking sensitive data if you ask it nicely and I can’t imagine all the pro-AI CEOs are aware of the risks here.

  3. How much pen-testing time are you spending on this code, error handling, edge cases, race conditions, data sanitation? An experienced dev understands these things innately, having fixed these kinds of issues in the past and knows the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places. Your new web front-end? It has a really simple SQL injection attack. Your phone app? You can tell it your username is admin’joe@google.com and it’ll let you order stuff for free since you’re an admin.

I see a place for AI-generated code, for instant functions that do something blending simple and complex. “Hey claude, write a function to take a string and split it at the end of every sentence containing an uppercase A”. I had to write weird functions like that constantly as a sysadmin, and transforming data seems like a thing an AI could help me accelerate. I just don’t see that working on a larger scale, though, or trusting an AI enough to allow it to integrate a new function like that into an existing codebase.
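To be clear about the scale I mean, that throwaway function would look something like this (a rough sketch; the sentence splitting is naive on purpose):

```python
import re

def split_after_sentences_with_A(text: str) -> list[str]:
    """Split text at the end of every sentence containing an uppercase 'A'."""
    # Naive sentence boundary: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], []
    for sentence in sentences:
        current.append(sentence)
        if "A" in sentence:          # cut point: this sentence contains an uppercase A
            chunks.append(" ".join(current))
            current = []
    if current:                      # whatever trails after the last cut point
        chunks.append(" ".join(current))
    return chunks
```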

ready_for_qa@programming.dev on 02 Oct 12:32 collapse

Thank you for reading my comment. I’m on the train headed to work and I’ll try to answer completely. I love talking about this stuff.

Do you believe that providing all that context, describing the existing patterns, creating an implementation plan, etc, allows the AI to both write better code and faster than if you just did it yourself?

For my work, absolutely. My work is a lot of tickets that were setup from multiple stories and multiple epics. It would be like asking me if I am really framing a house faster with a nail gun and compressor. If I were just hanging up a picture or a few pictures in the hallway, it’s probably faster to use a hammer than to set up the compressor and nail gun, plus cleanup.

However, a lot of that documentation already exists by the time it gets to me. All of the Design Documents and Product Requirement Documents have already been formed, discussed, and approved by our architecture team and team leads. Imagine if you already had this documentation for the asteroid game; how much better do you think your LLM would do? Maybe this is the benefit of using LLMs for development at an established company. Btw, a lot of those documents were also created with the assistance of AI by the Product Team, Architects, and Principal/Staff/Leads anyway.

how much twiddling of your existing codebase do you need to do before an AI can understand the business context of it?

With the help of our existing documents and codebase(s) I feel I don’t have any issues with the model knowing what we’re doing. I do have to set up my own context for how I want it to be done. To me this is like explaining to a Junior Engineer what I need them to help me with. If you’re familiar with “Know when to Direct, when to Delegate, or when to Develop”, I would say it lands in between Direct and Delegate. I have markdown files with my rules and guidelines and provide that as context. I use Augment Code, which is pretty good with codebase context.

write an Asteroids clone using JavaScript and HTML 5 Canvas

I would try “Let’s plan out the steps needed to write an Asteroids game using JavaScript and HTML 5. Identify and explain each step of the development plan. The game must build with no errors, be playable, and pass all tests. Do not write code at this time until our plan is approved.” Then once it comes back with the initial steps, I would guide it further if needed. Finally I would approve the plan and tell it to execute while tracking its steps (Augment Code uses a task log).

Are you running your models locally, or do you have some cloud service? If you give your entire codebase to a 3rd party as context, how much of your company’s secret sauce have you disclosed?

We are required to use the frontier models that my employer has contracts with and are forbidden from using local models. In our enterprise contracts we have negotiated for no training on our data. I imagine we pay for that. I’m not involved in that level of interaction on the accounts.

How much pen-testing time are you spending on this code, error handling, edge cases, race conditions, data sanitation? An experienced dev understands these things innately, having fixed these kinds of issues in the past and knows the anti-patterns and how to avoid them. In all seriousness, I think this is going to be the thing that actually kills AI vibe coding, but it won’t be fast enough. There will be tons of new exploits in what used to be solidly safe places.

We have other teams that handle a lot of these tasks. These teams are also using AI tools to get the job done. In addition, we have static testing tools on our repo like CodeRabbit and another one I can’t remember the name of that looks specifically for security concerns. It will comment on the PR directly and our merge would be blocked until handled. Code coverage for testing is at 85% or it blocks the merge and we have a full QA department of Analysts and SDETs to QA. In addition to that we still have human approvals required (2 devs + Sr+). All of these people involved are still using AI tools to help them in each step.

I hope that answers your questions and gives you some insight into how I’ve found success in my experience with it. I will say that on my personal projects I don’t go this far with process and I don’t experience the same AI output that I do at work.

korazail@lemmy.myserv.one on 02 Oct 14:32 collapse

Thanks for your reply, and I can still see how it might work.

I’m curious if you have any resources that do some end-to-end examples. This is where I struggle. If I have an atomic piece of code I need and I can maybe get it started with a LLM and finish it by hand, but anything larger seems to just always fail. So far the best video I found to try a start-to-finish demo was this: www.youtube.com/watch?v=8AWEPx5cHWQ

He spends plenty of time describing the tools and how to use them, but when we get to the actual work, we spend 20 minutes telling the LLM that it’s doing stuff wrong. There’s eventually a prototype, but to get there he had to alternate between ‘I still can’t jump’ and ‘here’s the new error.’ He eventually modified code himself, so even getting a ‘mario clone’ running requires an actual developer and the final result was underwhelming at best.

For me, a ‘game’ is this tiny product that could be a viable unit. It doesn’t need to talk to other services, it just needs to react to user input. I want to see a speed-run of someone using LLMs to make a game that is playable. It doesn’t need to be “fun”, but the video above only got to the ‘player can jump and gets game over if hitting enemy’ stage. How much extra effort would it take to make the background not flat blue? Is there a win condition? How to refactor this so that the level is not hard-coded? Multiple enemy types? Shoot a fireball that bounces? Power Ups? And does doing any of those break jump functionality again? How much time do I have to spend telling the LLM that the fireball still goes through the floor and doesn’t kill an enemy when it hits them?

I could imagine that if the LLM was handed a well described design document and technical spec that it could do better, but I have yet to see that demonstrated. Given what it produces for people publishing tutorials online, I would never let it handle anything business critical.

The video is an hour long, and spends about 20 minutes in the middle actually working on the project. I probably couldn’t do better, but I’ve mostly forgotten my javascript and HTML canvas. If kaboom.js was my focus, though, I imagine I could knock out what he did in well under 20 minutes and have a better architected design that handled the above questions.

I’ve, luckily, not yet been mandated to embed AI into my pseudo-developer role, but they are asking.

altphoto@lemmy.today on 02 Oct 01:09 next collapse

It’s great for stupid boobs like me, but only to get you going. It regurgitates old code; it cannot come up with new stuff. Lately there have been fewer Python errors, but again the stuff you can do is limited. At least for the free stuff that you can get without signing up.

Smokeless7048@lemmy.world on 02 Oct 03:33 next collapse

Yeah, I use it for Home Assistant. It’s amazingly powerful… and so incredibly dumb.

It will take my if statements and shrink them to 1/3 the length, while being twice as robust… while missing that one of the arguments is entirely in the wrong place.

theterrasque@infosec.pub on 02 Oct 08:18 collapse

It regurgitates old code, it cannot come up with new stuff.

The trick is, most of what you write is basically old code in new wrapping. In most projects, I’d say the new and novel part is maybe 10% of the code. The rest is things like setting up db models, connecting them to base logic, set up views, api endpoints, decoding the message on the ui part, displaying it to user, handling input back, threading things so UI doesn’t hang, error handling, input data verification, basic unit tests, set up settings, support reading them from a file or env vars, making UI look not horrible, add translatable text, and so on and on and on. All that has been written in some variation a million times before. All can be written (and verified) by a half-asleep competent coder.

The actual new interesting part is gonna be a small small percentage of the total code.

altphoto@lemmy.today on 02 Oct 11:23 collapse

I totally agree with this. However, you can’t get there without coding experience and knowledge of the problem, as well as education in computer science or experience in the field. I’m a generalist, and I’m loving what I can do at home. But I still get the runaround using AI. I have to read and understand the code to nudge the AI in the right direction, or I’ll end up going in circles.

Lettuceeatlettuce@lemmy.ml on 02 Oct 03:53 next collapse

You mean relying blindly on a statistical prediction engine to attempt to produce sophisticated software without any understanding of the underlying principles or concepts doesn’t magically replace years of actual study and real-world experience?

But trust me, bro, the singularity is imminent, LLMs are the future of human evolution, true AGI is nigh!

I can’t wait for this idiotic “AI” bubble to burst.

Tollana1234567@lemmy.today on 02 Oct 04:18 next collapse

So is the profit it was foretold to generate; it actually costs more money than it’s generating.

chaosCruiser@futurology.today on 02 Oct 11:48 next collapse

About that “net slowdown”: I think it’s true, but only in specific cases. If the user already knows how to write code well, an LLM might be only marginally useful, or even useless.

However, there are ways to make it useful, but it requires specific circumstances. For example, if you can’t be bothered to write a simple loop, you can use an LLM to do it. Give the boring routine to the LLM, and you can focus on naming the variables in a fitting way or adjusting the finer details to your liking.

Can’t be bothered to look up the exact syntax for a function you use only twice a year? Let an LLM handle that, and tweak the details. Now you didn’t spend 15 minutes reading Stack Overflow posts that don’t answer the exact question you had in mind. Instead, you spent 5 minutes on the whole thing, and that includes the tweaking and troubleshooting parts.
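A made-up but typical example of that “twice a year” syntax: strptime format codes, which an LLM will hand you instantly and which you then sanity-check yourself:

```python
from datetime import datetime

# The detail I never remember cold: which format code is which.
stamp = datetime.strptime("2025-10-02 14:32", "%Y-%m-%d %H:%M")
print(stamp.isoformat())  # 2025-10-02T14:32:00
```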

If you have zero programming experience, you can use an LLM to write some code for you, but prepare to spend the whole day troubleshooting something that is essentially a black box to you. Alternatively, you could ask a human to write the same thing in 5-15 minutes depending on the method they choose.

BilboBargains@lemmy.world on 02 Oct 12:04 collapse

This is a sane way to use an LLM. Also, pick your poison: some bots are better than others for a specific task. It’s kinda fascinating to see how other people solve coding problems, and that is essentially on tap with a bot; it will churn out as many examples as you want. It’s a really useful tool for learning the syntax and libraries of unfamiliar languages.

On one extreme side of LLMs there is this insane hype, and at the other extreme a great pessimism, but in the middle is a nice labour-saving educational tool.

Evotech@lemmy.world on 02 Oct 13:42 collapse

For most large projects, writing the code is the easy part anyway.

Jankatarch@lemmy.world on 02 Oct 14:42 collapse

Writing new code is easier than editing someone else’s code, but editing a portion is still better than writing the entire program again from start to end.

Then there are LLMs, which force you to edit the entire thing from start to end.