AI chatbots were tasked with running a tech company. They built software in under seven minutes — for less than $1. (www.businessinsider.com)
from shish_mish@lemmy.world to technology@lemmy.world on 11 Sep 2023 14:49
https://lemmy.world/post/4864491

#technology


autotldr@lemmings.world on 11 Sep 2023 14:50 next collapse

This is the best summary I could come up with:


AI chatbots like OpenAI’s ChatGPT can operate a software company in a quick, cost-effective manner with minimal human intervention, a new study has found.

Based on the waterfall model — a sequential approach to creating software — the company was broken down into four different stages, in chronological order: designing, coding, testing, and documenting.

After assigning ChatDev 70 different tasks, the study found that the AI-powered company was able to complete the full software development process “in under seven minutes at a cost of less than one dollar,” on average — all while identifying and troubleshooting “potential vulnerabilities” through its “memory” and “self-reflection” capabilities.

“Our experimental results demonstrate the efficiency and cost-effectiveness of the automated software development process driven by CHATDEV,” the researchers wrote in the paper.

The study’s findings highlight one of the many ways powerful generative AI technologies like ChatGPT can perform specific job functions.

Nevertheless, the study isn’t perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software.


The original article contains 639 words, the summary contains 172 words. Saved 73%. I’m a bot and I’m open source!

Pistcow@lemm.ee on 11 Sep 2023 15:02 next collapse

But did it work?

KoboldCoterie@pawb.social on 11 Sep 2023 15:07 next collapse

The study said 86.66% of the generated software systems were “executed flawlessly.”

But…

Nevertheless, the study isn’t perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software. Still, the researchers said the findings “may potentially help junior programmers or engineers in the real world” down the line.

scarabic@lemmy.world on 11 Sep 2023 15:17 next collapse

So… they failed 13.34% of their own unit tests?
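
(For scale: 13.34% of the 70 assigned tasks works out to roughly nine programs that didn’t run, assuming the percentage is over all 70 tasks.)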

hayes_@sh.itjust.works on 11 Sep 2023 15:21 collapse

That’s a B+! Fire all our engineers immediately.

  • some tech CEO, somewhere
LazaroFilm@lemmy.world on 11 Sep 2023 15:38 collapse

Better than Cyberpunk 2077 at release.

[deleted] on 11 Sep 2023 15:21 next collapse

.

KoboldCoterie@pawb.social on 11 Sep 2023 15:30 collapse

And when the reviews are terrible and end users start reporting unreal quantities of bugs, they’ll fire the junior devs. They should have fixed those!

radix@lemmy.world on 11 Sep 2023 15:47 collapse

🎵🎵 99 little bugs in the code,
99 bugs in the code,
Fix one bug, compile it again,
101 little bugs in the code.

101 little bugs in the code,
101 bugs in the code,
Fix one bug, compile it again,
103 little bugs in the code. 🎵🎵

ArbiterXero@lemmy.world on 11 Sep 2023 15:09 next collapse

As someone that uses ChatGPT daily for boilerplate code because it’s super helpful…

I call complete bullshite

The program here will be “hello world” or something like that.

ipha@lemm.ee on 11 Sep 2023 15:19 next collapse

“hello world” as a service?

SpaceNoodle@lemmy.world on 11 Sep 2023 16:26 collapse
kitonthenet@kbin.social on 11 Sep 2023 15:28 next collapse

I can totally see the use case for boilerplate, but I’m also very very rarely writing new classes from scratch or whatever.

As always, proof of concept or gtfo

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 15:28 next collapse

It's great for things like "How do I write this kind of loop in this language", but when I ask it for something more complex, like a class or a big-ish function, it hallucinates. But it makes for a very fast way to get up to speed in a new language.

LazaroFilm@lemmy.world on 11 Sep 2023 15:37 next collapse

Yea I ask it to show me examples of how to solve specific tasks. Not a whole app.

SpaceNoodle@lemmy.world on 11 Sep 2023 16:27 collapse

So just a little more time-consuming than reading the online documentation.

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 16:46 collapse

It's a lot less time-consuming, in my opinion, because you can just ask it a question rather than having to read and interpret things. Every programming tutorial in every language is going to waste my time explaining how loops and conditionals work, when all I want is how this language does them.

Vlyn@lemmy.zip on 11 Sep 2023 18:06 collapse

Seriously?

If I google for example:

how to do loops in c#

The first result is www.w3schools.com/cs/cs_for_loop.php
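
That page opens with the canonical example, something like:

for (int i = 0; i < 5; i++)
{
    Console.WriteLine(i);
}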

In the time it took me to get to that, ChatGPT would still be writing its reply.

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 18:41 collapse

Right, but you can't give it the variable names you're using and have it fill them in, and if you want to do something inside that loop with

I can ask ChatGPT "Write me a loop in C# that will add the variable value_increase to the variable current_value and exit when current_value is equal to or greater than the variable limit_value, with all the variables being floats"
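
For reference, the loop that prompt describes is only a few lines; a sketch of the kind of answer you'd expect back (starting values made up for illustration):

float current_value = 0f;     // made-up starting values, for illustration
float value_increase = 0.5f;
float limit_value = 10f;

while (current_value < limit_value)
{
    current_value += value_increase;  // exits once current_value >= limit_value
}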

You won't find that answer immediately on the Internet, and you're more likely to make errors synthesizing the new syntax.

But you do you, I'll keep using ChatGPT and looking like a miracle worker.

Vlyn@lemmy.zip on 11 Sep 2023 18:48 next collapse

If writing simple loops with ChatGPT makes you a miracle worker then you might have other problems than AI.

And even simple things break down when you ask it about using library functions (it likes to hallucinate heavily there).

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 18:52 collapse

It's not that writing loops does it, it's that I can ask ChatGPT to hand me pre-assembled parts that I can snap together instead of typing them out with my squishy human fingers. And I can do it for pretty much any language without too many syntax errors.

Vlyn@lemmy.zip on 11 Sep 2023 19:06 collapse

I’m a senior software developer (currently .NET backend with DevOps). Writing code is probably less than 10% of my work day, and in that 10%, Visual Studio autocomplete does most of the typing. It’s frequently wrong, but it’s good enough plenty of the time.

Actually working on software consists of writing specifications, security concerns, architecture, talking management out of dumb decisions, having meetings with stakeholders or other companies, working on automatic deployments, writing unit and integration tests, refactoring, performance optimizations, database migrations, bugfixing, …

Greenfield work (writing new code from scratch) is rare, and that’s mainly what AI can do (80% correct, maybe). Most real programming work happens on existing code.

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 19:17 collapse

I'm not saying AI will write entire applications, but it is really useful at writing small bits of code for a human being to assemble which can greatly improve productivity.

Though if we could get it to handle stakeholder meetings I'll never use it for programming again.

Kerfuffle@sh.itjust.works on 11 Sep 2023 20:27 collapse

Right, but you can’t give it the variable names you’re using and have it fill them in, and if you want to do something inside that loop with

Why are you actively trying to avoid learning how to write the loop? Are you planning to have ChatGPT fill in your loop templates for the rest of your life?

But you do you, I’ll keep using ChatGPT and looking like a miracle worker.

It’s going to be slower overall than just using the reference and learning how to do it. I really, really am skeptical that a developer at the level where they need that feature is going to seem like a miracle worker to anyone other than people who are just impressed when you can do anything with a computer.

Semi-Hemi-Demigod@kbin.social on 11 Sep 2023 20:47 collapse

Why are you actively trying to avoid learning how to write the loop? Are you planning to have ChatGPT fill in your loop templates for the rest of your life?

First, how is this different from having your IDE fill in your loop templates?

Second, no, of course I learn how to do it and then copy/paste from my existing code like a normal person.

Third, this is much more customizable. The example I gave is pretty simple, but you can explain algorithms to ChatGPT and have it figure it out.

Finally, I'm usually doing this for a customer in a language I'll never use again. Last week it was LabVIEW. My role has me writing proofs-of-concept for customers frequently, so I'm not going to learn something I'll never use again.

It’s going to be slower overall than just using the reference and learning how to do it.

Not when you're not familiar with the syntax and don't have an IDE set up for it.

other than people who are just impressed when you can do anything with a computer.

This happens in my job a lot more than I'm comfortable with.

Kerfuffle@sh.itjust.works on 12 Sep 2023 07:03 collapse

First, how is this different from having your IDE fill in your loop templates?

I don’t do that actually, but I think there are some differences.

  1. If there’s a loop template in your IDE, you know it’s going to work. With LLMs you have to double-check stuff (or just have it be wrong some of the time).
  2. You don’t have to type in a bunch of instructions to use a loop template. You also don’t really have to wait for the filled-in template to get generated.
  3. People don’t usually use that because they don’t know how to write the loop themselves; it’s a convenience feature.

That said:

I’m usually doing this for a customer in a language I’ll never use again.

Maybe you’re the one in a million exception where this approach is a benefit. Most of the time when you talk to people on the internet, they’re going to assume you’re a reasonably typical case and not the extremely rare exception.

Ertebolle@kbin.social on 11 Sep 2023 15:33 next collapse

OTOH, if you take that hello world program and ask it to compose a themed cocktail menu around it, it'll cheerfully do that for you.

LazaroFilm@lemmy.world on 11 Sep 2023 15:36 collapse

Absolutely I can create a code for your app.

void myApp(void) {
  // add the code for your app here
  return true;
}

You may need to change the code above to fit your needs. Make sure you replace the comment with the proper code for your app to work.

whileloop@lemmy.world on 11 Sep 2023 16:15 collapse

Couldn’t even write a void method right, return true!

LazaroFilm@lemmy.world on 11 Sep 2023 17:03 collapse

LMAO. At least it didn’t sudo void… (:

scarabic@lemmy.world on 11 Sep 2023 15:16 collapse

And how long did it take to compose the “assignments”? Humans can usually work with less precise instructions than machines, and can improvise or solve problems along the way, or at least sense when a problem should be flagged for escalation and review.

BombOmOm@lemmy.world on 11 Sep 2023 15:08 next collapse

The difficult part of software development has always been the continuing support. Did the chatbot set up a versioning system, a build system, a backup system, a ticketing system, unit tests, and help docs for users? Did it get conflicting requests from two different customers and intelligently resolve them? Was it given a vague problem description, so that it had to get on a call with the customer to figure out what the customer actually wanted before devising/implementing a solution?

This is the expensive part of software development. Hiring an outsourced, low-tier programmer for almost nothing has always been possible; the low-tier programmer being slightly cheaper doesn’t change the game in any meaningful way.

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 15:51 next collapse

While I do agree that management is genuinely important in software dev:

If you can rewrite the codebase quickly enough, versioning matters a lot less. It’s the idea of “is it faster to just rewrite this function/package than to debug it?” but at a much larger scale. And while I would be concerned about regressions from full rewrites of the code… have you ever used software? Regressions happen near-constantly even with proper version control and testing…

As for testing and documentation: This is actually what AI-enhanced tools are good for today. These are the simple tasks you give to junior staff.

Conflicting requests and iterating on descriptions: Have you ever futzed around with ChatGPT? That is what it lives off of. Ask a question, then ask a follow-up question, and so forth.

I am still skeptical of having no humans in the loop. But all of this is very plausible even with today’s technology and training sets.


Just to add a bit more to that: I don’t think having an AI-operated company is a good idea. Even ignoring the legal aspects of it, there is a lot of value in having a human who can make seemingly irrational decisions, because one customer will pay more in the long run, and so forth.

But I can definitely see entire departments being a node in a rack. Customers talk to humans (or a different LLM) which then talk to the “Network Stack” node and the “UI/UX” node and so forth.

Vlyn@lemmy.zip on 11 Sep 2023 18:02 collapse

If you just let it do a full rewrite again and again, what protects against breaking changes in the API? Software doesn’t exist in a vacuum, there might be other businesses or people using a certain API and relying on it. A breaking change could be as simple as the same endpoint now being named slightly differently.

So if you now start to mark every API method as “please, no breaking changes for this”, at what point do you need a full software developer again to take care of the AI?

I’ve also never seen AI modify an existing code base; it’s always new code getting spit out (80% correct or so, and it likes to hallucinate functions that don’t even exist). Sure, for run-of-the-mill templates you can use it, but even a developer who told me on here that they rely heavily on ChatGPT said they need to verify all the code it spits out, because sometimes it’s garbage.

In the end it’s a damn language model that uses probability on what the next word should be. It’s fantastic for what it does, but it has no consistent internal logic and the way it works it never will.

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 18:08 collapse

You are literally describing constraints. They can be applied to an LLM the same way they can be applied to a dev team. And if you have never had to report an API change that breaks functionality… I wish I was you.

And if your full time software engineers are just running a unit test suite all day? … are you hiring?

As for modifications: Again, have you ever used an LLM? Have a conversation with ChatGPT. It will iterate on its responses. That is iterating on code.

In the end it’s a damn language model that uses probability on what the next word should be. It’s fantastic for what it does, but it has no consistent internal logic and the way it works it never will.

And that is demonstrably false and mostly just highlights that you don’t know what you are talking about. Or what language is, for that matter.

Vlyn@lemmy.zip on 11 Sep 2023 18:16 collapse

Mate, I’ve used ChatGPT before; it straight up hallucinates functions if you want anything more complex than a basic template or a simple program. And as things are in programming, if even one tiny detail is wrong, things straight up don’t work. Also, have fun putting ChatGPT answers into a real program you might have to compile: are you going to copy code into hundreds of files?

My example was public APIs, you might have an endpoint /v2/device that was generated the first time around. Now external customers/businesses built their software to access this endpoint. Next run around the AI generates /v2/appliance instead, everything breaks (while the software itself and unit tests still seem to work for the AI, it just changed a name).
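
(The usual human guard-rail here is a contract test that pins the public route name, so a rewrite can’t silently rename it. A minimal sketch in xUnit, with hypothetical names; the point is that someone still has to know to write and maintain this pin.)

using Xunit;

public static class ApiRoutes
{
    // Hypothetical constants class holding the public route names.
    public const string Device = "/v2/device";
}

public class PublicApiContractTests
{
    [Fact]
    public void DeviceEndpointPathDoesNotChange()
    {
        // Fails loudly if a regenerated codebase renames /v2/device to /v2/appliance.
        Assert.Equal("/v2/device", ApiRoutes.Device);
    }
}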

If you don’t want that change you now have to tell the AI what to name things (or what to keep consistent), who is going to do that? The CEO? The intern? Who writes the perfect specification?

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 18:19 collapse

Yes, ChatGPT is not perfect, because it is a general-purpose LLM. Stuff like GitHub Copilot and other software-specific approaches are a LOT better at avoiding all the noise from bad answers on Stack Overflow and proposals, in large part because they have more focused training data.

But it can still do a remarkably good job so long as you have a human looking at it after the fact. Which… is how I would describe most software engineers I have ever worked with. Even the SSEs need someone to review their code. Which… is what is being described here. Combine that with a GitLab runner and you’ve got yourself a stew.

As for APIs and the like: Again, it feels like nobody here has ever actually worked on public software, and thinks regressions don’t exist. But this is literally constraints, and they would be put in the requirements document that you give either the dev team or the LLM.

As for who is going to make that document: The same people who already do? Management.

Vlyn@lemmy.zip on 11 Sep 2023 18:28 collapse

Management writing sound technical specifications? That sounds to me like you’ve never actually worked in a real software company.

You just said what the main problem is: ChatGPT is not perfect. Code that isn’t perfect (compiles + has consistent logic) is worthless. If you need a developer to look over it you’ve already lost and it would be faster to have that developer write the code themselves.

Have you ever gotten a pull request with 10k lines of code? The AI could spit out so much code in an instant, no developer would be able to debug this mess or do a code review. They’ll just click “Approve” and throw it on the giant garbage heap whatever the AI decided to spit out.

If there’s a bug down the line (if you even get the whole thing to run), good luck finding it if no one in your developer team even wrote the code in the first place.

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 19:13 collapse

Management and sound technical specifications, that sounds to me like you’ve never actually worked in a real software company.

Worked at quite a few. Once you get out of college and start engaging with companies beyond “Ugh, how dare they want me to waste my precious time by talking to people”, you start to learn the value of a strong management team.

And, more importantly, where those jira tickets come from.

A bog standard development flow is “all pull requests are linked to a documented issue/ticket. All pull requests require tests to pass, code coverage to not decrease, and approval by a code owner”

How does that work in reality?

Issues/tickets (just going to say issues from here on out) are created by a combination of customer feedback, identified issues by the development team, and directives from on high (which is generally related to the overall roadmap). One or more developers work on a merge request, the person who best understands the appropriate code looks it over, it is tested, and it is merged in. After enough of those cycles happen, a release is prepared and a manager signs off on it.

How does that map to an “AI” based workflow?

Issues are created by a combination of customer feedback, identified issues by the development team, and directives from on high (which is generally related to the overall roadmap); LLMs can provide feedback and uncertainty measurements once you get past Google Bard, and regression testing and nightly performance testing can highlight deficiencies. The issue is put into a template that includes all existing constraints, and the LLM generates a solution. Someone who understands the code checks to make sure it looks sane, it is tested, and it is merged in. After enough of those cycles happen, a release is prepared and a manager signs off on it.

And then it becomes a question of what level you start requiring humans. Because when I do a code review prior to a Release? I am relying VERY heavily on my team to have been doing their due diligence. I skim through the MRs and look for a few hot spots but it is mostly “Well, Fred and Nancy said this was good and it passes all the tests so…”

You just said what the main problem is: ChatGPT is not perfect. Code that isn’t perfect (compiles + has consistent logic) is worthless. If you need a developer to look over it you’ve already lost and it would be faster to have that developer write the code themselves.

I VEHEMENTLY disagree with this. If you don’t have developers looking over your code then you are not a software engineer. And if it takes them the same amount of time to review code as it does to write it? You aren’t working on interesting problems and are wasting vast amounts of money.

I can farm out a general task of “improve our code coverage” to an intern. They can spend a few days (or even weeks) doing that, and I can review their MRs in a few minutes. If something looks weird, I leave a comment and wait for them to get back to me. All the time I am working on much more interesting problems… or doing the same for my SSEs.

Once you stop worshiping the ground that “developers” walk on (which mostly comes from time and experience) you start to realize how many people spend most of their lives just filling out tickets with no understanding of “Why”. And how much work your management is putting in so that you don’t throw a temper tantrum or break the code base. Which… maps pretty well to an LLM.


Just to make it clear. I am not saying that all developers should strive to be managers. I actively disagree with that.

But if you aren’t interested in how management works? Whether because you want a heads up when crunch is coming, or want to understand the big picture, or just want to figure out when it is time to get going? Then you aren’t growing as a developer and are not an engineer. You are a monkey with a typewriter in the basement.

Vlyn@lemmy.zip on 11 Sep 2023 19:23 collapse

You misunderstood, I never said management is worthless. The product managers know what customers want. The product owners keep 8 out of 10 dumb ideas away from the development team. And management again leans on the development team to find out what is actually technically possible and in what time frame.

If management just threw every customer wish into a magic black box to get code out, even if that code was perfect, you wouldn’t have a product. You’d have a pile of steaming crap.

I’ve done plenty of code reviews, they only work if they are small human readable increments. Like they say: A code review of 100 lines might take an hour. A code review of 10000 lines takes thirty minutes.

AI would spit out so much code with missing context for the developer, it would be impossible to properly review.

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 19:28 collapse

Again: No

If it takes you the same amount of time to review 10k lines as to write 10k lines? Either you are bad at your job or you aren’t working on a meaningful problem. One of the most valuable things an engineer can learn is to ask questions. Is this MR hard to parse? Leave a comment and make the developer improve the documentation or restructure a function or two. And you can do that with LLMs.

And, again, there is no difference between assigning “Implement Feature X” ticket to Stan versus StanAI. If StanAI is writing 500x the amount of code that Stan would? StanAI sucks and needs to be retrained.

And, as it stands? Using tools like CoPilot or even ChatGPT, “StanAI” tends to write more concise AND more readable code. In large part because its training data is weighted by the code that has already gone through code review, was accepted, and may even be part of the production stack on half the planet.

Vlyn@lemmy.zip on 11 Sep 2023 19:44 collapse

You really don’t get the issue. Give real developers pull requests with 10, 100, 1000, and 10000 lines of changed code. I promise you, 100%, that the quality of review on the latter two pull requests will be abysmal. No matter how good you are as a developer (you can be the best of the best), after a few hundred lines of code you’re unfamiliar with, you’ll overlook obvious issues.

And let’s be honest, most developers will try to quickly get it done, read over it, hit the approve button and go back to their own work. This is how it works in the real world.

A small pull request with 10 or at most 100 lines will get a lot more scrutiny where developers actually have the mental capacity to think and reason about the code and its context.

If you let AI write a full system, or even a full module at once, spitting that code out, you’ll get large pull requests. Too large to do a meaningful review. It’s like if I threw you a pull request right now for a software you’re not familiar with and it’s 2000 lines of code. How well do you think you’ll do?

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 19:52 collapse

And you know what you say if someone is submitting 10k SLOC in a pull request?

“Hey Fred, document the hell out of this and split it into multiple MRs”.

And if there is no way to accomplish that ticket without it being a 10k SLOC MR? Then it was a bad ticket and whoever made it failed.

Nothing you have described doesn’t apply to humans too. If anything, StanAI is less likely to throw a temper tantrum if I leave a comment on his MR.

A small pull request with 10 or at most 100 lines will get a lot more scrutiny where developers actually have the mental capacity to think and reason about the code and its context.

Hmm. If only there was a way to conserve that “mental capacity” by offloading the more banal tasks. Hmmm

It’s like if I threw you a pull request right now for a software you’re not familiar with and it’s 2000 lines of code. How well do you think you’ll do?

Horribly. I would also make it a point to never use any software you are responsible for again, if you think asking someone who doesn’t understand a code base to review the MR is reasonable.

Either you have no idea what you are talking about or you are a genuinely horrible manager who has been entirely dependent on having a few “rock star developers” to do your job for you. So… yeah.

Vlyn@lemmy.zip on 11 Sep 2023 19:59 collapse

You can’t have your cake and eat it too. The entire point of AI would be to off-load the development work. You write a specification, throw it into the magic AI box, then get a working code base out.

Why the hell would you invest ten times the amount of organizational work to break every feature down into small, human-sized parts? The AI doesn’t need bite-sized tickets like humans do; you can throw a complex 100-page specification at it and get out working code an hour later. But you’ll get out 100k lines of code at once in that case.

You’re treating the AI like a junior developer, give it tiny tickets it can work on, then let a human review the work. The human will do badly because they have no context (they’d have to read the entire specification first, then read the pull request, then try to reason about code that a machine wrote). Reviewing code is always more difficult than writing it, the writing part is easy.

Puzzle_Sluts_4Ever@lemmy.world on 11 Sep 2023 20:04 collapse

Again. If you are not already breaking down every feature into human sized parts, you are a horrible manager. And you seem hellbent on using a specific use case that you would never use in reality because… Frankenstein Complex?

And you continue to assume that the only people who can review a pull request are outside hires with no knowledge of the codebase or problem at all. Which… again, please never work on anything useful.

I’ll say this: If you actively sabotage your employees, they will fail. It doesn’t matter if that is Stan on the third floor or StanAI in the server room.

akrot@lemmy.world on 11 Sep 2023 15:56 next collapse

Absolutely true, but many are heading in the direction of implementing those solutions with AIs.

Knusper@feddit.de on 11 Sep 2023 16:49 next collapse

Yeah, I’m already quite content if I know upfront that our customer’s goal does not violate the laws of physics.

Obviously, there are also devs who code more run-of-the-mill stuff, like yet another business webpage, but even those are still coded anew (and not just copy-pasted), because customers have different and complex requirements. So even those are quite a bit more complex than designing just any Gomoku game.

NoRodent@lemmy.world on 11 Sep 2023 19:23 next collapse

I’m already quite content if I know upfront that our customer’s goal does not violate the laws of physics.

Haha, this is so true, and I don’t even work in IT. For me there are bonus points if the customer’s initial idea is solvable within Euclidean geometry.

Corkyskog@sh.itjust.works on 11 Sep 2023 19:53 collapse

Now I am curious: what is the most outlandish request or goal so far?

Knusper@feddit.de on 12 Sep 2023 06:17 collapse

Well, as per above, these are extremely complex requirements, so most don’t make for a good story.

One of the simpler examples is that a customer wanted a solution for connecting special hardware devices across the globe, which are normally only connected directly.

Then, when we talked to experts for those devices, we learned that for security reasons, these devices expect requests to complete within a certain timeframe. No one could tell us what these timeframes usually are, but it certainly sounded like the universe’s speed limit, a.k.a. the speed of light, could get in our way (light takes roughly 66 ms to go halfway around the globe).

Eventually, we learned that the customer was actually aware of this problem and was fine with a solution, even if it only worked across short distances. But yeah, we didn’t know that upfront…

doublejay1999@lemmy.world on 11 Sep 2023 18:54 collapse

Which is why plenty of companies merely pay lip service to it, or don’t do it at all and outsource it to ‘communities’

igorlogius@lemmy.world on 11 Sep 2023 15:11 next collapse

Do management next and let’s see who’s gonna be replaced first.

mrginger@lemmy.world on 11 Sep 2023 16:06 next collapse

This is who will get replaced first, and they don’t want to see it. They’re the most important, valuable part of the company in their own minds, yet management was the one thing the AI got right. It still needed the creative mind of a human programmer to do the code properly, or to think outside the box.

thanks_shakey_snake@lemmy.ca on 11 Sep 2023 17:03 collapse

They did do management-- They modeled the whole company as individual “staff” communicating with each other: CEO-bot communicates a product direction to the CTO-bot who communicates technical requirements to the developer-bot who asks for a “beautiful user interface” (lol) from the “art designer” (lol).

It’s all super rudimentary and goofy, but management was definitely part of the experiment.

igorlogius@lemmy.world on 11 Sep 2023 17:19 collapse

Sorry, my mistake, I kind of misunderstood… but now I wonder which part of the “company” was easiest to replace, and where the most and least failures/processing occurred.

thanks_shakey_snake@lemmy.ca on 11 Sep 2023 18:28 collapse

It was testing that the code worked, of course :) That was the only place that had human intervention, other than a) providing the initial prompt, and b) providing icons and stuff for the GUI, instead of using generated ones. That was the “get out of jail free” card:

In cases where an interpreter struggles with identifying fine-grained logical issues, the involvement of a human client in software testing becomes optional. CHATDEV enables the human client to provide feedback and suggestions in natural language, similar to a reviewer or tester, using black-box testing or other strategies.

[deleted] on 11 Sep 2023 15:21 next collapse

.

Nougat@kbin.social on 11 Sep 2023 15:28 next collapse

I've tried to have ChatGPT help me out with some PowerShell, and it consistently wanted me to use cmdlets which do not exist for on-premises Exchange. I told it as much, it apologized, and then wanted me to use cmdlets that don't exist at all.

Large Language Models are not Artificial Intelligence.

dojan@lemmy.world on 11 Sep 2023 15:42 next collapse

I had a weird XAML error I didn’t quite get, and the LLM gave me BS solutions before giving me back my original code.

[deleted] on 11 Sep 2023 15:56 collapse

.

[deleted] on 11 Sep 2023 15:55 next collapse

.

amanneedsamaid@sopuli.xyz on 11 Sep 2023 16:34 next collapse

It’s glorified autocorrect trying to figure out how words string together coherently.

Lmaydev@programming.dev on 11 Sep 2023 17:06 collapse

They are, by definition, artificial intelligence.

thorbot@lemmy.world on 11 Sep 2023 16:14 next collapse

This also completely glosses over the fact that the AI capable of writing this had huge R&D costs to get to that point, and also has ongoing costs associated with running it. This whole article is a fucking joke, probably written by AI.

aard@kyu.de on 11 Sep 2023 16:37 next collapse

You meant to say “a competent human”, which a lot of programmers are not.

While I’d expect this to be of rather low quality, I’d bet money on having seen worse projects done by actual humans in the last 25 years.

lilShalom@lemmy.basedcount.com on 11 Sep 2023 16:52 collapse

I’ve had Google Bard supply me code to use with a Google API URL that doesn’t exist.

breadsmasher@lemmy.world on 11 Sep 2023 15:21 next collapse

It cost less than a dollar to run all those chatbots?

Doubt

dustyData@lemmy.world on 12 Sep 2023 19:29 collapse

Please ignore the hundreds of thousands of dollars, and the corresponding electricity, required to run the servers and infrastructure needed to train and use these models, please. Or the master cracks the whip again. Please, just say you’ll invest in our startup, please!

scarabic@lemmy.world on 11 Sep 2023 15:28 next collapse

A test that doesn’t include a real commercial trial or A/B test with real human customers means nothing. Put their game in the App Store and tell us how it performs. We don’t care that it shat out code that compiled successfully. Did it produce something real and usable or just gibberish that passed 86% of its own internal unit tests, which were also gibberish?

kitonthenet@kbin.social on 11 Sep 2023 15:31 next collapse

At the designing stage, the CEO asked the CTO to "propose a concrete programming language" that would "satisfy the new user's demand," to which the CTO responded with Python. In turn, the CEO said, "Great!" and explained that the programming language's "simplicity and readability make it a popular choice for beginners and experienced developers alike."

I find it extremely funny that project managers are the ones chatbots have learned to imitate perfectly; they were already doing the robot’s work: saying impressive-sounding things that are actually borderline gibberish.

thanks_shakey_snake@lemmy.ca on 11 Sep 2023 17:18 next collapse

What does it even mean for a programming language to “satisfy the new user’s demand?” Like when has the user ever cared whether your app is built in Python or Ruby or Common Lisp?

It’s like “what notebook do I need to buy to pass my exams,” or “what kind of car do I need to make sure I get to work on time?”

Yet I’m 100% certain that real human executives have had equivalent conversations.

realharo@lemm.ee on 12 Sep 2023 08:55 collapse

And ironically Python (with Pygame which they also used) is a terrible choice for this kind of game - they ended up making a desktop game that the user would have to download. Not playable on the web, not usable for a mobile app.

More interestingly, if decisions like these are going to be made even more based on memes and random blogposts, that creates some worrying incentives for even more spambots. Influence the training data, and you’re influencing the decision making. It kind of works like that for people too, but with AI, it’s supercharged to the next level.

[deleted] on 11 Sep 2023 15:36 next collapse

.

BombOmOm@lemmy.world on 11 Sep 2023 15:46 collapse

The new role of a senior dev will be contract work slicing these Gordian knots.

The amount of money wasted building and destroying these knots is immeasurable. Getting things right the first time takes experienced individuals who know the product well and can anticipate future pain points. Nothing is as expensive as cheap code.

[deleted] on 11 Sep 2023 15:53 collapse

.

gencha@feddit.de on 11 Sep 2023 16:03 next collapse

What a load of bullshit. If you have a group of researchers provide “minimal human input” to a bunch of LLMs to produce a laughable program like tic-tac-toe, then please just STFU or at least don’t tell us it cost $1. This doesn’t even have the efficiency of a Google search. This AI hype needs to die quick

[deleted] on 11 Sep 2023 16:10 collapse

.

atzanteol@sh.itjust.works on 11 Sep 2023 16:05 next collapse

This research seems to be more focused on whether the bots would interoperate in different roles to coordinate on a task than on creating the actual software. The idea is to reduce “hallucinations” by giving each bot a more specific task.

The paper goes into more about this:

Similar to hallucinations encountered when using LLMs for natural language querying, directly generating entire software systems using LLMs can result in severe code hallucinations, such as incomplete implementation, missing dependencies, and undiscovered bugs. These hallucinations may stem from the lack of specificity in the task and the absence of cross-examination in decision-making. To address these limitations, as Figure 1 shows, we establish a virtual chat-powered software technology company – CHATDEV, which comprises of recruited agents from diverse social identities, such as chief officers, professional programmers, test engineers, and art designers. When presented with a task, the diverse agents at CHATDEV collaborate to develop a required software, including an executable system, environmental guidelines, and user manuals. This paradigm revolves around leveraging large language models as the core thinking component, enabling the agents to simulate the entire software development process, circumventing the need for additional model training and mitigating undesirable code hallucinations to some extent.

turmacar@kbin.social on 11 Sep 2023 16:23 collapse

I assume the endgame of this is the boardroom suggestion guy bot asking "is this based on real facts? / does this actually function?"

[deleted] on 11 Sep 2023 16:06 next collapse

.

blazera@kbin.social on 11 Sep 2023 16:07 next collapse

Researchers, for example, tasked ChatDev to "design a basic Gomoku game," an abstract strategy board game also known as "Five in a Row."

What tech company is making Connect Four as their business model?

realharo@lemm.ee on 12 Sep 2023 08:01 collapse

This is also the kind of task you would expect it to be great at - tutorial-friendly project for which there are tons of examples and articles written online, that guide the reader from start to finish.

The kind of thing you would get a YouTube tutorial for in 2016 with title like “make [thing] in 10 minutes!”. (see www.google.com/search?q=flappy+bird+in+10+minutes)

Other things like that include TODO lists (which are even used as tasks for framework comparisons), tile-based platformer games, Wordle clones, Flappy Bird clones, chess (including online play and basic bots), URL shorteners, Twitter clones, blogging CMSs, recipe books, and other basic CRUD apps.

I wasn’t able to find a list of tasks in the linked paper, but based on the gomoku one, I suspect a lot of it will be things like these.
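
For a sense of scale, the core rule of a Gomoku clone, detecting five in a row, fits in a screenful of code. A rough sketch:

static bool HasFiveInARow(int[,] board, int player)
{
    int rows = board.GetLength(0), cols = board.GetLength(1);
    // Check right, down, down-right and down-left from every cell.
    var directions = new[] { (0, 1), (1, 0), (1, 1), (1, -1) };
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            foreach (var (dr, dc) in directions)
            {
                int run = 0;
                while (run < 5)
                {
                    int rr = r + run * dr, cc = c + run * dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols || board[rr, cc] != player)
                        break;
                    run++;
                }
                if (run == 5) return true;
            }
    return false;
}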

Knusper@feddit.de on 11 Sep 2023 16:27 next collapse

the CTO responded with Python. In turn, the CEO said, “Great!” and explained that the programming language’s “simplicity and readability make it a popular choice for beginners and experienced developers alike.”

Yep, that does sound like my CEO.

theluddite@lemmy.ml on 11 Sep 2023 16:48 next collapse

“I gave an LLM a wildly oversimplified version of a complex human task and it did pretty well”

For how long will we be forced to endure different versions of the same article?

The study said 86.66% of the generated software systems were “executed flawlessly.”

Like I said yesterday, in a post celebrating how ChatGPT can do medical questions with less than 80% accuracy: that is trash. A company with absolute shit code still has virtually all of it “execute flawlessly.” Whether or not code executes is not the bar by which we judge it.

Even if it were to hit 100%, which it does not, there’s so much more to making things than this obviously oversimplified simulation of a tech company. Real engineering involves getting people in a room, managing stakeholders, navigating conflicting desires from different stakeholders, getting to know the human beings who need a problem solved, and so on.

LLMs are not capable of this kind of meaningful collaboration, despite all this hype.

PlexSheep@feddit.de on 11 Sep 2023 17:18 next collapse

Thank you for writing this so I only have to upvore upvote you.

Edit: What the difference between one key can be

nul@programming.dev on 11 Sep 2023 17:25 next collapse

I don’t know what an upvore is and I don’t want to know.

NoRodent@lemmy.world on 11 Sep 2023 19:14 collapse

Is it… vore but… upwards? So… vomiting people? Nah, I don’t want to know either.

Transcendant@lemmy.world on 11 Sep 2023 20:12 next collapse

What’s up, vore!

AFAIK vore is a rare fetish where someone gains sexual gratification from imagining swallowing someone whole (or imagining themselves being swallowed whole). Like the Bilquis scenes from American Gods, which I found oddly arousing.

Oh fuck.

RiikkaTheIcePrincess@kbin.social on 11 Sep 2023 21:05 collapse

Well, there are different kinds. Not all involve swallowing a critter whole, not all involve death, not all involve, er, mouths.

Hey wait, where's everyone going? Oh well, more vore for me 🤣Guess I should go check out American Gods. ... And look for a particular kind of place to hang out 🤔

Transcendant@lemmy.world on 11 Sep 2023 21:25 collapse

It’s not for everyone, but I loved it and was saddened that the show got cancelled. It’s very surreal in places, the settings switch from standard middle America to jaw-droppingly-stunning god realm stuff.

Kerfuffle@sh.itjust.works on 11 Sep 2023 20:20 collapse

If I got vored, promptly being upvored seems like the best case scenario.

Absolutemehperson@lemmy.world on 11 Sep 2023 17:28 collapse

I only have to upvore you

holy music stops

thantik@lemmy.world on 11 Sep 2023 18:07 next collapse

AI regularly hallucinates API endpoints that don’t exist, functions that aren’t part of that language, libraries that don’t exist. There’s no fucking way it did any of this bullshit. Like, yeah - it can probably do a mean autocomplete, but this is being pushed so hard because they want to drive wages down even harder. They want know-nothing middle-managers to point to this article and say “I can replace you with AI, get to work!”…that’s the only purpose of this crap.

Corkyskog@sh.itjust.works on 11 Sep 2023 19:47 collapse

I think there is less of a conspiracy and more of a push for investment. These AI articles sound exactly like when the internet was new: most people had only a cursory experience with it, and investors were pumping any company that so much as said the word “internet”.

Now that “Blockchain” has been beaten to death, they need a new hype word to drive mindless investment.

c0mbatbag3l@lemmy.world on 11 Sep 2023 21:37 next collapse

LLMs are not capable of this kind of meaningful collaboration

Which is why they’re a tool for professionals to amplify their workload, not a replacement for them.

CmdrShepard@lemmy.one on 11 Sep 2023 23:02 collapse

But C-suites will read articles like this and fire their development teams “because AI can do it.” I have my popcorn ready for the day it begins.

merc@sh.itjust.works on 11 Sep 2023 23:27 next collapse

80% accuracy, that is trash

More than 80% of most codebases is boilerplate stuff: including the right files for dependencies, declaring functions with the right number of parameters using the right syntax, handling basic easily anticipated errors, etc. Sometimes there’s even more boilerplate, like when you’re iterating over a list, or waiting for input and handling it.

The rest of the stuff is why programming is a highly paid job. Even a junior developer is going to be much better than an LLM at this stuff, because at least they understand it’s hard, and they often know when to ask for help because they’re in over their heads. An LLM will “confidently” just spew out plausible bullshit and declare the job done.

Because an LLM won’t ask for help, won’t ask for clarifications, and can’t understand that it might have made a mistake, you’re going to need your highly paid programmers to go in and figure out what the LLM did and why it’s wrong.

Even perfecting self-driving is going to be easier than a truly complex software engineering project. At least with self-driving, the problem is tightly constrained because you’re dealing with the real world, and the job is always the same – navigate from A to B. In the software world you’re only limited by the limits of math, and math isn’t very limiting.

I have no doubt that LLMs and generative AI will change the job of being a software engineer / programmer. But, fundamentally programming comes down to actually understanding the problem, and while LLMs can pretend they understand things, they’re really just like well-trained parrots who know what sounds to make in specific situations, but with no actual understanding behind it.

superfes@lemmy.world on 11 Sep 2023 23:55 next collapse

But they could replace CEOs from what I can tell.

Cethin@lemmy.zip on 12 Sep 2023 00:51 next collapse

A monkey could replace CEOs.

phx@lemmy.ca on 12 Sep 2023 01:30 collapse

Please, PLEASE do not use Elon Musk, Bezos and other such people as the training model

electromage@lemm.ee on 12 Sep 2023 06:51 next collapse

But did you hear that it uses more water than regular data centers?

Lucidlethargy@sh.itjust.works on 12 Sep 2023 07:57 collapse

So what you’re saying is that 86.66% of the time, it works every time.

taanegl@lemmy.ml on 11 Sep 2023 18:37 next collapse

Future software is going to be written by AI, no matter how much you would like to avoid that.

My speculation is that we will see AI operating systems at some point, due to the extreme effectiveness of future AI at hacking and otherwise subverting frameworks, services, libraries, and even protocols.

So mutating protocols will become a thing, whereby AI will change and negotiate protocols on the fly, as a war rages between defensive AI and offensive AI. There will be shared codebase, but a clear distinction of the objective at hand.

That’s why we need more open source AI solutions and less proprietary solutions, because whoever controls the AI will be controlling the digital world - be it you or some fat cat sitting on a Smaug hill of money.

EDIT: gawdDAMN there’s a lot of naysayers. I’m not talking stable diffusion here, guys. I’m talking about automated attacks and self developing software, when computing and computer networking reaches a point of AI supremacy. This isn’t new speculation. It’s coming fo dat ass, in maybe a generation or two… or more…

BetaDoggo_@lemmy.world on 11 Sep 2023 19:26 next collapse

That all sounds pointless. Why would we want to use something built on top of a system that’s constantly changing for no good reason?

Unless the accuracy can be guaranteed at 100%, this theoretical future will never make sense, because you will ultimately end up with a system that could fail at any time for any number of reasons. Predictive models cannot be used in place of consistent, human-verified and tested code.

For operating systems, I can maybe see LLMs being used to script custom actions requested by users (with appropriate guard rails), but not much beyond that.

It’s possible that we will have large software entirely written by machines in the future, but what it will be written with will not in any way resemble any architecture that currently exists.

[deleted] on 11 Sep 2023 21:15 collapse

.

[deleted] on 11 Sep 2023 19:38 next collapse

.

1984@lemmy.today on 11 Sep 2023 20:20 collapse

This is very much like the people saying airplanes will never fly after watching the prototypes fail in the 1900s.

It’s 100% guaranteed that computers will be able to write software much better and faster than humans. The only variable is how long it will take.

I think within a decade. Could be wrong and it could be two decades but I doubt it.

Think about it - these bots are already being used by humans to solve tasks every day. The only difference now compared to the future is that now there is a slow human typing something on a keyboard.

In the future, you will have bots talking to bots, millions of times per second, and models will learn in real time, not being pre-trained.

[deleted] on 11 Sep 2023 21:09 next collapse

.

1984@lemmy.today on 12 Sep 2023 04:44 collapse

I’m very confident about this, yes. But it’s still just my opinion.

nychtelios@rlyeh.icu on 12 Sep 2023 05:05 collapse

Computers are already able to write software faster than humans; this is called compilation. Languages are only a way to describe a problem, and the computer automatically builds extremely efficient software to solve it. No language models involved, so no randomness, no biases, and no errors caused by the inability to follow elementary syllogisms. Language models are not intelligent and they never will be. Yes, true AI is imho possible, but it won’t be a statistical model trying to predict words in a phrase; that idea is ridiculous, just marketing from companies, and you continue to take the bait.

1984@lemmy.today on 12 Sep 2023 05:35 collapse

Yep true AI is not language models, this is just the beginning.

Compilation turns source code into binaries, and that’s because humans want to write code rather than machine code, again because we are not smart enough to quickly write machine code.

I expect computers to skip all these steps completely in the future and just generate programs immediately.

nychtelios@rlyeh.icu on 12 Sep 2023 05:56 collapse

Programming languages are only a way to describe a problem. Even with AI, if you want it to build software for you, you have to describe your problem, and human languages are not that efficient at describing problems, soooo… AI would require kind of a programming language, just a high-level one. Maybe you can avoid writing logic, but as a software engineer, writing logic is the easiest and least time-demanding task in serious software development.

1984@lemmy.today on 12 Sep 2023 06:01 collapse

You will be able to talk to the computer, but more importantly, the computer will already know a lot of patterns for the best way to do something under the conditions you describe.

It will be like having only top programmers write code when you explain to them what you want, except much faster. There could also be brain implants to interact directly, but that I think is at least 30 years away.

shotgun_crab@lemmy.world on 12 Sep 2023 07:10 next collapse

I don’t think so. Having a good architecture is far more important and makes projects actually maintainable. AI can speed up work, but humans need to tweak and review its work to make sure it fits with the exact requirements.

realharo@lemm.ee on 12 Sep 2023 08:03 collapse

Future software is going to be written by AI

Of course, if you look far enough into the future. Look far enough and the whole concept of “software” itself could become obsolete.

The main disagreements are about how close that future is (years, decades, etc), and whether just expanding upon current approaches to AI will get us there, or we will need a completely different approach.

doublejay1999@lemmy.world on 11 Sep 2023 19:00 next collapse

Plot twist - the AI just cut and pasted from Stack Overflow, like real devs.

[deleted] on 11 Sep 2023 19:30 collapse

.

frokie@lemmy.world on 11 Sep 2023 19:33 next collapse

It should generate its own acceptance tests and keep asking itself to fix the code until they all pass.
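
Something like this loop, as a sketch (ILlm, the prompts, and runTests are hypothetical stand-ins, not a real API):

using System;

public interface ILlm
{
    // Hypothetical stand-in for whatever model API is being called.
    string Complete(string prompt);
}

public static class SelfRepairLoop
{
    // Generate tests and code, then feed test failures back until everything passes.
    public static string Generate(ILlm llm, string spec,
        Func<string, string, (bool Passed, string Log)> runTests, int maxRounds = 5)
    {
        string tests = llm.Complete($"Write acceptance tests for: {spec}");
        string code = llm.Complete($"Write code for: {spec}\nIt must pass these tests:\n{tests}");
        for (int round = 0; round < maxRounds; round++)
        {
            var (passed, log) = runTests(code, tests);
            if (passed) return code;
            code = llm.Complete($"These tests failed:\n{log}\nFix the code:\n{code}");
        }
        throw new InvalidOperationException("Tests still failing after max rounds.");
    }
}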

CmdrShepard@lemmy.one on 11 Sep 2023 23:01 next collapse

What if it just kept going and created a brand new language and IDE?

derpgon@programming.dev on 12 Sep 2023 07:17 collapse

Still valid Python code ^/s^

flamekhan@lemmy.world on 11 Sep 2023 22:58 collapse

“We asked a Chat Bot to solve a problem that already has a solution and it did ok.”

merc@sh.itjust.works on 11 Sep 2023 23:30 collapse

to solve a problem that already has a solution

And whose solution was part of its training set…

[deleted] on 12 Sep 2023 05:51 next collapse

.

variaatio@sopuli.xyz on 12 Sep 2023 05:54 collapse

half the time hallucinating something crazy in the mix.

Another funny one: “Yeah, it’s perfect, we just need to solve this small problem of it hallucinating.”

Ahemm… solving hallucination is the “no, it actually has to understand what it is doing” part, a.k.a. the actual intelligence. That is the actually big and hard problem: actually understanding what it is asked to do, and which solutions to that ask are sane, rational, and workable. Understanding the problem and understanding the answer, excluding wrong answers. Actual analysis, understanding, and intelligence.

merc@sh.itjust.works on 12 Sep 2023 06:35 collapse

Not only that, but the same variables that turn on “hallucination” are the ones that make it interesting.

By the very design of generative LLMs, the same knob that makes them unpredictable makes them invent “facts”. If they’re 100% predictable they’re useless because they just regurgitate word for word something that was in the training data. But, as soon as they’re not 100% predictable they generate word sequences in a way that humans interpret as lying or hallucinating.

So, you can’t have a generative LLM that is both “creative” in that it comes up with a novel set of words, without also having “hallucinations”.
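
To make that “knob” concrete, this is roughly what the temperature setting does at each step of generation (a simplified sketch; real decoders also layer top-k/top-p filtering and other tricks on top):

using System;
using System.Linq;

public static class Sampler
{
    // Pick the next token from the model's raw scores (logits).
    // Temperature near 0 approaches argmax: fully predictable regurgitation.
    // Higher temperature flattens the distribution: more "creative", more made up.
    public static int SampleToken(double[] logits, double temperature, Random rng)
    {
        var scaled = logits.Select(l => l / temperature).ToArray();
        double max = scaled.Max();  // subtract the max for numerical stability
        var weights = scaled.Select(s => Math.Exp(s - max)).ToArray();
        double r = rng.NextDouble() * weights.Sum();
        double cumulative = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            cumulative += weights[i];
            if (r <= cumulative) return i;
        }
        return weights.Length - 1;  // floating-point round-off guard
    }
}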

JoBo@feddit.uk on 12 Sep 2023 10:02 collapse

the same knob that makes them unpredictable makes them invent “facts”.

This isn’t what makes them invent facts, or at least not the only (or main?) reason. Fake references, for example, arise because the model encounters references in text, so it knows what they look like and where they should be used. It just doesn’t know what a reference is, or that it’s supposed to match up to something real that says what the text implies it says.

merc@sh.itjust.works on 12 Sep 2023 18:04 collapse

so it knows what they look like and where they should be used

Right, and if it’s set to a “strict” setting where it only ever uses the single most likely next word, then when the words leading up to a reference match a reference it has seen before, it will spit out that specific reference from its training data. But when it’s set to be “creative”, and predicts words that are a good but not perfect match, it will spit out references that are plausible but don’t exist.

So, if you want it to only use real references, you have to set it up to not be at all creative and always use the most likely next word. But that setting isn’t very interesting, because it just spits out, word for word, whatever was in its training data. If you want it to be creative, it will “daydream” references that don’t exist. The same knob controls both behaviours.

JoBo@feddit.uk on 12 Sep 2023 18:14 collapse

That’s not how it works at all. That’s not even how references work.