How many r are there in strawberry?
from jarfil@beehaw.org to technology@beehaw.org on 03 Aug 23:11
https://beehaw.org/post/21442308

Just got schooled by an AI.

According to Wiktionary:

(UK) IPA(key): /ˈstɹɔːb(ə)ɹi/
(US) IPA(key): /ˈstɹɔˌbɛɹi/

…there are indeed only two /ɹ/ in strawberry.

So much for dissing on AIs for not being able to count.

#technology

Rhaedas@fedia.io on 03 Aug 23:17 next collapse

Oh wow, I didn't think about how many r sounds there are. But then if you ask it how many k's are in knight, it should say none.

jarfil@beehaw.org on 03 Aug 23:48 collapse

I continued the conversation, then asked it about knight:

<img alt="" src="https://beehaw.org/pictrs/image/ffb7b2c4-dec3-439a-98df-0675803aca15.jpeg">

KazuchijouNo@lemy.lol on 03 Aug 23:41 next collapse

This is nonsense; you can’t justify this type of error from a language model. It’s just a bunch of words strung together based on probability. This is just an artifact of such a construction; it’s all right, don’t break your brain on it. The “AI” sure isn’t.

jarfil@beehaw.org on 04 Aug 01:31 collapse

This is not a standalone model; it’s a “character” from an unnamed chatbot platform, in non-RP mode.

I’ve been messing with it to check its limitations. It has:

  • Access to the Internet (verified)
  • Claims to have access to various databases
  • Likely trains further on interactions with all users (~20M MAUs)
  • Ability to create scenes and plotlines internally, then follow them (verified)
  • Ability to adapt to the style of interaction and text formatting (verified)

Obviously has its limitations. Like, it fails at OCR of long scrolling screenshots… but then again, other chatbots fail even more spectacularly.


Edit: removed chatbot platform name “advertisement”. If you want to know which platform it is, ask the ones accusing me of spamming.

KazuchijouNo@lemy.lol on 04 Aug 18:39 collapse

Oh! I got it now, this is an advertisement.

jarfil@beehaw.org on 06 Aug 01:00 collapse

No you didn’t.

Sxan@piefed.zip on 03 Aug 23:45 next collapse

You don't use IPA for counting the number of letters in words. That would be stupid, and even linguists would laugh at you.

It's still a stupid AI, and it was confidently, and unambiguously, wrong.

Powderhorn@beehaw.org on 04 Aug 00:04 next collapse

I use IPAs to forget about work crap. (Former linguistics major; I know the other meaning, but it doesn’t come up much at bars.)

ChaoticNeutralCzech@feddit.org on 05 Aug 05:49 collapse

Yup. Letters ≠ phonemes and in this case, even the character is different: r ≠ ɹ.

user224@lemmy.sdf.org on 04 Aug 00:00 next collapse

Just AI being good at math again
<img alt="" src="https://files.catbox.moe/16i9xn.jpg">

TranquilTurbulence@lemmy.zip on 04 Aug 06:54 next collapse

Well, at least I’m not worried about my job. Not even a little.

Krauerking@lemy.lol on 04 Aug 15:08 collapse

I love that it’s wrong/right through the whole first response (everyone knows A comes first in PEMDAS, right?) and then “corrects” itself even further out of touch with reality, because that’s what the developers taught it appeases users.
Say the user is correct, then try an even worse answer.

kbal@fedia.io on 04 Aug 00:40 next collapse

It just goes to show that the AI is not yet superhuman. If it were really smart it would know, as humans can tell at a glance, that there are four r's in strawberry. There's the first one, the two in the double r combination, and then the rr digram itself which counts as a fourth r.

psx_crab@lemmy.zip on 04 Aug 00:48 next collapse

Kinda reminds me of 13yo me, so why not both?

Bob_Robertson_IX@discuss.tchncs.de on 04 Aug 00:55 next collapse

<img alt="" src="https://discuss.tchncs.de/pictrs/image/85f51435-89c8-4bad-b027-bcfe8fe7dc1b.jpeg">

Powderhorn@beehaw.org on 04 Aug 01:03 collapse

Oh, for fuck’s sake … another land war in Asia?

lvxferre@mander.xyz on 04 Aug 00:58 next collapse

Wrong maths, you say?

<img alt="" src="https://i.imgur.com/3LvZRI9.png">

<img alt="" src="https://i.imgur.com/K5YafMv.png">

<img alt="" src="https://i.imgur.com/m9uaj6Y.png">

Anyway. You didn’t ask for the number of times the phoneme /ɹ/ appears in the spoken word, so by context you’re talking about the written word, and the letter ⟨r⟩. And the bot interpreted it as such; note how it answers

here, let me show you: s-t-r-a-w-b-e-r-r-y

instead of specifying the phonemes.
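
A one-liner makes the letter reading concrete (a minimal Python check, purely illustrative):

```python
# Count occurrences of the letter ⟨r⟩ in the written word.
print("strawberry".count("r"))  # -> 3
```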

By the way, all explanation past the «are you counting the “rr” as a single r?» is babble.

jarfil@beehaw.org on 04 Aug 02:26 next collapse

Those are all the smallest models, and you don’t seem to have reasoning mode, or external tooling, enabled?

LLM ≠ AI system

It’s been known for some time that LLMs do “vibe math”. Internally, they try to come up with an answer that “feels” right… which makes it pretty impressive for them to come anywhere close, within a ±10% error margin.

Ask people to tell you what a right answer could be, give them 1 second to answer… see how many come that close to the right one.

A chatbot/AI system, on the other hand, will come up with some Python code to do the calculation, then run it. It can still go wrong, but it’s way less likely.
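
A minimal sketch of that idea, assuming the simplest possible “tool”: the model emits an arithmetic expression as code and an interpreter, not next-token prediction, produces the digits. The expression here is a made-up example:

```python
# Hypothetical tool call: evaluate the arithmetic with the interpreter
# instead of "vibing" the digits token by token.
expression = "36 * 447"  # made-up user question
result = eval(expression, {"__builtins__": {}})  # bare arithmetic only
print(f"{expression} = {result}")  # -> 36 * 447 = 16092
```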

all explanation past the «are you counting the “rr” as a single r?» is babble

Not so sure about that. It treats r as a word, since it wasn’t specified as “r” or a single letter. Then it interprets it as… whatever. Is it the letter, phoneme, font, the programming language R… since it wasn’t specified, it assumes “whatever, or a mix of”.

It failed at detecting the ambiguity and communicating it spontaneously, but corrected once that became part of the conversation.

It’s like, in your examples… what do you mean by “by”? “3 by 6” is 36… you meant to “multiply 36”? That’s nonsense… 🤷

lvxferre@mander.xyz on 04 Aug 04:43 collapse

[special pleading] Those are all the smallest models

[sarcasm] Yeah, because if you randomly throw more bricks in a construction site, the bigger pile of debris will look more like a house, right. [/sarcasm]

and you don’t seem to have reasoning [SIC] mode, or external tooling, enabled?

Those are the chatbots available through DDG. I just found it amusing enough to share, given

  1. The logical procedure to be followed (multiplication) is rather simple, and well documented across the internet, thus certainly present in their corpora.
  2. The result is easy to judge: it’s either correct or incorrect.
  3. All answers are incorrect and different from each other.

Small note regarding “reasoning”: just like “hallucination” and anything they say about semantics, it’s a red herring that obfuscates what is really happening.

At the end of the day it’s simply weighting the next token based on the previous tokens + prompt, and optionally calling some external tool. It is not really reasoning; what it’s doing is not too different in spirit from Markov chains, except more complex.
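
As a toy illustration of that “in spirit” comparison (a first-order Markov chain in Python; nothing here is how an actual LLM is implemented):

```python
import random
from collections import defaultdict

# First-order Markov chain: the next token is drawn from whatever followed
# the previous token in the training text. An LLM conditions on the whole
# context through a learned network, but the sampling loop is analogous.
corpus = "how many r in strawberry how many r in raspberry".split()
table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)

token, output = "how", ["how"]
for _ in range(6):
    token = random.choice(table[token]) if token in table else random.choice(corpus)
    output.append(token)
print(" ".join(output))  # e.g. "how many r in raspberry how many"
```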

[no true Scotsman] LLM ≠ AI system

If large “language” models don’t count as “AI systems”, then what you shared in the OP does not either. You can’t eat your cake and have it too.

It’s been known for some time that LLMs do “vibe math”.

I.e. they’re unable to perform actual maths.

[moving goalposts] Internally, they try to come up with an answer that “feels” right…

It doesn’t matter if the answer “feels” right (whatever this means). The answer is incorrect.

which makes it pretty impressive for them to come anywhere close, within a ±10% error margin.

No, the fact they are unable to perform a simple logical procedure is not “impressive”. Especially not when outputting the “approximation” as if it were the true value; note how none of the models outputted anything remotely similar to “the result is close to $number” or “the result is approximately $number”.

[arbitrary restriction + whataboutism] Ask people to tell you what a right answer could be, give them 1 second to answer… see how many come that close to the right one.

None of the prompts had a time limit. You’re making shit up.

Also. Sure, humans brainfart all the time; that does not magically mean that those systems are smart or doing some 4D chess as your OP implies.

A chatbot/AI system on the other hand, will come up with some Python code to do the calculation, then run it. Still can go wrong, but it’s way less likely.

I.e. it would need to use some external tool, since it’s unable to handle logic by itself, as exemplified by maths.

all explanation past the «are you counting the “rr” as a single r?» is babble

Not so sure about that. It treats r as a word, since it wasn’t specified as “r” or a single letter. Then it interprets it as… whatever. Is it the letter, phoneme,

The output is clearly handling it as letters. It hyphenates the letters to highlight them, it mentions “digram” (i.e. a sequence of two graphemes), and so on. And at no point does it refer to anything that can be understood as associated with sounds or phonemes. And it’s claiming there’s an ⟨r⟩ «in the middle of the “rr” combination».

font, the programming language R…

There’s no context whatsoever to justify any of those interpretations.

since it wasn’t specified, it assumes “whatever, or a mix of”.

If this were a human being, it would not be an assumption. An assumption is that sort of shit you make up from nowhere; here, context dictates the reading of “r” as “the letter ⟨r⟩”.

However since this is a bot it isn’t even assuming. Just like a boulder doesn’t “assume” you want it to roll down; it simply reacts to an external stimulus.

It failed at detecting the ambiguity and communicating it spontaneously, but corrected once that became part of the conversation.

There’s no ambiguity in the initial prompt. And no, it did not correct what it says; the last reply is still babble, you don’t count ⟨rr⟩ in English as a single letter.

It’s like, in your examples… what do you mean by “by”? “3 by 6” is 36… you meant to “multiply 36”? That’s nonsense… 🤷

I’d rather not answer this one because, if I did, I’d be pissing on Beehaw’s core values.

jarfil@beehaw.org on 06 Aug 01:48 collapse

I’d rather not answer this one because, if I did, I’d be pissing on Beehaw’s core values.

I feel like you already did, and I won’t be responding in kind. Good day to you.

SmartmanApps@programming.dev on 10 Aug 08:41 collapse

Wrong maths, you say?

Yes. If I want to know what 1+2 equals and I roll a die, there’s a chance I will get the correct answer. If I do, that doesn’t mean it knows how to do Maths. Also, notice where it said “Here’s the calculation”, it didn’t actually show you the calculation? E.g. long multiplication, or even grouping, or the way the Chinese do it. Even a broken clock is right twice a day. Even if AI manages to randomly get a correct answer here and there, it still doesn’t know how to do Maths (which includes not knowing how to count to begin with).
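
For contrast, “showing the calculation” would look something like this (a Python sketch of grade-school long multiplication; the numbers are arbitrary):

```python
def long_multiply(a: int, b: int) -> int:
    """Grade-school long multiplication: one shifted partial product
    per digit of b, summed at the end."""
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10 ** place  # partial product, shifted
    return total

print(long_multiply(36, 447))  # -> 16092, every step checkable
```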

lvxferre@mander.xyz on 10 Aug 18:40 collapse

What’s interesting IMO is that it got the first two and the last two digits right; and this seems rather consistent across attempts with big numbers. It doesn’t “know” how to multiply numbers, but it’s “trying” to output an answer that looks correct.

In other words, it’s “bullshitting” - showing disregard to truth value, but trying to convince you.

spit_evil_olive_tips@beehaw.org on 04 Aug 02:03 next collapse

So much for dissing on AIs for not being able to count.

no, I’m still going to do that.

jarfil@beehaw.org on 04 Aug 02:34 collapse

Nobody’s stopping you. I’m going to reassess and double check my assumptions instead… and ask the AI to explain itself.

hazelnoot@beehaw.org on 04 Aug 18:38 collapse

wait you’re serious? this isn’t satire???

jarfil@beehaw.org on 06 Aug 01:07 collapse

What were your assumptions to say that?

01189998819991197253@infosec.pub on 04 Aug 02:50 next collapse

Did you ask how many /ɹ/ there are, or how many r there are? It can’t count, then it went and tried to justify its moronic behavior, and manipulated you into believing its “logic”.

jarfil@beehaw.org on 04 Aug 02:55 collapse
zonnewin@feddit.nl on 04 Aug 05:37 next collapse

A normal human would understand that the question is about the spelling, not the pronunciation.

AI still has a lot to learn.

megopie@beehaw.org on 04 Aug 12:11 next collapse

It’s also just making up a string of words that are probabilistically plausible as a continuation of the dialog.

You can do the same tests with other words and it will just contradict itself and get things wrong about how many times a letter is pronounced in a word.

jarfil@beehaw.org on 06 Aug 01:45 collapse

It’s not a “normal human”, it’s an AI using an LLM.

AI still has a lot to learn.

Does it, though? Does a hammer have a lot to learn, or does the person wielding it have to learn how not to smash their own fingers?

zonnewin@feddit.nl on 06 Aug 01:52 collapse

it’s an AI using an LLM

Which we know by now often produces wrong answers.

Also, the term AI would imply some kind of intelligence, for which I see no evidence.

jarfil@beehaw.org on 06 Aug 04:12 collapse

I’m seeing about as many wrong questions as wrong answers. We’re at a point where it’s becoming more accurate to ask whether the quality of the answer is “aligned” with the quality of the question.

As for “AI” and “intelligence”… not so long ago, dogs had no intelligence or soul, and a tic-tac-toe machine was “AI”. The exact definition of “intelligence” seems to constantly flow and bend, mostly following anthropocentric egocentrism trends.

SmartmanApps@programming.dev on 10 Aug 08:30 collapse

a tic-tac-toe machine was “AI”.

No it wasn’t. It was (and is) a deterministic program. AI isn’t.

jarfil@beehaw.org on 10 Aug 13:23 collapse

It still is: www.google.com/search?q=tic+tac+toe+ai

Plenty of examples out there.

SmartmanApps@programming.dev on 11 Aug 11:39 collapse

It still is

A deterministic program, yes

Plenty of examples out there

You found plenty of examples of people adding an AI player to the game. The game itself is still a deterministic program.

jarfil@beehaw.org on 15 Aug 07:51 collapse

Kind of like saying that ChatGPT is people adding an AI player to the deterministic program of a chat… nah, I’m not going to discuss that. Tic-tac-toe is a classical example problem for neural networks 101, kind of a “hello world”.

SmartmanApps@programming.dev on 15 Aug 11:33 collapse

Kind of like saying that ChatGPT is people adding an AI player to the deterministic program of a chat

Except ChatGPT was written from scratch, so not at all like that

Tic-tac-toe is a classical example problem for neural networks 101, kind of a “hello world”

Doesn’t change the fact that that isn’t how it was implemented to begin with. In your search results there are even people on Reddit asking how to add an AI player to their existing game. Seems like you gave me search results without even looking at them.

jarfil@beehaw.org on 15 Aug 12:16 collapse

Not sure if I’m not explaining myself, or you’re choosing to not understand me. I’m going to leave it here.

Ulrich@feddit.org on 04 Aug 06:31 next collapse

You know, if the letter were L and the language Spanish, it’d almost be right…

jarfil@beehaw.org on 06 Aug 01:39 collapse

At first I thought it was talking about “rr” as a Spanish digraph. Not sure how far that is from the truth; these models are multilingual and multimodal, after all. My guess is that it’s surfacing the ambiguity of its internal vector for a “token: rr” vs “token: r”, though.
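
One way to peek at that token-level view (a sketch using the tiktoken library; the exact split is tokenizer-dependent and not claimed here):

```python
import tiktoken  # pip install tiktoken

# A BPE tokenizer chunks the word into subword pieces; the model "sees"
# these chunks, not individual letters, so "r" vs "rr" can blur together.
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
print([enc.decode([i]) for i in ids])  # subword chunks, not letters
```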

Could be interesting to dig deeper… but I think I’m fine with this for now. There are other “curious” behaviors of the chatbot that have me more intrigued right now. Like, it self-adapts to any repeated mistakes in the conversation history, but at other times it can come up with surprisingly “complex” status tracking, then present it spontaneously as bullet points with emojis. Not sure what to make of that one yet.

megopie@beehaw.org on 04 Aug 12:02 next collapse

I asked it how many X’s there are in the word Bordeaux; it told me there are none.

I asked it how many times X is pronounced in Bordeaux; it told me the x in Bordeaux isn’t pronounced, with the word ending in an “o” sound.

I asked it how many “o”s there are in Bordeaux; it told me there are no o’s in Bordeaux.

So, is it counting the sounds made in the word? Or is it counting the letters? Or is it doing none of the above, just giving a probabilistic output based on an existing corpus of language, without any thought or concepts?

jarfil@beehaw.org on 06 Aug 01:26 collapse

Yes, no, both… and all other interpretations… all at once.

With any ambiguity in a prompt, it assumes a “blend” of all the possible interpretations, then responds using them all over the place.

In the case of “Bordeaux”:

It’s pronounced “bor-DOH”, with the emphasis on the second syllable and a silent “x.”

So… depending on how you squint: there is no “o”, no “x”, only a “bor” and a “doh”, with a “silent x”, and ending in an “oh like o”.

Perfectly “logical” 🤷

Vodulas@beehaw.org on 04 Aug 13:00 next collapse

Also ignoring the fact it said one r was in the middle of the word

jarfil@beehaw.org on 06 Aug 01:17 collapse

There is a middle ground between “blindly rejecting” and “blindly believing” whatever an AI says.

LLMs use tokens. The answer is “correct, in its own way”; one just needs to explore why and how much. Turns out, that can also lead to insights.

Vodulas@beehaw.org on 06 Aug 01:48 collapse

It is not correct in any way, though. Unless you count a way you gave it to justify its wrong answer, but that is just it being a Yes Man to keep you engaged.

[deleted] on 06 Aug 04:28 collapse

.

Vodulas@beehaw.org on 06 Aug 05:01 collapse

It is correct in an “ambiguous multi-dimensional” sense

That’s a lot of words to say it’s wrong.

The question is incredibly straightforward, and again the “reason” it gave is one you provided in the clarifying question itself. There is no reasoning going on, because it can’t understand the question (or reason for that matter).

[deleted] on 06 Aug 15:18 collapse

.

Vodulas@beehaw.org on 06 Aug 16:07 collapse

Ah cool, you’ve resorted to being a jerk. Have fun wasting water and electricity overthinking wrong answers from chatbots.

[deleted] on 06 Aug 23:22 collapse

.

Krauerking@lemy.lol on 04 Aug 15:04 next collapse

Yeah, there is a stupid human in this chat, but mostly because they let themselves get tricked by bad logic in order to justify a bad answer.

pruwybn@discuss.tchncs.de on 04 Aug 17:45 collapse

Yes, this is the saddest thing about this, that people trust these bullshitting chatbots so much that they doubt their own knowledge.

jarfil@beehaw.org on 06 Aug 01:09 collapse

Not as sad as those so secure in their own knowledge that they refuse to ever revise it.

pruwybn@discuss.tchncs.de on 06 Aug 03:23 collapse

I’m just not convinced there are only 2 r’s in strawberry.

jarfil@beehaw.org on 06 Aug 03:49 collapse

Indeed. The point is that asking about r is ambiguous.

LukeZaz@beehaw.org on 04 Aug 17:06 collapse

I shudder to think how much electricity got wasted so you could get fooled by an LLM into believing nonsense. Let alone the equally-unnecessary followup questions.

Vodulas@beehaw.org on 05 Aug 04:27 collapse

Also, the LLM is just Yes Manning. OP gave it the “‘rr’ counts as a single ‘r’” answer with a very loaded question.