Human-like object concept representations emerge naturally in multimodal large language models

Human-like object concept representations emerge naturally in multimodal large language models (www.nature.com)
from yogthos@lemmy.ml to technology@lemmy.ml on 21 Jun 17:51
https://lemmy.ml/post/32049558

#technology


queermunist@lemmy.ml on 21 Jun 18:33

Isn’t this just because LLMs use the object concept representation data from actual humans?

yogthos@lemmy.ml on 21 Jun 18:48

The object concept representation is an emergent property within these networks. Basically, the network learns to create stable associations between different modalities and associate an abstract concept of an object that unites them together.

queermunist@lemmy.ml on 21 Jun 18:52

But it’s emerging from networks of data from humans, which means our object concept representation is in the data. This isn’t random data, after all; it comes from us. Seems like the LLMs are just regurgitating what we’re feeding them.

What this shows, I think, is how deeply we are influencing the data we feed to LLMs. They’re human-based models and so they produce human-like outputs.

yogthos@lemmy.ml on 21 Jun 19:10

Ultimately the data both human brains and artificial neural networks are trained on comes from the material reality we inhabit. That’s the underlying context. We’re feeding LLMs data about our reality encoded in a way that’s compatible with how our brains interpret it. I’d argue that models being based on data encoding that we ourselves use is a feature, because ultimately we want to be able to interact with them in a meaningful way.

queermunist@lemmy.ml on 21 Jun 19:16

LLMs are not getting raw data from nature. They’re being fed data produced by us and uploaded into their database: human writings and human observations and human categorizations and human judgements about what data is valuable. All the data about our reality that we feed them is from a human perspective.

This is a feature, and will make them more useful to us, but I’m just arguing that raw natural data won’t naturally produce human-like outputs. Instead, human inputs produce human-like outputs.

yogthos@lemmy.ml on 21 Jun 19:23

I didn’t say they’re encoding raw data from nature. I said they’re learning to interpret multimodal representations of the encodings of nature that we feed them in human compatible formats. What these networks are learning is to make associations between visual, auditory, tactile, and text representations of objects. When a model recognizes a particular modality such as a sound, it can then infer that it may be associated with a particular visual object, and so on.

Meanwhile, the human perspective itself isn’t arbitrary either. It’s the result of an evolutionary selection process that shaped the way our brains are structured. This is similar to how the brains of other animals encode reality as well. If you evolved a neural network on raw data from the environment, it would eventually start creating similar types of representations as well because it’s an efficient way to model the world.

queermunist@lemmy.ml on 21 Jun 21:05

I didn’t say they’re encoding raw data from nature

Ultimately the data both human brains and artificial neural networks are trained on comes from the material reality we inhabit.

Anyway, the data they are getting doesn’t only come in a human format. The data we record is only recorded because we find it meaningful as humans, and most of the data is generated entirely by humans besides. You can’t separate these things; they’re human-like because they’re human-based.

It’s not merely natural. It’s human.

If you evolved a neural network on raw data from the environment, it would eventually start creating similar types of representations as well because it’s an efficient way to model the world.

We don’t know that.

We know that LLMs, when fed human-like inputs, produce human-like outputs. That’s it. That tells us more about LLMs and humans than it tells us about nature itself.

yogthos@lemmy.ml on 21 Jun 21:17

It’s not merely natural. It’s human.

I’m not disputing this, but I also don’t see why that’s important. It’s a representation of the world encoded in a human format. We’re basically skipping a step of evolving a way to encode this data.

We know that LLMs, when fed human-like inputs, produce human-like outputs. That’s it. That tells us more about LLMs and humans than it tells us about nature itself.

Did you actually read through the paper?

queermunist@lemmy.ml on 22 Jun 02:43

I’m not disputing this, but I also don’t see why that’s important.

What’s important is the use of “natural” here, because it implies something fundamental about language and material reality, rather than this just being a reflection of the human data fed into the model. You did it yourself when you said:

If you evolved a neural network on raw data from the environment, it would eventually start creating similar types of representations as well because it’s an efficient way to model the world.

And we just don’t know this, and this paper doesn’t demonstrate this because (as I’ve said) we aren’t feeding the LLMs raw data from the environment. We’re feeding them inputs from humans and then they’re displaying human-like outputs.

Did you actually read through the paper?

From the paper:

to what extent can complex, task-general psychological representations emerge without explicit task-specific training, and how do these compare to human cognitive processes across a broad range of tasks and domains?

But their training still relies on a data set picked by humans and given textual descriptions written by humans, and they then used a representation learning method previously designed for human participants. That’s not “natural”, that’s human.

A more accurate conclusion would be: human-like object concept representations emerge when fed data collected by humans, curated by humans, annotated by humans, and then tested by representation learning methods designed for humans.

human in ➡️ human out

yogthos@lemmy.ml on 22 Jun 03:16

A more accurate conclusion would be: human-like object concept representations emerge when fed data collected by humans, curated by humans, annotated by humans, and then tested by representation learning methods designed for humans.

Again, I’m not disputing this point, but I don’t see why it’s significant to be honest. As I’ve noted, human representation of the world is not arbitrary. We evolved to create efficient models that allow us to interact with the world in an effective way. We’re now seeing that artificial neural networks are able to create similar types of internal representations that allow them to meaningfully interact with the data organized in a way that’s natural for humans.

I’m not suggesting that human-style representation of the world is the one true way to build a world model, or that other efficient representations aren’t possible. However, that in no way detracts from the fact that LLMs can create a useful representation of the world that’s similar to our own.

Ultimately, the end goal of this technology is to be able to interact with humans, to navigate human environments, and to accomplish tasks that humans want to accomplish.

queermunist@lemmy.ml on 22 Jun 04:01

LLMs create a useful representation of the world that is similar to our own when we feed them our human-created, human-curated, and human-annotated data. This doesn’t tell us much about the nature of large language models or the nature of object concept representations; what it tells us is that human inputs result in human-like outputs.

Claims about “nature” are much broader than the findings warrant. We’d need to see LLMs fed entirely non-human datasets (no human creation, no human curation, no human annotation) before we could make claims about what emerges naturally.

yogthos@lemmy.ml on 22 Jun 04:50

You continue to ignore my point that human representations are themselves not arbitrary. Our brains have emerged naturally, and that’s what makes the representations humans make natural. You could evolve a representation of the model from scratch by hooking up a neural network to raw sensory inputs, and its topology would eventually become tuned to model those inputs. I don’t see what would be fundamentally more natural about that, though.

queermunist@lemmy.ml on 22 Jun 05:03

If we define human inputs as “natural” then the word basically ceases to mean anything.

It’s the equivalent of saying that paintings and sculptures emerge naturally because artists are human and humans are natural.

yogthos@lemmy.ml on 22 Jun 12:44

Are you saying that humans are not a product of nature?

queermunist@lemmy.ml on 22 Jun 22:21

I’m saying that the terms “natural” and “artificial” are in a dialectical relationship, they define each other by their contradictions. Those words don’t mean anything once you include everything humans do as natural; you’ve effectively defined “artificial” out of existence and as a result also defined “natural” out of existence.

yogthos@lemmy.ml on 23 Jun 02:32

I haven’t defined artificial out of existence at all. My definition of artificial is a system that was consciously engineered by humans. The human mind is a product of natural evolutionary processes. Therefore, the way we perceive and interpret the world is inherently a natural process. I don’t see how it makes sense to say that human representation of the world is not natural.

An example of something that’s artificial would be taking a neural network we designed and having it build, from raw inputs, a novel representation of the world that’s unbiased by us. It would be a designed system, as opposed to one that evolved naturally, with its own artificial representation of the world.

queermunist@lemmy.ml on 23 Jun 02:52

My definition of artificial is a system that was consciously engineered by humans.

And humans consciously decided what data to include, consciously created most of the data themselves, and consciously annotated the data for training. Conscious decisions are all over the dataset, even if they didn’t design the neural network directly from the ground up. The system still evolved from conscious inputs, you can’t erase its roots and call it natural.

Human-like object concept representations emerge from datasets made by humans because humans made them.

yogthos@lemmy.ml on 23 Jun 03:06

Human-like object concept representations emerge from datasets made by humans because humans made them.

And humans made them that way because human minds evolved to represent data in this way. As I keep pointing out, we’re feeding data into neural networks that’s organized in a way that’s natural for our brains to operate on. It’s an artificial system that mimics the way we naturally represent data in our own minds.

The artificial aspect of the system lies in the implementation details: the ways we’ve come up with to encode data. These are not essential. It’s like the difference between an algorithm and its concrete implementation in a programming language. The fact that the data is encoded using human-designed formats is incidental to the structure of the data, which is derived from the way our brains encode information.

Human-like object concept representations emerge from the way our brains are structured. These are the representations that are encoded into data sets by humans.

Also, you’ve talked about a dialectical relationship, but dialectics are about understanding the evolution of dynamic systems. The contradictions represent the opposing forces within a system that guide its development over time. When we talk about a distinction between natural and artificial, what’s the system we’re discussing here, and what are the opposing forces?

brisk@aussie.zone on 21 Jun 23:16

Unpaywalled direct link to paper [PDF] courtesy of the Unpaywall add-on.

MCasq_qsaCJ_234@lemmy.zip on 22 Jun 00:17

In your opinion, is this a good thing, a bad thing, or is it just a curiosity that LLMs currently have?

yogthos@lemmy.ml on 22 Jun 01:49

It’s a good thing in the sense that it means the models are creating stable representations of objects across modalities. It means that there is potential for extending the LLM approach to building actual world models in the future.