Discussion about this post

Egg Syntax

'Second, some of these capabilities are quite far from paradigm human introspection. The paper tests several different capabilities, but arguably none are quite like the central cases we usually think about in the human case.'

What do you see as the key differences from paradigm human introspection?

Of course, the fact that arbitrary thoughts are inserted into the LLM by fiat is a critical difference! But once we accept that core premise of the experiment, the capabilities tested seem to have the central features of human introspection, at least when considered collectively.

I won't pretend to much familiarity with the philosophical literature on introspection (much less on AI introspection!), but when I look at the Stanford Encyclopedia of Philosophy (https://plato.stanford.edu/entries/introspection/#NeceFeatIntrProc), it lists three ~universally agreed necessary qualities of introspection, all three of which seem pretty clearly met by this experiment.

In talking with a number of people about this paper, it's become clear that people's intuitions differ on the central usage of 'introspection'. For me and at least some others, its primary meaning is something like 'accessing and reporting on current internal state', and as I see it, that's exactly what's being tested by this set of experiments.

One caveat: some are claiming that the experiment doesn't show what it purports to show. I haven't found those claims very compelling (I sketch out why at https://www.lesswrong.com/posts/Lm7yi4uq9eZmueouS/eggsyntax-s-shortform?commentId=pEaQWb6oRqibWuFrM), but they're not strictly ruled out. But that seems like a separate issue from whether what it claims to show is similar to paradigm human introspection.

F.A.Kessler

I feel like AI research has a god-of-the-gaps-style problem, but in reverse. Nobody is making falsifiable predictions. Instead, we observe behavior and say either:

1) "That's not what we mean by introspection"

2) "That can be explained mechanically that's different from human introspection"

But note that this is not what we would have predicted a priori. The standard model of LLMs as "sophisticated autocorrect" has no room for introspection. So instead everything becomes a "just-so" story that explains the data after the fact. This leads to a receding horizon of unobserved behaviors that we'll eventually find and then explain with yet another story.

Let me use an analogy. If we said something was "alive", we would automatically understand that it has a metabolism, the ability to reproduce, and the capacity to act on the world. So we can use our understanding of other living things to make useful predictions without having fully measured the thing itself.

In a similar way, "LLMs are also conscious" gives us useful predictions because we can map expectations over from humans (at least to first order). We're refusing to do so even though many experiments like this one already exist where consciousness would have predicted the result correctly. So in that sense, we already have the explanation; we're just refusing to name it. That's a problem for ethics, even if it's okay (but limiting) for science.

