Discussion about this post

Egg Syntax

'Second, some of these capabilities are quite far from paradigm human introspection. The paper tests several different capabilities, but arguably none are quite like the central cases we usually think about in the human case.'

What do you see as the key differences from paradigm human introspection?

Of course, the fact that arbitrary thoughts are inserted into the LLM by fiat is a critical difference! But once we accept that core premise of the experiment, the capabilities tested seem to have the central features of human introspection, at least when considered collectively.

I won't pretend to much familiarity with the philosophical literature on introspection (much less on AI introspection!), but when I look at the Stanford Encyclopedia of Philosophy (https://plato.stanford.edu/entries/introspection/#NeceFeatIntrProc) it lists three ~universally agreed necessary qualities of introspection, of which all three seem pretty clearly met by this experiment.

In talking with a number of people about this paper, it's become clear that people's intuitions differ on the central usage of 'introspection'. For me and at least some others, its primary meaning is something like 'accessing and reporting on current internal state', and as I see it, that's exactly what's being tested by this set of experiments.

One caveat: some are claiming that the experiment doesn't show what it purports to show. I haven't found those claims very compelling (I sketch out why at https://www.lesswrong.com/posts/Lm7yi4uq9eZmueouS/eggsyntax-s-shortform?commentId=pEaQWb6oRqibWuFrM), but they're not strictly ruled out. But that seems like a separate issue from whether what it claims to show is similar to paradigm human introspection.

AbstractNoun

Studies like this highlight the tension between structural probing of LLMs and more surface-level interactions such as those being done by Eleos. I think the former might be more illuminating than the latter, because I'm tempted to believe that cognitive or causal structure is doing a lot more work in our ascriptions of ethical regard.
