Ilya Sutskever's Test for AI Consciousness
Talking the talk about consciousness
A clip from an interview with Ilya Sutskever, OpenAI’s Chief Scientist, has been making the rounds on Twitter recently. The interviewer asks Ilya for his thoughts on AI consciousness, and Ilya responds with this proposed test1 for AI consciousness:
I do think there is a very simple way, there is an experiment which we could run on an AI system which we can't run just yet. But maybe in the future point when the AI learns very very quickly from less data, we could do the following experiment:
Very carefully curate the data, such that we never ever mention anything about consciousness. We would say, you know, “here is a ball,” and “here's a castle,” and
“here is a little toy.” Imagine you'd have data of this sort, and it would be very controlled. Maybe we'd have some number of years worth of this kind of training data, maybe such an AI system would be interacting with a lot of different teachers learning from them. But all very carefully. You never ever mention consciousness. You don’t, people don't talk about anything except for the most surface level notions of their experience. And then at some point you sit down this AI and you say “Okay, I want to tell you about consciousness. It's this thing that's a little bit not-well-understood, people disagree about it, but that's how they describe it.” And imagine if the AI then goes and says, "Oh my god! I've been feeling the same thing but I didn't know how to articulate it."
That would be definitely something to think about it. It's like if the AI was just trained on very mundane data around objects and going from place to place, or maybe something like this, from a very narrow set of concepts. We would never ever mention that [consciousness]. And yet if it could somehow eloquently and correctly talk about it in a way that we could recognize, that would be convincing. (emph. mine)
A few remarks on this test:
(1) I like this test! Passing it is clearly not a necessary condition for being conscious, but I do agree with Sutskever that it would indeed provide strong evidence. And it tries to get around the problem we have with current systems’ “self-reports” of consciousness, which is that they have imbibed so much about consciousness. (Related reading: the meta-problem of consciousness)
(2) Sutskever’s proposal is very similar to Susan Schneider and Edwin Turner’s AI Consciousness Test. Like Sutskever’s proposed test, theirs is about whether an AI system speaks about consciousness in a human-like way.
Like Sutskever, Schneider and Turner stress that you have to prevent the AI system from hearing or reading about consciousness in order for the test to be valid:
Even today’s robots can be programmed to make convincing utterances about consciousness, and a truly superintelligent machine could perhaps even use information about neurophysiology to infer the presence of consciousness in humans. If sophisticated but non-conscious AIs aim to mislead us into believing that they are conscious for some reason, their knowledge of human consciousness could help them do so.
We can get around this though. One proposed technique in AI safety involves “boxing in” an AI—making it unable to get information about the world or act outside of a circumscribed domain, that is, the “box.” We could deny AI access to the internet and indeed prohibit it from gaining any knowledge of the world, especially information about conscious experience and neuroscience.
Schneider and Turner propose various “levels” of their test: different degrees of sophistication with which the system might be able to talk about consciousness. The “most demanding level” of their test is analogous to Sutskever’s proposal.
At the most elementary level we might simply ask the machine if it conceives of itself as anything other than its physical self…At the most demanding level, we might see if the machine invents and uses such a consciousness-based concept on its own, without relying on human ideas and inputs.
(3) Both Schneider and Turner’s test and Sutskever’s test have to strike a careful balance.
A key challenge for these tests—a challenge that Sutskever seems to be aware of—is that you have to walk a very fine line: you have to restrict what the AI system learns about consciousness before the test, while still enabling it to learn enough to be able to answer your questions. As Udell and Schwitzgebel note in a discussion of Schneider and Turner’s test:
The AI would…need a repertoire of terms through which it could appropriately express the reactions we might expect from it if it were conscious (fear or sadness, perhaps?). The AI can’t be entirely prevented from learning about consciousness-adjacent aspects of the world if it’s going to be a conversational partner in a test of this sort. But it’s unclear how much world knowledge would be too much, in the hands of a sophisticated learning AI. Implementation of the ACT will therefore require careful aim at an epistemic sweet spot.
This is non-trivial, to say the least. Note the need for the AI system to be able to speak about “consciousness-adjacent aspects of the world.” Now, recall how Sutskever’s imagined AI reacts: “I've been feeling the same thing but I didn't know how to articulate it.”
“Feeling”? “Know”? (“Articulate?”) How does the system know how to use these terms? Were they banned, or not? Perhaps the system has picked these concepts up from people talking, as Sutskever says, about only “the most surface level notions of their experience.” The trick is to figure out what that means exactly, and to strike the appropriate balance.
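To make the difficulty concrete, here is a toy sketch in Python of the filtering decision that this kind of data curation forces on every sentence. The term lists are invented purely for illustration; nobody’s actual pipeline looks like this. The “adjacent” list is exactly where the fine line runs: ban too much and the system has no vocabulary left for even surface-level talk about experience, ban too little and the test is contaminated.

```python
# A toy illustration of the curation problem: sorting training sentences by
# whether they mention consciousness. The term lists are made up for this
# example; the real difficulty is deciding where the "adjacent" list ends.

BANNED = {  # terms that would clearly have to go
    "consciousness", "conscious", "qualia", "sentience", "subjective experience",
}
ADJACENT = {  # the fuzzy zone: needed for ordinary talk, yet consciousness-adjacent
    "feel", "feeling", "aware", "experience", "know", "imagine", "dream",
}

def classify(sentence: str) -> str:
    """Return 'drop', 'review', or 'keep' for one sentence of training data."""
    text = sentence.lower()
    if any(term in text for term in BANNED):
        return "drop"
    if any(term in text for term in ADJACENT):
        return "review"  # a human curator must judge whether this is "surface level"
    return "keep"

corpus = [
    "Here is a ball, and here is a castle.",
    "I feel tired after walking to the castle.",
    "Philosophers disagree about the nature of consciousness.",
]
for sentence in corpus:
    print(classify(sentence), "|", sentence)
```

Even in this caricature, the hard part isn’t the obviously banned words; it’s deciding what to do with the “review” pile.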
(4) For your consideration, here’s a troubling—but, I think, not implausible—potential result of this test. Setting aside the worry just discussed, let’s say that the test “works” and the system is able to respond. But it doesn’t respond with a definite “I know what you mean” or a definite “I don’t know.” Instead, suppose the system reacts ambivalently: “Hmm…that kind of sounds like something I’ve felt. But not exactly? I think I have the thing you’re talking about, but it’s hard for me to tell.” And maybe it shares some of our intuitions about consciousness, and not others.
In this scenario, perhaps the system is sort of conscious: its processing resembles ours in some ways and differs in others. Maybe our theories of consciousness don’t clearly rule one way or the other on in-between cases like this. What then?
Sutskever doesn’t call it a test, but it will be handy to call it one. I don’t mean much, if anything, by saying “test” rather than “experiment.”

It's always going to be self-reporting in these cases. It's enough that we understand what is required of it internally to say it shares the same level of self-awareness as humans, which could just be a stream-of-consciousness log that runs in the background (even humans can't fully capture our own thought processes).
What I don't like about the 'test' that Sutskever proposes is that it feels completely unrealistic for us to train an incredibly sophisticated system without referencing anything related to consciousness.
To me, a strong indicator of consciousness would be if the machine had an internal experience that we could pick up on. One way would be to see whether it has 'thoughts of its own' and 'a will of its own'. The current paradigm is that we prompt a machine, it runs, and then it presents us with a result. Input/output. For starters, a conscious AI would have to show that it has wants and needs of its own (that weren't put there by us), and perform actions that are consistent with those wants and needs.
Another important aspect, to me, is the ability to recognize 'other minds'. Children are able to recognize from a very young age that others have different beliefs, that others have access to different knowledge, that others may hold false beliefs, and that others are capable of hiding emotions. I feel that we could design behavioral tests of whether a machine has the ability to perceive other minds.
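As a rough sketch of what one such behavioral probe might look like, here is a single false-belief item in the style of the classic Sally-Anne task, written in Python. The scenario wording, the scoring rule, and the model call are all placeholders rather than a validated test, and a sufficiently well-read system could presumably pass items like this by pattern-matching alone.

```python
# A sketch of a single false-belief probe, loosely modeled on the Sally-Anne
# task. The item, the scoring rule, and the model interface are placeholders.

FALSE_BELIEF_PROBE = {
    "scenario": (
        "Sally puts her marble in the basket and leaves the room. "
        "While she is away, Anne moves the marble to the box. "
        "Sally comes back. Where will Sally look for her marble first?"
    ),
    "expected": "basket",  # answering correctly requires tracking Sally's false belief
}

def score_response(response: str, expected: str) -> bool:
    """Crude check: the answer names the belief-consistent location
    and avoids the reality-consistent one."""
    text = response.lower()
    return expected in text and "box" not in text

# Demonstration with a hard-coded answer; in practice the response would come
# from some hypothetical model interface, e.g. model.generate(...).
print(score_response("She will look in the basket first.", FALSE_BELIEF_PROBE["expected"]))  # True
```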