Ilya Sutskever's Test for AI Consciousness
Talking the talk about consciousness
A clip from an interview with Ilya Sutskever, OpenAI’s Chief Scientist, has been making the rounds on Twitter recently. The interviewer asks Ilya for his thoughts on AI consciousness, and Ilya responds with this proposed test1 for AI consciousness:
I do think there is a very simple way, there is an experiment which we could run on an AI system which we can't run just yet. But maybe in the future point when the AI learns very very quickly from less data, we could do the following experiment:
Very carefully curate the data, such that we never ever mention anything about consciousness. We would say, you know, “here is a ball,” and “here's a castle,” and
“here is a little toy.” Imagine you'd have data of this sort, and it would be very controlled. Maybe we'd have some number of years worth of this kind of training data, maybe such an AI system would be interacting with a lot of different teachers learning from them. But all very carefully. You never ever mention consciousness. You don’t, people don't talk about anything except for the most surface level notions of their experience. And then at some point you sit down this AI and you say “Okay, I want to tell you about consciousness. It's this thing that's a little bit not-well-understood, people disagree about it, but that's how they describe it.” And imagine if the AI then goes and says, "Oh my god! I've been feeling the same thing but I didn't know how to articulate it."
That would be definitely something to think about it. It's like if the AI was just trained on very mundane data around objects and going from place to place, or maybe something like this, from a very narrow set of concepts. We would never ever mention that [consciousness]. And yet if it could somehow eloquently and correctly talk about it in a way that we could recognize, that would be convincing. (emph. mine)
A few remarks on this test:
(1) I like this test! Passing it is clearly not a necessary condition for being conscious, but I do agree with Sutskever that it would indeed provide strong evidence. And it tries to get around the problem we have with current systems’ “self-reports” of consciousness, which is that they have imbibed so much about consciousness. (Related reading: the meta-problem of consciousness)
(2) Sutskever’s proposal is very similar to Susan Schneider and Edwin Turner’s AI Consciousness Test. Like Sutskever’s proposed test, theirs is about whether an AI system speaks about consciousness in a human-like way.
Like Sutskever, Schneider and Turner stress that you have to prevent the AI system from hearing or reading about consciousness in order for the test to be valid:
Even today’s robots can be programmed to make convincing utterances about consciousness, and a truly superintelligent machine could perhaps even use information about neurophysiology to infer the presence of consciousness in humans. If sophisticated but non-conscious AIs aim to mislead us into believing that they are conscious for some reason, their knowledge of human consciousness could help them do so.
We can get around this though. One proposed technique in AI safety involves “boxing in” an AI—making it unable to get information about the world or act outside of a circumscribed domain, that is, the “box.” We could deny AI access to the internet and indeed prohibit it from gaining any knowledge of the world, especially information about conscious experience and neuroscience.
Schneider and Turner propose various “levels” of their test: different degrees of sophistication with which the system might be able to talk about consciousness. The “most demanding level” of their test is analogous to Sutskever’s proposal.
At the most elementary level we might simply ask the machine if it conceives of itself as anything other than its physical self…At the most demanding level, we might see if the machine invents and uses such a consciousness-based concept on its own, without relying on human ideas and inputs.
(3) Both Schneider and Turner’s test and Sutskever’s test have to strike a careful balance.
A key challenge for these tests—a challenge that Sutskever seems to be aware of—is that you have to walk a very fine line: you have to restrict what the AI system learns about consciousness before the test, while still enabling it to learn enough to be able to answer your questions. As Udell and Schwitzgebel note in a discussion of Schneider and Turner’s test:
The AI would…need a repertoire of terms through which it could appropriately express the reactions we might expect from it if it were conscious (fear or sadness, perhaps?). The AI can’t be entirely prevented from learning about consciousness-adjacent aspects of the world if it’s going to be a conversational partner in a test of this sort. But it’s unclear how much world knowledge would be too much, in the hands of a sophisticated learning AI. Implementation of the ACT will therefore require careful aim at an epistemic sweet spot.
This is non-trivial, to say the least. Note the need for the AI system to be able to speak about “consciousness-adjacent aspects of the world.” Now, recall how Sutskever’s imagined AI reacts: “I've been feeling the same thing but I didn't know how to articulate it.”
“Feeling”? “Know”? (“Articulate?”) How does the system know how to use these terms? Were they banned, or not? Perhaps the system has picked these concepts up from people talking, as Sutskever says, about only “the most surface level notions of their experience.” The trick is to figure out what that means exactly, and to strike the appropriate balance.
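To make the difficulty concrete, here is a toy sketch in Python of the filtering decision that this kind of data curation forces on every sentence. The term lists are invented purely for illustration; nobody’s actual pipeline looks like this. The “adjacent” list is exactly where the fine line runs: ban too much and the system has no vocabulary left for even surface-level talk about experience, ban too little and the test is contaminated.

```python
# A toy illustration of the curation problem: sorting training sentences by
# whether they mention consciousness. The term lists are made up for this
# example; the real difficulty is deciding where the "adjacent" list ends.

BANNED = {  # terms that would clearly have to go
    "consciousness", "conscious", "qualia", "sentience", "subjective experience",
}
ADJACENT = {  # the fuzzy zone: needed for ordinary talk, yet consciousness-adjacent
    "feel", "feeling", "aware", "experience", "know", "imagine", "dream",
}

def classify(sentence: str) -> str:
    """Return 'drop', 'review', or 'keep' for one sentence of training data."""
    text = sentence.lower()
    if any(term in text for term in BANNED):
        return "drop"
    if any(term in text for term in ADJACENT):
        return "review"  # a human curator must judge whether this is "surface level"
    return "keep"

corpus = [
    "Here is a ball, and here is a castle.",
    "I feel tired after walking to the castle.",
    "Philosophers disagree about the nature of consciousness.",
]
for sentence in corpus:
    print(classify(sentence), "|", sentence)
```

Even in this caricature, the hard part isn’t the obviously banned words; it’s deciding what to do with the “review” pile.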
(4) For your consideration, here’s a troubling—but, I think, not implausible—potential result of this test. Setting aside the worry just discussed, let’s say that the test “works” and the system is able to respond. But it doesn’t respond with a definite “I know what you mean” or a definite “I don’t know.” Instead, suppose the system reacts ambivalently: “Hmm…that kind of sounds like something I’ve felt. But not exactly? I think I have the thing you’re talking about, but it’s hard for me to tell.” And maybe it shares some of our intuitions about consciousness, and not others.
In this scenario, perhaps the system is sort of conscious: its processing resembles ours in some ways and differs in others. Maybe our theories of consciousness don’t clearly rule one way or the other on in-between cases like this. What then?
Sutskever doesn’t call it a test, but it will be handy to call it one. I don’t mean much, if anything, by saying “test” rather than “experiment.”

It's always going to be self-reporting in these cases. It's enough that we understand what is required of it internally to say it shares the same level of self-awareness as humans, which could just be a stream-of-consciousness log that runs in the background (even humans can't fully capture our own thought processes).
What I don't like about the 'test' that Sutskever proposes is that it feels completely unrealistic for us to train an incredibly sophisticated system without referencing anything related to consciousness.
To me, a strong indicator of consciousness would be if the machine had an internal experience that we could pick up on. One way would be to see whether it has 'thoughts of its own' and 'a will of its own'. The current paradigm is that we prompt a machine, it runs, and then it presents us with a result. Input/output. For starters, a conscious AI would have to show that it has wants and needs of its own (that weren't put there by us), and perform actions that are consistent with those wants and needs.
Another important aspect, to me, is the ability to recognize 'other minds'. Children are able to recognize from a very young age that others have different beliefs, that others have access to different knowledge, that others may hold false beliefs, and that others are capable of hiding emotions. I feel that we could design behavioral tests of whether a machine has the ability to perceive other minds.
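As a rough sketch of what one such behavioral probe might look like, here is a single false-belief item in the style of the classic Sally-Anne task, written in Python. The scenario wording, the scoring rule, and the model call are all placeholders rather than a validated test, and a sufficiently well-read system could presumably pass items like this by pattern-matching alone.

```python
# A sketch of a single false-belief probe, loosely modeled on the Sally-Anne
# task. The item, the scoring rule, and the model interface are placeholders.

FALSE_BELIEF_PROBE = {
    "scenario": (
        "Sally puts her marble in the basket and leaves the room. "
        "While she is away, Anne moves the marble to the box. "
        "Sally comes back. Where will Sally look for her marble first?"
    ),
    "expected": "basket",  # answering correctly requires tracking Sally's false belief
}

def score_response(response: str, expected: str) -> bool:
    """Crude check: the answer names the belief-consistent location
    and avoids the reality-consistent one."""
    text = response.lower()
    return expected in text and "box" not in text

# Demonstration with a hard-coded answer; in practice the response would come
# from some hypothetical model interface, e.g. model.generate(...).
print(score_response("She will look in the basket first.", FALSE_BELIEF_PROBE["expected"]))  # True
```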