AI Consciousness Roundtable
Highlights and transcript, feat. Yoshua Bengio, Patrick Butlin, Grace Lindsay, and me
Last week, I participated in a roundtable discussion about AI consciousness (video here) with AI researcher Yoshua Bengio, philosopher Patrick Butlin, and neuroscientist Grace Lindsay. The conversation covered the newly released report on AI consciousness that we wrote together (along with 15 other authors), and ranged over many interesting topics, from the risks of building conscious AI, to how studying AI consciousness can inform neuroscience, to the differences between animal intelligence and artificial intelligence.
Yoshua Bengio: My take on this is we should not build conscious machines until we know better. There are many reasons for this. In particular, whether we succeed in building machines that are actually conscious or not, if humans perceive those AI systems as conscious, that has a lot of implications that could be extremely destabilizing for society.
We associate moral status to other conscious beings, and that's connected with a very strong social contract that works for humans. We have all kinds of characteristics. We have a finite lifetime and we have bounded intellectual capabilities. And these properties may not apply to AI that can reproduce. There's no limit to how you could copy an AI system over and over. They might be immortal or they might be much smarter than us. All sorts of things that I think would make the current interpretation of consciousness from a moral and social view questionable. I think until we know better, we shouldn't do that.
Grace Lindsay: This is a nice opportunity for the neuroscientists who study consciousness to see their theories in a new light when they're kind of put into these cold, stark indicators, and really reflect on whether that is summarizing everything that they think is important…. Overall, the report has benefits to be able to take this neuroscience literature and bring it to an AI audience and put it in those terms, but also should have benefits for the neuroscience side itself: in terms of thinking about how these theories really pan out, how they relate to each other, what they could be missing….
Patrick Butlin: We know that on the whole, animal brains work in relatively similar ways to human brains. We can know that because of our shared evolutionary heritage even prior to doing detailed animal neuroscience. On the other hand, there's a relatively wide space of possible designs for AI systems… this kind of space of possible internal processes in AI is much wider than the space of processes that we're likely to find in animal brains.
What follows is a transcript, very lightly edited for clarity, of the roundtable. Thanks to Aaron Bergman for creating the initial draft, and to the incomparable Xander Balwit for further editing and links-adding.
Introduction
JEFF SEBO:
This is an event hosted by the NYU Mind Ethics and Policy Program, and we are also grateful to our co-sponsors, the NYU Center for Bioethics and the NYU Center for Mind, Brain and Consciousness. Thank you for co-sponsoring this event. The NYU Mind Ethics and Policy Program examines the nature and intrinsic value of non-human minds, including biological and nonbiological minds, with a special focus on invertebrates and artificial intelligence systems. We are interested in examining what kinds of beings can be conscious and sentient and sapient, and what kinds of moral, legal, and political status they should have.
We were very interested when this team released this report about a week ago, a report on what leading scientific theories of consciousness have to say about possible AI consciousness. And so we thought it would be useful, especially since the report was generating a lot of discussion as soon as it was released, it received coverage in Nature and Science and generated a lot of conversation on the internet—we thought it would be valuable to bring some of the report's authors together. There were 19 authors from different fields and different institutes, and we thought we could bring four of them together, including the two lead authors and representatives of other fields, to talk about the report, summarize some of the main points, and then offer their own individual perspectives about some of the implications.
[Jeff introduces the speakers: Patrick Butlin, Robert Long (me), Grace Lindsay, and Yoshua Bengio; see the end of the post for that transcript]
Opening remarks
Remarks by Robert Long
First, I just want to say thank you so much to the audience for turning out. We're extremely excited to discuss these issues with you. A big part of why we wrote this report is to discuss these topics more widely and hear from different people. And thanks so much to Jeff and to the NYU Mind Ethics and Policy Program for hosting this. My name is Robert Long, and along with Patrick Butlin, I'm one of the lead authors of this report. As Jeff mentioned, there are also 17 esteemed colleagues from various disciplines, and I'm going to briefly say what we did in the report, why we did it, and why it matters.
You've all probably read about the case of Blake Lemoine, a Google engineer who was fired from Google after he became convinced that an AI system he was interacting with was conscious. You've probably seen the increased deployment of chatbots, including romantic chatbots, that interact with people in very human-like, compelling ways. You may have seen a Tweet from OpenAI's chief scientist Ilya Sutskever saying that it may be that today's large neural networks are slightly conscious. So AI consciousness is obviously a topic of increasing public concern and public interest, and it's also a perennially fascinating scientific question. Speaking for myself, as AI consciousness has become a bigger topic recently, I've often been frustrated by some of the characteristics that conversation about it has. I think when people talk about AI consciousness, there's often a lack of specific scientific evidence. There's very little detailed analysis of existing systems. The conversation is often emotive and highly charged and often in dichotomies, like “How could AI systems possibly be conscious? We know that that's absurd”. Or, on the other hand, people act as if they are certain that they definitely are conscious. And what we wanted to provide in this report is kind of a better footing for these kinds of conversations. We want to bring a more nuanced and more evidence-based approach to the report. In addition to the actual findings that we have, we really want to spur a conversation and further research along these lines.
So one aspect I'm going to talk about briefly is conceptual clarity (something that philosophers love) about what exactly we're talking about when we talk about AI consciousness. What are we talking about in this report? It's called “Consciousness in AI: Insights from the Science of Consciousness” so it's very important, both in this talk and in the report, to pin down what we're talking about.
So what I'll be talking about and what we discuss in the paper is what philosophers call “phenomenal consciousness.” This means subjective experience. Thomas Nagel famously said that consciousness is “what it is like” to be an entity. So, for example, if you've tuned in, you're now currently listening to my voice, you're seeing my face and this wall on the screen, and there's a subjective aspect to your experience. There's something that it's like for you to be seeing a white wall or hearing my voice. And what we're asking about in this report is whether AI systems, now or in the near future, could have subjective experiences in this way—if they could be conscious in this sense.
It's very important in conversations about this to be clear about what we're not talking about. We're not talking about whether AI systems are intelligent in a certain way. We're certainly not talking about AGI or artificial general intelligence. We're not talking about language, understanding, or rationality. We're also not talking about whether AI systems experience the world in exactly the way that humans do. Thomas Nagel, when he introduced this phrase, famously asked “What is it like to be a bat?” It's very plausible that bats might be conscious, that they might have subjective experiences of the world, but if they are, the way that they experience the world is presumably very different from the way we experience the world. The way they think about the world is presumably very different and they don't have the full suite of human cognitive capacities. So when we're talking about AI, we're wondering about consciousness in this sense and not necessarily any further assumptions about human-like intelligence or experiences.
So, how do we go about this in this report? First of all, we want the conversation to be empirical. That means that we're looking at scientific theories of consciousness. There's a broad field of consciousness science where scientists examine the human brain and animal brains and ask what sort of processes are associated with having conscious experiences. And this is somewhat distinct from philosophical questions about consciousness. We're going to look at these empirical theories and see what they can tell us about potential AI consciousness.
Another key aspect is that we take this working assumption of this very broad thesis about the nature of consciousness, which we call computational functionalism. And very broadly, this is saying that what matters for consciousness is the computations that a system performs and not necessarily what it's made of. This is compatible with consciousness coming from computations done by biological neurons or computations happening in silicon transistors and chips. Patrick's going to talk a little bit more about exactly why we made this assumption and what it amounts to.
Then we try to get precise. We try to derive indicator properties from the science of consciousness that we can use to examine different AI systems, and then we apply these to a broad class of AI systems from several different kinds of research programs. Lastly, we're not aiming for absolute certainty. It's my opinion, and I think that of most of my co-authors, that if you're asking for absolute certainty about AI consciousness, you are going to be disappointed, because there is still a lot we don't know about consciousness. That's in spite of the fact that we do know some things, which is what we try to show in this report.
It's a long report. So I would like to just talk about one specific example of this methodology and how it works in the paper. One of the most prominent scientific theories of consciousness is called the global workspace theory. At a broad level, the global workspace theory says that consciousness is about global broadcast throughout a system. So it has this picture of the mind where there are specialized independent modules that are responsible for different kinds of information processing. For example, one such module might handle perceptual processing, such as vision, and be responsible for visual information processing. Or maybe there's a module dedicated to making decisions and choosing actions.
The global workspace theory says that there's also a “global workspace” that will select the output from one of these usually independent modules and then “broadcast” that to all of the modules. This is a way of coordinating all of the processing that they are each doing individually. So consciousness is related to global broadcasts of information throughout a system, and a system having a certain architectural or information processing property that's related to the computational functionalist assumption that we have. Importantly, as we'll see, there needs to be a certain kind of feedback loop between modules and a global workspace. The global workspace needs to take input from the modules and then broadcast it back to them.
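To make the select-and-broadcast loop concrete, here is a minimal Python sketch. It is an illustrative toy, not code from the report or from any implementation of global workspace theory: the module names, the `salience` score, and the "highest salience wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Content a module offers to the workspace (hypothetical structure)."""
    source: str
    content: str
    salience: float  # assumed scoring; the theory does not prescribe this


@dataclass
class Module:
    """A specialized processor that proposes content and receives broadcasts."""
    name: str
    received: list = field(default_factory=list)

    def propose(self, stimulus: dict) -> Candidate:
        # Toy rule: each module reports the stimulus entry stored under its name.
        content, salience = stimulus.get(self.name, ("nothing", 0.0))
        return Candidate(self.name, content, salience)

    def receive(self, broadcast: Candidate) -> None:
        # Feedback path: workspace content flows back to every module.
        self.received.append(broadcast)


def workspace_cycle(modules: list, stimulus: dict) -> Candidate:
    """One step: gather proposals, select a single winner, broadcast it to all."""
    candidates = [m.propose(stimulus) for m in modules]
    winner = max(candidates, key=lambda c: c.salience)  # limited capacity
    for m in modules:
        m.receive(winner)
    return winner


modules = [Module("vision"), Module("audition"), Module("action_selection")]
stimulus = {"vision": ("white wall", 0.4), "audition": ("a voice", 0.9)}
winner = workspace_cycle(modules, stimulus)
print(winner.source, "->", [m.name for m in modules if m.received])
```

The point of the sketch is the shape of the information flow: every module feeds the workspace, and the single selected item is sent back to every module, including ones that did not produce it.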
From that we get some indicator properties to say what kind of features an AI system needs to have if it's going to satisfy, at least according to global workspace theory, the indicators that might be associated with consciousness. So one of the top questions you might have is, well, what do we say about large language models? You're probably all familiar with large language models; these are some of the most prominent systems today. The GPT systems are an example. These are often built with what's called the transformer architecture. And now is certainly not the time for a full explanation of what that amounts to. But one key aspect for our analysis is that the transformer architecture is “feed-forward.” That is to say, when the system takes in an input, say, a certain amount of text, and produces an output, which is the text that continues it, the information just goes from the input layer to the output layer, kind of continuously. When we look at this architecture and ask whether the way that information flows through it fits with the picture from global workspace theory, our provisional analysis in the paper is that, as far as we can tell, there's nothing like a global workspace that is both receiving and broadcasting information to and from anything like individual modules. To put it another way, there don't seem to be the right kind of feedback loops in the transformer architecture for it to be satisfying these particular indicators of consciousness. So our answer in the paper is that, according to global workspace theory, large language models do not satisfy these indicators.
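One way to picture the difference between a feed-forward pass and a workspace-style feedback loop is to treat each architecture as a directed graph of information flow and ask whether the graph contains a cycle. The Python sketch below does that with a standard depth-first search. The two graphs and their node names are hypothetical simplifications, not descriptions of any real system, and the sketch deliberately ignores the fact that text generated by a language model is fed back in as input for the next token.

```python
def has_feedback_loop(edges: dict) -> bool:
    """Return True if the directed graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    state = {n: WHITE for n in nodes}

    def visit(node) -> bool:
        state[node] = GREY
        for nxt in edges.get(node, ()):
            if state[nxt] == GREY:  # edge back into the current path: a cycle
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in nodes)


# Hypothetical single forward pass: information only moves toward the output.
feed_forward = {"input": ["layer_1"], "layer_1": ["layer_2"], "layer_2": ["output"]}

# Hypothetical workspace-style flow: modules feed the workspace, which feeds them back.
workspace_style = {
    "vision": ["workspace"],
    "planning": ["workspace"],
    "workspace": ["vision", "planning"],
}

print(has_feedback_loop(feed_forward))     # False
print(has_feedback_loop(workspace_style))  # True
```

A cycle in such a graph is only a rough stand-in for the feedback loops the theory has in mind; the check shows what kind of structural question is being asked, not how to settle it.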
That's the kind of thing we do in the paper: take a scientific theory, ask more precisely what kind of architectural and informational processing features it says are associated with consciousness, look at how an AI system works, and see how much it matches that. As I said, we do this with a lot of theories. We have five very prominent scientific theories of consciousness that we look at.
One thing that's worth remarking on is why we look at five different theories. Why not just, for example, go with global workspace theory? And that's kind of related to our points about nuance and uncertainty. We do think that the science of consciousness is making progress in discovering the processes associated with consciousness, but we think it's premature to commit to one particular theory. There's nothing like a consensus in consciousness science about which one of these stories is the right one for consciousness. So we kind of want to cast a broad net and say, “Here are a variety of things that might be associated with consciousness according to different theories, and let's look at all of them.”
As mentioned, one thing that we think it's important to do is look beyond just large language models. They have understandably received a lot of attention. They're very compelling. Their capabilities are extremely impressive and, most saliently, they sometimes say that they're conscious, so it's understandable that they're one of the main systems that people wonder about. But at least speaking for myself, I think when you look at the way that they're built, they're not necessarily even the best current candidates for consciousness among the AI systems that exist today. For example, many people point out that large language models don't seem like they're agents, or they don't seem embodied. They don't seem to interact continuously with an environment, which you might think is important for consciousness. But many systems are plausibly much closer to doing that sort of thing, which is why we look at systems that are from robotics or that navigate virtual environments or that work in different ways than large language models do. I will now turn things over to my colleague Patrick.
Remarks by Patrick Butlin
So Rob talked a bit about what we did in this project. I'm going to talk briefly about why we did things in the way that we did, and then I'm also going to comment very briefly on what's next for research in this area.
Okay, so on the topic of why we did things the way we did, the first question I want to answer is why we assume computational functionalism. And this is an important question because computational functionalism is quite a contentious hypothesis. So just to remind you about what this claim is, computational functionalism, as Rob said, is the claim that what matters for consciousness is computations. According to this view, the reason why humans are conscious is that the right kinds of computations are going on in our brains. And what it would take for an AI system to be conscious is for it to perform the right kind of computations. Now, this is, as I just said, quite a controversial hypothesis. Many researchers in this area doubt it. For example, Anil Seth suggests that consciousness may require living cells and there are many other researchers with similar views to that position.
So why do we make this assumption? Well, there are a few factors that play into this. One is that we think that computational functionalism is a plausible view. That doesn't mean to say that we're confident that it's true. I think there's probably a range of views about computational functionalism among the authors, but as a group, we're certainly not confident about this. It may even be too strong to say that we think it's more likely to be true than not true. But crucially, we think it's plausible enough that it's worth exploring its implications.
Another consideration is that many scientific theories of consciousness are expressed in computational terms. Also, if computational functionalism is true, then there's an important further question about which AI systems perform the kinds of computations that are associated with consciousness. Ultimately, the reason why we make this assumption is that we think it's a productive one in the sense that by starting from this assumption, we can reduce our overall uncertainty about consciousness in AI to a significant degree.
We're uncertain about whether computational functionalism is true. We're uncertain about which specific scientific theories of consciousness are true. We don't know which AI systems are conscious. But if we kind of explore this area of the space of possibilities, we think we can make progress and reduce our overall uncertainty.
Another question about why we did what we did is why we chose to focus on the internal processes that AI systems go through rather than on their behavior or their capabilities. Why do we think that thinking about internal processes is a better way of determining whether they're likely to be conscious? This is also an interesting question because there's this important challenge to the approach that we took, which is that the science of consciousness is relatively immature. There are still lots of competing theories in this science. And also, crucially, these theories are based mostly, although not entirely, on data from humans. So one might very reasonably have doubts about whether these descriptions of what's going on in humans when humans have conscious experiences can be extended to provide a guide to consciousness in non-humans. And in particular, that kind of argument has been made in debates about non-human animal consciousness. Our colleague Jonathan Birch, for example, who's a leading researcher in that area, argues that in the case of non-human animals, when you're trying to work out which are conscious, it's more productive, with science in its current state, to concentrate on examining the capabilities of non-human animals rather than on trying to work out whether the internal processes in their brains are similar or not to the internal processes in human brains.
But, in fact, we think that different methods make sense in these two cases: the case of non-human animals versus the case of artificial intelligence. The key difference which makes it the case that different methods are appropriate in these two cases is that we know that on the whole, animal brains work in relatively similar ways to human brains. We can know that because of our shared evolutionary heritage even prior to doing detailed animal neuroscience. On the other hand, there's a relatively wide space of possible designs for AI systems. There are lots of different possible architectures and methods that you can use in AI in principle. So this kind of space of possible internal processes in AI is much wider than the space of processes that we're likely to find in animal brains.
Now, there are a couple of different consequences of this point. One is that in the case of AI, it's relatively informative to find similar internal processes going on in a system to the processes that seem to be associated with consciousness in humans. That's just because AIs could relatively easily be different from humans. In the case of animals, finding similar processes is not so informative. But then on the flip side, finding similar capabilities and similar behaviors to those exhibited by humans in AI is less informative than finding similar capabilities in the case of animals.
And that's because it seems that in AI it's possible for very different underlying processes to give rise to superficially similar behaviors and capabilities. There's a specific version of this problem that clearly arises in AI that exists today, which is that AI systems are sometimes designed specifically to imitate human behavior. We know that there are AI systems that have, superficially but also in a very compelling way, human-like capabilities and human-like behaviors, but which work in quite un-human-like ways. For that reason, looking at capabilities looks like, on the whole, an unreliable method, unless it can be very substantially refined in the case of AI.
Turning to the question of what's next for AI consciousness research, there are lots of exciting directions that could be pursued at this point. One which I'm interested in, and Rob as well, is understanding what's called valenced conscious experience. So of course, as we're all familiar with, some conscious experiences feel good, like feeling a cool breeze on a hot day. Other conscious experiences feel bad like feeling pain or fear. These are famous conscious experiences. What we say is that the ones that feel good have positive valence and the ones that feel bad have a negative valence. But it seems as though, in principle, there can also be neutral conscious experiences that don't have a valence one way or the other. So the question is: Could we find indicators for specifically valenced consciousness in AI? That seems to be a question that goes beyond the general one that we were asking in this report. We think that this question is important because it seems that pleasure and suffering have special moral significance.
That brings us to the topic which is one of the major motives for our report and which we're certainly going to be talking about further today, which is this point that a big part of the reason why thinking about AI consciousness matters and why working out how to assess whether AI systems are likely to be conscious matters is that our society is soon going to have to decide whether to build systems which could plausibly be conscious. There's a huge, very difficult philosophical question which is: What moral principles should we use to make this momentous choice?
I certainly don't know the answer to that question. It's possible we'll explore it a little bit in a few minutes. But we do think that there's a simple step that could be taken now, which is to continue the kinds of work that we've been doing to try to understand what might be good indicators of consciousness in AI. Then, in particular, for groups engaged in building AI systems to recognize the possibility that they might build conscious systems, that they might do so even without trying to do just that, and for those groups to develop methods for assessing whether the systems that they're working on are likely to be conscious. And we certainly strongly recommend that they proceed with great caution if they get a positive result when they're doing that kind of assessment.
Thanks very much for listening everyone. Thanks again to our collaborators and I'm looking forward to the discussion.
JEFF:
Great. Thank you so much, Patrick and Rob. Now, Grace, do you have any thoughts from your perspective to share?
Remarks by Grace Lindsay
I just wanted to talk a little bit about the state of consciousness research in neuroscience to give context to the theories that are discussed in the report and situate ourselves in the history of science here. I should say I'm not directly a consciousness researcher myself, but I do study attention which interweaves with consciousness studies in various ways. I don't have a dog in the fight of these different theories, which is possibly a good position to be in to discuss the overall state of things.
It is the case that the neuroscientific study of consciousness in terms of being a proper scientific field of research is pretty new. You could argue neuroscience itself is pretty new in the whole scheme of the history of science, but certainly, people taking the scientific study of consciousness seriously is definitely new. There was a joke that you had to be tenured to study consciousness. The idea that now there are actually full labs devoted to this and people are really trying to make it rigorous and thorough and you have these theories and everything is definitely progress, but it points to the fact that this research is still in its infancy. The theories that are laid out here are the major theories that are discussed among these researchers. From my perspective, I don't think there's a sense that these are anywhere close to the final drafts of these theories, and that's important to keep in mind when you then step through the conclusions. It's also why, as Rob said, the report doesn't choose a specific theory to go with: there isn't consensus.
There are these multiple theories that in many ways have conflicts with each other. And so it really is still a young area. Also, the way that the scientific study is framed in order to be precise and rigorous is usually framed more as studying the “neural correlates” of consciousness. So not even trying to make a strong claim necessarily about causal connection, but just what do you observe the patterns of neural activity to be when a subject reports a conscious experience? And that's another detail. It's really a lot of times the “neural correlates of conscious report.” What are people saying that they experienced as their conscious experience? There is a detail of these “no report” paradigms, but they still ultimately rely on the fact that at some point a human was able to say that they experienced something. Those are also caveats that bind the scientific study of it and are necessary to make it a scientifically rigorous thing to study. But obviously from a philosophical perspective that's going to have implications as well.
So the scientists are going into the brain and it's messy and there's a lot of stuff going on. The hope is to find the simplified principles that correlate with this conscious report and correlate with people saying something is conscious versus not. There the work is to take the big, messy, complex thing and try to come up with the simplest description of what you need in order to get conscious report. When you then look at that in isolation, sometimes those indicators, as the report renders them, seem really simple.
We have to keep in mind that these theories were not developed for the purpose of trying to see if an AI system is conscious. They were developed in the context of assuming that a human is conscious and looking at their neural activity. A lot of the background knowledge for these theories also comes from non-human animal work, so you have to understand where they're coming from in that sense. The fact that they're not designed to be directly translated into these easy-to-identify computational principles that could be in artificial systems is, I think, important. I think it's important for this work to try to take a theory and assess an artificial system. I also think that there's a lesson in this for the neuroscientists who study consciousness, because this happens a lot when you do mathematical modeling: you can be working in a topic area and think you have a mental model of how it works, and then when you actually go to write it down, you realize some aspects are lacking. It's the tuning of the kind of mental model and the pile of experimental data and the word models that people use to describe how they think something functions. When you actually have to turn that into an equation or code, or actually try to build it, you can see where you might be missing things.
This is a nice opportunity for the neuroscientists who study consciousness to see their theories in a new light when they're kind of put into these cold, stark indicators and really reflect on whether that is summarizing everything that they think is important, or whether there are things about the brain that are going unspoken that they think are actually really important as well, or things about the abilities of humans or animals that are important as well. It’s important to keep in mind that these theories were not designed to lead to a description that is used for AI, but it's still a very helpful exercise to go through this and see what they look like in the end when they're kind of pared down to the simplest form that can be translated into an AI system.
Overall, the report has benefits to be able to take this neuroscience literature and bring it to an AI audience and put it in those terms but also should have benefits for the neuroscience side itself in terms of thinking about how these theories really pan out, how they relate to each other, what they could be missing, and whether the scientists who created and worked on these theories would agree with the conclusions of the report, or even agree that an artificial system that had these properties was conscious. In the end, I think that that's an important thing for those scientists to reflect on.
JEFF:
Thank you so much, Grace. And finally, Yoshua, do you have thoughts to share?
Remarks by Yoshua Bengio
Sure. Maybe I'll start by talking about computational functionalism or the computational basis of consciousness and subjective experience. We've been using the word consciousness, but I think it's important to clarify that the word consciousness can be associated with all sorts of things and we're trying to focus on “subjective experience,” which is the part that may seem very mysterious to many people, including researchers. My personal view, so not the unanimous view of the others, is that physics is computation and many physicists share that view. It's just a bunch of equations that could be implemented in any way you want. At the end of the day, you get the same changes in the state of the world and your brain is physics too.
Now, I think some of the questions about how this could be turned into computations in a computer may have nothing to do with something non-material that could possibly be happening in biology. Maybe there is something about the physics, like it requiring quantum computations that we don't yet know how to do. But actually, if we look at the progress of AI in the last few decades, we're moving forward quite rapidly towards very strong capabilities and we never seem to be requiring any kind of quantum computation in order to get that power, which of course doesn't guarantee that this will continue to be the case. That suggests that the level of abstraction that, say, neural networks used in machine learning have is already doing a good job of providing a lot of the explanations and neural correlates of our abilities.
Another interesting question that has to do with AI research is: Why are we conscious in the first place? The perspective that would come naturally to machine learning researchers or AI researchers is that evolution endowed us with these forms of computation because that gives us an advantage, either individually or collectively. There's a social aspect to consciousness as well. If there is an advantage, then it's something worth investigating from an AI perspective. It may be something that AI researchers want to put into their AI systems, which brings up a question that Patrick talked about and that I'll come back to: Do we want to have conscious machines or not?
One of the things my group has been working on is precisely this question. So some of these theories of consciousness, in particular the global workspace theory and attention schema theory and others, really can be interpreted as providing an advantage in terms of our ability to learn and manipulate abstraction. This is connected to the property of thoughts and attention, selecting very few bits of information that go through working memory at any moment.
Then we sequentially go through a very small number of bits that help us make decisions and organize our understanding of the world at a very abstract level that compresses perceptual information in a way that helps us better understand the world, make better decisions, better model it and so on.
Let me go back to this question of whether AIs are conscious or will be in the future. Our report suggests that none of the current AI systems have enough of the characteristics that those theories suggest. But the different theories we chose are not the end of it. This is a continuously moving field. There are new papers coming regularly suggesting other variants, often related to existing theories. So we shouldn't take these as the end of the story of how consciousness works in the brain. What the report also suggests is that the properties in these theories, or maybe other related ones that may come up in the near future, are attributes that are not impossible to put in AI. It's very plausible that in coming years we would be able to build machines that compute in ways that are at least suggestive of consciousness in the human sense.
This raises a lot of important questions. My take on this is we should not build conscious machines until we know better. There are many reasons for this. In particular, whether we succeed in building machines that are actually conscious or not, if humans perceive those AI systems as conscious, that has a lot of implications that could be extremely destabilizing for society. We associate moral status to other conscious beings, and that's connected with a very strong social contract that works for humans. We have all kinds of characteristics. We have a finite lifetime and we have bounded intellectual capabilities.
And these properties may not apply to AI that can reproduce. There's no limit to how you could copy an AI system over and over. They might be immortal or they might be much smarter than us. All sorts of things that I think would make the current interpretation of consciousness from a moral and social view questionable. I think until we know better, we shouldn't do that. There's another, more pragmatic reason why I think we shouldn't build conscious machines. Because with consciousness also comes a notion of self and even self-preservation and objectives like agency. This was one of the theories that was described. And if we go on that route, this could be very dangerous from an AI safety point of view. In other words, we might be building machines that have their own goals that are not well aligned with human norms and values in ways that could be extremely dangerous for humanity.
Of course, this is a subject that's been intensely debated in the last few months, which makes this report particularly interesting. The last point I want to make, which is not really in the report but is connected to what we talk about in the report, comes from some recent work coming out of my group just in the last few months suggesting another theory of consciousness that is completely computational. It's related to several existing theories. But what's interesting about this one is that subjective experience with all the attributes that we associate with it, like ineffability, subjectivity, richness, and fleetingness, these properties emerge from this model as side effects of the need to perform a particular kind of computation that is important from a learning perspective. But you could obtain potentially the same computations with a different implementation that wouldn't have these side effects. So evolution has sort of converged on this particular way of achieving particular useful computations that may give rise to our sensation of being conscious and having free will and all these things to which we attribute a little bit of magical properties. We should be careful about that instinct we have about our own perception of being conscious in light of those results from neuroscience and AI.
Q&A
JEFF:
Thank you so much, Yoshua and everybody, for those remarks. It gives us a lot to talk about. I will jump right into questions that attempt to synthesize and present you with what people are asking about in the Q&A tab. I will not be able to get to everything, but please know, everybody, that we will send all the questions to the panelists after the talk, whether or not we can get to them during the presentation. Some of the questions are descriptive, and others are moral or legal, or political in nature. I can start with a general descriptive question for you. As you noted in the initial presentation and some of your remarks, and as several people have noted in the comments, you focus on a particular perspective about consciousness, according to which consciousness is about computations.
Computational functionalism
JEFF:
So you look at scientific theories of consciousness that identify different computations, and then you search for markers related to those theories, and then you look for those markers in particular kinds of AI systems. As some people note, that does not exhaust the space of theories and perspectives about consciousness that are plausible and popular right now. On a more permissive end of the spectrum, as one person notes, there are, for example, panpsychist theories of consciousness and other theories that are relatively undemanding and allow for the possibility that even quite simple systems could be conscious. Those theories might imply that lots of systems can be conscious, whether or not they have your indicators. Then at the other more restrictive end of the spectrum, you have biological theories according to which, for various reasons, you really do need to be made out of carbon-based cells and neurons in order to realize consciousness. And according to those theories, a system can hit all of your markers but still be non-conscious if that system is made out of silicon-based transistors and chips.
I wonder, on a personal level, to the panelists, what kind of credence do you have in these more permissive or more restrictive theories? Do you find them plausible? Do you find them good candidates? And how would you incorporate them into your search for AI consciousness?
ROBERT:
I think out of the people in the report, I'm probably on the higher end in my credence in computational functionalism of some kind—I'm maybe like 70%. But I am very compelled by arguments by people like Anil Seth and Peter Godfrey-Smith, philosopher of biology, who has written extensively on consciousness. I do sometimes wake up in the middle of the night wondering if computational functionalism is off on the wrong track.
I'll also just take this opportunity to say that one thing we call for in the report is more detailed work investigating the assumption of computational functionalism.1 We think it's, again, sufficiently plausible that it's really very important to explore its implications, but we could also get a lot more clarity on these issues if arguments for and against computational functionalism got hashed out in more detail. I'll lastly just say I wanted to plug a really nice remark by Anil Seth that I think is exactly the kind of response we wanted, where he said, “I disagree with some of the assumptions…” (and I'm guessing that's computational functionalism) “but that's totally fine because I might well be wrong.” So we're very excited to see people kind of exploring different parts of the space of possibilities that we could be facing with AI consciousness.
JEFF:
Thank you, Rob. Yoshua?
YOSHUA:
My credence in computational functionalism is 99.99%, maybe because I'm a computer scientist. The whole field of computer science is founded on the idea that computations can be done on any substrate. There's not been any counterexample to that, as far as I know. It's not just AI. It dates back to Turing and the Turing machine around the Second World War. It's also connected to, as I said in my little pitch, everything we know from physics. I don't see how having carbon atoms prevents computations from explaining what's going on. It's just a different kind of computation. It may not be the computations going on at the level of these artificial neurons that you typically find in deep learning. That's very possible. But it's still computation. It's just now computation happening at the atomic level, but it's still computation. What's the level of abstraction that's needed to replicate human intelligence and consciousness? Well, nobody really knows, and that's open. For me, if it's not computational, it's not even physical, so it's not even materialistic. And I don't see how you could buy that unless you believe in some spiritual beings explaining our consciousness.
I also have a comment on panpsychism, which is that I feel it is completely overgeneralizing. The things we know that are conscious are human beings. And because of many similarities, we have some good reasons to think that other animals may be conscious. Everything we know about human consciousness has the kind of properties that we discuss in the report, which are completely disjoint from, and not applicable to, just arbitrary groups of atoms or even single electrons or whatever crazy things that you could come to with these theories. I'm not saying these theories are false, but they seem so far removed from biological reality and what happens when a person is conscious or not conscious, that, for me, they don't rate very high as scientific theories that are supposed to explain what we know about consciousness. They may feel good, again because I think it may make us feel good with our intuitive religious understanding of the world, but in terms of matching what we observe, like the correlates of consciousness, they seem pretty much to be bringing zero information.
JEFF:
Thank you. Grace or Patrick, did you have anything you wanted to add?
PATRICK:
When you asked about credences in computational functionalism, I guess the thought is that to the extent that we're doubtful about computational functionalism, maybe we're doubtful about the value of this project to kind of reveal whether AI systems are likely to be conscious or not, and therefore, whether they're likely to have a certain special kind of moral significance or not. For me, I think how my credences fit together is something like this: if there's such a thing as consciousness, if the concept of consciousness makes sense and is a useful one to apply beyond the human case, then I think it's most likely a computational phenomenon. I think I'd give more credence to the computational view in that situation than to non-computational views because I think the computational views have more promise in explaining the properties of consciousness. What keeps me awake at night, to go back to what Rob said, is the possibility that the concept of consciousness is somehow confused or that it doesn’t make sense to apply it beyond the human case, that it's unproductive for moral theorizing or conceptually confused in some way to ask the question whether AI systems are conscious or not.
YOSHUA:
And indeed, I don't think these two views are incompatible. I actually think that consciousness is computational in nature, and also that the concept is confusing and that it's not clear that it's meaningful to extend it very far from human beings, especially regarding the social and moral aspects of things. So I don't think these are incompatible points of view.
JEFF:
Grace, did you have anything you wanted to add?
GRACE:
My gut is that it's largely correct or certainly, there will at least need to be a common set of computations and then maybe there also needs to be other stuff. But on the whole, I just feel like we're several paradigm shifts away from really understanding all of this. So it's hard for me to say anything with any confidence or vigor.
Consciousness and moral status
JEFF:
That seems like the answer about which we can be most confident. Okay, great. Thank you everybody. I can now ask a question on the moral, legal, and political side, and again, several people have asked questions along those lines as well. So I think, as you yourself said, part of why so many people are so interested in this topic is that we do associate consciousness, and related capacities like sentience (the ability to consciously experience positively and negatively valenced states like pleasure and pain and happiness and suffering), with a certain kind of intrinsic moral, legal, and political significance, and reasonably so, in my view. The idea is that if you have consciousness and/or sentience, if there is something it is like to be you, and if it can feel good or bad to be you especially, then you matter for your own sake. And I should consider your interests and needs and vulnerabilities when making decisions that affect you, and that might extend to a decision about whether to create you in the first place, as well as a decision about how to treat you if and when I do create you.
I would like to ask all of you if you care to respond: Do you associate consciousness or sentience with that kind of intrinsic moral, legal, or political significance? Do you think that when a being is conscious or sentient or sufficiently likely to be conscious or sentient by our lights, that we should extend them a certain kind of intrinsic value and consider their potential interest, needs, and vulnerabilities when making decisions that affect them? And since we are making these decisions under uncertainty, I also wanted to ask a little bit about how you think about the risks associated with false positives and false negatives, with potential over or under-attribution of consciousness and moral status. What are the risks associated with accidentally seeing an object as a subject and what are the risks associated with accidentally seeing a subject as an object? How do you weigh those when deciding how to calibrate this kind of test and practice? Do you associate this with moral status and how do you deal with this under uncertainty?
YOSHUA:
I'm not a philosopher, so I will take that from the angle of a computer scientist. My interpretation of this question is that we are asking the wrong question. It's not whether we should attribute moral status to entities that have particular properties like being conscious or something like that, it's that that's how we are. Humans are compelled to have empathy and compassion for other types of beings, in particular, other human beings, because that's something that evolution put in us because it helped humanity to succeed and become a dominant species. There are exceptions. You have sociopaths and so on, but for the most part, humans have those innate feelings. By association, because our brain works by association, we often generalize that to other entities that look like us, mammals in particular, and we also have very strong empathy for babies of other species. My partner would not eat meat, especially if it's coming from the baby of a species, and that's not coming from a philosophical kind of argument. It's just an innate thing that we have. I can share that feeling, maybe not as strongly as she does or perhaps as females in general.
So I think we're just asking the wrong question. When we come to this for machines, I think it would be a huge mistake to build things that would play into our innate response mechanisms towards entities that look like us. So there was this Black Mirror episode where there are AI clones of a person in a virtual world that we feel for because they are so human-like, even though it's just a simulation. We can't prevent ourselves from attributing a moral status to those virtual agents because they look human. That's the reality. What we do with that, I think, is then a matter of social norms: not breaking the way our society works by introducing entities that don't correspond to anything we evolved for. What we evolved for is not going to hold anymore with machines that could potentially be imitating us in many ways and may even have some of the attributes we put in the report. Is that really what society needs? I think that's a big question mark.
ROBERT:
I think my views on this are similar to Yoshua's in many ways on the question of what the grounds of moral status are, as philosophers would say. For my own part, I'm quite confident that if an entity is sentient, that is, if it has valenced conscious experiences like pleasure and pain, that alone is sufficient for us to care about it and show concern for it. This is why I would be excited about the project that Patrick mentioned. I'm less clear on how to think about entities that are merely conscious or that maybe only have neutral experiences. You could imagine a future large language model that fits more of our indicators that only had experiences of understanding or maybe even some very abstract conscious experience that we can't even comprehend. I would obviously be extremely careful in how I treated that thing, but I'm a little bit less sure how to think about that case.
Lastly, I just wanted to flag that one very characteristic element of how we like to think about this in the report is about uncertainty and managing all of the different cases that might come up. I also wanted to flag that consciousness itself could be too narrow of a thing to focus on, and we don't want to put all of our focus on that. There are compelling arguments that even if something is not conscious, if it has desires or goals that it wants to pursue, then that itself is something that should be respected. I would also love to see equivalent or analogous reports on whether AI systems could have the kind of preferences or desires that might merit consideration.
YOSHUA:
It's already the case, I mean, that lots of reinforcement learning agents have valence and goals. It's not rocket science here. It already exists.
ROBERT:
Very quickly on that, I'll just direct you to Patrick's work. Patrick is an expert on that sort of thing.
AI safety and AI welfare
ROBERT:
And then just one very last point, which is just kind of reiterating what Yoshua was saying. Yoshua has recently been writing very eloquently and forcefully about risks from AI to humans in terms of their behavior being aligned with our interests and things like that. Adding consciousness or sentience into that mix is potentially extremely dangerous because it could morally constrain that project and also just lead us to act in certain ways that are dangerous to ourselves.
There are a lot of interesting things to say about the relationship between risks from AI and risks to AI, let's say. And it's very good that people do not conflate those two questions. One kind of convergent policy proposal for both of those is that we need to be extremely careful, slow down, think very hard about what we're doing, and have more transparency and reflection about what we're doing. I think that's something that's very important for both of those issues.
YOSHUA:
I'd like to articulate why there is concern from a safety point of view that Robert just talked about. If we build machines and we start seeing attributes of consciousness, then we just complete the picture to give them essentially all of our attributes of consciousness. In other words, they have their own goals. In particular, they have a self-preservation goal. If those machines are smarter than us in sufficient ways to be dangerous to us, then we are in a very risky situation from the point of view of humanity losing control of its future because there would be something like a new species of entities that may have goals that don't match and that may bring harm to humans. We don't want to do that, obviously.
GRACE:
I think there's a pragmatic answer that allows for the current high level of uncertainty, and that's if these systems seem conscious to us, then we need to follow that logic through, even if we don't know the truth of the matter. If it's a very human-like system, it's natural that people are going to feel that it's mean. I think you could have a system that is conscious and doesn't have some of the things that you were listing, Yoshua. I don't think that if it's a conscious system it necessarily has those things, or that if it has those things it's conscious, or anything like that, but certainly we would feel like it is. And the question is: what are the benefits or risks to society if you tell people they have to treat this thing like it's conscious or that they don't?
So if you have something that feels conscious, looks and behaves like a human, and we tell people, "You can do whatever you want to this thing; it has no moral status," is that going to lead to people treating them poorly? Some people make the argument that you can use that as a kind of catharsis, where people could treat the non-sentient robot terribly and then they won't do that to humans. Other people think you might start to devalue actual conscious life if you give people things that seem conscious and tell them that they can treat them poorly. I think that's the pragmatic answer given the level of uncertainty. If there were a world where we could be certain that something is conscious, even if it doesn't feel like it is to us or look like us in any way, I think then the next steps are more complicated because it doesn't just slot into "okay, it has moral status the same as a human now." Because a lot of the things that we associate with something having moral status, and how we treat the being with moral status, are about being humane to them, and it's about treating them the way that humans would want to be treated.
They might have completely different things that need to be done or not done to them for their treatment to be considered moral, given a potentially completely different type of consciousness and intelligence. Even if we can say with certainty that an artificial system is conscious, I don't know if we know very clearly what follows from that. Even if we agree that we're going to treat it as a moral agent, I don't know if we know clearly what follows from that.
JEFF:
Yeah, great point. This is a lesson that we have learned often the hard way over the past several decades on the animal minds and the animal ethics side of things. I think we need to relearn or remember those questions on the AI minds and AI ethics side of things. I appreciate everybody for articulating that here and acknowledging that there might be broad similarities between the minds of biological and non-biological beings, but a lot of the details might be different. Even if there is some kind of valenced subjective experience, the actual interests and needs are going to be very different. The levels of intelligence and power are potentially going to be very different. It might disrupt expectations we have about what it means to have a moral relationship or a legal or political relationship with someone. So it might be that in some broad, thin sense the concepts extend, but in any kind of more detailed or thicker sense, we have to rethink everything.
Global workspace theory, recurrence, and bottlenecks
JEFF:
I can ask one more detailed question about your discussion of global workspace theory that came up several times in the comments. You mentioned that large language models do not have the relevant kind of feedback loop at the transistor level, but what about at other levels of explanation? What about, for example, the actual application of the models and how they draw from their own past responses when making predictions? Is there a kind of feedback loop happening there that might be relevant for global workspace theory? There were a few questions of that form, so it'd be great if somebody could address that.
ROBERT:
I'll say something very quick and then I'm going to pass the baton to Patrick, just as a heads up. So a quick clarification: it's not actually about the transistor level, it's about the level of the virtual neurons. It's in that sense that it's feed-forward. I haven't actually looked at those questions, but there was an interesting discussion on Twitter where the gist was that you might think the place you get the feedback loops is the fact that the model will output a word and then look back over the entire string and then output the next word. So you could argue that it's using the whole text output as a kind of global workspace. If you're interested in the extended mind, you could maybe make an argument that that's a kind of global workspace. That said, I think that there are challenges to that view, which I will punt to Patrick.
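[Editorial illustration, not part of the spoken remarks.] The distinction Rob draws can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of any real language model: `toy_next_token` and `VOCAB` are hypothetical stand-ins invented for the example. Each call is a pure function of the text so far, with no hidden state carried between calls, which is the sense in which a single pass is feed-forward. The only loop is the external one in `generate`, where each output is appended to the text and fed back in as input.

```python
from typing import List

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["the", "cat", "sat", "down", "."]


def toy_next_token(context: List[str]) -> str:
    """Stand-in for one feed-forward pass: a pure function of the whole context.

    Nothing is remembered between calls; the same context always yields
    the same token.
    """
    # Arbitrary deterministic rule that depends on every token in the context.
    total_chars = sum(len(token) for token in context)
    return VOCAB[total_chars % len(VOCAB)]


def generate(prompt: List[str], steps: int) -> List[str]:
    """External loop: each output token is appended and becomes part of the next input."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(toy_next_token(tokens))
    return tokens


if __name__ == "__main__":
    print(generate(["the"], 3))
```

The feedback here passes entirely through the visible token list. Whether that kind of loop is what global workspace theory has in mind is the question taken up in the replies that follow.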
PATRICK:
It just seems to me that any system that interacts with an environment in the sense that its outputs influence the environment and therefore have a knock-on effect on its subsequent inputs is one in which there's a recurrent causal loop connecting the system itself with its environment. And it seems as though, if we allow that to be the kind of loop that is described in the global workspace theory, then we're just giving an uncharitable interpretation of the global workspace theory because that's not what they intend. Instead, the thought in the global workspace theory is that there's an internal recurrent loop within the system between the modules and the global workspace. But although Rob has suggested that I'm the best person to answer this question, really the most qualified person here to answer the question by far is Yoshua because he understands both the global workspace theory and the AI systems much better than Rob or me.
YOSHUA:
I agree with your answers. I have a machine-learning interpretation of the bottleneck in the global workspace theory: because the bottleneck is at the internal level, it forces particular kinds of dependencies that involve very few variables, and it allows abstractions to emerge that have very sparse dependencies. If you were to consider the output words of a transformer as the bottleneck, it doesn't really work for a number of reasons because this is what it's outputting. It's as if you were forced to say everything that comes into your working memory and also that it could be expressed as words, which is not completely obvious. So it's really a different schema, as you say. The fact that it's an internal bottleneck makes all the difference, and the actions that are taken are not just a copy of that bottleneck, but they might be what is appropriate in the context. You might be lying, for example, or you might realize that your thought is something incoherent and you might want to say something different. You wouldn't have that if you interacted with ChatGPT. Although people are actually trying to design things like this that are closer to an internal train of thought with Transformers, and with ChatGPT in particular, to try to emulate some of the properties of the workspace for helping to reason. So there is movement in that direction, but it's not really the same.
PATRICK:
One response that we sometimes get when we say a system like a Transformer-based large language model doesn't meet the criteria is that people are quick to say, “Well, you could change the system in such and such a way and then maybe it would meet the criteria.” And we don't disagree with that at all. We think that there are relatively clear steps that could be taken using existing AI techniques to build systems that would meet more of the indicator properties than the ones that exist at the moment.
YOSHUA:
I would add that there are other properties that are not really discussed in the global workspace theory which would be missing in my opinion, especially about subjective experience. So the global workspace theory doesn't explain subjective experience, at least not all the properties that are associated with it. I mentioned earlier things like ineffability, the fact that we are conscious of something richer than what we are able to express with words, at least in a limited number of words. And that's something you don't get with Transformers, especially if you put the bottleneck at the output. You might get something like this if you suddenly had a huge hidden layer in the middle somewhere that could play that role. That is possible.
There are also other properties of conscious thought that are not expressed in Transformers as they are now. For example, attention in Transformers is what we call “soft attention,” actually something invented in my lab in 2014. And it's not at all like the kind of attention that makes a hard decision, usually somewhat stochastic, about what we're going to attend to next, either in perception or in something about our interpretation, our thoughts, our memories. That kind of hard attention is very different in nature from the kind of attention that is currently working well in AI. That doesn't mean it won't be in future systems, but these properties are not present currently.
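[Editorial illustration, not part of the spoken remarks.] A minimal sketch of the contrast between soft attention and a hard, stochastic attention decision, using only the Python standard library. The function names and the toy scores are assumptions made for the example, and sampling a single item from the softmax distribution is just one common way to formalize "hard attention"; it is not claimed to be the specific mechanism Yoshua has in mind.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw relevance scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Soft attention: a weighted average, so every item contributes a little."""
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def hard_attention(
    scores: Sequence[float], values: Sequence[Sequence[float]], rng: random.Random
) -> List[float]:
    """Hard stochastic attention: sample ONE item and ignore all the others."""
    weights = softmax(scores)
    index = rng.choices(range(len(values)), weights=weights, k=1)[0]
    return list(values[index])


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print("soft:", soft_attention(scores, values))
    print("hard:", hard_attention(scores, values, random.Random(0)))
```

The soft version returns a blend of all three value vectors, while the hard version returns exactly one of them, chosen at random in proportion to the same weights. The soft version is differentiable end to end, which is part of why it trains well with gradient descent.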
JEFF:
Thank you for that exchange. Grace, I'm going to give you the last word and then we can wrap up if you still have a comment.
GRACE:
I just wanted to make a quick point about this idea of there being this external recurrence because you can resample the environment that you impacted. I think if you're looking at the architecture of the model that is a pretty big difference from there being internal recurrence. But if you take the perspective of, like, a naive neuroscientist who was trying to understand this system and only had access to the activity of the neurons over time, which is what happens a lot in neuroscience, you might think that there is internal recurrence because there would be correlations between activity of neurons over time and that kind of thing, or at least in the information represented in the system over time. And so on some fuzzier more abstract level, maybe it does look like there's recurrence, but we actually know the architecture that generates it. And if you're subscribing to theories of consciousness where the architecture that generates it matters then it's a different outcome.
JEFF:
Great. Thank you very much, Grace. Thanks to everybody again for taking the time to join us and tell us about your report and answer some of our questions. Thanks also again to everybody in the audience for showing up. We had really amazing attendance and a fantastic conversation happening, and apologies for not being able to get to all of the general topics of the questions, to say nothing of the specific questions. But it really was a great conversation and we will share all the questions and comments and exchanges with the panelists following the talk. Again, this is obviously the beginning of a much longer conversation about various tests for conscious and sentient AI systems and what follows for their moral, legal, and political significance.
I am really looking forward to having those conversations and I am grateful to everybody for participating in them. Just a note to everybody that you can find a link to the report in the chat. So please do check out the report. You can also find a link to the Mind, Ethics, and Policy Program. You can sign up to our email list for future events. In early October we will be having a talk by Peter Godfrey-Smith, a philosopher who is more skeptical about AI consciousness and will explain his skepticism to us. So do sign up for that email list if you want to keep having this conversation with us. Thank you again to everybody and have a great rest of the night.
Appendix: speaker biographies and intro
[Jeff, at the beginning of the event]
So I will now briefly introduce the speakers, and then we can hear some remarks from them, and then we can open up the discussion and hear questions and comments from you and have a conversation.
Patrick Butlin is a philosopher of mind and cognitive science and a Research Fellow at the Future of Humanity Institute at the University of Oxford. His current research is on consciousness, agency, and other mental capacities and attributes in AI.
Robert Long is a research affiliate at the Center for AI Safety. He recently completed his Ph.D. in philosophy at New York University, during which he also worked as a research fellow at the Future of Humanity Institute. He works on issues related to possible AI consciousness and sentience.
Yoshua Bengio is recognized worldwide as one of the leading experts in artificial intelligence, known for his conceptual and engineering breakthroughs in artificial neural networks and deep learning. He is a full professor in the Department of Computer Science and Operations Research at the Université de Montréal and the founder and Scientific Director of MILA Quebec AI Institute, one of the world's largest academic institutes in deep learning. He is also the winner of many awards and has many other accomplishments that you can easily find online.
And finally, Grace Lindsay is currently an assistant professor of Psychology and Data Science at New York University. After a BS in neuroscience from the University of Pittsburgh and a year at the Bernstein Center for Computational Neuroscience in Freiburg, Germany, Grace got her Ph.D. at the Center for Theoretical Neuroscience at Columbia University in the lab of Ken Miller. Following that, she was a Sainsbury Wellcome Center Gatsby Computational Neuroscience Unit Research Fellow at University College London.
So thank you all so much for, first of all, writing this very interesting report, and second of all, being here to talk with us about it. I know a lot of people here are interested in hearing from you and discussing this with you.
From the report, p.70: “Determining whether consciousness is possible on conventional computer hardware is a difficult problem, but progress on it would be particularly valuable, and philosophical research could contribute to such progress. For example, sceptics of computational functionalism have noted that living organisms are not only self-maintaining homeostatic systems but are made up of cells that themselves engage in active self-maintenance (e.g. Seth 2021, Aru et al. 2023); further work could clarify why this might matter for consciousness. Research might also examine whether there are features of standard computers which might be inconsistent with consciousness, but would not be present in unconventional (e.g. neuromorphic) silicon hardware.”