It is plausible that AI systems developed by leading AI labs could deserve our moral consideration some time in the not-too-distant future. Potential AI moral patienthood is not some remote far-future concern; it could be an important component of ensuring that AI development goes well, and it deserves more attention now. The subject of this post is what AI labs might do today, despite our considerable uncertainty about these issues, to start handling them responsibly.
There are several features AI systems could have that are (a) plausibly sufficient for moral patienthood and (b) plausibly attainable relatively soon. My work has focused on three in particular:
Consciousness: by which I mean subjective experience, or what philosophers call "phenomenal consciousness".
Sentience: the capacity for certain kinds of conscious experiences, namely positive and negative ("valenced") ones.
Agency: in this context, the possession of desires, preferences, and goals, whether or not these are conscious. Notably, morally relevant agency might not require consciousness: on some moral views, there could be entities whose goals, preferences, and desires should be respected even if those entities are not conscious.
There is obviously a lot of uncertainty about these issues: both moral uncertainty about which features we do or should care about, and conceptual and empirical uncertainty about which entities have consciousness, sentience, and agency.
These questions are hard enough in the case of animals. And with AI systems, we are dealing with even stranger entities, at the edge of our philosophical and scientific understanding and far past the usual remit of our moral practices. Given this state of knowledge, we should be very humble and not dogmatic about any of these issues. (For example, notice that I am not claiming that AI systems will definitely deserve moral consideration soon.)
That said, this uncertainty is not total, and it is not hopeless. It is not an excuse to throw up our hands or to declare these issues mere philosophical confusions. While these are perplexing questions, there is low-hanging fruit in simply trying to be more precise when we analyze AI systems in light of our best (albeit tentative) scientific theories of consciousness, sentience, and agency. We can also investigate the possibility of asking AI systems themselves about these issues. (Note: I mean future systems that we could actually communicate with, not today's LLMs, which don't really "say" all that much about themselves in any straightforward sense.)
Furthermore, we can look out for red flags of some worst-case scenarios for (potential) AI welfare, even if we can't draw a confident line between the AI systems that are conscious (or sentient, or agents) and those that aren't.
Leading AI labs should develop internal procedures, even tentative ones, for investigating the likelihood that their AI systems have these properties. Ideally, we would have precise evaluations, benchmarks, and interpretability tools. I don't think we know enough about our AI systems, or about moral patienthood, for anything so precise right now. But short of that, labs should specify procedures for at least qualitatively assessing these likelihoods, for example by allowing experts in the relevant areas to review their models. And in the meantime, they can support research into developing more precise evaluations.
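To make the idea of a tentative, qualitative procedure slightly more concrete, here is a minimal sketch in Python. It is purely illustrative: none of the names, categories, or thresholds below come from any lab's actual process, and the coarse likelihood bands are my own invention. It simply shows one way expert judgments about the three properties above could be recorded and flagged for escalation while more precise evaluations are still being developed.

```python
# Hypothetical sketch only: nothing here is prescribed by the post or by any lab.
# It illustrates one possible format for recording qualitative expert judgments
# about welfare-relevant properties, pending more precise evaluations.

from dataclasses import dataclass, field
from enum import Enum


class WelfareIndicator(Enum):
    CONSCIOUSNESS = "phenomenal consciousness"
    SENTIENCE = "valenced experience"
    AGENCY = "desires, preferences, and goals"


class Likelihood(Enum):
    # Coarse qualitative bands, reflecting how little precision we currently have.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    UNCLEAR = 3
    PLAUSIBLE = 4
    LIKELY = 5


@dataclass
class ExpertAssessment:
    """One reviewer's qualitative judgment about one indicator for one model."""
    model_id: str
    reviewer: str
    indicator: WelfareIndicator
    likelihood: Likelihood
    rationale: str  # which theories and evidence the reviewer is leaning on
    red_flags: list[str] = field(default_factory=list)  # worst-case warning signs


def needs_escalation(assessments: list[ExpertAssessment]) -> bool:
    """Flag a model for further review if any reviewer judges any indicator
    to be at least PLAUSIBLE, or reports a concrete red flag."""
    return any(
        a.likelihood.value >= Likelihood.PLAUSIBLE.value or a.red_flags
        for a in assessments
    )
```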
And even while we are uncertain, and even while we suspect (as I do) that AI welfare might currently be a non-issue, labs can take actions that are cheap, not contrary to AI safety efforts, and robustly good for AI welfare. It would be good for a lab to take at least one concrete action on behalf of potential AI welfare (even if that action can be, and is, justified for other reasons as well). Suggestions for such actions can be found in Bostrom and Shulman's "Propositions Concerning Digital Minds and Society" and in Ryan Greenblatt's post on AI welfare.
And finally, labs should communicate about these issues honestly. People in leadership positions at several leading labs have expressed concern about AI moral patienthood. But the issue is perceived to be outside of the current Overton window, and no lab has ever officially acknowledged that this issue is at least worth considering. That should change.
Is there an easy example of a non-conscious entity with morally relevant agency?
Well I don't agree with how those terms were defined, but I do think anyone trying to build machines that can act in a manner even slightly outside of "dumb" automation should REALLY consider the implications. Given that most of ML/DL as an industry, not to mention every investor, is foaming at the mouth over millions or even billions of "installs" or users, I'd say they have already failed.