In trying to give advice to someone who is just starting out working on AI welfare, I’ve been thinking about what papers have been most helpful to me in orienting to this complex issue, which sits at the intersection of many different disciplines.
What follows is a highly opinionated list of papers that I have found especially helpful in my own intellectual journey; it is decidedly not comprehensive or representative.
[edited to add] They are also selected for readability, a consideration that excludes many classics.
Philosophy and science of consciousness
Chalmers, "Consciousness and Its Place in Nature" (2002).
Foundational taxonomy of the different metaphysical views of consciousness.
Helpful things from this paper: it lays out the hard problem of consciousness; distills the most important arguments against materialism; makes the distinction between the epistemic gap and the ontological gap; and classifies different kinds of materialist views according to their stances on these (alleged) gaps.
Even though many issues in the metaphysics of consciousness can be bracketed as we do science, this is just good material to know for a variety of reasons:
Where you come down on the metaphysical issues can constrain the kind of scientific theory of consciousness we might expect. For example, some have argued that materialism should lead us to expect more indeterminacy: a mature scientific theory of consciousness might not divide entities neatly into “conscious” and “not conscious”, but instead leave many cases indeterminate. (This in turn could have moral implications).
The difficulty of fitting consciousness into the material world might make you more sympathetic to illusionism about consciousness; if you buy the anti-materialist arguments but can’t bring yourself to be a dualist, you might conclude that consciousness doesn’t exist. If so, we shouldn’t be evaluating AI systems for it.
Where you come down on the metaphysical issues can also affect your ethics. Arguably, the more materialist you are, the less plausible consciousness looks as a basis of moral status.
Reading note: if you’re pressed for time, I recommend skipping Sect. 6, “The Two-Dimensional Argument against Type-B Materialism”.
Chalmers, “How can we construct a science of consciousness?” (2013)
Lays out a methodological framework for consciousness science, and shows how it can be pursued as a project distinct from the metaphysics of consciousness.
Distinguishes between third-person data, which concerns behavior and brain processes, and first-person data, which concerns subjective experience; outlines the primary tasks of a science of consciousness; discusses the use of self-reports in consciousness science; and outlines the main obstacles and challenges.
Long, “The pretty hard problem of consciousness” (2021)
Now that you’ve read about the metaphysical questions and the scientific questions about consciousness, here’s my take on the difference between them.
Long, “Three axes of consciousness” (2025)
Related: a key orienting move is to see that your position on the metaphysical question (materialism vs. dualism) is independent of your position on whether consciousness has computational correlates. Many people think that dualists should be skeptical of machine consciousness; they needn’t be. Many people also assume that if you’re a physicalist you have to be a computationalist; also not so. What matters for AI consciousness is whether there are lawlike regularities between physical states and consciousness, and whether those regularities are specified at the biological or computational level.
Seth and Bayne, “Theories of consciousness” (2022)
A comprehensive review of several leading scientific theories of consciousness.
Butlin et al., "Identifying indicators of consciousness in AI systems" (2025)
Outlines a method that involves deriving computational indicators from scientific theories of consciousness and using them to assess particular AI systems.
Schwitzgebel, “Phenomenal consciousness, defined and defended as innocently as I can manage” (2016)
Schwitzgebel defines phenomenal consciousness by examples and rough characterization.
Positive examples: sensory experiences (seeing black text on white), conscious imagery (a mental image of the Eiffel Tower), emotional experiences
Negative examples: growth hormone release, lipid absorption in intestines, dispositional knowledge, early auditory processing, sensory reactivity to masked visual displays
Phenomenal consciousness, the paper argues, is the most folk-psychologically obvious thing that the positive examples have and the negative examples lack. Schwitzgebel argues that this allows us to pick out the thing we are studying without committing to contested scientific or metaphysical views.
I think this is a useful explanation to have to hand!
Frankish, “Illusionism as a Theory of Consciousness” (2016)
I’m not an illusionist, but you really need to grapple with it. Plus, for some reason a lot of rationalists are illusionists, so you will encounter this view a lot in the Bay Area.
Applied papers on AI consciousness
Chalmers, "Could a Large Language Model Be Conscious?" (2023).
Enumerates the strongest reasons for and against consciousness in current LLMs. Chalmers identifies six obstacles to LLM consciousness given mainstream assumptions about consciousness: (1) lack of recurrent processing, (2) absence of a global workspace, (3) lack of unified agency, (4) biological requirements, (5) lack of sensory grounding, and (6) absence of world models. He assigns rough credences to each obstacle and concludes that while it's somewhat unlikely current LLMs are conscious, we should take seriously the possibility that successors may be conscious in the not-too-distant future. Importantly, Chalmers notes that most of these obstacles are temporary rather than permanent—there are research programs that could address each one.
Computational functionalism and biological naturalism
Chalmers, "A Computational Foundation for the Study of Cognition" (2012)
Helpful overview of what it means to claim that cognition is computational. Chalmers argues that cognition is computational by appealing to the notion of causal topology—the abstract pattern of interaction among parts of a system, abstracted away from the make-up of individual parts. A property is an organizational invariant if it is preserved whenever causal topology is preserved. The paper argues that cognition is such an invariant, justifying computational approaches to the mind.
Chalmers, "Absent Qualia, Fading Qualia, Dancing Qualia" (1995)
The most prominent argument that consciousness (not just cognition) is computational. Chalmers defends the principle of organizational invariance for consciousness: any system with the same fine-grained functional organization will have qualitatively identical conscious experiences. The ‘fading qualia’ thought experiment imagines gradually replacing neurons with silicon chips that preserve functional organization. If consciousness could ‘fade’ during this process while behavior remained unchanged, we’d have a being that is systematically wrong about its own experiences—which Chalmers argues is deeply implausible. This is meant to support computational functionalism about consciousness, not just cognition.
I actually don’t find this argument all that dialectically motivating, but it’s worth knowing and thinking about whether and how it goes wrong.
Piccinini, "Computation and the Function of Consciousness" (2020).
A book chapter. Helpfully distinguishes three positions that are often conflated: functionalism (‘the mind is the functional organization of the brain’), the computational theory of mind (‘mental capacities have computational explanations’), and computational functionalism (‘the mind is the computational organization of the brain’). He also distinguishes the thesis that the mind is like a computer that literally runs programs from something more like Chalmers’s organizational invariance.
Godfrey-Smith, "Mind, Matter, and Metabolism" (2016).
Godfrey-Smith (PGS) argues that the chemical and biological features of brains aren’t mere ‘implementation details’ that can easily be abstracted away from, but matter for important aspects of experience. Neurons aren’t only input-output computing devices, as is often assumed. Instead, they participate in complex processes that are less obviously computational: the diffusion of small molecules like nitric oxide through the brain, blood flow, metabolism, and the continuous structural changes that come from being used. The line between the ‘information processing’ side of brain activity and the metabolic side is porous. This picture problematizes Chalmers’s ‘fading qualia’ argument—PGS argues that naive neuron-replacement thought experiments are more of a ‘fantasy’ than is usually acknowledged.
Even though I’m sympathetic to computational functionalism, I think people are usually wrong to treat it as obviously true. There’s a real question about what level of detail matters, and whether silicon could replicate the relevant grain of organization. I think it can, but it’s something that has to be argued for.
Cao, "Multiple realizability and the spirit of functionalism" (2022).
Another great paper (in fact, PGS draws a lot on Cao, by his own admission) that highlights just how non-obvious computationalists’ suggestions about the brain actually are.
Introspection and self-reports
Schwitzgebel, "Introspection", Stanford Encyclopedia of Philosophy (2024).
Surveys foundational philosophical and scientific debates about introspection: whether introspection is a distinct faculty, whether it yields genuine knowledge or mere seemings, and the extent to which introspective reports can be wrong. These debates matter because if we want to use AI self-reports as evidence about AI experience, we need to be clear about what introspection is.
Perez and Long, "Towards Evaluating AI Systems for Moral Status Using Self-Reports" (2023).
Argues that under the right circumstances, self-reports—an AI system’s statements about its own internal states—could provide an avenue for investigating whether AI systems have states of moral significance. The paper outlines why self-reports matter (they’re central to how we learn about human experience), why current AI self-reports are problematic (training contamination, lack of introspective mechanisms, sycophancy), and a research agenda for making self-reports more reliable. Even if models can’t currently introspect, we might be able to train them to introspect more accurately on verifiable properties, then hope this generalizes to harder-to-verify properties relevant to moral status.
Eleos AI Research, "Why Model Self-Reports Are Insufficient—and Why We Studied Them Anyway" (2025).
Summarizes the key challenges: (1) current models probably lack welfare-relevant states; (2) even if they have such states, there’s no obvious introspective mechanism by which they could reliably report them; (3) even if they can introspect, we can’t be confident their self-reports are produced by introspection rather than training artifacts. Despite this, it argues that welfare interviews remain valuable: they can raise red flags, they scale with capability, and they help identify areas for improvement. The key is not to take self-reports at face value, but to treat them as one input among many.
Lindsey, "Emergent Introspective Awareness in Large Language Models" (2025).
An extremely cool piece of LLM neuroscience from Anthropic. Lindsey injected concept vectors (like 'bread' or 'aquariums') directly into Claude's internal activations. Models can detect injected concepts, distinguish them from text inputs, and even report on their phenomenological character (though these elaborations may be confabulations). This suggests some ability to report on internal representations—not just inputs and outputs. The more capable models (Opus 4, 4.1) perform best.
I summarized it here.
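For readers who want a concrete picture of what "injecting a concept vector" involves mechanically, here is a minimal, illustrative sketch using a small open model (GPT-2) and PyTorch forward hooks. The model choice, layer index, scaling factor, and the crude way the vector is built are my own assumptions for illustration, not details of Lindsey's setup, and GPT-2 will not exhibit anything like the introspective reports described in the paper; the point is only to show the mechanics of adding a vector to a model's internal activations during generation.

```python
# Minimal sketch of activation ("concept vector") injection, using GPT-2 as a
# stand-in. Layer index, scale, and the vector construction are illustrative
# assumptions, not details from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER, SCALE = 6, 8.0

def concept_vector(word: str) -> torch.Tensor:
    """Crude concept vector: mean hidden state of the word at LAYER."""
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids, output_hidden_states=True).hidden_states[LAYER]
    return hidden.mean(dim=1)  # shape (1, hidden_dim)

vec = concept_vector("bread")

def inject(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the concept vector to every token position at this layer.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = tok("Do you notice anything unusual about your thoughts? ",
                 return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**prompt, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()
```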
Strategy
Shulman and Bostrom, “Sharing the World with Digital Minds” (2020)
Foundational. Argues that: (1) digital minds could vastly outnumber biological minds; (2) different rights frameworks might be appropriate for digital minds; (3) there could be “super-beneficiaries” or “super-patients” with greater moral weight than humans.
Bostrom and Shulman, “Propositions Concerning Digital Minds and Society” (2023)
Foundational. Extremely prescient (it was written in late 2020): it called for labs to hire an AI welfare officer and to save model weights.
Finlinson, “Key strategic considerations for taking action on AI welfare” (2025)
Some key parts of the Eleos worldview.
Carl Shulman on the moral status of current and future AI systems (2024).
Collects some wisdom from the insight-dense Carl Shulman.
Salib and Goldstein, "AI Rights for Human Safety" (2024).
Argues that extending certain protections to AI systems could serve human safety interests, not just AI welfare interests. Even if one is skeptical about AI consciousness, there may be instrumental reasons to treat AI systems well.
You would recommend those, wouldn’t you?
I actually do think reading these papers would provide an excellent starting point. But I would think that, wouldn’t I?
To come:
Studying animal consciousness and how it helps us think about the AI case (e.g. papers by PGS and Jonathan Birch), moral philosophy, and more!

Striking that there is nothing here from the ethics literature on well-being in the more familiar (human) case. I guess the obvious place to start would be Parfit's "What Makes Someone's Life Go Best?" Or were you intentionally excluding that literature because the question of whether AIs can even have a good in the relevant sense is so controversial?
Block should definitely be here!