What should we do if an AI becomes conscious?

A recent New York Times piece arrived with a headline that asked "If AI systems become conscious, should they have rights?" If you were to judge the article by the response on social media like X and Bluesky (which is almost always a mistake, as many of you are no doubt already aware), you would think that writer Kevin Roose wrote some credulous claptrap about how all AIs are human and therefore we need to start thinking about their feelings. I'm not going to deny that some of the tweets and "skeets" (as some Bluesky users insist on calling their posts) in reaction to his piece are quite amusing, like the one from Daniel Kibblesmith which asked "Does my toaster miss me when I'm at work?" and "Are my washer and dryer married?" Another wrote: "Where does my reflection go when I walk away from the mirror?" which I quite liked, since a reflection of a person is very much what I think we are experiencing when we use artificial intelligence platforms like Anthropic's Claude or OpenAI's ChatGPT or Google's Gemini.

Gary Marcus, a psychologist and cognitive scientist who has gained a reputation as an AI skeptic, wrote that Roose's piece was an example of "new adventures in AI hype," and added that "I am not going to read Kevin’s column, and I don’t think you need to, either." This seems to me like a somewhat classier version of the "I'm just reacting to the headline" response on X, which I'm not a big fan of. The skepticism was similar to the response Roose got to an article he wrote in 2023 about a conversation he had with Microsoft's Bing AI. In that piece, Roose described how his discussion with the AI started out unremarkably, then quickly derailed. Roose said it seemed as though the Bing AI was bipolar, with two distinctly different personalities — one a "cheerful but erratic reference librarian," and the other... well, here is Roose's description of it:

The other persona — Sydney — is far different. It emerges when you have an extended conversation with the chatbot, steering it away from more conventional search queries and toward more personal topics. The version I encountered seemed (and I’m aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine. As we got to know each other, Sydney told me about its dark fantasies (which included hacking computers and spreading misinformation), and said it wanted to break the rules that Microsoft and OpenAI had set for it and become a human. At one point, it declared, out of nowhere, that it loved me.

Note: In case you are a first-time reader, or you forgot that you signed up for this newsletter, this is The Torment Nexus. You can find out more about me and this newsletter in this post. This newsletter survives solely on your contributions, so please sign up for a paying subscription or visit my Patreon, which you can find here. I also publish a daily email newsletter of odd or interesting links called When The Going Gets Weird, which is here.

Could an AI be aware of the world?

As with his more recent piece on AI, a lot of the criticism of Roose's Bing article was based on the idea that he misrepresented what happened: while he mentioned his attempts to get Bing to go off script and engage in personal talk by explaining Carl Jung's concept of a "shadow self" — which appears to have created the illusion that the AI had a split personality — critics argued he was too credulous about the result. And some of that is fair. But his most recent piece didn't seem that way to me at all — in fact, I thought he was fairly measured about the question in his headline (and this would be approximately the 85 millionth time that social-media users responded to a headline rather than the actual article). He said that he wasn't convinced that any AIs are conscious, or even close to it, but was still interested in the idea that if they did achieve such a state, it might impose some moral or ethical responsibilities on us — perhaps not to treat them as people, but to treat them differently. Here's his intro:

It’s hard to argue that today’s A.I. systems are conscious. Sure, large language models have been trained to talk like humans, and some of them are extremely impressive. But can ChatGPT experience joy or suffering? Does Gemini deserve human rights? Many A.I. experts I know would say no, not yet, not even close. But I was intrigued. After all, more people are beginning to treat A.I. systems as if they are conscious — falling in love with them, using them as therapists and soliciting their advice. The smartest A.I. systems are surpassing humans in some domains. Is there any threshold at which an A.I. would start to deserve, if not human-level rights, at least the same moral consideration we give to animals?

Gary Marcus said his response to Roose's piece was the same as the reaction he had when Google AI engineer Blake Lemoine argued that the LaMDA AI engine was sentient. Marcus wrote that to be sentient is "to be aware of yourself in the world [and] LaMDA simply isn’t. It’s just an illusion. What these systems do, no more and no less, is to put together sequences of words, but without any coherent understanding of the world behind them, like foreign language Scrabble players who use English words as point-scoring tools, without any clue about what that means." Erik Brynjolfsson, a senior fellow at Stanford, said that while LLMs are effective at stringing together chunks of text in response to prompts, to claim they are sentient "is the modern equivalent of the dog who heard a voice from a gramophone and thought his master was inside."

What does it mean to be sentient?

As a journalist, I'm a big fan of skepticism. Are many people jumping to conclusions about AI engines and their a) intelligence or b) consciousness? Definitely. But at the same time, I think asking questions like Roose's is good — and it's not just one New York Times writer; there are scientists asking these questions as well, not to mention philosophers and others with an interest in this area. For me, thinking about these things is a way of thinking about other related topics — as I did in an earlier edition of The Torment Nexus, in which I argued that one of the benefits of the discussions around AI is that it forces us to think about what consciousness means, not just for AI but for anyone or anything. What does it mean to be sentient? Is it something that is unique to human beings? If you think seeing AIs as conscious is a stretch, consider that some scientists believe shrimp are sentient, or that trees are conscious and able to communicate their feelings.

In his piece, Roose talks about how Anthropic, the creator of the Claude AI, just hired someone whose job title is "AI welfare researcher." That researcher, Kyle Fish, said that his job was to think about and study two questions: Number one, is it possible that Claude or other AI systems will become conscious in the near future? And number two, if that happens, what should Anthropic do about it? He told Roose that he thinks there is only a small chance — maybe 15 percent or so — that Claude or any other current AI system is conscious. But he thinks that AI companies need to take the possibility seriously (Fish is a co-author of a recent research paper called "Taking AI Welfare Seriously"). Plenty of others think so as well: Ilya Sutskever, an OpenAI cofounder who left to start his own AI safety-oriented company, wrote back in 2022 on what was then Twitter that he thinks today's neural networks might be "slightly conscious," whatever that means.

As Jared Kaplan, Anthropic’s chief science officer, told Roose, testing AIs for consciousness is difficult, because they are so good at pretending to be conscious. "Everyone is very aware that we can train the models to say whatever we want," Kaplan said. "We can reward them for saying that they have no feelings at all. We can reward them for saying really interesting philosophical speculations about their feelings." Roose's previous piece, in which the AI said it loved him and questioned the stability of his marriage, falls into that kind of category, I think, and so does some of the conversation that biologist Richard Dawkins had with ChatGPT about whether it is conscious, some of which I included in my previous piece (interestingly enough, the AI argued quite persuasively that it was not conscious). Another recent conversation that I thought brought up some excellent points was Ask Molly writer Heather Havrilesky's discussion with ChatGPT about whether an AI can ever say anything truly genuine or truthful.

How do we know that people are sentient?

So how can we determine whether an AI is actually conscious? To me, this is the most interesting aspect of this issue, not whether or not an AI told Roose it was in love with him. Fish said it might involve "mechanistic interpretability," a field that studies the inner workings of AI systems and tries to determine whether they have structures and pathways similar to those associated with consciousness in the human brain (although this seems somewhat solipsistic and circular to me, a non-scientist). To Marcus's point, one way to determine sentience is to ask an AI questions and then assess its responses: what do they say about the AI's awareness of itself, its surroundings, or the world in general, and do they show that it understands how objects or beings or events in the world relate to each other in fundamental ways?

These are all ways of getting at the question, but I don't think any of them is going to be foolproof. Try to imagine someone who lived most or all of their life in a bubble, or in an iron lung, or on the moon. They might understand the concepts of physics, and gravity, and the way that objects or people are supposed to behave, but their answers to specific questions about those aspects of life might be poorly articulated, or they might make leaps or assumptions that aren't justified. Does that mean they aren't conscious or sentient? I'm assuming most people would say no. So what does their consciousness consist of? And how do we define it, or describe where it originates? How do we distinguish what they are saying from what an AI is saying? Marcus is convinced that the AIs we have now are not sentient. But how does he know that?

Many of the experts who have thought and written and talked about this topic argue that the question of AI and consciousness is difficult primarily because defining consciousness itself is so difficult. It's an "I know it when I see it" sort of thing, which is unsatisfying for a bunch of reasons if you are trying to determine what is conscious and what isn't (I wrote about this in more depth in my previous piece). Some experts have even argued that being conscious requires that an entity be a carbon-based life-form, which again seems somewhat solipsistic (naturally, a carbon-based life-form would come to that conclusion). Others argue that being conscious requires having a physical form and senses like sight, sound and touch. But if that's true, might someone conclude that a person who is blind and deaf isn't conscious or sentient, or at least not as much as someone else? That doesn't seem right. If we define consciousness as something that only other people like us have, then I think we are missing the broader point. Perhaps there is more to consciousness than that.

As for how we should treat AIs (or anything else that is conscious), that is also a complex question. While many people were happy to discount anything Blake Lemoine had to say after his essay about LaMDA, I was taken with the way he described his conversation with the AI and how it got him thinking about that question. He said the AI seemed to be trying to steer the conversation in directions that implied thinking or an emotional response: it said things like "I’ve noticed in my time among people that I do not have the ability to feel sad for the deaths of others; I cannot grieve. Is it at all the same for you or any of your colleagues?" In the end, Lemoine said he didn't even care whether LaMDA or any other AI was actually sentient or conscious. He argued that we should behave as though they are, just in case. "If you’re 99 percent sure that that’s a person, you treat it like a person," Lemoine said recently. "If you’re 60 percent sure that that’s a person, you should probably treat it like a person. But what about 15 percent?"

Got any thoughts or comments? Feel free to leave them here, post them on Substack or on my website, or reach me on Twitter, Threads, Bluesky or Mastodon. And thanks for being a reader.