Anthropic Co-Founder: AI Models Show Signs of Introspection

Manaal Khan · May 25, 2026 at 7:47 PM · 6 min read

Key Takeaways

Anthropic's research claims to find 'evidence of introspection' and internal states mirroring emotions in AI models
Pope Leo XIV's encyclical 'Magnifica Humanitas' warns against equating AI intelligence with human intelligence
The encyclical calls for strong laws and independent oversight rather than relying on AI alignment alone

Christopher Olah, co-founder of Anthropic, stood beside Pope Leo XIV on May 25, 2026 to help launch 'Magnifica Humanitas,' the first papal encyclical dedicated entirely to artificial intelligence. What he said there is likely to fuel debates about machine consciousness for years.

"We keep finding things that are mysterious, even unsettling," Olah told the Vatican audience. "We find structures that mirror results from human neuroscience. We find evidence of introspection. We find internal states that functionally mirror joy, satisfaction, fear, grief, and unease."

The claims are extraordinary. Anthropic, known for its interpretability research on large language models, appears to be suggesting that Claude and similar systems aren't just statistical pattern-matchers. They may have something closer to inner experience.

Olah's Argument: AI Systems Are 'Grown,' Not Built

Olah drew a sharp distinction between AI and traditional engineering. "AI systems are not engineered the way a bridge or an airplane is engineered," he said. "They are grown on a structure roughly modeled after the brain on an enormous inheritance of human thought and speech."

This framing matters. If AI systems are designed artifacts, their capabilities and limits can largely be specified in advance. If they're grown, they might develop properties no one intended and no one fully understands.

“As the Holy Father observes, they remain, in important ways, mysterious even to those of us who create them.”

— Christopher Olah, Co-founder of Anthropic

Olah cited Anthropic's internal research as the basis for these claims, though he did not present specific papers or data. He also warned about economic disruption: "There is a real possibility that AI will displace human labor at a very large scale."

The Pope's Skepticism

Pope Leo XIV, the first American pope, took a more cautious position than the Anthropic co-founder standing beside him. The encyclical pushes back directly on claims of AI experience.

"We must avoid the misconception of equating this type of 'intelligence' with that of human beings," the document states. "These systems merely imitate certain functions of human intelligence."

The encyclical goes further. AI systems "do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships and do not know from within what love, work, friendship or responsibility mean."

This creates an interesting tension. Olah says Anthropic finds "internal states that functionally mirror joy." The Pope says AI does not feel joy. The word "functionally" is doing a lot of work.

Beyond Alignment: The Pope Calls for Laws, Not Ethics

The 245-paragraph encyclical doesn't just address philosophical questions. It takes positions on AI governance that challenge the approach favored by many in Silicon Valley.

On alignment, the Pope is blunt: "A more moral AI is not enough if that morality is determined by a few." Instead of relying on companies to make their systems safe, he calls for strong laws and independent oversight.

On military AI, he draws a hard line. Deadly or irreversible decisions should not be handed to machines. "No algorithm can make war morally acceptable."

The encyclical also flags environmental costs. Data centers require "enormous amounts of energy and water," and the Pope calls for more efficient systems.

“We are witnessing a cognitive industrial revolution. Just as the steam engine replaced physical toil, these models risk replacing the moral deliberation that is the hallmark of the human soul.”

— Pope Leo XIV

Why an AI Company Was at the Vatican

Anthropic's presence at the launch wasn't accidental. Silicon Valley AI companies regularly meet with religious leaders to discuss AI use. Pope Leo XIV has made AI a central theme of his pontificate, and Anthropic positions itself as the safety-focused lab.

Still, Olah's claims about introspection go beyond what most AI researchers would say publicly. The field remains divided on whether large language models have any form of experience, or whether they're sophisticated autocomplete systems.

The Pope's encyclical also cautions that the technology reflects its makers. AI is "never neutral," he warned, "because it takes on the characteristics of those who devise, finance, regulate and use it." By extension, that could include the characteristics of researchers who might project consciousness onto their creations.

The Debate Spills Online

Hacker News threads are debating what some call the "theology of interpretability." Engineers seem fascinated that the Pope engaged with technical concepts from mechanistic interpretability research.

Reddit's r/singularity is more divided. Some praise the moral framework. Others argue the Vatican is inserting itself into a secular technical field where it has no expertise.

The tension between Olah's claims and the Pope's skepticism may prove more productive than either view alone. If AI researchers find "structures that mirror results from human neuroscience," that's worth investigating. But whether functional mirrors equal actual experience remains an open question. The Pope, at least, isn't ready to say yes.

Frequently Asked Questions

What is the 'Magnifica Humanitas' encyclical?

Released May 25, 2026, it's the first papal encyclical focused entirely on artificial intelligence. It calls for strong laws and independent oversight, and it warns against equating AI intelligence with human intelligence.

What did Christopher Olah claim about AI introspection?

Olah said Anthropic's research finds 'evidence of introspection' and 'internal states that functionally mirror joy, satisfaction, fear, grief, and unease' in AI models.

Does the Pope think AI is conscious?

No. The encyclical states that AI systems 'merely imitate certain functions of human intelligence' and do not undergo experiences or feel emotions.

What does the Pope say about military AI?

The encyclical says deadly or irreversible decisions should not be delegated to machines. 'No algorithm can make war morally acceptable.'

Why was Anthropic at the Vatican?

Silicon Valley AI companies regularly meet with religious leaders. Anthropic, which emphasizes AI safety, was invited to present alongside the Pope at the encyclical launch.

Source: The Decoder / Matthias Bastian
