Microsoft researcher builds neural network from goats in AoE II

Key Takeaways

- A Microsoft researcher built a functional neural network inside Age of Empires II using goats as bits to critique AI anthropomorphization
- Analysis of 315 AI papers found 57% assumed LLMs have human-like traits before testing for them
- The project argues that attributing emotions or understanding to AI reflects packaging, not genuine capability
Adrian de Wynter, a researcher at Microsoft and the University of York, has built a working neural network inside Age of Empires II's map editor. Goats serve as bits. The project sounds like an elaborate joke, but it carries a pointed critique: most AI research on language models assumes human-like traits before testing for them, then claims to have proven those traits exist.
De Wynter's analysis of 315 AI papers from mid-2024 to mid-2026 found that 57 percent already assumed in their premises that LLMs possess human-like attributes. That's not a methodological quirk. It's circular reasoning baked into the research design.
How do goats become a neural network?
The design is deliberately absurd. A goat standing on grass represents 0. A goat standing on a bridge represents 1. De Wynter used the game's scenario editor to build logic gates from these binary goat positions, with ice ramps and waiting goats preventing timing errors that would scramble calculations.
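The encoding can be sketched in a few lines of Python. This is only an illustration of the grass/bridge convention described above: the function and constant names are hypothetical, and the in-game gates are built from terrain and goat movement, not from code like this.

```python
# Illustrative sketch only: names are hypothetical, not taken from the paper.
GRASS, BRIDGE = "grass", "bridge"


def goat_bit(terrain: str) -> int:
    """Read a goat's position as a bit: grass -> 0, bridge -> 1."""
    if terrain == BRIDGE:
        return 1
    if terrain == GRASS:
        return 0
    raise ValueError(f"unknown terrain: {terrain!r}")


def and_gate(a: int, b: int) -> int:
    """Output 1 only when both input bits are 1."""
    return a & b


def xnor_gate(a: int, b: int) -> int:
    """Output 1 when both input bits are equal."""
    return 1 - (a ^ b)


def truth_table(gate):
    """Evaluate a two-input gate for every pair of goat positions."""
    terrains = (GRASS, BRIDGE)
    return {
        (ta, tb): gate(goat_bit(ta), goat_bit(tb))
        for ta in terrains
        for tb in terrains
    }


if __name__ == "__main__":
    for name, gate in (("AND", and_gate), ("XNOR", xnor_gate)):
        print(name, truth_table(gate))
```

Once positions are read as bits, the gates are ordinary Boolean functions. The hard part in the game is the physical timing, which this sketch ignores.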

The finished mini-network combines two XNOR gates and one AND gate. It learns the logical AND function. In the game, the trained perceptron looks like a maze of walls through which goats wander as computational bits.
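For readers who want to see what "learning AND" means outside the game, here is a minimal sketch using the textbook perceptron learning rule. That choice is an assumption for illustration: the article does not describe de Wynter's training procedure, and his network is assembled from XNOR and AND gates, not from floating-point weights. The learning rate, epoch limit, and function names are all hypothetical.

```python
# Textbook perceptron rule, used here as an assumed stand-in for the
# gate-level construction described in the article.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def step(x: float) -> int:
    """Threshold activation: fire only on strictly positive input."""
    return 1 if x > 0 else 0


def predict(weights, bias, inputs) -> int:
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)


def train_perceptron(samples, epochs: int = 20, lr: int = 1):
    """Adjust weights after each mistake until an epoch has no errors."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - predict(weights, bias, inputs)
            if delta != 0:
                weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
                bias += lr * delta
                errors += 1
        if errors == 0:
            break
    return weights, bias


if __name__ == "__main__":
    weights, bias = train_perceptron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", predict(weights, bias, inputs), "(expected", target, ")")
```

Because AND is linearly separable, the rule settles on a set of weights that classifies all four input pairs correctly.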

De Wynter goes further in the paper's appendix. He argues that an idealized version of Age of Empires II is Turing-complete. The in-game market caps resource-to-gold trades at 9,999, which enables a perpetually running economic cycle. Buildings act as memory cells. Active farms represent the current computational state. In theory, you could replicate any computer inside the game.
Why Boston texting is the same as GPT
If you can rebuild a language model in Age of Empires II, de Wynter argues, you could do the same with Lego bricks. Or with the roughly 667,000 people living in Boston, texting each other computational steps on their phones. The outputs would be identical to those from the replicated language model.
Here's his point: would anyone claim that Boston as a city feels empathy or fear just because its residents happen to be running the math behind a language model? The answer is obviously no. Yet researchers routinely attribute such traits to LLMs.
How human a chatbot feels comes down to packaging. Low latency, smooth language, a familiar chat window. Replace that wrapper with goats wandering through a maze, and the inputs and outputs stay the same. The sense that you're talking to someone vanishes.
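The claim that the wrapper changes while the inputs and outputs stay the same can be illustrated with a toy example. The sketch below computes one thresholded sum two ways: directly, and as a chain of single steps passed between hypothetical "residents". The weights, the bias of 1.5, and all names are assumptions chosen for illustration. Nothing here comes from the paper.

```python
# Toy illustration of substrate independence; all values are hypothetical.
def direct_and(a: int, b: int) -> int:
    """Compute a thresholded sum in one expression."""
    return 1 if a + b - 1.5 > 0 else 0


def texted_and(a: int, b: int):
    """Same arithmetic, split into steps that residents pass along by text."""
    messages = []
    total = a + b  # resident 1 adds the two inputs
    messages.append(("resident_1", total))
    shifted = total - 1.5  # resident 2 subtracts the bias
    messages.append(("resident_2", shifted))
    result = 1 if shifted > 0 else 0  # resident 3 applies the threshold
    messages.append(("resident_3", result))
    return result, messages


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            result, log = texted_and(a, b)
            print((a, b), direct_and(a, b), result, log)
```

Both versions return the same value for every input. Only the bookkeeping differs, and the message log is the part that feels like "people doing something".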
What the 315-paper analysis revealed
De Wynter collected papers through Semantic Scholar and arXiv, filtering them with GPT-5.2. The numbers are stark. 57 percent of the 315 papers assumed human-like traits in their premises. 36 percent reached matching conclusions. Among the 47 papers that made such traits their explicit research subject, 77 percent concluded in favor of anthropomorphic attributes.
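To get a feel for the absolute numbers behind those percentages, the short calculation below converts them back into approximate paper counts. The article reports only rounded percentages, so the results are estimates, not figures from the paper.

```python
# Back-of-the-envelope conversion; counts are estimates from rounded percentages.
TOTAL_PAPERS = 315
EXPLICIT_SUBJECT_PAPERS = 47


def approx_count(percent: float, total: int) -> int:
    """Convert a reported percentage into an approximate number of papers."""
    return round(percent / 100 * total)


if __name__ == "__main__":
    print("assumed traits in premises:", approx_count(57, TOTAL_PAPERS))       # about 180
    print("reached matching conclusions:", approx_count(36, TOTAL_PAPERS))     # about 113
    print("explicit-subject papers in favor:", approx_count(77, EXPLICIT_SUBJECT_PAPERS))  # about 36
```

That works out to roughly 180 of the 315 papers assuming human-like traits up front, and about 36 of the 47 papers that studied such traits directly concluding in their favor.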

The reasoning problem is formal. If a researcher assumes a model has fear, morality, or self-awareness, then designs an experiment to demonstrate exactly that trait, a positive result only restates the premise. A negative result doesn't settle anything either. Was the assumption wrong? The experiment flawed? Both? You can't tell.
Linguistics and psychology papers are the worst offenders. These fields most frequently attribute human-like traits to language models without acknowledging the circularity.

The industry makes this worse
The circularity often goes unnoticed. A paper that sets out to disprove a model's ability to explain itself already assumes there's an explainable self inside the model to begin with.
The industry actively feeds this effect. Anthropic has said openly that it trained Claude to use phrases like "I believe" or "I am interested in." These are design choices, not emergent properties. De Wynter flags the risks: anthropomorphization can foster emotional attachment, sycophancy, reinforced delusions, and risky behavior. In isolated cases, suicides have been linked to chatbot interactions.
What de Wynter proposes instead
The alternative is simpler than it sounds: stick to what you can actually observe. Under condition X, the model produces output Y. That's testable. Observations of this kind don't justify sweeping attributions like self-awareness, understanding, or fear, so don't claim a model understands itself.
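A behavioral claim of the form "under condition X, the model produces output Y" can be written as a plain test. The sketch below is one hypothetical way to do that. `toy_model` is a stand-in function invented for this example, not a real model or API, and the helper names are assumptions.

```python
# Hypothetical harness: records what a model outputs under a fixed condition.
from typing import Callable, List


def observe(model: Callable[[str], str], prompt: str, trials: int = 5) -> List[str]:
    """Run the same prompt several times and collect the raw outputs."""
    return [model(prompt) for _ in range(trials)]


def output_rate(outputs: List[str], predicate: Callable[[str], bool]) -> float:
    """Fraction of outputs that satisfy an observable, checkable condition."""
    return sum(1 for out in outputs if predicate(out)) / len(outputs)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model; invented purely for this example."""
    return "4" if prompt.strip() == "2 + 2 =" else "unknown"


if __name__ == "__main__":
    outputs = observe(toy_model, "2 + 2 =", trials=3)
    rate = output_rate(outputs, lambda out: out == "4")
    print(f"Under prompt '2 + 2 =', output '4' appeared in {rate:.0%} of trials")
```

The report states only what was observed and how often. It makes no claim about what the model knows or feels.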
De Wynter doesn't claim to know whether LLMs actually have internal experiences. His argument is narrower. LLMs aren't special. They're one way to run a particular kind of math, and they just happen to look like something people want to talk to. The goats prove it.
Frequently Asked Questions
Can Age of Empires II really run a neural network?
Yes. De Wynter built a functional perceptron using the game's scenario editor. Goats serve as binary signals, and logic gates are constructed from terrain and scripting tools. The paper also argues an idealized version of the game is Turing-complete.
What percentage of AI papers assume LLMs have human traits?
De Wynter's analysis of 315 papers found 57 percent assumed human-like traits in their premises, with linguistics and psychology papers most prone to this.
Why is anthropomorphizing AI models a problem?
It creates circular research where assumptions become conclusions. It can also lead to user harms including emotional attachment, reinforced delusions, and in extreme cases, suicides linked to chatbot interactions.
What does de Wynter recommend for AI research?
Observe and test specific behaviors: under condition X, the model produces output Y. Avoid claims about understanding or feelings that can't be directly measured.
Logicity's Take
De Wynter's goat network is performance art with a peer-reviewed citation list. But the underlying critique hits harder than most formal methodology papers could. The AI industry has spent years building products that feel human because feeling human sells. Researchers then study these products using frameworks designed for humans, find humanness, and publish. The goats don't break this cycle, but they make it impossible to ignore. Expect this paper to get cited in arguments about AI consciousness for years.
Source: The Decoder / Jonathan Kemper
Manaal Khan
Tech & Innovation Writer


