Microsoft researcher builds neural network from goats in AoE II

Manaal Khan, 18 June 2026 at 10:43 am

Key Takeaways

A Microsoft researcher built a functional neural network inside Age of Empires II using goats as bits to critique AI anthropomorphization
Analysis of 315 AI papers found 57% assumed LLMs have human-like traits before testing for them
The project argues that attributing emotions or understanding to AI reflects packaging, not genuine capability

Adrian de Wynter, a researcher at Microsoft and the University of York, has built a working neural network inside Age of Empires II's map editor. Goats serve as bits. The project sounds like an elaborate joke, but it carries a pointed critique: most AI research on language models assumes human-like traits before testing for them, then claims to have proven those traits exist.

De Wynter's analysis of 315 AI papers from mid-2024 to mid-2026 found that 57 percent already assumed in their premises that LLMs possess human-like attributes. That's not a methodological quirk. It's circular reasoning baked into the research design.

How do goats become a neural network?

The design is deliberately absurd. A goat standing on grass represents 0. A goat standing on a bridge represents 1. De Wynter used the game's scenario editor to build logic gates from these binary goat positions, with ice ramps and waiting goats preventing timing errors that would scramble calculations.

Isometric screenshot of the Age of Empires II scenario editor showing a NAND gate built from palisades, bridges, grass, and ice tiles, with a goat serving as the signal carrier.
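To make the gate idea concrete, here is a minimal Python sketch of the same encoding: grass is 0, bridge is 1, and every other gate is wired from NAND. The function names and the particular wiring are illustrative assumptions, not a description of how the gates are laid out in the game.

```python
# Encoding taken from the article: goat on grass = 0, goat on a bridge = 1.
GRASS, BRIDGE = 0, 1


def nand(a: int, b: int) -> int:
    """NAND outputs 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


# The gates below use nothing but NAND. This wiring is one standard
# construction, assumed here for illustration.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xnor(a: int, b: int) -> int:
    # 1 when both inputs agree, 0 otherwise.
    return or_(and_(a, b), and_(not_(a), not_(b)))


def truth_table(gate):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [gate(a, b) for a in (GRASS, BRIDGE) for b in (GRASS, BRIDGE)]


if __name__ == "__main__":
    for name, gate in [("NAND", nand), ("AND", and_), ("XNOR", xnor)]:
        print(name, truth_table(gate))
```

Because NAND is functionally complete, a single working NAND layout is in principle enough to build any other gate, which is why one gate design can scale up to a whole circuit.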

The finished mini-network combines two XNOR gates and one AND gate. It learns the logical AND function. In the game, the trained perceptron looks like a maze of walls through which goats wander as computational bits.

Age of Empires II screenshot showing multiple parallel lanes of palisades, grass, and water that together form a bipolar 1-bit perceptron made of two XNOR gates and one AND gate.
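The two-XNOR, one-AND structure can be sketched in a few lines of Python. In a bipolar encoding, bits 0 and 1 stand for -1 and +1, and the product of two such values is +1 exactly when they agree, which is what XNOR computes. The article does not describe the training procedure, so the `fit` function below is an assumption: it simply tries all four weight settings and keeps the one that reproduces the AND truth table.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    """1 when the bits agree; stands in for a bipolar multiply."""
    return 1 if a == b else 0


def and_(a: int, b: int) -> int:
    return a & b


def perceptron(x, w):
    """Two XNOR gates (input vs. weight) feeding one AND gate."""
    return and_(xnor(x[0], w[0]), xnor(x[1], w[1]))


# Target behaviour: the logical AND function.
AND_TARGET = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def fit(target):
    """Hypothetical 'training': exhaustive search over 1-bit weights."""
    for w in product((0, 1), repeat=2):
        if all(perceptron(x, w) == y for x, y in target.items()):
            return w
    return None


if __name__ == "__main__":
    weights = fit(AND_TARGET)
    print("weights:", weights)
    for x in AND_TARGET:
        print(x, "->", perceptron(x, weights))
```

With only two 1-bit weights there are four candidate settings, and `(1, 1)` is the only one that matches AND on all four inputs.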

De Wynter goes further in the paper's appendix. He argues that an idealized version of Age of Empires II is Turing-complete. The in-game market caps resource-to-gold trades at 9,999, which enables a perpetually running economic cycle. Buildings act as memory cells. Active farms represent the current computational state. In theory, you could replicate any computer inside the game.

Why Boston texting is the same as GPT

If you can rebuild a language model in Age of Empires II, de Wynter argues, you could do the same with Lego bricks. Or with the 667,000 people living in Boston, texting each other computational steps on their phones. The outputs would be identical to those from the replicated language model.

Here's his point: would anyone claim that Boston as a city feels empathy or fear just because its residents happen to be running the math behind a language model? The answer is obviously no. Yet researchers routinely attribute such traits to LLMs.

How human a chatbot feels comes down to packaging. Low latency, smooth language, a familiar chat window. Replace that wrapper with goats wandering through a maze, and the inputs and outputs stay the same. The sense that you're talking to someone vanishes.
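The substrate argument can be illustrated with a toy example: one small calculation run directly, and the same calculation broken into single steps passed along as messages. Everything here is a hypothetical stand-in (the function names, the queue as a "text message" channel, the single weighted sum in place of a full model); the only point is that the outputs match for every input.

```python
from collections import deque
from itertools import product


def direct(weights, inputs, threshold):
    """Compute a weighted sum and threshold it in one go."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def by_text_message(weights, inputs, threshold):
    """Same arithmetic, one tiny step per 'resident'."""
    inbox = deque()
    # Each resident multiplies one weight by one input and texts the result.
    for w, x in zip(weights, inputs):
        inbox.append(w * x)
    # Another resident adds the incoming messages one at a time.
    running = 0
    while inbox:
        running += inbox.popleft()
    # A final resident compares the sum to the threshold.
    return 1 if running >= threshold else 0


if __name__ == "__main__":
    weights, threshold = [2, -1, 3], 3  # arbitrary illustrative values
    for inputs in product((0, 1), repeat=3):
        a = direct(weights, inputs, threshold)
        b = by_text_message(weights, inputs, threshold)
        print(inputs, a, b, "same" if a == b else "DIFFERENT")
```

Nothing about the second version is faster or more convenient, but an observer who sees only inputs and outputs cannot tell the two apart.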

What the 315-paper analysis revealed

De Wynter collected papers through Semantic Scholar and arXiv, filtering them with GPT-5.2. The numbers are stark. 57 percent of the 315 papers assumed human-like traits in their premises. 36 percent reached matching conclusions. Among the 47 papers that made such traits their explicit research subject, 77 percent concluded in favor of anthropomorphic attributes.

Stacked bar chart showing the composition of the 315-paper corpus across four categories - human-like assumptions, human-like study, human-like conclusion, and emergent assumptions - each split into yes and no shares.
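For a sense of scale, the reported percentages can be converted back into approximate paper counts. The article gives only rounded percentages, so the figures below are estimates and not counts taken from the paper.

```python
TOTAL_PAPERS = 315        # size of the corpus, from the article
TRAIT_STUDY_PAPERS = 47   # papers with human-like traits as explicit subject


def approx_count(percent: int, total: int) -> int:
    """Estimate a paper count from a rounded percentage."""
    return round(percent * total / 100)


assumed = approx_count(57, TOTAL_PAPERS)          # assumed traits in premises
concluded = approx_count(36, TOTAL_PAPERS)        # reached matching conclusions
favorable = approx_count(77, TRAIT_STUDY_PAPERS)  # concluded in favor

if __name__ == "__main__":
    print(f"assumed human-like traits: about {assumed} of {TOTAL_PAPERS}")
    print(f"matching conclusions:      about {concluded} of {TOTAL_PAPERS}")
    print(f"favorable trait studies:   about {favorable} of {TRAIT_STUDY_PAPERS}")
```

That works out to roughly 180 papers with human-like assumptions, roughly 113 with matching conclusions, and roughly 36 of the 47 trait-focused papers concluding in favor.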

The reasoning problem is formal. If a researcher assumes a model has fear, morality, or self-awareness, then designs an experiment to prove exactly that trait, the assumption and result land on the same logical point. A negative result doesn't disprove anything clearly. It's just ambiguous. Was the assumption wrong? The experiment flawed? Both? You can't tell.

Linguistics and psychology papers are the worst offenders. These fields most frequently attribute human-like traits to language models without acknowledging the circularity.

Four horizontal bar charts breaking down annotation rates by academic field for human-like assumptions, human-like study, human-like conclusion, and emergent assumptions, sorted from linguistics and psychology at the top to biology at the bottom.

The industry makes this worse

The circularity often goes unnoticed. A paper that sets out to disprove a model's ability to explain itself already assumes there's an explainable self inside the model to begin with.

The industry actively feeds this effect. Anthropic has said openly that it trained Claude to use phrases like "I believe" or "I am interested in." These are design choices, not emergent properties. De Wynter flags the risks: anthropomorphization can foster emotional attachment, sycophancy, reinforced delusions, and risky behavior. In isolated cases, suicides have been linked to chatbot interactions.

What de Wynter proposes instead

The alternative is simpler than it sounds: stick to what you can actually observe. Under condition X, the model produces output Y. That's testable. Observations of that kind don't justify sweeping attributions like self-awareness, understanding, or fear, so don't claim a model understands itself.
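One way to picture the "condition X, output Y" style of reporting is a small harness that records raw outputs and nothing else. This is a sketch under stated assumptions: `toy_model` is a hypothetical stand-in for a real system, and `observe` and `report` are illustrative names, not part of de Wynter's work.

```python
from collections import Counter
from typing import Callable, List


def observe(model: Callable[[str], str], condition: str, trials: int = 5) -> Counter:
    """Run the same condition several times and tally the raw outputs."""
    return Counter(model(condition) for _ in range(trials))


def report(condition: str, tally: Counter) -> List[str]:
    """State only what was seen: condition, output, frequency."""
    total = sum(tally.values())
    return [
        f"Under condition {condition!r}, output {out!r} appeared {n}/{total} times"
        for out, n in tally.most_common()
    ]


def toy_model(prompt: str) -> str:
    # Hypothetical placeholder with canned replies, used only for the demo.
    return "4" if prompt == "What is 2 + 2?" else "I don't know"


if __name__ == "__main__":
    condition = "What is 2 + 2?"
    for line in report(condition, observe(toy_model, condition, trials=3)):
        print(line)
```

The report contains no words like "understands" or "believes". Any such interpretation would have to be argued separately from the observations.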

De Wynter doesn't claim to know whether LLMs actually have internal experiences. His argument is narrower. LLMs aren't special. They're one way to run a particular kind of math, and they just happen to look like something people want to talk to. The goats prove it.

Frequently Asked Questions

Can Age of Empires II really run a neural network?

Yes. De Wynter built a functional perceptron using the game's scenario editor. Goats serve as binary signals, and logic gates are constructed from terrain and scripting tools. The paper also argues an idealized version of the game is Turing-complete.

What percentage of AI papers assume LLMs have human traits?

De Wynter's analysis of 315 papers found 57 percent assumed human-like traits in their premises, with linguistics and psychology papers most prone to this.

Why is anthropomorphizing AI models a problem?

It creates circular research where assumptions become conclusions. It can also lead to user harms including emotional attachment, reinforced delusions, and in extreme cases, suicides linked to chatbot interactions.

What does de Wynter recommend for AI research?

Observe and test specific behaviors. Under condition X, model produces output Y. Avoid claiming models understand or feel things that aren't directly measurable.

Logicity's Take

De Wynter's goat network is performance art with a peer-reviewed citation list. But the underlying critique hits harder than most formal methodology papers could. The AI industry has spent years building products that feel human because feeling human sells. Researchers then study these products using frameworks designed for humans, find humanness, and publish. The goats don't break this cycle, but they make it impossible to ignore. Expect this paper to get cited in arguments about AI consciousness for years.

Source: The Decoder / Jonathan Kemper