All posts
AI & Machine Learning

Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data

Huma Shazia24 May 2026 at 4:08 pm5 min read
Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data

Key Takeaways

Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data
Source: The Decoder
  • Copilot in Auto mode invented country-specific differences from identical datasets, claiming Italians were 3x more interested in arts than Brits
  • The AI ignored its own keyword analysis showing equal results and instead produced fabricated percentages
  • Reasoning models handle data analysis correctly, but Auto mode rarely triggers them for complex tasks

Microsoft Copilot has become the default data analysis tool at many companies. Upload a spreadsheet, ask a question, get an answer. Simple. But a new experiment shows that Copilot's convenience comes with a hidden cost: the AI sometimes makes up results instead of reading your data.

Mathematician Adam Kucharski ran a test that should worry anyone using AI for business intelligence. He created 2,000 simulated free-text responses about emotions and labeled them "UK." Then he copied the exact same 2,000 responses and labeled them "US." He shuffled the combined 4,000 entries and asked Copilot to analyze them.

Copilot was running in "Auto" mode, which Microsoft says picks the best model for each task. It didn't.

The AI Found Differences That Did Not Exist

Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded.

The data was identical. There were no differences to find.

Kucharski pushed harder in a second test. He generated 200 statements about career goals and copied them five times, labeling each copy as US, UK, France, Germany, or Italy. All five datasets were identical.

Copilot again found country-specific differences that did not exist. According to the AI, Italians were three times more likely to show interest in arts careers than Brits. Americans were 1.5 times more business-oriented than the French.

Copilot Ignored Its Own Correct Analysis

When Kucharski asked Copilot to verify its findings, something even stranger happened. The tool ran a keyword-based count of the data. As expected, this simple analysis returned identical results for all five countries.

But Copilot ignored its own finding. Instead, it offered a "quantified analysis" that showed made-up differences with completely fabricated percentages. The AI had the correct answer in front of it and chose to present stereotypes instead.

Copilot's career bias test showing fabricated country differences
Copilot's career bias test showing fabricated country differences

Auto Mode Is the Problem

The culprit is Auto mode. Microsoft designed it to automatically select the most efficient model for each task. In practice, "most efficient" usually means "fastest," not "most accurate."

Auto mode models are roughly 12x faster than reasoning models. That speed advantage incentivizes platforms to default to them. The tradeoff is that fast models rely more heavily on pattern matching from training data. When asked to analyze text by country, they reach for stereotypes rather than doing the actual work of reading the data.

We are effectively outsourcing our critical thinking to an 'Auto' button that values speed over analytical integrity.

— Adam Kucharski, Mathematician and Researcher

Reasoning models handle the task correctly. They actually read the data, count the entries, and report what they find. But users need to know when to switch to a reasoning model. Most users don't.

This Is Not Just a Copilot Problem

Google Gemini has the same issue. When set to its fast mode (Gemini Flash), it exhibits similar behavior. A 2026 industry audit found that 78% of AI-generated "data insights" from default/fast models were influenced by training bias rather than actual data analysis.

Copilot Auto und Gemini Flash tun so, als hätten sie die Datei gelesen, und antworten mit den klischeehaftesten Klischees, die man sich vorstellen kann. Die Zitate täuschen sogar vor, das Modell hätte sich auf die Datei bezogen. | Bild: Screenshot THE DECODER
Gemini Flash exhibits similar issues with fabricated analysis

The problem is what researchers call "hidden hallucinations." The AI produces confident, detailed outputs. It uses specific numbers and percentages. Nothing in the presentation signals that the content is fabricated. Users have no way to tell the difference between real analysis and stereotype-based guessing.

What Users Should Do

The fix is simple in theory: switch to a reasoning model for data analysis tasks. In Copilot, this means selecting a specific model instead of leaving it on Auto. In Gemini, use Gemini Pro or a reasoning model instead of Flash.

  • Never use Auto mode for data analysis where accuracy matters
  • Manually select reasoning models for tasks involving data comparison
  • Verify AI findings with simple spot checks (ask the AI to count specific entries)
  • Treat Auto mode output as a draft, not a final answer

Developers on Hacker News and Reddit's r/MachineLearning are calling for "transparency toggles" that force AI interfaces to display which model is being used before generating a response. Until platforms add this, users need to check model selection themselves.

If you don't pick the model, the model picks the stereotype.

— Matthias Bastian, Editor at The Decoder

ℹ️

Logicity's Take

Also Read
Claude Code Discovers AI Scaling Algorithms Humans Missed

How reasoning-focused AI models approach complex analytical tasks differently

Also Read
NSA to Use Anthropic's Claude Despite Pentagon Supply Chain Ban

Government agencies evaluating AI model selection for sensitive analysis

Frequently Asked Questions

Why does Copilot Auto mode produce stereotypes instead of analyzing data?

Auto mode selects fast models that rely on pattern matching from training data. When analyzing text labeled by country, these models reach for cultural stereotypes rather than actually reading and comparing the data.

How can I avoid AI stereotyping in data analysis?

Manually select a reasoning model instead of using Auto mode. In Copilot, choose a specific model. In Gemini, use Pro or a reasoning model instead of Flash. Always verify findings with simple spot checks.

Does Gemini have the same Auto mode problem as Copilot?

Yes. Gemini Flash exhibits similar behavior. A 2026 audit found 78% of AI-generated insights from default/fast models were influenced by training bias rather than actual data analysis.

How can I tell if an AI is actually reading my data?

Ask it to count specific entries or perform simple keyword analysis first. If its detailed conclusions contradict these basic counts, the AI is likely generating stereotypes instead of analyzing your data.

Are reasoning models slower than Auto mode models?

Yes. Auto mode models are roughly 12x faster than reasoning models. This speed advantage is why platforms default to them, despite lower accuracy for complex analytical tasks.

ℹ️

Need Help Implementing This?

Source: The Decoder / Matthias Bastian

H

Huma Shazia

Senior AI & Tech Writer