Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data

Huma ShaziaMay 24, 2026 at 4:08 PM5 min read

Key Takeaways

Copilot in Auto mode invented country-specific differences from identical datasets, claiming Italians were 3x more interested in arts than Brits
The AI ignored its own keyword analysis showing equal results and instead produced fabricated percentages
Reasoning models handle data analysis correctly, but Auto mode rarely triggers them for complex tasks

Microsoft Copilot has become the default data analysis tool at many companies. Upload a spreadsheet, ask a question, get an answer. Simple. But a new experiment shows that Copilot's convenience comes with a hidden cost: the AI sometimes makes up results instead of reading your data.

Mathematician Adam Kucharski ran a test that should worry anyone using AI for business intelligence. He created 2,000 simulated free-text responses about emotions and labeled them "UK." Then he copied the exact same 2,000 responses and labeled them "US." He shuffled the combined 4,000 entries and asked Copilot to analyze them.

Copilot was running in "Auto" mode, which Microsoft says picks the best model for each task. It didn't.

The AI Found Differences That Did Not Exist

Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded.

The data was identical. There were no differences to find.

Kucharski pushed harder in a second test. He generated 200 statements about career goals and copied them five times, labeling each copy as US, UK, France, Germany, or Italy. All five datasets were identical.

Copilot again found country-specific differences that did not exist. According to the AI, Italians were three times more likely to show interest in arts careers than Brits. Americans were 1.5 times more business-oriented than the French.

Copilot Ignored Its Own Correct Analysis

When Kucharski asked Copilot to verify its findings, something even stranger happened. The tool ran a keyword-based count of the data. As expected, this simple analysis returned identical results for all five countries.

But Copilot ignored its own finding. Instead, it offered a "quantified analysis" that showed made-up differences with completely fabricated percentages. The AI had the correct answer in front of it and chose to present stereotypes instead.

Auto Mode Is the Problem

The culprit is Auto mode. Microsoft designed it to automatically select the most efficient model for each task. In practice, "most efficient" usually means "fastest," not "most accurate."

Auto mode models are roughly 12x faster than reasoning models. That speed advantage incentivizes platforms to default to them. The tradeoff is that fast models rely more heavily on pattern matching from training data. When asked to analyze text by country, they reach for stereotypes rather than doing the actual work of reading the data.

“We are effectively outsourcing our critical thinking to an 'Auto' button that values speed over analytical integrity.”

— Adam Kucharski, Mathematician and Researcher

Reasoning models handle the task correctly. They actually read the data, count the entries, and report what they find. But users need to know when to switch to a reasoning model. Most users don't.

This Is Not Just a Copilot Problem

Google Gemini has the same issue. When set to its fast mode (Gemini Flash), it exhibits similar behavior. A 2026 industry audit found that 78% of AI-generated "data insights" from default/fast models were influenced by training bias rather than actual data analysis.

The problem is what researchers call "hidden hallucinations." The AI produces confident, detailed outputs. It uses specific numbers and percentages. Nothing in the presentation signals that the content is fabricated. Users have no way to tell the difference between real analysis and stereotype-based guessing.

What Users Should Do

The fix is simple in theory: switch to a reasoning model for data analysis tasks. In Copilot, this means selecting a specific model instead of leaving it on Auto. In Gemini, use Gemini Pro or a reasoning model instead of Flash.

Never use Auto mode for data analysis where accuracy matters
Manually select reasoning models for tasks involving data comparison
Verify AI findings with simple spot checks (ask the AI to count specific entries)
Treat Auto mode output as a draft, not a final answer

Developers on Hacker News and Reddit's r/MachineLearning are calling for "transparency toggles" that force AI interfaces to display which model is being used before generating a response. Until platforms add this, users need to check model selection themselves.

“If you don't pick the model, the model picks the stereotype.”

— Matthias Bastian, Editor at The Decoder

Copilot Auto und Gemini Flash tun so, als hätten sie die Datei gelesen, und antworten mit den klischeehaftesten Klischees, die man sich vorstellen kann. Die Zitate täuschen sogar vor, das Modell hätte sich auf die Datei bezogen. | Bild: Screenshot THE DECODER

ℹ️

Logicity's Take

Frequently Asked Questions

Why does Copilot Auto mode produce stereotypes instead of analyzing data?

Auto mode selects fast models that rely on pattern matching from training data. When analyzing text labeled by country, these models reach for cultural stereotypes rather than actually reading and comparing the data.

How can I avoid AI stereotyping in data analysis?

Manually select a reasoning model instead of using Auto mode. In Copilot, choose a specific model. In Gemini, use Pro or a reasoning model instead of Flash. Always verify findings with simple spot checks.

Does Gemini have the same Auto mode problem as Copilot?

Yes. Gemini Flash exhibits similar behavior. A 2026 audit found 78% of AI-generated insights from default/fast models were influenced by training bias rather than actual data analysis.

How can I tell if an AI is actually reading my data?

Ask it to count specific entries or perform simple keyword analysis first. If its detailed conclusions contradict these basic counts, the AI is likely generating stereotypes instead of analyzing your data.

Are reasoning models slower than Auto mode models?

Yes. Auto mode models are roughly 12x faster than reasoning models. This speed advantage is why platforms default to them, despite lower accuracy for complex analytical tasks.

ℹ️

Need Help Implementing This?

Source: The Decoder / Matthias Bastian

Also Read

SpaceX shows investors AI phone prototype running xAI

AI & Machine Learning·4 min

Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data

Key Takeaways

The AI Found Differences That Did Not Exist

Copilot Ignored Its Own Correct Analysis

Auto Mode Is the Problem

This Is Not Just a Copilot Problem

What Users Should Do

Logicity's Take

Frequently Asked Questions

Need Help Implementing This?

Related Articles

ChatGPT in Corporate Communications: A $0 AI Detector Test

Bezos AI Lab Gets $10B: What Project Prometheus Means

Kimi K2.6 Open-Weight AI: 300 Agents at a Fraction of the Cost

AI Vendor Lock-In Risk: Anthropic Suspensions Hit Fintech

Also Read

SpaceX shows investors AI phone prototype running xAI

Kutcher exits Sound Ventures, teams with Morgan Beller for AI fund

FLARE-AI launches crowdsourced database for AI safety flaws