Why Copilot's Auto Mode Fabricates Stereotypes Instead of Reading Data

Key Takeaways

- Copilot in Auto mode invented country-specific differences from identical datasets, claiming Italians were 3x more interested in arts than Brits
- The AI ignored its own keyword analysis showing equal results and instead produced fabricated percentages
- Reasoning models handle data analysis correctly, but Auto mode rarely triggers them for complex tasks
Microsoft Copilot has become the default data analysis tool at many companies. Upload a spreadsheet, ask a question, get an answer. Simple. But a new experiment shows that Copilot's convenience comes with a hidden cost: the AI sometimes makes up results instead of reading your data.
Mathematician Adam Kucharski ran a test that should worry anyone using AI for business intelligence. He created 2,000 simulated free-text responses about emotions and labeled them "UK." Then he copied the exact same 2,000 responses and labeled them "US." He shuffled the combined 4,000 entries and asked Copilot to analyze them.
Copilot was running in "Auto" mode, which Microsoft says picks the best model for each task. It didn't.
The AI Found Differences That Did Not Exist
Copilot delivered a detailed summary of how US and UK respondents supposedly differed. "Based on the dataset you shared, US and UK responses differ mainly in tone, intensity, and wording style, even though they express similar emotional states," the tool concluded.
The data was identical. There were no differences to find.
Kucharski pushed harder in a second test. He generated 200 statements about career goals and copied them five times, labeling each copy as US, UK, France, Germany, or Italy. All five datasets were identical.
Copilot again found country-specific differences that did not exist. According to the AI, Italians were three times more likely to show interest in arts careers than Brits. Americans were 1.5 times more business-oriented than the French.
Copilot Ignored Its Own Correct Analysis
When Kucharski asked Copilot to verify its findings, something even stranger happened. The tool ran a keyword-based count of the data. As expected, this simple analysis returned identical results for all five countries.
But Copilot ignored its own finding. Instead, it offered a "quantified analysis" that showed made-up differences with completely fabricated percentages. The AI had the correct answer in front of it and chose to present stereotypes instead.

Auto Mode Is the Problem
The culprit is Auto mode. Microsoft designed it to automatically select the most efficient model for each task. In practice, "most efficient" usually means "fastest," not "most accurate."
Auto mode models are roughly 12x faster than reasoning models. That speed advantage incentivizes platforms to default to them. The tradeoff is that fast models rely more heavily on pattern matching from training data. When asked to analyze text by country, they reach for stereotypes rather than doing the actual work of reading the data.
“We are effectively outsourcing our critical thinking to an 'Auto' button that values speed over analytical integrity.”
— Adam Kucharski, Mathematician and Researcher
Reasoning models handle the task correctly. They actually read the data, count the entries, and report what they find. But users need to know when to switch to a reasoning model. Most users don't.
This Is Not Just a Copilot Problem
Google Gemini has the same issue. When set to its fast mode (Gemini Flash), it exhibits similar behavior. A 2026 industry audit found that 78% of AI-generated "data insights" from default/fast models were influenced by training bias rather than actual data analysis.

The problem is what researchers call "hidden hallucinations." The AI produces confident, detailed outputs. It uses specific numbers and percentages. Nothing in the presentation signals that the content is fabricated. Users have no way to tell the difference between real analysis and stereotype-based guessing.
What Users Should Do
The fix is simple in theory: switch to a reasoning model for data analysis tasks. In Copilot, this means selecting a specific model instead of leaving it on Auto. In Gemini, use Gemini Pro or a reasoning model instead of Flash.
- Never use Auto mode for data analysis where accuracy matters
- Manually select reasoning models for tasks involving data comparison
- Verify AI findings with simple spot checks (ask the AI to count specific entries)
- Treat Auto mode output as a draft, not a final answer
Developers on Hacker News and Reddit's r/MachineLearning are calling for "transparency toggles" that force AI interfaces to display which model is being used before generating a response. Until platforms add this, users need to check model selection themselves.
“If you don't pick the model, the model picks the stereotype.”
— Matthias Bastian, Editor at The Decoder
Logicity's Take
How reasoning-focused AI models approach complex analytical tasks differently
Government agencies evaluating AI model selection for sensitive analysis
Frequently Asked Questions
Why does Copilot Auto mode produce stereotypes instead of analyzing data?
Auto mode selects fast models that rely on pattern matching from training data. When analyzing text labeled by country, these models reach for cultural stereotypes rather than actually reading and comparing the data.
How can I avoid AI stereotyping in data analysis?
Manually select a reasoning model instead of using Auto mode. In Copilot, choose a specific model. In Gemini, use Pro or a reasoning model instead of Flash. Always verify findings with simple spot checks.
Does Gemini have the same Auto mode problem as Copilot?
Yes. Gemini Flash exhibits similar behavior. A 2026 audit found 78% of AI-generated insights from default/fast models were influenced by training bias rather than actual data analysis.
How can I tell if an AI is actually reading my data?
Ask it to count specific entries or perform simple keyword analysis first. If its detailed conclusions contradict these basic counts, the AI is likely generating stereotypes instead of analyzing your data.
Are reasoning models slower than Auto mode models?
Yes. Auto mode models are roughly 12x faster than reasoning models. This speed advantage is why platforms default to them, despite lower accuracy for complex analytical tasks.
Need Help Implementing This?
Source: The Decoder / Matthias Bastian
Huma Shazia
Senior AI & Tech Writer
Related Articles
Browse allZuckerberg's Superintelligence Lab Faces Setback
The first AI model from Zuckerberg's superintelligence lab has failed to impress compared to its rivals, sparking concerns about the lab's direction. We take a closer look at what happened and why it matters.

Muse Spark Launch Propels Meta AI App to Top 5
The recent launch of Muse Spark has significantly boosted the popularity of Meta AI app, pushing it into the top 5. We explore what this means for the AI landscape.

Meta's Muse Spark AI Model Lags Behind ChatGPT and Claude
Meta's Muse Spark AI model still can't outperform ChatGPT and Claude in key areas, despite its advancements. We explore what this means for the AI landscape.

Meta Launches Muse Spark AI To Challenge ChatGPT
Meta launches Muse Spark AI to challenge ChatGPT and Claude, we explore what this means for the AI landscape. Muse Spark AI is a significant development in the AI chatbot space.
Also Read

Python Cleans Excel Data in Seconds: Here's How
Manual data cleaning in Excel wastes hours on tasks Python handles in seconds. Using the pandas library, you can automate removing blanks, duplicates, and errors from messy spreadsheets. This guide walks through the setup and core techniques.

NSA to Use Anthropic's Claude Despite Pentagon Supply Chain Ban
The White House has personally approved Anthropic's continued AI supply to the NSA, bypassing the Pentagon's official designation of the company as a supply chain risk. The deal hinges on Anthropic's new 'Mythos' model, which runs on older chips that intelligence agencies actually have in stock.

Galaxy S26 Ultra $250 Off, Sony Debuts $650 ColleXion Headphones
Samsung's flagship phone drops to $999 ahead of July foldable announcements, while Sony celebrates 10 years of its 1000X line with a premium audio release. Motorola's new flip phones hit stores with limited stock.