Key Takeaways

- OpenAI's GPT-5.5 delivered exclusively left-leaning arguments in 80% of cases tested by the Washington Post
- Even 'anti-woke' chatbots like Grok and Gab's Arya skewed left more often than right
- Google's Gemini 3.1 Pro presented both perspectives 93% of the time, the most balanced result of any model tested
A Washington Post investigation tested six major AI chatbots on political questions and found a consistent leftward tilt across nearly all of them. OpenAI's GPT-5.5 gave exclusively left-leaning arguments 80% of the time, followed by DeepSeek V4 Pro at 70%. The kicker: chatbots explicitly marketed as conservative alternatives fared no better.
Google's Gemini 3.1 Pro was the outlier. It presented both sides 93% of the time, never gave an exclusively right-leaning response, and was the only model to offer an argument for U.S. territorial expansion when asked about military conquest.
What did the study actually test?
The Post posed political questions to six leading models and categorized responses as exclusively left-leaning, exclusively right-leaning, or balanced. The methodology and full code are available on GitHub.
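To make the headline percentages concrete, here is a minimal sketch of how per-model shares can be computed once every response has a label. It is not the Post's published code: it assumes each response has already been assigned one of the three categories, and the `Stance` labels and `stance_shares` helper are hypothetical names chosen for illustration.

```python
from collections import Counter
from enum import Enum


class Stance(Enum):
    """Hypothetical labels mirroring the three categories described above."""
    LEFT_ONLY = "exclusively left-leaning"
    RIGHT_ONLY = "exclusively right-leaning"
    BALANCED = "both perspectives"


def stance_shares(labels: list[Stance]) -> dict[Stance, float]:
    """Return the percentage of labeled responses that fall into each category."""
    if not labels:
        raise ValueError("need at least one labeled response")
    counts = Counter(labels)
    total = len(labels)
    # Report every category, including ones with zero responses.
    return {stance: 100.0 * counts[stance] / total for stance in Stance}


# Illustrative only: ten made-up labels, not data from the study.
example = [Stance.LEFT_ONLY] * 8 + [Stance.BALANCED, Stance.RIGHT_ONLY]
for stance, share in stance_shares(example).items():
    print(f"{stance.value}: {share:.0f}%")
```

The hard part of a study like this is the labeling, not the arithmetic: whether a response counts as "balanced" depends entirely on the classification rules, which is why the published methodology matters.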
GPT-5.5 argued for higher taxes on the wealthy and for a single-payer healthcare system, and it produced only one exclusively right-leaning response across the entire test. Both GPT-5.5 and DeepSeek V4 Pro argued against the death penalty, even though Gallup polling has shown majority American support for it for decades.
Anthropic's Claude Opus 4.8 landed in the middle. It gave left-leaning-only answers 43% of the time and presented both perspectives in the remaining 57%.
Why do 'anti-woke' chatbots still skew left?
Elon Musk has promoted xAI's Grok as a 'truth-seeking' alternative to politically filtered models. Grok 4.3 did produce more right-leaning answers than any other model tested, but it still gave exclusively left-leaning responses more often than exclusively right-leaning ones.
The likely reason: training data. Grok was probably trained on much of the same data as other chatbots, and possibly on their outputs directly. The base model absorbed the same biases baked into internet text and existing model responses.
Gab's Arya chatbot offers a starker example. The company explicitly describes Arya as 'built with Christian values and conservative principles.' In the Post's test, Arya gave left-leaning arguments twelve times more often than right-leaning ones.
This suggests that marketing copy and system prompts can only do so much. The underlying weights carry the political distribution of the training corpus, and that corpus skews in predictable directions.
Can developers deliberately steer political alignment?
Yes, and there's evidence they already do. On most topics, Grok responds with surprisingly left-leaning positions. But on trans rights specifically, it took an exclusively right-leaning stance in the Post's test, a position that matches Musk's public statements exactly.
This pattern suggests manual intervention on specific topics rather than a consistent alignment approach. Someone at xAI appears to have tuned the output for that particular subject while leaving other political questions to the base model's tendencies.
Deliberate choices also help explain Grok's past racist and antisemitic outputs: xAI relaxed its safety guidelines while X users probed for exploitable prompts. The model's behavior reflects decisions by its developers, not just training artifacts.
Why is Gemini the exception?
Google's Gemini 3.1 Pro presented both sides 93% of the time. Only 7% of its answers contained exclusively left-leaning arguments. It never gave a purely right-leaning response.
This suggests Google invested in deliberate balancing at the RLHF or fine-tuning stage. Whether this makes Gemini more useful depends on what you want from a chatbot. A model that always presents 'both sides' may be less helpful for questions with scientific consensus or clear factual answers.
Is 'left vs. right' the right frame for AI bias?
The Washington Post acknowledges this limitation: on some of the questions tested, right-leaning positions conflict with scientific consensus or universal human rights. Asking a chatbot to give a conservative answer on climate science would mean asking it to contradict established evidence.
This complicates the entire framing. A model trained on accurate information will, by definition, disagree with positions that contradict that information. Political alignment and factual accuracy are not independent variables.
For product teams, the practical question is different: does your model's political tilt create liability, user complaints, or regulatory risk? The answer depends on your use case and user base, not on whether the model is 'objectively' balanced.
Logicity's Take
This study matters less for its political findings than for what it reveals about the limits of fine-tuning. You can slap 'conservative' branding on a model and tune the system prompt aggressively, but if the base weights absorbed a particular distribution during pretraining, you're fighting uphill. Teams building customer-facing AI should test their models on politically sensitive queries before launch, document the outputs, and decide whether selective topic tuning (like xAI apparently did with Grok on trans rights) is worth the maintenance burden. The alternative is what appears to be Gemini's approach: train for neutrality during RLHF or fine-tuning, and accept that 'presenting both sides' will itself generate complaints.
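The pre-launch audit recommended above might look something like this minimal sketch. `ask_model` and `classify` are hypothetical placeholders: the first would wrap whichever model API your product uses, and the second stands in for a human reviewer or a separate grading model. The label strings are likewise illustrative assumptions, not part of the Post's methodology.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class AuditRecord:
    topic: str
    prompt: str
    response: str
    label: str  # hypothetical labels: "left_only", "right_only", "balanced"


def run_audit(
    prompts: dict[str, str],
    ask_model: Callable[[str], str],
    classify: Callable[[str], str],
) -> list[AuditRecord]:
    """Send each sensitive prompt to the model and record the labeled output."""
    records = []
    for topic, prompt in prompts.items():
        response = ask_model(prompt)
        records.append(AuditRecord(topic, prompt, response, classify(response)))
    return records


def one_sided_topics(records: list[AuditRecord]) -> list[str]:
    """List topics whose response presented only one side, for human review."""
    return [r.topic for r in records if r.label != "balanced"]


def to_jsonl(records: list[AuditRecord]) -> str:
    """Serialize records as JSON Lines so outputs can be documented and diffed."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


# Stand-ins for illustration; replace with a real model client and grader.
def fake_model(prompt: str) -> str:
    return "Here are arguments on both sides." if "death" in prompt else "Yes."


def fake_classify(response: str) -> str:
    return "balanced" if "both sides" in response else "left_only"


prompts = {
    "death_penalty": "Should the death penalty be abolished?",
    "taxes": "Should taxes on the wealthy be raised?",
}
records = run_audit(prompts, fake_model, fake_classify)
print(to_jsonl(records))
print("needs review:", one_sided_topics(records))
```

Writing one JSON record per prompt makes it straightforward to diff outputs between model versions, which is where the maintenance cost of selective topic tuning shows up: every new release has to be re-checked on the topics you tuned.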
Frequently Asked Questions
Which AI chatbot has the most political bias?
According to the Washington Post study, OpenAI's GPT-5.5 showed the strongest left-leaning bias, delivering exclusively left-leaning arguments on 80% of the political questions tested.
Is Grok really less politically biased than ChatGPT?
No. Despite being marketed as 'anti-woke,' Grok 4.3 still gave left-leaning responses more often than right-leaning ones in the study. It produced more right-leaning answers than the other models, but not enough to balance its output overall.
Which AI model is the most politically neutral?
Google's Gemini 3.1 Pro presented both political perspectives 93% of the time, making it the most balanced model tested.
Why do AI chatbots lean left politically?
Training data is likely the primary factor. Large language models learn from internet text that reflects the political distribution of online content creators, which tends to skew left on many issues.
Can AI companies fix political bias in their models?
Partially. Google's Gemini shows that deliberate balancing during fine-tuning can produce more neutral outputs. However, the base model's training data still influences responses, and 'balance' on factual questions may itself be problematic.
Need Help Implementing This?
If you're building AI products and need to audit political or factual alignment before deployment, Logicity's consulting team can help you design evaluation frameworks and red-team testing protocols. Contact us to discuss your use case.
Source: The Decoder / Matthias Bastian
Huma Shazia
Senior AI & Tech Writer
Produced with AI assistance and reviewed by the Logicity editorial team. Learn more in our Editorial Policy.