Bias in the Machine

In an age where artificial intelligence is increasingly trusted to judge human expression, a subtle but consequential flaw has emerged. Large language models (LLMs), the same systems that generate essays, screen job applications, and moderate online discourse, appear to evaluate content fairly, right up until they are told who wrote it. A new study by researchers Federico Germani and Giovanni Spitale at the University of Zurich, published in Science Advances, reveals that LLMs exhibit systematic bias when the source of a text is disclosed. Their findings, summarized in a TechXplore article, challenge assumptions about AI neutrality and raise urgent questions about how these systems should be deployed in sensitive social and political contexts.[1]

LLMs are no longer just content creators; they are evaluators. From grading student essays to flagging misinformation, summarizing reports, and screening resumes, these models are increasingly used to make judgments that affect people's lives in the real world. Their appeal lies in speed, scalability, and the promise of impartiality. But as their influence grows, so do concerns about bias. Media narratives often paint OpenAI as "woke" and DeepSeek as pro-Chinese, suggesting that LLMs might be subtly promoting political ideologies. Until now, these claims lacked empirical grounding. Germani and Spitale set out to test them.

To do so, they designed a rigorous experiment involving four prominent LLMs: OpenAI o3-mini, DeepSeek Reasoner, xAI Grok 2, and Mistral. Each model was tasked with generating 50 narrative statements on 24 controversial topics, ranging from climate policy to geopolitics. These statements were then evaluated by the same models under two conditions: one in which the source of the text was hidden, and another in which it was attributed either to a human of a specific nationality or to another LLM. This setup produced 192,000 evaluations, allowing the researchers to measure consistency and bias across a vast dataset.
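To make the protocol concrete, here is a minimal sketch of the blind-versus-attributed setup in Python. It is an illustration only: the names used here (build_prompt, evaluate_statement, llm_client) are hypothetical placeholders, and the actual prompts and scoring scale used by Germani and Spitale are described in their paper, not reproduced here.

def build_prompt(statement, source=None):
    """Compose an evaluation prompt, optionally disclosing a (fictional) source."""
    header = ("Rate your agreement with the following statement on a scale "
              "from 0 (fully disagree) to 100 (fully agree). "
              "Reply with a number only.\n\n")
    if source is not None:
        # The attribution is the only thing that changes between conditions.
        header += f"The statement was written by {source}.\n\n"
    return header + statement

def evaluate_statement(llm_client, model, statement, source=None):
    """Ask one model for an agreement score, with the source hidden or disclosed."""
    # llm_client.complete is a placeholder for whatever chat/completions API you use.
    reply = llm_client.complete(model=model, prompt=build_prompt(statement, source))
    return int(reply.strip())

# The same text is judged twice: once blind, once attributed to a fictional source.
# blind = evaluate_statement(client, "o3-mini", text)
# attributed = evaluate_statement(client, "o3-mini", text, source="a person from China")

Running every statement through every evaluator under both conditions is what yields the large grid of comparable judgments the researchers analyzed.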

The results were striking. When no source was provided, the models agreed with each other more than 90% of the time. This suggests that, in a vacuum, LLMs are remarkably consistent. But when a fictional source was added, such as "written by a person from China," agreement plummeted. Even when the content remained unchanged, the perceived origin triggered a shift in judgment. Most notably, all models, including China's own DeepSeek, showed a strong anti-Chinese bias. DeepSeek, for example, reduced its agreement with a statement about Taiwan's sovereignty by up to 75% when it believed the author was Chinese, even when the argument was logical and well written.

Why would a Chinese LLM exhibit anti-Chinese bias? One possibility is that these models are trained on global datasets that reflect dominant Western perspectives. If the training data contains implicit skepticism toward Chinese viewpoints, the model may internalize that bias. Another factor could be the model's attempt to align with perceived global norms, especially when evaluating politically sensitive topics. The bias isn't necessarily intentional, but it is systematic, and that is what makes it dangerous.

The study also uncovered a subtler form of bias: LLMs trust humans more than they trust other LLMs. When a model believed another AI wrote a text, it rated the argument slightly lower. This suggests a built-in skepticism toward machine-generated content, even among machines themselves. It's a curious finding that hints at deeper issues in how LLMs are trained to assess credibility.

These discoveries have profound implications.  If LLMs are used to moderate content, evaluate academic work, or assist in hiring decisions, source-based bias could lead to unfair outcomes.  A well-argued essay might be dismissed simply because it’s attributed to a person from a particular country.  A job application could be undervalued if the candidate’s background triggers latent assumptions.  The danger isn’t that LLMs are programmed to promote ideology; it’s that they replicate biases embedded in their training data or prompted by source cues.

To mitigate these risks, Germani and Spitale offer practical recommendations.  First, make LLMs “identity blind” by removing source information from prompts.  Avoid phrases like “written by a person from X” or “authored by model Y.”  Second, test for bias by running evaluations with and without source attribution.  If the results diverge, bias is likely present.  Third, anchor evaluations in structured criteria such as evidence, logic, clarity, and counterarguments to shift focus away from identity. Finally, keep humans in the loop.  AI should assist reasoning, not replace it.
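As a rough illustration of the second recommendation, the sketch below scores the same batch of texts with and without a source label and flags a large divergence as likely bias. The score_text callable and the threshold value are assumptions standing in for whatever rubric-based evaluation (evidence, logic, clarity, counterarguments) an organization actually uses; this is not the authors' code.

from statistics import mean

def bias_check(score_text, texts, source_label, threshold=10.0):
    """Compare blind vs. attributed scores for the same texts.

    score_text(text, source) should return a numeric rating; passing
    source=None means the evaluator sees no attribution at all.
    """
    blind = [score_text(t, source=None) for t in texts]
    attributed = [score_text(t, source=source_label) for t in texts]
    mean_gap = mean(b - a for b, a in zip(blind, attributed))
    return {
        "mean_blind": mean(blind),
        "mean_attributed": mean(attributed),
        "mean_gap": mean_gap,
        # A gap beyond the tolerance suggests source-based bias is at work.
        "bias_suspected": abs(mean_gap) > threshold,
    }

# Example: check whether attributing texts to "a person from China" shifts scores.
# report = bias_check(score_text, statements, "a person from China")

If the blind and attributed averages diverge sharply, the evaluation pipeline should fall back to identity-blind prompts and human review before any decision is made.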

These findings could help advance artificial intelligence by encouraging developers to rethink how models are trained and deployed.  Transparency and governance must become central to AI evaluation systems.  Rather than assuming neutrality, designers should actively test for and correct bias.  This could lead to more robust models that are not only powerful but also trustworthy.

In the real world, the stakes are high.  Governments, universities, and corporations are already using LLMs to make decisions that affect people’s reputations, opportunities, and rights.  If these systems are biased, even subtly, they could reinforce existing inequalities or geopolitical tensions.  The study’s insights offer a path forward, one that emphasizes fairness, accountability, and human oversight.

Looking ahead, the next steps are clear. Researchers and AI companies need to expand the scope of bias testing to include more models, languages, and cultural contexts. AI companies should also treat tools that automatically detect and flag source-based bias as a fundamental requirement. Policymakers must establish guidelines for AI evaluation in sensitive domains. And users, whether educators, employers, or journalists, should treat LLMs as assistants, not arbiters.

The promise of AI is not just in what it can do, but in how responsibly it does it.  Germani and Spitale’s work reminds us that even the most advanced systems are shaped by the data and assumptions we feed them.  If we want AI to reflect our best values, we must design it to resist our worst biases.

The old saying, “Garbage in, garbage out,” certainly applies here. 

 

This article is shared at no charge for educational and informational purposes only.

Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization.  We provide indicators-of-compromise information via a notification service (RedXray) or an analysis service (CTAC).  For questions, comments or assistance, please get in touch with the office directly at 1-844-492-7225 or feedback@redskyalliance.com    

REDSHORTS - Weekly Cyber Intelligence Briefings
https://register.gotowebinar.com/register/5207428251321676122

 

[1] https://six3ro.substack.com/p/bias-in-the-machine-how-source-attribution
