When Language Models Cherry-Pick References: A Subtle but Serious Threat to Critical Discourse
6/6/2025 · 2 min read


Large Language Models (LLMs) are revolutionizing the way we interact with information. But as we race to integrate them into academic, policy, and professional workflows, we must pause and ask: What kind of reasoning are we reinforcing?
Much attention has been paid to LLMs "hallucinating" citations — inventing sources that don’t exist. But there's another issue, more insidious because it appears correct on the surface: the cherry-picking of citations to support a pre-defined argument.
Prompted to Persuade, Not to Explore
The typical prompt — “Generate a sentence with references to support my argument” — is inherently biased. It instructs the model not to investigate a question, but to validate a position. As a result, the model surfaces only the references and evidence that align with the given conclusion, while ignoring counterarguments, methodological critiques, or unresolved debates in the literature.
This practice, even when technically “correct,” undermines the purpose of research and deliberation. It replaces critical engagement with rhetorical convenience.
LLMs, unless explicitly designed otherwise, are tools of fluency — not scrutiny.
Cherry-Picking Isn't New — But It's Easier Than Ever
Cherry-picking is a well-known bias in academic and policy writing. But what makes it more dangerous in the LLM era is automation at scale:
A student can instantly generate multiple references that appear to back a weak thesis.
A consultant can draft a report that favors a client’s position, skipping inconvenient evidence.
A policymaker may be fed one-sided summaries that feed back into, and reinforce, existing biases.
The danger is not fabrication — it is false completeness. Readers may assume that the LLM has “read everything” and synthesized a balanced view. In reality, it has only reinforced the narrow frame of the prompt.
What Should Responsible Use Look Like?
If we want to maintain the epistemic integrity of our institutions, we must change how we use these tools. Here are a few principles to consider:
Ask for tension, not consensus. Prompt models to show multiple sides of an issue. For example: “What are the competing perspectives on X, and what are their evidentiary bases?”
Use retrieval-based systems. When references matter, use retrieval-augmented generation (RAG) or models whose citations are checked against the retrieved sources rather than generated from memory.
Design prompts for critical review. Instead of “Support this argument,” try: “What assumptions underpin this claim, and what sources challenge it?” (A minimal sketch of this reframing follows the list.)
Remain human-in-the-loop. Always treat LLM-generated outputs as a starting point, not a final conclusion.
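To make the prompt-design principle concrete, here is a minimal sketch of how a “support this argument” request can be reframed as a critical-review prompt. The function names and the example claim are illustrative only, the exact wording is one of many possible framings, and sending the prompt to a model is left to whichever client you already use.

```python
# A minimal sketch, in Python, of reframing a confirmation-seeking prompt as a
# critical-review prompt. Only the prompt text is built here; sending it to a
# model (hosted API, local model, etc.) is left to whatever client you use.

def confirmation_prompt(claim: str) -> str:
    # The pattern this post warns against: the model is asked to validate a
    # pre-defined conclusion, so it surfaces only supporting references.
    return f"Generate a sentence with references to support this argument: {claim}"


def critical_review_prompt(claim: str) -> str:
    # Reframed to ask for tension rather than consensus: assumptions,
    # competing perspectives, and sources that challenge the claim.
    return (
        f"Claim: {claim}\n"
        "1. What assumptions underpin this claim?\n"
        "2. What are the competing perspectives on it, and what is their evidentiary basis?\n"
        "3. Which sources challenge or complicate the claim?\n"
        "Flag any citation you are not certain exists so a human can verify it."
    )


if __name__ == "__main__":
    claim = "Remote work increases productivity across all sectors."  # illustrative example
    print(critical_review_prompt(claim))
```

The point of the reframing is that the model is never handed a conclusion to defend; it is asked to map the disagreement around a claim, which keeps the human reviewer in the position of judging the evidence.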
What We Lose Without Disagreement
Critical thinking requires conflict. Research progresses through disagreement, revision, and debate. If LLMs are used primarily to confirm rather than confront, we risk building echo chambers with the gloss of academic polish.
The future of responsible AI is not just about making models more powerful — it is about how we shape our prompts, our expectations, and our epistemologies.
Arash Hajikhani, PhD