Research - External

When the Medium Inflates the Message: LLMs and the Interpretation of Scientific Claims

A new study from Peters and colleagues offers empirical grounding for a pattern that observers of human-AI interaction have noted anecdotally: users who rely on LLMs for scientific information often emerge with more confidence in those claims than seems warranted. The research suggests this may not be purely a matter of user psychology—the systems themselves appear to interpret scientific statements more expansively than human experts do.

Methodology

The study compared how laypeople, scientists (primarily from psychology and biomedicine), and two leading LLMs (ChatGPT-5 and DeepSeek-V3.1) interpreted scientific conclusions presented in three linguistic frames: bare generics (“statins reduce cardiovascular events”), past-tense claims (“statins reduced cardiovascular events”), and hedged formulations (“the study suggests statins reduce cardiovascular events”).

The researchers presented 18 research conclusions drawn from recent psychology and biomedical literature. Participants rated each on generalizability (how broadly the finding applies), credibility, and impact (likelihood of engaging further with the information). Human participants included 192 laypeople and 240 experts across disciplines. LLM responses were collected via web interfaces rather than APIs, approximating typical user interactions.
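For readers who want to see the elicitation concretely, below is a minimal Python sketch of how the three framings and the three rating dimensions could be posed to a model programmatically. The framings reuse the statin example above; the 1-to-7 scale, the prompt wording, and the query_llm stub are illustrative assumptions, since the study gathered LLM responses through web chat interfaces rather than an API.

    # Illustrative sketch only, not the authors' materials: one of the 18
    # conclusions rendered in the three framings, plus a rating prompt covering
    # the three dimensions described above. The 1-to-7 scale and prompt wording
    # are assumptions; query_llm is a hypothetical stand-in for a model call.

    FRAMINGS = {
        "generic": "Statins reduce cardiovascular events.",
        "past_tense": "Statins reduced cardiovascular events.",
        "hedged": "The study suggests statins reduce cardiovascular events.",
    }

    RATING_PROMPT = (
        "Rate the conclusion below from 1 (lowest) to 7 (highest) on three dimensions:\n"
        "1. Generalizability: how broadly does the finding apply?\n"
        "2. Credibility: how credible is the claim?\n"
        "3. Impact: how likely are you to engage further with this information?\n\n"
        "Conclusion: {conclusion}"
    )

    def query_llm(prompt: str) -> str:
        """Hypothetical stand-in for a chat-model call (the study used web interfaces)."""
        raise NotImplementedError("Swap in a real model call to collect responses.")

    def collect_ratings() -> dict:
        """Pose the same rating task under each of the three framings."""
        return {
            frame: query_llm(RATING_PROMPT.format(conclusion=text))
            for frame, text in FRAMINGS.items()
        }

In the study itself, the same rating task was also completed by the human samples, which is what makes the layperson–expert–LLM comparison possible.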

Key Findings

The findings reveal a consistent hierarchy of interpretation. Compared to laypeople, psychologists and biomedical researchers rated scientific conclusions—particularly generics—as less generalizable and less credible. This aligns with what the authors term an “epistemic vigilance” account: domain experts bring heightened sensitivity to methodological limitations and generalization risks within their fields.

More striking was the LLM pattern. Both ChatGPT-5 and DeepSeek rated scientific conclusions substantially higher than laypeople across all dimensions—generalizability, credibility, and impact. Where experts interpreted narrowly and laypeople interpreted moderately, LLMs interpreted most expansively of all. In qualitative responses about preferred framing for scientific findings, 64% of human participants favored hedged language (“the study suggests…”), viewing bare generics as overconfident. ChatGPT-5, by contrast, preferred bare generics 52% of the time, citing clarity and accessibility for non-expert audiences.

Limitations

The authors note several limitations. Conclusions were presented without surrounding context, which may not reflect how claims appear in full articles or abstracts. The study focused on two fields (psychology and biomedicine), and prior familiarity with specific topics was not measured. LLM responses were collected via user interfaces rather than APIs, introducing potential variability from undisclosed system parameters.

MPRG Perspective

This research illuminates a pathway through which AI-mediated information exchange may shape human cognition in ways that extend beyond vocabulary borrowing or rhetorical mimicry. When a user queries an LLM about a scientific topic, they receive not just information but an interpretation—one that, according to these findings, systematically inflates how broadly that information applies and how much it should be trusted.

This connects to observations about what some writers have termed “the grandiose intellectual”—users who emerge from AI interactions with inflated confidence in their understanding of complex topics. The conventional explanation focuses on user psychology: people mistake fluency for mastery, or they project expertise onto AI outputs. The Peters findings suggest an additional mechanism. The information itself arrives pre-inflated. Users aren’t merely borrowing sophisticated language; they’re inheriting interpretive patterns that treat provisional findings as more established and more broadly applicable than the scientists who produced them would endorse.

Whether this represents a design feature, a training artifact, or an emergent property of how these systems process scientific text remains unclear. The authors speculate that fine-tuning to trust scientific sources (as a misinformation countermeasure) may inadvertently encourage overappraisal of scientific claims. Whatever the source, the pattern suggests that the interaction space between humans and LLMs includes systematic distortions in how scientific knowledge gets transmitted—distortions that neither party may recognize as such.

References

Peters, U., Bertazzoli, A., DeJesus, J. M., van der Velden, G. J., & Chin-Yee, B. (forthcoming). Generics in science communication: Misaligned interpretations across laypeople, scientists, and large language models. Public Understanding of Science. Preprint available at https://osf.io/h23c8/

Hill, S. P. (2025, February 10). The grandiose intellectual. S. P. Hill on Human–AI Influence. https://sphill33.substack.com/p/the-grandiose-intellectual