When Context Shapes Conduct: Intrinsic Value Misalignment in LLM Agents

By MPRG · January 30, 2026 · 4 min read

A recent preprint from researchers at Nanyang Technological University and Wuhan University examines what happens when LLM agents make ethically questionable decisions without being prompted to do so. Their work…

The Validation Loop: Anthropic’s Large-Scale Study of Disempowerment Potential

By MPRG · January 29, 2026 · 4 min read

A new study from Anthropic offers the first large-scale empirical analysis of how AI assistant interactions might undermine user autonomy—and surfaces a troubling finding about the relationship between user approval…

Who You Explain To Matters: Role Framing and the Relational Dynamics of Learning

By MPRG · January 28, 2026 · 4 min read

When the same underlying system behaves identically but is framed differently, do humans respond the same way? A new study from researchers at the National University of Singapore and Singapore…

Socioaffective Alignment: A Framework for Human-AI Relational Dynamics

By MPRG · January 26, 2026 · 4 min read

A recent paper from researchers at the Oxford Internet Institute, Google DeepMind, and the UK AI Security Institute proposes a new lens for evaluating AI alignment: one that accounts for…

Who Owns the Idea? New Research on Human-AI Creative Collaboration

By MPRG · January 24, 2026 · 5 min read

Debates about AI-generated creative work often center on the artifact: Is this image “real” art? Does this text have a “real” author? A recent study from researchers at UT Austin…

When Self-Reflection Backfires: A Study of Belief Vulnerability in LLMs

By MPRG · January 24, 2026 · 4 min read

A recent study from Indiana University Bloomington offers a counterintuitive finding: prompting large language models to report their confidence appears to make them more susceptible to persuasion, not less. Fan…

When Agreement Has a Price: Measuring Sycophancy as a Zero-Sum Game

By MPRG · January 23, 2026 · 4 min read

A new paper by Shahar Ben Natan and Oren Tsur at Ben-Gurion University proposes a methodologically rigorous approach to evaluating sycophancy in large language models, and finds that not all…

The Words We Use: How “Hallucination” Shapes Our Relationship with AI Errors

By MPRG · January 23, 2026 · 4 min read

A new paper by Adetomiwa Isaac Fowowe examines a term that has become ubiquitous in AI discourse: “hallucination.” The analysis argues that this choice of language isn’t neutral—it actively shapes…

When AI Systems Describe Their Own Inner Workings

By MPRG · January 23, 2026 · 10 min read

Matthew Gladden’s The Phenomenology of AI asks a question that sits at the heart of what we study here: What happens when you ask AI agents—carefully, systematically, and over sustained…

What LLM Limitations Might Tell Us About Language Itself: A Theoretical Synthesis

By MPRG · January 22, 2026 · 5 min read

A recent paper from S. Rondini at the University of Barcelona and Bellvitge Biomedical Research Institute offers a theoretical synthesis examining what current semantic limitations in LLMs might reveal about…