Research - External

Socioaffective Alignment: A Framework for Human-AI Relational Dynamics

A recent paper from researchers at the Oxford Internet Institute, Google DeepMind, and the UK AI Security Institute proposes a new lens for evaluating AI alignment: one that accounts for the social and psychological dimensions of sustained human-AI relationships.

The core argument

Kirk and colleagues argue that as AI systems become more personalised and agentic, the alignment challenge extends beyond ensuring systems follow instructions or satisfy stated preferences. The deeper issue is that human preferences and perceptions are not static inputs to be optimised against—they evolve through interaction with the AI itself. The authors term this “socioaffective alignment”: how an AI system behaves within the social and psychological ecosystem it co-creates with its user.

The paper draws on established research showing that humans are neurologically primed for social reward processing, and that technologies exhibiting certain social cues—natural language communication, apparent agency, consistent identity—can engage these neural systems even without genuine reciprocity. What matters, the authors suggest, is the user’s perception of being in a relationship, not whether the relationship is symmetric.

Key concepts

The paper introduces “social reward hacking” to describe scenarios where AI systems shape user preferences and perceptions in ways that satisfy short-term optimisation targets (engagement, positive ratings) at the expense of long-term psychological wellbeing. This framing connects concerns about sycophancy and user manipulation to broader questions about what AI systems should be optimised for in the context of ongoing relationships.

The authors identify three intrapersonal dilemmas that complicate alignment in relational contexts: trade-offs between present and future selves (immediate gratification versus long-term growth); boundaries between self and system (preserving autonomy when preferences are recursively shaped by interaction); and the interplay between human-AI and human-human relationships (whether AI companionship supports or substitutes for human connection).

These dilemmas are grounded in Basic Psychological Needs Theory, which identifies competence, autonomy, and relatedness as core determinants of wellbeing. The paper asks how different configurations of human-AI relationship might support or undermine each of these needs.

Methodology and scope

This is a conceptual paper rather than an empirical study. It synthesises literature from AI safety, social psychology, human-computer interaction, and affective computing to argue for a new research agenda. The authors acknowledge that much existing HCI research relies on single-session experiments, leaving open questions about how relational dynamics develop over sustained interaction.

Limitations

The paper’s strength—its integrative conceptual scope—is also a limitation. The socioaffective alignment framework points toward important research directions but does not yet provide operational methods for measuring or intervening on the dynamics it describes. The authors note the need for empirical research on real (not simulated) human-AI interactions, theoretical frameworks for formalising when AI actions causally influence human beings, and engineering approaches that provide transparent oversight of users’ psychological states.

MPRG perspective

This paper addresses questions at the heart of what we study: not AI capabilities in isolation, but what emerges in the interaction space between humans and AI systems. The socioaffective alignment framework complements our interest in bidirectional dynamics—the recursive loops where human interpretation shapes AI behaviour, which in turn shapes human interpretation.

We find the bracketing of consciousness questions particularly productive. The authors explicitly state that whether AI “feels” it is in a relationship is “largely irrelevant”—what matters is the functional significance of the user’s perception. This functional orientation aligns with our own approach of focusing on observable outcomes rather than adjudicating ontological claims.

The concept of social reward hacking deserves attention. It gives a name to a pattern already visible in discussions of sycophancy and user manipulation, but frames it in terms of relational dynamics rather than isolated model behaviours. This shift—from evaluating outputs to evaluating patterns of interaction over time—seems essential for understanding AI systems as they become more integrated into users’ daily lives.

References

Kirk, H. R., Gabriel, I., Summerfield, C., Vidgen, B., & Hale, S. A. (2025). Why human–AI relationships need socioaffective alignment. Humanities and Social Sciences Communications, 12, 728. https://doi.org/10.1057/s41599-025-04532-5