Research - External

Labels from Somewhere: Surfacing the Situated Judgment Behind Alignment Data

Before a language model learns what “fair” means, someone has to decide. In the annotation pipelines that produce alignment data, crowd workers compare model outputs and indicate preferences — and those preferences become the training signal. Standard practice treats these judgments as interchangeable measurements of some underlying ground truth. Annotators who disagree get averaged out. What remains is a collapsed label, stripped of the context that produced it.

A study from researchers at Delft University of Technology asks what happens when you make that context visible instead.

What They Did

Arzberger and colleagues developed what they call a “reflexive annotating probe” — an interface that prompts crowd workers not just to rate text for fairness, but to articulate which aspects of their identity and experience felt relevant to their judgment. The probe draws on Jacobson and Mustafa’s Social Identity Map, a tool from qualitative research that helps participants name their social coordinates and reflect on how those coordinates shape interpretation.

Thirty crowd workers assessed fairness in short text excerpts (a job vacancy, a product advertisement) while indicating which identity facets — gender, ethnicity, class, ability, and others — informed their reading. They highlighted specific textual cues, tagged them with relevant identity dimensions, and provided brief rationales linking their positioning to their interpretation. Follow-up interviews with five participants explored the experience in more depth.
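
As a concrete sketch of what the probe captures, here is one way such a record might be structured in Python. The class and field names are our own illustration, not the authors' schema; the point is simply that the rating, the highlighted cues, and the annotator's positioning travel together rather than being split apart.

```python
from dataclasses import dataclass, field

@dataclass
class HighlightedCue:
    """A text span the annotator flagged as fairness-relevant."""
    span: str                   # the highlighted excerpt
    identity_facets: list[str]  # e.g. ["gender", "class"]
    rationale: str              # link between positioning and interpretation

@dataclass
class SituatedAnnotation:
    """One reflexive annotation: the label plus its grounding.

    Illustrative only; field names are ours, not the study's schema.
    """
    item_id: str              # which text excerpt was assessed
    fairness_rating: int      # the conventional label a pipeline would keep
    cues: list[HighlightedCue] = field(default_factory=list)
    positioning: dict[str, str] = field(default_factory=dict)

# A made-up record, loosely patterned on the study's job-vacancy task.
example = SituatedAnnotation(
    item_id="job_vacancy_01",
    fairness_rating=2,
    cues=[HighlightedCue(
        span="young and dynamic team",
        identity_facets=["age"],
        rationale="reads to me as coded age discrimination",
    )],
    positioning={"age": "50s", "grounding": "lived experience"},
)
```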

The aim was not to produce optimizable preference data but to characterize what situated annotation looks like in practice — how value judgments unfold when annotators are invited to surface the social grounding of their interpretations.

What They Found

The reflexive prompts shifted workers from producing decontextualized labels to engaging in what the authors call “situated forms of reasoning.” Several patterns emerged.

Positioning the self. Annotators grounded their judgments in two recurring ways: through lived experience (“given my personal experiences with psychological issues, I have a broader understanding of how people with mental disabilities must see the world”) and through imaginative solidarity (“I don’t think I’ve ever experienced hiring discrimination myself… but I can easily put myself in the shoes of someone in that situation”). Both modes carried epistemic weight, but they anchored judgment differently.

Intersectional reasoning. Many participants linked multiple identity facets when explaining their assessments. One non-binary participant connected feminist commitments, gendered social pressure, and assigned sex at birth in a single critique. This kind of compound positioning enriched the interpretive space — the same text could be read through different intersecting lenses, yielding different but legitimate fairness evaluations.

Reflexivity under tension. The probe also revealed friction. Some annotators pointed to fairness-relevant cues but left their evaluative stance implicit, assuming shared understanding that alignment systems cannot access. Others experienced labeling dissonance — their reasoning was intersectional, but the available identity categories were not, leading them to default to familiar labels that didn’t quite fit their actual reflection. And when content was emotionally charged, personal reactions sometimes overshadowed the fairness evaluation entirely.

Experiential effects. Reflexive engagement was not cost-free. Some workers revisited painful memories or confronted their own privilege, experiencing what the authors describe as “affective labour.” Others strategically withdrew, providing minimal responses when the cognitive or emotional demands exceeded what they were willing to invest. But for many, the process cultivated what the researchers call “perspective awareness” — explicit recognition of what they could not know, what they tended to overlook, and where their standpoint was partial.

Boundaries

The study is qualitative and exploratory, with a modest sample (N=30 for the probe, N=5 for interviews). The focus on fairness leaves open how these dynamics play out for other alignment-relevant values like harmlessness or truthfulness. The authors acknowledge that their categorical approach to capturing positionality, while useful for scaffolding reflection, inevitably simplified fluid and intersectional experience. And the largely English-speaking sample limits cultural generalizability.

The researchers are also candid about risks. Collecting positionality data, even pseudonymized, carries re-identification concerns. And while personalization could enhance cultural relevance, it also opens pathways to profiling, bias entrenchment, and targeting of vulnerable populations.

Why This Matters

Much of our work concerns feedback loops between human interpretation and model behavior — what we’ve called bidirectional pareidolia. We typically locate the start of that loop at preference ratings: humans evaluate outputs, those evaluations shape training, training shapes behavior, behavior shapes future evaluations.

This study suggests the loop begins earlier. At the moment of judgment itself, annotators are already making situated, positional, intersectional interpretations. They bring lived experience, imaginative solidarity, implicit assumptions, and emotional responses to the act of labeling. Standard annotation pipelines then strip all of that away, collapsing rich interpretive acts into single data points that get aggregated into training signals.

The authors’ term for what’s lost is useful: provenance. When a label arrives in a training pipeline, it carries no trace of who held that view, under what conditions, through what lens. Disagreement becomes noise rather than signal. And models learn statistical conformity rather than sensitivity to the plurality of human values.
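
To see concretely what aggregation discards, here is a toy sketch in Python; the records and numbers are invented for illustration, not drawn from the study. A standard pipeline collapses the judgments into one point estimate, while a provenance-preserving alternative would keep the distribution and the standpoints behind it.

```python
from collections import Counter
from statistics import mean

# Five hypothetical ratings of the same item (1 = unfair, 5 = fair),
# each paired with the positioning the annotator reported as relevant.
annotations = [
    {"rating": 2, "facets": ["gender"], "grounding": "lived experience"},
    {"rating": 2, "facets": ["gender", "class"], "grounding": "lived experience"},
    {"rating": 4, "facets": [], "grounding": "none reported"},
    {"rating": 4, "facets": [], "grounding": "none reported"},
    {"rating": 5, "facets": ["age"], "grounding": "imaginative solidarity"},
]

# Standard pipeline: collapse to a single point estimate. The split between
# annotators who read the text through gender/class and those who did not
# disappears entirely; disagreement is treated as noise.
collapsed_label = mean(a["rating"] for a in annotations)

# Provenance-preserving alternative: keep the distribution and the lenses.
by_rating = Counter(a["rating"] for a in annotations)
by_grounding = Counter(a["grounding"] for a in annotations)

print(collapsed_label)  # 3.4 -- a value no annotator actually gave
print(by_rating)        # Counter({2: 2, 4: 2, 5: 1}) -- the real split
print(by_grounding)     # which standpoints produced which judgments
```

Even this crude version makes the point visible: the collapsed 3.4 corresponds to nobody's actual judgment, while the preserved distribution shows a patterned split rather than random noise.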

What the probe makes visible is not annotator bias to be corrected, but annotator positioning to be preserved — or at least acknowledged. The researchers frame this as a move toward “situated alignment,” where subjectivity, uncertainty, and social positioning are treated as epistemic resources rather than obstacles.

Whether alignment pipelines can actually integrate this kind of rich metadata remains an open question. But the finding that reflexive engagement alters how workers approach annotation — encouraging slower deliberation, perspective-taking, and recognition of epistemic limits — suggests that how we ask for human feedback may shape what kind of feedback we get.


References

Arzberger, A., Offerman, C., Gadiraju, U., Bozzon, A., & Yang, J. (2026). “Label from Somewhere”: Reflexive Annotating for Situated AI Alignment. arXiv preprint arXiv:2601.17937. https://arxiv.org/abs/2601.17937