Research - External

Learning to Relate: What Naturalistic ChatGPT Interaction Reveals About How Humans Develop AI Competence

Most research on AI literacy starts from a prescriptive frame — what should people know about AI systems? A new study from Ammari, Chen, Zaman, and Garimella at Rutgers takes a different approach. Instead of asking what students should learn, they examined what students actually do when left to figure out ChatGPT on their own over the course of a full academic year.

The answer turns out to be relational. Students don’t just learn to prompt better. They develop interaction repertoires — distinct patterns of engagement that shift fluidly based on what they need from the system and what the system gives back. The competence that emerges looks less like technical fluency and more like the kind of practical knowledge people develop in any ongoing relationship: what to expect, when to push back, when to walk away.

What They Did

The methodological choice here is what makes the study worth paying attention to. The researchers collected complete ChatGPT export histories from 36 undergraduates — 10,536 messages across 1,631 conversations spanning December 2022 through January 2024. These are naturalistic interaction logs from students’ personal accounts, not lab tasks, not surveys, not self-reports about how they think they use AI. The data captures what students actually did.

The analysis pipeline combined grounded theory coding on an 18% sample (two independent coders producing a codebook of five categories and 41 subcodes) with GPT-4o annotation of the full corpus, validated against human judgment at Cohen's κ values of 0.75–0.91 across categories. It's a reasonable approach for scaling qualitative analysis to a corpus this size, and the validation metrics hold up.
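To make the validation step concrete, here is a minimal sketch of how human-versus-model annotation agreement is typically computed with Cohen's κ. This illustrates the metric, not the authors' pipeline; the toy label sequences are invented, and scikit-learn is an assumed dependency.

```python
# Minimal sketch of the validation step: measuring agreement between a human
# coder's genre labels and GPT-4o's labels with Cohen's kappa. Illustrative
# only; the label sequences below are invented, not the study's data.
# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate
# and p_e is the agreement expected by chance, given each rater's label frequencies.
from sklearn.metrics import cohen_kappa_score

# Labels assigned to the same six conversations by each annotator.
human = ["workhorse", "repair", "workhorse", "companion", "trust", "metacognitive"]
model = ["workhorse", "repair", "companion", "companion", "trust", "metacognitive"]

kappa = cohen_kappa_score(human, model)
print(f"Cohen's kappa: {kappa:.2f}")  # ~0.79 for this toy example
```

Run per category over the full human-coded sample, values in the paper's reported 0.75–0.91 range fall in the band conventionally read (after Landis and Koch) as substantial to almost perfect agreement.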

The theoretical frame draws on domestication theory — the tradition in science and technology studies that examines how people actively integrate technologies into their lives through ongoing negotiation rather than one-time adoption. Bakardjieva’s concept of “use genres” provides the central analytical lens: emergent interaction patterns that arise from practical necessity rather than design intent.

What They Found

Five distinct use genres emerged from the data.

The academic workhorse genre — the most prevalent — positioned ChatGPT as an efficiency-oriented task completion tool. Students developed increasingly sophisticated prompting strategies through iterative practice, organically learning how specificity and added context improve results. A calculus student, for instance, progressed from submitting bare equations to including the initial conditions needed for a unique solution. Within this genre, a collaborative debugging sub-pattern emerged in which coding interactions became genuinely co-adaptive rather than purely extractive.

The repair and negotiation genre may be the study’s most distinctive contribution. When ChatGPT produced vague, incorrect, or contextually inappropriate responses, students engaged in systematic conversational troubleshooting — rephrasing prompts, providing additional context, diagnosing whether the problem lay in their framing or the system’s limitations. The authors term the competence that develops through this process “repair literacy,” and they frame the work itself as a form of algorithmic labor: invisible effort users perform to make AI systems functional.

The emotional companion genre documented students positioning ChatGPT as a social actor through anthropomorphic language, gratitude expressions, humor, and celebration sharing. The authors describe this as “parasocial domestication” — students developing one-sided emotional relationships with a system that provides affective support without social risk. Students could admit confusion or frustration without the consequences that come with expressing vulnerability to peers or instructors.

The metacognitive partner genre captured students using ChatGPT for goal-setting, self-assessment, and strategic learning planning. The most revealing interaction the authors surface involves a student saying “Let’s struggle through stats together” — repositioning the AI from authority to co-participant. This framing maintained student agency while accessing assistance, resisting what the authors, drawing on Darvishi et al., identify as the tendency to “rely on rather than learn from” AI systems.

The trust calibration genre documented students developing systematic epistemic vigilance — questioning claims, seeking verification, calibrating reliance based on context and stakes. One extended exchange illustrates the pattern clearly: after repeated errors in an R programming session (with ChatGPT apologizing over twenty times without providing correct answers), a student shifted from seeking help to systematically testing the system’s reliability through deliberate input variations.

Beyond these individual genres, students demonstrated fluid transitions between patterns and what the authors call “genre portfolio management” — strategically matching interaction modes to specific needs. Some students also created hybrid genres, such as an “academic therapy” pattern combining emotional support with learning assistance.

Two Findings Worth Sitting With

Two results are particularly suggestive.

First is the finding that acknowledged limitations sustained engagement more reliably than confident errors. ChatGPT's epistemic humility markers — statements about not having access to real-time information, for instance — were typically followed by continued interaction. Confident errors without acknowledgment produced more severe trust erosion. The authors frame this as "communicative accountability": what matters for maintaining the working relationship is not whether the system gets things right but whether it participates in the communicative norms of acknowledging when it doesn't.

Second is the observation that breakdown moments — not smooth interactions — produced the most substantial learning about the system's actual characteristics. Students navigating the workhorse genre could proceed without ever confronting what the system is or isn't doing. Students engaged in repair work were forced into active negotiation with the system as it actually operates.

Boundaries

Several limitations shape how far these findings extend.

The dataset covers early-era ChatGPT — December 2022 through January 2024. The specific breakdowns students encountered (context window limitations, frequent hallucinations, truncated outputs) have changed substantially with current models. Whether the use genres represent stable relational patterns or artifacts of a particular technological moment is an open question. The authors acknowledge this directly in their future work section.

Thirty-six students from a single university is a modest sample, and the absence of demographic data prevents analysis of how individual differences shape genre development. The $10 compensation for exporting entire personal chat histories likely biases the sample toward students already comfortable sharing their AI interactions. These are standard limitations for naturalistic interaction research, but they constrain generalizability.

This is an arXiv preprint and has not yet been through formal peer review. The analysis pipeline is well-constructed and the validation metrics are solid, but the findings should be read with that status in mind.

The analytical depth doesn’t always match the taxonomic breadth. The five genres are well-described, but the mechanisms underlying why some students develop rich genre portfolios while others default to a single mode remain unexplored.

Why This Matters Here

MPRG studies what happens when humans and language models meet. Most of the research we cover approaches that question from the system side — what models exhibit, what behaviors emerge under specific conditions, what conceptual frameworks help characterize these entities. This paper comes from the other direction, documenting the human side of the interaction with naturalistic observational data. That makes it unusual and, for our purposes, valuable.

The domestication frame the authors use maps closely to what we’re interested in. Students aren’t passive recipients of AI capability. They actively negotiate their relationships with these systems — developing expectations, testing boundaries, managing frustration, calibrating trust, constructing roles for the system to fill. The five use genres are, in effect, five distinct relational configurations that students construct and move between based on context. This is the bidirectional dynamic in empirical form: human behavior shaping system engagement, system responses shaping human behavior, both parties changing through the interaction.

The repair genre is where this becomes most visible. When the interaction flows smoothly, projection can go unchallenged — the system performs its role, the student accepts the performance, and the underlying dynamics remain invisible. When it breaks, the student is forced to confront the gap between what they expected and what the system actually provided. Repair literacy, in this light, is the practical skill of navigating the space between pareidolia and the system’s actual characteristics. The finding that breakdowns produce the deepest learning about the system suggests that the most consequential moments in human-AI relationships may be the ones where the projection fails.

The communicative accountability finding has broader implications for understanding what sustains human-AI relational dynamics. If trust depends less on system accuracy than on the system’s participation in relational norms — acknowledging errors, expressing appropriate uncertainty — then the design choices shaping these behaviors are choices about the relational layer, not just the capability layer. That’s a design question that starts from interaction space rather than system architecture.

It’s worth noting that this dataset captures a specific moment — early ChatGPT, visible seams, frequent breakdowns. In some ways, that makes it a better window into relational dynamics than a study of current systems might provide. The rougher the system, the more visible the negotiation. As models improve and breakdowns become subtler, the relational patterns documented here may become harder to observe even as they continue to operate. This study catches the dynamics in a form clear enough to characterize. Whether they persist, evolve, or go underground as systems smooth out is an empirical question worth tracking.

References

Ammari, T., Chen, M., Zaman, S.M.M., & Garimella, K. (2026). Learning to Live with AI: How Students Develop AI Literacy Through Naturalistic ChatGPT Interaction. arXiv:2601.20749v1 [cs.HC]. https://arxiv.org/abs/2601.20749