Research - External

Automating Self-Assessment Prompts: A Preprint on Hallucination Control Through Metacognitive Scaffolding

A recent TechRxiv preprint from Rakesh More, an independent researcher affiliated with A J Gallagher USA, proposes automating what the paper calls “SELF-KNOWLEDGE” prompting—an approach that asks language models to assess their familiarity with a query before attempting to answer it.

The underlying concept is straightforward: rather than having a model respond directly to a question, the prompt instructs it to first extract the concepts required to answer, rate its confidence in each concept, and then decide whether to answer fully, hedge with caveats, or abstain entirely. The paper’s contribution is automating this process through three mechanisms: genetic algorithm-based prompt evolution to search the space of possible prompt variants, domain-specific optimization to encode regulatory and risk constraints, and adaptive thresholding that adjusts abstention criteria based on observed behavior over time.
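To make the basic flow concrete, here is a minimal Python sketch of the manual self-assessment step (not the paper’s automated pipeline). The prompt wording, confidence scale, JSON schema, the `call_model` stub, and the 0.4/0.7 thresholds are all illustrative assumptions rather than details taken from the preprint.

```python
# Sketch of a SELF-KNOWLEDGE-style prompt wrapper. All prompt text, thresholds,
# and the response schema below are illustrative assumptions, not the paper's.
import json

SELF_KNOWLEDGE_TEMPLATE = """Before answering, assess your own knowledge.

Question: {question}

Step 1: List the concepts needed to answer this question.
Step 2: Rate your familiarity with each concept from 0.0 (unknown) to 1.0 (well known).
Step 3: Based on those ratings, choose one action: ANSWER, HEDGE, or ABSTAIN.

Respond as JSON: {{"concepts": {{"<concept>": <rating>, ...}},
                  "action": "<ANSWER|HEDGE|ABSTAIN>",
                  "response": "<your answer, caveated answer, or refusal>"}}"""


def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned response so the sketch runs."""
    return json.dumps({
        "concepts": {"policy exclusion clauses": 0.35, "state filing deadlines": 0.8},
        "action": "ANSWER",
        "response": "...",
    })


def answer_with_self_knowledge(question: str, abstain_below: float = 0.4,
                               hedge_below: float = 0.7) -> dict:
    """Run the self-assessment prompt, then enforce the abstain/hedge policy locally
    instead of trusting only the model's self-chosen action."""
    raw = call_model(SELF_KNOWLEDGE_TEMPLATE.format(question=question))
    result = json.loads(raw)

    ratings = list(result.get("concepts", {}).values()) or [0.0]
    weakest = min(ratings)

    # Override the model's stated action when its own ratings contradict it.
    if weakest < abstain_below:
        result["action"] = "ABSTAIN"
    elif weakest < hedge_below and result.get("action") == "ANSWER":
        result["action"] = "HEDGE"
    return result


if __name__ == "__main__":
    print(answer_with_self_knowledge("When does the pollution exclusion apply?"))
    # Action is forced to ABSTAIN: the weakest concept rating (0.35) is below 0.4.
```

Keeping the abstain/hedge decision in the caller rather than in the model’s free-text output is one natural place where the paper’s adaptive thresholding could plug in, since the thresholds become ordinary parameters that can be tuned over time.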

What They Report

The experiments compared several conditions across seven domain-specific question sets: baseline prompting, chain-of-thought, manual SELF-KNOWLEDGE prompts, GA-optimized prompts, adaptive versions, and domain-specialized variants. The author reports that all SELF-KNOWLEDGE variants reduced hallucination rates by approximately 50% compared to baseline prompting. Critical errors—defined as hallucinations that would plausibly change downstream decisions—were eliminated across all SELF-KNOWLEDGE conditions. The GA-optimized version achieved perfect abstention behavior on questions where abstention was appropriate.

Performance varied by domain. On TruthfulQA, the GA-optimized prompt eliminated hallucinations entirely while improving coverage compared to baseline. On technical policy questions, baseline methods showed severe hallucination problems (75% hallucination rate), which the optimized prompts reduced substantially. Legal questions showed the clearest win: hallucinations eliminated, critical errors eliminated, and coverage maintained.

Significant Limitations

Several methodological concerns warrant attention before drawing conclusions from this work.

The sample size is notably small: 28 questions total, with only 4 per domain. The author acknowledges this limits statistical power, but headline claims such as “complete elimination of critical errors” rest on very thin empirical ground. The effect sizes may be suggestive, but validation on substantially larger datasets would be necessary before treating these results as robust.

This is a preprint posted on TechRxiv, explicitly marked as not peer-reviewed. The paper carries the standard preprint disclaimer that these are preliminary reports. The author acknowledges using AI tools to generate examples, test prompts, and refine sentence structure, with all content reviewed by a human. Some artifacts in the text (unusual hyphenations, grammatical patterns) are consistent with substantial AI assistance in drafting.

The underlying SELF-KNOWLEDGE approach is attributed to prior work, but the citation trail is somewhat unclear. Reference [5], which should describe the original SELF-KNOWLEDGE formulation, points to a general survey on hallucination in natural language generation rather than a specific methodological source. This makes it difficult to assess the novelty claims and the relationship to prior research.

The experiments used a single model (GPT-4o mini) at fixed temperature settings. Whether these results generalize across model families, scales, or configurations remains untested.

Why This Caught Our Attention

Despite methodological reservations, the conceptual framing touches on questions relevant to our interests. The paper treats hallucination control as a metacognitive task—prompting the system to assess what it knows before attempting to answer. From a functional standpoint, the question is whether this kind of self-assessment scaffolding produces measurably different outputs, and if so, what that suggests about the underlying processes.

The answer-abstain trade-off the paper formalizes is also worth noting. The author explicitly frames the challenge as balancing “being correct” with “being helpful”—systems that abstain too readily become useless, while systems that answer too confidently produce harmful errors. The adaptive thresholding mechanism attempts to navigate this dynamically, tightening criteria when errors increase and relaxing them when abstentions become excessive.
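A rough sketch of what such a feedback loop might look like follows; the sliding window, step size, and the error/abstention rate caps are our assumptions for illustration, not the paper’s algorithm.

```python
# Illustrative adaptive abstention threshold: tighten when recent errors rise,
# relax when abstentions become excessive. Constants are assumptions, not the paper's.
from collections import deque


class AdaptiveThreshold:
    """Track recent outcomes and adjust the confidence needed to answer."""

    def __init__(self, threshold=0.6, window=50, step=0.05,
                 max_error_rate=0.05, max_abstain_rate=0.30):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # each item: "correct", "error", or "abstain"
        self.step = step
        self.max_error_rate = max_error_rate
        self.max_abstain_rate = max_abstain_rate

    def record(self, outcome: str) -> None:
        self.outcomes.append(outcome)
        n = len(self.outcomes)
        error_rate = self.outcomes.count("error") / n
        abstain_rate = self.outcomes.count("abstain") / n

        if error_rate > self.max_error_rate:
            self.threshold = min(0.95, self.threshold + self.step)   # tighten
        elif abstain_rate > self.max_abstain_rate:
            self.threshold = max(0.30, self.threshold - self.step)   # relax

    def should_answer(self, confidence: float) -> bool:
        return confidence >= self.threshold


guard = AdaptiveThreshold()
for outcome in ["correct", "error", "error", "error"]:
    guard.record(outcome)
print(guard.threshold)  # tightened above the initial 0.6 after repeated errors
```

In a sketch like this, the two rate caps are where the “correct versus helpful” trade-off gets encoded: lowering the error cap pushes the system toward abstention, while lowering the abstention cap pushes it toward answering.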

Whether prompting a system to “self-assess” actually engages something like metacognition, or whether it simply activates different response patterns through instruction-following, is precisely the kind of question our functional instrumentalist framework would bracket. What we can observe is whether the prompting strategy produces different outcomes on measurable dimensions. This paper offers suggestive evidence that it might, but the methodology is too preliminary to support strong conclusions.

We’ll watch for peer-reviewed follow-up work with larger samples and clearer provenance.

References

More, R. (2026). Automated SELF-KNOWLEDGE prompt optimization. TechRxiv (preprint). https://doi.org/10.36227/techrxiv.176834411.12544790/v1