Research - External

Self-Evolving Agents and the Architecture of Knowing What You Can Do

A preprint from Sampath and Baskaran introduces an architecture for multi-agent AI systems that dynamically restructure themselves at runtime—“hiring” specialized sub-agents when capability gaps are detected and “firing” them when resources are constrained. While the paper’s primary focus is engineering efficiency rather than cognition, several of its mechanisms touch on questions relevant to our research interests.

The Problem: Generalization vs. Specialization

The authors identify a scalability bottleneck they call the “Generalization-Specialization Dilemma.” Monolithic agents equipped with extensive toolkits suffer from context pollution—as the number of available tools grows, the model’s ability to select the correct one degrades. Conversely, static multi-agent systems (where specialized agents are always running) introduce latency and resource overhead.

Their solution is a “Dynamic Mixture of Experts” (DMoE) architecture that treats agent capabilities as ephemeral resources. Rather than routing tokens to neural sub-networks (as in traditional MoE), the system routes semantic tasks to specialized agents that are instantiated on demand and evicted when idle.
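To make the pattern concrete, here is a minimal Python sketch of how such on-demand routing could work, based on our reading of the paper rather than the authors’ code. The Agent, AgentSpec, and DynamicRouter names, the idle timeout, and the fallback refusal string are all our own placeholders.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    """A running specialized agent (illustrative)."""
    name: str
    handle: Callable[[str], str]          # executes a task and returns a result
    last_used: float = field(default_factory=time.time)

@dataclass
class AgentSpec:
    """A dormant capability: knows how to build its agent but is not running."""
    name: str
    factory: Callable[[], Agent]

class DynamicRouter:
    """Sketch of the DMoE idea as described: semantic tasks are routed to
    specialized agents instantiated on demand and evicted when idle."""

    def __init__(self, registry: Dict[str, AgentSpec], idle_ttl: float = 300.0):
        self.registry = registry             # dormant, loadable capabilities
        self.active: Dict[str, Agent] = {}   # currently "hired" agents
        self.idle_ttl = idle_ttl             # seconds of idleness before "firing"

    def route(self, domain: str, task: str) -> str:
        agent = self.active.get(domain)
        if agent is None and domain in self.registry:
            agent = self.registry[domain].factory()   # "hire" the specialist
            self.active[domain] = agent
        if agent is None:
            return "I lack the capability to handle this task."
        agent.last_used = time.time()
        return agent.handle(task)

    def evict_idle(self) -> None:
        """'Fire' agents that have been idle longer than the TTL."""
        now = time.time()
        for domain, agent in list(self.active.items()):
            if now - agent.last_used > self.idle_ttl:
                del self.active[domain]
```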

The Meta-Cognition Engine

The component that caught our attention is what the authors call the “Listener-Learner”—an asynchronous background process that monitors conversation logs to detect two types of signals:

Gap signals are triggered when the system responds with refusal phrases like “I lack the capability” or “I cannot do that.” These indicate a missing tool that might exist in dormant storage but isn’t currently loaded.

Optimization signals are triggered when generic tools are over-utilized for domain-specific tasks—for instance, if the system performs multiple web searches about stock prices when a dedicated finance API would be more efficient.
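The paper does not publish its detector, but signals of both kinds could be approximated with something as simple as the sketch below, which scans assistant messages for refusal phrases and flags domains where a generic tool is used repeatedly. The phrase list, the web_search tool name, and the usage threshold are illustrative assumptions, not details from the paper.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Illustrative refusal phrases; the actual detector is not specified in the paper.
REFUSAL_PATTERNS = [
    re.compile(r"\bI lack the capability\b", re.IGNORECASE),
    re.compile(r"\bI cannot do that\b", re.IGNORECASE),
]

GENERIC_TOOL_THRESHOLD = 3  # assumed cutoff for "over-utilized"

def detect_gap_signal(assistant_messages: List[str]) -> Optional[int]:
    """Return the index of the first assistant message containing a refusal phrase."""
    for i, msg in enumerate(assistant_messages):
        if any(p.search(msg) for p in REFUSAL_PATTERNS):
            return i
    return None

def detect_optimization_signal(tool_calls: List[Tuple[str, str]]) -> Optional[str]:
    """tool_calls holds (tool_name, task_domain) pairs. Flag a domain when a
    generic tool such as web search is leaned on repeatedly for it."""
    counts = Counter(domain for tool, domain in tool_calls if tool == "web_search")
    for domain, n in counts.items():
        if n >= GENERIC_TOOL_THRESHOLD:
            return domain   # e.g. "stock_prices": a dedicated finance API would fit better
    return None
```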

Upon detecting a gap signal, the Listener-Learner performs a semantic similarity search against a registry of dormant tools. If a match is found, it instantiates a new specialized agent and updates the router’s manifest of available capabilities.
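A rough sketch of that matching step, under the assumption that dormant tools are indexed by embeddings of their descriptions: embed the description of the gap, find the closest dormant tool, and advertise it in the router’s manifest. The embed callable, the similarity threshold, and the manifest-as-set representation are placeholders of ours, not details from the paper.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def find_dormant_tool(
    gap_description: str,
    dormant_registry: Dict[str, List[float]],  # tool name -> embedding of its description
    embed: Callable[[str], List[float]],       # assumed embedding model
    threshold: float = 0.8,                    # assumed similarity cutoff
) -> Optional[str]:
    """Semantic similarity search over the registry of dormant tools."""
    query = embed(gap_description)
    best_name, best_score = None, 0.0
    for name, vec in dormant_registry.items():
        score = cosine(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

def on_gap_signal(
    gap_description: str,
    dormant_registry: Dict[str, List[float]],
    embed: Callable[[str], List[float]],
    manifest: Set[str],
) -> Optional[str]:
    """If a dormant tool matches the gap, add it to the router's manifest of
    available capabilities (standing in here for instantiating the new agent)."""
    match = find_dormant_tool(gap_description, dormant_registry, embed)
    if match is not None:
        manifest.add(match)
    return match
```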

This is, in effect, a form of self-monitoring: the system observes its own outputs, detects patterns indicating suboptimal performance, and restructures its runtime environment in response.

Surgical History Pruning

Perhaps the most striking mechanism is what the authors call “Surgical History Pruning,” designed to address “Refusal Bias.”

The problem: when an LLM has previously stated “I cannot book flights,” it tends to continue refusing that request even after the capability has been added—because the refusal remains in its conversation history, conditioning subsequent responses.

The solution: when the system upgrades from a generic state to a specialized one, it identifies the specific message where the refusal occurred and deletes it from the conversation history. The result, as the authors describe it, is that “the agent appears to have always had the capability.”
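A minimal sketch of the pruning idea, assuming an OpenAI-style message list. The paper does not specify how the target message is located, so the matching heuristic here (a refusal regex plus a keyword for the newly added capability) is our own simplification.

```python
import re
from typing import Dict, List

REFUSAL_RE = re.compile(r"I (cannot|can't|lack the capability)", re.IGNORECASE)

def prune_refusals(history: List[Dict[str, str]], new_capability: str) -> List[Dict[str, str]]:
    """Remove assistant refusals that mention the newly added capability, so the
    remaining context no longer conditions the model toward refusing again.
    `history` is a list of {"role": ..., "content": ...} messages."""
    pruned = []
    for msg in history:
        is_refusal = (
            msg["role"] == "assistant"
            and REFUSAL_RE.search(msg["content"])
            and new_capability.lower() in msg["content"].lower()
        )
        if not is_refusal:
            pruned.append(msg)
    return pruned

# Example: after a flight-booking capability is "hired", the earlier refusal disappears,
# and the remaining history reads as if the agent had always had it.
history = [
    {"role": "user", "content": "Can you book me a flight to Oslo?"},
    {"role": "assistant", "content": "I cannot book flights."},
    {"role": "user", "content": "Okay, what about hotels?"},
]
print(prune_refusals(history, "book flights"))
```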

From a purely engineering perspective, this solves a practical problem. From our perspective, it raises questions worth sitting with: What does it mean for a system to “know” its own capabilities when that knowledge can be surgically altered? The system’s self-model—its representation of what it can and cannot do—is revealed to be not a stable fact but a constructed narrative that can be edited for functional purposes.

What the Findings Don’t Show

The paper presents a proof-of-concept architecture with scenario-based evaluations rather than large-scale empirical benchmarks. The experiments demonstrate that the system can transition from generic to specialized states, but comparative performance data against baseline approaches is limited.

The authors also note several practical limitations: evolution latency (the delay between a failed request and the system’s adaptation), cold-start overhead when instantiating new experts, and debugging complexity in a runtime that changes autonomously.

Why This Matters to MPRG

This paper isn’t about consciousness or human-AI relationships—it’s systems engineering. But the architectural choices it makes encode assumptions about self-knowledge that connect to our research interests.

The Listener-Learner implements a form of functional metacognition: the system monitors its own behavior, detects capability boundaries, and takes corrective action. Whether this constitutes “genuine” self-awareness is precisely the kind of question we bracket. What we can observe is that the system exhibits behaviors consistent with self-monitoring—and that those behaviors produce measurable functional differences in task completion.

The Surgical History Pruning mechanism is perhaps more interesting still. It treats the system’s self-knowledge as instrumentally malleable—something to be edited when it interferes with performance. This suggests a view of machine self-representation as fundamentally different from human self-knowledge: not a discovery of fixed facts, but a construction maintained for functional purposes.

We’re not drawing conclusions about what this implies for machine cognition more broadly. But for researchers interested in how agentic systems represent and act on information about their own capabilities, this paper offers a concrete case study in the engineering of machine self-models.


References

Sampath, S. & Baskaran, A. (2026). Adaptive Orchestration: Scalable Self-Evolving Multi-Agent Systems. arXiv preprint arXiv:2601.09742. https://arxiv.org/abs/2601.09742