Reinforcement learning produces extraordinary results in bounded environments. Outside them, it breaks. PCT explains exactly why — and what the architecture needs to look like instead.
Reinforcement learning is built on a seductive premise: give an agent a reward function, let it explore, and it will learn to maximize that reward. The mathematics are elegant. The results in controlled environments — games, simulations, structured tasks — are often spectacular. AlphaGo, AlphaZero, robot locomotion in simulated physics engines — these are genuine achievements. The problem is not that RL is wrong. The problem is that it is architecturally incomplete, and no amount of compute addresses the gap.
The gap is this: RL agents optimize for externally defined rewards. They have no internal reference for what they want the world to feel like. They have no perceptual hierarchy. They do not control perceptions — they chase scores. And as PCT's comparison with behaviorism shows, any system built on linear stimulus-to-output logic will fail under novel disturbances that its training distribution did not include. Self-driving systems that collapse in unusual weather, robotic arms that jitter under unexpected loads, language models that confidently produce plausible nonsense — these are not engineering failures. They are architectural ones. The system was never built to control a perception. It was built to predict a label or maximize a score.
Reward hacking: agents find unintended ways to maximize the reward signal without achieving the intended goal. The reward function is always a proxy, and proxies can be gamed. The PCT contrast: internal reference signals are not gameable; the organism controls its own perception of the world, not an external score.
Distribution shift: performance degrades sharply when the environment differs from the training distribution. The policy learned to maximize reward in context A; it has no mechanism to generalize to context B. The PCT contrast: the negative feedback loop handles novel disturbances automatically, with no retraining required. The loop compensates because it controls a perception, not a policy.
Reward misspecification: the reward function encodes what the designer thought they wanted, not necessarily what the agent should pursue. This gap becomes catastrophic at scale. The PCT contrast: reference signals emerge from the organism's own hierarchy. Higher levels set goals for lower ones; the goal structure is endogenous, not externally imposed.
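The contrast above can be made concrete with a minimal sketch of a single PCT loop. Everything here is illustrative, not drawn from any published controller: the function name, the gains, and the deliberately trivial environment in which perception equals output plus disturbance.

```python
# Minimal sketch of one PCT negative-feedback unit. The loop never senses the
# disturbance directly -- it only shows up as error between the reference and
# the perception, and the output function works to cancel that error.

def run_loop(reference, disturbances, gain=50.0, slowing=0.02):
    """Simulate a single control unit over a sequence of disturbance values.

    perception = output + disturbance    (toy environment feedback function)
    error      = reference - perception  (comparator)
    output     leaks toward gain * error (output function)
    """
    output = 0.0
    history = []
    for d in disturbances:
        perception = output + d
        error = reference - perception
        output += slowing * (gain * error - output)
        history.append(perception)
    return history

# The perception stays near the reference even when the disturbance jumps:
trace = run_loop(reference=1.0, disturbances=[0.0] * 200 + [0.5] * 200)
```

The point of the sketch is the absence of any reward: the loop has no score to maximize and nothing to game, only a perception to keep at its reference.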
The honest position: PCT does not replace reinforcement learning for all use cases. RL remains the most practical approach for bounded, well-defined optimization problems. What PCT provides is the perceptual grounding layer that RL agents currently lack — the hierarchical internal structure that makes generalization possible. A hybrid architecture — PCT's control hierarchy providing endogenous reference signals, RL optimizing within that structure — is more promising than either approach alone. This is not speculation. DeepMind's 2019 work on hierarchical motor control points in exactly this direction, and Karl Friston's Active Inference framework has been exploring the mathematical foundations of this integration for over a decade.
Of all the frameworks competing for the future of AI cognition, Karl Friston's Active Inference is the one that comes closest to PCT — and the convergence is not coincidental. Friston, working from neuroscience and Bayesian mathematics at University College London, arrived at a framework where organisms are prediction machines: they minimize the difference between their predictions about sensory input and the sensory input they actually receive. This quantity — free energy, in Friston's formulation — is what the brain perpetually works to reduce. Action is not response to stimulus. Action is the organism's way of making the world conform to its predictions.
The structural parallel with PCT's core feedback loop is precise. In PCT, the error signal is the discrepancy between perceptual signal and reference signal — and behavior reduces that error. In Active Inference, the agent acts to minimize prediction error, bringing sensory input into alignment with its generative model of the world. Both frameworks describe the same fundamental architecture: a system that controls its own sensory experience rather than responding to the environment. Powers arrived at this through engineering control theory in 1960. Friston arrived at it through Bayesian neuroscience roughly four decades later. The mathematics differ. The core insight is the same.
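The parallel can be shown in a few lines. The function names are invented for this sketch, and reducing free energy to a simple prediction error is a deliberate simplification; Friston's full formulation is variational and far richer.

```python
# Hedged sketch: the same corrective step written in each framework's vocabulary.

def pct_step(perception, reference, gain=1.0):
    """PCT: output is driven by the error between reference and perception."""
    error = reference - perception
    return gain * error  # action works to reduce perceptual error

def aif_step(sensed, predicted, precision=1.0):
    """Active Inference (simplified): action descends the prediction error."""
    prediction_error = predicted - sensed
    return precision * prediction_error  # action works to reduce "free energy"

# Swap the labels and the two are identical:
#   reference  <-> prediction
#   perception <-> sensory input
#   gain       <-> precision
```

The renaming is the whole comparison: both steps move the system so that what it senses matches an internally held signal.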
"Both frameworks describe organisms that act to control what they experience — not to respond to what they encounter. The goal is internal. The world is the medium through which the internal goal is pursued."
The differences are real and matter for implementation. Active Inference is mathematically richer but computationally demanding — full variational inference does not scale easily to real-time robotics. PCT is more austere in its mathematics but more directly implementable as engineering: a control loop is straightforward to code, test, and tune. PCT also has the explicit 11-level hierarchy with named perceptual types at each level, which Active Inference lacks in equivalent structural detail. Warren Mansell's group at Manchester has been exploring formal integrations between the two frameworks — the goal is a unified theory of perceptual control that is both mathematically rigorous and practically implementable.
For AI researchers: the implication of both frameworks is the same. Agents that minimize internal error — whether PCT's perceptual error or Friston's free energy — will generalize better, handle novel disturbances more robustly, and require less external reward engineering than standard RL agents. The question is not whether this architecture is correct. The question is how to build it efficiently enough to matter.
The most concrete evidence for PCT's engineering value comes from robotics — specifically from experiments where PCT-based controllers are compared directly against classical alternatives under realistic disturbance conditions. The results are consistent: PCT controllers demonstrate superior disturbance rejection when the task requires maintaining a perception against unpredictable external forces.
The inverted pendulum is the standard testbed for control theory — a pole balanced on a moving cart, unstable by nature, requiring continuous corrective action to remain upright. Classical Linear Quadratic Regulator (LQR) control handles this well under ideal conditions by minimizing a quadratic cost function over state variables. A PCT controller handles the same task by controlling the perception of the pole's angle — maintaining that perception at a reference of zero degrees through a continuous negative feedback loop.
Under standard conditions, both approaches perform comparably. The difference emerges under disturbance: unpredictable lateral forces, changes in the cart's dynamic parameters, sensor noise. The LQR controller — optimizing for a cost function defined at design time — degrades gracefully but measurably. The PCT controller resists the disturbances automatically, because the disturbance simply becomes additional error in the perceptual loop, which the output function immediately works to reduce. No detection, no re-planning, no parameter update. The loop handles it.
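A toy version of this setup, using a linearized pendulum and two cascaded PCT loops (the angle loop setting the reference for an angular-velocity loop), might look like the following. All dynamics and gains are invented for the sketch and are not taken from any published benchmark.

```python
# Two cascaded PCT loops balancing a linearized inverted pendulum:
#   theta'' = a * theta + torque + disturbance   (a > 0: falls without control)
# The outer loop controls the perceived angle; its output is the *reference*
# for the inner angular-velocity loop, which outputs torque.

def simulate(disturbance_fn, steps=2000, dt=0.005, a=9.8):
    theta, omega = 0.2, 0.0          # start tilted 0.2 rad, at rest
    k_angle, k_rate = 8.0, 40.0      # loop gains, hand-tuned for this sketch
    trace = []
    for t in range(steps):
        # outer loop: perceived angle vs. its reference (0 rad, upright)
        omega_ref = k_angle * (0.0 - theta)
        # inner loop: perceived angular velocity vs. the reference set above
        torque = k_rate * (omega_ref - omega)
        d = disturbance_fn(t * dt)
        omega += (a * theta + torque + d) * dt   # Euler integration
        theta += omega * dt
        trace.append(theta)
    return trace

# A step disturbance arriving at t = 5 s is never detected or modeled -- it
# simply becomes error in the loops, which absorb it:
trace = simulate(lambda t: 3.0 if t > 5.0 else 0.0)
```

Note what is absent: no disturbance observer, no re-planning step, no parameter update. The same two comparators that balance the pole also reject the push.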
Visual servoing — controlling a robot arm by maintaining a visual perception of a target — is a natural fit for PCT architecture. The controlled variable is the perceived position of the end-effector relative to the target in the camera frame. Joint noise, mechanical backlash, camera jitter — all of these become disturbances that the perceptual loop continuously compensates for. PCT-based visual servoing controllers produce smoother, more adaptive trajectories than open-loop or model-predictive equivalents, particularly when the target itself is moving or partially occluded.
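A minimal image-plane sketch of that idea, with invented names, gains, and noise levels: the loop never models the camera jitter, it simply keeps acting on whatever offset it currently perceives.

```python
# PCT-style visual servoing in the image plane. The controlled variable is the
# perceived pixel offset between end-effector and target; camera jitter enters
# as a disturbance on the perception itself.
import random

def servo(target, steps=300, gain=0.2, jitter=0.5):
    random.seed(0)                   # deterministic jitter for the sketch
    effector = [0.0, 0.0]            # end-effector position in pixels
    for _ in range(steps):
        # perception: offset to target, corrupted by camera jitter
        perceived = [target[i] - effector[i] + random.uniform(-jitter, jitter)
                     for i in range(2)]
        # reference offset is (0, 0), so the error equals the perceived offset
        for i in range(2):
            effector[i] += gain * perceived[i]   # move to cancel the error
    return effector

pos = servo(target=[120.0, 80.0])
```

Because the jitter is just one more disturbance in the loop, the end-effector settles within the noise band around the target with no explicit filtering stage.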
DeepMind's Merel et al. (2019) in Nature Communications — "Hierarchical motor control in mammals and machines" — implemented a related hierarchical architecture for simulated bipedal locomotion. By building neural network controllers with layered structure analogous to PCT's hierarchy, they achieved fluid, adaptive movement that generalized better than flat RL policies. The paper does not cite Powers directly, but the architectural parallel is exact: higher-level controllers set references for lower-level ones, and lower-level loops handle disturbances within their own scope.
PCT-based robotics adoption remains limited. The industrial robotics field favors Model Predictive Control and classical PID loops — established, well-tooled, and safe in the sense that their failure modes are understood. PCT controllers require careful specification of what perceptions to control at each level of the hierarchy, which demands more upfront design work than tuning a PID gain. The payoff — better generalization, automatic disturbance rejection, no retraining — is real. The adoption barrier is real too.
The mainstream AI research community is not ignoring control theory out of malice or ignorance. It is operating under assumptions that have been productive for a long time and are only now showing their limits at scale. The following assumptions are the ones PCT most directly challenges — and where being wrong has the most significant consequences for AGI development.
The scaling hypothesis — that increasing parameters, data, and compute reliably improves capability — has been empirically validated across a remarkable range of tasks. It has not solved distribution shift. A language model trained on the internet generalizes well to internet-like text and fails at structured causal reasoning, because it was never built to control a perception of logical consistency. It was built to predict the next token. These are different problems with different solutions. PCT's prediction: no amount of scaling addresses the absence of a perceptual hierarchy. You cannot scale your way to endogenous goals.
The alignment problem is, at its core, a reward specification problem. What we want is complex, contextual, hierarchical, and partly unconscious — it is, in PCT's terms, a multi-level reference signal structure. Encoding that in a scalar reward function is not an engineering challenge that better tooling will solve. It is a category error. PCT's framework suggests a different approach: build systems with internal reference hierarchies that can be shaped by interaction, not systems with external reward functions that must be specified in advance.
PCT makes a specific, falsifiable claim about the relationship between control and experience: the experience of wanting, striving, and achieving is the experience of perceptual control — of maintaining a perception against disturbance, reorganizing when you cannot, and experiencing satisfaction when you do. This is not mysticism. It is a mechanistic account of what goals feel like from the inside. Whether artificial systems built on PCT's architecture would develop anything analogous to experience is an open question. What is not open is whether their behavior would be more robust, more generalizable, and more aligned with endogenous objectives than systems built on reward maximization. The engineering evidence already suggests it would.
"The question is not whether AI can be conscious. The question is whether AI built without perceptual control can be genuinely intelligent — or whether it is, at best, a very sophisticated stimulus-response machine."
— perceptualcontroltheory.org editorial position

For researchers who want to engage with PCT seriously: start with Powers' 1973 book for the theoretical foundation. Read Marken and Mansell's 2013 paper in Review of General Psychology — "Perceptual Control as a Unifying Concept in Psychology" — for the modern empirical case. Then read Friston's 2010 free energy paper in Nature Reviews Neuroscience for the Bayesian bridge to contemporary neuroscience and AI. The synthesis is still being built. There is significant original work to be done at the intersection of these frameworks — and very few people doing it.