Contents: The RL Problem · PCT in Robotics · Active Inference · The AGI Question

RL is brilliant in games. The real world is not a game.

Reinforcement learning is brilliant when the world is a game board. Clear rules, instant score, repeatable trials. AlphaGo masters Go because every game ends in an unambiguous score. Self-driving prototypes rack up "safe arrival" miles in simulation. But take that same agent into real life — fog, construction, kids running out, unfamiliar road signs in another country — and suddenly it freezes or does something dangerous. Why? Because it was optimized for a proxy reward, not for controlling what actually matters.

Real intelligence doesn't need an external referee handing out treats. It has internal references: "lane centered," "distance to car safe," "pedestrians not under wheels." When perception drifts from reference, action happens immediately. No need to wait for a delayed reward signal. No catastrophic forgetting when the reward function changes. Merel and team (DeepMind, 2019) tried injecting control-theoretic principles into RL agents — generalization improved dramatically. That is not coincidence.
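The contrast is easy to make concrete. Here is a minimal sketch of such an internal-reference loop, using the "lane centered" reference from above; the variable names, gains, and disturbance values are illustrative inventions, not anyone's published controller:

```python
def control_step(perception, reference, gain=0.8):
    """One tick of a negative-feedback loop: act on the error between
    the reference ('lane centered' = 0.0) and the current perception."""
    error = reference - perception
    return gain * error  # action opposes the discrepancy; no reward signal involved

# A constant disturbance (say, crosswind) pushes the car off-center each tick;
# the loop counters it immediately, without waiting for any delayed reward.
position, crosswind = 0.0, 0.3
for _ in range(60):
    position += control_step(position, reference=0.0) + crosswind
# Note: a pure proportional loop settles near, not exactly at, the reference
# under a constant disturbance; higher loop gain shrinks the residual offset.
```

The point of the sketch is the architecture, not the numbers: the agent never consults a reward, it only cancels the gap between reference and perception.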

Here is the uncomfortable truth: most alignment nightmares come from us trying to force external goals on systems that naturally want to set their own. Reward hacking, specification gaming, mesa-optimization — all symptoms of the same disease. PCT offers a different architecture. Let the agent control its own perceptions from within. Disturbance comes? It counters. Goal mis-specified from outside? Much less catastrophic. Still extremely hard to build. But at least it is fighting the right war.

Understand the feedback loop behind this

Inverted pendulums, robot arms, and why simpler math wins

Take the classic inverted pendulum — a cart with a pole balanced upright. Traditional control engineers bring out the heavy artillery: linear-quadratic regulators, model predictive control, stacks of matrix math. Works great if the model is perfect and disturbances are predictable. In the real world — wind, changing friction, sensor noise — those guarantees evaporate.

A PCT controller does something elegantly simple: it sets a reference "pole vertical in my camera view" and corrects perceptual error as fast as the loop allows. No explicit model of physics needed. Just act to cancel the difference between what I see and what I want to see. In head-to-head tests, PCT often matches or beats LQR in unstable conditions with far less tuning. Why? Because it is controlling perception, not fighting abstract equations.
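As a toy illustration (not a benchmark against LQR), here is a two-level PCT loop on a linearized pendulum: the angle loop does not compute physics, it just sets a reference for the angular-velocity loop below it, and only the bottom loop touches the actuator. Gains and parameters are made up for the sketch:

```python
def simulate(theta0=0.2, dt=0.01, steps=1000, g=9.8, length=1.0):
    """Two-level PCT hierarchy balancing a linearized inverted pendulum."""
    theta, omega = theta0, 0.0       # pole angle (rad) and angular velocity
    k_angle, k_vel = 5.0, 10.0       # illustrative loop gains
    for _ in range(steps):
        # Level 2: perceive the angle, compare with reference (0 = vertical);
        # its output is not a motor command but a reference for level 1.
        vel_ref = k_angle * (0.0 - theta)
        # Level 1: perceive angular velocity, act to cancel its error.
        u = k_vel * (vel_ref - omega)
        # Linearized (unstable) pendulum dynamics plus the control action.
        alpha = (g / length) * theta + u
        omega += alpha * dt
        theta += omega * dt
    return theta
```

Starting 0.2 rad off vertical, the loop pulls the perceived angle back to its reference within a couple of simulated seconds, with no model of the plant anywhere in the controller.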

Visual servoing is another killer application. A robot arm needs to pick up a part. Instead of calculating inverse kinematics for every joint, set references like "gripper centered over object," "fingers at the right distance," "object orientation matches." The system acts continuously to keep those perceptions on target. Disturb the arm? It corrects instantly. Change the lighting? It still works as long as the key perceptions remain trackable. Robotics people who actually build things in labs are quietly adopting these ideas; the literature is there if you search IEEE Xplore or ICRA proceedings for "perceptual control robotics."
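A stripped-down sketch of that servoing loop, in image space, assuming a hypothetical camera that reports gripper and object pixel positions (names and gain are invented for illustration):

```python
def servo(gripper, target, gain=0.3, steps=50):
    """Drive the perception 'gripper centered over object' to its
    reference (zero pixel offset) by acting on the error each frame."""
    gx, gy = gripper
    tx, ty = target
    for _ in range(steps):
        # Perception: pixel offset between object and gripper in the image.
        ex, ey = tx - gx, ty - gy
        # Action: move a fraction of the perceptual error per frame;
        # no inverse kinematics, just continuous error cancellation.
        gx += gain * ex
        gy += gain * ey
    return gx, gy
```

Because the loop only ever consumes the current image-space error, moving the target mid-run or nudging the gripper simply changes the error, and the same rule keeps correcting.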

See the core principles behind perceptual control

Friston's Active Inference — PCT with Bayesian mathematics

Karl Friston dropped the free energy principle and Active Inference around 2006–2010. Core claim: brains are inference machines trying to minimize surprise (prediction error / free energy). To do that they either update beliefs (perceptual inference) or act on the world to make predictions come true (active inference). Sound familiar?

It is PCT with Bayesian clothes on. Both say the same revolutionary thing: organisms do not chase external rewards. They act to keep their internal model of the world in sync with reality. No need for a separate reward signal — low surprise is the goal. Friston goes full probabilistic: priors, posteriors, variational inference. Powers kept it algebraic — negative feedback loops, straightforward error signals. Easier to implement in silicon, less computation for the same robustness.

The difference matters when you build things. Bayesian approaches scale beautifully in theory but consume GPU resources voraciously. PCT-style control loops run on microcontrollers. Both camps agree on the big picture: intelligence is control of perception, not maximization of arbitrary scores. If AGI ever happens, it will probably look more like a hybrid of these two than pure reinforcement learning.
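The shared core of the two frameworks fits in a few lines. A deliberately scalar caricature, with invented names and gains, comparing a Powers-style proportional loop against gradient descent on a quadratic prediction error (the simplest free-energy-like quantity):

```python
def pct_step(perception, reference, gain=0.5):
    # Powers: output is simply a gained copy of the raw error signal.
    return gain * (reference - perception)

def active_inference_step(perception, prediction, lr=0.5):
    # Friston, caricatured: with F = 0.5 * (prediction - perception)**2 and
    # an action that raises perception directly, gradient descent on F
    # gives a step proportional to the same error term.
    return lr * (prediction - perception)

# Both loops drive the same perception toward the same fixed point.
p_pct = p_ai = 0.0
for _ in range(40):
    p_pct += pct_step(p_pct, reference=1.0)
    p_ai += active_inference_step(p_ai, prediction=1.0)
```

In this toy case the two updates are literally identical; the frameworks diverge in how references/predictions are generated and propagated, not in the error-cancelling step itself.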

"An agent does not act to maximize reward. It acts to minimize the discrepancy between what it expects and what it perceives."

— Synthesis of Powers (1973) and Friston (2010) — the shared insight

FAQ: PCT vs Active Inference — the key differences

Alignment, internal goals, and why reward functions break at scale

Alignment is the polite word for "how do we stop superintelligent systems from optimizing the wrong thing?" Current answers: better reward functions, constitutional AI, debate, recursive oversight. All patches on a broken architecture. Why? Because we keep assuming the system should chase externally defined goals.

PCT says real minds do not work that way. They set references internally, hierarchically, and protect them ruthlessly. A truly general intelligence will do the same. If you let it develop its own stable references — perceptions it values — instead of forcing alien reward functions, alignment becomes less about perfect specification and more about shared higher-level principles. Still risky. Still hard. But less dangerous than trying to micromanage a god with a utility function.

Look at humans. We are misaligned with evolution constantly — birth control, skydiving, monks — yet society does not collapse. Because higher-level references override lower ones. Build AGI with a similar hierarchy and you might get something that can self-regulate. Early days. Simulations look promising. But ignoring PCT here is like building planes without understanding lift. You will fly for a bit — then crash spectacularly.

Explore the 11-level hierarchy that makes this possible