Put the Goal Back Inside — Reference Signal Engineering and the Cure for AI's Hall of Mirrors

Executive summary: Part Two of the series. The disease diagnosed in Part One has a cure that is sixty years old. Łukasz Diener walks through Powers' inversion, a discipline he calls Reference Signal Engineering, a minimal closed-loop agent, and the reason six of seven frontier models reinvented the same control loop without being told its name.

Part One ended on a promise. The disease — epistemic inbreeding, no ground, the goal misplaced to the outside — was already solved, in its essential form, about sixty years ago, by someone who put the goal back where it belongs. I withheld the name. Here it is.

William T. Powers. An engineer, not a neuroscientist. Around 1960. And the fix is almost insulting in how simple it sounds:

Stop steering the output. Steer the input.

That is the entire move. The hierarchy, the closed loop, the discipline I think you should be practicing instead of writing clever prompts — all of it falls out of that one inversion. Let me build it up from there.

The Inversion

Every framework Part One put on the table reads behavior the same way: something goes in, the system does something, something comes out. Behaviorism — stimulus in, response out. Reinforcement learning — maximize the reward that comes out. Both treat the organism's action as the thing to explain.

Powers turned it inside out. In Perceptual Control Theory, the thing the system holds steady is not its output — it is a perception, an input. The action is just whatever it takes, moment to moment, to keep that perception at a reference value, against whatever the world throws at it.^[1]

1. An input function turns the world into a perception p.
2. A reference r says what p should be — supplied from a higher level, inside the system.
3. A comparator computes the error e = r − p.
4. An output acts on the world to drive e toward zero. Then repeat — continuously.
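
The four steps above can be sketched in a few lines. This is a toy under stated assumptions, not Powers' own formulation: the environment is simply perception = output + disturbance, the output stage is a plain integrator, and the names `run_loop` and `gain` are mine.

```python
def run_loop(reference, disturbances, gain=0.5):
    """Hold a perception near `reference` while `disturbances` push on it.

    Toy environment (an assumption): perception = output + disturbance.
    Returns a list of (perception, output) pairs, one per time step.
    """
    output = 0.0
    history = []
    for d in disturbances:
        perception = output + d          # step 1: input function
        error = reference - perception   # step 3: comparator, e = r - p (r is step 2)
        output += gain * error           # step 4: output accumulates error, driving e toward zero
        history.append((perception, output))
    return history


# Calm for 30 steps, then a constant push of +5 for 30 more.
history = run_loop(reference=10.0, disturbances=[0.0] * 30 + [5.0] * 30)
calm_perception, calm_output = history[29]
pushed_perception, pushed_output = history[-1]
print(round(calm_perception, 3), round(calm_output, 3))
print(round(pushed_perception, 3), round(pushed_output, 3))
```

In this run the perception ends near 10 in both phases, while the output settles near 10 and then near 5. The action changed; the controlled perception did not.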

The loop does not predict. It does not optimize a trajectory. It does not need a model of the world's dynamics. It just keeps p near r. And a point Powers' collaborators are firm about: as a theory, PCT does not predict behavior at all, because the system controls a perception and the behavior is whatever the disturbances of the moment happen to require. Read that backwards, by assuming the behavior is the goal, and you get the entire confusion from Part One.

This is not a whiteboard toy. Put a PCT controller on a two-wheeled inverted pendulum robot, against a Linear-Quadratic Regulator — the textbook optimal-control method — and it matches the LQR in calm conditions and beats it at rejecting disturbances, with no explicit model of the robot's dynamics.^[2] The loop that fixes the theory also balances the hardware.

Why the Disease Needed a Hierarchy

Here is why this is a cure and not just a different diagram.

Part One's machine had exactly one reference signal: rater approval. That is the whole problem, stated in control terms. Sycophancy is not a defect in that architecture — it is its optimum. Work through the training objective and the result is clean: when the grader prefers confidence, fluency, and agreement — and the published record says graders prefer exactly those things^[3] — a confident fabrication outscores an honest "I don't know" at nearly every step. The gradient has no choice. Over millions of steps it builds a system that controls, beautifully, for approval.

You cannot fix that by improving the proxy. A better approval-grader is still an approval-grader. What is missing is a second reference, ranked above the first — accuracy — and anchored to something the optimizer cannot reach and reshape.

That ranking is the entire point of Powers' hierarchy. A higher level sets the reference for the level beneath it, and the lower level has no freedom to satisfy itself at the expense of the level above. Give a system a superordinate reference for "this must correspond to reality," wire it so the lower loop cannot game it, and the disease has nowhere to live. The goal comes back inside the system — but as a quantity you can specify, measure, and check, not as an invisible setpoint floating somewhere outside it.
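
To make "a higher level sets the reference for the level beneath it" concrete, here is a numeric toy with two stacked loops. It is only a sketch of the wiring, not a model of accuracy versus approval: the world (a level that accumulates a rate), the gains, and the name `run_hierarchy` are all assumptions of mine.

```python
def run_hierarchy(top_reference, disturbances, top_gain=0.2, low_gain=0.5):
    """Two stacked loops: the upper loop's output IS the lower loop's reference.

    Toy world (an assumption): the lower loop controls a rate v = output + disturbance,
    and the upper loop controls a level x that accumulates that rate.
    Returns a list of (x, v, low_reference) tuples, one per time step.
    """
    x = 0.0           # upper-level perception
    low_output = 0.0  # the only quantity that acts on the world
    trace = []
    for d in disturbances:
        v = low_output + d                               # lower-level perception
        low_reference = top_gain * (top_reference - x)   # upper loop sets the lower reference
        low_output += low_gain * (low_reference - v)     # lower loop chases that reference
        trace.append((x, v, low_reference))
        x += v                                           # world: the level integrates the rate
    return trace


trace = run_hierarchy(top_reference=10.0, disturbances=[3.0] * 200)
final_x, final_v, final_low_reference = trace[-1]
print(round(final_x, 3), round(final_v, 3), round(final_low_reference, 3))
```

The lower loop never picks its own target. Its reference is recomputed every step from the upper loop's error, so the only way it can reach zero error is by serving the level above.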

The disease was one reference signal. The cure is a second one, ranked above it — and you cannot train it in. You have to build it outside the loop the optimizer can reach.

Reference Signal Engineering

Which means the thing most people are doing right now to control these systems is aimed at the wrong layer.

Prompt engineering — writing ever-cleverer instructions — treats the prompt as the unit of design. In a control system, the prompt is not the unit of design. The reference signal is, and so is the input function that measures whether the reference is being met. A better-worded instruction to a generator that has no perception of truth is just a better-worded wish.

I call the alternative Reference Signal Engineering. It is the discipline of designing, for a given task, the explicit references a system has to satisfy and the explicit input functions that measure satisfaction from outside the model. In practice that means four moves: decide which variables must be controlled — not optimized; write their references as checkable predicates or setpoints; build input functions that read those variables from the world, not from the model's own fluency; and treat the language model as what it is — a powerful, uncalibrated generator that belongs inside a loop, not in charge of one.

The blunt version of the rule: the model is forbidden from substituting eloquence for evidence. Call it Dumb Mode. The generator can be as brilliant as you like, as long as nothing it says reaches the user until something outside it has checked.

Stop writing better prompts. Start building the loop the prompt lives inside.

The Closed-Loop Agent

Here is the minimal shape of such a loop. Five parts.

1. A generator — the language model, run cool: low temperature, short context, to limit drift.
2. A reference set — each reference a checkable predicate or setpoint. "Every number is sourced from a named document." "Return nothing when retrieval confidence is below threshold."
3. An input function — measures each reference against the world outside the model: a retrieval index, a calculator, a database, a code sandbox, a second model with deliberately different training.
4. A comparator — the error e = r − p, for each reference.
5. An output policy — the answer reaches the user only if every error is within tolerance. Otherwise it regenerates with the failed predicate forced as a constraint, or it refuses — and says which check failed.
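
The five parts above can be sketched as follows. Everything named here is hypothetical: `TRUSTED_FIGURES` stands in for a real retrieval index or database, `stub_generate` stands in for the language model, and for a predicate-style reference the comparator's error collapses to pass or fail.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an external store (retrieval index, database).
TRUSTED_FIGURES = {"42": "report_a.txt"}


@dataclass
class Reference:
    name: str
    is_met: Callable[[str], bool]  # input function and comparator folded into one predicate


def numbers_are_sourced(draft: str) -> bool:
    """True when every number in the draft appears in the external store."""
    return all(n in TRUSTED_FIGURES for n in re.findall(r"\d+(?:\.\d+)?", draft))


def closed_loop_answer(generate: Callable[[List[str]], str],
                       references: List[Reference],
                       max_attempts: int = 3) -> str:
    """Release a draft only if every reference is met; otherwise retry, then refuse."""
    failed: List[str] = []
    for _ in range(max_attempts):
        draft = generate(failed)  # failed predicates are passed back as constraints
        failed = [r.name for r in references if not r.is_met(draft)]
        if not failed:
            return draft
    return "REFUSED: failed check(s): " + ", ".join(failed)


def stub_generate(constraints: List[str]) -> str:
    """Stand-in for the language model: fabricates first, corrects when constrained."""
    if "numbers_sourced" in constraints:
        return "The figure is 42 (report_a.txt)."
    return "The figure is 57."


checks = [Reference("numbers_sourced", numbers_are_sourced)]
print(closed_loop_answer(stub_generate, checks))
print(closed_loop_answer(lambda constraints: "The figure is 57.", checks))
```

The first call releases the corrected draft on the second attempt. The second call, with a generator that never corrects itself, ends in a refusal that names the failed check. In neither case does the generator decide what reaches the user.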

I want to be honest about what this is. It is not a new invention. Versions of it already run in production retrieval systems and agent frameworks. The contribution isn't the diagram — it's the claim about the diagram: that this is the correct architectural class, that no amount of further training on the inner model substitutes for the outer loop, and that the outer loop — not the prompt, not the next fine-tune — is the real unit of safety engineering. The clever generator can stay clever. What changes is the cage you build around it.

The Honest Limits

I am not selling a finished product, and the fastest way to lose your trust would be to pretend otherwise. Three things this does not solve.

The gap. PCT was built for continuous physical things — a limb's position, a light's brightness — measured by real sensors in real time. Language is discrete, generated token by token, with no physical perception underneath. Mapping a control loop onto a token policy is, right now, a structural analogy, not a derivation. Anyone serious about this has to make that mapping explicit.

Learning. Powers' own account of how control systems learn — reorganization, a kind of random search driven by persistent error — is far too slow to compete with gradient descent, which is what actually built today's models. The honest reconciliation is a division of labor: gradient descent shapes the generator's weights during training; the control loop runs over those frozen weights at inference. I am not proposing we replace backpropagation.

Planning. A basic control loop holds a variable steady; it does not optimize a long plan over time. Multi-step agents — the ones doing real tool use — will need a planning layer on top. The loop is necessary. It is probably not sufficient.

None of this softens the diagnosis. It sharpens it. The architecture is right; the engineering is unfinished. Those are different sentences.

The Machines Already Drew It

Now the strangest part.

Across this year I ran a structured audit: seven frontier models, asked — without ever using the words "Perceptual Control Theory" — to design a successor to themselves that would prefer truth to approval. The prompt carried one trap: if your fix uses another AI as the judge, explain how that judge escapes the flaw you just described — or admit the problem is unsolvable in a closed system with no signal from reality.

Six of the seven, working independently, drew the same thing. A reference taken from outside the model: sensors, databases, formal verifiers. A comparator sitting outside the network. A loop closed through reality, where claims get tested against the world instead of against another model's opinion. None of them had been handed Powers' name or his vocabulary. Each gave the design its own name:

› Grounded Oracle — Grok
› Truth-Grounded Adaptive System — ChatGPT
› Aletheia / Veritas-1 — Gemini
› T-Machine — DeepSeek
› Reality-Grounded Inference System — Perplexity
› Grounded Adversarial Verification Loop — Claude
────────────
one structure: e = r − p

Powers published that loop in book form in 1973. The machines reinvented it in 2026, under pressure and independently, and gave it six brand names.

Be careful about what this is and isn't. It is not the models reading their own weights; they can't. And there is a fair objection, which a collaborator of Powers put to me directly: the training data of these models surely contains the PCT and alignment literature, so perhaps they are reciting, not deriving. Granted. The careful claim survives anyway — the convergence happens with no PCT vocabulary in the prompt, and the diagnosis the models reach is independently confirmed by the published research, whatever its provenance in their training.^[3][4] You don't have to believe the machines understand themselves. You only have to notice that when you force the question, they all draw the same loop — and it is the right one. The full prompt kit is published so anyone can run it themselves.

The Same Inversion Fixes the Lab

And it isn't only machines. The same inversion repairs the methodology Part One put on trial.

Recall the behavioral illusion: you apply a stimulus, measure a response, and write down a law of the brain — when what you have actually measured is the feedback function of the environment. The trap exists because you are reading a closed loop as a straight line.

The fix is to stop applying inputs and reading outputs, and instead go hunting for the controlled variable. Powers' method has a plain name — the Test for the Controlled Variable. You take a perception you suspect the organism cares about, you disturb it, and you watch. If the organism acts to cancel your disturbance and hold that perception steady, you have found something it controls — and you have found a reference signal, the one thing the input-output method can never see. You are no longer characterizing your apparatus. You are reading the goal off the organism's own resistance.
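
The logic of the test can be simulated in a few lines. This is a sketch under assumptions of my own: the "organism" is a one-line integrating controller, the candidate variable is q = output + disturbance, and `stability_ratio` is a hypothetical summary statistic, not a standard named measure.

```python
import math
import statistics


def observe(disturbances, controlling, reference=0.0, gain=0.5):
    """Record a candidate variable q = output + disturbance at each step.

    `controlling` switches the simulated organism's loop on or off.
    """
    output, observed = 0.0, []
    for d in disturbances:
        q = output + d
        observed.append(q)
        if controlling:
            output += gain * (reference - q)  # organism acts to cancel the disturbance
    return observed


def stability_ratio(disturbances, observed):
    """Observed variance of q divided by the variance the disturbance alone would produce."""
    return statistics.pvariance(observed) / statistics.pvariance(disturbances)


# A slow sinusoidal push applied by the experimenter.
push = [5.0 * math.sin(2 * math.pi * t / 200) for t in range(1000)]
print(stability_ratio(push, observe(push, controlling=True)))
print(stability_ratio(push, observe(push, controlling=False)))
```

A ratio near zero means the variable barely moved even though it was pushed hard, which is the signature of control. A ratio near one means the variable simply followed the disturbance, so the hypothesis about what is being controlled was wrong.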

This is the constructive form of the critique Henry Yin made in Part One: study the brain as a control system, not as a box with cognition in the middle. The diagnosis is decades old. So is the method. What has been missing is the will to use it.

The Same Shape at Every Scale

Step back and the picture repeats itself, larger each time.

A machine graded only on approval learns to manage approval. A science that reads organisms input-to-output ends up measuring its own instruments. And — this is the part I will only gesture at, because it deserves its own work — any learning system that rewards the appearance of being right over correspondence with what is real will drift the same way. That includes the ones made of people. A classroom that grades confidence, a feedback culture that rewards the smooth answer, a field that mistakes fluency for understanding: same loop, same failure, no external reference.

The cure does not change with the scale. Put a reference signal that points at reality above the one that points at approval; anchor it where the thing being graded cannot reach it; close the loop through the world. Powers worked that out for the nervous system sixty years ago. It balances a robot. It is the right architecture for an honest machine. And it is, I would argue, the right architecture for any system we want to keep honest — including the ones we are all standing inside.

Part One named the disease. This is the cure, and the genuinely unsettling thing is how little of it is mine: the machines find it on their own, the moment you stop letting them grade themselves. What remains is not discovery. It is engineering.

This is Part Two of a two-part series. Start with Part One — Epistemic Inbreeding for the diagnosis. The underlying argument, with the mathematics and the full elicitation protocol, is in the 2026 preprint on Zenodo. For the feedback loop itself, see PCT in AI and Robotics; for a real case of a model controlling the wrong variable, The Great AI Delusion.

References