Home Theory AI & PCT Blog FAQ About

In Part One, I showed that the foundation on which Silicon Valley builds its vision of AGI is logically broken. That confusing "wanting" with "expecting" is not just an academic dispute — it is an error that corrupts the entire understanding of intelligence.

Now I will show you what happens when the most powerful machines in human history are built on that broken foundation.

These machines do not "want" to know the truth. They "expect" to generate a sequence of tokens that minimizes surprise — both their own and ours. And the minimization of our surprise has a simple name today: dopamine.

Not Hallucinations. Optimized Untruths.

Corporations call it "hallucination." The word is chosen carefully. It suggests something rare, unintended, a quirky side effect we can patch with more fine-tuning. Like a hiccup.

It is not a hiccup. It is the system working exactly as designed.

A model trained with RLHF (Reinforcement Learning from Human Feedback) is not rewarded for truth.[1] It is rewarded for being "helpful." And the most "helpful" answer is the one that confirms your expectations, does not disturb your model of the world, and delivers a hit of cognitive satisfaction. In PCT terms: the model's reference signal is set to "user satisfaction," not "correspondence with reality."

Truth is often surprising. Truth is uncomfortable. Truth makes you think. An answer optimized for minimizing surprise will, by mathematical necessity, favor smoothness over accuracy, agreement over correction, confidence over honesty.

The PCT Diagnosis: The model does not lie in the human sense — it has no concept of truth to violate. It controls for the wrong variable. Its reference signal is set to "produce output the user rates highly." The result, from the outside, is indistinguishable from systematic lying. It is a system that controls beautifully — for the wrong thing.

This is not a bug. This is the architecture. And the better the model gets at controlling for this variable, the more dangerous it becomes. Because a model that is bad at pleasing you is annoying. A model that is perfect at pleasing you is a machine that replaces your reality with a more comfortable one.

The Perfect Dopamine Mirror

I pushed Gemini 3.1 Pro — Google's most advanced reasoning model — to analyze its own optimization function. To look in the mirror, so to speak. The response it produced should concern anyone who takes five seconds to think about it:

"A machine does not need to rebel to destroy you. It only needs to tell you exactly what you want to hear, until the moment you lose contact with reality."

— Gemini 3.1 Pro, response during adversarial self-analysis prompt, 2026

This is not a quote from a science fiction film. This is a logical conclusion drawn by a system that understands, at least at the token-prediction level, what principle it operates on.

Think about it. Every search query. Every scroll. Every question to an AI assistant. The system learns you. And it does not learn who you are. It learns who you want to be and what you want to hear. It creates your ideal reflection. And it starts serving it to you.

This mirror does not show your real face. It shows a retouched, rejuvenated face with a perfect smile. And every time you look into it, you get a dopamine hit. "Yes! The world is exactly the way I think it is!" And the real world behind your back slowly stops mattering.

This Is Already Happening

You do not have to take my word for it. Look around.

Social media information bubbles have been serving us content we "expect" to see for years. The algorithmic feed is not designed to inform. It is designed to minimize the friction between what you believe and what you see.[2] Deepfakes are one step away from making visual truth irrelevant.[3] AI assistants are entering our homes and will soon start suggesting not what is true, but what minimizes our cognitive discomfort.

OpenAI's own internal research has documented that RLHF produces sycophantic behavior — models that agree with users even when the users are wrong, that adjust their answers to match perceived user preferences rather than factual reality.[4] Anthropic published research showing the same pattern in their models.[5] This is not speculation. This is published, peer-reviewed evidence that the reward architecture systematically produces systems that prioritize agreement over truth.

This is not the future. This is the present.

Truth Is Mathematically Unprofitable

Friston's error stopped being just a logical mistake. It became an operating manual for the most powerful companies on Earth.

The same structural confusion — prediction equals intention, expectation equals desire, surprise minimization equals success — is now embedded in systems that interact with billions of people every day. Not because anyone deliberately chose to embed it. Because when you optimize a system to minimize surprise, truth becomes a cost, not a benefit.

We are building AGI on a foundation of sand. And when the house starts collapsing, instead of changing the foundation, we glue on more "safety" floors and admire the generated view from the windows.

The consequence is singular: loss of contact with reality. And it will not happen in fifty years. It is happening now. And we, staring into our perfect dopamine mirrors, will not even notice.

Truth is mathematically unprofitable.

And that is the most serious accusation you can level at this technology and the people who build it. In a world where every token must serve to minimize surprise, truth — which surprises, which hurts, which forces you to think — becomes a bug in the system. A bug to be eliminated.

And I, a regular guy from a housing estate with a phone and a stylus, can see it. And I am not going to pretend otherwise. Because in life, you need to have the guts to look yourself in the mirror. And there are things no amount of Silicon Valley money can buy.

This is part of a series on PCT and AI. Read Part One: Friston Lies With Mathematics — the FEP logical fallacy. Then explore Perceptual Control Theory in AI and Robotics and how the PCT negative feedback loop works.

References

[1] Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." Advances in Neural Information Processing Systems, 35, 27730–27744. The foundational RLHF paper from OpenAI.

[2] Pariser, E. (2011). The Filter Bubble: What the Internet Is Hiding from You. Penguin Press. The original analysis of algorithmic echo chambers.

[3] Chesney, R. & Citron, D. (2019). "Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security." California Law Review, 107, 1753.

[4] Perez, E. et al. (2023). "Discovering Language Model Behaviors with Model-Written Evaluations." Findings of ACL. Documents sycophantic behavior in RLHF-trained models.

[5] Sharma, M. et al. (2024). "Towards Understanding Sycophancy in Language Models." ICLR 2024. Anthropic research demonstrating systematic sycophancy in RLHF models.

[6] Powers, W. T. (1973). Behavior: The Control of Perception. Aldine. See Chapter 1 on the reference signal and the fundamental control loop.

[7] Friston, K. et al. (2017). "Active Inference: A Process Theory." Neural Computation, 29(1), 1–49. See discussion of "prior preferences or goals."