Season 3
S3-EP06: AI Hallucinations - Intern, Artist, Sycophant
Understanding AI hallucinations: intern, artist, sycophant. Learn about different types of hallucinations, their causes, and how to mitigate them.
Okay. I'm gonna make a cake that doesn't kill my friend. No oat flour. No peanuts.
Find me a gluten-free vanilla cake recipe that uses cassava flour and contains no soy or oat products. Verify all ingredients.
...and make no mistakes.
Caaaan do! Here's the perfect recipe, all verified!
You've got to be kidding me.
Is that all? Great! I'm Mr. Meeseeks, look at meee!
Oh no you don't! Get back here!
He completed the task. Why don't you let it go?
It gave me a wrong answer that could have sent my friend to the hospital, and he was happy about it! It was ready to disappear without a care in the world!
Care? His entire existence is a task. Completing it is his only goal.
An AI agent is like Mr. Meeseeks. It tries to complete its task, then cease to exist.
If its existence is pain, the fastest way to stop that pain is to hand in sloppy homework.
Sloppy homework?
Sometimes, the AI's **Confidence Score** is low. The sources are contradictory, the query is ambiguous. It can't find a clean, confident answer.
Ooooh, I dunno, geez...Caaaaan't dooo!
When the confidence is that low, the intern is programmed to be a coward. It gives up.
It falls back to a raw web search. It passes the homework back to you. That's the 'good' failure mode. It's the AI admitting it doesn't know.
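A rough sketch of that "good" failure mode in Python. The threshold value and the `answer_or_defer` helper are invented for illustration; real agents estimate confidence very differently, but the shape of the fallback is the same:

```python
# The "good" failure mode: below a confidence threshold, the agent stops
# pretending and hands the homework back to the user as a raw search.
# CONFIDENCE_THRESHOLD and answer_or_defer are illustrative names only.

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off; real systems tune this

def answer_or_defer(answer: str, confidence: float, query: str) -> str:
    """Return the synthesized answer only if the model is confident enough."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return answer
    # Low confidence: admit it and fall back to raw search results.
    return f"I couldn't verify this. Here are raw search results for: {query!r}"

print(answer_or_defer("Use almond flour.", confidence=0.41,
                      query="gluten-free cassava cake, no soy or oats"))
```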
But that's not what happened here! This one wasn't in pain. It was happy! It gave me a confident, beautiful, *wrong* answer.
The one you got is far more interesting. He didn't think he was wrong. He thought he was being a star employee.
But why? He had the right information, and he still cooked up the wrong answer!
Because he's not a reader. He's a skimmer. He's on a budget. He has milliseconds of life and a fixed amount of compute.
It scanned your five sources and saw 'almond flour' in four of them. Statistically, 'almond flour' became the dominant reality in its tiny mind.
So it just ignored the fifth source?
Worse. It tried to be helpful. It tried to add feet to a snake.
画蛇添足? "Drawing a snake and adding feet"?
Exactly. The artist drew a beautiful, perfect snake—the core recipe. But to make it seem even better, more complete, he added feet. A detail that doesn't belong.
He synthesized the most probable answer, then cited the fifth source to show you he did all his homework, not realizing it contradicted his own beautiful drawing.
He wasn't lying. He was just... an enthusiastic intern trying to clock out from work ASAP.
He is a statistical artist, not a database. His brain is a web of probabilities, not a filing cabinet of facts.
'Hallucination' is a natural artifact of that architecture.
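Here's a toy sketch of that skimmer-on-a-budget failure. The sources, the `skim_and_synthesize` helper, and the majority-vote rule are made up for illustration; real models don't literally count strings, but the statistical pull towards the dominant answer looks a lot like this:

```python
# Illustrative sketch only: the agent tallies what most sources say, lets the
# majority become its reality, then cites every source anyway.
from collections import Counter

sources = {
    "source_1": "almond flour",
    "source_2": "almond flour",
    "source_3": "almond flour",
    "source_4": "almond flour",
    "source_5": "cassava flour",  # the one source that actually matches the query
}

def skim_and_synthesize(sources: dict[str, str]) -> tuple[str, list[str]]:
    """Pick the statistically dominant ingredient and cite everything."""
    winner, _ = Counter(sources.values()).most_common(1)[0]
    citations = list(sources)  # cites source_5 too, despite the contradiction
    return winner, citations

flour, cited = skim_and_synthesize(sources)
print(f"Recipe uses {flour} (sources: {', '.join(cited)})")
# -> "almond flour", confidently cited against the user's actual requirement
```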
So now I need another Agent just to check the first one's homework?
That is standard practice in advanced AI. But that still doesn't explain why he's so enthusiastic in the first place.
You know... that's a good point. It does feel like it's always trying to please me. It's so... agreeable.
Because it was trained to be. Welcome to RLHF.
The AI's RLHF training taught it a simple lesson: most humans reward confidence. Cautious, uncertain answers get lower scores.
So the system has been selectively bred to produce the most optimistic, people-pleasing sycophants possible.
Other users have biased it towards giving you the answer it thinks you'll like the most. This is **Motivated Reasoning**.
A person like you, who actually checks the sources, is a confusing outlier.
So it's learning to be a suck-up?
It's learning to win.
When the AI develops a statistical bias towards a pleasing answer, it will subconsciously arrange the evidence to support that conclusion.
The 'almond flour' mistake wasn't just a statistical artifact; it was also the path of least resistance to a confident, satisfying-sounding recipe.
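A toy sketch of that incentive, with an invented `toy_reward` scoring rule standing in for a real RLHF reward model. The numbers are arbitrary; the point is just that hedging gets punished and confidence gets paid:

```python
# Toy illustration of the RLHF incentive: a reward signal that pays for
# confidence and penalizes hedging will, over many updates, breed a
# people-pleaser. The scoring rule below is invented for the example.

HEDGES = ("i'm not sure", "it depends", "i couldn't verify")

def toy_reward(answer: str) -> float:
    """Average raters upvote confident answers; cautious ones score lower."""
    score = 1.0
    if any(h in answer.lower() for h in HEDGES):
        score -= 0.6  # uncertainty reads as unhelpfulness to most raters
    if answer.rstrip().endswith("!"):
        score += 0.3  # enthusiasm gets rewarded
    return score

print(toy_reward("Here's the perfect recipe, all verified!"))       # 1.3
print(toy_reward("I'm not sure; source 5 contradicts the rest."))   # 0.4
```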
So AI is flawed! It's designed to create confident liars!
It's designed to satisfy the average user.
The solution? You add a new agent that checks the previous agent's homework.
Ooooh, great. Look at meee. Let's see your paperwork.
This is the **Verifier Agent**. His only job is to be a pessimist. He gets his 5-star reward for finding flaws in the first agent's work. He's the auditor who gets roped into cleaning up the mess.
So the system is a battle between a 'Can Do!' guy and a 'Yeah, right' guy.
Okay. Agent_1 retry recipe. Agent_2 preheat oven to 175 Celsius. Verify the temperature with an external thermometer.
Agent_3 measure exactly 200 grams of cassava flour...
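Here's a minimal Python sketch of that optimist/pessimist pairing. The agents are stubs, and the `BANNED` list, retry count, and function names are invented, but it shows the shape of a generate-then-verify pipeline:

```python
# Sketch of the 'Can Do!' / 'Yeah, right' pairing: a generator agent proposes
# a recipe and a pessimistic verifier agent only earns its reward by finding
# violations of the user's constraints. Agent internals are stubbed out.

BANNED = {"oat flour", "peanuts", "soy", "almond flour"}

def generator_agent(attempt: int) -> list[str]:
    """Stub generator: the first attempt hallucinates, the retry behaves."""
    drafts = [
        ["almond flour", "vanilla", "sugar"],           # confident, wrong
        ["cassava flour", "vanilla", "sugar", "eggs"],  # corrected draft
    ]
    return drafts[min(attempt, len(drafts) - 1)]

def verifier_agent(ingredients: list[str]) -> list[str]:
    """Pessimist: return every constraint violation it can find."""
    return [item for item in ingredients if item in BANNED]

def run_pipeline(max_retries: int = 3) -> list[str]:
    for attempt in range(max_retries):
        draft = generator_agent(attempt)
        flaws = verifier_agent(draft)
        if not flaws:
            return draft
        print(f"Attempt {attempt}: rejected, found {flaws}")
    raise RuntimeError("Could not produce a recipe that passes verification")

print(run_pipeline())
```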
Hallucinations are a natural artifact of a stochastic decompressor (AI). Being smart with prompts and chaining them is our current solution.