EP09 - Label Noise, Confident Learning, and Ground Truth Bias
Understand confident learning and how systematic label noise corrupts ground truth in training datasets. Learn why cleaning mislabeled data often matters more than architecture tuning for improving real-world model performance.
The accuracy... it's too high.
Last time I got 99%, I leaked the future. The time before that, I overfitted to a single user.
I'm not falling for it this time. I'm not a rookie anymore.
I'm going to check the 'False Negatives.' The ones the model got 'wrong.'
Show me where you failed, you silicon brat.
...What?
THE LABELS ARE WRONG.
I paid the annotators $50,000! I outsourced this to 'Premium Labeling Corp'!
And they just clicked random buttons?!
It's all garbage! The 'Ground Truth' is a lie!
If the labels are wrong, my loss function is meaningless! I'm optimizing for hallucinations!
My data is poisoned. My career is over. I can't train a model on lies.
You're screaming at spreadsheets again. Is this a new debugging technique?
Kurumi, it's over. The dataset is compromised.
Look at this! It's a Banana labeled as an Apple! 10% of the data is mislabeled!
I have to throw it all away.
You have 100,000 images. You want to throw them away because humans are incompetent?
Garbage In, Garbage Out! That's the first rule of engineering!
That rule is for junior engineers.
Senior engineers know that **Garbage is Information**.
Your model predicted 'Banana' with 98% confidence. The label said 'Apple.'
What does that tell you about your model?
That it's wrong! According to the loss function, it made a huge error!
No. It tells you your model is **Smarter than the Annotator**.
Smarter?
Your model knows math. The teacher is drunk.
If the student is **Confident** (p=0.99) and the teacher disagrees... trust the student.
But... the model learned from the teacher! How can it know better?
Because the model saw *thousands* of examples.
The 'Apple' mistake was random (or lazy).
But the visual features of a Banana are consistent across the other 9,000 bananas.
The model learned the **Pattern**, not the **Individual Labels**.
We assume the noise is mostly random: it depends on which class the image belongs to, not on the individual image.
Deep Learning models fit *easy* patterns (clean data) first. They memorize the *hard* patterns (noisy labels) last.
So... the fact that my model 'failed' on these specific images...
Is actually the model flagging the errors for you.
It's screaming: 'Hey boss, this label doesn't look like the others!'
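A minimal sketch of that flagging step, not a definitive implementation. It assumes you already have out-of-sample predicted probabilities (for example from cross-validation) in a NumPy array. The function name `flag_confident_disagreements`, the toy numbers, and the per-class threshold (the model's average confidence in a class over the images labeled with that class) are illustrative assumptions, not the episode's actual pipeline.

```python
import numpy as np

def flag_confident_disagreements(pred_probs, given_labels):
    """Return indices where the model confidently prefers a class other than the given label.

    pred_probs:   (n_examples, n_classes) out-of-sample predicted probabilities.
    given_labels: (n_examples,) integer labels from the annotators.
    Assumes every class appears at least once in given_labels.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    given_labels = np.asarray(given_labels)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: average confidence in class k over examples labeled k.
    thresholds = np.array(
        [pred_probs[given_labels == k, k].mean() for k in range(n_classes)]
    )

    # A class is a confident candidate when its probability clears its threshold.
    confident = pred_probs >= thresholds
    # Among confident candidates, pick the most probable class; -1 means "none".
    masked = np.where(confident, pred_probs, -1.0)
    confident_class = np.where(confident.any(axis=1), masked.argmax(axis=1), -1)

    # Suspects: the model is confident, and it disagrees with the annotator.
    suspects = np.flatnonzero(
        (confident_class != -1) & (confident_class != given_labels)
    )
    return suspects, confident_class

# Toy data (hypothetical): class 0 = Apple, class 1 = Banana.
pred_probs = np.array([
    [0.95, 0.05],
    [0.90, 0.10],
    [0.02, 0.98],  # labeled Apple, but the model is sure it is a Banana
    [0.10, 0.90],
    [0.05, 0.95],
    [0.20, 0.80],
])
given_labels = np.array([0, 0, 0, 1, 1, 1])

suspects, confident_class = flag_confident_disagreements(pred_probs, given_labels)
print("Suspect indices:", suspects.tolist())
```

On this toy data only image 2 is flagged. Image 5 agrees with its label but falls below the Banana threshold, so it counts as "not confident" and is left alone rather than accused.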
We are going to use the **Off-Diagonals**.
The what?
Picture a table with the true class on one axis and the annotator's label on the other. Every cell off the diagonal is a mislabeling rate.
We estimate, for example, that the chance a true Apple gets labeled as a Banana is about 2%. P(y=Banana | y^*=Apple) = 0.02, where y is the label the annotator gave and y^* is the true class.
Most of the time, labels are correct, so the diagonal cells are close to 1. But some pairs, like Cat/Dog, still show a small but real confusion rate, like 1.5%.
If this confusion probability is high, say above 10%, your annotators are probably systematically confused.
But if it's low, like 1% or 2%, and the model is very confident, then it's likely a specific, fixable data error.
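One way to get those off-diagonal numbers, sketched under assumptions: count how often an image with given label i is confidently assigned to class j, then normalize each column so it reads as P(given label = i | true class = j). This is a simplified version of the idea (it skips the calibration step a full confident-learning implementation would apply), and the names `estimate_noise_matrix` and `high_confusion_pairs` and the toy data are hypothetical. The 10% cutoff is the one from the dialogue.

```python
import numpy as np

def estimate_noise_matrix(pred_probs, given_labels):
    """Rough estimate of P(given label = i | true class = j) from confident counts.

    Assumes integer labels and that every class appears at least once.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    given_labels = np.asarray(given_labels)
    n_classes = pred_probs.shape[1]

    # Per-class threshold: average confidence in class k over examples labeled k.
    thresholds = np.array(
        [pred_probs[given_labels == k, k].mean() for k in range(n_classes)]
    )
    confident = pred_probs >= thresholds
    masked = np.where(confident, pred_probs, -1.0)
    likely_true = masked.argmax(axis=1)   # best confident class per example
    keep = confident.any(axis=1)          # drop examples with no confident class

    # counts[i, j]: examples labeled i that the model confidently assigns to j.
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (given_labels[keep], likely_true[keep]), 1)

    # Normalize columns: column j then sums to 1 over the given labels.
    col_sums = counts.sum(axis=0, keepdims=True)
    return counts / np.where(col_sums == 0, 1, col_sums)

def high_confusion_pairs(noise, cutoff=0.10):
    """List off-diagonal (given, true, rate) cells above the cutoff."""
    pairs = []
    for given in range(noise.shape[0]):
        for true in range(noise.shape[1]):
            if given != true and noise[given, true] > cutoff:
                pairs.append((given, true, float(noise[given, true])))
    return pairs

# Toy data (hypothetical): class 0 = Apple, class 1 = Banana.
pred_probs = np.array([
    [0.95, 0.05],
    [0.90, 0.10],
    [0.02, 0.98],  # a Banana that was labeled Apple
    [0.10, 0.90],
    [0.05, 0.95],
    [0.20, 0.80],
])
given_labels = np.array([0, 0, 0, 1, 1, 1])

noise = estimate_noise_matrix(pred_probs, given_labels)
pairs = high_confusion_pairs(noise)
print(noise)
print(pairs)
```

With six images the rates are far too coarse to mean anything (one bad label out of three confident Bananas already reads as 33%), so treat the output as a demonstration of the bookkeeping. On a real dataset, cells above the cutoff point at confused annotators, and cells below it point at individual labels to fix.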
"What if your dataset label is lying to you… and your AI believes it?"
Katsura Kurumi (AI/ML)
S2-EP09 – Label Noise & Confident Learning
#KatsuraKurumi #AIart #ML #DataScience