Nelworks
Season 2

S2-EP10: Ensemble Methods and Rare Event Suppression

Understanding ensemble methods and rare event suppression. Learn about bagging, boosting, and why ensembles can hide important minority class signals.

The storm is approaching, but we are safe.
Safe? The buoys are reporting pressure drops consistent with a mega-tsunami.
That's individual model noise. Look at the **Grand Ensemble**.
I averaged all 20 weather models. The consensus is a 2-meter wave. High tide, basically.
What about Model 17?
Model 17? That's the 'DeepWave' model. It's volatile. It predicts a 25-meter wall of water.
But the other 19 models predict 1 meter.
19 vs 1. Democracy wins. The average is low. We don't evacuate.
Democracy? This is Physics, not an election!
You just **Voted Away** the catastrophe.
It's standard Data Science! Ensembling reduces variance! It smooths out the outliers!
Model 17 is clearly overfitting. It's an outlier.
A 25-meter wave *is* an outlier, Shez!
That's the definition of a disaster! It's a high-variance event!
Imagine a boardroom. 9 members are half-asleep. 1 member is a structural engineer.
Let's take a vote. Is there a fire?
1 YES. 9 NO.
The consensus is NO. The smell of smoke is statistically insignificant. Meeting adjourned.
This is what your Ensemble does. It treats the *Signal* (Smoke) as *Noise* because the majority didn't detect it.
But... the wisdom of crowds! The central limit theorem!
Applies to **Normal Distributions**.
If you want to guess the weight of a cow, averaging 1,000 guesses works. The errors cancel out.
Because the errors are **Symmetric**. Some guess too high, some too low.
But disasters are not symmetric.
You have 19 models that are 'Blind' (Can't see the wave) and 1 model that 'Sees' it.
The 19 blind models aren't 'canceling out error.' They are **Diluting the Truth**.
You are averaging `0 + 0 + 0 + 100` and getting `25`.
25 is wrong. 0 is wrong. 100 was the only truth.
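The dilution trap can be reproduced in a few lines. A minimal Python sketch, using the episode's illustrative numbers (not real forecasts):

```python
from statistics import mean

# Four toy models: three are blind to the event, one sees it.
predictions = [0, 0, 0, 100]

print(mean(predictions))  # 25 -- wrong for everyone
print(max(predictions))   # 100 -- the only prediction that matched reality

# The episode's 20-model ensemble behaves the same way:
# 19 sunny-day models say 1 m, the one specialist says 25 m.
waves_m = [1] * 19 + [25]
print(mean(waves_m))      # 2.2 -- "high tide, basically"
print(max(waves_m))       # 25 -- the wall of water
```

Note that the mean is wrong in both directions at once: it overstates the calm case and understates the catastrophe.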
But how do I know Model 17 isn't just hallucinating? If I listen to every crazy outlier, we'd evacuate the city every Tuesday!
That is the **Precision-Recall Trade-off**. But in Safety Science, we don't optimize for Accuracy. We optimize for **Survival**.
Averaging is a Low-Pass Filter. It cuts off the sharp spikes (High Frequencies).
It makes the music smooth. Easy to listen to. But if the 'Spike' is a tsunami, smoothing it means death.
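The low-pass analogy is literal: a moving average attenuates anything sharper than its window. A hedged sketch with a synthetic signal (the numbers are illustrative only):

```python
# A flat 1 m sea with one sharp 25 m spike -- the "tsunami sample".
signal = [1.0] * 10 + [25.0] + [1.0] * 10

def moving_average(xs, window=5):
    """Simple low-pass filter: average over a sliding window."""
    return [
        sum(xs[i : i + window]) / window
        for i in range(len(xs) - window + 1)
    ]

smoothed = moving_average(signal)
print(max(signal))    # 25.0 -- the spike is in the data
print(max(smoothed))  # 5.8  -- the filter flattened it
```

The spike survives in the raw signal but is averaged down to a fifth of its height after smoothing; a wider window would flatten it further.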
(Eyes widening) Did you feel that?
I'm changing the aggregation function.
Changing it to what? Weighted Average?
**Max Pooling**.
You're taking the... maximum? The worst case?
26 meters?! We need to sound the alarm!
NOW you trust the outlier?
I thought stability was good! I thought variance was bad!
Variance is information, Shez!
When the models disagree, that is **Epistemic Uncertainty**.
It means 'We don't know what's happening.'
When you don't know, you assume the worst! You don't average it into a comfortable 'Maybe'!
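One way to act on that idea is to let the ensemble's spread pick the aggregation function: average when the models agree, take the worst case when they don't. A sketch, where the disagreement threshold is an illustrative knob (in practice it would be calibrated, not hand-picked):

```python
from statistics import mean, stdev

def aggregate(predictions, disagreement_threshold=2.0):
    """Average when models agree; assume the worst when they don't.

    Spread across models is a crude proxy for epistemic uncertainty.
    The threshold here is made up for illustration.
    """
    if stdev(predictions) > disagreement_threshold:
        return max(predictions)   # models disagree: take the worst case
    return mean(predictions)      # models agree: smoothing is safe

calm  = [1.0, 1.1, 0.9, 1.0]      # consensus -> average
storm = [1.0] * 19 + [25.0]       # one specialist screams -> max

print(aggregate(calm))   # ~1.0
print(aggregate(storm))  # 25.0
```

The point is not this exact rule, but that disagreement itself is an input: it should change *how* you aggregate, not vanish into the average.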
It's huge. Model 17 was right.
Model 17 was a specialist. It was trained on extreme weather.
The other 19 were trained on sunny days.
If you want to predict the weather, use an Ensemble.
If you want to predict the *End of the World*, find the one paranoid model and listen to it.
Next time, I'm using a **Mixture of Experts**!
Good! Let the expert handle the tail!
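A toy version of that routing idea, with hypothetical stand-ins for the episode's models (real Mixture-of-Experts gates are learned networks, not hand-written rules, and `pressure_dropping` is an invented trigger):

```python
def mixture_of_experts(features, generalists, specialist, extreme_signal):
    """Route to the specialist when its trigger fires; else average the rest.

    `extreme_signal` is a hypothetical predicate standing in for a
    learned gating network.
    """
    if extreme_signal(features):
        return specialist(features)            # let the expert handle the tail
    preds = [m(features) for m in generalists]
    return sum(preds) / len(preds)             # averaging is fine in the bulk

# Toy stand-ins for the episode's models:
generalists = [lambda f: 1.0 for _ in range(19)]       # sunny-day models
deepwave = lambda f: 25.0                              # the extreme-weather specialist
pressure_dropping = lambda f: f["dp_hpa_per_hr"] < -3  # invented trigger

print(mixture_of_experts({"dp_hpa_per_hr": -8},
                         generalists, deepwave, pressure_dropping))  # 25.0
print(mixture_of_experts({"dp_hpa_per_hr": 0},
                         generalists, deepwave, pressure_dropping))  # 1.0
```

The design choice mirrors the episode's lesson: the specialist's vote is not averaged against the crowd; in its regime, it is the only vote that counts.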
We're alive.
Because we stopped averaging.
That line looks so... confident. And so wrong.
That line is a tombstone.
Ensembling is great for Kaggle, Shez. It gets you that 0.001% log-loss improvement.
But reality has fat tails.