S2-EP10: Ensemble Methods and Rare Event Suppression

Understanding ensemble methods and rare event suppression. Learn about bagging, boosting, and why ensembles can hide important minority class signals.

Katsura Kurumi - AI

@katsurakurumiAI

"The average opinion is usually least wrong, until it ignores the rare events..." Katsura Kurumi (AI/ML) S2-EP10 – Ensemble Methods & Rare Event Suppression #KatsuraKurumi #AIart #ML #DataScience

The storm is approaching, but we are safe.

Safe? The buoys are reporting pressure drops consistent with a mega-tsunami.

That's individual model noise. Look at the **Grand Ensemble**.

I averaged all 20 weather models. The consensus is a 2-meter wave. High tide, basically.

What about Model 17?

Model 17? That's the 'DeepWave' model. It's volatile. It predicts a 25-meter wall of water.

But the other 19 models predict 1 meter.

19 vs 1. Democracy wins. The average is low. We don't evacuate.

Democracy? This is Physics, not an election!

You just **Voted Away** the catastrophe.

It's standard Data Science! Ensembling reduces variance! It smooths out the outliers!

Model 17 is clearly overfitting. It's an outlier.

A 25-meter wave *is* an outlier, Shez!

That's the definition of a disaster! It's a high-variance event!

Imagine a boardroom. 9 members are half-asleep. 1 member is a structural engineer.

Let's take a vote. Is there a fire?

1 YES. 9 NO.

The consensus is NO. The smell of smoke is statistically insignificant. Meeting adjourned.

This is what your Ensemble does. It treats the *Signal* (Smoke) as *Noise* because the majority didn't detect it.

But... the wisdom of crowds! The central limit theorem!

Applies when the errors are **Independent and Unbiased**, with finite variance.

If you want to guess the weight of a cow, averaging 1,000 guesses works. The errors cancel out.

Because the errors are **Symmetric**. Some guess too high, some too low.
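A small simulation can make the cow example concrete. This is only a sketch: the true weight of 600 kg and the Gaussian error with a 50 kg spread are illustrative assumptions, not figures from the episode.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_WEIGHT_KG = 600.0  # hypothetical cow weight, chosen for illustration

# 1,000 guesses whose errors are symmetric around zero (Gaussian noise).
guesses = [TRUE_WEIGHT_KG + random.gauss(0, 50) for _ in range(1000)]

# Error of the averaged guess versus the error of a typical single guess.
crowd_error = abs(statistics.mean(guesses) - TRUE_WEIGHT_KG)
typical_individual_error = statistics.mean(
    abs(g - TRUE_WEIGHT_KG) for g in guesses
)

print(f"crowd error: {crowd_error:.2f} kg")
print(f"typical individual error: {typical_individual_error:.2f} kg")
```

Under these assumptions the averaged guess should land much closer to the true weight than a typical single guess, because the high and low errors offset each other.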

But disasters are not symmetric.

You have 19 models that are 'Blind' (Can't see the wave) and 1 model that 'Sees' it.

The 19 blind models aren't 'canceling out error.' They are **Diluting the Truth**.

You are averaging `0 + 0 + 0 + 100` and getting `25`.

25 is wrong. 0 is wrong. 100 was the only truth.
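The arithmetic above can be checked in a few lines. The sketch below uses the four numbers from the dialogue and compares three common aggregation choices; the variable names are only illustrative.

```python
import statistics

# Three "blind" models report 0; one model reports 100 (numbers from the dialogue).
predictions = [0, 0, 0, 100]

mean_value = statistics.mean(predictions)      # 25
median_value = statistics.median(predictions)  # 0.0
max_value = max(predictions)                   # 100

# The mean matches none of the individual predictions.
mean_is_anyones_prediction = mean_value in predictions  # False

print(mean_value, median_value, max_value, mean_is_anyones_prediction)
```

The mean lands on a value no model predicted, and the median sides entirely with the blind majority. Only the maximum preserves the lone warning.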

But how do I know Model 17 isn't just hallucinating? If I listen to every crazy outlier, we'd evacuate the city every Tuesday!

That is the **Precision-Recall Trade-off**. But in Safety Science, we don't optimize for Accuracy. We optimize for **Survival**.
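One common way to reason about this trade-off is to compare expected costs rather than raw accuracy. The sketch below is a minimal illustration, not a method stated in the episode: the function name `should_evacuate`, the probabilities, and the cost figures are all hypothetical.

```python
def should_evacuate(p_event: float, cost_evacuation: float, cost_miss: float) -> bool:
    """Evacuate when the expected loss of staying exceeds the cost of evacuating.

    p_event         -- estimated probability that the disaster happens
    cost_evacuation -- cost paid whenever we evacuate (a false alarm costs this much)
    cost_miss       -- cost of staying put when the disaster does happen
    """
    return p_event * cost_miss > cost_evacuation


# Hypothetical numbers: a missed tsunami is 1,000 times worse than an evacuation.
rare_but_credible = should_evacuate(p_event=0.05, cost_evacuation=1.0, cost_miss=1000.0)
ordinary_tuesday = should_evacuate(p_event=0.0005, cost_evacuation=1.0, cost_miss=1000.0)

print(rare_but_credible)  # True
print(ordinary_tuesday)   # False
```

With asymmetric costs like these, a 5% chance is already enough to act on, while a 0.05% chance is not. That is one way to avoid evacuating every Tuesday without ignoring the outlier.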

Averaging is a Low-Pass Filter. It cuts off the sharp spikes (High Frequencies).

It makes the music smooth. Easy to listen to. But if the 'Spike' is a tsunami, smoothing it means death.
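The filter analogy can be sketched with a simple moving average, one of the most basic low-pass filters. The signal below is made up for illustration: a flat series with a single spike of 25.

```python
def moving_average(signal, window):
    """Simple moving average over full windows only (a basic low-pass filter)."""
    return [
        sum(signal[i:i + window]) / window
        for i in range(len(signal) - window + 1)
    ]


# Hypothetical signal: calm readings with one sharp spike in the middle.
signal = [0, 0, 0, 0, 25, 0, 0, 0, 0]
smoothed = moving_average(signal, window=5)

peak_before = max(signal)   # 25
peak_after = max(smoothed)  # 5.0

print(smoothed)
print(peak_before, peak_after)
```

The spike is still in the data, but after smoothing its peak is one fifth of the original height.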

(Eyes widening) Did you feel that?

I'm changing the aggregation function.

Changing it to what? Weighted Average?

**Max Pooling**.

You're taking the... maximum? The worst case?
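As a rough sketch of what swapping the aggregation function means, the code below compares the mean with the maximum. It assumes the 19 other models each forecast 1 meter and Model 17 forecasts 25 meters; the exact per-model values are an assumption for illustration.

```python
import statistics

# Assumed forecasts in meters: 19 models at 1 m, Model 17 at 25 m.
forecasts = [1.0] * 20
forecasts[16] = 25.0  # index 16 holds Model 17

mean_forecast = statistics.mean(forecasts)  # 2.2
max_forecast = max(forecasts)               # 25.0

print(f"mean aggregation: {mean_forecast:.1f} m")
print(f"max aggregation:  {max_forecast:.1f} m")
```

The mean reproduces the comfortable "about 2 meters" consensus, while the maximum reports the single worst-case forecast unchanged.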

25 meters?! We need to sound the alarm!

NOW you trust the outlier?

I thought stability was good! I thought variance was bad!

Variance is information, Shez!

When the models disagree, that is **Epistemic Uncertainty**.

It means 'We don't know what's happening.'

When you don't know, you assume the worst! You don't average it into a comfortable 'Maybe'!
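One way to act on this idea is to measure how much the models disagree and fall back to the worst case only when the spread is large. The sketch below is an assumption-laden illustration: the function `aggregate`, the use of population standard deviation as the disagreement measure, and the threshold of 1 meter are all hypothetical choices.

```python
import statistics


def aggregate(forecasts, spread_threshold):
    """Return the mean when models agree, the worst case when they disagree."""
    spread = statistics.pstdev(forecasts)  # disagreement across the ensemble
    if spread > spread_threshold:
        return max(forecasts)  # high disagreement: assume the worst
    return statistics.mean(forecasts)  # low disagreement: averaging is reasonable


# Hypothetical ensembles, in meters.
calm_day = [1.0, 1.1, 0.9, 1.0]
storm_day = [1.0] * 19 + [25.0]

print(aggregate(calm_day, spread_threshold=1.0))
print(aggregate(storm_day, spread_threshold=1.0))
```

On the calm day the models agree and the result is close to 1 meter. On the storm day the spread is several meters, so the function returns the 25-meter forecast instead of the diluted average.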

It's huge. Model 17 was right.

Model 17 was a specialist. It was trained on extreme weather.

The other 19 were trained on sunny days.

If you want to predict the weather, use an Ensemble.

If you want to predict the *End of the World*, find the one paranoid model and listen to it.

Next time, I'm using a **Mixture of Experts**!

Good! Let the expert handle the tail!
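A Mixture of Experts replaces equal-weight averaging with a gate that decides how much each expert counts for a given input. The toy sketch below is only an illustration of that idea: both experts, the pressure-drop feature, and the gate's `midpoint` and `sharpness` values are invented, and real systems learn the gate from data.

```python
import math


def calm_expert(pressure_drop_hpa):
    """Hypothetical generalist: always forecasts a 1-meter wave."""
    return 1.0


def extreme_expert(pressure_drop_hpa):
    """Hypothetical specialist: made-up linear rule for extreme conditions."""
    return 0.5 * pressure_drop_hpa


def gate(pressure_drop_hpa, midpoint=20.0, sharpness=0.5):
    """Sigmoid weight for the specialist; near 0 when calm, near 1 when extreme."""
    return 1.0 / (1.0 + math.exp(-sharpness * (pressure_drop_hpa - midpoint)))


def mixture_forecast(pressure_drop_hpa):
    """Blend the two experts using the gate weight."""
    g = gate(pressure_drop_hpa)
    return g * extreme_expert(pressure_drop_hpa) + (1.0 - g) * calm_expert(pressure_drop_hpa)


print(f"{mixture_forecast(4.0):.2f} m")   # small pressure drop: generalist dominates
print(f"{mixture_forecast(50.0):.2f} m")  # large pressure drop: specialist dominates
```

The design point is that the specialist's weight depends on the input, not on how many other models outvote it.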

We're alive.

Because we stopped averaging.

That line looks so... confident. And so wrong.

That line is a tombstone.

Ensembling is great for Kaggle, Shez. It gets you that 0.001% log-loss improvement.

But reality has fat tails.