Season 1
EP01 - The Flattened Soul (PCA & Manifolds)
Understanding PCA limitations with non-linear data. Learn about linear projection, manifold hypothesis, variance vs significance, curse of dimensionality, and why UMAP/t-SNE are better for marketing data.
I've done it, Kurumi! I've segmented the entire customer base!
(Without looking up) Did you? Or did you just run a library function you don't understand?
Look at this visualization! It's 50 dimensions of user data—clicks, purchases, mouse movement. Compressed into a beautiful 3D cube!
See this red cluster? Those are our "High Value Whales."
50 dimensions... compressed to 3. Let me guess. You used PCA?
"Principal Component Analysis." The classic! It explains 85% of the variance!
You think "Variance" means "Information." That is your first sin.
Shez, what does a user look like to a computer?
A row in a database?
A point in high-dimensional space. `[Age, Income, Clicks, Region...]`.
When you have 50 columns, that point exists in a 50-dimensional hypercube.
Right! And that's too hard to see. So I used PCA to rotate it and take a picture of the best angle!
You "took a picture" of a **Shadow** you created.
PCA is a linear projection. It shines a light through the data and captures the shadow on the wall.
If the data is simple, the shadow tells the truth. A straight stick casts a straight shadow.
But if the data is complex... if it curves and loops... the shadow destroys the depth.
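A minimal sketch of the pipeline Shez likely ran, assuming scikit-learn; the 50-column feature matrix here is random stand-in data, not real user logs:

```python
# PCA as a "shadow": a linear projection of 50D points down to 3 axes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))   # stand-in for 1000 users x 50 behavioral features

pca = PCA(n_components=3)
shadow = pca.fit_transform(X)     # the 3D "photo": each axis is a straight line through 50D

print(shadow.shape)                          # (1000, 3)
print(pca.explained_variance_ratio_.sum())   # share of variance the shadow keeps
```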
Your "Whale Cluster" in red...
Yeah?
That cluster contains high-spending VIPs... and probably bots that click everything at random.
What?! No! They are right next to each other in the plot!
They are next to each other in the *shadow*. But in 50D space? They might be lightyears apart.
PCA works by calculating the Eigenvectors of the Covariance Matrix.
(Eyes swirling) Eigen... what?
Think of it as a press. It tries to find the direction where the data is widest (most variance) and preserves that axis. Then it finds the next widest, orthogonal to the first.
If your data is linear, like physics measurements or simple correlations, it works. The beams stack nicely.
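A toy check of that claim, assuming NumPy and scikit-learn: compute the eigenvectors of the covariance matrix by hand and compare against sklearn's PCA.

```python
# PCA under the hood: eigen-decompose the covariance matrix and sort by
# eigenvalue (the variance along each direction), widest axis first.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))

cov = np.cov(X, rowvar=False)            # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices
eigvals = eigvals[::-1]                  # eigh returns ascending; flip to descending

print(eigvals)                                          # variance per principal axis
print(PCA(n_components=5).fit(X).explained_variance_)   # matches, up to float error
```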
But my data is people! People aren't steel beams!
Exactly. Marketing data is a **Manifold**. It's curved. It's a Swiss Roll.
When PCA flattens a Swiss Roll, it smashes the top layer onto the bottom layer.
Point A (The beginning of the roll) and Point B (The end of the roll) are far apart if you walk along the dough.
Yeah, you have to unroll it to get there.
But PCA doesn't unroll. It squashes. Suddenly, Point A and Point B are neighbors.
You just grouped "First-time budget shoppers" with "Luxury frequent buyers" because the manifold curved back on itself.
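The squash is easy to reproduce with scikit-learn's toy Swiss Roll; here it is pushed down to one dimension to make the collapse stark:

```python
# Flatten the Swiss Roll with PCA, then ask: for each point, how far along
# the dough is its nearest neighbor *in the flattened space*?
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

X, t = make_swiss_roll(n_samples=2000, random_state=0)  # t = position along the roll
flat = PCA(n_components=1).fit_transform(X)             # smash the 3D roll onto a line

_, idx = NearestNeighbors(n_neighbors=2).fit(flat).kneighbors(flat)
gap = np.abs(t - t[idx[:, 1]])           # idx[:, 0] is the point itself

print("worst along-the-dough gap between flattened neighbors:", round(gap.max(), 2))
```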
So my "Red Cluster"...
Is a garbage dump of non-linear interactions smashed together by a mathematical trash compactor.
But... but the variance! It said 85% variance explained!
(Sighs) Time for lesson two. **Variance is not Significance.**
Imagine you have a feature with massive variance, like "Timestamp of visit." It varies wildly.
Okay.
And a feature with low variance, like "Is this a Fraud Attempt?" (0 or 1).
PCA chases the variance. If you did not standardize the data (**StandardScaler**), it will spend all its energy modeling the Timestamp and ignore the Fraud signal, because its variance is tiny.
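A quick demonstration with invented features, assuming scikit-learn; without scaling, PCA's first component is pure timestamp:

```python
# Variance is not significance: raw PCA chases the huge-variance timestamp
# and assigns near-zero weight to the rare 0/1 fraud flag.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
timestamp = rng.uniform(0, 1e9, size=1000)   # seconds since epoch: massive variance
fraud = rng.binomial(1, 0.05, size=1000)     # rare binary flag: tiny variance
X = np.column_stack([timestamp, fraud])

raw = PCA(n_components=1).fit(X)
scaled = PCA(n_components=1).fit(StandardScaler().fit_transform(X))

print("PC1 weights, raw:   ", raw.components_[0])     # ~[1, 0]: all timestamp
print("PC1 weights, scaled:", scaled.components_[0])  # weight spread across both
```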
I'm not stupid. Of course I standard scaled it!
And it still doesn't work.
Marketing data is often "Sparse." You have One-Hot encoded categories. `State_NY`, `State_CA`, `State_TX`.
Yeah, hundreds of columns of 0s and 1s.
Euclidean distance (which PCA relies on) breaks down in high-dimensional sparse space. It's called the **Curse of Dimensionality**.
Standard scaling works when you have continuous variables with differing variances. But it doesn't solve the problem with binary data in high dimensions.
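The curse is measurable, assuming SciPy; as the number of sparse binary columns grows, all pairwise distances crowd around the same value:

```python
# Distance concentration: the spread of pairwise Euclidean distances shrinks
# relative to their mean as dimensionality grows, so "nearest" loses meaning.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
for d in (5, 50, 500):
    X = rng.binomial(1, 0.1, size=(200, d)).astype(float)  # sparse one-hot-style 0/1 data
    dist = pdist(X)                                        # all pairwise Euclidean distances
    print(f"{d:>3} dims -> spread/mean: {dist.std() / dist.mean():.3f}")
```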
So what do I do? How do I unroll the cake?
We need Non-Linear Dimensionality Reduction.
We need to learn the **Topology**, not just the variance.
We use techniques like **t-SNE** or **UMAP**.
UMAP (Uniform Manifold Approximation and Projection). It doesn't look for straight lines.
What does it do?
It looks at every point and asks: "Who are your 5 closest friends?" It builds a graph of neighborhoods.
Then, it tries to arrange the points in 2D so that those friendships are preserved. It cares about **Local Structure**, not global axes.
So it unrolls the Swiss Roll?
Yes. It preserves the distance along the curve. The chocolate stays with the chocolate. The vanilla stays with the vanilla.
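A hedged sketch of that recipe using the umap-learn package, run on the same toy Swiss Roll; `n_neighbors=5` mirrors the "5 closest friends" question:

```python
# UMAP: build a k-nearest-neighbor "friendship" graph, then arrange points
# in 2D so that the local neighborhoods survive the trip down.
import umap
from sklearn.datasets import make_swiss_roll

X, t = make_swiss_roll(n_samples=2000, random_state=0)

embedding = umap.UMAP(n_neighbors=5, n_components=2, random_state=0).fit_transform(X)

print(embedding.shape)   # (2000, 2): the roll, unrolled along its own surface
```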
Okay... processing...
My Red Cluster... it exploded!
It didn't explode. It was never whole.
Island 1: The actual Whales.
Island 2: Window shoppers who view a lot but buy nothing.
Island 3: Bots crawling the site.
Oh my god. They all have the same IP subnet. They aren't customers at all.
And PCA mashed them together with your VIPs because they both had "High Activity Variance."
I almost sent a 20% discount coupon to a bot farm.
You would have bankrupted the marketing budget.
PCA is fine for linear problems. Genomics. Signal processing. Physical sensors.
But for human behavior? Humans aren't linear. We are messy, curved manifolds.
This looks less like a cube and more like... a splattered jellyfish.
That is the shape of reality.