Nelworks
Season 1

EP02 - PCA (Principal Component Analysis) on Mixed-Unit Data

Understanding PCA limitations with mixed-unit data. Learn about standardization, scaling, and why PCA fails when features have different units or scales.

HAHAHA! Telemetry is fused! I've compressed the sensor data from 500 inputs down to 3 Principal Components!
(Tightening a bolt on the mech's knee) Yeah? Tell me more.
The mech has lasers, hydraulics, temperature sensors, and radiation detectors. That's too much data for the CPU. So I ran **PCA** to find the "Essence of Motion."
Look at this beautiful cluster! PC1 explains 95% of the variance. The robot knows exactly what it's doing.
PC1 explains 95% of the variance? Now that's sus...
Shez, what are the units of your sensors?
Uh... Velocity is in $\text{m/s}$... Pressure is in Pascals... Temperature in Kelvin...
Pressure in Pascals. Velocity in meters per second.
Atmospheric pressure is around 100,000 Pascals.
Walking speed is around 1.5 $\text{m/s}$.
PCA looks for **Variance**.
Pascals are tiny units, so the numbers are huge: the variance of the Pressure sensor is in the billions. The variance of the Velocity sensor is like... 0.5.
To PCA, your Velocity sensor looks like a flat line. It looks like "Noise."
Your "Principal Component 1" isn't the "Essence of Motion." It's probably just **Atmospheric Pressure**.
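A minimal sketch of that effect, using NumPy and made-up readings. The means follow the numbers quoted above; the spreads (40,000 Pa and 0.7 m/s) are assumptions chosen so the variances land near "billions" and "0.5", and the two channels are generated independently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical readings; the spreads are illustrative assumptions.
pressure_pa = 100_000 + 40_000 * rng.standard_normal(n)  # Pascals
velocity_ms = 1.5 + 0.7 * rng.standard_normal(n)         # meters per second

X = np.column_stack([pressure_pa, velocity_ms])

# PCA by hand: eigen-decomposition of the covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back ascending
order = np.argsort(eigvals)[::-1]        # reorder so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()
pc1 = eigvecs[:, 0]

print("per-feature variance (pressure, velocity):", X.var(axis=0, ddof=1))
print("PC1 explained variance ratio:", explained_ratio[0])
print("PC1 weights (pressure, velocity):", pc1)
```

With these assumed spreads, PC1 puts essentially all of its weight on the pressure column, which is the "standing there thinking about the air" failure in miniature.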
Oh. It... it basically ignored the movement.
So my robot is just standing there thinking about how heavy the air is?
Easy fix! I'll change the units!
I'll measure Pressure in "MegaPascals" (0.1 MPa) and Velocity in "Millimeters per second" (1500 mm/s).
Stop! Stop! You are playing **God with Geometry**.
When you change units, you stretch and squash the data cloud.
If you use Millimeters, the cloud stretches 1000x along the X-axis. PCA points the arrow that way.
If you use Kilometers, the cloud squashes. PCA points the arrow somewhere else.
A physical law ($F=ma$) doesn't change if you measure in feet or meters.
But your PCA model changes completely. That means your model has no **Physical Integrity**.
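To see the unit dependence concretely, here is a small sketch that runs the same hand-rolled PCA twice on identical (synthetic, assumed) readings, once in Pa and m/s and once in MPa and mm/s. The helper name `first_pc` and the spreads are illustrative, not from the episode.

```python
import numpy as np

def first_pc(X):
    """Return the unit-length direction of largest variance in X."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # last column = largest eigenvalue

rng = np.random.default_rng(1)
n = 1000

# Hypothetical readings; spreads are assumptions for illustration.
pressure_pa = 100_000 + 40_000 * rng.standard_normal(n)
velocity_ms = 1.5 + 0.7 * rng.standard_normal(n)

# Same physical measurements, two unit systems.
X_si = np.column_stack([pressure_pa, velocity_ms])               # Pa, m/s
X_alt = np.column_stack([pressure_pa / 1e6, velocity_ms * 1e3])  # MPa, mm/s

pc1_si = first_pc(X_si)
pc1_alt = first_pc(X_alt)

print("PC1 in (Pa, m/s):   ", pc1_si)
print("PC1 in (MPa, mm/s): ", pc1_alt)
```

Nothing about the robot changed between the two runs, yet PC1 swings from the pressure axis to the velocity axis.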
It gets worse. What *is* a Principal Component?
Umm, it's a linear combination of features...
Let's say $w_1 = 0.5$ and $w_2 = 0.5$.
What is $0.5\ \text{Kelvin} + 0.5\ \text{Volts}$?
It's... Volvin? Kelvolts?
It's **Nonsense**. Physics forbids adding quantities with different dimensions. You can multiply them ($\text{Volts} \times \text{Amps} = \text{Watts}$), but you cannot add them.
Your robot is making decisions based on a variable that physically does not exist. You are feeding it **Unit Salad**.
So I can't use PCA? I have to feed 500 raw inputs into the neural net? The latency will kill us!
You can use PCA. But you have to strip the units first.
Strip them?
You normalize every feature.
Subtract the mean. Divide by the standard deviation.
Now, Pressure has variance = 1. Velocity has variance = 1.
They are **Dimensionless**.
`scaler.fit_transform(X)`.
Okay, now they are equal. PCA will look at *correlation*, not magnitude. Now PC1 is valid!
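A quick sketch of why "correlation, not magnitude" holds. It z-scores two synthetic channels by hand (subtract the mean, divide by the population standard deviation, which is what `StandardScaler` computes by default) and checks that the covariance of the scaled data equals the correlation matrix of the raw data. The data and the coupling between the channels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Hypothetical readings where velocity partly drives pressure.
velocity_ms = 1.5 + 0.7 * rng.standard_normal(n)
pressure_pa = (100_000
               + 30_000 * (velocity_ms - 1.5)
               + 20_000 * rng.standard_normal(n))
X = np.column_stack([pressure_pa, velocity_ms])

# Z-score each column: units cancel (Pa / Pa, m/s / m/s).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

cov_z = np.cov(Z, rowvar=False, ddof=0)   # covariance of scaled data
corr_x = np.corrcoef(X, rowvar=False)     # correlation of raw data

# PCA on the scaled data.
eigvals, eigvecs = np.linalg.eigh(cov_z)
pc1 = eigvecs[:, -1]

print("variance after scaling:", Z.var(axis=0))
print("cov(Z) equals corr(X):", np.allclose(cov_z, corr_x))
print("PC1 weights (pressure, velocity):", pc1)
```

With only two standardized features, PC1 weights both equally in magnitude, so neither channel wins just because of its units.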
Careful.
You solved the **Math** problem. You haven't solved the **Noise** problem.
This vibration sensor is broken. It just outputs random static between 0.0001 and 0.0002.
It's tiny. Who cares?
You just ran `StandardScaler`.
You took that tiny 0.0001 static... and stretched it to have Variance = 1.
You just made the broken sensor as loud as the main engine.
PC1 is... now picking up the vibration sensor...
I amplified the noise. I turned static into a rock concert...
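The amplification can be put in numbers with a small sketch. It pairs one healthy channel (a made-up "engine" signal; its mean and spread are assumptions) with uniform static in the 0.0001 to 0.0002 range, then compares each channel's share of total variance before and after z-scoring.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical healthy channel vs. a broken sensor emitting pure static.
engine = 50.0 + 10.0 * rng.standard_normal(n)
static = rng.uniform(0.0001, 0.0002, size=n)

X = np.column_stack([engine, static])
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Share of total variance held by each channel.
raw_share = X.var(axis=0) / X.var(axis=0).sum()
scaled_share = Z.var(axis=0) / Z.var(axis=0).sum()

# Multiplier that z-scoring applies to each channel.
gain = 1.0 / X.std(axis=0)

print("variance share before scaling (engine, static):", raw_share)
print("variance share after scaling  (engine, static):", scaled_share)
print("gain applied to each channel:", gain)
```

Before scaling, the static is invisible. After scaling, it holds exactly as much variance as the engine, because the scaler multiplied it by a factor in the tens of thousands.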
This is the trade-off.
**Raw PCA:** Biased by Unit Scale.
**Scaled PCA:** Biased by Noise Amplification.
Ahhh, there is no winning!
There is winning. It's called **Engineering Judgment**.
Don't just blindly scale everything.
Remove the broken sensors first. Use Log-transforms on the Pressure.
Or... don't use PCA. Use something nonlinear, like UMAP, which respects local geometry, or a Variational Autoencoder (VAE).
But if you must use Linear Algebra...
1. Filter noise.
2. Log-transform large magnitudes.
3. StandardScaler.
4. PCA.
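The four steps above can be sketched roughly as follows. Everything specific here is an assumption for illustration: the synthetic channels, the hidden `gait` signal that drives them, the per-sensor `noise_floor` values (imagined as datasheet numbers in each channel's own units), and the choice to log-transform only the pressure channel. How to detect a broken sensor in practice is a judgment call; the noise-floor test is just one possible rule.

```python
import numpy as np

def pca(Z, n_components):
    """PCA of centered data via SVD: returns (scores, components, ratio)."""
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    components = Vt[:n_components]
    return Z @ components.T, components, ratio[:n_components]

rng = np.random.default_rng(4)
n = 2000
gait = rng.standard_normal(n)  # hypothetical hidden "behavior" signal

velocity_ms = 1.5 + 0.5 * gait + 0.1 * rng.standard_normal(n)
pressure_pa = 100_000 * np.exp(0.3 * gait + 0.05 * rng.standard_normal(n))
static = rng.uniform(0.0001, 0.0002, size=n)  # broken sensor

names = np.array(["velocity", "pressure", "static"])
X = np.column_stack([velocity_ms, pressure_pa, static])

# 1. Filter noise: drop channels whose spread is below an assumed noise floor.
noise_floor = np.array([0.01, 50.0, 0.001])  # hypothetical, per-channel units
keep = X.std(axis=0) > noise_floor
X, names = X[:, keep], names[keep]

# 2. Log-transform large magnitudes (values must be strictly positive).
is_large = names == "pressure"
X[:, is_large] = np.log(X[:, is_large])

# 3. Standardize: zero mean, unit variance per channel.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 4. PCA.
scores, components, ratio = pca(Z, n_components=1)

print("channels kept:", list(names))
print("PC1 explained variance ratio:", ratio[0])
print("|corr(PC1, gait)|:", abs(np.corrcoef(scores[:, 0], gait)[0, 1]))
```

In this toy setup the static channel is dropped before it can be amplified, and PC1 tracks the hidden gait signal rather than any one sensor's units.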
It actually makes sense. The clusters map to actions!
Now the latent space represents **Behavior**, not just **Unit Bias**.
It works! It's walking!
And it's not trying to add Volts to Meters anymore.
You know, "Kelvolts" kind of sounded cool. Like a superhero name.
"Kelvolt" sounds like a battery that overheats and explodes.