Season 2
EP09 - Pandas Finance
The origins of the Pandas data library. Learn how Pandas was built at AQR Capital for quantitative finance, time-series analysis, and data alignment, and why it's designed for trading data.
I love Pandas. It's so cute and friendly!
`df.sort_values(by='cuteness')`. It organizes my life!
You think it's a toy.
It is a toy! It's the "Hello World" of Data Science! Everyone uses it.
Biologists use it. Marketers use it. It's the universal tool for lists!
Shez. Do you know *why* Pandas exists?
Because Python needed spreadsheets?
It exists because in 2008, a guy at a massive **Hedge Fund** needed to calculate the correlation between 5,000 stocks during a financial meltdown.
Wait... Pandas is from a Hedge Fund?
Pandas was built by Wes McKinney while he was working at AQR Capital Management.
It wasn't built for cat videos. It was built for **Quantitative Finance**.
B-But... it's open source! It's free! Wall Street is greedy. It doesn't give away free stuff!
Who knows? If everyone uses your tool, maybe you can hire people more easily. But look at the syntax. Really look at it.
`pct_change()`. Why is that a built-in function?
To... see how much things changed?
In biology, you mostly track raw counts and sizes.
In physics, we measure absolute values.
**Finance** is where we obsess over "Daily Returns."
That function is tailor-made for converting Stock Prices into **Returns** for volatility modeling.
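A minimal sketch of what `pct_change()` does to a price series. The prices and dates below are made up for illustration; nothing here comes from real market data.

```python
import pandas as pd

# Hypothetical closing prices, invented for this example.
prices = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    name="close",
)

# pct_change() computes (today - yesterday) / yesterday for each row.
returns = prices.pct_change()

# The first row has no previous price, so its return is NaN.
print(returns)
```

The second and third rows come out as roughly +2% and -2%, which is the "daily return" form that volatility calculations usually start from.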
What about `df.shift(1)`? I use that to move rows down.
That is for **Lag Generation**.
In trading, you need to compare Today's Price with Yesterday's Price.
`shift(1)` is the time machine required to build a Momentum Strategy.
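As a rough illustration of lag generation, the sketch below uses `shift(1)` to put yesterday's price next to today's and derives a toy "did it go up?" signal. The prices and the signal rule are assumptions for the example, not an actual trading strategy.

```python
import pandas as pd

# Hypothetical daily prices, invented for this example.
prices = pd.Series(
    [10.0, 11.0, 10.5, 12.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# shift(1) moves every value down one row, so each date now
# holds the previous day's price. The first row becomes NaN.
yesterday = prices.shift(1)

# Toy momentum signal: 1 if today's price is above yesterday's, else 0.
# Comparisons against NaN are False, so the first day gets 0.
signal = (prices > yesterday).astype(int)

print(pd.DataFrame({"price": prices, "yesterday": yesterday, "signal": signal}))
```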
They don't line up! Row 2 on Apple is 10:02. Row 2 on Adobe is 10:03.
If I subtract them, I'm subtracting different times!
That is the time series **Alignment Problem**.
Pandas aligns data by the **Label** (The Index), not the integer location.
This prevents you from calculating a "Fake Correlation" between mismatched times.
That... is actually really useful for stocks.
It is *essential*. Without it, your backtest is garbage.
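The alignment problem can be sketched with two tiny series whose timestamps do not quite match. The tickers' prices and times are hypothetical; the point is only the difference between subtracting by label and subtracting by position.

```python
import pandas as pd

# Hypothetical minute bars. Row 2 is 10:02 for Apple but 10:03 for Adobe.
apple = pd.Series(
    [190.0, 190.5, 191.0],
    index=pd.to_datetime(
        ["2024-01-02 10:00", "2024-01-02 10:01", "2024-01-02 10:02"]
    ),
)
adobe = pd.Series(
    [580.0, 581.0, 582.0],
    index=pd.to_datetime(
        ["2024-01-02 10:00", "2024-01-02 10:01", "2024-01-02 10:03"]
    ),
)

# Pandas matches rows on the index label (the timestamp).
# Timestamps present in only one series produce NaN instead of a wrong number.
by_label = apple - adobe

# Dropping to raw arrays subtracts by position and silently mixes 10:02 with 10:03.
by_position = apple.to_numpy() - adobe.to_numpy()

print(by_label)
print(by_position)
```

The label-based result has four rows, with NaN at 10:02 and 10:03, which makes the mismatch visible instead of hiding it.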
And the name. **Pandas**.
Cute Chinese bears! Bamboo!
No.
**PAN**el **DA**ta **S**tructures.
Panel Data?
It's an Econometrics term.
Financial data is 3D. You track *Many Stocks* over *Many Days* with *Many Metrics*.
That is a **Panel**.
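Recent pandas versions no longer ship a dedicated 3D `Panel` class, so one common way to hold "many stocks, many days, many metrics" is a DataFrame with a two-level index. The sketch below assumes that layout; the tickers, dates, and numbers are made up.

```python
import pandas as pd

# Two index levels (date, ticker) plus metric columns give the three panel axes.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-02", "2024-01-03"]), ["AAPL", "ADBE"]],
    names=["date", "ticker"],
)
panel = pd.DataFrame(
    {
        "close": [190.0, 580.0, 191.0, 582.0],  # hypothetical prices
        "volume": [100, 200, 110, 210],         # hypothetical volumes
    },
    index=index,
)

# Slice one stock across all days.
aapl = panel.xs("AAPL", level="ticker")

# Pivot one metric into a days-by-stocks table.
close_wide = panel["close"].unstack("ticker")

print(aapl)
print(close_wide)
```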
So there is no Chinese bear?
There is only a multidimensional matrix optimized for C-contiguous memory access and **bear market predictions**.
Fine. It's a finance tool.
But why is it so slow? If Quants use it, shouldn't it be instant?
Slow? What do you mean?
`for index, row in df.iterrows(): ...`
Stop! Stop! You are killing it!
Python loops are slow. The interpreter has to check the type of every value at every step.
Pandas is built on **NumPy** (C code). It wants to go fast.
When you add two whole columns at once instead, you aren't adding numbers one by one.
You are sending a command to the CPU: "Add these two arrays using SIMD instructions."
Quants need speed. Waiting in a `for` loop means your profits are going to the hedge fund next door.
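To make the contrast concrete, here is a small sketch of the same addition done row by row with `iterrows()` and then as a single column operation. The column names `a`, `b`, and `total` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Slow path: one Python-level iteration (and one temporary Series) per row.
loop_total = []
for index, row in df.iterrows():
    loop_total.append(row["a"] + row["b"])

# Fast path: one expression over whole columns, executed inside NumPy's compiled code.
df["total"] = df["a"] + df["b"]

print(df)
```

Both paths give the same numbers. On three rows the difference is invisible, but on millions of rows the column version is typically far faster.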
So... `df.rolling(20).mean()`?
That calculates the **20-Day Moving Average**. A classic trading signal.
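A short sketch of the rolling mean, assuming one row per business day so that a window of 20 rows means 20 trading days. The prices are a made-up ramp from 1 to 25 so the averages are easy to check by hand.

```python
import pandas as pd

# Hypothetical prices: 1, 2, ..., 25 on consecutive business days.
prices = pd.Series(
    range(1, 26),
    index=pd.bdate_range("2024-01-01", periods=25),
    dtype="float64",
)

# Each value is the mean of the current row and the 19 rows before it.
# The first 19 rows are NaN because a full 20-row window is not yet available.
sma_20 = prices.rolling(20).mean()

print(sma_20.tail(6))
```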
`df.corr()`
**Correlation Matrix**. Find out which stocks move together or in opposite directions. That's the backbone of portfolio construction.
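As an illustration, the sketch below builds a correlation matrix from three made-up return series. `B` is constructed as exactly twice `A` and `C` as exactly minus `A`, so the expected correlations are known in advance.

```python
import pandas as pd

# Hypothetical daily returns for three imaginary stocks.
returns = pd.DataFrame(
    {
        "A": [0.01, -0.02, 0.03, -0.01],
        "B": [0.02, -0.04, 0.06, -0.02],  # 2 * A: moves with A
        "C": [-0.01, 0.02, -0.03, 0.01],  # -1 * A: moves against A
    }
)

# corr() returns the pairwise Pearson correlation of the columns.
corr = returns.corr()

print(corr.round(2))
```

`A` and `B` come out at about +1 and `A` and `C` at about -1, the two extremes of "move together" and "move in opposite directions".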
This whole time I've been using a Hedge Fund weapon... to sort cats.
You could be making billions instead of kittens.