Season 1
EP05 - THE BLIND SPIDER (DIJKSTRA VS A*)
Understanding pathfinding algorithms: Dijkstra vs A*. Learn about graph traversal, heuristics, optimality, and when to use each algorithm.
It's hanging! The routing engine is hanging!
...the drivers are going in circles. Another driver just delivered pizza to himself.
I don't understand! I used the standard algorithm! `Dijkstra.py`! It's mathematically proven to find the shortest path!
It's proven to work. It's not proven to be *fast* when you run it on a graph with 2 million intersections.
Show me how you select the next node to visit.
See? I'm using a Min-Heap. Complexity is $O(\log V)$. It's efficient!
You are prioritizing by `distance_from_start`.
Yeah. $g(n)$. The cost so far. That's how Dijkstra works.
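That selection rule can be sketched in a few lines. A minimal Dijkstra, assuming a graph stored as an adjacency dict (the toy road map and node names below are invented for illustration):

```python
import heapq

def dijkstra(graph, start, target):
    """Dijkstra: the priority queue is ordered by g(n) alone --
    the cost accumulated from the start. It has no sense of direction."""
    dist = {start: 0}
    pq = [(0, start)]  # (g(n), node)
    while pq:
        g, node = heapq.heappop(pq)
        if node == target:
            return g
        if g > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            new_g = g + weight
            if new_g < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_g
                heapq.heappush(pq, (new_g, neighbor))
    return float("inf")

# Toy road graph: each edge is (neighbor, distance).
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(roads, "A", "D"))  # 4  (A -> C -> B -> D)
```

Note that the heap never looks at where `target` *is* — only at how far each node is from `start`.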
Dijkstra is like water. It flows equally in every direction. It doesn't care where the pizza is; it just soaks the closest dry spot, whichever way that happens to be.
See? You explored the North, South, and West suburbs... just to go East.
But... it found the path!
It found the path after checking 500,000 intersections. That's why your server is melting. You are exploring the entire city for every single delivery.
But I can't just guess! I need the *guaranteed* shortest path. If I skip nodes, I might miss a shortcut!
Do you think there is a magical wormhole in the opposite direction of the destination?
...No?
Then stop looking there. You need a **Heuristic**.
This is **A* (A-Star)**: $f(n) = g(n) + h(n)$.
A-Star? Like the grades I'll be getting?!
$g(n)$ is the cost you've already paid (distance traveled).
$h(n)$ is the estimated cost to go (distance to target).
Usually, we use **Euclidean Distance** (straight line) or **Manhattan Distance** (grid) as the heuristic.
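Both heuristics are one-liners. A quick sketch of each (the sample points are arbitrary):

```python
import math

def euclidean(a, b):
    # Straight-line ("as the crow flies") distance between (x, y) points.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    # Grid distance: you may only move along the axes, like city blocks.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```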
The algorithm now prioritizes nodes that actually move you toward the goal. Nodes behind you get a large $h(n)$, so they sink to the bottom of the queue.
So... `priority = cost_so_far + euclidean_dist(current, target)`.
Correct. You are biasing the queue. You are giving the algorithm a sense of "smell."
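The change from Dijkstra really is one line: push with $f(n) = g(n) + h(n)$ instead of $g(n)$ alone. A minimal sketch (the straight street, node names, and coordinates are made up); it also counts how many nodes get popped, to show the bias at work:

```python
import heapq
import math

def a_star(graph, coords, start, target):
    """Dijkstra's loop, but the heap is ordered by f(n) = g(n) + h(n),
    where h(n) is the straight-line distance to the target."""
    def h(node):
        (x1, y1), (x2, y2) = coords[node], coords[target]
        return math.hypot(x1 - x2, y1 - y2)

    g_cost = {start: 0}
    pq = [(h(start), start)]  # (f(n), node)
    popped = 0
    while pq:
        _, node = heapq.heappop(pq)
        popped += 1
        if node == target:
            return g_cost[node], popped
        for neighbor, weight in graph.get(node, []):
            new_g = g_cost[node] + weight
            if new_g < g_cost.get(neighbor, float("inf")):
                g_cost[neighbor] = new_g
                heapq.heappush(pq, (new_g + h(neighbor), neighbor))
    return float("inf"), popped

# A straight east-west street; the target is two blocks east of the start.
coords = {"W1": (-1, 0), "S": (0, 0), "E1": (1, 0), "E2": (2, 0)}
street = {
    "S": [("W1", 1), ("E1", 1)],
    "W1": [("S", 1)],
    "E1": [("S", 1), ("E2", 1)],
    "E2": [("E1", 1)],
}
cost, popped = a_star(street, coords, "S", "E2")
print(cost, popped)  # 2 3 -- W1 is never popped; Dijkstra would pop it
```

The westward node gets $f = 1 + 3 = 4$ while every eastward step keeps $f = 2$, so the queue never bothers with it.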
It's much faster! But wait... is it accurate? What if the straight line goes through a lake?
A* is smart. It tries the straight line first. It hits the "wall" (high cost/blocked).
Then, as the $g(n)$ increases for those blocked nodes, the priority shifts to the nodes going *around* the lake.
As long as your heuristic is **Admissible** (it never overestimates the true remaining cost), A* is mathematically guaranteed to find the shortest path.
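The lake scenario can be shown on a toy grid. A sketch with an invented 5×5 grid where a "lake" blocks the straight line; the heuristic is Manhattan distance, which is admissible on a 4-connected grid, so the detour it finds is still optimal:

```python
import heapq

def a_star_grid(blocked, start, goal, size=5):
    """A* on a 4-connected grid. h = Manhattan distance, which never
    overestimates the true path length here (admissible), so the
    returned length is guaranteed optimal."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    g = {start: 0}
    pq = [(h(start), start)]
    while pq:
        _, cur = heapq.heappop(pq)
        if cur == goal:
            return g[cur]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            if nxt in blocked:  # the "lake": infinite cost, never entered
                continue
            if g[cur] + 1 < g.get(nxt, float("inf")):
                g[nxt] = g[cur] + 1
                heapq.heappush(pq, (g[nxt] + h(nxt), nxt))
    return None

lake = {(2, 0), (2, 1), (2, 2)}  # blocks the straight line from x=0 to x=4
print(a_star_grid(lake, (0, 0), (4, 0)))  # 10: straight line is 4, detour adds 6
```

The straight-line attempts hit the lake, their $g(n)$ stops improving, and the queue shifts to the cells routing around it.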
It's working! CPU usage dropped to 10%!
A* is the savior!
Don't celebrate yet. Try routing from New York to Los Angeles.
W-Why? It's lagging again!
A* is better than Dijkstra. But at continental scale? It's still too many nodes.
When you drive to LA, do you check every driveway in Ohio?
No. I get on the Interstate.
Your graph treats every residential street as equal to a highway. A* is still checking too many local roads. You need **Contraction Hierarchies**.
Contraction... what?
It's a pre-processing step. You take the graph and "contract" the unimportant nodes.
You create "shortcuts." If you enter a highway segment, you skip all the exits until the one you need.
We bake the long paths into single edges. At runtime, the algorithm only looks at the "Highway Graph."
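The "baking" step can be caricatured in a few lines. This is only a toy of the idea, not real Contraction Hierarchies — a production implementation orders nodes by importance and runs "witness" searches to skip unnecessary shortcuts, all of which this sketch omits:

```python
def contract(graph, node):
    """Toy node contraction: for every (incoming, outgoing) neighbor pair
    of `node`, add a shortcut edge whose weight is the path through
    `node`, then remove `node` from the graph."""
    incoming = {u: w for u, nbrs in graph.items() for v, w in nbrs if v == node}
    outgoing = graph.pop(node, [])
    for u, w_in in incoming.items():
        graph[u] = [(v, w) for v, w in graph[u] if v != node]
        for v, w_out in outgoing:
            if v != u:
                graph[u].append((v, w_in + w_out))  # the shortcut edge
    return graph

# Highway on-ramp: A -> ramp -> B. Contracting `ramp` bakes the
# two-hop path into a single edge A -> B.
g = {"A": [("ramp", 2)], "ramp": [("B", 8)], "B": []}
print(contract(g, "ramp"))  # {'A': [('B', 10)], 'B': []}
```

At query time the search runs on the smaller shortcut graph and never visits the contracted interior nodes.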
That sounds hard to implement from scratch.
Fool, that's why you don't write your own routing engine in Python.
Use a library like OSRM (Open Source Routing Machine) or GraphHopper. They use Contraction Hierarchies by default.
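In practice that means one HTTP call instead of a hand-rolled search. A sketch that builds a request URL for OSRM's public route service (the demo-server hostname and query parameter reflect OSRM's HTTP API; the coordinates are approximate and purely illustrative):

```python
def osrm_route_url(start, end, server="https://router.project-osrm.org"):
    """Build a URL for OSRM's route service. Note OSRM expects
    `lon,lat` order (not lat,lon), with `;` between waypoints."""
    coords = f"{start[1]},{start[0]};{end[1]},{end[0]}"  # inputs are (lat, lon)
    return f"{server}/route/v1/driving/{coords}?overview=false"

# Roughly Times Square -> LAX
print(osrm_route_url((40.758, -73.985), (33.941, -118.408)))
```

The server side does the Contraction Hierarchies work; the response comes back in milliseconds even coast to coast.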
Stop reinventing the wheel with a square `while` loop.
Goodbye, my beautiful code...
It wasn't beautiful. It was server-melting garbage.
So... Dijkstra is useless?
No. Dijkstra is for **dense, unguided exploration**.
Like network packet routing (OSPF). The packet doesn't know where the destination is geographically. It just flows.
But for physical space? Where coordinates exist? You always use the coordinates and a Heuristic.
A* got my coffee here while it's still hot.
(Smiling) And Dijkstra would have delivered it cold?
Dijkstra would have delivered it to the building next door, realized it was a wall, backtracked, checked the store room, then delivered it cold.