Nelworks
Season 3

S3-EP07: The Non-Deterministic WinRAR

Understanding the non-deterministic WinRAR: AI compression and data encoding. Learn about lossy compression, information theory, and how AI models compress knowledge.

This is... magic. It's not just a search engine. It's a brain. A giant, genius brain that has read everything and understands everything.
But... where does it *keep* all that knowledge? The entire internet is thousands of petabytes. Even Google doesn't have a single hard drive that big. Where is the 'everything' stored?
It's stored in here.
In... in one terabyte? That's impossible! That's the size of Genshin Impact! You can't fit the entire internet on that!
You're right. You can't. Which should be your first clue that GPT is not a database. You're not talking to a brain. You're trying to understand a new form of matter.
How do you think a computer knows what the word 'king' means?
A dictionary? It has a giant dictionary file, and 'king' points to 'a male ruler of a country'?
That's a human way of thinking. We choose a word, then map it to its definition.
A computer only understands numbers. And since there's no obvious way to turn the description "a male ruler" into a number, the people building these systems eventually stopped trying to define words at all.
Then what did they do?
They decided that a word's meaning is not its definition. A word's meaning is **the network it keeps**.
The AI reads trillions of sentences. It sees that the word 'king' appears very often next to words like 'queen', 'crown', 'palace', and 'rules'.
It also sees that 'king' almost never appears next to words like 'photosynthesis', 'keyboard', or 'pickle'.
The AI builds a map. It places every word in this multi-dimensional space. Words that keep similar company are placed close together. This is an **Embedding**.
So... a word is just a location? A coordinate in the 'meaning galaxy'?
Exactly. A **vector**. A series of numbers.
We didn't teach it what a king is. We let it discover the 'shape' of the word 'king' by observing its relationships to everything else.
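A minimal sketch of that 'meaning galaxy' in code. The vectors below are made up purely for illustration; real embeddings are learned from data and have hundreds or thousands of dimensions.

```python
# Toy word vectors: invented 4-dimensional coordinates in the "meaning galaxy".
# Real embeddings are learned, not hand-written, and are far higher-dimensional.
import numpy as np

embeddings = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.9, 0.7, 0.2, 0.1]),
    "palace": np.array([0.7, 0.6, 0.1, 0.0]),
    "pickle": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    # Words that keep similar company point in similar directions,
    # so the angle between their vectors is small and this score is near 1.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high: close neighbours
print(cosine_similarity(embeddings["king"], embeddings["pickle"]))  # low: a different galaxy arm
```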
Okay, so it knows what words mean, kind of. But what about a sentence? Does it just add up the locations of all the words?
Good guess, wrong. That was an early approach. It failed because context is everything.
Take two sentences: 'I deposited money in the bank' and 'I sat on the river bank'. The word 'bank' is identical. But its meaning is completely different. How does the AI know which 'bank' it is?
It... looks at the other words?
Precisely. It needs to **pay attention**. This is the 'T' in GPT. The **Transformer**.
Think of a sentence as a conversation at a party. Every word is a person.
To understand its own true meaning in this context, the word 'bank' needs to know who is most important to listen to. So it 'looks' at all the other words in the sentence.
The **Attention Mechanism** calculates a score. It realizes that 'money' is the most important clue. It then adjusts its own 'meaning vector' to be closer to the 'financial institution' version of bank. It figures out its own meaning by paying attention to its friends.
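A toy sketch of that scoring step, assuming plain scaled dot-product attention with made-up vectors. Real Transformers add learned query/key/value projections and many attention heads on top of this.

```python
import numpy as np

np.random.seed(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Every word scores every other word ("who should I listen to?"),
    # the scores become weights, and each word's vector is rebuilt
    # as a weighted blend of the others.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)
    return weights @ V, weights

# One toy 4-d vector per token of "I deposited money in the bank".
X = np.random.rand(6, 4)
new_vectors, weights = attention(X, X, X)
print(weights[-1].round(2))  # how strongly the last token ("bank") attends to each other word
```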
So... the **training** isn't memorizing the internet. It's building this giant map of word relationships, and a system for them to pay attention to each other.
Yup. It's learning the statistical patterns, the geometry of language.
And that... all fits on a 1TB drive?
Yes. And now you can understand why. What is a map, if not a **compressed representation** of a territory?
A GPT works less like a brain. It's more like a non-deterministic WinRAR.
GPT...ZIP... How does that connect?
A simple ZIP file finds repeated patterns, like a sentence that appears 100 times.
A GPT finds the deep, abstract, universal patterns of all human knowledge. It's actually a compression algorithm.
The training process is a compression process. A 1.8-trillion-parameter model is, in essence, a compressed summary of all those statistical relationships.
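The 'simple ZIP' half of the analogy is easy to see in code. Here zlib stands in for WinRAR: it shrinks text by spotting literal repetition, and it gives back exactly what went in.

```python
import zlib

# A sentence that "appears 100 times" compresses extremely well,
# because classic lossless compression exploits repeated patterns.
text = "The Eiffel Tower is in Paris. " * 100
compressed = zlib.compress(text.encode())

print(len(text))        # ~3000 bytes of original text
print(len(compressed))  # far smaller, since the same pattern repeats
print(zlib.decompress(compressed).decode() == text)  # True: lossless, bit-for-bit identical
```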
So the model is trapped inside a new kind of ZIP file!
Yeah. This means when you ask the model, your question isn't a query. It's the **start of the file it needs to decompress**.
The AI doesn't 'understand' the question. It just sees that sequence of tokens and asks itself one thing, over and over: *'Based on the quadrillion patterns I have learned, what is the most statistically probable token to come next?'*
It's not retrieving a fact. It's decompressing a new, statistically plausible sentence, word by word.
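A minimal sketch of that loop, with a hand-written toy probability table standing in for the model. A real GPT computes these probabilities with its 1.8 trillion parameters, but the loop itself really is this simple: look at the context, pick a likely next token, append, repeat.

```python
# Toy "what comes next?" table: keys are the last two tokens of context,
# values are candidate next tokens with invented probabilities.
next_token_probs = {
    ("The", "capital"):  {"of": 0.95, "city": 0.05},
    ("capital", "of"):   {"France": 0.6, "Spain": 0.3, "cheese": 0.1},
    ("of", "France"):    {"is": 0.9, "was": 0.1},
    ("France", "is"):    {"Paris": 0.85, "beautiful": 0.15},
}

def generate(prompt, steps=4):
    tokens = prompt.split()
    for _ in range(steps):
        context = tuple(tokens[-2:])          # the prompt is the start of the "file"
        candidates = next_token_probs.get(context)
        if not candidates:
            break
        # Greedy choice: append the statistically most probable next token.
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens)

print(generate("The capital"))  # "The capital of France is Paris"
```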
But wait. When I unzip a file with WinRAR, I get the exact same, perfect file back. It's **lossless**.
Yep. And a GPT is **lossy**.
Lossy? What did it lose?
The truth. The source. It didn't store the original Wikipedia article. It only stored the *patterns* it learned from it. It's like an impressionist painter.
This painter has learned the 'essence' of the Eiffel Tower: the shape, the light, the 'vibe'. He has a perfect internal model of its patterns.
Now you ask him, 'Paint me the Eiffel Tower'. He doesn't hand back a photo. He **reconstructs** a new painting from his internal model.
It will be beautiful. It will look something like the Eiffel Tower. But it is a brand new creation. It is a 'hallucination'.
So... 'hallucination' sounds like a bad thing, but that's basically the only thing a GPT is made to do!
Yes. It's a painter filling in a detail he doesn't perfectly remember, but in a style that is consistent with the rest of the painting.
It is executing a reconstruction based on its incomplete, lossy memory of the world. The facts in the answer may be wrong, but the *pattern* of the answer may be perfect.
But... I don't want a painting. Sometimes I just want the photograph. Can't you make it lossless?
You could. You could build a system that only retrieves exact quotes from its training data. It would be a deterministic, lossless decompressor.
How do I do that?
In AI, there is a parameter called Temperature. It's a knob that controls how much creative freedom the GPT is allowed.
A dial for creativity? How does that work?
Let's go back to our painter. You ask him a factual question: `What is the capital of France?`
For a factual query, we set the **Temperature to near zero**, say `0.1`. This tightens the leash. The painter is now only allowed to choose the most obvious, boring, statistically probable brushstroke at every single step.
This kills creativity stone dead. But it maximizes the chance of factual recall. It forces the painter to act like a camera, reconstructing the most likely 'photograph' from his memory. It's the closest thing to the 'lossless' mode you wanted. It's a fact machine.
Okay, so a cold AI is a boring, factual AI.
What if I want something new? When I want a masterpiece, not a photograph?
Well, you turn the **Temperature up**. `0.9`.
This loosens the leash. It tells the artist he is allowed to consider less probable, more 'surprising' brushstrokes.
Say you ask for a drawing of a cat. A low-temp AI would just draw an ordinary cat. But the high-temp AI can make a creative connection. It sees that 'menacing' is statistically linked to 'jojo', that 'dio' is linked to 'menacing', and that 'maniac' is linked to 'dio'.
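A sketch of what the knob actually does, assuming a toy set of next-token scores. Dividing the scores by the temperature before turning them into probabilities is the standard trick; the scores themselves are invented for this example.

```python
import numpy as np

tokens = ["cat", "menacing", "dio", "pickle"]
logits = np.array([3.0, 1.5, 1.0, -2.0])  # invented "how likely comes next" scores

def next_token_distribution(logits, temperature):
    scaled = logits / temperature          # low T sharpens the distribution, high T flattens it
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

# Temperature 0.1: the leash is tight, nearly all probability piles onto "cat".
print(dict(zip(tokens, next_token_distribution(logits, 0.1).round(3))))
# Temperature 0.9: surprising tokens like "menacing" and "dio" get a real chance.
print(dict(zip(tokens, next_token_distribution(logits, 0.9).round(3))))
```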
That's... good. And weird.
That is the magic. That is the 'hallucination' you wanted. The high temperature allowed the AI to create a new, surprising image. The 'bug' became the entire point of the exercise.
So... 'Temperature' is a knob that changes the result of the decompressed data?
That's the generative **G** in the **GPT** we're talking about.
And my job as a user is to know whether I want a lecturer or a poet.
Still, I'm impressed it managed to generate that menacing cat. Share the prompt with me later.