Part 2: The "Puzzle Pieces" – Why AI Struggles with Simple Counting

April 29, 2026

In Part 1, we learned that AI is a powerful "Internet Simulator" that predicts the next word. But this leads to a strange problem: Why can an AI solve a difficult math problem yet fail to count the number of 'r's in the word "strawberry"?

The answer lies in a process called Tokenization.

1. What are Tokens? (The Atoms of Language)

Humans see words and letters. AI, however, does not "read" text the way we do. Before your message reaches the AI's brain, it is chopped into small "chunks" called Tokens.

  • Small Blocks: A token can be a whole word (like "hello"), a part of a word (like "ing"), or even just a space.
  • Number IDs: Computers only understand numbers. Every unique token is given a unique ID number. For example, in the AI's "brain," the word "hello" is just ID number 15339.

Token: the "atom" or "puzzle piece" of language for the AI. The AI builds sentences by combining these pieces based on math, not by looking at letters.
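To make this concrete, here is a toy sketch of the idea. The vocabulary below is made up for illustration (only the ID 15339 for "hello" matches the example above; real vocabularies hold roughly 100,000 entries), and real tokenizers are far more sophisticated:

```python
# A toy "tokenizer": chop text into known chunks and look up their ID numbers.
# Every ID except 15339 is invented for this example.
vocab = {"hello": 15339, " there": 1001, "straw": 1002, "berry": 1003}

def tokenize(text, vocab):
    """Greedily match the longest known chunk at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, shrinking until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for text starting at {text[i:]!r}")
    return tokens

print(tokenize("hello there", vocab))   # [15339, 1001]
print(tokenize("strawberry", vocab))    # [1002, 1003]
```

Notice the last line: the AI never receives the letters s-t-r-a-w-b-e-r-r-y, only the pair of IDs 1002 and 1003.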

2. Why doesn't AI just use letters?

Why doesn't AI look at every letter? The reason is efficiency. Processing every single letter would be too slow and would fill up the AI's "working memory" (called the Context Window) too quickly.

To save space, AI uses an algorithm (a set of rules) called Byte Pair Encoding (BPE). It finds common patterns (like "th" or "ion") and groups them into one single token.
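Here is a minimal sketch of the core BPE training loop: start from single characters and repeatedly fuse the most frequent adjacent pair into a new token. (Real tokenizers work on bytes, run many thousands of merges, and add special handling this toy version skips.)

```python
from collections import Counter

def bpe_train(text, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    tokens = list(text)  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Replace every occurrence of the pair with the fused token.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, merges

tokens, merges = bpe_train("the thin thing", 3)
print(merges[0])  # 'th' — the most frequent pair gets merged first
print(tokens)     # the same text, now in fewer, larger chunks
```

The merged tokens still spell out the original text; the AI just sees it in bigger pieces.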

3. The "Blind Spots" – Why AI Makes Mistakes

Because AI sees "chunks" (tokens) and not individual letters, it has "blind spots." This is often called the "Swiss Cheese Model": the AI is very strong in some areas but has holes in its understanding.

  • The "Strawberry" Problem: When AI sees "strawberry," it might see only two pieces: straw and berry. Because it only sees the ID numbers for these two chunks, it doesn't actually "see" the letters inside them. This makes counting letters very difficult.
  • Number Confusion: Sometimes AI thinks 9.11 is larger than 9.9. In its training data, those tokens often appeared as chapter-and-verse markers (where 9:11 really does come after 9:9), and that pattern leaks into how it compares decimal numbers.
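The number confusion vanishes the moment the comparison is done by actual arithmetic rather than token pattern-matching. A quick sanity check in Python:

```python
# Real arithmetic has no "Bible verse" blind spot:
# compared as numbers, 9.11 is smaller than 9.9.
print(9.11 > 9.9)      # False
print(max(9.9, 9.11))  # 9.9
```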

4. Pro-Tips: How to Help Your AI "Think" Better

Now that you know AI sees "puzzle pieces," you can use it more effectively:

  1. Don't trust it for counting: If you need to count letters or words, don't ask the AI to "guess"; its token-level view makes those counts unreliable.
  2. Use Tools: For counting or math, ask the AI to "use code". The AI will write a small computer program to do the counting perfectly, bypassing its own blind spots.
  3. Give it Space to Think: AI needs tokens to "think". If a problem is hard, ask it to "think step-by-step". This forces the AI to create more tokens, which gives it more "room" in its brain to find the right answer.
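Tip 2 in practice: when asked to "use code", the AI typically writes a snippet along these lines (a sketch; the exact program varies), which operates on real characters instead of token IDs:

```python
# Code sees every individual character, so counting is exact.
word = "strawberry"
print(word.count("r"))        # 3

sentence = "the quick brown fox jumps over the lazy dog"
print(len(sentence.split()))  # 9 words
```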


Summary of Part 2: AI sees language as a sequence of ID numbers representing chunks of text. While this makes it very fast, it lacks the "eyes" to see individual letters clearly.

In Part 3, we will move from the "Know-it-all Professor" (Base Model) to the "Virtual Assistant." We will explore how humans teach AI to follow instructions through a process called Supervised Fine-Tuning (SFT).