LLM Learning Explorer
What is an LLM?
Understanding the basics
Welcome! Let's explore how Large Language Models work, step by step.
Your Input
Neural Network
Predicted Output
Key Concept
An LLM predicts the next word based on patterns learned from massive amounts of text data. It doesn't "understand" like humans do; it recognizes statistical patterns in language.
Summary
- LLMs are trained on billions of words from the internet
- They learn patterns in how words follow each other
- Given some text, they predict what comes next
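To make this concrete, here is a minimal Python sketch (a toy, not how a real LLM is built): it counts which word follows which in a tiny invented corpus, then "predicts" the most frequent follower. The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy illustration only: count which word follows which in a tiny corpus,
# then "predict" the most frequent follower. A real LLM learns far richer
# patterns with a neural network, but the core idea is the same:
# predict the next word from patterns seen in training text.
corpus = "the cat sat on the mat the cat ate the fish".split()

follower_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word):
    # The most common word seen right after `word` in the corpus.
    return follower_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # -> cat   (follows "the" twice)
print(predict_next("cat"))   # -> sat   ("sat" and "ate" are tied; first seen wins)
```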
Tokenization
Breaking text into pieces
Before an LLM can process text, it must break it into smaller pieces called tokens.
Tokenized Result:
Complete words that exist in the vocabulary
Parts of words, like prefixes or suffixes
Punctuation and special characters
💡 Why Tokenization?
Tokenization converts text into numbers that the model can process. Each unique token gets an ID number. The model works with these numbers, not the actual letters!
Summary
- Text is split into tokens (words, sub-words, or characters)
- Each token maps to a unique ID number
- Common words are single tokens; rare words may be split
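Here is a toy Python sketch of that idea, using a tiny hand-made vocabulary (a real tokenizer such as BPE learns its vocabulary from data; every word, piece, and ID below is invented for illustration): known words become single tokens, rare words are split into known sub-word pieces, and each token maps to an ID.

```python
# Toy tokenizer sketch with a hand-made vocabulary (real tokenizers learn
# a much larger vocabulary from data). The model only ever sees the IDs.
vocab = {"the": 1, "cat": 2, "sat": 3, "un": 4, "believ": 5, "able": 6, ".": 7}

def tokenize(text):
    tokens = []
    for word in text.lower().replace(".", " .").split():
        if word in vocab:                 # common word: a single token
            tokens.append(word)
            continue
        # Rare word: greedily split it into the longest known pieces.
        while word:
            for piece in sorted(vocab, key=len, reverse=True):
                if word.startswith(piece):
                    tokens.append(piece)
                    word = word[len(piece):]
                    break
            else:
                word = word[1:]           # toy fallback: skip unknown characters
    return tokens

tokens = tokenize("The cat sat. Unbelievable.")
print(tokens)                       # ['the', 'cat', 'sat', '.', 'un', 'believ', 'able', '.']
print([vocab[t] for t in tokens])   # [1, 2, 3, 7, 4, 5, 6, 7]
```

Notice how "unbelievable" is not in the vocabulary, so it gets split into the pieces "un", "believ", and "able", exactly as the summary describes.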
Embeddings
Words as numbers in space
Embeddings convert tokens into lists of numbers (vectors) that capture their meaning. Similar words end up close together!
2D Embedding Space (Simplified)
💡 Why Embeddings Matter
By representing words as vectors, the model can understand relationships. "King - Man + Woman ≈ Queen" works because the directions in embedding space capture meaning!
Summary
- Each token becomes a vector of hundreds of numbers
- Similar meanings = similar vectors = nearby in space
- The model learns these representations during training
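A small numpy sketch of the idea, with hand-picked 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions and are learned, never hand-picked): cosine similarity measures how closely two vectors point in the same direction, and the king/queen arithmetic falls out of the toy geometry.

```python
import numpy as np

# Toy 3-dimensional "embeddings", hand-picked for illustration (real models
# learn vectors with hundreds or thousands of dimensions during training).
embeddings = {
    "king":  np.array([0.9,  0.3, 0.0]),
    "queen": np.array([0.9, -0.3, 0.0]),
    "man":   np.array([0.1,  0.3, 0.0]),
    "woman": np.array([0.1, -0.3, 0.0]),
    "cat":   np.array([0.0,  0.0, 0.9]),
}

def cosine(a, b):
    # 1.0 = pointing the same way (similar meaning), 0.0 = unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # ≈ 0.8 (related)
print(cosine(embeddings["king"], embeddings["cat"]))    # 0.0 (unrelated)

# "King - Man + Woman ≈ Queen": directions in the space carry meaning.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
closest = max(embeddings, key=lambda w: cosine(embeddings[w], target))
print(closest)  # queen
```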
Transformer Architecture
The building blocks of LLMs
The Transformer is the architecture that powers modern LLMs. Data flows through layers that process and refine the understanding.
INPUT EMBEDDINGS
Token vectors enter here
Layer 1 (× 96 layers)
Attention
Feed Forward
OUTPUT PROBABILITIES
Prediction for next token
💡 Layer by Layer
Each layer refines the model's understanding. Early layers capture basic patterns (grammar, word forms), while deeper layers capture complex concepts (meaning, context, reasoning).
Summary
- Transformers stack many identical layers
- Each layer has attention + feed-forward components
- Data flows through all layers before producing output
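Below is a rough numpy skeleton of that data flow, with tiny made-up sizes and random weights (real models use far larger dimensions, learned weights, and layer normalization; the attention here is a crude placeholder that mixes positions evenly, and the proper version is sketched in the next section).

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, seq_len, d_model, vocab_size = 4, 5, 16, 100   # tiny toy sizes

def attention(x):
    # Crude placeholder: every position mixes in a little of every other
    # position. Real attention computes these weights from the data itself
    # (see the next section's sketch).
    weights = np.full((seq_len, seq_len), 1.0 / seq_len)
    return weights @ x

def feed_forward(x, w1, w2):
    # Small two-layer network applied to each position independently.
    return np.maximum(0, x @ w1) @ w2

# INPUT EMBEDDINGS: one d_model-sized vector per token.
x = rng.normal(size=(seq_len, d_model))

# Each layer has its own weights but the same structure.
layers = [
    (rng.normal(scale=0.1, size=(d_model, 4 * d_model)),
     rng.normal(scale=0.1, size=(4 * d_model, d_model)))
    for _ in range(n_layers)
]

# Data flows through every layer; the residual connections ("x +") keep the
# original signal while each layer adds its refinement.
for w1, w2 in layers:
    x = x + attention(x)
    x = x + feed_forward(x, w1, w2)

# OUTPUT PROBABILITIES: a distribution over the vocabulary for the next token.
w_out = rng.normal(scale=0.1, size=(d_model, vocab_size))
logits = x[-1] @ w_out                       # read the prediction off the last position
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape, round(probs.sum(), 6))    # (100,) 1.0
```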
Attention Mechanism
How words "look at" each other
Attention allows each word to gather information from other relevant words. This is how the model understands context!
Click a word to see what it "pays attention" to:
"it" strongly attends to "animal" because the model learned that pronouns often refer back to nouns. This is how it knows what "it" means!
💡 Self-Attention
In self-attention, every word can "look at" every other word. The model learns which words are relevant to each other. This happens in parallel, making Transformers very efficient!
Summary
- Attention computes relevance between all word pairs
- Strong attention = words are contextually related
- This is how models resolve pronouns and understand context
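Here is a compact numpy sketch of self-attention (scaled dot-product attention). The sentence, sizes, and random projection weights are all invented; in a trained model the weights are learned, which is what makes "it" attend strongly to "animal".

```python
import numpy as np

rng = np.random.default_rng(1)
words = ["the", "animal", "was", "tired", "because", "it"]
d = 8   # tiny vector size for illustration

# Pretend embeddings and query/key/value projections (random here;
# in a real model both are learned during training).
X = rng.normal(size=(len(words), d))
Wq, Wk, Wv = [rng.normal(scale=0.5, size=(d, d)) for _ in range(3)]
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product self-attention: every word scores every other word,
# and softmax turns those scores into attention weights that sum to 1.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V   # each word's new vector: a weighted mix of all the values

# How much does "it" attend to each word? (Random weights give an arbitrary
# pattern; a trained model would put most of the weight on "animal".)
for word, w in zip(words, weights[words.index("it")]):
    print(f"{word:8s} {w:.2f}")
print(output.shape)    # (6, 8): one refined vector per word
```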
Training vs Inference
Learning vs using the model
Training Phase
Data Size: Trillions of words
Time: Weeks to months
Cost: $10M+
⚡ Inference Phase
"What is AI?" → "AI is..."
Input: Your prompt
Time: Milliseconds
Output: Token by token
💡 Key Difference
Training is expensive and slow: it creates the model. Inference is fast and cheap: it uses the model. You only train once, but use the model millions of times!
Summary
- Training: Model learns patterns from massive datasets
- Inference: Trained model generates responses to prompts
- Training is slow/expensive; inference is fast/cheap
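The contrast can be sketched with a deliberately tiny numpy model (a toy softmax next-token predictor; the vocabulary, learning rate, and step count are arbitrary): training runs many slow weight updates over the data, while inference just reuses the frozen weights to predict one token at a time.

```python
import numpy as np

vocab = ["<s>", "what", "is", "ai", "?"]
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# --- TRAINING: slow and expensive; many weight updates over many examples ---
# Toy "dataset": (current token -> next token) pairs from one tiny sentence.
# Real training uses trillions of tokens and runs for weeks on many GPUs.
text = ["<s>", "what", "is", "ai", "?"]
pairs = list(zip(text, text[1:]))

W = np.zeros((V, V))                      # the model's weights
for step in range(200):                   # repeated passes over the data
    for cur, nxt in pairs:
        probs = softmax(W[idx[cur]])
        grad = probs.copy()
        grad[idx[nxt]] -= 1.0             # gradient of the cross-entropy loss
        W[idx[cur]] -= 0.5 * grad         # nudge weights so `nxt` becomes more likely

# --- INFERENCE: fast and cheap; weights are frozen, just predict ------------
token = "<s>"
generated = []
for _ in range(4):
    probs = softmax(W[idx[token]])
    token = vocab[int(np.argmax(probs))]  # most likely next token
    generated.append(token)
print(" ".join(generated))                # -> what is ai ?
```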
Next Token Prediction
How LLMs generate text
LLMs generate text by predicting the most likely next token, one at a time. Let's see it in action!
Generated Text:
Next Token Probabilities:
💡 Temperature
The "temperature" setting controls randomness. Low temperature (0) always picks the highest-probability token. High temperature (1+) adds randomness, making outputs more creative but less predictable.
Summary
- LLMs predict one token at a time
- Each token has a probability distribution
- Temperature controls the randomness of selection
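A small numpy sketch of temperature at work, using made-up candidate tokens and scores: temperature 0 is greedy (always the top token), low temperature stays close to greedy, and high temperature spreads probability across more options.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical candidates for the next token after "The cat sat on the",
# with made-up raw scores (logits) from the model.
candidates = ["mat", "floor", "sofa", "moon"]
logits = np.array([3.0, 2.0, 1.5, 0.1])

def sample_next(logits, temperature):
    if temperature == 0:
        return candidates[int(np.argmax(logits))]    # greedy: always the top token
    scaled = logits / temperature                    # higher T flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return str(rng.choice(candidates, p=probs))

print(sample_next(logits, 0))                         # always "mat"
print([sample_next(logits, 0.2) for _ in range(5)])   # almost always "mat"
print([sample_next(logits, 1.5) for _ in range(5)])   # noticeably more varied
```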
Prompt Engineering
Crafting effective inputs
How you write your prompt dramatically affects the output. Let's explore key techniques!
Try different prompt styles:
Simulated Output:
Quantum computing uses quantum mechanics principles like superposition and entanglement to process information in ways classical computers cannot...
Context
Set the scene or provide background information the model needs.
"You are a helpful assistant..."
Instructions
Clear, specific directions for what you want.
"Explain in 3 bullet points..."
Examples
Show the format or style you want in the response.
"Example: Input: X β Output: Y"
💡 Pro Tips
- Be specific: "Write 3 paragraphs" beats "Write about..."
- Use role-playing: "You are an expert in..."
- Iterate: Start simple, then refine based on outputs
Summary
- Good prompts = better outputs
- Include context, clear instructions, and examples
- Experiment and iterate to find what works
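As a final sketch, here is one way to assemble the three ingredients (context, instructions, examples) into a single prompt string in Python. The role, wording, and formatting are invented for illustration; adapt them to your task and to whatever model or API you are calling.

```python
# One way to assemble the three ingredients into a prompt string.
# The role, wording, and formatting are invented for illustration;
# adapt them to your task and whatever model or API you call.

def build_prompt(question: str) -> str:
    context = "You are a physics teacher who explains ideas to curious 12-year-olds."
    instructions = "Explain the answer in exactly 3 short bullet points. Avoid jargon."
    examples = (
        "Example:\n"
        "Input: What is gravity?\n"
        "Output:\n"
        "- Gravity is a pull between objects that have mass.\n"
        "- Bigger objects pull harder.\n"
        "- It keeps the Moon circling the Earth.\n"
    )
    return f"{context}\n\n{instructions}\n\n{examples}\nInput: {question}\nOutput:"

print(build_prompt("What is quantum computing?"))
```

Keeping the prompt builder as a function makes it easy to iterate: change one ingredient at a time and compare the outputs you get back.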