Building Large Language Models (LLMs) from scratch is a complex and challenging task. It requires a deep understanding of the underlying mathematics and a strong foundation in computer science. In this post, we will explore the process of building an LLM from scratch and point to a step-by-step set of resources to help anyone get started.
LLMs are incredibly versatile, aiding in tasks such as checking grammar, composing emails, summarizing lengthy documents, and much more. They are "large" in a literal sense, encompassing millions to billions of trainable parameters, and they form a distinct subset of AI models focused on language. A very nice practical resource is the book Build a Large Language Model (From Scratch) by Sebastian Raschka, which walks through building your own LLM.
Besides the book, I recommend the following series of videos by Dr. Raj Dandekar to build up an understanding of LLMs from scratch. A few short code sketches after the list illustrate some of these topics.
- Lecture 1: Building LLMs from scratch: Series introduction
- Lecture 2: Large Language Models (LLM) Basics
- Lecture 3: Pretraining LLMs vs Finetuning LLMs
- Lecture 4: What are transformers?
- Lecture 5: How does GPT-3 really work?
- Lecture 6: Stages of building an LLM from Scratch
- Lecture 7: Code an LLM Tokenizer from Scratch in Python
- Lecture 8: The GPT Tokenizer: Byte Pair Encoding
- Lecture 9: Creating Input-Target data pairs using Python DataLoader
- Lecture 10: What are token embeddings?
- Lecture 11: The importance of Positional Embeddings
- Lecture 12: The entire Data Preprocessing Pipeline of Large Language Models (LLMs)
- Lecture 13: Introduction to the Attention Mechanism in Large Language Models (LLMs)
- Lecture 14: Simplified Attention Mechanism - Coded from scratch in Python | No trainable weights
- Lecture 15: Coding the self attention mechanism with key, query and value matrices
- Lecture 16: Causal Self Attention Mechanism | Coded from scratch in Python
- Lecture 17: Multi Head Attention Part 1 - Basics and Python code
- Lecture 18: Multi Head Attention Part 2 - Entire mathematics explained
- Lecture 19: Birds Eye View of the LLM Architecture
- Lecture 20: Layer Normalization in the LLM Architecture
- Lecture 21: GELU Activation Function in the LLM Architecture
- Lecture 22: Shortcut connections in the LLM Architecture
- Lecture 23: Coding the entire LLM Transformer Block
- Lecture 24: Coding the 124 million parameter GPT-2 model
- Lecture 25: Coding GPT-2 to predict the next token
- Lecture 26: Measuring the LLM loss function
- Lecture 27: Evaluating LLM performance on real dataset | Hands on project | Book data
- Lecture 28: Coding the entire LLM Pre-training Loop
- Lecture 29: Temperature Scaling in Large Language Models (LLMs)
- Lecture 30: Top-k sampling in Large Language Models
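
To give a flavor of the data-preparation lectures (Lectures 7 to 12), here is a minimal sketch of building input-target pairs with a sliding window. This is not the code from the lectures or the book; the token IDs and context length below are made up for illustration, and a real pipeline would use a tokenizer and a PyTorch DataLoader.

```python
# Hypothetical token IDs standing in for a tokenized text.
token_ids = [10, 25, 3, 87, 44, 9, 61, 2]
context_length = 4  # how many tokens the model sees at once

inputs, targets = [], []
for i in range(len(token_ids) - context_length):
    inputs.append(token_ids[i : i + context_length])
    # The target is the same window shifted one token to the right:
    # the model learns to predict the next token at every position.
    targets.append(token_ids[i + 1 : i + context_length + 1])

print(inputs[0], "->", targets[0])   # [10, 25, 3, 87] -> [25, 3, 87, 44]
```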
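
The attention lectures (Lectures 13 to 18) build up from simplified attention to causal and multi-head attention. Below is a minimal single-head sketch of causal self-attention with query, key, and value matrices; the random embeddings and weight matrices are placeholders for illustration, not the lectures' actual code.

```python
import torch

torch.manual_seed(123)

d_in, d_out = 4, 3          # input embedding size, attention head size
seq_len = 5                 # number of tokens in the example sequence
x = torch.rand(seq_len, d_in)   # stand-in token embeddings

# Projection matrices (random here; trainable parameters in a real model)
W_q = torch.rand(d_in, d_out)
W_k = torch.rand(d_in, d_out)
W_v = torch.rand(d_in, d_out)

queries = x @ W_q
keys    = x @ W_k
values  = x @ W_v

# Attention scores, scaled by sqrt(d_out) before the softmax
scores = queries @ keys.T / d_out ** 0.5

# Causal mask: each token may only attend to itself and earlier tokens
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1)
scores = scores.masked_fill(mask.bool(), float("-inf"))

weights = torch.softmax(scores, dim=-1)   # each row sums to 1
context = weights @ values                # (seq_len, d_out) context vectors
print(context.shape)                      # torch.Size([5, 3])
```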
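
Finally, Lectures 29 and 30 cover temperature scaling and top-k sampling for text generation. Here is a small sketch of both ideas applied to a made-up logits vector; in practice the logits would come from the trained GPT model's final layer.

```python
import torch

torch.manual_seed(123)
# Hypothetical next-token logits for a tiny 5-token vocabulary.
logits = torch.tensor([4.0, 2.5, 1.0, 0.5, -1.0])

def sample_next_token(logits, temperature=1.0, top_k=3):
    # Keep only the top_k largest logits; mask everything else out.
    top_logits, _ = torch.topk(logits, top_k)
    logits = torch.where(logits < top_logits[-1],
                         torch.tensor(float("-inf")), logits)
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

print(sample_next_token(logits, temperature=0.7, top_k=3))
```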
I will soon share my notes and code samples from the book on my GitHub account. I will also use pre-trained models to generate text and fine-tune them for a set of projects that are close to my heart.
Stay tuned and keep learning. A solid understanding of LLMs is a must for any professional who wants to delve into AI and machine learning.
Enjoy!