We're now moving from theory and deployment into hands-on Python implementation: how to actually build a small language model from scratch. Here's a table of contents (15 articles) for the series, structured as a step-by-step tutorial.
Part 1: Foundations — From Text to Tokens
- Introduction: What Is a Small Language Model (SLM)?
  → Explain architecture, tokenization, and what makes “small” models special.
- Collecting and Cleaning Your Dataset
  → Use open text sources (TinyStories, Gutenberg, WikiText). Show cleaning and normalization in Python.
- Building a Simple Tokenizer from Scratch
  → Implement a Byte-Pair Encoding (BPE) or WordPiece tokenizer in Python, step by step.
- Converting Text into Numerical Data
  → Create a vocabulary, encode text into token IDs, and build a dataloader with PyTorch.
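The steps in this part boil down to mapping raw text to integer IDs and back. As a taste of what's ahead, here is a minimal character-level sketch (the series itself builds a BPE tokenizer; the names here are illustrative):

```python
# Minimal character-level sketch of "text -> token IDs -> text".
# The tutorial builds a real BPE tokenizer; this only shows the idea.
text = "hello world"

# Vocabulary: every distinct character gets an integer ID.
vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
inv_vocab = {i: ch for ch, i in vocab.items()}

def encode(s: str) -> list[int]:
    """Turn a string into a list of token IDs."""
    return [vocab[ch] for ch in s]

def decode(ids: list[int]) -> str:
    """Turn token IDs back into a string."""
    return "".join(inv_vocab[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"  # the round-trip is lossless
```

A subword tokenizer like BPE works the same way conceptually, except the vocabulary entries are learned multi-character chunks instead of single characters.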
Part 2: Architecture — Building the Brain
- Building a Tiny Transformer in PyTorch
  → Define embeddings, attention, feedforward, and layer normalization manually.
- Understanding Multi-Head Attention (Visually and in Code)
  → Break down query/key/value matrices, masking, and head aggregation.
- Implementing Positional Encoding and the Model Forward Pass
  → Add positional context and run your first forward pass.
- Training Loop: Teaching the Model to Predict the Next Token
  → Use cross-entropy loss, batching, and gradient descent to train your SLM.
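To preview the core mechanism in this part, here is single-head scaled dot-product attention written in plain Python. The series implements this with PyTorch tensors; this dependency-free version is a sketch for intuition only:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q: list, K: list, V: list) -> list:
    """Scaled dot-product attention for one head.
    Q, K, V are lists of vectors (lists of floats)."""
    d = len(K[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the weight-averaged mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a single key, the softmax weight is exactly 1, so the output equals that key's value vector; multi-head attention runs several of these in parallel and concatenates the results.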
Part 3: Scaling and Evaluation
- Generating Text: From Training to Inference
  → Implement greedy and temperature-based sampling.
- Evaluating Performance: Loss Curves and Perplexity
  → Plot training progress and measure model quality.
- Optimizing for Speed and Size (Quantization + Mixed Precision)
  → Demonstrate how to shrink model weights using bitsandbytes or PyTorch's quantization.
- Saving and Reloading Your Model Efficiently
  → Cover checkpoints, weight saving, and versioning for reproducibility.
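The two decoding strategies covered in this part can be sketched in a few lines of plain Python. The function names and the toy logits below are illustrative; the series applies the same logic to the model's real output logits:

```python
import math
import random

def greedy_next(logits: list[float]) -> int:
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_next(logits: list[float], temperature: float = 1.0) -> int:
    """Temperature sampling: divide logits by T before the softmax.
    Low T sharpens the distribution toward greedy; high T flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

Greedy decoding is deterministic and often repetitive; temperature sampling trades a little coherence for variety, which is why most chat-style inference uses it.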
Part 4: Beyond the Basics
- Fine-Tuning Your Small Model on Custom Data
  → Show how to adapt your base SLM to specific text domains.
- Adding a Simple Chat Interface (Streamlit or FastAPI)
  → Turn the trained model into a mini local chatbot UI.
- Deploying Your SLM as a GGUF Model (for llama.cpp or Ollama)
  → Export and test your model in small inference environments.
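The chat-interface article essentially wraps the model's generate function in an input/output loop. Here is a hypothetical stdlib-only sketch of that layer, with a placeholder `generate` standing in for the trained model and a plain function standing in for the Streamlit/FastAPI UI:

```python
# Hypothetical sketch of the chat layer. `generate` is a placeholder
# for your trained model's sampling function; a real version would
# encode the prompt, sample tokens autoregressively, and decode them.
def generate(prompt: str, max_new_tokens: int = 32) -> str:
    return f"[model output for: {prompt}]"  # echo, for illustration only

def chat_turn(user_message: str, history: list) -> str:
    """One chat turn: call the model and record the exchange."""
    reply = generate(user_message)
    history.append({"user": user_message, "assistant": reply})
    return reply

history = []
chat_turn("Tell me a short story", history)
```

In the actual article, `chat_turn` would sit behind a Streamlit widget callback or a FastAPI route; the history list is what lets the UI render the running conversation.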
💡 Optional Bonus Tutorials
- “Visualizing Attention Heads in Your Own Model”
- “Adding Memory to a Custom SLM”
- “Comparing Your Model to TinyLlama and Phi-3 Mini”
The code for the small language model we're building is available on GitHub.
Follow NanoLanguageModels.com to continue the full “Build a Small Language Model from Scratch” series — next up: What Is a Small Language Model (SLM)? ⚙️