(Article #13 in the Build Your Own Small Language Model series)
One of the most effective — yet most overlooked — ways to improve Small Language Model (SLM) training is to control the order in which your model learns tasks.
This is known as curriculum learning.
Instead of throwing everything at the model at once, you train it on:
- easy examples first
- then medium-difficulty examples
- and only later the complex or abstract tasks
This approach mirrors how humans learn.
And for small models with limited capacity (350M–1B parameters), it dramatically improves:
- stability
- final accuracy
- generalization
- training speed
- resistance to overfitting
Let’s break it down.
1. What Is Curriculum Learning?
Curriculum Learning is a training strategy where:
The model starts with simple tasks and progressively moves to harder tasks.
For an Excel SLM, this might look like:
✔ Stage 1: Basic Functions
SUM, AVERAGE, COUNT, MIN, MAX
✔ Stage 2: Criteria-Based Functions
SUMIF, COUNTIF, SUMIFS, COUNTIFS
✔ Stage 3: Logical Functions
IF, AND, OR, nested logic
✔ Stage 4: Array Functions
FILTER, SORT, UNIQUE
✔ Stage 5: Advanced LAMBDA Logic
SCAN, BYROW, REDUCE, XLOOKUP combinations
By the time your model reaches advanced tasks, it already understands:
- syntax
- patterns
- structure
- dependencies
- token flows
This results in higher-quality specialization with far fewer mistakes.
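Ordering a dataset easy → hard presupposes some difficulty score per sample. A minimal sketch, assuming each sample's target is an Excel formula string — the tier values below mirror the stages above but are illustrative, not a fixed part of the method:

```python
import re

# Hand-set function tiers mirroring the five stages above (illustrative).
FUNCTION_TIER = {
    "SUM": 1, "AVERAGE": 1, "COUNT": 1, "MIN": 1, "MAX": 1,
    "SUMIF": 2, "COUNTIF": 2,
    "IF": 3, "AND": 3, "OR": 3,
    "FILTER": 4, "SORT": 4, "UNIQUE": 4,
    "SCAN": 5, "BYROW": 5, "REDUCE": 5, "XLOOKUP": 5,
}

def difficulty(formula: str) -> int:
    """Higher score = harder sample: max function tier plus nesting depth.
    Used to sort the dataset easy -> hard before training."""
    names = re.findall(r"([A-Z]+)\(", formula)
    tier = max((FUNCTION_TIER.get(n, 1) for n in names), default=1)
    depth = max_depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return tier + max_depth
```

Sorting with `key=difficulty` then gives the easy-to-hard ordering the curriculum needs.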
2. Why Curriculum Learning Works (Especially for SLMs)
Small models don’t have the raw capacity of LLMs.
They cannot handle chaos early in training.
Curriculum Learning works because:
✔ The model builds foundational patterns first
No model should learn FILTER before SUMIF.
✔ Reduces cognitive load
Fewer competing patterns = faster convergence.
✔ Prevents early overfitting
Simple examples teach general structure.
✔ Stabilizes gradients
The model receives clean, predictable training signals.
✔ Improves long-term generalization
The SLM learns concepts, not just memorized examples.
3. The Three Types of Curriculum
There are multiple ways to structure a curriculum for SLM training.
A. Difficulty-Based Curriculum
Start easy → ramp to complex.
Perfect for:
- Excel formulas
- SQL queries
- Python snippets
- Sheets functions
- Structured NLP tasks
B. Structured-to-Unstructured Curriculum
Start with consistent templates → end with varied phrasing.
For example:
Stage 1 (structured):
“Sum values in E where B = ‘North’.”
Stage 2 (semi-structured):
“Add up column E when column B says North.”
Stage 3 (natural):
“I need the total of column E for rows labeled North.”
This helps the SLM generalize to messy human prompts.
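One way to author such data is to keep a single rigid template in stage 1 and add looser phrasings per stage. A sketch — the template sets are made up for illustration:

```python
import random

# Hypothetical phrasing templates: stage 1 is rigid, later stages add
# progressively more natural wordings of the same request.
TEMPLATES = {
    1: ["Sum values in {col} where {key} = '{val}'."],
    2: ["Add up column {col} when column {key} says {val}."],
    3: ["I need the total of column {col} for rows labeled {val}.",
        "What's the {val} total in column {col}?"],
}

def make_prompt(stage: int, col: str = "E", key: str = "B",
                val: str = "North") -> str:
    """Pick a phrasing allowed at this stage and fill in the slots."""
    return random.choice(TEMPLATES[stage]).format(col=col, key=key, val=val)
```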
C. Range-Based Curriculum
Start with small ranges → move to larger ranges.
Example:
Stage 1: A1:A10
Stage 2: A2:A100
Stage 3: A:A
Simple but surprisingly effective for Excel SLMs.
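In code this is just a stage-to-range mapping used when generating samples — a sketch, using the stage boundaries above:

```python
# Stage -> Excel range of growing size, per the stages above.
STAGE_RANGES = {1: "A1:A10", 2: "A2:A100", 3: "A:A"}

def make_sample(stage: int) -> dict:
    """Build one (prompt, target) pair whose range grows with the stage."""
    rng = STAGE_RANGES[stage]
    return {"prompt": f"Sum the values in {rng}.",
            "target": f"=SUM({rng})"}
```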
4. Building a Curriculum for Your Granite-350M Excel SLM
Here is a curriculum structure ideal for 10,000–80,000 training samples.
Stage 1 — Basic Function Patterns (10%)
SUM, AVERAGE, COUNT, MIN, MAX
Teach syntax and general structure.
Stage 2 — Criteria Logic (20%)
SUMIF, COUNTIF, numerical comparisons
Teach conditional reasoning.
Stage 3 — Text & Date Functions (20%)
LEFT, RIGHT, MID, LEN, DATE, YEAR
Teach token-level manipulation.
Stage 4 — Logical Flow (20%)
IF, AND, OR, nested IF
Teach branching logic.
Stage 5 — Advanced Arrays (20%)
FILTER, SORT, UNIQUE, TEXTSPLIT
Teach array formula reasoning.
Stage 6 — LAMBDA / Dynamic Arrays (10%)
BYROW, SCAN, REDUCE, LET
Teach structured algorithmic logic.
This curriculum produces high generalization while maintaining accuracy.
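The mix above can be sketched as a sample-budget allocator. The stage names are mine; the percentages are the ones listed (10/20/20/20/20/10):

```python
# Stage percentages from the curriculum above.
STAGE_MIX = [
    ("basic_functions", 0.10),
    ("criteria_logic",  0.20),
    ("text_and_dates",  0.20),
    ("logical_flow",    0.20),
    ("advanced_arrays", 0.20),
    ("lambda_dynamic",  0.10),
]

def allocate_samples(total: int) -> dict:
    """Split a sample budget across stages; any rounding leftover goes
    to the last stage so the counts always sum to `total`."""
    counts = {name: int(total * frac) for name, frac in STAGE_MIX}
    counts["lambda_dynamic"] += total - sum(counts.values())
    return counts
```

For a 10,000-sample run this yields 1,000 basic samples, 2,000 per middle stage, and 1,000 LAMBDA samples.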
5. Dataset Ordering Matters
If you randomize everything:
- the model learns slower
- gradients are noisier
- early overfitting occurs
- syntax collapses early
The curriculum ordering should be applied first; shuffling then happens only within small chunks.
Correct approach:
sorted_dataset = easy → hard
shuffled_in_small_chunks = True
This preserves curriculum but prevents clustering artifacts.
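A minimal sketch of "sort first, shuffle in small chunks" — the chunk size is a knob to tune; 256 here is arbitrary:

```python
import random

def curriculum_shuffle(sorted_samples: list, chunk_size: int = 256,
                       seed: int = 0) -> list:
    """Shuffle within fixed-size chunks of an easy->hard sorted dataset:
    the global curriculum order survives, but runs of near-identical
    difficulty are broken up to avoid clustering artifacts."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(sorted_samples), chunk_size):
        chunk = sorted_samples[i:i + chunk_size]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```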
6. How Long Should Each Stage Last?
For SLMs under 1B parameters:
- Each stage can be 10–20% of total steps.
- The SLM should pass through the full curriculum 1–3 times.
- Hard stages should be repeated with small variations.
This produces stable specialization without stagnation.
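Put together, the schedule can be sketched as a step-to-stage mapping. Equal-length stages and two full passes are assumptions here, chosen from within the 10–20% and 1–3-pass ranges above:

```python
def stage_at_step(step: int, total_steps: int,
                  n_stages: int = 6, n_passes: int = 2) -> int:
    """Return the 0-indexed curriculum stage active at a training step,
    sweeping the full curriculum `n_passes` times over `total_steps`."""
    steps_per_pass = total_steps // n_passes
    pos = step % steps_per_pass                    # position within this pass
    return min(pos * n_stages // steps_per_pass, n_stages - 1)
```

On the second pass, hard stages come around again — a natural place to inject the small variations mentioned above.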
7. How Curriculum Learning Improves Performance
✔ Faster training
Loss drops earlier and more smoothly.
✔ Fewer syntax errors
Because foundational syntax is trained first.
✔ Better generalization
The model learns logic, not memorized strings.
✔ Stronger performance on edge cases
Because “hard mode” training comes last.
✔ More stable final checkpoints
No late-stage collapse.
SLMs benefit more from curriculum learning than large models do.
Conclusion
Curriculum learning is one of the simplest and most powerful ways to train a Small Language Model well. By ordering your dataset from easy to hard, you create a learning path that:
- stabilizes training
- accelerates convergence
- prevents overfitting
- improves generalization
- produces a much smarter SLM
If you want your SLM to perform like a specialist, curriculum design is non-negotiable.