(Article #13 in the Build Your Own Small Language Model series)
One of the most effective — yet most overlooked — ways to improve Small Language Model (SLM) training is to control the order in which your model learns tasks.
This is known as curriculum learning.
Instead of throwing everything at the model at once, you train it on:
- easy examples first
- then medium-difficulty examples
- and only later the complex or abstract tasks
This approach mirrors how humans learn.
And for small models with limited capacity (350M–1B parameters), it dramatically improves:
- stability
- final accuracy
- generalization
- training speed
- resistance to overfitting
Let’s break it down.
1. What Is Curriculum Learning?
Curriculum Learning is a training strategy where:
The model starts with simple tasks and progressively moves to harder tasks.
For an Excel SLM, this might look like:
✔ Stage 1: Basic Functions
SUM, AVERAGE, COUNT, MIN, MAX
✔ Stage 2: Criteria-Based Functions
SUMIF, COUNTIF, SUMIFS, COUNTIFS
✔ Stage 3: Logical Functions
IF, AND, OR, nested logic
✔ Stage 4: Array Functions
FILTER, SORT, UNIQUE
✔ Stage 5: Advanced LAMBDA Logic
SCAN, BYROW, REDUCE, XLOOKUP combinations
By the time your model reaches advanced tasks, it already understands:
- syntax
- patterns
- structure
- dependencies
- token flows
This results in higher-quality specialization with far fewer mistakes.
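Ordering a dataset easy → hard presupposes some difficulty score per sample. A minimal sketch, assuming each sample's target is an Excel formula string — the tier values below mirror the stages above but are illustrative, not a fixed part of the method:

```python
import re

# Hand-set function tiers mirroring the five stages above (illustrative).
FUNCTION_TIER = {
    "SUM": 1, "AVERAGE": 1, "COUNT": 1, "MIN": 1, "MAX": 1,
    "SUMIF": 2, "COUNTIF": 2,
    "IF": 3, "AND": 3, "OR": 3,
    "FILTER": 4, "SORT": 4, "UNIQUE": 4,
    "SCAN": 5, "BYROW": 5, "REDUCE": 5, "XLOOKUP": 5,
}

def difficulty(formula: str) -> int:
    """Higher score = harder sample: max function tier plus nesting depth.
    Used to sort the dataset easy -> hard before training."""
    names = re.findall(r"([A-Z]+)\(", formula)
    tier = max((FUNCTION_TIER.get(n, 1) for n in names), default=1)
    depth = max_depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return tier + max_depth
```

Sorting with `key=difficulty` then gives the easy-to-hard ordering the curriculum needs.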
2. Why Curriculum Learning Works (Especially for SLMs)
Small models don’t have the raw capacity of LLMs.
They cannot handle chaos early in training.
Curriculum Learning works because:
✔ The model builds foundational patterns first
No model should learn FILTER before SUMIF.
✔ Reduces cognitive load
Fewer competing patterns = faster convergence.
✔ Prevents early overfitting
Simple examples teach general structure.
✔ Stabilizes gradients
The model receives clean, predictable training signals.
✔ Improves long-term generalization
The SLM learns concepts, not just memorized examples.
3. The Three Types of Curriculum
There are multiple ways to structure a curriculum for SLM training.
A. Difficulty-Based Curriculum
Start easy → ramp to complex.
Perfect for:
- Excel formulas
- SQL queries
- Python snippets
- Sheets functions
- Structured NLP tasks
B. Structured-to-Unstructured Curriculum
Start with consistent templates → end with varied phrasing.
For example:
Stage 1 (structured):
“Sum values in E where B = ‘North’.”
Stage 2 (semi-structured):
“Add up column E when column B says North.”
Stage 3 (natural):
“I need the total of column E for rows labeled North.”
This helps the SLM generalize to messy human prompts.
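One way to author such data is to keep a single rigid template in stage 1 and add looser phrasings per stage. A sketch — the template sets are made up for illustration:

```python
import random

# Hypothetical phrasing templates: stage 1 is rigid, later stages add
# progressively more natural wordings of the same request.
TEMPLATES = {
    1: ["Sum values in {col} where {key} = '{val}'."],
    2: ["Add up column {col} when column {key} says {val}."],
    3: ["I need the total of column {col} for rows labeled {val}.",
        "What's the {val} total in column {col}?"],
}

def make_prompt(stage: int, col: str = "E", key: str = "B",
                val: str = "North") -> str:
    """Pick a phrasing allowed at this stage and fill in the slots."""
    return random.choice(TEMPLATES[stage]).format(col=col, key=key, val=val)
```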
C. Range-Based Curriculum
Start with small ranges → move to larger ranges.
Example:
Stage 1: A1:A10
Stage 2: A2:A100
Stage 3: A:A
Simple but surprisingly effective for Excel SLMs.
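In code this is just a stage-to-range mapping used when generating samples — a sketch, using the stage boundaries above:

```python
# Stage -> Excel range of growing size, per the stages above.
STAGE_RANGES = {1: "A1:A10", 2: "A2:A100", 3: "A:A"}

def make_sample(stage: int) -> dict:
    """Build one (prompt, target) pair whose range grows with the stage."""
    rng = STAGE_RANGES[stage]
    return {"prompt": f"Sum the values in {rng}.",
            "target": f"=SUM({rng})"}
```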
4. Building a Curriculum for Your Granite-350M Excel SLM
Here is a curriculum structure ideal for 10,000–80,000 training samples.
Stage 1 — Basic Function Patterns (10%)
SUM, AVERAGE, COUNT, MIN, MAX
Teach syntax and general structure.
Stage 2 — Criteria Logic (20%)
SUMIF, COUNTIF, numerical comparisons
Teach conditional reasoning.
Stage 3 — Text & Date Functions (20%)
LEFT, RIGHT, MID, LEN, DATE, YEAR
Teach token-level manipulation.
Stage 4 — Logical Flow (20%)
IF, AND, OR, nested IF
Teach branching logic.
Stage 5 — Advanced Arrays (20%)
FILTER, SORT, UNIQUE, TEXTSPLIT
Teach array formula reasoning.
Stage 6 — LAMBDA / Dynamic Arrays (10%)
BYROW, SCAN, REDUCE, LET
Teach structured algorithmic logic.
This curriculum produces high generalization while maintaining accuracy.
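The mix above can be sketched as a sample-budget allocator. The stage names are mine; the percentages are the ones listed (10/20/20/20/20/10):

```python
# Stage percentages from the curriculum above.
STAGE_MIX = [
    ("basic_functions", 0.10),
    ("criteria_logic",  0.20),
    ("text_and_dates",  0.20),
    ("logical_flow",    0.20),
    ("advanced_arrays", 0.20),
    ("lambda_dynamic",  0.10),
]

def allocate_samples(total: int) -> dict:
    """Split a sample budget across stages; any rounding leftover goes
    to the last stage so the counts always sum to `total`."""
    counts = {name: int(total * frac) for name, frac in STAGE_MIX}
    counts["lambda_dynamic"] += total - sum(counts.values())
    return counts
```

For a 10,000-sample run this yields 1,000 basic samples, 2,000 per middle stage, and 1,000 LAMBDA samples.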
5. Dataset Ordering Matters
If you randomize everything:
- the model learns slower
- gradients are noisier
- early overfitting occurs
- syntax collapses early
The curriculum ordering should be applied first; shuffling then happens only within small chunks.
Correct approach:
sorted_dataset = easy → hard
shuffled_in_small_chunks = True
This preserves curriculum but prevents clustering artifacts.
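A minimal sketch of "sort first, shuffle in small chunks" — the chunk size is a knob to tune; 256 here is arbitrary:

```python
import random

def curriculum_shuffle(sorted_samples: list, chunk_size: int = 256,
                       seed: int = 0) -> list:
    """Shuffle within fixed-size chunks of an easy->hard sorted dataset:
    the global curriculum order survives, but runs of near-identical
    difficulty are broken up to avoid clustering artifacts."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(sorted_samples), chunk_size):
        chunk = sorted_samples[i:i + chunk_size]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```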
6. How Long Should Each Stage Last?
For SLMs under 1B parameters:
- Each stage can be 10–20% of total steps.
- The SLM should pass through the full curriculum 1–3 times.
- Hard stages should be repeated with small variations.
This produces stable specialization without stagnation.
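Put together, the schedule can be sketched as a step-to-stage mapping. Equal-length stages and two full passes are assumptions here, chosen from within the 10–20% and 1–3-pass ranges above:

```python
def stage_at_step(step: int, total_steps: int,
                  n_stages: int = 6, n_passes: int = 2) -> int:
    """Return the 0-indexed curriculum stage active at a training step,
    sweeping the full curriculum `n_passes` times over `total_steps`."""
    steps_per_pass = total_steps // n_passes
    pos = step % steps_per_pass                    # position within this pass
    return min(pos * n_stages // steps_per_pass, n_stages - 1)
```

On the second pass, hard stages come around again — a natural place to inject the small variations mentioned above.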
7. How Curriculum Learning Improves Performance
✔ Faster training
Loss drops earlier and more smoothly.
✔ Fewer syntax errors
Because foundational syntax is trained first.
✔ Better generalization
The model learns logic, not memorized strings.
✔ Stronger performance on edge cases
Because “hard mode” training comes last.
✔ More stable final checkpoints
No late-stage collapse.
SLMs benefit more from curriculum learning than large models do.
Conclusion
Curriculum learning is one of the simplest and most powerful ways to train a Small Language Model well. By ordering your dataset from easy to hard, you create a learning path that:
- stabilizes training
- accelerates convergence
- prevents overfitting
- improves generalization
- produces a much smarter SLM
If you want your SLM to perform like a specialist, curriculum design is non-negotiable.