Curriculum Learning — Training Your SLM From Easy to Hard

(Article #13 in the Build Your Own Small Language Model series)

One of the most effective — yet most overlooked — ways to improve Small Language Model (SLM) training is to control the order in which your model learns tasks.
This is known as curriculum learning.

Instead of throwing everything at the model at once, you train it:

  1. easy examples first
  2. then medium difficulty
  3. and only later the complex or abstract tasks

This approach mirrors how humans learn.
And for small models with limited capacity (350M–1B parameters), it dramatically improves:

  • stability
  • final accuracy
  • generalization
  • training speed
  • resistance to overfitting

Let’s break it down.

1. What Is Curriculum Learning?

Curriculum Learning is a training strategy where:

The model starts with simple tasks and progressively moves to harder tasks.

For an Excel SLM, this might look like:

✔ Stage 1: Basic Functions

SUM, AVERAGE, COUNT, MIN, MAX

✔ Stage 2: Criteria-Based Functions

SUMIF, COUNTIF, LEFT, RIGHT

✔ Stage 3: Logical Functions

IF, AND, OR, nested logic

✔ Stage 4: Array Functions

FILTER, SORT, UNIQUE

✔ Stage 5: Advanced LAMBDA Logic

SCAN, BYROW, REDUCE, XLOOKUP combinations

By the time your model reaches advanced tasks, it already understands:

  • syntax
  • patterns
  • structure
  • dependencies
  • token flows

This results in higher-quality specialization with far fewer mistakes.
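One way to make this staging concrete is to tag each training formula with a stage index. The stage table and `stage_of` helper below are an illustration (the names are mine, not from any library); the function groupings follow the five stages above, and a nested formula is assigned to the latest stage it touches:

```python
import re

# Hypothetical stage table mirroring the five stages above.
CURRICULUM = [
    ("basic",    {"SUM", "AVERAGE", "COUNT", "MIN", "MAX"}),
    ("criteria", {"SUMIF", "COUNTIF", "LEFT", "RIGHT"}),
    ("logical",  {"IF", "AND", "OR"}),
    ("arrays",   {"FILTER", "SORT", "UNIQUE"}),
    ("lambda",   {"SCAN", "BYROW", "REDUCE", "XLOOKUP"}),
]

def stage_of(formula: str) -> int:
    """Return the index of the latest curriculum stage whose
    functions appear in the formula (0 if none match)."""
    # A function name is a run of letters immediately followed by "(".
    tokens = set(re.findall(r"[A-Z]+(?=\()", formula.upper()))
    stage = 0
    for i, (_, funcs) in enumerate(CURRICULUM):
        if tokens & funcs:
            stage = i
    return stage
```

For example, `=IF(SUM(A1:A10)>0,1,0)` lands in the logical stage, because `IF` outranks `SUM` in the curriculum.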

2. Why Curriculum Learning Works (Especially for SLMs)

Small models don’t have the raw capacity of LLMs.
They cannot handle chaos early in training.

Curriculum Learning works because:

✔ The model builds foundational patterns first

No model should learn FILTER before SUMIF.

✔ Reduces cognitive load

Fewer competing patterns = faster convergence.

✔ Prevents early overfitting

Simple examples teach general structure.

✔ Stabilizes gradients

The model receives clean, predictable training signals.

✔ Improves long-term generalization

The SLM learns concepts, not just memorized examples.

3. The Three Types of Curriculum

There are multiple ways to structure a curriculum for SLM training.

A. Difficulty-Based Curriculum

Start easy → ramp to complex.

Perfect for:

  • Excel formulas
  • SQL queries
  • Python snippets
  • Sheets functions
  • Structured NLP tasks
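For Excel formulas specifically, "difficulty" can be approximated mechanically. The heuristic below is my own illustration, not a standard metric: it scores a formula by its function-call count plus its maximum nesting depth, which is usually enough to sort a dataset from easy to hard.

```python
import re

def formula_difficulty(formula: str) -> int:
    """Crude difficulty heuristic: number of function calls
    plus maximum parenthesis nesting depth."""
    calls = len(re.findall(r"[A-Z]+\(", formula.upper()))
    depth = best = 0
    for ch in formula:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return calls + best
```

Under this scorer, `=SUM(A1:A10)` scores 2 while the nested `=IF(SUM(A1:A10)>0,1,0)` scores 4, so nested logic naturally sorts later.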

B. Structured-to-Unstructured Curriculum

Start with consistent templates → end with varied phrasing.

For example:

Stage 1 (structured):
“Sum values in E where B = ‘North’.”

Stage 2 (semi-structured):
“Add up column E when column B says North.”

Stage 3 (natural):
“I need the total of column E for rows labeled North.”

This helps the SLM generalize to messy human prompts.

C. Range-Based Curriculum

Start with small ranges → move to larger ranges.

Example:

Stage 1: A1:A10
Stage 2: A2:A100
Stage 3: A:A

Simple but surprisingly effective for Excel SLMs.

4. Building a Curriculum for Your Granite-350M Excel SLM

Here is a curriculum structure ideal for 10,000–80,000 training samples.

Stage 1 — Basic Function Patterns (10%)

SUM, AVERAGE, COUNT, MIN, MAX
Teach syntax and general structure.

Stage 2 — Criteria Logic (20%)

SUMIF, COUNTIF, numerical comparisons
Teach conditional reasoning.

Stage 3 — Text & Date Functions (20%)

LEFT, RIGHT, MID, LEN, DATE, YEAR
Teach token-level manipulation.

Stage 4 — Logical Flow (20%)

IF, AND, OR, nested IF
Teach branching logic.

Stage 5 — Advanced Arrays (20%)

FILTER, SORT, UNIQUE, TEXTSPLIT
Teach array formula reasoning.

Stage 6 — LAMBDA / Dynamic Arrays (10%)

BYROW, SCAN, REDUCE, LET
Teach structured algorithmic logic.

This curriculum produces high generalization while maintaining accuracy.
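Assuming the 10/20/20/20/20/10 split above, a helper like this (the name `STAGE_SHARES` is mine) turns a total sample count into per-stage counts, handing rounding leftovers to the earliest stages so the totals match exactly:

```python
# Stage shares from the six-stage curriculum above.
STAGE_SHARES = [0.10, 0.20, 0.20, 0.20, 0.20, 0.10]

def stage_counts(n: int) -> list[int]:
    """Round each stage's share down, then give leftover samples
    to the earliest stages so counts sum exactly to n."""
    counts = [int(n * s) for s in STAGE_SHARES]
    for i in range(n - sum(counts)):
        counts[i % len(counts)] += 1
    return counts
```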

5. Dataset Ordering Matters

If you randomize everything:

  • the model learns more slowly
  • gradients are noisier
  • overfitting sets in earlier
  • generated formula syntax degrades early

The fix is to apply curriculum ordering before shuffling, not instead of it.

Correct approach:

  1. Sort the dataset from easy to hard.
  2. Shuffle only within small chunks.

This preserves the curriculum while avoiding clustering artifacts (long runs of near-identical examples in a single batch).
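The sort-then-shuffle-in-chunks recipe can be sketched as follows. The `chunk_size` default is an assumption (a few dozen to a few hundred samples is a reasonable starting point), and `difficulty` is any scoring function you supply:

```python
import random

def curriculum_order(samples, difficulty, chunk_size=256, seed=0):
    """Sort samples easy-to-hard, then shuffle only within
    fixed-size chunks: local order is randomized, but the
    global easy-to-hard progression is preserved."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)
    out = []
    for i in range(0, len(ordered), chunk_size):
        chunk = ordered[i:i + chunk_size]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out
```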

6. How Long Should Each Stage Last?

For SLMs under 1B parameters:

  • Each stage can be 10–20% of total steps.
  • The SLM should pass through the full curriculum 1–3 times.
  • Hard stages should be repeated with small variations.

This produces stable specialization without stagnation.
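Under these guidelines, the per-stage step budget follows from the total step count, the stage shares, and the number of curriculum passes. This sketch (names are mine) emits a flat (pass, stage, steps) schedule:

```python
def stage_schedule(total_steps: int, shares, passes: int = 2):
    """Divide total_steps across `passes` sweeps of the curriculum;
    within each sweep, split steps by the per-stage shares.
    Returns a list of (pass, stage, steps) tuples."""
    per_pass = total_steps // passes
    schedule = []
    for p in range(passes):
        for stage, share in enumerate(shares):
            schedule.append((p, stage, int(per_pass * share)))
    return schedule
```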

7. How Curriculum Learning Improves Performance

✔ Faster training

Loss drops earlier and more smoothly.

✔ Fewer syntax errors

Because foundational syntax is trained first.

✔ Better generalization

The model learns logic, not memorized strings.

✔ Stronger performance on edge cases

Because “hard mode” training comes last.

✔ More stable final checkpoints

No late-stage collapse.

SLMs benefit more from curriculum learning than large models do: with less capacity, the order in which patterns arrive matters more.

Conclusion

Curriculum learning is one of the simplest and most powerful ways to train a Small Language Model well. By ordering your dataset from easy to hard, you create a learning path that:

  • stabilizes training
  • accelerates convergence
  • prevents overfitting
  • improves generalization
  • produces a much smarter SLM

If you want your SLM to perform like a specialist, curriculum design is non-negotiable.

Get early access to the fastest way to turn plain language into Excel formulas—sign up for the waitlist.
