Overfitting vs Underfitting — Finding the Sweet Spot in SLM Training

(Article #9 in the Build Your Own Small Language Model series)

Training a Small Language Model (SLM) is a balancing act. Train too little and the model remains weak. Train too much and the model becomes brittle, overly specific, or unable to generalize. These two failure modes are known as:

  • underfitting
  • overfitting

Finding the balance between them is one of the defining skills in SLM engineering. This article explains how both happen, how to detect them, and how to avoid them when training models like Granite-350M, Phi-2, or TinyLlama on domain-specific tasks.

1. What Is Underfitting?

Underfitting means the model hasn’t learned enough from your data.
It hasn’t captured the patterns, structure, or logic required to perform your task reliably.

Signs of Underfitting

  • Loss stays high and barely improves
  • Outputs are inconsistent or incomplete
  • Syntax errors occur frequently
  • Model behaves like the base model, not your specialized version
  • Training accuracy is low
  • Evaluation accuracy is also low

Causes of Underfitting

  • Too little training time
  • Too small dataset
  • Dataset too diverse
  • Learning rate too low
  • Weak training signals (messy formatting, noisy examples)
  • Poor prompt structure

Example (Excel SLM)

Model still outputs:

=SUM(A:A)

when asked for:

“Sum values in E where B="North"”

This shows it hasn’t internalized task-specific patterns yet.

2. What Is Overfitting?

Overfitting means the model memorized training data too closely and cannot generalize.

Instead of learning rules, it learns:

  • exact examples
  • exact phrase patterns
  • exact column letters
  • your dataset’s biases

Signs of Overfitting

  • Training loss is very low
  • Evaluation loss is much higher
  • Model repeats training patterns verbatim
  • Small perturbations in input cause large errors
  • Model fails on edge cases
  • Formulas look “memorized” rather than adapted

Example (Excel SLM)

You trained using A:A, B:B, C:C extensively.

Now the model outputs:

=SUMIF(B:B,"North",E:E)

even when asked for different column combinations.

3. Why Small Models Are More Sensitive

Small Language Models (≤1B parameters) have:

  • limited capacity
  • smaller embedding layers
  • simpler attention
  • restricted context windows

This means they:

  • underfit quickly (if the task is too complex or diverse for their capacity)
  • overfit quickly (if the dataset is too small or repetitive)
  • reach their “optimal capacity” quickly

This is good news — your training cycles are shorter, faster, and easier to tune.

4. How to Detect Underfitting and Overfitting

Track training loss vs validation loss

Condition | Training Loss | Validation Loss
--- | --- | ---
Underfitting | high | high
Overfitting | low | high
Good training | low-ish | low-ish

If validation loss is higher than training loss and diverging, you’re overfitting.
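The table above can be turned into a simple automated check. This is a minimal sketch; the gap threshold, the "still high" cutoff, and the patience count are illustrative values, not canonical ones — tune them to your own loss scale.

```python
# Sketch: classify a training run from its loss histories.
# Thresholds (2.0 "still high", 0.15 gap, patience of 3) are illustrative.

def diagnose(train_losses, val_losses, gap_threshold=0.15, patience=3):
    """Return 'underfitting', 'overfitting', or 'balanced'."""
    if train_losses[-1] > 2.0 and val_losses[-1] > 2.0:
        # both losses still high and barely moving -> model hasn't learned enough
        return "underfitting"
    # count consecutive checkpoints where validation loss rose
    rising = 0
    for prev, curr in zip(val_losses, val_losses[1:]):
        rising = rising + 1 if curr > prev else 0
    gap = val_losses[-1] - train_losses[-1]
    if gap > gap_threshold and rising >= patience:
        # low training loss, climbing validation loss -> memorization
        return "overfitting"
    return "balanced"
```

Feed it one loss value per evaluation checkpoint; a diverging gap plus several consecutive validation-loss increases is the classic overfitting signature.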

Use a gold benchmark dataset

A small set of 50–200 high-quality examples (never used in training) lets you check generalization without ambiguity.
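Carving out that gold set should happen once, before any training, with a fixed seed so the split is reproducible. A minimal sketch:

```python
# Sketch: hold out a fixed "gold" benchmark before training begins.
# The seeded shuffle keeps the split identical across runs.
import random

def split_gold_benchmark(examples, gold_size=100, seed=42):
    """Return (train_set, gold_set); gold_set must never enter training."""
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    return pool[gold_size:], pool[:gold_size]
```

Evaluate on the gold set at every checkpoint; because the model has never seen it, any score drop there is a generalization problem, not noise.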

Inspect qualitative outputs

Overfit models:

  • repeat exact instructions
  • reuse specific training formats
  • output the same columns repeatedly

Underfit models:

  • output random formulas
  • skip logic
  • produce syntax errors

5. How to Prevent Underfitting

✔ Use enough training samples

For a domain like Excel:

  • 5,000 samples → minimum
  • 10,000–40,000 → ideal
  • 80,000+ → excellent for multi-task Excel SLMs

✔ Increase training steps

Most SLMs need:

  • 1,000–8,000 steps for small datasets
  • 10,000–40,000 steps for large datasets

✔ Increase model capacity a bit

A 700M or 1.3B model underfits less on messy datasets.

✔ Improve dataset consistency

Consistent formatting gives the model a cleaner, stronger training signal to learn from.

6. How to Prevent Overfitting

✔ Add variation

Don’t use the same column letters or values repeatedly.

✔ Use synthetic diversity

Even tiny changes prevent memorization:

  • random numbers
  • random labels (“North”, “West”, “Online”, “Active”)
  • different ranges
  • different delimiters

✔ Increase data size

More samples = better generalization.

✔ Use early stopping

If validation loss increases for 3–5 checkpoints → stop.
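That rule of thumb is exactly patience-based early stopping. A minimal sketch, assuming you call it once per evaluation checkpoint:

```python
# Sketch: patience-based early stopping on validation loss.
# patience=3 mirrors the "3-5 checkpoints" rule of thumb above.

class EarlyStopper:
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_checkpoints = 0

    def should_stop(self, val_loss):
        """Call once per evaluation; returns True when training should halt."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_checkpoints = 0
        else:
            self.bad_checkpoints += 1
        return self.bad_checkpoints >= self.patience
```

In practice you would also save a checkpoint each time `best` improves, so you can roll back to the model from just before validation loss started climbing.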

✔ Lower the learning rate

High LR can push the model into memorization.

7. How to Strike the Perfect Balance

Your SLM is properly trained when:

  • training loss is low
  • evaluation loss is also low
  • benchmark results are consistent
  • outputs vary appropriately
  • outputs generalize to unseen cases
  • no memorized patterns appear

This is the sweet spot — the point where the SLM understands the rules, not just the examples.

Recommended Settings for Granite-350M Excel SLM Training

✔ Dataset size

10k–80k examples

✔ Effective batch size

32–128 sequences (via gradient accumulation)

✔ Learning rate

1e-4 or 2e-4

✔ Sequence length

128–256 tokens

✔ Evaluation every

200–500 training steps

✔ Stop training when

Validation loss plateaus or increases
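The settings above can be collected into a single config. This is a sketch, not a definitive recipe: the key names follow common HuggingFace `TrainingArguments` conventions but are illustrative here, and the specific batch/accumulation split is one assumed way to reach the recommended effective batch size.

```python
# Sketch: recommended Granite-350M Excel SLM settings as a config dict.
# Key names loosely follow HuggingFace TrainingArguments conventions;
# adapt them to whatever training framework you use.

GRANITE_350M_EXCEL_CONFIG = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 8,   # 8 * 8 = effective batch of 64 sequences
    "learning_rate": 2e-4,              # 1e-4 to 2e-4 range
    "max_seq_length": 256,              # 128-256 tokens
    "eval_steps": 250,                  # evaluate every 200-500 steps
    "load_best_model_at_end": True,     # keep the checkpoint before val loss rose
}

effective_batch = (GRANITE_350M_EXCEL_CONFIG["per_device_train_batch_size"]
                   * GRANITE_350M_EXCEL_CONFIG["gradient_accumulation_steps"])
```

Gradient accumulation lets a small GPU reach the 32–128 effective batch range without holding all sequences in memory at once.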

Conclusion

Overfitting and underfitting are two sides of the same challenge:
making sure your SLM learns just enough, but not too much.

If you monitor loss curves, maintain clean datasets, introduce variation, and evaluate regularly, you can train small models that:

  • generalize well
  • stay stable
  • avoid hallucination
  • and deliver accurate domain-specific behavior

Small models are powerful — but only when trained in balance.
