(Article #9 in the Build Your Own Small Language Model series)
Training a Small Language Model (SLM) is a balancing act. Train too little and the model remains weak. Train too much and the model becomes brittle, overly specific, or unable to generalize. These two failure modes are known as:
- underfitting
- overfitting
Finding the balance between them is one of the defining skills in SLM engineering. This article explains how both happen, how to detect them, and how to avoid them when training models like Granite-350M, Phi-2, or TinyLlama on domain-specific tasks.
1. What Is Underfitting?
Underfitting means the model hasn’t learned enough from your data.
It hasn’t captured the patterns, structure, or logic required to perform your task reliably.
Signs of Underfitting
- Loss stays high and barely improves
- Outputs are inconsistent or incomplete
- Syntax errors occur frequently
- Model behaves like the base model, not your specialized version
- Training accuracy is low
- Evaluation accuracy is also low
Causes of Underfitting
- Too little training time
- Dataset too small
- Dataset too diverse
- Learning rate too low
- Weak training signals (messy formatting, noisy examples)
- Poor prompt structure
Example (Excel SLM)
Model still outputs:
=SUM(A:A)
when asked for:
“Sum values in E where B="North"”
This shows it hasn’t internalized task-specific patterns yet.
2. What Is Overfitting?
Overfitting means the model memorized training data too closely and cannot generalize.
Instead of learning rules, it learns:
- exact examples
- exact phrase patterns
- exact column letters
- your dataset’s biases
Signs of Overfitting
- Training loss is very low
- Evaluation loss is much higher
- Model repeats training patterns verbatim
- Small perturbations in input cause large errors
- Model fails on edge cases
- Formulas look “memorized” rather than adapted
Example (Excel SLM)
You trained using A:A, B:B, C:C extensively.
Now the model outputs:
=SUMIF(B:B,"North",E:E)
even when asked for different column combinations.
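One quick way to surface this failure is a perturbation probe: ask for the same operation with different column letters and check whether the outputs actually change. The sketch below is illustrative; `generate` stands in for any prompt-to-formula function (your model behind an inference call), and the stub model is hypothetical:

```python
def perturbation_check(generate, base_prompt, variants):
    """Overfitting probe: the same request with different column
    letters should yield different formulas. Returns False when
    every output is identical, which is suspicious."""
    outputs = {p: generate(p) for p in [base_prompt, *variants]}
    return len(set(outputs.values())) > 1

# Stub standing in for an overfit model that ignores the prompt:
def overfit_model(prompt):
    return '=SUMIF(B:B,"North",E:E)'   # same formula no matter the prompt

print(perturbation_check(
    overfit_model,
    'Sum E where B="North"',
    ['Sum F where C="North"', 'Sum G where D="West"'],
))  # False -> outputs identical despite varied prompts
```

A healthy model passes this probe because changing the requested columns changes the generated formula.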
3. Why Small Models Are More Sensitive
Small Language Models (≤1B parameters) have:
- limited capacity
- smaller embedding layers
- fewer attention heads and layers
- restricted context windows
This means they:
- underfit fast (if dataset too small)
- overfit fast (if dataset too repetitive)
- reach their “optimal capacity” quickly
This is good news: your training cycles are shorter, faster to iterate on, and easier to tune.
4. How to Detect Underfitting and Overfitting
✔ Track training loss vs validation loss
| Condition | Training Loss | Validation Loss |
|---|---|---|
| Underfitting | high | high |
| Overfitting | low | high |
| Good Training | low | low (close to training) |
A small gap between the two is normal; if validation loss keeps rising while training loss keeps falling, you’re overfitting.
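The table above can be turned into a simple diagnostic you run at each checkpoint. This is a sketch: the `high` and `gap` thresholds are illustrative, since absolute loss values depend on your tokenizer and data.

```python
def diagnose(train_loss: float, val_loss: float,
             high: float = 2.0, gap: float = 0.5) -> str:
    """Rough fit diagnosis from the latest train/validation losses.

    `high` and `gap` are illustrative thresholds -- tune them for
    your task; absolute loss values are not comparable across setups.
    """
    if train_loss >= high and val_loss >= high:
        return "underfitting"        # both losses stuck high
    if val_loss - train_loss >= gap:
        return "overfitting"         # validation diverging from training
    return "good"                    # both low and close together

print(diagnose(2.8, 2.9))  # underfitting
print(diagnose(0.4, 1.6))  # overfitting
print(diagnose(0.9, 1.0))  # good
```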
✔ Use a gold benchmark dataset
A small set of 50–200 high-quality examples that are never used in training lets you check generalization without ambiguity.
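The important detail is that the gold set is carved out once, reproducibly, and never touches training. A minimal sketch:

```python
import random

def split_gold_benchmark(examples, gold_size=100, seed=42):
    """Carve a fixed, held-out gold set from a list of examples.

    The fixed seed keeps the split reproducible, so the gold set
    never leaks into training across runs.
    """
    rng = random.Random(seed)
    shuffled = examples[:]          # copy; leave the original untouched
    rng.shuffle(shuffled)
    gold = shuffled[:gold_size]     # never trained on
    train = shuffled[gold_size:]
    return train, gold

data = [f"example-{i}" for i in range(1000)]
train, gold = split_gold_benchmark(data, gold_size=100)
assert len(gold) == 100 and len(train) == 900
assert not set(gold) & set(train)   # no overlap between splits
```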
✔ Inspect qualitative outputs
Overfit models:
- repeat exact instructions
- reuse specific training formats
- output the same columns repeatedly
Underfit models:
- output random formulas
- skip logic
- produce syntax errors
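The overfitting signs above can be quantified with two cheap metrics over a batch of generations: how many outputs are copied verbatim from training targets, and how diverse the outputs are. A sketch (thresholds for "too high" are up to you):

```python
def inspect_outputs(outputs, training_outputs):
    """Flag signs of memorization in a batch of model outputs.

    `outputs` are generations for *varied* prompts; if many of them
    match training targets verbatim, or collapse to one formula,
    the model is likely overfit.
    """
    train_set = set(training_outputs)
    verbatim = sum(1 for o in outputs if o in train_set)
    distinct = len(set(outputs))
    return {
        "verbatim_rate": verbatim / len(outputs),  # share copied from training
        "distinct_rate": distinct / len(outputs),  # output diversity
    }

report = inspect_outputs(
    ['=SUMIF(B:B,"North",E:E)'] * 4,               # same formula every time
    ['=SUMIF(B:B,"North",E:E)', '=SUM(A:A)'],
)
print(report)  # verbatim_rate 1.0, distinct_rate 0.25 -> overfitting signature
```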
5. How to Prevent Underfitting
✔ Use enough training samples
For a domain like Excel:
- 5,000 samples → minimum
- 10,000–40,000 → ideal
- 80,000+ → excellent for multi-task Excel SLMs
✔ Increase training steps
Most SLMs need:
- 1,000–8,000 steps for small datasets
- 10,000–40,000 steps for large datasets
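Step counts follow directly from dataset size, effective batch size, and epoch count, so you can sanity-check a planned run against these ranges before launching it. A small helper (the example numbers are illustrative):

```python
import math

def training_steps(n_samples: int, effective_batch: int, epochs: int) -> int:
    """Total optimizer steps for a run: steps per epoch * epochs."""
    return math.ceil(n_samples / effective_batch) * epochs

# 10k-sample dataset, effective batch of 32, 4 epochs:
print(training_steps(10_000, 32, 4))  # 1252 -> inside the 1,000-8,000 range
```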
✔ Increase model capacity a bit
A 700M or 1.3B model underfits less on messy datasets.
✔ Improve dataset consistency
Stable formatting reduces cognitive load on the model.
6. How to Prevent Overfitting
✔ Add variation
Don’t use the same column letters or values repeatedly.
✔ Use synthetic diversity
Even tiny changes prevent memorization:
- random numbers
- random labels (“North”, “West”, “Online”, “Active”)
- different ranges
- different delimiters
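A small generator can inject this variation automatically. The sketch below randomizes columns and labels for SUMIF training pairs; the `COLUMNS` and `LABELS` pools are illustrative and should be widened for a real dataset:

```python
import random

COLUMNS = list("ABCDEFGH")                       # illustrative pool
LABELS = ["North", "South", "West", "Online", "Active"]

def make_sumif_sample(rng: random.Random) -> dict:
    """One randomized SUMIF training pair: varied columns and labels
    so the model learns the rule, not one fixed formula."""
    crit_col, sum_col = rng.sample(COLUMNS, 2)   # two distinct columns
    label = rng.choice(LABELS)
    return {
        "prompt": f'Sum values in {sum_col} where {crit_col}="{label}"',
        "target": f'=SUMIF({crit_col}:{crit_col},"{label}",{sum_col}:{sum_col})',
    }

rng = random.Random(0)
for sample in (make_sumif_sample(rng) for _ in range(5)):
    print(sample["target"])
```

The same pattern extends to values, ranges, and delimiters: every axis you randomize is one less surface pattern the model can memorize.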
✔ Increase data size
More samples = better generalization.
✔ Use early stopping
If validation loss increases for 3–5 checkpoints → stop.
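That rule is easy to implement as a checkpoint-time check. A minimal sketch, assuming you record validation loss at every checkpoint:

```python
def should_stop(val_losses, patience=4):
    """Early-stopping check: True once validation loss has risen
    for `patience` consecutive checkpoints (3-5 is typical)."""
    rises = 0
    for prev, curr in zip(val_losses, val_losses[1:]):
        rises = rises + 1 if curr > prev else 0
        if rises >= patience:
            return True
    return False

print(should_stop([1.2, 1.0, 0.9, 0.92, 0.95, 0.99, 1.05], patience=4))  # True
print(should_stop([1.2, 1.0, 0.9, 0.91, 0.89], patience=4))              # False
```

If you train with the Hugging Face Trainer, its built-in `EarlyStoppingCallback` implements the same idea via an `early_stopping_patience` argument.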
✔ Lower the learning rate
An overly high learning rate can push the model to latch onto specific training patterns instead of general rules.
7. How to Strike the Perfect Balance
Your SLM is properly trained when:
- training loss is low
- evaluation loss is also low
- benchmark results are consistent
- outputs vary appropriately
- outputs generalize to unseen cases
- no memorized patterns appear
This is the sweet spot — the point where the SLM understands the rules, not just the examples.
Recommended Settings for Granite-350M Excel SLM Training
✔ Dataset size
10k–80k examples
✔ Effective batch size
32–128 sequences (via gradient accumulation)
✔ Learning rate
1e-4 or 2e-4
✔ Sequence length
128–256 tokens
✔ Evaluation every
200–500 training steps
✔ Stop training when
Validation loss plateaus or increases
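Collected as a single config, the settings above might look like this. The key names are illustrative; map them onto your trainer's arguments (e.g. Hugging Face `TrainingArguments`) as appropriate:

```python
# Recommended Granite-350M Excel SLM settings as a config dict.
# Key names are illustrative, not a specific trainer's API.
config = {
    "dataset_size": (10_000, 80_000),   # examples
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 8,   # 8 * 8 = 64 effective sequences
    "effective_batch_size": 64,         # within the 32-128 range
    "learning_rate": 2e-4,              # 1e-4 or 2e-4
    "max_seq_length": 256,              # tokens
    "eval_steps": 250,                  # evaluate every 200-500 steps
    "stop_when": "validation loss plateaus or increases",
}

# Sanity check: accumulation actually yields the target batch size.
assert (config["per_device_batch_size"]
        * config["gradient_accumulation_steps"]
        == config["effective_batch_size"])
```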
Conclusion
Overfitting and underfitting are two sides of the same challenge: making sure your SLM learns just enough, but not too much.
If you monitor loss curves, maintain clean datasets, introduce variation, and evaluate regularly, you can train small models that:
- generalize well
- stay stable
- avoid hallucination
- and deliver accurate domain-specific behavior
Small models are powerful — but only when trained in balance.