Big models get all the hype — but small models are becoming the real surprise.

When I started 6SigmaMind, I wanted to know:

How far can a tiny 1.7B model go when solving real Excel tasks?

To find out, I ran it through three challenges:

SUMIFS (conditional math)
XLOOKUP (modern lookups)
T.TEST (statistics)

Then I asked readers like you to break it.

The results?
Surprisingly good — and occasionally hilarious.

👉 Try the model yourself: https://huggingface.co/spaces/benkemp/6SigmaMindv2

Let’s walk through the battle test.

🥊 Round 1 — SUMIFS: Everyday Excel Powerhouse

SUMIFS is one of the most-used formulas on the planet.

Test Prompt:
“Sum all values in C where B equals ‘Closed’.”

Expected Output:

=SUMIFS(C:C, B:B, "Closed")

6SigmaMind’s performance:
⭐⭐⭐⭐☆ (4/5)

Almost always gets it right
Clean formatting
Works across most phrasings
Rarely mixes up the argument order, but it does happen

Try these in the demo:

“Sum D2:D200 where A2:A200 = ‘Active’.”
“Add all values in column F only if column C is ‘Approved’.”

This is the model’s comfort zone.
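To make the argument order concrete, here is a minimal Python sketch of what a single-condition SUMIFS computes. The function name `sumifs` and the sample rows are made up for illustration. It is not how Excel or 6SigmaMind works internally, only a mirror of the formula's shape: sum range first, then the criteria range, then the criterion.

```python
def sumifs(sum_range, criteria_range, criterion):
    """Sum the values whose matching criteria cell equals the criterion.

    Mirrors =SUMIFS(sum_range, criteria_range, criterion) for one exact-match
    condition. Excel compares text case-insensitively, so this does the same.
    """
    if len(sum_range) != len(criteria_range):
        raise ValueError("ranges must be the same size")
    wanted = str(criterion).lower()
    return sum(
        value
        for value, cell in zip(sum_range, criteria_range)
        if str(cell).lower() == wanted
    )


# Hypothetical sheet: column B holds a status, column C holds an amount.
column_b = ["Closed", "Open", "closed", "Pending"]
column_c = [100, 250, 40, 75]

total = sumifs(column_c, column_b, "Closed")    # like =SUMIFS(C:C, B:B, "Closed")
swapped = sumifs(column_b, column_c, "Closed")  # ranges swapped: nothing matches
print(total, swapped)
```

In this sketch the swapped call does not raise an error. It quietly sums nothing, which is why an argument-order slip is easy to miss when you only glance at the output.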

🥊 Round 2 — XLOOKUP: The Modern Lookup Test

XLOOKUP is where small models often fail — but 6SigmaMind holds its own.

Test Prompt:
“Return the price in D for the SKU in A that matches H2.”

Expected Output:

=XLOOKUP(H2, A:A, D:D)

6SigmaMind’s performance:
⭐⭐⭐☆☆ (3/5)

Often correct
Sometimes swaps lookup + return arrays
Occasionally reverts to VLOOKUP (funny, but it still works here because the return column D sits to the right of the lookup column A)

Try these variants:

“Lookup the value in column C where A = G5.”
“Give me an XLOOKUP formula for ID in A and name in B.”

Evaluating a small model on lookups is eye-opening: it shows whether the model actually tracks which range is searched and which range is returned.
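The lookup-versus-return distinction can be sketched in a few lines of Python. The `xlookup` function and the sample SKUs below are hypothetical stand-ins, and the sketch covers only the exact-match case with XLOOKUP's three required arguments (it also compares text case-sensitively, which Excel does not).

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    """Return the item in return_array at the first exact match in lookup_array.

    Mirrors the three required arguments of
    =XLOOKUP(lookup_value, lookup_array, return_array).
    """
    if len(lookup_array) != len(return_array):
        raise ValueError("arrays must be the same size")
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:
            return result
    return if_not_found


# Hypothetical sheet: SKUs in column A, prices in column D, the wanted SKU in H2.
column_a = ["SKU-1", "SKU-2", "SKU-3"]
column_d = [9.99, 14.50, 3.25]
h2 = "SKU-2"

price = xlookup(h2, column_a, column_d)    # like =XLOOKUP(H2, A:A, D:D)
swapped = xlookup(h2, column_d, column_a)  # arrays swapped: the SKU is searched for among prices
print(price, swapped)
```

With the arrays swapped, the SKU is never found among the prices, so the call falls through to the not-found value. That is the same failure you would see in the sheet when the model swaps the two arrays.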

🥊 Round 3 — T.TEST: Entering the Statistics Arena

Here’s where things get… unpredictable.

Test Prompt:
“Do a two-tailed t-test comparing C2:C50 with D2:D50.”

Expected Output:

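=T.TEST(C2:C50, D2:D50, 2, 2)

The third argument (tails = 2) requests a two-tailed test. The fourth (type = 2) assumes two samples with equal variance. The prompt does not name a type, so 3 (unequal variance) or 1 (paired) would also be defensible answers.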

6SigmaMind’s performance:
⭐⭐☆☆☆ (2/5)

It knows T.TEST exists
It understands two ranges
It sometimes confuses arguments
It sometimes produces a more complex formula than needed
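For readers who want to see what sits behind a two-sample, equal-variance test (type = 2 in T.TEST), here is a rough Python sketch of the pooled t statistic. Note the assumptions: `pooled_t_statistic` is a made-up helper, the data is invented, and the sketch stops at the t statistic and degrees of freedom. T.TEST itself returns a p-value, which needs the Student's t distribution, and Python's standard library does not provide that.

```python
import math
import statistics


def pooled_t_statistic(sample_1, sample_2):
    """Return (t, degrees_of_freedom) for a two-sample equal-variance t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / standard_error, df


# Made-up stand-ins for C2:C50 and D2:D50.
column_c = [1, 2, 3, 4, 5]
column_d = [2, 3, 4, 5, 6]

t_stat, df = pooled_t_statistic(column_c, column_d)
print(t_stat, df)
```

The two ranges are interchangeable up to the sign of t, but the trailing tails and type arguments are not, which is one plausible reason this round is harder than the first two.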

This is where future fine-tuning will shine — especially when we introduce the Excel for Statistics Dataset.

Try pushing the model with:

“Calculate the correlation between A and B.”
“Give me the variance for C2:C100.”
“Return the standard deviation for D2:D80.”

These tasks help map out the model’s statistical “baseline.”
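If you want reference numbers to check the model's formulas against, a small Python sketch like the one below can help. It assumes the intended Excel answers are CORREL, VAR.S, and STDEV.S (the sample-based versions), and the data is made up. `pearson_correlation` is a hand-written helper, not part of any library.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up data standing in for worksheet columns.
column_a = [1, 2, 3, 4]
column_b = [2, 4, 6, 8]

correlation = pearson_correlation(column_a, column_b)  # compare with =CORREL(...)
variance = statistics.variance(column_a)               # compare with =VAR.S(...)
std_dev = statistics.stdev(column_a)                   # compare with =STDEV.S(...)
print(correlation, variance, std_dev)
```

Paste the same numbers into a sheet, run the formula the model gives you, and the two results should agree to within rounding.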

📊 What This Battle Test Really Shows

This comparison reveals the nature of tiny models:

✔ They can learn structured logic

Even SUMIFS feels natural to them.

✔ They understand modern Excel functions

XLOOKUP support is impressive for their size.

✔ They have pockets of statistical knowledge

T.TEST, CORREL, STDEV.S — these aren’t trivial.

✔ Their biggest weakness is precise argument ordering

Not surprising for a 1.7B model.

✔ But they’re fast, responsive, and fun

And that makes experimentation addictive.

🧠 Why This Test Matters

6SigmaMind isn’t trying to beat GPT-4.
It’s demonstrating something different:

Tiny, specialized models can already handle real tasks —
and with the right fine-tuning, they get dramatically better.

The next steps:

Build a dataset of 1,000+ Excel tasks
Fine-tune the model on structured examples
Evaluate accuracy across domains
Compare against other small models: Phi-3 Mini, Qwen 3B, Gemma 2B
Release benchmark results publicly

This project is as much about education as performance.

🎯 Your Turn: Try the Battle Test Yourself

Here’s how to play along:

Open the demo
Copy/paste these three prompts:

Prompt 1 — SUMIFS
“Sum values in C where B equals ‘Closed’.”

Prompt 2 — XLOOKUP
“Return the value in column D for the ID in A matching H2.”

Prompt 3 — T.TEST
“Perform a two-tailed t-test for C2:C50 vs D2:D50.”

Compare each output with the expected formula from the matching round above
Share the results, whether good, bad, or weird. It all helps.
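If you test more than a handful of prompts, comparing by eye gets tedious. One possible way to automate the comparison step is sketched below in Python. Both function names are hypothetical, and the approach is deliberately naive: it ignores case and spacing outside quoted text, then checks for an exact string match.

```python
def normalize_formula(formula):
    """Uppercase and strip whitespace outside double-quoted strings."""
    out = []
    in_quotes = False
    for ch in formula.strip():
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)          # keep text criteria exactly as written
        elif not ch.isspace():
            out.append(ch.upper())  # function names and references ignore case
    return "".join(out)


def matches_expected(model_output, expected):
    """True when two formulas are identical after normalization."""
    return normalize_formula(model_output) == normalize_formula(expected)


expected = '=SUMIFS(C:C, B:B, "Closed")'
same = matches_expected('=sumifs(C:C,B:B,"Closed")', expected)
reordered = matches_expected('=SUMIFS(B:B, C:C, "Closed")', expected)
print(same, reordered)
```

A string match like this marks a swapped argument order as a miss, but it also rejects answers that are different yet valid, such as a VLOOKUP that returns the right value. Treat a mismatch as a prompt to look closer, not as a final verdict.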

👉 Try the 6SigmaMind battle test:
https://huggingface.co/spaces/benkemp/6SigmaMindv2

🚀 Coming Next in the Series

The next articles will explore:

How Small Models Understand Excel Logic
Why Domain-Specific Fine-Tuning Changes Everything
Excel for Statistics: Training a Mini-Model on Real Data
How 6SigmaMind Compares to Qwen / Phi-3 / Gemma / SmolLM2
The Road to 6SigmaMind v2

We’re building this project in the open — and you’re part of it.

Nano Language Models

Battle Test: How Small Models Handle SUMIFS, XLOOKUP, and T.TEST
