Battle Test: How Small Models Handle SUMIFS, XLOOKUP, and T.TEST

Big models get all the hype — but small models are becoming the real surprise.

When I started 6SigmaMind, I wanted to know:

How far can a tiny 1.7B model go when solving real Excel tasks?

To find out, I ran it through three challenges:

  • SUMIFS (conditional math)
  • XLOOKUP (modern lookups)
  • T.TEST (statistics)

Then I asked readers like you to break it.

The results?
Surprisingly good — and occasionally hilarious.

👉 Try the model yourself: https://huggingface.co/spaces/benkemp/6SigmaMindv2

Let’s walk through the battle test.

🥊 Round 1 — SUMIFS: Everyday Excel Powerhouse

SUMIFS is one of the most widely used Excel functions for everyday reporting.

Test Prompt:
“Sum all values in C where B equals ‘Closed’.”

Expected Output:

=SUMIFS(C:C, B:B, "Closed")

6SigmaMind’s performance:
⭐⭐⭐⭐☆ (4/5)

  • Almost always gets it right
  • Clean formatting
  • Works across most phrasings
  • Mixes up the argument order only rarely, but it does happen
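
To make the argument order concrete, here’s a minimal Python sketch of what the expected SUMIFS formula computes. The function name `sumifs` and the sample data are my own illustration, not part of the model or the demo. Note that the sum range comes first, which is the reverse of the older SUMIF and may be one reason the order occasionally gets scrambled.

```python
def sumifs(sum_range, criteria_range, criterion):
    """Sum values in sum_range where the matching criteria_range cell equals criterion."""
    if len(sum_range) != len(criteria_range):
        raise ValueError("ranges must be the same length")
    # Excel compares text criteria case-insensitively, so this sketch does too.
    return sum(
        value
        for value, cell in zip(sum_range, criteria_range)
        if str(cell).lower() == str(criterion).lower()
    )


# Hypothetical sample data standing in for columns B and C.
status = ["Closed", "Open", "closed", "Pending"]
amount = [100, 250, 40, 75]

# Mirrors =SUMIFS(C:C, B:B, "Closed"): sum range first, then range/criterion pairs.
total = sumifs(amount, status, "Closed")
print(total)
```

If the model swaps the first two arguments, Excel ends up summing the status column and filtering on the amounts, so the formula no longer answers the question even though it looks plausible.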

Try these in the demo:

  • “Sum D2:D200 where A2:A200 = ‘Active’.”
  • “Add all values in column F only if column C is ‘Approved’.”

This is the model’s comfort zone.

🥊 Round 2 — XLOOKUP: The Modern Lookup Test

XLOOKUP is where small models often fail — but 6SigmaMind holds its own.

Test Prompt:
“Return the price in D for the SKU in A that matches H2.”

Expected Output:

=XLOOKUP(H2, A:A, D:D)

6SigmaMind’s performance:
⭐⭐⭐☆☆ (3/5)

  • Often correct
  • Sometimes swaps lookup + return arrays
  • Occasionally falls back to VLOOKUP (funny, and the formula usually still works)
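
Here’s a rough Python sketch of why swapping the lookup and return arrays matters. The `xlookup` helper and the SKU data are hypothetical; the helper only imitates XLOOKUP’s default exact-match behavior, returning `None` where Excel would show #N/A.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Return the item from return_array at the first position where lookup_array matches."""
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:
            return result
    return if_not_found


# Hypothetical sample data standing in for columns A (SKU) and D (price).
skus = ["A-100", "B-200", "C-300"]
prices = [9.99, 14.50, 3.25]

# Mirrors =XLOOKUP(H2, A:A, D:D) with H2 = "B-200".
correct = xlookup("B-200", skus, prices)

# Swapped arrays, like =XLOOKUP(H2, D:D, A:A): the SKU is searched for among prices.
swapped = xlookup("B-200", prices, skus)

print(correct, swapped)
```

The VLOOKUP fallback for the same task would be =VLOOKUP(H2, A:D, 4, FALSE), which only works here because the lookup column sits to the left of the return column.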

Try these variants:

  • “Lookup the value in column C where A = G5.”
  • “Give me an XLOOKUP formula for ID in A and name in B.”

Evaluating a small model on lookups is eye-opening: it shows whether the model actually tracks which range is searched and which range is returned.

🥊 Round 3 — T.TEST: Entering the Statistics Arena

Here’s where things get… unpredictable.

Test Prompt:
“Do a two-tailed t-test comparing C2:C50 with D2:D50.”

Expected Output:

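=T.TEST(C2:C50, D2:D50, 2, 2)

(The third argument, 2, asks for two tails. The fourth argument, 2, selects a two-sample equal-variance test; the prompt doesn’t name a test type, so 1 for paired or 3 for unequal variance would also be a defensible answer.)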

6SigmaMind’s performance:
⭐⭐☆☆☆ (2/5)

  • It knows T.TEST exists
  • It understands two ranges
  • It sometimes confuses arguments
  • It sometimes produces a more complex formula than needed
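
Since argument confusion is the main failure here, a small checker helps when reading the model’s output. This is a sketch under my own assumptions: the `describe_t_test` helper is hypothetical, and it only handles a flat T.TEST call with no nested functions.

```python
import re

# Meanings of T.TEST's third and fourth arguments in Excel.
TAILS = {1: "one-tailed", 2: "two-tailed"}
TYPES = {
    1: "paired",
    2: "two-sample, equal variance",
    3: "two-sample, unequal variance",
}


def describe_t_test(formula):
    """Split a flat =T.TEST(...) formula into its four arguments and explain them."""
    match = re.fullmatch(r"\s*=\s*T\.TEST\((.*)\)\s*", formula, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a T.TEST formula")
    args = [part.strip() for part in match.group(1).split(",")]
    if len(args) != 4:
        raise ValueError(f"T.TEST takes 4 arguments, got {len(args)}")
    array1, array2, tails, test_type = args
    if not tails.isdigit() or int(tails) not in TAILS:
        raise ValueError(f"tails must be 1 or 2, got {tails!r}")
    if not test_type.isdigit() or int(test_type) not in TYPES:
        raise ValueError(f"type must be 1, 2, or 3, got {test_type!r}")
    return {
        "array1": array1,
        "array2": array2,
        "tails": TAILS[int(tails)],
        "type": TYPES[int(test_type)],
    }


report = describe_t_test("=T.TEST(C2:C50, D2:D50, 2, 2)")
print(report)
```

A formula that drops the last argument, or puts a range where a number belongs, raises a `ValueError` instead of passing silently.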

This is where future fine-tuning will shine — especially when we introduce the Excel for Statistics Dataset.

Try pushing the model with:

  • “Calculate the correlation between A and B.”
  • “Give me the variance for C2:C100.”
  • “Return the standard deviation for D2:D80.”

These tasks help map out the model’s statistical “baseline.”
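
For reference, the formulas I’d expect for those three prompts are along the lines of =CORREL(A:A, B:B), =VAR.S(C2:C100), and =STDEV.S(D2:D80), assuming sample rather than population statistics. The Python sketch below computes the same quantities on made-up numbers, so you can sanity-check what a generated formula returns. The `correl` helper is my own; `statistics.variance` and `statistics.stdev` are the standard-library sample versions.

```python
import math
import statistics


def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample columns; b is exactly 2 * a, so the correlation should be 1.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]

r = correl(a, b)                          # like =CORREL(A:A, B:B)
sample_variance = statistics.variance(a)  # like =VAR.S(...)
sample_stdev = statistics.stdev(a)        # like =STDEV.S(...)

print(r, sample_variance, sample_stdev)
```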

📊 What This Battle Test Really Shows

This comparison reveals the nature of tiny models:

✔ They can learn structured logic

Multi-argument formulas like SUMIFS feel natural to them.

✔ They understand modern Excel functions

XLOOKUP support is impressive for their size.

✔ They have pockets of statistical knowledge

T.TEST, CORREL, STDEV.S — these aren’t trivial.

✔ Their biggest weakness is argument order

Not surprising for a 1.7B model.

✔ But they’re fast, responsive, and fun

And that makes experimentation addictive.

🧠 Why This Test Matters

6SigmaMind isn’t trying to beat GPT-4.
It’s demonstrating something different:

Tiny, specialized models can already handle real tasks —
and with the right fine-tuning, they get dramatically better.

The next steps:

  • Build a dataset of 1,000+ Excel tasks
  • Fine-tune the model on structured examples
  • Evaluate accuracy across domains
  • Compare against other small models: Phi-3 Mini, Qwen 3B, Gemma 2B
  • Release benchmark results publicly
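
As a rough idea of what "evaluate accuracy across domains" could look like, here’s a minimal sketch. The record layout, the domain names, and the `actual` values are all placeholders I made up for illustration; they are not real model outputs or benchmark results.

```python
from collections import defaultdict

# Placeholder records: "actual" values are invented, NOT real model outputs.
records = [
    {"domain": "conditional math",
     "expected": '=SUMIFS(C:C, B:B, "Closed")',
     "actual": '=SUMIFS(C:C, B:B, "Closed")'},
    {"domain": "lookups",
     "expected": "=XLOOKUP(H2, A:A, D:D)",
     "actual": "=XLOOKUP(H2, D:D, A:A)"},
    {"domain": "lookups",
     "expected": "=XLOOKUP(G5, A:A, C:C)",
     "actual": "=XLOOKUP(G5, A:A, C:C)"},
    {"domain": "statistics",
     "expected": "=T.TEST(C2:C50, D2:D50, 2, 2)",
     "actual": "=T.TEST(C2:C50, D2:D50, 2, 2)"},
]


def accuracy_by_domain(rows):
    """Exact-match accuracy per domain, as a fraction between 0 and 1."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        totals[row["domain"]] += 1
        if row["expected"] == row["actual"]:
            correct[row["domain"]] += 1
    return {domain: correct[domain] / totals[domain] for domain in totals}


scores = accuracy_by_domain(records)
print(scores)
```

Exact string matching is the strictest possible scoring; a real benchmark would probably also want to accept formulas that differ only in spacing or that are functionally equivalent.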

This project is as much about education as performance.

🎯 Your Turn: Try the Battle Test Yourself

Here’s how to play along:

  1. Open the demo
  2. Copy/paste these three prompts:

Prompt 1 — SUMIFS
“Sum values in C where B equals ‘Closed’.”

Prompt 2 — XLOOKUP
“Return the value in column D for the ID in A matching H2.”

Prompt 3 — T.TEST
“Perform a two-tailed t-test for C2:C50 vs D2:D50.”

  3. Compare the output
  4. Share the results — good, bad, weird — it all helps
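
If you’d rather not compare by eye, here’s a small sketch of one way to do it. The `normalize` and `same_formula` helpers are hypothetical. They ignore spacing and letter case outside quoted text, and nothing more, so a genuinely different formula still counts as a mismatch.

```python
def normalize(formula):
    """Drop whitespace and uppercase everything outside double-quoted text."""
    out = []
    in_quotes = False
    for ch in formula.strip():
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)  # keep quoted criteria exactly as written
        elif not ch.isspace():
            out.append(ch.upper())
    return "".join(out)


def same_formula(expected, actual):
    """True when two formulas differ only in spacing or case outside quotes."""
    return normalize(expected) == normalize(actual)


# Spacing and case differences are forgiven.
spacing_ok = same_formula('=SUMIFS(C:C, B:B, "Closed")', '=sumifs(C:C,B:B,"Closed")')

# Swapped lookup and return arrays are not.
swap_caught = not same_formula("=XLOOKUP(H2, A:A, D:D)", "=XLOOKUP(H2, D:D, A:A)")

print(spacing_ok, swap_caught)
```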

👉 Try the 6SigmaMind battle test:
https://huggingface.co/spaces/benkemp/6SigmaMindv2

🚀 Coming Next in the Series

The next articles will explore:

  • How Small Models Understand Excel Logic
  • Why Domain-Specific Fine-Tuning Changes Everything
  • Excel for Statistics: Training a Mini-Model on Real Data
  • How 6SigmaMind Compares to Qwen / Phi-3 / Gemma / SmolLM2
  • The Road to 6SigmaMind v2

We’re building this project in the open — and you’re part of it.
