Small AI models are getting powerful enough to become real tools — fast, cheap, and accurate in specialized domains.
To push the limits of 6SigmaMind, I built a public benchmark where three cutting-edge small models answer the same Excel prompt side by side:
🔹 SmolLM-1.7B — the 6SigmaMind baseline
🔹 Qwen2.5-1.5B-Instruct — one of the most capable small instruction models
🔹 DeepSeek-R1-Distill-Qwen-1.5B — the new reasoning-focused small model
👉 Try all 3 live: https://huggingface.co/spaces/benkemp/6SigmaMindV3
👉 The Python code used for the benchmarking is available here on GitHub
👉 The side-by-side benchmark comparison of the SLMs is available here on Google Docs
👉 The Excel workbook used for the function benchmark is available here
These three models are all in the 1.5–1.7B range — small enough to run on CPU, but smart enough to generate valid Excel logic.
Let’s see how they behave.
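Here is a minimal sketch of how a side-by-side harness like this can be wired up with Hugging Face `transformers`. The Hub IDs below are my assumption of the checkpoints involved; the actual Space and GitHub code may pin different ones:

```python
# Load all three models as CPU text-generation pipelines.
from transformers import pipeline

MODEL_IDS = {
    "SmolLM-1.7B": "HuggingFaceTB/SmolLM-1.7B-Instruct",               # assumed Hub ID
    "Qwen2.5-1.5B": "Qwen/Qwen2.5-1.5B-Instruct",                      # assumed Hub ID
    "DeepSeek-R1-1.5B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",   # assumed Hub ID
}

generators = {
    name: pipeline("text-generation", model=model_id, device=-1)  # device=-1 -> CPU
    for name, model_id in MODEL_IDS.items()
}
```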
⚡ Why Compare These Three Models?
Because the future of AI is not only about scale — it’s about specialization.
Small models:
- run anywhere
- respond quickly
- cost nothing to operate
- can be fine-tuned into extremely focused assistants
6SigmaMind is designed to answer one question:
How good can a tiny model become at Excel formulas?
To find out, we test three different architectures, each with its own strengths.
🥊 The 6SigmaMind Battle Arena
One prompt → three different reasoning styles.
Try these prompts in the Space to see the differences:
- “Sum values in column C where column B equals ‘Closed’.”
- “Return the last non-empty cell in column B.”
- “Calculate correlation between A and B.”
- “Perform a two-tailed t-test between C2:C50 and D2:D50.”
- “XLOOKUP the price in D where A matches H2.”
Each model gives a unique “flavor” of output.
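With the pipelines loaded, the arena itself is just a loop: one prompt, three generations. A hedged sketch, reusing the `generators` dict from the snippet above and assuming a recent `transformers` release that accepts chat-style message lists:

```python
PROMPTS = [
    "Sum values in column C where column B equals 'Closed'.",
    "Return the last non-empty cell in column B.",
    "Calculate correlation between A and B.",
    "Perform a two-tailed t-test between C2:C50 and D2:D50.",
    "XLOOKUP the price in D where A matches H2.",
]

SYSTEM = "You are an Excel assistant. Reply with a single Excel formula only."

for prompt in PROMPTS:
    print(f"\n### {prompt}")
    for name, gen in generators.items():
        messages = [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": prompt},
        ]
        out = gen(messages, max_new_tokens=128, do_sample=False)
        # Chat-style input returns the whole conversation; the last
        # message is the model's reply.
        print(f"{name}: {out[0]['generated_text'][-1]['content'].strip()}")
```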
🧪 1. SmolLM-1.7B — The 6SigmaMind Baseline
⭐ Strengths
- Fastest responses
- Strong at SUMIFS / COUNTIFS
- Reliable IF, AND, OR logic
- Good at simple lookups
- Concise formulas
⚠️ Weaknesses
- Argument ordering mistakes
- Rare statistical errors
- Occasionally echoes the prompt
🎯 Best use
Lightweight, embedded Excel assistants.
🧪 2. Qwen2.5-1.5B-Instruct
⭐ Strengths
- Most consistent reasoning of the small trio
- Excellent at modern Excel functions (XLOOKUP, FILTER)
- Structure-aware (very good argument ordering)
- Handles unusual phrasings better than SmolLM
⚠️ Weaknesses
- Sometimes too verbose
- Occasionally over-explains unless constrained (see the formula-extraction sketch after this section)
🎯 Best use
When accuracy and structure matter most.
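Since both Qwen and DeepSeek can wrap the formula in explanation, the harness can post-process replies to keep only the formula itself. A minimal sketch; `extract_formula` is an illustrative helper of mine, not part of the published benchmark code:

```python
import re

def extract_formula(reply: str) -> str:
    """Keep only the first formula-shaped line in a model reply;
    fall back to the raw reply if nothing starts with '='."""
    for line in reply.splitlines():
        candidate = line.strip().strip("`")
        if candidate.startswith("="):
            return candidate
    match = re.search(r"=\S[^\n]*", reply)
    return match.group(0) if match else reply.strip()

print(extract_formula("You can use:\n`=XLOOKUP(H2, A:A, D:D)`\nThis searches column A..."))
```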
🧪 3. DeepSeek-R1-Distill-Qwen-1.5B
This model is the newest in the group — designed with a distilled-reasoning approach, meaning it tries to “think” more carefully than typical 1B–2B models.
⭐ Strengths
- Very strong reasoning for its size
- Good at multi-step formula logic
- Often provides notably precise function choices
- Handles text-processing functions well (LEFT, MID, SEARCH)
⚠️ Weaknesses
- Sometimes produces “explanatory” text unless guided (a tag-stripping sketch follows this section)
- Statistical formulas can be hit-or-miss
- Sometimes generates slightly unusual but logically valid formulas
🎯 Best use
Exploring how reasoning-distilled small models behave on structured tasks.
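One practical note: R1-distilled checkpoints typically emit their chain of thought inside `<think>...</think>` tags before the final answer. If you only want the formula, a small filter helps; `strip_reasoning` is an illustrative helper, not from the benchmark repo:

```python
import re

def strip_reasoning(reply: str) -> str:
    """Remove the <think>...</think> block that R1-distilled models
    commonly emit before their final answer."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

print(strip_reasoning('<think>The user wants a conditional sum...</think>\n=SUMIFS(C:C, B:B, "Closed")'))
```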
🧠 What the Side-by-Side Results Reveal
✔ Small models already understand Excel semantics
All three know:
SUMIFS, COUNTIFS, IF logic, XLOOKUP, STDEV.S, CORREL, FILTER.
✔ They each have unique reasoning tendencies
- SmolLM → direct & fast
- Qwen → structured & accurate
- DeepSeek R1 → reflective & reasoning-driven
✔ 1.5–1.7B is a “sweet spot”
Large enough to handle structured tasks,
small enough to run anywhere.
✔ Fine-tuning will make the difference
With a specialized Excel dataset, these models will jump another level.
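One plausible shape for such a dataset is a JSONL file of (request, formula) pairs, sketched below. The layout and the `excel_sft.jsonl` file name are illustrative assumptions, not the actual 6SigmaMind training set:

```python
import json

# Illustrative only: one possible record layout for an Excel fine-tuning set.
examples = [
    {"prompt": "Sum values in column C where column B equals 'Closed'.",
     "completion": '=SUMIFS(C:C, B:B, "Closed")'},
    {"prompt": "Return the last non-empty cell in column B.",
     "completion": '=LOOKUP(2, 1/(B:B<>""), B:B)'},
]

with open("excel_sft.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```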
📊 Mini-Benchmark: Example Outputs
Prompt:
“Return the last non-empty value in column B.”
| Model | Typical Output |
|---|---|
| SmolLM-1.7B | =LOOKUP(2,1/(B:B<>""),B:B) |
| Qwen-1.5B | Same as above, very consistent |
| DeepSeek R1 | Same formula, sometimes adds brief reasoning |
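Why this LOOKUP trick works: `1/(B:B<>"")` evaluates to 1 for non-empty cells and a `#DIV/0!` error for empty ones, and `LOOKUP` with a lookup value of 2 (larger than any 1 in the array) skips the errors and returns the value aligned with the last 1. A toy Python equivalent, purely for illustration:

```python
def last_non_empty(column):
    """Mirror of =LOOKUP(2, 1/(B:B<>""), B:B): return the last
    non-empty value in a column, or None if there is none."""
    non_empty = [v for v in column if v not in ("", None)]
    return non_empty[-1] if non_empty else None

assert last_non_empty(["a", "", "b", "", ""]) == "b"
```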
Prompt:
“Calculate the standard deviation of B2:B80.”
| Model | Typical Output |
|---|---|
| SmolLM-1.7B | =STDEV.S(B2:B80) |
| Qwen-1.5B | Same formula, most reliable |
| DeepSeek R1 | Usually correct, sometimes extra commentary |
Prompt:
“Perform a two-tailed t-test on C2:C50 vs D2:D50.”
| Model | Typical Output |
|---|---|
| SmolLM-1.7B | Attempts but inconsistent |
| Qwen-1.5B | =T.TEST(C2:C50, D2:D50, 2, 2) (most accurate) |
| DeepSeek R1 | Usually correct, but may attempt an alternate approach |
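These statistical prompts are easy to sanity-check outside Excel: `=T.TEST(C2:C50, D2:D50, 2, 2)` is a two-tailed, two-sample equal-variance t-test, which corresponds to SciPy's `ttest_ind` with `equal_var=True`. A minimal cross-check with made-up sample data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
c = rng.normal(10.0, 2.0, size=49)  # stand-in for C2:C50
d = rng.normal(10.5, 2.0, size=49)  # stand-in for D2:D50

# Two-tailed, two-sample equal-variance t-test, i.e. T.TEST(..., 2, 2).
t_stat, p_value = stats.ttest_ind(c, d, equal_var=True)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```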
🚀 Why This Matters for 6SigmaMind
Because 6SigmaMind isn’t just a demo —
it’s the start of a small-model specialization journey.
This comparison shows:
- Small models are already close to what’s needed
- Fine-tuning will push them from “good” to “expert”
- A benchmark helps identify which model becomes 6SigmaMindV3
This is the foundation for an Excel-focused mini-LLM.
🎯 Try the Live Benchmark Yourself
👉 Launch the model-comparison Space:
https://huggingface.co/spaces/benkemp/6SigmaMindV3
Try prompts like:
- “Sum C where B = ‘Closed’.”
- “Lookup D where A matches H2.”
- “IF A2 > 100 then return ‘High’ else ‘OK’.”
- “Correlation between A and B.”
- “Two-sample t-test for C2:C50 vs D2:D50.”
You’ll immediately feel which model “thinks” the way you like.
Small models are the future — and 6SigmaMind is being built in the open.