Small Language Models (SLMs) aren’t just capable of generating text — they can translate natural language instructions into complex Excel formulas with surprising accuracy. With the right prompting techniques and a well-designed training dataset, an SLM can understand requests like:
- “Find all rows where Sales > 10,000 and Region = ‘West’.”
- “Sum the last 7 entries in column B only if column A equals ‘Active’.”
- “Extract the domain name from an email address.”
…and convert these commands into fully working Excel formulas.
This article shows how SLMs interpret human requests, how prompts should be structured, and how to build an Excel Formula Assistant model using your TinySLM, SmolLM, Granite Nano, or any SLM you’re fine-tuning.
1. Why Excel Formulas Are the Perfect Use Case for Small Models
Excel formulas are:
- Structured
- Predictable
- Deterministic
- Short and syntactically narrow
This makes them ideal for SLMs (350M–1.7B), which excel at tasks requiring:
- Logical pattern recognition
- Formula transformation
- Symbolic reasoning
- Limited context windows
You don’t need GPT-4 for SUMIFS, VLOOKUP, or FILTER.
A specialized SLM can outperform larger models for this niche.
2. Core Prompt Structure for Excel Formula Generation
For consistent results, SLMs respond best to a simple, strict prompt format.
Recommended Prompt Template
Convert the following instruction into a valid Excel formula.
Instruction: "<user_instruction>"
Only return the Excel formula.
Example:
Convert the following instruction into a valid Excel formula.
Instruction: "Sum all values in column E where column B equals 'North'."
Only return the Excel formula.
Expected SLM output:
=SUMIF(B:B,"North",E:E)
This structure ensures:
- Clear intent
- No extra text
- No explanation
- Predictable output
3. Example Inputs & Outputs for Training
These examples help shape your fine-tuning dataset.
Input
Instruction: "Filter rows where F > 100 and G = 'Yes'."
Output
=FILTER(A:Z,(F:F>100)*(G:G="Yes"))
Input
Instruction: "Find the average of C2:C100 but only for rows where column D equals 'Closed'."
Output
=AVERAGEIF(D2:D100,"Closed",C2:C100)
Input
Instruction: "Extract the last 4 characters from each value in column B."
Output
=RIGHT(B:B,4)
These examples help give the model confidence.
Your dataset should contain hundreds to thousands of such pairs.
4. How SLMs Parse Natural Language into Formula Logic
SLMs break down your input into three steps:
Step 1 — Identify the action
Sum, filter, count, average, lookup, extract…
Step 2 — Identify the conditions
“>100”, “equals Closed”, “not empty”, “contains @gmail”
Step 3 — Map the logic to a formula pattern
This mapping is learnable through pattern repetition —
making SLMs incredibly efficient at formula generation.
5. Improving Accuracy with Prompt Engineering
To reduce hallucinations:
a) Force output constraints
Only return one Excel formula with no explanation.
b) Add examples
Few-shot prompting:
Instruction: "Get the first word from each cell in A:A."
Output: =LEFT(A:A, FIND(" ",A:A)-1)
Instruction: "Sum all values in C where B equals 'X'."
Output: =SUMIF(B:B,"X",C:C)
Instruction: "<new_instruction>"
Output:
c) Add formatting rules
Use uppercase function names.
Do not explain.
Return nothing but the formula.
SLMs follow these rules surprisingly well.
6. Fine-Tuning Your Own Excel Formula SLM
To fine-tune a model like SmolLM-360M or Granite 4.0 Nano:
Dataset Format (JSONL)
{"instruction": "Sum all values in column C.", "output": "=SUM(C:C)"}
{"instruction": "Filter rows where G = 1.", "output": "=FILTER(A:Z, G:G=1)"}
Ideal size:
- 5,000 entries = usable model
- 20,000 entries = very accurate
- 50,000+ entries = near-perfect
Train using:
- QLoRA
- PEFT
- Hugging Face TRL
- 4-bit quantization
Within 1–3 hours, you get a powerful Excel assistant running fully local.
7. Deploying the Excel Formula SLM
You can deploy the model into:
- Gradio (Hugging Face Space)
- FastAPI server
- VS Code extension
- Excel add-in
- Local Python desktop app
A simple Gradio interface lets users type:
“Get unique values from column A”
…and instantly receive:
=UNIQUE(A:A)