Six questions — data freshness, volume, latency budget, accuracy floor, governance posture, engineering headcount — route each use case to one of four patterns. We argue 70% of SMB AI projects pick fine-tuning when RAG-with-rerank would have been ten times cheaper. The three signals that actually justify a fine-tune are at the bottom.

Every SMB engineering team eventually sits in a meeting where someone says 'we should fine-tune'. Usually the answer is no. The pattern most teams need is a structured prompt over RAG with a rerank step, and they reach for fine-tuning because the literature makes it sound like the serious choice.

Below is the decision tree we walk teams through. Six questions. Four named outcomes. The whole thing fits on a meeting room whiteboard, which is the point.

The six questions

  1. How fresh does the answer need to be? If the underlying knowledge changes more than once a week, fine-tuning is the wrong answer regardless of what else you optimise for.
  2. What is your daily volume? Below ~10k calls/day, prompting + RAG is almost always cheaper than fine-tuning, even at frontier-model prices per call.
  3. What is your latency budget? RAG adds 200-600ms per call. A fine-tuned smaller model can pull that down. If you need <300ms p95, this matters.
  4. What is your accuracy floor? Below 80%, prompting is fine. 80-95% is where structured prompts + RAG + rerank earn their keep. Above 95% on a narrow task, fine-tuning starts to win.
  5. What is your governance posture? Regulated outputs need an explainability story. RAG retrieval citations are easier to defend than fine-tuned weights.
  6. What is your engineering headcount? Fine-tuning means MLOps. If you do not have an MLOps capability, you are buying it whether you priced it or not.

The four outcomes

Structured prompt only

Best for: anything where the input is small, the output is structured, and the knowledge needed is general (or fits in the prompt). Email triage, classification into fewer than 50 categories, simple summarisation. Cheapest path to production. Lowest ceiling.
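For concreteness, here is a minimal sketch of what "structured prompt only" means for the email triage case. The category list, field names, and output schema are made up for the example, not a recommendation.

```python
# Minimal "structured prompt only" sketch for email triage.
# Categories and output fields are illustrative, not a recommended schema.
CATEGORIES = ["billing", "bug report", "feature request", "account access", "other"]

def triage_prompt(email_body: str) -> str:
    """Small input, structured output, all needed knowledge carried in the prompt."""
    return (
        "You are a support triage assistant.\n"
        f"Classify the email into exactly one of: {', '.join(CATEGORIES)}.\n"
        'Reply as JSON with keys "category" and "one_line_summary".\n\n'
        f"Email:\n{email_body}"
    )

print(triage_prompt("Hi, I was charged twice for my March invoice."))
```

Everything the model needs lives in the prompt itself, which is exactly why this pattern has the lowest ceiling: the moment the answer depends on knowledge you cannot paste in, you have moved to the next pattern.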

RAG (with rerank)

Best for: question-answering over documents you own, customer support deflection, internal copilots, sales call summaries grounded in CRM data. Three-quarters of the SMB AI projects we see should be here. The rerank step is what separates serious RAG from toy RAG — without it, retrieval pulls in too much noise and the model latches on to the wrong chunk.
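To make the rerank step concrete, here is a minimal Python sketch of the two-stage shape. The overlap-based scorers are deliberately naive stand-ins so the snippet runs anywhere; in a real system stage 1 is a vector index and stage 2 is a cross-encoder, and every name below is ours rather than any particular library's API.

```python
# Two-stage retrieval: cast a wide net, then rerank hard.
# The scoring functions are naive stand-ins so the sketch runs without dependencies;
# swap in an embedding index (stage 1) and a cross-encoder (stage 2) for real use.

def retrieve(query: str, corpus: list[str], k: int = 20) -> list[str]:
    """Stage 1: cheap, recall-oriented retrieval (stand-in: raw token overlap)."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def rerank(query: str, candidates: list[str], k: int = 3) -> list[str]:
    """Stage 2: precision-oriented rerank (stand-in: overlap normalised by length).
    In production this is where the cross-encoder earns its keep."""
    q_tokens = set(query.lower().split())

    def score(doc: str) -> float:
        tokens = doc.lower().split()
        return len(q_tokens & set(tokens)) / (len(tokens) or 1)

    return sorted(candidates, key=score, reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Structured prompt over only the chunks that survive the rerank."""
    chunks = rerank(query, retrieve(query, corpus))
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return ("Answer using only the numbered excerpts below, and cite their numbers.\n\n"
            f"{context}\n\nQuestion: {query}")
```

The structural point is that the model only ever sees the handful of chunks that survive the rerank, and the excerpt numbers give you the citation trail that question 5 asks for.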

Fine-tuning

Best for: narrow tasks with high volume, stable knowledge, and an accuracy ceiling that prompting cannot reach. Three signals justify it: (1) you have at least 5k labelled examples, (2) the task is so narrow that a smaller model fine-tuned on it beats a prompted frontier model, and (3) your inference volume is high enough that the smaller-model cost saving pays for the MLOps overhead.

If you cannot list the three signals, you do not need to fine-tune.

Hybrid (RAG + fine-tune)

Rare. Justified when the knowledge changes (RAG) and the output style is so specific the model cannot reach it from prompts alone (fine-tune). Underwriting summary generation is the canonical example. Costs more than either path alone — only do this when you can articulate why both layers are required.
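For teams who prefer code to whiteboards, here is one way to encode the tree as a single routing function. The thresholds come straight from the six questions and the three signals; the check ordering, the field names, and the decision to treat governance and missing MLOps as hard blockers are our own sketch choices, not a definitive implementation.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    knowledge_changes_per_week: float  # question 1: freshness
    calls_per_day: int                 # question 2: volume
    latency_budget_p95_ms: int         # question 3: latency
    accuracy_floor: float              # question 4: e.g. 0.88
    regulated_output: bool             # question 5: governance
    has_mlops: bool                    # question 6: headcount
    labelled_examples: int = 0         # fine-tune signal 1

def route(uc: UseCase) -> str:
    """Map a use case to one of the four patterns using the thresholds above."""
    fresh = uc.knowledge_changes_per_week > 1       # Q1: changes more than weekly
    high_volume = uc.calls_per_day >= 10_000        # Q2
    tight_latency = uc.latency_budget_p95_ms < 300  # Q3
    needs_95_plus = uc.accuracy_floor > 0.95        # Q4

    # The three fine-tune signals, with Q5/Q6 treated as hard blockers
    # (the prose is softer on both; that strictness is our assumption).
    # Signal 2 ("a fine-tuned small model beats a prompted frontier model")
    # cannot be read off these fields, so accuracy/latency stand in for it.
    finetune_justified = (
        uc.labelled_examples >= 5_000
        and (needs_95_plus or tight_latency)
        and high_volume
        and uc.has_mlops
        and not uc.regulated_output
    )
    if finetune_justified:
        return "hybrid (RAG + fine-tune)" if fresh else "fine-tuning"
    if fresh or uc.regulated_output or uc.accuracy_floor >= 0.80:
        return "RAG (with rerank)"
    return "structured prompt only"

# Illustrative only: a high-volume, stable-knowledge extraction task with labels.
task = UseCase(knowledge_changes_per_week=0, calls_per_day=50_000,
               latency_budget_p95_ms=250, accuracy_floor=0.97,
               regulated_output=False, has_mlops=True, labelled_examples=8_000)
print(route(task))  # -> fine-tuning
```

Most arguments end at the finetune_justified clause: very few SMB use cases clear 5k labelled examples, 10k calls/day, and a 95% accuracy floor at the same time.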

A worked example

A 200-person fintech wants to summarise customer support transcripts and route the urgent ones. Knowledge changes daily (new products, new policies). Volume is 4k tickets/day. Latency budget is generous (overnight is fine for routing). Accuracy floor is 88%. No MLOps team.

Decision: structured prompt + RAG over the policy and product docs. Total build time: ~6 weeks. Ongoing inference: €400/month. If the team had picked fine-tuning, they would have spent 4-5 months building an MLOps practice they did not need, paid for a latency advantage they did not need, and ended up with a model that goes stale every time a new product ships.
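The inference figure is easy to sanity-check from the numbers above; the 30-day month is our only added assumption.

```python
# Back-of-envelope check on the worked example's inference cost.
tickets_per_day = 4_000
monthly_inference_eur = 400
tickets_per_month = tickets_per_day * 30            # ~120,000 calls (30-day month assumed)
cost_per_ticket = monthly_inference_eur / tickets_per_month
print(f"{tickets_per_month:,} tickets/month -> EUR {cost_per_ticket:.4f} per ticket")
# -> 120,000 tickets/month -> EUR 0.0033 per ticket
```

At roughly a third of a euro cent per ticket, the volume argument for fine-tuning (question 2) never gets started.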

The decision tree above would have saved them four months. Run it before your next AI architecture meeting and watch the conversation get shorter.

Or skip ahead and talk through it directly