Per-million-row cost compared across Haiku 4.5, GPT-5 nano, Gemini 3 Flash, and a self-hosted Llama on a single L4, with prompt caching, the batch API, and structured output actually turned on. Most teams over-spec the model: 80% of 'AI classification' work is solved by Haiku 4.5 with caching, at under $0.20 per 1k rows (sub-$200 per million).
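The arithmetic behind that claim is simple enough to sketch. Everything numeric below is an illustrative placeholder, not a quoted price; check the current price sheets before relying on any of it. What matters is the structure: a shared prompt that the cache absorbs, a small uncached per-row payload, a tiny structured-output label, and a batch-API discount on top. A minimal Python sketch, assuming those four components:

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    """Per-million-token rates and per-row token counts for one model (all placeholders)."""
    input_per_mtok: float         # $ per 1M uncached input tokens
    output_per_mtok: float        # $ per 1M output tokens
    cached_input_per_mtok: float  # $ per 1M cache-read input tokens
    prompt_tokens: int            # shared prompt (instructions + schema), cacheable
    row_tokens: int               # per-row payload, never cached
    output_tokens: int            # structured-output label, usually tiny
    batch_discount: float = 0.5   # assumed batch-API discount factor

    def cost_per_million_rows(self, cache_hit_rate: float = 0.99) -> float:
        # Split the shared prompt into cached and uncached portions.
        cached = self.prompt_tokens * cache_hit_rate
        uncached = self.prompt_tokens - cached + self.row_tokens
        per_row = (
            cached * self.cached_input_per_mtok
            + uncached * self.input_per_mtok
            + self.output_tokens * self.output_per_mtok
        ) / 1_000_000  # rates above are per million tokens
        return per_row * self.batch_discount * 1_000_000  # scale to 1M rows

# Placeholder rates and token counts for a Haiku-class model.
haiku = CostModel(
    input_per_mtok=1.00, output_per_mtok=5.00, cached_input_per_mtok=0.10,
    prompt_tokens=2_000, row_tokens=120, output_tokens=10,
)
print(f"${haiku.cost_per_million_rows():,.0f} per 1M rows")  # ~$194 with these inputs
```

With those placeholder inputs the shared prompt nearly vanishes behind the cache and the per-row payload dominates the bill, which is why stepping up to a bigger model buys so little for classification work.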
This piece is being expanded into a full long-form article in the coming weeks. We publish each insight only once the engagement it draws from has settled enough to name the trade-offs honestly: field notes ship when the work has stopped surprising us, not while a pattern is still proving itself in production.
Want the long-form version when it lands? Or would you rather skip ahead and talk through the same questions for your own company?