Per-million-row cost compared across Haiku 4.5, GPT-5 nano, Gemini 3 Flash, and a self-hosted Llama on a single L4, with prompt caching, the batch API, and structured output actually turned on. Most teams over-spec the model: 80% of 'AI classification' work is solved by Haiku 4.5 with caching, at under $0.20 per 1k rows (sub-$200 per million).
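The arithmetic behind that claim is simple enough to sketch. Everything numeric below is an illustrative placeholder, not a quoted price; check the current price sheets before relying on any of it. What matters is the structure: a shared prompt that the cache absorbs, a small uncached per-row payload, a tiny structured-output label, and a batch-API discount on top. A minimal Python sketch, assuming those four components:

```python
from dataclasses import dataclass

@dataclass
class CostModel:
    """Per-million-token rates and per-row token counts for one model (all placeholders)."""
    input_per_mtok: float         # $ per 1M uncached input tokens
    output_per_mtok: float        # $ per 1M output tokens
    cached_input_per_mtok: float  # $ per 1M cache-read input tokens
    prompt_tokens: int            # shared prompt (instructions + schema), cacheable
    row_tokens: int               # per-row payload, never cached
    output_tokens: int            # structured-output label, usually tiny
    batch_discount: float = 0.5   # assumed batch-API discount factor

    def cost_per_million_rows(self, cache_hit_rate: float = 0.99) -> float:
        # Split the shared prompt into cached and uncached portions.
        cached = self.prompt_tokens * cache_hit_rate
        uncached = self.prompt_tokens - cached + self.row_tokens
        per_row = (
            cached * self.cached_input_per_mtok
            + uncached * self.input_per_mtok
            + self.output_tokens * self.output_per_mtok
        ) / 1_000_000  # rates above are per million tokens
        return per_row * self.batch_discount * 1_000_000  # scale to 1M rows

# Placeholder rates and token counts for a Haiku-class model.
haiku = CostModel(
    input_per_mtok=1.00, output_per_mtok=5.00, cached_input_per_mtok=0.10,
    prompt_tokens=2_000, row_tokens=120, output_tokens=10,
)
print(f"${haiku.cost_per_million_rows():,.0f} per 1M rows")  # ~$194 with these inputs
```

With those placeholder inputs the shared prompt nearly vanishes behind the cache and the per-row payload dominates the bill, which is why stepping up to a bigger model buys so little for classification work.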
This piece is being expanded into a full long-form article in the coming weeks. We publish each insight only once the engagement it draws from has settled enough to name the trade-offs honestly: field notes ship when the work has stopped surprising us, not while a pattern is still proving itself in production.
Want the long-form version when it lands? Or would you rather skip ahead and talk through the same questions for your own company?