Five messy, real-world workbooks (close pack, AP ageing, sales commission, headcount plan, FP&A consolidation) tested against Claude, GPT-5, and Gemini on formula-error detection, broken references, and tab-to-tab consistency. Claude wins on long-context, multi-sheet reasoning; Gemini wins on cell-level formula checks.
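
To make "formula-error detection" and "broken references" concrete: the mechanical first step any such comparison needs is to pull every formula out of a workbook (so a model sees formulas rather than cached values) and to flag the unambiguous breakage, such as #REF! errors, up front. Below is a minimal Python sketch of that step using openpyxl; it illustrates the kind of pre-processing involved, not the harness from the engagement, and the ap_ageing.xlsx file name is hypothetical.

```python
# Minimal sketch: extract formulas from a workbook and flag #REF! breakage.
# Assumes openpyxl is installed (pip install openpyxl); names are illustrative.
from openpyxl import load_workbook


def extract_formulas(path: str) -> dict[str, list[tuple[str, str]]]:
    """Map each sheet name to its (cell coordinate, formula) pairs."""
    # The default load keeps formulas as strings ("=SUM(...)") rather than
    # the cached values a data_only=True load would return.
    wb = load_workbook(path)
    formulas: dict[str, list[tuple[str, str]]] = {}
    for ws in wb.worksheets:
        cells = []
        for row in ws.iter_rows():
            for cell in row:
                # Plain string formulas only; array formulas are skipped here.
                if isinstance(cell.value, str) and cell.value.startswith("="):
                    cells.append((cell.coordinate, cell.value))
        formulas[ws.title] = cells
    return formulas


def broken_references(formulas: dict[str, list[tuple[str, str]]]):
    """Yield (sheet, cell, formula) for formulas containing #REF! errors."""
    for sheet, cells in formulas.items():
        for coord, formula in cells:
            if "#REF!" in formula:
                yield sheet, coord, formula


if __name__ == "__main__":
    fs = extract_formulas("ap_ageing.xlsx")  # hypothetical workbook
    for sheet, coord, formula in broken_references(fs):
        print(f"{sheet}!{coord}: {formula}")
```

From there, the per-sheet formula lists (plus any cross-sheet references) are what you would serialize into each model's context, and the #REF! scan gives you deterministic ground truth to score formula-error detection against.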

We're expanding this piece into a full long-form article in the coming weeks. We publish each insight once the engagement it draws from has settled enough that we can name the trade-offs honestly: field notes ship when the work has stopped surprising us, not while a pattern is still proving itself in production.

Want the long-form version when it lands? Or would you rather skip ahead and talk through the same questions for your own company?
