Most 'buy or rent' arguments leave out the engineering tax — the hours your team spends running a fleet that AWS would have run for you. We model unit economics with three honest variables: tokens per day, fleet utilisation, depreciation horizon. The break-even is not where the vendor pages claim it is.

Every CTO eventually runs the spreadsheet. AWS p5 instances vs an H100 fleet in a colo. The hyperscaler cost stacks up; the self-hosted cost looks like a bargain; the spreadsheet says ship the rack. Two years later the bargain has become an MLOps team and a depreciation schedule that nobody wants to defend.

The break-even maths is real. It is also incomplete in 90% of the spreadsheets we have seen. Below: the three honest variables, the engineering tax most cost models miss, and where the line actually sits.

The three honest variables

Tokens per day

The denominator of every cost model. Most teams overestimate by 3-5x: they model the peak day, not the average, and they fold internal token volume (chain-of-thought, tool calls, retries) into the total without flagging it. The honest number is the one you measure in production for at least four weeks.
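The gap between the peak-day number and the measured average can be made concrete in a few lines. All daily figures here are made-up example data, not measurements:

```python
# Illustrative sketch of computing the honest denominator: the measured
# average over four weeks, with internal volume counted separately.
# Every number below is an invented example, not a benchmark.

user_tokens = [32e6, 35e6, 30e6, 11e6, 9e6, 33e6, 34e6] * 4       # 4 weeks, quiet weekends
internal_tokens = [48e6, 52e6, 45e6, 16e6, 13e6, 50e6, 51e6] * 4  # tool calls, retries, CoT

avg_total = (sum(user_tokens) + sum(internal_tokens)) / len(user_tokens)
peak_total = max(u + i for u, i in zip(user_tokens, internal_tokens))
internal_share = sum(internal_tokens) / (sum(user_tokens) + sum(internal_tokens))

print(f"average tokens/day:  {avg_total / 1e6:.0f}M")   # the honest denominator
print(f"peak-day tokens/day: {peak_total / 1e6:.0f}M")  # what the spreadsheet used
print(f"internal share:      {internal_share:.0%}")     # flag it, don't hide it
```

In this synthetic example the peak day overstates the average by a third, and internal volume is the majority of the bill; your own ratios are whatever four weeks of production logs say they are.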

Fleet utilisation

The variable that hyperscaler models elide and self-hosted models pin at 80%. Real fleet utilisation in steady state is 30-50% unless you have built sophisticated load shaping. At 35% utilisation against an assumed 80%, your cost per useful token is roughly 2-3x the spreadsheet's figure. This is the single biggest place self-hosted models lie.
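How utilisation moves cost per token can be sketched directly. The ~$2,000/month all-in box cost and ~1,500 tokens/sec sustained throughput here are illustrative placeholders, not benchmarks:

```python
# Sketch: effective cost per token scales inversely with utilisation.
# monthly_cost and tokens_per_sec_full are assumed placeholder figures.

monthly_cost = 2000.0          # $/month: hardware amortisation + power + colo (assumed)
tokens_per_sec_full = 1500.0   # sustained throughput at 100% load (assumed)
seconds_per_month = 30 * 24 * 3600

def cost_per_million_tokens(utilisation):
    tokens = tokens_per_sec_full * utilisation * seconds_per_month
    return monthly_cost / (tokens / 1e6)

print(cost_per_million_tokens(0.80))  # the spreadsheet's assumption
print(cost_per_million_tokens(0.35))  # steady-state reality: ~2.3x worse
```

Note that the absolute numbers drop out of the comparison: whatever the box costs, the ratio between assumed and real utilisation is the ratio between the spreadsheet's cost per token and yours.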

Depreciation horizon

An H100 costs ~$30k. A 36-month depreciation schedule says it has to do ~$833/month of useful work. A 24-month schedule? $1,250. Twelve months? $2,500. The right horizon is the lifetime of the workload, not the hardware. If the workload changes shape every 18 months — and it will, given current model release cadence — your effective depreciation is shorter than the hardware's physical lifespan.
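The arithmetic above in three lines, assuming the ~$30k sticker price:

```python
# The depreciation arithmetic from the paragraph above: ~$30k per H100
# spread across three candidate horizons.
hardware_cost = 30_000
for months in (36, 24, 12):
    print(f"{months}-month horizon: ${hardware_cost / months:,.0f}/month of useful work")
```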

Self-hosting wins when your workload is stable, your utilisation is high, and you already pay an MLOps team. Two of three is not enough.

The engineering tax most spreadsheets skip

The cost the cost models miss is the team. Running a serious GPU fleet means: cluster orchestration, monitoring, on-call, capacity planning, driver and CUDA upgrades, security patching, and the ongoing argument with finance about whether the next box is necessary. For a fleet of 4-8 H100s, that is at least 1.5 FTE of senior infra time. At a loaded cost of €180k/year, your engineering tax is €270k/year — most of which never appears in the build vs buy slide.
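Putting the engineering tax next to the hardware line shows which one dominates. The FTE count and loaded cost are the figures above; the per-GPU amortisation is an assumed placeholder:

```python
# Sketch of a total-cost-of-ownership line that includes the engineering tax.
# fte and loaded_cost_per_fte are from the text; hardware_per_gpu_year is assumed.

fte = 1.5
loaded_cost_per_fte = 180_000                # EUR/year, loaded
engineering_tax = fte * loaded_cost_per_fte  # EUR 270k/year

fleet_size = 6                     # mid-point of the 4-8 H100 range
hardware_per_gpu_year = 10_000     # EUR/year at a ~3-year horizon (assumed)
hardware = fleet_size * hardware_per_gpu_year

total = engineering_tax + hardware
print(f"engineering tax:  EUR {engineering_tax:,}")
print(f"hardware:         EUR {hardware:,}")
print(f"tax share of TCO: {engineering_tax / total:.0%}")
```

Under these assumptions the team is roughly four-fifths of the total cost of ownership — which is exactly the line that never appears on the build vs buy slide.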

Hyperscaler vs self-hosted is not a hardware decision. It's a 'do we want to be in the GPU operations business' decision. If the answer is no, the cost analysis is over.

Where the break-even actually sits

For inference workloads under 50 million tokens/day, hyperscaler pricing is almost always cheaper once the engineering tax is included. Above 200 million tokens/day with reasonable utilisation discipline, self-hosting starts to make sense. Between 50M and 200M, it depends entirely on your team and your workload stability.
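The comparison can be sketched with two cost functions: pay-per-token on one side, a flat annual cost (fleet amortisation plus engineering tax) on the other. The $4/million rate and $300k/year figure are placeholder assumptions — replace them with your own vendor quote and your own TCO:

```python
# Break-even sketch. Both rates are placeholder assumptions, not quotes.

def hyperscaler_cost_per_day(tokens_per_day, usd_per_million=4.0):
    # pay-per-token: cost scales linearly with volume
    return tokens_per_day / 1e6 * usd_per_million

def self_hosted_cost_per_day(fixed_usd_per_year=300_000):
    # fleet amortisation + engineering tax: flat until you hit capacity
    return fixed_usd_per_year / 365

for tokens in (50e6, 200e6, 400e6):
    hs = hyperscaler_cost_per_day(tokens)
    sh = self_hosted_cost_per_day()
    winner = "hyperscaler" if hs < sh else "self-hosted"
    print(f"{tokens / 1e6:.0f}M tok/day: hyperscaler ${hs:,.0f} vs self-hosted ${sh:,.0f} -> {winner}")
```

With these placeholder rates the crossover sits a little above 200M tokens/day. The useful exercise is not the crossover itself but how violently it moves when you plug in your real utilisation and your real vendor quote.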

The framing question for the next CFO conversation: at our current volume, what would the second decimal place of utilisation cost or save us? If the answer is "less than the cost of one engineer" — and for most SMBs it is — the spreadsheet was the wrong question.
