Enterprise Model-as-a-Service (MaaS) AI companies such as Anthropic and OpenAI face rapid consolidation and disruption in this unprecedented high-tech gold rush. Fueled by debt, providers are sprinting to evolve model capabilities while racing to build out capacity. On top of that, they have to settle on a fee structure that actually covers their costs.
Flat-fee subscription pricing (as opposed to pay-per-use) leaves many MaaS startups vulnerable to expensive cost overruns and extended profitability timelines. Large and unexpected consolidations are already underway.
Providers feel the squeeze
Many AI vendors (including the biggest names) are already running at full capacity. The per-token cost of inference keeps falling, but backend processing needs and capex spending on mega-data centers and mind-bending GPU purchases are exploding. McKinsey estimates global data center capex at ~$6.7T by 2030, mostly driven by AI. Meta alone spent $66–72B on AI infrastructure in 2025. That squeeze rolls downhill as higher prices, usage-related costs, and pressure from equity partners for profitability. (McKinsey & Company, TechCrunch)
At the same time, we hear a lot about immense growth running up against hard costs. OpenAI embodies this: its valuation and usage are sky-high, but the risk is significant if market growth fails to deliver. The key theme is scale: big usage growth, big spending, and big dependencies. (Reuters, The Washington Post)
Ethan Ding’s “Short Squeeze” framing is real
If you want a crisp narrative for why costs feel weird even as infrastructure gets cheaper and growth skyrockets, read the following articles by Ethan Ding:
- windsurf gets margin called — on how flat-rate pricing meets frequent, token-intensive usage. (ethanding.substack.com)
- ai subscriptions get short-squeezed — why "all-you-can-eat," unlimited usage plans break when agents and chain of thought (CoT) processing churn tokens, exceeding what the companies have budgeted to pay.
Ding captures the current economic challenges facing AI companies:
- Better models produce longer traces (more tokens per solved task), so each task costs more.
- Clumsily programmed agents often iterate until they solve a task, using excessive tokens and blowing up costs.
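The squeeze Ding describes is simple arithmetic. A minimal back-of-envelope sketch, using purely illustrative numbers (not any vendor's actual pricing), shows how a flat-fee subscriber flips from profitable to deeply unprofitable once agentic, token-heavy usage kicks in:

```python
# Hypothetical back-of-envelope: flat-fee revenue vs. token-driven cost.
# All numbers below are illustrative assumptions, not vendor figures.

PRICE_PER_1M_TOKENS = 10.00   # assumed blended inference cost, $ per 1M tokens
FLAT_FEE = 20.00              # assumed monthly subscription price, $

def monthly_margin(tasks_per_month: int, tokens_per_task: int) -> float:
    """Provider margin on one flat-fee subscriber at a given usage level."""
    cost = tasks_per_month * tokens_per_task * PRICE_PER_1M_TOKENS / 1_000_000
    return FLAT_FEE - cost

# A light chat user is profitable; an agent-heavy user is wildly not.
print(monthly_margin(tasks_per_month=100, tokens_per_task=5_000))   # 15.0
print(monthly_margin(tasks_per_month=500, tokens_per_task=50_000))  # -230.0
```

Falling per-token prices don't rescue this: if better models and looping agents grow tokens-per-task faster than prices fall, the margin line keeps heading down.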
Not everyone will make it
Let’s be frank: while most leading providers are stable, not all vendors will survive the early capital funding cycle. Most MaaS companies are already leaning on debt to drive growth; they are already operating at a loss. Expect more consolidation, repricing, and the occasional abrupt EOL announcement as 2025 and 2026 play out. In fact, plan on it. (SiliconANGLE)
90-day plan for AI builders
AI builders need to tackle this three-month plan as soon as possible:
1) Prepare for a token diet
- Prepare for increased token costs (or quotas) and explore caching results from your agents and workloads (for example, prompt + retrieval + tool results).
- Define a target tokens-per-business-outcome for your top 10 compute tasks; what we measure tends to improve.
- Are you reliant on Chain of Thought (or other token heavy interactions) to get your outputs? If so, do you need it? Can you reserve it for only those tasks that require it?
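One cheap piece of the token diet is never paying for the same answer twice. A minimal caching sketch, assuming deterministic-enough outputs are acceptable for the cached tasks (function names and the call shape are illustrative):

```python
# Minimal result cache keyed on prompt + retrieved context + tool results.
# A sketch under the assumption that cached tasks tolerate reused outputs.
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(prompt: str, retrieval: list, tool_results: dict) -> str:
    """Stable key over everything that determines the model's input."""
    payload = json.dumps(
        {"prompt": prompt, "retrieval": retrieval, "tools": tool_results},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_call(prompt, retrieval, tool_results, model_call):
    key = cache_key(prompt, retrieval, tool_results)
    if key not in _cache:
        # Tokens are spent only on a cache miss.
        _cache[key] = model_call(prompt, retrieval, tool_results)
    return _cache[key]
```

The second identical request returns from the cache without touching the provider, which is exactly where agent loops that re-ask the same question bleed money.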
2) Design for portability and churn
- Abstract the model layer (focus on capabilities, not brand names).
- Maintain a minimum of two production vendors plus one local/edge fallback vendor.
- Run portability drills: measure the quality loss when you switch vendors and confirm it stays within acceptable bounds, so you know you can actually move.
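The abstraction in the steps above can be sketched as a capability-keyed model layer with an ordered fallback chain. Provider names and the one-string call signature here are hypothetical placeholders, not any real SDK:

```python
# Sketch: route by capability, not by vendor brand, with ordered fallbacks.
# Provider handles and the prompt -> completion signature are assumptions.
from typing import Callable

Provider = Callable[[str], str]  # prompt -> completion

class ModelLayer:
    def __init__(self) -> None:
        # Capability -> ordered providers (primary first, fallbacks after).
        self.routes: dict[str, list[Provider]] = {}

    def register(self, capability: str, provider: Provider) -> None:
        self.routes.setdefault(capability, []).append(provider)

    def complete(self, capability: str, prompt: str) -> str:
        errors = []
        for provider in self.routes.get(capability, []):
            try:
                return provider(prompt)
            except Exception as exc:  # outage, quota cut, abrupt repricing
                errors.append(exc)
        raise RuntimeError(f"all providers failed for {capability!r}: {errors}")
```

Because application code only ever asks for a capability ("summarize", "classify"), swapping a vendor or adding a local fallback is a registration change, not a rewrite — which is what makes the portability drill cheap enough to actually run.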
3) Reduce risk with local and smaller models
- Break down AI workflows/usage into smaller pieces and wrap them with tests.
- Look at quantized and lighter models for narrow jobs (such as classification, extraction, routing).
- Use hosted frontier models only where they beat smaller/local ones on measured quality. Treat them as high-risk integrations and rely on them cautiously.
- Do not assume that reasoning models (such as GPT-5) are the default choice; enable token-heavy Chain of Thought (CoT) features selectively.
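The routing policy in the steps above can be made explicit in a few lines. The task names, model handles, and the measured-quality allowlist are illustrative assumptions; in practice the allowlist comes from your own evals:

```python
# Sketch: send narrow, well-tested jobs to a small local model and reserve
# the hosted frontier model for tasks that beat it on measured quality.
# Task names and model handles are illustrative assumptions.

SMALL_MODEL_TASKS = {"classify", "extract", "route"}  # passed local-model evals

def pick_model(task: str, needs_reasoning: bool) -> str:
    if task in SMALL_MODEL_TASKS and not needs_reasoning:
        return "local-small"       # quantized/lighter model, cheap and on-prem
    return "hosted-frontier"       # high-risk dependency: use sparingly

print(pick_model("classify", needs_reasoning=False))          # local-small
print(pick_model("draft-contract", needs_reasoning=True))     # hosted-frontier
```

The point is that "frontier by default" becomes "frontier by exception": every task has to earn its way onto the expensive path through measured quality, and CoT is a flag you turn on, not a baseline you pay for everywhere.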
4) Negotiate contracts with eyes open
- Look for price-protection and sufficient notification regarding any service or pricing changes.
- Demand explicit data-handling and deletion SLAs (to learn more about the pitfalls of poor data-handling and deletion SLAs, see the 23andMe bankruptcy).
- Look for exit clauses and migration support should a service sunset or abruptly and severely change prices.
Bottom line for builders
Embrace AI, but build like you're sailing in choppy water: the weather will change, and the remotely hosted models you depend on will, at some point, break or become less available.
Want help devising a testing strategy for AI applications? Give us a call.