Software engineer. Writing and building things.

Cost patterns for AI-backed apps

April 2026

I've been building an app that uses Claude for two distinct jobs: tagging wardrobe items from photos, and generating daily outfit suggestions. The naive approach — call a capable model for everything, every time — would cost somewhere between "unsustainable" and "immediately bankrupt." The approach I ended up with costs almost nothing at scale. Here's the pattern.

The observation

There are two kinds of AI work in most apps:

The pattern: do the expensive work once, store the result, run the cheap work against the stored result forever.

Applied to the wardrobe app

When a user uploads a photo of a shirt, I send it to claude-sonnet-4-6 with a forced tool call that returns structured tags: color, fabric, category, formality, season, a short description. That call costs real money. It happens once per item, ever. The tags go into the database.

When generating outfit suggestions, I never look at images again. I pull the text tags for all items, assemble a prompt, and call claude-haiku-4-5 — dramatically cheaper — to pick combinations and write the rationale. The model is reasoning over text, not perceiving images. It's very good at that for much less money.

Prompt caching on top of that

The user's wardrobe doesn't change between breakfast and lunch. So I generate suggestions once per day and cache them in the database. Most app opens hit zero model calls — just a database read.

For users whose wardrobe hasn't changed, the suggestion prompt is nearly identical day over day. Claude's prompt caching means repeated similar prompts are heavily discounted.

The numbers roughly

Tagging 50 items: ~50 vision calls at Sonnet pricing. One-time cost per user, paid at onboarding pace.

Daily suggestions: 1 Haiku call per day, cached result serves all opens. With prompt caching on wardrobe-heavy users, marginal cost approaches zero.

Roughly 90% cost reduction versus calling Sonnet for everything, every time.

The general pattern

  1. Identify which AI work is perception (unstructured → structured) and which is reasoning (structured → decision).
  2. Use your best model for perception. Accept the cost — it's a one-time extraction.
  3. Store the structured result durably. This is your asset.
  4. Use a cheaper model for reasoning against that result.
  5. Cache aggressively at the application layer. Most AI calls in a mature app should be skippable.

Where it breaks down

If the structured extraction can't capture what downstream reasoning needs, you're stuck calling the big model at reasoning time too. This happens when the interesting signal is visual and can't be expressed in text — fine-grained texture, exact color matching, fit details.

Test the quality of your extraction early. If it's lossy, the downstream reasoning will be wrong in ways that are hard to debug.

← All essays