Cost tracking

Every request row carries a cost_usd field computed at ingest time from your actual token usage and the provider's current list price. No approximations, no post-hoc math in the dashboard — the number is deterministic and auditable.

Why it matters

Provider dashboards show spend a day late and at the billing-period level. That's useless for the daily question — “is my new feature's LLM usage going to blow up the budget?” Spanlens computes cost the moment the response lands, so you can track per-minute, per-project, per-user cost without waiting for an invoice.

How it works

Price table

apps/server/src/lib/cost.ts ships a curated MODEL_PRICES map — USD per 1M tokens, separately for prompt and completion. Snapshot as of 2026-04:

```ts
'gpt-4o':                         { prompt: 2.5,   completion: 10   }
'gpt-4o-mini':                    { prompt: 0.15,  completion: 0.6  }
'gpt-4-turbo':                    { prompt: 10,    completion: 30   }
'claude-opus-4-7':                { prompt: 15,    completion: 75   }
'claude-sonnet-4-6':              { prompt: 3,     completion: 15   }
'claude-haiku-4-5-20251001':      { prompt: 0.8,   completion: 4    }
'gemini-1.5-pro':                 { prompt: 1.25,  completion: 5    }
'gemini-1.5-flash':               { prompt: 0.075, completion: 0.3  }
// ...
```

The formula

```ts
promptCost     = (promptTokens     / 1_000_000) * price.prompt
completionCost = (completionTokens / 1_000_000) * price.completion
totalCost      = promptCost + completionCost
```
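Applied to a concrete request, the formula is plain per-million arithmetic. A minimal sketch, assuming a hypothetical gpt-4o-mini call with 12,000 prompt and 800 completion tokens (prices from the snapshot above; `requestCostUsd` is an illustrative helper name, not the actual `cost.ts` export):

```ts
// Price in USD per 1M tokens, taken from the MODEL_PRICES snapshot above.
const gpt4oMini = { prompt: 0.15, completion: 0.6 };

// Illustrative helper, not the real cost.ts API.
function requestCostUsd(
  promptTokens: number,
  completionTokens: number,
  price: { prompt: number; completion: number },
): number {
  const promptCost = (promptTokens / 1_000_000) * price.prompt;
  const completionCost = (completionTokens / 1_000_000) * price.completion;
  return promptCost + completionCost;
}

// 12,000 prompt tokens and 800 completion tokens:
//   (12_000 / 1_000_000) * 0.15 = 0.0018
//   (   800 / 1_000_000) * 0.6  = 0.00048
//   total                       ≈ 0.00228 USD
const cost = requestCostUsd(12_000, 800, gpt4oMini);
```

Note the magnitudes: a typical gpt-4o-mini call lands well under a cent, which is why `cost_usd` needs several decimal places of precision.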

The dated-variant problem (critical gotcha)

OpenAI returns dated variants in the model field of the response body (e.g. you request gpt-4o-mini and get back gpt-4o-mini-2024-07-18). That dated string is what lands in requests.model. Naive lookup against MODEL_PRICES['gpt-4o-mini'] would miss and return null.

calculateCost() handles this by:

  1. Exact match first — if gpt-4o-mini-2024-07-18 is in the table, use it.
  2. Otherwise, longest boundary-aware prefix match. The model id must start with a registered key followed by -, so gpt-4 does not accidentally match gpt-4o-mini.

The same matching pattern is reused by Savings and any future feature that keys on model family.
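A minimal sketch of that lookup order — exact match first, then longest boundary-aware prefix — using only keys from the snapshot above (`lookupPrice` is an illustrative name; the real implementation lives in `apps/server/src/lib/cost.ts`):

```ts
type Price = { prompt: number; completion: number };

// Trimmed-down copy of the price map, for illustration.
const MODEL_PRICES: Record<string, Price> = {
  'gpt-4-turbo':  { prompt: 10,   completion: 30  },
  'gpt-4o':       { prompt: 2.5,  completion: 10  },
  'gpt-4o-mini':  { prompt: 0.15, completion: 0.6 },
};

function lookupPrice(modelId: string): Price | null {
  // 1. Exact match first.
  const exact = MODEL_PRICES[modelId];
  if (exact) return exact;

  // 2. Longest boundary-aware prefix: the id must start with a registered
  //    key followed by '-'. Both 'gpt-4o' and 'gpt-4o-mini' are boundary
  //    prefixes of 'gpt-4o-mini-2024-07-18'; taking the longest one picks
  //    the right family instead of a shorter sibling.
  let best: string | null = null;
  for (const key of Object.keys(MODEL_PRICES)) {
    if (modelId.startsWith(key + '-') && (best === null || key.length > best.length)) {
      best = key;
    }
  }
  return best !== null ? MODEL_PRICES[best] : null;
}
```

The `+ '-'` boundary is what keeps a short key from swallowing a longer model name that merely shares its leading characters; the longest-match rule then disambiguates between nested families like gpt-4o and gpt-4o-mini.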

Graceful degradation on unknown models

If a request comes in for a model we don't have pricing for (brand-new release, fine-tuned custom model), calculateCost() returns null and the row's cost_usd is NULL. Dashboard filters this out of cost aggregates — we never estimate or fabricate. The gap is visible, not hidden.

Fix: open a PR adding the model to MODEL_PRICES at the provider's official rate. The fix applies to new requests only; existing rows with NULL cost_usd are not backfilled.
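To see why NULL costs drop out of aggregates instead of skewing them, here is a sketch of the kind of rollup the dashboard performs (the row shape mirrors the request fields described above; `totalKnownCost` is an illustrative helper, not a real export):

```ts
type RequestRow = { model: string; costUsd: number | null };

// Sum only rows with a known cost, and count the unpriced ones separately
// so the gap stays visible instead of being silently treated as $0.
function totalKnownCost(rows: RequestRow[]): { totalUsd: number; unpriced: number } {
  let totalUsd = 0;
  let unpriced = 0;
  for (const row of rows) {
    if (row.costUsd === null) unpriced += 1;
    else totalUsd += row.costUsd;
  }
  return { totalUsd, unpriced };
}
```

Treating unknown prices as NULL rather than 0 is the difference between "we can't price 3% of your traffic" and quietly under-reporting spend.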

Using it

Programmatic access

Cost is a first-class field on every request. Fetch via:

```bash
# All requests with cost, last 7 days
GET /api/v1/requests?sinceHours=168

# Aggregate cost by model
GET /api/v1/stats?sinceHours=720&groupBy=model
# → [
#     { "model": "gpt-4o", "requestCount": 1204, "totalCostUsd": 12.84 },
#     { "model": "gpt-4o-mini", "requestCount": 42103, "totalCostUsd": 6.32 }
#   ]
```

Per-project rollup

Pass projectId to scope. Useful for chargeback when multiple teams share one Spanlens org:

```bash
GET /api/v1/stats?projectId=<uuid>&sinceHours=720&groupBy=model
```
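A small client-side sketch for building these stats queries, using the parameters shown above (`statsUrl` and the base-URL argument are illustrative; only the path and query parameter names come from the endpoints documented here):

```ts
// Build a /api/v1/stats query URL. Base URL is whatever your
// Spanlens deployment answers on; it is a placeholder here.
function statsUrl(
  base: string,
  opts: { sinceHours: number; groupBy?: string; projectId?: string },
): string {
  const params = new URLSearchParams({ sinceHours: String(opts.sinceHours) });
  if (opts.groupBy) params.set('groupBy', opts.groupBy);
  if (opts.projectId) params.set('projectId', opts.projectId);
  return `${base}/api/v1/stats?${params}`;
}
```

Passing `projectId` scopes the rollup to one project, matching the chargeback pattern above; omitting it aggregates across the whole org.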

Design choices

  • Computed at ingest, not at read time. Freezes the price at the moment of the call. If OpenAI drops gpt-4o prices tomorrow, your historical cost data doesn't retroactively change. Audit-friendly.
  • Stored as numeric(14,8). 8 decimal places of precision — enough to represent fractional cents on very cheap models without rounding error.
  • Table, not API. We don't fetch live prices from provider APIs, because providers don't expose pricing endpoints. A hand-maintained table is the industry reality.

Limitations

  • Price table drifts. When a provider changes prices, our table needs a PR. Tracked as a monthly maintenance item. If you're self-hosting, pin a specific commit or expect to cherry-pick updates.
  • No separate line for cache-token pricing yet. Anthropic's cache_read_input_tokens and cache_creation_input_tokens are currently folded into prompt tokens, so cache reads are priced at the full input rate. Separate accounting that reflects the ~10× cache-read discount is on the roadmap.
  • No batch API discount. OpenAI and Anthropic both offer ~50% off batch calls. Spanlens treats a batch response as a normal request. Mark batch traffic with a custom tag and filter manually for now.

Note: this page is about per-request USD cost (how much each LLM call cost you at the provider). For Spanlens subscription billing — plan quotas, overage rates, invoicing — see Billing & quotas.

Related: Requests, Savings, /dashboard. Source: apps/server/src/lib/cost.ts.