← LLM cost tracking

o3-mini pricing

OpenAI reasoning model. $1.10 per 1M input, $4.40 per 1M output. Built for hard reasoning tasks with hidden chain-of-thought tokens.

Input
$1.10
per 1M tokens
Cached input: $0.550 per 1M
Output
$4.40
per 1M tokens
Provider
OpenAI
Context window
200K tokens
Max output
100K tokens
Released
2026-01

What o3-mini is best for

  • Math, coding, and STEM reasoning
  • Multi-step planning tasks
  • Agent steps where the LLM has to think before responding
  • Tasks where chain-of-thought visibility (developer-only) matters
  • Workloads previously handled by o1 at lower cost

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use caseIn / reqOut / reqReqs / moMonthly
Reasoning sub-step2,0003,00010,000$154.00
Code reviewer8,0004,0005,000$132.00
Hard agent planner4,0005,00020,000$528.00
Math tutor1,5003,50050,000$852.50
High-volume reasoning2,5004,000100,000$2035.00

Alternatives to o3-mini

GPT-4o

General-purpose at $2.50 input / $10 output. Better for chat, vision, and tool use. Worse at multi-step reasoning. Use as the default and route hard steps to o3-mini.

GPT-4o-mini

Cheaper general-purpose at $0.15 input / $0.60 output. Faster but not designed for reasoning. Use for routing and classification.

Claude 3.5 Sonnet

Strong reasoning and long-form writing at $3.00 input / $15 output. Higher cost but better at writing-heavy reasoning tasks.

Track o3-mini usage with Spanlens

Spanlens captures every o3-mini call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

FAQ

What is the o3-mini cost per 1M tokens?

o3-mini is priced at $1.10 per 1M input tokens and $4.40 per 1M output tokens. Cached input is $0.55 per 1M (50% off). Note that output tokens include hidden reasoning tokens, which can be significant.

Why are reasoning tokens billed as output?

OpenAI reasoning models generate an internal chain-of-thought before producing the visible answer. Those reasoning tokens are billed at the output rate even though they are not returned to your application. For a 3000-token visible response, you might be billed for 8000+ output tokens including reasoning.

When should I use o3-mini instead of GPT-4o?

For multi-step reasoning, math, coding review, and planning tasks where chain-of-thought helps. o3-mini typically outperforms GPT-4o on these benchmarks. For chat, vision, and simple instruction-following, GPT-4o is faster and often cheaper because o3-mini consumes reasoning tokens you do not see.

Is o3-mini cheaper than o1?

Yes. o1 was priced at $15 input / $60 output. o3-mini at $1.10 / $4.40 is roughly 14x cheaper on input and output while matching or exceeding o1 on most reasoning benchmarks.

Does o3-mini support function calling and structured outputs?

Yes. Function calling, parallel function calling, and structured outputs are all supported. Note that streaming TTFT is slower than GPT-4o because the model thinks first.

How do I track reasoning token cost?

Capture the usage.completion_tokens_details.reasoning_tokens field returned by the API. Spanlens stores it as a separate column so you can split visible-output vs reasoning cost per request. See /integrations/openai.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.