o3-mini pricing
OpenAI reasoning model. $1.10 per 1M input, $4.40 per 1M output. Built for hard reasoning tasks with hidden chain-of-thought tokens.
- Provider
- OpenAI
- Context window
- 200K tokens
- Max output
- 100K tokens
- Released
- 2026-01
What o3-mini is best for
- ✓Math, coding, and STEM reasoning
- ✓Multi-step planning tasks
- ✓Agent steps where the LLM has to think before responding
- ✓Tasks where chain-of-thought visibility (developer-only) matters
- ✓Workloads previously handled by o1 at lower cost
Monthly cost scenarios
Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.
| Use case | In / req | Out / req | Reqs / mo | Monthly |
|---|---|---|---|---|
| Reasoning sub-step | 2,000 | 3,000 | 10,000 | $154.00 |
| Code reviewer | 8,000 | 4,000 | 5,000 | $132.00 |
| Hard agent planner | 4,000 | 5,000 | 20,000 | $528.00 |
| Math tutor | 1,500 | 3,500 | 50,000 | $852.50 |
| High-volume reasoning | 2,500 | 4,000 | 100,000 | $2035.00 |
Alternatives to o3-mini
GPT-4o
General-purpose at $2.50 input / $10 output. Better for chat, vision, and tool use. Worse at multi-step reasoning. Use as the default and route hard steps to o3-mini.
GPT-4o-mini
Cheaper general-purpose at $0.15 input / $0.60 output. Faster but not designed for reasoning. Use for routing and classification.
Claude 3.5 Sonnet
Strong reasoning and long-form writing at $3.00 input / $15 output. Higher cost but better at writing-heavy reasoning tasks.
Track o3-mini usage with Spanlens
Spanlens captures every o3-mini call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.
FAQ
What is the o3-mini cost per 1M tokens?
o3-mini is priced at $1.10 per 1M input tokens and $4.40 per 1M output tokens. Cached input is $0.55 per 1M (50% off). Note that output tokens include hidden reasoning tokens, which can be significant.
Why are reasoning tokens billed as output?
OpenAI reasoning models generate an internal chain-of-thought before producing the visible answer. Those reasoning tokens are billed at the output rate even though they are not returned to your application. For a 3000-token visible response, you might be billed for 8000+ output tokens including reasoning.
When should I use o3-mini instead of GPT-4o?
For multi-step reasoning, math, coding review, and planning tasks where chain-of-thought helps. o3-mini typically outperforms GPT-4o on these benchmarks. For chat, vision, and simple instruction-following, GPT-4o is faster and often cheaper because o3-mini consumes reasoning tokens you do not see.
Is o3-mini cheaper than o1?
Yes. o1 was priced at $15 input / $60 output. o3-mini at $1.10 / $4.40 is roughly 14x cheaper on input and output while matching or exceeding o1 on most reasoning benchmarks.
Does o3-mini support function calling and structured outputs?
Yes. Function calling, parallel function calling, and structured outputs are all supported. Note that streaming TTFT is slower than GPT-4o because the model thinks first.
How do I track reasoning token cost?
Capture the usage.completion_tokens_details.reasoning_tokens field returned by the API. Spanlens stores it as a separate column so you can split visible-output vs reasoning cost per request. See /integrations/openai.
Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.