o3-mini pricing

o3-mini is an OpenAI reasoning model priced at $1.10 per 1M input tokens and $4.40 per 1M output tokens. It is built for hard reasoning tasks and generates hidden chain-of-thought tokens that are billed as output.

Input: $1.10 per 1M tokens
Cached input: $0.55 per 1M tokens
Output: $4.40 per 1M tokens

Provider: OpenAI
Context window: 200K tokens
Max output: 100K tokens
Released: 2025-01

What o3-mini is best for

✓ Math, coding, and STEM reasoning
✓ Multi-step planning tasks
✓ Agent steps where the LLM has to think before responding
✓ Tasks that benefit from chain-of-thought reasoning (the reasoning itself stays hidden; only its token count is reported to developers)
✓ Workloads previously handled by o1, at lower cost

Monthly cost scenarios

Real-world estimates at common usage levels. The figures assume no caching, no batching, and the standard-tier price. Output tokens per request are billed output, so they include hidden reasoning tokens.

Use case	Input tokens / req	Output tokens / req	Requests / mo	Monthly cost
Reasoning sub-step	2,000	3,000	10,000	$154.00
Code reviewer	8,000	4,000	5,000	$132.00
Hard agent planner	4,000	5,000	20,000	$528.00
Math tutor	1,500	3,500	50,000	$852.50
High-volume reasoning	2,500	4,000	100,000	$2,035.00
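
The monthly figures above follow directly from the per-token prices. Here is a minimal Python sketch of that arithmetic, assuming the standard-tier prices listed on this page and no caching or batching. The function and constant names are illustrative and not part of any SDK.

```python
# Standard-tier o3-mini prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.10
OUTPUT_PRICE_PER_M = 4.40


def monthly_cost(input_tokens: int, output_tokens: int, requests: int) -> float:
    """Estimate monthly spend in USD for a fixed per-request token profile.

    output_tokens is the billed output per request, so it should include
    hidden reasoning tokens as well as the visible answer.
    """
    input_cost = input_tokens * requests / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens * requests / 1_000_000 * OUTPUT_PRICE_PER_M
    return round(input_cost + output_cost, 2)


# (input tokens / req, output tokens / req, requests / month) from the table.
scenarios = {
    "Reasoning sub-step": (2_000, 3_000, 10_000),
    "Code reviewer": (8_000, 4_000, 5_000),
    "Hard agent planner": (4_000, 5_000, 20_000),
    "Math tutor": (1_500, 3_500, 50_000),
    "High-volume reasoning": (2_500, 4_000, 100_000),
}

for name, (tokens_in, tokens_out, reqs) in scenarios.items():
    print(f"{name}: ${monthly_cost(tokens_in, tokens_out, reqs):,.2f}")
```

To estimate your own workload, replace the tuples with token counts measured from real traffic rather than guesses, since reasoning tokens vary widely between prompts.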

Alternatives to o3-mini

GPT-4o

General-purpose model at $2.50 input / $10.00 output per 1M tokens. Better for chat, vision, and tool use, but weaker at multi-step reasoning. Use it as the default and route hard steps to o3-mini.

GPT-4o-mini

Cheaper general-purpose model at $0.15 input / $0.60 output per 1M tokens. Faster, but not designed for reasoning. Use it for routing and classification.

Claude 3.5 Sonnet

Strong reasoning and long-form writing at $3.00 input / $15.00 output per 1M tokens. Higher cost, but better at writing-heavy reasoning tasks.
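
To compare these options on price alone, the per-request cost can be computed from the rates quoted above. The sketch below assumes those rates are all per 1M tokens. The dictionary keys are illustrative labels, not API model IDs, and the token counts are made-up inputs.

```python
# Prices quoted on this page, USD per 1M tokens: (input, output).
PRICES = {
    "o3-mini": (1.10, 4.40),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, billed_output_tokens: int) -> float:
    """Cost in USD of one request at the listed prices.

    billed_output_tokens must include hidden reasoning tokens for o3-mini.
    """
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + billed_output_tokens * out_price) / 1_000_000


# Same token counts for every model (hypothetical workload).
for model in PRICES:
    print(f"{model}: ${request_cost(model, 4_000, 1_000):.4f}")
```

Equal token counts flatter o3-mini: for the same visible answer it is also billed for reasoning tokens, so its real output count is usually higher than a non-reasoning model's.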

Track o3-mini usage with Spanlens

Spanlens captures every o3-mini call with input and output tokens, exact cost, latency, and the full request body. Setup takes one line of code or a baseURL swap. Spanlens is open source (MIT licensed) and self-hostable.


FAQ

What is the o3-mini cost per 1M tokens?

o3-mini is priced at $1.10 per 1M input tokens and $4.40 per 1M output tokens. Cached input is $0.55 per 1M (50% off). Note that output tokens include hidden reasoning tokens, which can be significant.
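
As a rough illustration of how the cached rate changes input cost, the sketch below prices a prompt where part of the input is served from cache. It assumes cached tokens are billed at $0.55 per 1M and the rest at $1.10 per 1M, as stated above. The function name and the token counts are hypothetical.

```python
INPUT_PRICE_PER_M = 1.10         # USD per 1M uncached input tokens
CACHED_INPUT_PRICE_PER_M = 0.55  # USD per 1M cached input tokens


def input_cost(input_tokens: int, cached_tokens: int = 0) -> float:
    """Input cost in USD for one request, given how many tokens hit the cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * INPUT_PRICE_PER_M
             + cached_tokens * CACHED_INPUT_PRICE_PER_M)
    return total / 1_000_000


# Hypothetical 8,000-token prompt where 6,000 tokens are a cached prefix.
print(f"no cache:   ${input_cost(8_000):.4f}")
print(f"with cache: ${input_cost(8_000, cached_tokens=6_000):.4f}")
```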

Why are reasoning tokens billed as output?

OpenAI reasoning models generate an internal chain-of-thought before producing the visible answer. Those reasoning tokens are billed at the output rate even though they are not returned to your application. For a 3,000-token visible response, you might be billed for 8,000 or more output tokens once reasoning is included.
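
To make the overhead concrete, the sketch below prices the example above: 3,000 visible tokens plus an assumed 5,000 reasoning tokens, for 8,000 billed output tokens. The 5,000 figure is illustrative only, since actual reasoning usage varies per request. The function names are hypothetical.

```python
OUTPUT_PRICE_PER_M = 4.40  # USD per 1M output tokens, from this page


def billed_output_cost(visible_tokens: int, reasoning_tokens: int) -> float:
    """Cost in USD of one response's output, reasoning tokens included."""
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * OUTPUT_PRICE_PER_M / 1_000_000


def effective_price_per_m_visible(visible_tokens: int, reasoning_tokens: int) -> float:
    """Effective USD price per 1M visible tokens once reasoning is counted."""
    cost = billed_output_cost(visible_tokens, reasoning_tokens)
    return cost / visible_tokens * 1_000_000


visible, reasoning = 3_000, 5_000  # illustrative split, not a measured value
print(f"visible only:    ${billed_output_cost(visible, 0):.4f}")
print(f"with reasoning:  ${billed_output_cost(visible, reasoning):.4f}")
print(f"effective price: ${effective_price_per_m_visible(visible, reasoning):.2f} per 1M visible tokens")
```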

When should I use o3-mini instead of GPT-4o?

Use o3-mini for multi-step reasoning, math, code review, and planning tasks where chain-of-thought helps. It typically outperforms GPT-4o on benchmarks for these tasks. For chat, vision, and simple instruction-following, GPT-4o is faster and often cheaper, because o3-mini bills you for reasoning tokens you do not see.
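
One way to reason about "often cheaper" is to find the number of reasoning tokens at which an o3-mini request costs the same as a GPT-4o request. The sketch below assumes both models receive the same input and produce the same visible output length, and that GPT-4o bills no reasoning tokens. It uses the per-1M prices quoted on this page, and the function name is hypothetical.

```python
# USD per 1M tokens, as quoted on this page.
O3_MINI_IN, O3_MINI_OUT = 1.10, 4.40
GPT_4O_IN, GPT_4O_OUT = 2.50, 10.00


def break_even_reasoning_tokens(input_tokens: int, visible_output_tokens: int) -> float:
    """Reasoning tokens at which o3-mini and GPT-4o cost the same per request.

    Below this count o3-mini is cheaper, above it GPT-4o is cheaper.
    """
    gpt_4o_cost = input_tokens * GPT_4O_IN + visible_output_tokens * GPT_4O_OUT
    o3_mini_visible_cost = (input_tokens * O3_MINI_IN
                            + visible_output_tokens * O3_MINI_OUT)
    # Whatever budget is left over can be spent on reasoning tokens.
    return (gpt_4o_cost - o3_mini_visible_cost) / O3_MINI_OUT


# Hypothetical request: 2,000 input tokens, 500 visible output tokens.
print(round(break_even_reasoning_tokens(2_000, 500)))
```

For short answers the break-even point is low, which is why simple chat-style requests tend to favor GPT-4o on cost.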

Is o3-mini cheaper than o1?

Yes. o1 was priced at $15 input / $60 output per 1M tokens. o3-mini at $1.10 / $4.40 is roughly 14x cheaper on both input and output (15 / 1.10 ≈ 13.6 and 60 / 4.40 ≈ 13.6) while matching or exceeding o1 on most reasoning benchmarks.

Does o3-mini support function calling and structured outputs?

Yes. Function calling, parallel function calling, and structured outputs are all supported. Note that time to first token (TTFT) when streaming is slower than with GPT-4o because the model thinks before it starts responding.

How do I track reasoning token cost?

Capture the usage.completion_tokens_details.reasoning_tokens field returned by the API. Spanlens stores it as a separate column so you can split visible-output vs reasoning cost per request. See /integrations/openai.
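
A minimal sketch of that split, working on the usage object as a plain dictionary so it runs without an API call. It assumes completion_tokens already includes the reasoning tokens, and that the output price is the $4.40 per 1M listed on this page. The function name and sample values are hypothetical.

```python
OUTPUT_PRICE_PER_M = 4.40  # USD per 1M output tokens, from this page


def split_output_cost(usage: dict) -> dict:
    """Split billed output into visible and reasoning tokens and costs (USD).

    Assumes usage["completion_tokens"] already counts reasoning tokens.
    """
    completion_tokens = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning_tokens = details.get("reasoning_tokens", 0)
    visible_tokens = completion_tokens - reasoning_tokens
    return {
        "visible_tokens": visible_tokens,
        "reasoning_tokens": reasoning_tokens,
        "visible_cost": visible_tokens * OUTPUT_PRICE_PER_M / 1_000_000,
        "reasoning_cost": reasoning_tokens * OUTPUT_PRICE_PER_M / 1_000_000,
    }


# Sample usage payload with made-up numbers.
sample_usage = {
    "prompt_tokens": 2_000,
    "completion_tokens": 8_000,
    "completion_tokens_details": {"reasoning_tokens": 5_000},
}
print(split_output_cost(sample_usage))
```

Falling back to zero when completion_tokens_details is missing keeps the same function usable for non-reasoning models.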

Last updated 2026-06-16. Prices in USD at the standard tier.