GPT-4o-mini pricing

OpenAI's small, fast, cheap general-purpose model. $0.15 per 1M input, $0.60 per 1M output. About 15x cheaper than GPT-4o.

Input

$0.150

per 1M tokens

Cached input: $0.075 per 1M

Output

$0.600

per 1M tokens

Provider: OpenAI
Context window: 128K tokens
Max output: 16K tokens
Released: 2024-07

What GPT-4o-mini is best for

✓Intent classification and routing
✓Structured output extraction
✓Short-form generation (one-paragraph replies, summaries)
✓Agent sub-tasks that do not need frontier reasoning
✓High-volume APIs where cost is the primary constraint

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use case	In / req	Out / req	Reqs / mo	Monthly
Intent classifier	300	20	500,000	$28.50
Short chat replies	800	200	100,000	$24.00
Structured extraction	2,000	400	200,000	$108.00
Agent sub-step	1,200	300	1,000,000	$360.00
Very high volume	500	100	5,000,000	$675.00

Alternatives to GPT-4o-mini

GPT-4o

Frontier quality at 15x the cost ($2.50 input / $10 output). Use when GPT-4o-mini fails on quality eval but only for the steps that need it.

Claude 3.5 Haiku

Anthropic small model at $0.80 input / $4.00 output. Roughly 5x more expensive than GPT-4o-mini on input, often higher quality on long-form writing.

Gemini 2.0 Flash

Competitively priced ($0.10 input / $0.40 output). Slightly cheaper. Multimodal support is a plus over GPT-4o-mini for image inputs.

Track GPT-4o-mini usage with Spanlens

Spanlens captures every GPT-4o-mini call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

Start free →OpenAI integration guide Official pricing ↗

FAQ

What is the GPT-4o-mini cost per 1M tokens in 2026?

GPT-4o-mini is priced at $0.15 per 1M input tokens and $0.60 per 1M output tokens. Cached input is $0.075 per 1M (50% off). It is roughly 15x cheaper than GPT-4o on input and 16x cheaper on output.

Is GPT-4o-mini good enough to replace GPT-4o?

For narrow tasks (classification, extraction, short replies) yes — almost always at no measurable quality regression. For complex multi-turn reasoning, long-form synthesis, or vision tasks requiring frontier quality, no. Run an eval on your real workload before swapping.

Does GPT-4o-mini support function calling and JSON mode?

Yes. Both function calling and the structured outputs / JSON mode APIs work with GPT-4o-mini at the same surface as GPT-4o. Tool use is reliable enough for most agent workflows.

What is the GPT-4o-mini context window?

128K tokens of input, 16K maximum output. Same context envelope as GPT-4o. The mini in the name refers to model size, not context.

How fast is GPT-4o-mini?

Time-to-first-token is typically 200-400ms (faster than GPT-4o). Throughput in the streaming phase is also higher. For latency-sensitive workloads, the speed bump alongside the cost reduction is often the bigger win.

How do I track GPT-4o-mini spend per customer?

Tag each request with X-Spanlens-User: <customer_id>. Spanlens aggregates per-customer cost in /users so you can bill, alert on outliers, or detect runaway loops. See /integrations/openai.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.