GPT-4o-mini pricing
OpenAI's small, fast, cheap general-purpose model. $0.15 per 1M input, $0.60 per 1M output. About 15x cheaper than GPT-4o.
- Provider
- OpenAI
- Context window
- 128K tokens
- Max output
- 16K tokens
- Released
- 2024-07
What GPT-4o-mini is best for
- ✓Intent classification and routing
- ✓Structured output extraction
- ✓Short-form generation (one-paragraph replies, summaries)
- ✓Agent sub-tasks that do not need frontier reasoning
- ✓High-volume APIs where cost is the primary constraint
Monthly cost scenarios
Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.
| Use case | In / req | Out / req | Reqs / mo | Monthly |
|---|---|---|---|---|
| Intent classifier | 300 | 20 | 500,000 | $28.50 |
| Short chat replies | 800 | 200 | 100,000 | $24.00 |
| Structured extraction | 2,000 | 400 | 200,000 | $108.00 |
| Agent sub-step | 1,200 | 300 | 1,000,000 | $360.00 |
| Very high volume | 500 | 100 | 5,000,000 | $675.00 |
Alternatives to GPT-4o-mini
GPT-4o
Frontier quality at 15x the cost ($2.50 input / $10 output). Use when GPT-4o-mini fails on quality eval but only for the steps that need it.
Claude 3.5 Haiku
Anthropic small model at $0.80 input / $4.00 output. Roughly 5x more expensive than GPT-4o-mini on input, often higher quality on long-form writing.
Gemini 2.0 Flash
Competitively priced ($0.10 input / $0.40 output). Slightly cheaper. Multimodal support is a plus over GPT-4o-mini for image inputs.
Track GPT-4o-mini usage with Spanlens
Spanlens captures every GPT-4o-mini call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.
FAQ
What is the GPT-4o-mini cost per 1M tokens in 2026?
GPT-4o-mini is priced at $0.15 per 1M input tokens and $0.60 per 1M output tokens. Cached input is $0.075 per 1M (50% off). It is roughly 15x cheaper than GPT-4o on input and 16x cheaper on output.
Is GPT-4o-mini good enough to replace GPT-4o?
For narrow tasks (classification, extraction, short replies) yes — almost always at no measurable quality regression. For complex multi-turn reasoning, long-form synthesis, or vision tasks requiring frontier quality, no. Run an eval on your real workload before swapping.
Does GPT-4o-mini support function calling and JSON mode?
Yes. Both function calling and the structured outputs / JSON mode APIs work with GPT-4o-mini at the same surface as GPT-4o. Tool use is reliable enough for most agent workflows.
What is the GPT-4o-mini context window?
128K tokens of input, 16K maximum output. Same context envelope as GPT-4o. The mini in the name refers to model size, not context.
How fast is GPT-4o-mini?
Time-to-first-token is typically 200-400ms (faster than GPT-4o). Throughput in the streaming phase is also higher. For latency-sensitive workloads, the speed bump alongside the cost reduction is often the bigger win.
How do I track GPT-4o-mini spend per customer?
Tag each request with X-Spanlens-User: <customer_id>. Spanlens aggregates per-customer cost in /users so you can bill, alert on outliers, or detect runaway loops. See /integrations/openai.
Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.