GPT-4o pricing
OpenAI's frontier general-purpose model. $2.50 per 1M input tokens, $10 per 1M output tokens, with cached input billed at half price.
- Provider: OpenAI
- Context window: 128K tokens
- Max output: 16K tokens
- Released: 2024-05, latest revision 2024-08
What GPT-4o is best for
- ✓ Multimodal chat (text, image, audio)
- ✓ Complex multi-turn agents and tool use
- ✓ Vision tasks requiring frontier quality
- ✓ Production chat with strict latency requirements
- ✓ Replacing GPT-4-turbo (cheaper and faster)
Monthly cost scenarios
Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.
| Use case | Input tokens / req | Output tokens / req | Requests / mo | Monthly cost (USD) |
|---|---|---|---|---|
| Casual chatbot (small) | 500 | 300 | 5,000 | $21.25 |
| Production assistant (medium) | 1,500 | 800 | 50,000 | $587.50 |
| Long-context summarizer | 8,000 | 1,200 | 20,000 | $640.00 |
| RAG with system prompt | 3,000 | 500 | 100,000 | $1,250.00 |
| High-volume API (large) | 1,200 | 600 | 1,000,000 | $9,000.00 |
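The monthly figures in the table follow from simple per-token arithmetic. Here is a minimal sketch in Python, assuming the standard-tier prices quoted on this page; the function name `monthly_cost` is illustrative and not part of any SDK.

```python
# Standard-tier GPT-4o prices quoted on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def monthly_cost(in_tokens: int, out_tokens: int, requests: int) -> float:
    """Estimate monthly spend in USD, assuming no caching and no batching."""
    input_cost = in_tokens * requests / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = out_tokens * requests / 1_000_000 * OUTPUT_PRICE_PER_M
    return round(input_cost + output_cost, 2)


# Two rows from the table: (input tokens, output tokens, requests per month).
print(monthly_cost(500, 300, 5_000))      # casual chatbot
print(monthly_cost(1_500, 800, 50_000))   # production assistant
```

Substitute your own average token counts and request volume to get an estimate for a workload that is not listed.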
Alternatives to GPT-4o
GPT-4o-mini
About 17x cheaper ($0.15 input / $0.60 output). Same multimodal surface. Use for classification, routing, and any narrow task where frontier quality is not required.
Claude 3.5 Sonnet
Comparable quality at $3.00 input / $15.00 output, which is 20% more on input and 50% more on output. Stronger at long-form writing and complex reasoning. Prompt caching discounts cached reads more deeply than OpenAI's, though cache writes are billed at a premium.
Gemini 2.0 Flash
About 25x cheaper ($0.10 input / $0.40 output) with multimodal support. Use when cost matters more than absolute quality on long-form reasoning.
o3-mini
OpenAI reasoning model ($1.10 input / $4.40 output). Cheaper per token than GPT-4o but optimized for reasoning workloads: slower time to first token (TTFT) and no vision.
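To see how these list prices translate into monthly spend, the comparison can be sketched as below. This is an illustration only: the prices are the ones quoted on this page, the model keys are labels chosen for this example rather than official API identifiers, and the same token counts are assumed for every model even though tokenizers and output lengths differ in practice.

```python
# Prices quoted on this page, USD per 1M tokens: (input, output).
# Keys are illustrative labels, not guaranteed API model identifiers.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "o3-mini": (1.10, 4.40),
}


def compare(in_tokens: int, out_tokens: int, requests: int) -> dict:
    """Return {model: monthly USD}, sorted from cheapest to most expensive."""
    costs = {}
    for model, (in_price, out_price) in PRICES.items():
        per_request = in_tokens * in_price + out_tokens * out_price
        costs[model] = round(requests * per_request / 1_000_000, 2)
    return dict(sorted(costs.items(), key=lambda kv: kv[1]))


# Hypothetical workload: 1,500 input and 800 output tokens, 50,000 requests.
for model, usd in compare(1_500, 800, 50_000).items():
    print(f"{model:>18}  ${usd:,.2f}")
```

Price is only one axis. A cheaper model that needs retries or longer prompts to reach the same quality can end up costing more, so treat the output as a starting point.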
Track GPT-4o usage with Spanlens
Spanlens captures every GPT-4o call with input and output tokens, exact cost, latency, and full request body. Setup is one line of code or a baseURL swap. Open source (MIT licensed) and self-hostable.
FAQ
What is the GPT-4o cost per 1M tokens in 2026?
GPT-4o is priced at $2.50 per 1M input tokens and $10 per 1M output tokens at the standard tier. Cached input is $1.25 per 1M (50% off). Batch API and provisioned throughput tiers have separate pricing.
How does GPT-4o pricing compare to GPT-4-turbo?
GPT-4o is cheaper. GPT-4-turbo was $10 input / $30 output. GPT-4o at $2.50 input / $10 output is 4x cheaper on input and 3x cheaper on output, with comparable or better quality.
Does GPT-4o support prompt caching?
Yes. OpenAI automatically caches repeated prompt prefixes once a prompt reaches 1,024 tokens. Cached input tokens are billed at 50%: $1.25 per 1M instead of $2.50. Caching needs no configuration; the number of cached tokens is reported in the response under usage.prompt_tokens_details.cached_tokens.
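The effect of caching on a single request can be worked out from the usage data. Here is a minimal sketch, assuming a usage dictionary shaped like the Chat Completions `usage` object, where `cached_tokens` is a subset of `prompt_tokens`. The helper name `request_cost` and the sample numbers are hypothetical.

```python
# Prices quoted on this page, USD per 1M tokens.
INPUT_PRICE_PER_M = 2.50
CACHED_INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def request_cost(usage: dict) -> float:
    """Cost in USD of one call, billing cached prompt tokens at the cached rate."""
    prompt = usage["prompt_tokens"]
    # The details object may be absent or None; treat that as zero cached tokens.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    uncached = prompt - cached
    total = (
        uncached * INPUT_PRICE_PER_M
        + cached * CACHED_INPUT_PRICE_PER_M
        + usage["completion_tokens"] * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000


# Hypothetical usage payload: 2,048 of 3,000 prompt tokens served from cache.
usage = {
    "prompt_tokens": 3000,
    "completion_tokens": 500,
    "prompt_tokens_details": {"cached_tokens": 2048},
}
print(f"${request_cost(usage):.5f}")
```

Because output tokens are never discounted, the saving is largest for workloads with long, stable prefixes and short responses.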
What is the GPT-4o context window?
128K tokens, shared between the prompt and the response, with a maximum of 16K output tokens. The full window is usable in a single request, subject to the tokens-per-minute rate limit on your account.
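A pre-flight check can catch requests that would not fit before they are sent. The sketch below assumes the prompt and the reserved output share one 128K window and uses the rounded figures from this page; the exact limits may differ slightly, and the function name `fits_context` is hypothetical.

```python
# Rounded limits as quoted on this page; exact values are an assumption.
CONTEXT_WINDOW = 128_000
MAX_OUTPUT = 16_000


def fits_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output fits the window."""
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_context(100_000, 16_000))  # leaves room for the full output budget
print(fits_context(120_000, 16_000))  # prompt too long for that output budget
```

Count prompt tokens with the same tokenizer the model uses; character-based estimates can be off by a wide margin for code or non-English text.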
Can I run GPT-4o on Azure?
Yes. Azure OpenAI offers GPT-4o at the same per-token pricing in most regions. Azure adds availability SLAs and data-residency options; Spanlens works with both standard OpenAI and Azure endpoints.
How do I monitor GPT-4o costs in production?
Capture every call with model, input and output tokens, cost in USD, and prompt version. Aggregate by customer, endpoint, and prompt to find the source of cost spikes. Spanlens does this in one line of code; see /integrations/openai.
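The aggregation step can be sketched in a few lines of Python. This is a generic illustration, not Spanlens's API: the record fields, customer names, and token counts below are all hypothetical, and the prices are the standard-tier figures from this page.

```python
from collections import defaultdict

# Standard-tier GPT-4o prices quoted on this page, USD per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call, ignoring caching and batching discounts."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def aggregate(calls: list, key: str) -> dict:
    """Sum cost per value of `key`, most expensive group first."""
    totals = defaultdict(float)
    for call in calls:
        totals[call[key]] += cost_usd(call["input_tokens"], call["output_tokens"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical call log; in production these records come from your tracing layer.
calls = [
    {"customer": "acme", "endpoint": "/chat", "prompt_version": "v3",
     "input_tokens": 1500, "output_tokens": 800},
    {"customer": "acme", "endpoint": "/summarize", "prompt_version": "v1",
     "input_tokens": 8000, "output_tokens": 1200},
    {"customer": "globex", "endpoint": "/chat", "prompt_version": "v3",
     "input_tokens": 500, "output_tokens": 300},
]

print(aggregate(calls, "customer"))
print(aggregate(calls, "endpoint"))
```

Grouping the same log by `prompt_version` shows whether a prompt change is what moved the bill.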
Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.