GPT-4o pricing

OpenAI's frontier general-purpose model. $2.50 per 1M input, $10 per 1M output, plus prompt caching at half price.

Input

$2.50

per 1M tokens

Cached input: $1.25 per 1M

Output

$10.00

per 1M tokens

Provider: OpenAI
Context window: 128K tokens
Max output: 16K tokens
Released: 2024-05, latest revision 2024-08

What GPT-4o is best for

✓Multimodal chat (text, image, audio)
✓Complex multi-turn agents and tool use
✓Vision tasks requiring frontier quality
✓Production chat with strict latency requirements
✓Replacing GPT-4-turbo (cheaper and faster)

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use case	In / req	Out / req	Reqs / mo	Monthly
Casual chatbot (small)	500	300	5,000	$21.25
Production assistant (medium)	1,500	800	50,000	$587.50
Long-context summarizer	8,000	1,200	20,000	$640.00
RAG with system prompt	3,000	500	100,000	$1250.00
High-volume API (large)	1,200	600	1,000,000	$9000.00

Alternatives to GPT-4o

GPT-4o-mini

About 15x cheaper ($0.15 input / $0.60 output). Same multimodal surface. Use for classification, routing, and any narrow task where frontier quality is not required.

Claude 3.5 Sonnet

Comparable quality, $3.00 input / $15.00 output. Stronger at long-form writing and complex reasoning, slightly more expensive on output. Prompt caching is more generous than OpenAI.

Gemini 2.0 Flash

Significantly cheaper ($0.10 input / $0.40 output) with multimodal support. Use when cost matters more than absolute quality on long-form reasoning.

o3-mini

OpenAI reasoning model ($1.10 input / $4.40 output). Cheaper than GPT-4o but optimized for reasoning workloads — slower TTFT, no vision.

Track GPT-4o usage with Spanlens

Spanlens captures every GPT-4o call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

Start free →OpenAI integration guide Official pricing ↗

FAQ

What is the GPT-4o cost per 1M tokens in 2026?

GPT-4o is priced at $2.50 per 1M input tokens and $10 per 1M output tokens at the standard tier. Cached input is $1.25 per 1M (50% off). Batch API and provisioned throughput tiers have separate pricing.

How does GPT-4o pricing compare to GPT-4-turbo?

GPT-4o is cheaper. GPT-4-turbo was $10 input / $30 output. GPT-4o at $2.50 input / $10 output is roughly 3-4x cheaper across the board with comparable or better quality.

Does GPT-4o support prompt caching?

Yes. OpenAI automatically caches identical prefixes longer than 1024 tokens. Cached input tokens are billed at 50% — $1.25 per 1M instead of $2.50. Caching is automatic with no configuration needed; you can see cached_tokens in the usage response field.

What is the GPT-4o context window?

128K tokens for the input, with 16K maximum output tokens. The full 128K is usable in a single request; OpenAI does not throttle context usage.

Can I run GPT-4o on Azure?

Yes. Azure OpenAI offers GPT-4o at the same per-token pricing under most regions. Azure adds availability SLAs and data-residency options; Spanlens works with both standard OpenAI and Azure endpoints.

How do I monitor GPT-4o costs in production?

Capture every call with model, input + output tokens, cost USD, and prompt version. Aggregate by customer, endpoint, and prompt to find the source of cost spikes. Spanlens does this in one line of code — see /integrations/openai.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.