← LLM cost tracking

GPT-4o pricing

OpenAI's frontier general-purpose model. $2.50 per 1M input, $10 per 1M output, plus prompt caching at half price.

Input
$2.50
per 1M tokens
Cached input: $1.25 per 1M
Output
$10.00
per 1M tokens
Provider
OpenAI
Context window
128K tokens
Max output
16K tokens
Released
2024-05, latest revision 2024-08

What GPT-4o is best for

  • Multimodal chat (text, image, audio)
  • Complex multi-turn agents and tool use
  • Vision tasks requiring frontier quality
  • Production chat with strict latency requirements
  • Replacing GPT-4-turbo (cheaper and faster)

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use caseIn / reqOut / reqReqs / moMonthly
Casual chatbot (small)5003005,000$21.25
Production assistant (medium)1,50080050,000$587.50
Long-context summarizer8,0001,20020,000$640.00
RAG with system prompt3,000500100,000$1250.00
High-volume API (large)1,2006001,000,000$9000.00

Alternatives to GPT-4o

GPT-4o-mini

About 15x cheaper ($0.15 input / $0.60 output). Same multimodal surface. Use for classification, routing, and any narrow task where frontier quality is not required.

Claude 3.5 Sonnet

Comparable quality, $3.00 input / $15.00 output. Stronger at long-form writing and complex reasoning, slightly more expensive on output. Prompt caching is more generous than OpenAI.

Gemini 2.0 Flash

Significantly cheaper ($0.10 input / $0.40 output) with multimodal support. Use when cost matters more than absolute quality on long-form reasoning.

o3-mini

OpenAI reasoning model ($1.10 input / $4.40 output). Cheaper than GPT-4o but optimized for reasoning workloads — slower TTFT, no vision.

Track GPT-4o usage with Spanlens

Spanlens captures every GPT-4o call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

FAQ

What is the GPT-4o cost per 1M tokens in 2026?

GPT-4o is priced at $2.50 per 1M input tokens and $10 per 1M output tokens at the standard tier. Cached input is $1.25 per 1M (50% off). Batch API and provisioned throughput tiers have separate pricing.

How does GPT-4o pricing compare to GPT-4-turbo?

GPT-4o is cheaper. GPT-4-turbo was $10 input / $30 output. GPT-4o at $2.50 input / $10 output is roughly 3-4x cheaper across the board with comparable or better quality.

Does GPT-4o support prompt caching?

Yes. OpenAI automatically caches identical prefixes longer than 1024 tokens. Cached input tokens are billed at 50% — $1.25 per 1M instead of $2.50. Caching is automatic with no configuration needed; you can see cached_tokens in the usage response field.

What is the GPT-4o context window?

128K tokens for the input, with 16K maximum output tokens. The full 128K is usable in a single request; OpenAI does not throttle context usage.

Can I run GPT-4o on Azure?

Yes. Azure OpenAI offers GPT-4o at the same per-token pricing under most regions. Azure adds availability SLAs and data-residency options; Spanlens works with both standard OpenAI and Azure endpoints.

How do I monitor GPT-4o costs in production?

Capture every call with model, input + output tokens, cost USD, and prompt version. Aggregate by customer, endpoint, and prompt to find the source of cost spikes. Spanlens does this in one line of code — see /integrations/openai.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.