Claude 3.5 Sonnet pricing
Anthropic's flagship working model. $3.00 per 1M input, $15.00 per 1M output, with cache reads at 10% of input price.
- Provider
- Anthropic
- Context window
- 200K tokens
- Max output
- 8K tokens
- Released
- 2024-06, refreshed 2024-10
What Claude 3.5 Sonnet is best for
- ✓Complex multi-turn agents and tool use
- ✓Long-form writing and editing
- ✓Code generation and review
- ✓Long-context document QA (up to 200K tokens)
- ✓Workflows that benefit from aggressive prompt caching
Monthly cost scenarios
Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.
| Use case | In / req | Out / req | Reqs / mo | Monthly |
|---|---|---|---|---|
| Coding assistant | 4,000 | 1,500 | 20,000 | $690.00 |
| Document QA | 50,000 | 800 | 5,000 | $810.00 |
| Production agent | 2,500 | 1,000 | 100,000 | $2250.00 |
| Long-form writing | 1,500 | 3,000 | 10,000 | $495.00 |
| Long-context summarizer | 80,000 | 1,200 | 3,000 | $774.00 |
Alternatives to Claude 3.5 Sonnet
Claude 3.5 Haiku
Smaller Anthropic model at $0.80 input / $4.00 output. About 4x cheaper. Use for high-volume narrow tasks where Sonnet quality is overkill.
GPT-4o
OpenAI frontier at $2.50 input / $10 output. Slightly cheaper than Sonnet, comparable quality. Pick based on which produces better outputs on your eval.
Claude Opus 4
Anthropic flagship at $15 input / $75 output (where available). 5x the cost of Sonnet. Use only for the hardest reasoning steps; route the rest to Sonnet.
Track Claude 3.5 Sonnet usage with Spanlens
Spanlens captures every Claude 3.5 Sonnet call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.
FAQ
What is the Claude 3.5 Sonnet cost per 1M tokens in 2026?
Claude 3.5 Sonnet is priced at $3.00 per 1M input tokens and $15.00 per 1M output tokens. Cache reads are $0.30 per 1M (10% of input) and cache writes are $3.75 per 1M (25% premium over input).
How does Anthropic prompt caching pricing work?
Anthropic charges a one-time cache write at $3.75 per 1M tokens (input + 25%), then 10% of base input price on every cache read until the cache expires (default 5 minutes). For a shared system prompt across many requests, the break-even point is roughly 2-3 cache hits.
What is the Claude 3.5 Sonnet context window?
200K tokens of input, 8K maximum output. The 200K context window is one of the largest among production models — particularly useful for document-grounded tasks.
How does Claude 3.5 Sonnet compare to GPT-4o on cost?
GPT-4o is slightly cheaper ($2.50 input / $10 output vs Anthropic's $3.00 / $15). However, Anthropic's prompt caching at 10% of input makes Sonnet cheaper than GPT-4o for any workload with shared context (e.g. RAG with stable system prompts).
Can I run Claude 3.5 Sonnet on AWS Bedrock?
Yes. Bedrock offers Claude 3.5 Sonnet at the same per-token pricing under most regions. Bedrock authentication uses SigV4 with your AWS credentials. Spanlens works with both direct Anthropic and Bedrock endpoints.
How do I monitor Claude 3.5 Sonnet usage in production?
Capture every Messages API call with input + output + cache token breakdown, latency, model variant, and cost. Aggregate by prompt version to catch caching regressions. Spanlens does this in one line — see /integrations/anthropic.
Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.