
Claude 3.5 Sonnet pricing

Anthropic's general-purpose working model, priced at $3.00 per 1M input tokens and $15.00 per 1M output tokens, with cache reads at 10% of the input price.

Input: $3.00 per 1M tokens
Cached input: $0.30 per 1M tokens
Cache write: $3.75 per 1M tokens
Output: $15.00 per 1M tokens
Provider: Anthropic
Context window: 200K tokens
Max output: 8K tokens
Released: 2024-06, refreshed 2024-10

What Claude 3.5 Sonnet is best for

  • Complex multi-turn agents and tool use
  • Long-form writing and editing
  • Code generation and review
  • Long-context document QA (up to 200K tokens)
  • Workflows that benefit from aggressive prompt caching

Monthly cost scenarios

Illustrative estimates at common usage levels. In / req and Out / req are tokens per request. Numbers assume no caching, no batching, and the standard tier price.

Use case                  In / req   Out / req   Reqs / mo     Monthly
Coding assistant             4,000       1,500      20,000     $690.00
Document QA                 50,000         800       5,000     $810.00
Production agent             2,500       1,000     100,000   $2,250.00
Long-form writing            1,500       3,000      10,000     $495.00
Long-context summarizer     80,000       1,200       3,000     $774.00
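
The monthly figures follow from the two list prices alone. A minimal sketch of that arithmetic, assuming the standard tier prices quoted on this page and no caching or batching (the function and variable names are illustrative, not part of any SDK):

```python
# Prices quoted on this page, in USD per 1M tokens (standard tier).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def monthly_cost(input_tokens_per_req, output_tokens_per_req, requests_per_month):
    """Estimated monthly cost in USD for one workload, with no caching or batching."""
    input_millions = input_tokens_per_req * requests_per_month / 1_000_000
    output_millions = output_tokens_per_req * requests_per_month / 1_000_000
    total = input_millions * INPUT_PRICE_PER_M + output_millions * OUTPUT_PRICE_PER_M
    return round(total, 2)


# Scenarios copied from the table: (in / req, out / req, reqs / mo).
SCENARIOS = {
    "Coding assistant": (4_000, 1_500, 20_000),
    "Document QA": (50_000, 800, 5_000),
    "Production agent": (2_500, 1_000, 100_000),
    "Long-form writing": (1_500, 3_000, 10_000),
    "Long-context summarizer": (80_000, 1_200, 3_000),
}

if __name__ == "__main__":
    for name, args in SCENARIOS.items():
        print(f"{name}: ${monthly_cost(*args):,.2f}")
```

Swapping in your own token counts and request volume gives a first-order budget; actual bills will differ once caching or batching is enabled.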

Alternatives to Claude 3.5 Sonnet

Claude 3.5 Haiku

Smaller Anthropic model at $0.80 input / $4.00 output per 1M tokens. About 3.75x cheaper than Sonnet on both input and output. Use for high-volume narrow tasks where Sonnet quality is overkill.

GPT-4o

OpenAI frontier model at $2.50 input / $10.00 output per 1M tokens. About 17% cheaper than Sonnet on input and 33% cheaper on output, with comparable quality. Pick based on which produces better outputs on your eval.

Claude Opus 4

Anthropic flagship at $15 input / $75 output (where available). 5x the cost of Sonnet. Use only for the hardest reasoning steps; route the rest to Sonnet.
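
To see what routing a share of traffic to another model does to the bill, a rough blended-cost sketch can help. It uses only the prices quoted on this page. The dictionary keys are plain labels rather than API model IDs, and the sketch assumes token volume splits in proportion to the routing share, which may not hold if the hardest steps are also the longest.

```python
# (input, output) prices in USD per 1M tokens, as quoted on this page.
# Keys are descriptive labels, not API model identifiers.
PRICES = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def blended_cost(input_millions, output_millions, mix):
    """Monthly cost in USD when traffic is split across models.

    mix maps a label from PRICES to its share of traffic; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("routing shares must sum to 1")
    total = 0.0
    for label, share in mix.items():
        input_price, output_price = PRICES[label]
        total += share * (input_millions * input_price + output_millions * output_price)
    return total


# Production-agent volume from the scenarios: 250M input and 100M output tokens per month.
all_sonnet = blended_cost(250, 100, {"sonnet": 1.0})
with_opus = blended_cost(250, 100, {"sonnet": 0.9, "opus": 0.1})
```

Under these assumptions, sending 10% of the production-agent traffic to Opus raises the monthly estimate from $2,250 to about $3,150.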

Track Claude 3.5 Sonnet usage with Spanlens

Spanlens captures every Claude 3.5 Sonnet call with input and output tokens, exact cost, latency, and the full request body. Setup is one line of code or a baseURL swap. It is open source (MIT licensed) and self-hostable.

FAQ

What is the Claude 3.5 Sonnet cost per 1M tokens in 2026?

Claude 3.5 Sonnet is priced at $3.00 per 1M input tokens and $15.00 per 1M output tokens. Cache reads are $0.30 per 1M (10% of input) and cache writes are $3.75 per 1M (25% premium over input).

How does Anthropic prompt caching pricing work?

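Anthropic charges a one-time cache write at $3.75 per 1M tokens (input + 25%), then 10% of the base input price ($0.30 per 1M) on every cache read until the cache expires (default 5 minutes). For a system prompt shared across many requests, caching pays for itself on the first cache hit: one write plus one read costs 1.35x the base input price, versus 2x for two uncached requests. The saving only materializes if the next request arrives before the cache expires.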
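
The break-even can be checked directly from the write and read prices. The sketch below compares sending the same prefix with and without caching. It assumes every request after the first arrives before the cache expires, so each one is a cache hit; workloads with gaps longer than the cache lifetime pay the write price again and need more hits to come out ahead. Function names are illustrative.

```python
BASE_INPUT = 3.00   # USD per 1M input tokens
CACHE_WRITE = 3.75  # USD per 1M tokens, paid once when the prefix is cached
CACHE_READ = 0.30   # USD per 1M tokens, paid on every cache hit


def prefix_cost(prefix_tokens, requests, cached):
    """USD cost of sending the same prefix `requests` times."""
    millions = prefix_tokens / 1_000_000
    if requests == 0:
        return 0.0
    if not cached:
        return millions * BASE_INPUT * requests
    # The first request writes the cache; every later request reads from it.
    return millions * (CACHE_WRITE + CACHE_READ * (requests - 1))


def break_even_requests(max_requests=100):
    """Smallest request count at which the cached path is cheaper, or None."""
    for n in range(1, max_requests + 1):
        if prefix_cost(1_000_000, n, cached=True) < prefix_cost(1_000_000, n, cached=False):
            return n
    return None
```

With the prices on this page, `break_even_requests()` returns 2: a single request is more expensive with caching ($3.75 vs $3.00 per 1M prefix tokens), and from the second request onward the cached path is cheaper.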

What is the Claude 3.5 Sonnet context window?

200K tokens of context, with a maximum output of 8K tokens. The 200K window is large enough to hold long documents in a single request, which is particularly useful for document-grounded tasks.

How does Claude 3.5 Sonnet compare to GPT-4o on cost?

GPT-4o is cheaper at list price ($2.50 input / $10 output vs Sonnet's $3.00 / $15). However, Anthropic's cache reads at 10% of the input price can make Sonnet cheaper for input-heavy workloads with a large shared context (e.g. RAG with stable system prompts). Output tokens still cost 50% more on Sonnet, so output-heavy workloads tend to stay cheaper on GPT-4o.
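
Whether caching closes the gap depends on the input/output mix and on how much of the input is served from cache. The sketch below compares per-request cost in steady state, ignoring the one-time cache write. This page lists no cached-input price for GPT-4o, so the GPT-4o helper charges cached tokens at the full input price by default; that is an assumption, and the parameter should be set to the provider's current rate before drawing conclusions.

```python
def request_cost(input_tokens, output_tokens, cached_fraction,
                 input_price, cache_read_price, output_price):
    """USD cost of one request; all prices are per 1M tokens.

    cached_fraction is the share of input tokens served from cache.
    The one-time cache write is ignored (cache assumed already warm).
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = fresh * input_price + cached * cache_read_price + output_tokens * output_price
    return total / 1_000_000


def sonnet_cost(input_tokens, output_tokens, cached_fraction):
    # Sonnet prices quoted on this page: $3.00 input, $0.30 cache read, $15.00 output.
    return request_cost(input_tokens, output_tokens, cached_fraction, 3.00, 0.30, 15.00)


def gpt4o_cost(input_tokens, output_tokens, cached_fraction, cache_read_price=2.50):
    # ASSUMPTION: no cached-input discount unless the caller supplies one.
    return request_cost(input_tokens, output_tokens, cached_fraction, 2.50, cache_read_price, 10.00)


# Input-heavy RAG request: 10,000 input tokens (80% cached), 500 output tokens.
rag = (sonnet_cost(10_000, 500, 0.8), gpt4o_cost(10_000, 500, 0.8))

# Output-heavy writing request: 1,000 input tokens (80% cached), 3,000 output tokens.
writing = (sonnet_cost(1_000, 3_000, 0.8), gpt4o_cost(1_000, 3_000, 0.8))
```

Under these assumptions the RAG request costs about $0.0159 on Sonnet versus $0.0300 on GPT-4o, while the writing request costs about $0.0458 on Sonnet versus $0.0325 on GPT-4o.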

Can I run Claude 3.5 Sonnet on AWS Bedrock?

Yes. Bedrock offers Claude 3.5 Sonnet at the same per-token pricing in most regions. Bedrock authentication uses SigV4 with your AWS credentials. Spanlens works with both direct Anthropic and Bedrock endpoints.

How do I monitor Claude 3.5 Sonnet usage in production?

Capture every Messages API call with input + output + cache token breakdown, latency, model variant, and cost. Aggregate by prompt version to catch caching regressions. Spanlens does this in one line — see /integrations/anthropic.
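
For a sense of what such tracking computes, here is a minimal sketch that prices one call from the `usage` object returned by the Messages API and reports a cache hit ratio per prompt version. It is not how Spanlens is implemented. The four field names are the ones the Messages API reports in `usage`; the `prompt_version` label is a hypothetical tag you would attach yourself.

```python
from collections import defaultdict

# USD per 1M tokens, keyed by the usage field each price applies to.
PRICE_PER_M = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
}


def call_cost(usage):
    """USD cost of one call, given its usage fields as a dict."""
    total = sum((usage.get(field) or 0) * price for field, price in PRICE_PER_M.items())
    return total / 1_000_000


def cache_hit_ratio_by_version(calls):
    """Share of input tokens read from cache, per prompt version.

    calls is an iterable of (prompt_version, usage) pairs; prompt_version
    is a hypothetical label supplied by the caller.
    """
    read = defaultdict(int)
    total = defaultdict(int)
    for version, usage in calls:
        cache_read = usage.get("cache_read_input_tokens") or 0
        cache_write = usage.get("cache_creation_input_tokens") or 0
        fresh = usage.get("input_tokens") or 0
        read[version] += cache_read
        total[version] += cache_read + cache_write + fresh
    return {v: (read[v] / total[v] if total[v] else 0.0) for v in total}
```

A sudden drop in the ratio for a new prompt version usually means the cached prefix changed, so every request is paying the write price again.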

Last updated 2026-06-16. Prices in USD at the standard tier.