Gemini 2.0 Flash pricing

Google's small fast multimodal model. $0.10 per 1M input, $0.40 per 1M output. Among the cheapest production-grade options.

Input

$0.100

per 1M tokens

Output

$0.400

per 1M tokens

Provider: Google
Context window: 1M tokens
Max output: 8K tokens
Released: 2024-12

What Gemini 2.0 Flash is best for

✓Very high volume APIs where cost dominates
✓Multimodal inputs (image, video, audio)
✓Long-context workloads (up to 1M tokens)
✓Grounded responses with Google Search
✓Workflows that previously used GPT-4o-mini

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use case	In / req	Out / req	Reqs / mo	Monthly
Intent classifier	300	20	1,000,000	$38.00
Image captioning	1,000	100	200,000	$28.00
Long doc summarizer	100,000	500	5,000	$51.00
Agent sub-step	1,500	300	500,000	$135.00
Massive scale	500	100	10,000,000	$900.00

Alternatives to Gemini 2.0 Flash

GPT-4o-mini

OpenAI competitor at $0.15 input / $0.60 output. Slightly more expensive. Pick based on which has better quality on your eval; cost is close.

Gemini 1.5 Pro

Google larger model at $1.25 input / $5.00 output. Use for harder reasoning where Flash quality is insufficient.

Claude 3.5 Haiku

Anthropic small model at $0.80 input / $4.00 output. Significantly more expensive than Flash. Pick only if Anthropic's writing quality matters for your task.

Track Gemini 2.0 Flash usage with Spanlens

Spanlens captures every Gemini 2.0 Flash call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

Start free →Google integration guide Official pricing ↗

FAQ

What is the Gemini 2.0 Flash cost per 1M tokens in 2026?

Gemini 2.0 Flash is priced at $0.10 per 1M input tokens and $0.40 per 1M output tokens. Free tier is available via Google AI Studio with rate limits; paid tier (Vertex AI or AI Studio) starts at these prices.

What is the Gemini 2.0 Flash context window?

1 million tokens. The largest production context window currently available. Useful for whole-document QA, multi-document summarization, and codebase-grounded tasks.

Is Gemini 2.0 Flash multimodal?

Yes. Image, video, audio, and PDF inputs are supported through inline parts or file references. Multimodal inputs are priced per token after Google internally converts media to tokens (about 258 tokens per image, varies by audio length).

Does Gemini 2.0 Flash support function calling?

Yes. Function declarations and function calls work the same as on Gemini 1.5 Pro. Parallel function calling and forced function calling are both supported.

AI Studio or Vertex AI?

AI Studio is the simplest setup with API key auth. Vertex AI offers regional endpoints, SLA, IAM integration, and service account auth. For production, Vertex is the better fit; for prototyping and low-volume, AI Studio is fine.

How do I monitor Gemini usage in production?

Capture every Gemini call with usageMetadata (input + output token counts), latency, model variant, and cost. Spanlens handles both AI Studio and Vertex endpoints — see /integrations/gemini.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.