← LLM cost tracking

Gemini 2.0 Flash pricing

Google's small fast multimodal model. $0.10 per 1M input, $0.40 per 1M output. Among the cheapest production-grade options.

Input
$0.100
per 1M tokens
Output
$0.400
per 1M tokens
Provider
Google
Context window
1M tokens
Max output
8K tokens
Released
2024-12

What Gemini 2.0 Flash is best for

  • Very high volume APIs where cost dominates
  • Multimodal inputs (image, video, audio)
  • Long-context workloads (up to 1M tokens)
  • Grounded responses with Google Search
  • Workflows that previously used GPT-4o-mini

Monthly cost scenarios

Real-world estimates at common usage levels. Numbers assume no caching, no batching, and the standard tier price.

Use caseIn / reqOut / reqReqs / moMonthly
Intent classifier300201,000,000$38.00
Image captioning1,000100200,000$28.00
Long doc summarizer100,0005005,000$51.00
Agent sub-step1,500300500,000$135.00
Massive scale50010010,000,000$900.00

Alternatives to Gemini 2.0 Flash

GPT-4o-mini

OpenAI competitor at $0.15 input / $0.60 output. Slightly more expensive. Pick based on which has better quality on your eval; cost is close.

Gemini 1.5 Pro

Google larger model at $1.25 input / $5.00 output. Use for harder reasoning where Flash quality is insufficient.

Claude 3.5 Haiku

Anthropic small model at $0.80 input / $4.00 output. Significantly more expensive than Flash. Pick only if Anthropic's writing quality matters for your task.

Track Gemini 2.0 Flash usage with Spanlens

Spanlens captures every Gemini 2.0 Flash call with input + output tokens, exact cost, latency, and full request body. One line of code or a baseURL swap. Open source MIT licensed, self-hostable.

FAQ

What is the Gemini 2.0 Flash cost per 1M tokens in 2026?

Gemini 2.0 Flash is priced at $0.10 per 1M input tokens and $0.40 per 1M output tokens. Free tier is available via Google AI Studio with rate limits; paid tier (Vertex AI or AI Studio) starts at these prices.

What is the Gemini 2.0 Flash context window?

1 million tokens. The largest production context window currently available. Useful for whole-document QA, multi-document summarization, and codebase-grounded tasks.

Is Gemini 2.0 Flash multimodal?

Yes. Image, video, audio, and PDF inputs are supported through inline parts or file references. Multimodal inputs are priced per token after Google internally converts media to tokens (about 258 tokens per image, varies by audio length).

Does Gemini 2.0 Flash support function calling?

Yes. Function declarations and function calls work the same as on Gemini 1.5 Pro. Parallel function calling and forced function calling are both supported.

AI Studio or Vertex AI?

AI Studio is the simplest setup with API key auth. Vertex AI offers regional endpoints, SLA, IAM integration, and service account auth. For production, Vertex is the better fit; for prototyping and low-volume, AI Studio is fine.

How do I monitor Gemini usage in production?

Capture every Gemini call with usageMetadata (input + output token counts), latency, model variant, and cost. Spanlens handles both AI Studio and Vertex endpoints — see /integrations/gemini.

Last updated 2026-06-16. Prices in USD at the standard tier. Spot something out of date? Tell us.