The Long Context Scale
Google's Gemini 1.5 Pro offers a 2,000,000-token context window, large enough to hold entire codebases, hours of audio or video, or several long PDFs in a single prompt. However, Google prices that window in two tiers: once a prompt exceeds 128,000 tokens, both the input and output rates double.
Gemini 1.5 Pro Tiered Pricing Model
- Standard Tier (prompts up to 128k tokens):
  * Input: $1.25 / 1M tokens
  * Output: $5.00 / 1M tokens
- Large Context Tier (prompts over 128k tokens):
  * Input: $2.50 / 1M tokens
  * Output: $10.00 / 1M tokens
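The tier is chosen by prompt length and then applied to the whole request. Once a prompt reaches 128,001 tokens, every input token and every output token in that request is billed at the doubled rate, not just the tokens past the threshold.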
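To make the price cliff concrete, here is a minimal Python sketch that prices a single uncached request from the rates listed above. The names `request_cost`, `STANDARD`, `LARGE`, and `THRESHOLD` are hypothetical. The sketch assumes "128k" means exactly 128,000 prompt tokens, and the rates may have changed since this was written, so check Google's current pricing page before relying on the numbers.

```python
# Rates in USD per 1M tokens, as listed above (assumed current; verify before use).
STANDARD = {"input": 1.25, "output": 5.00}
LARGE = {"input": 2.50, "output": 10.00}
THRESHOLD = 128_000  # assumption: "128k" means 128,000 prompt tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one uncached Gemini 1.5 Pro request."""
    # The tier depends only on prompt length, but it is then applied to
    # every token in the request, input and output alike.
    rates = LARGE if input_tokens > THRESHOLD else STANDARD
    dollars = input_tokens * rates["input"] + output_tokens * rates["output"]
    return dollars / 1_000_000


if __name__ == "__main__":
    # One extra prompt token moves the request into the large tier.
    for prompt in (128_000, 128_001):
        print(f"{prompt:>7} input tokens -> ${request_cost(prompt, 1_000):.4f}")
```

With 1,000 output tokens, the 128,000-token prompt costs $0.165 and the 128,001-token prompt costs about $0.330. A single extra input token doubles the bill.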
Optimization Tactics
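1. Aggressive Trimming: If a request hovers just above the threshold (e.g. 130,000 tokens), trim it below 128,000 tokens, for example by compressing the system instructions. Dropping back into the standard tier halves the rate on every input and output token in the call.
2. Utilize Prompt Caching: Gemini supports context caching for inputs over 32k tokens. Cached tokens are read at a 75% discount ($0.3125/1M in the standard tier, $0.625/1M in the large tier), which offsets much of the large-tier penalty. Cached content also accrues an hourly storage charge, so caching pays off mainly when the same context is reused across many requests.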
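The self-contained sketch below compares both tactics for a 130,000-token prompt with 2,000 output tokens. It rests on stated assumptions: the tier is chosen by the full prompt length (cached tokens included), cached tokens are billed at 25% of that tier's input rate, the "32k" caching minimum is read as 32,768 tokens, and cache storage fees are ignored. The function and constant names are illustrative and are not part of any Google SDK.

```python
# Rates in USD per 1M tokens, as listed in the article (assumed; verify before use).
STANDARD = {"input": 1.25, "output": 5.00}
LARGE = {"input": 2.50, "output": 10.00}
THRESHOLD = 128_000        # assumption: "128k" = 128,000 prompt tokens
CACHE_DISCOUNT = 0.75      # cached input tokens cost 25% of the tier's input rate
MIN_CACHE_TOKENS = 32_768  # assumption: the "32k" caching minimum


def request_cost(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's USD cost. Cache storage fees are NOT included."""
    if cached_tokens and cached_tokens < MIN_CACHE_TOKENS:
        raise ValueError("cached content is below the minimum cacheable size")
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    # Assumption: the tier is set by the whole prompt, cached part included.
    rates = LARGE if prompt_tokens > THRESHOLD else STANDARD
    fresh_tokens = prompt_tokens - cached_tokens
    cached_rate = rates["input"] * (1 - CACHE_DISCOUNT)
    dollars = (
        fresh_tokens * rates["input"]
        + cached_tokens * cached_rate
        + output_tokens * rates["output"]
    )
    return dollars / 1_000_000


if __name__ == "__main__":
    scenarios = {
        "130k prompt, uncached": request_cost(130_000, 2_000),
        "trimmed to 127k": request_cost(127_000, 2_000),
        "130k prompt, 100k cached": request_cost(130_000, 2_000, cached_tokens=100_000),
    }
    for label, cost in scenarios.items():
        print(f"{label:<26} ${cost:.5f}")
```

Under these assumptions, the uncached 130k request costs $0.345. Trimming it to 127,000 tokens brings it to $0.16875, and caching 100,000 of the 130,000 tokens brings it to $0.1575 before storage fees. Which tactic wins in practice depends on how many requests reuse the cached context and for how long it is stored.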