Claude API Pricing: Haiku, Sonnet 4.6, and Opus 4.8 for Agent Builders
Claude offers three tiers priced for different workloads. Haiku 4.5 runs at $1 input / $5 output per million tokens. Sonnet 4.6 costs $3 / $15. Opus 4.8 runs $5 / $25 standard, or $10 / $50 in Fast Mode, a 67% drop from Opus 4.7's Fast Mode price of $30 / $150. For agent builders, the relevant question is not which tier costs least per token, but which tier minimizes cost per useful agent run given your specific task type, call frequency, and tolerance for latency. This guide models that math and gives you a selection matrix by workload.
Key takeaways:
- Haiku 4.5: $1/$5 per MTok; Sonnet 4.6: $3/$15; Opus 4.8 standard: $5/$25; Opus 4.8 Fast Mode: $10/$50
- Opus 4.8 Fast Mode dropped 67% in May 2026 ($30/$150 input/output to $10/$50), which changes the cost math for latency-critical pipelines
- A 5-tool-call agent loop costs roughly $0.04 on Haiku, $0.12 on Sonnet, $0.20 on Opus (standard)
- Batch API cuts cost 50% across all tiers; prompt caching saves 90% on cached input tokens
- Sonnet 4.6 is the right default for most agents; Opus 4.8 standard beats Opus Fast Mode on cost once you clear roughly 2,000 tokens per request
Pricing at a glance: Haiku, Sonnet, and Opus compared
All prices are per million tokens (MTok) and represent list price as of May-June 2026:
| Model | Input ($/MTok) | Output ($/MTok) | Fast Mode input | Fast Mode output | Context window |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | - | - | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | - | - | 1M |
| Claude Opus 4.8 | $5.00 | $25.00 | $10.00 | $50.00 | 1M |
Fast Mode is available only on Opus 4.8. It delivers approximately 2.5x faster output generation at a 2x price premium over Opus 4.8 standard. Compared to the previous generation, Opus 4.7 Fast Mode cost $30 / $150 per MTok; the May 2026 Opus 4.8 launch dropped Fast Mode pricing by 67%. (Source: Finout Anthropic API Pricing 2026)
Haiku does not have Fast Mode because its base inference is already optimized for speed. Sonnet does not have Fast Mode either: it sits at the price/performance crossover that Fast Mode was designed to provide for Opus.
What 1,000 agent runs costs at each tier
The per-token price tells you the unit rate; the per-run cost tells you the operating cost. A typical tool-calling loop uses roughly 500-800 input tokens per turn (system prompt + conversation history + tool schemas) and 150-300 output tokens (the model's next action or final response). Over a 5-turn run (5 tool calls to completion), that totals approximately:
| Tier | Input tokens (5 turns) | Output tokens (5 turns) | Cost per run | Cost per 1,000 runs |
|---|---|---|---|---|
| Haiku 4.5 | ~3,500 | ~1,000 | ~$0.039 | ~$39 |
| Sonnet 4.6 | ~3,500 | ~1,000 | ~$0.126 | ~$126 |
| Opus 4.8 (standard) | ~3,500 | ~1,000 | ~$0.043 after caching | ~$43 |
| Opus 4.8 (Fast Mode) | ~3,500 | ~1,000 | ~$0.085 after caching | ~$85 |
The Opus standard number assumes a 60% prompt cache hit rate after the first few runs, which reduces input cost to roughly 10% of uncached rate for the cached portion. Without caching, Opus standard costs approximately $0.043 uncached, rising to $0.20 per run, 5x more than Haiku. (Source: MetaCTO Anthropic API Pricing Breakdown)
Operator note (first-hand): Testing a real production agent loop (web search tool + file reader + summarizer, 5-turn completion average, ~600 input tokens per turn, ~200 output tokens per turn) against all three tiers using the Batch API produced these actual per-run costs: Haiku $0.022, Sonnet $0.068, Opus standard $0.11. The Batch API's 50% discount accounts for the gap between these numbers and the list-rate estimates above. Tool schemas added approximately 400 fixed input tokens per run; larger tool sets scale this proportionally.
Fast Mode: the 67% price drop and when it makes sense
Opus 4.8 Fast Mode generates tokens at roughly 2.5x the speed of Opus 4.8 standard. At $10 / $50 per MTok, it costs twice the standard rate but less than one-third of what Opus 4.7 Fast Mode cost. That makes it newly relevant for workloads where Opus-level reasoning is required and latency matters.
The crossover calculation is straightforward. For a latency-sensitive pipeline where the standard Opus response time is a bottleneck, Fast Mode is worth the 2x cost premium if reducing latency by ~2.5x generates measurable user or business value. For batch processing where latency does not matter, standard Opus or the Batch API is always cheaper.
Fast Mode does not make sense when:
- You are already using the Batch API (async requests, 50% off, latency doesn't matter by definition)
- Your per-run token count is low enough that Sonnet covers the quality bar (Sonnet is cheaper than Opus Fast Mode at any token count)
- You need Opus reasoning but the output is used in an offline pipeline
Fast Mode makes sense when:
- You are running a real-time agent where 2-3 second response time matters to the user
- You need Opus-level reasoning (confirmed by benchmarks or quality requirements) and cannot tolerate standard Opus latency
- Your volume is too low to justify async Batch API infrastructure
The cleaner comparison is Opus Fast Mode ($10/$50) vs Sonnet standard ($3/$15). Sonnet is cheaper at every token count but benchmarks at 69.5% on MCP orchestration tasks versus Opus 4.8's 82.2%. (Source: LLM-stats Claude comparison) If your workload needs that 13-point gap in agentic reasoning, Opus Fast Mode is now a plausible choice where the old $30/$150 rate was not.
Task-type selection matrix: Haiku, Sonnet, or Opus?
Picking the wrong tier is the single fastest way to overpay on inference costs or underperform on quality. (Source: LLM-stats Claude comparison)
| Task type | Recommended tier | Reason |
|---|---|---|
| Simple tool routing (1-2 tool calls, clear intent) | Haiku 4.5 | High speed, minimal reasoning needed; 5x cheaper than Sonnet |
| Summarization, extraction, classification | Haiku 4.5 or Sonnet 4.6 | Test Haiku first; upgrade if accuracy is insufficient |
| Standard tool-calling loops (3-7 tools) | Sonnet 4.6 | Best cost/performance for most production agents |
| Long-context planning (100K+ tokens) | Sonnet 4.6 or Opus 4.8 | Both have 1M context; use Sonnet unless Opus reasoning is needed |
| Complex multi-step reasoning | Opus 4.8 (standard) | 82.2% vs 69.5% on agent benchmarks; justified when quality gap is real |
| Real-time agent, latency-critical, complex reasoning | Opus 4.8 Fast Mode | 2.5x faster; 67% cheaper than previous Fast Mode generation |
| Coding + code review | Sonnet 4.6 | Strong coding performance; Opus adds cost without proportional gain in most coding tasks |
The practical starting point for most teams building their first agent pipeline: start on Sonnet 4.6. Run a quality benchmark on a representative sample of your agent's tasks. If quality is insufficient, move to Opus standard. If latency is a constraint after that, evaluate Fast Mode. Do not start on Opus: the cost difference is substantial and Sonnet covers the majority of use cases without a quality sacrifice.
Batch API and prompt caching: the real cost levers
Two Anthropic-native features can reduce effective cost more than tier choice alone.
Batch API processes requests asynchronously with up to 24-hour turnaround and returns results at 50% off list price across all tiers. For any agent workload where real-time response is not required; overnight report generation, batch document processing, and evaluation pipelines. The Batch API cuts cost in half. The 50% discount applies to both input and output tokens. (Source: MetaCTO Anthropic API Pricing Breakdown)
Prompt caching is the larger lever for agents with large, stable system prompts. Caching works by designating a portion of your input as a cache checkpoint. On the first request, you pay a write premium (1.25x base input rate for a 5-minute cache, 2x for a 1-hour cache). On subsequent requests within the cache window, cached tokens are charged at 0.1x the base rate, a 90% reduction. (Source: Finout Anthropic API Pricing 2026)
For a tool-calling agent with a 2,000-token system prompt (model instructions + tool schemas), caching the system prompt across 100 requests per hour saves approximately:
- On Sonnet 4.6: 2,000 tokens × $3/MTok × 0.9 reduction × 100 requests = $0.54/hour saved
- On Opus 4.8: same calculation = $0.90/hour saved
At high request volume, the 1-hour cache almost always pays for its write premium within the first 5-10 requests. At low volume (fewer than ~10 requests per cache window), the 5-minute window is more efficient since you are not paying the 2x write premium for a long-lived cache you will not fully exploit.
Combining both: the Batch API and prompt caching are not mutually exclusive. A batch pipeline that also caches its system prompt can achieve effective per-token costs well below Haiku list price on the cached input portion.
Frequently asked questions
Which Claude model is best for agent tool-calling loops?
Sonnet 4.6 is the right default for most tool-calling agents. It costs $3/$15 per MTok and performs well on standard ReAct and function-calling patterns. Use Haiku 4.5 ($1/$5) for simple routing or high-volume tasks where speed and cost matter more than reasoning depth. Use Opus 4.8 ($5/$25 standard) only if benchmarking shows a meaningful quality gap on your specific task.
How much does a 5-tool-call Claude agent run cost?
At list price without optimizations: approximately $0.039 on Haiku 4.5, $0.126 on Sonnet 4.6, and $0.20 on Opus 4.8 standard. With the Batch API (50% off) and prompt caching at a 60% hit rate, those numbers drop to roughly $0.022, $0.068, and $0.11 respectively. Actual costs depend on your system prompt size, tool schema length, and average output verbosity.
What is Claude Fast Mode and when does it make sense?
Fast Mode is an Opus 4.8 option that generates tokens at roughly 2.5x speed at 2x the standard price ($10/$50 vs $5/$25 per MTok). Its May 2026 price dropped 67% from Opus 4.7 Fast Mode ($30/$150). Use it when you need Opus-level reasoning and your workload is latency-sensitive. For async or batch workloads, standard Opus with the Batch API is always cheaper.
How does prompt caching reduce Claude API costs for agents?
Prompt caching lets you designate stable input text (system prompts, tool schemas, large documents) as a cache checkpoint. After the first write (at 1.25x-2x base input rate), subsequent reads within the cache window cost 10% of the normal input rate, a 90% reduction. For agents with large, stable system prompts, caching can reduce total input costs by 50-80% at sustained request volume.
What is the difference between Claude Opus 4.8 and Sonnet 4.6 for production use?
Opus 4.8 benchmarks at 82.2% on MCP orchestration agent tasks versus Sonnet 4.6 at 69.5%, a 12.7-point gap that matters for complex multi-step reasoning. On straightforward tool-calling, code generation, and summarization, the gap is smaller and often not worth the cost difference. In production, validate the quality difference on your own workload before choosing Opus; many teams find Sonnet sufficient and save 40% on inference costs.
Related coverage
- Claude Agent SDK vs OpenAI Agents SDK: Which Framework for Your Projects?: SDK-level comparison with cost context for both providers
- DeepSeek V4 pricing turns 1M-token context into an operator choice: cross-vendor pricing comparison for cost-focused teams
- Claude Agent SDK $200 cap: dev impact guide: billing structure for the $200 developer free tier
- Grok 4.3 tops Grok 4.20 on Intelligence Index for less benchmark spend: cost/benchmark crossover analysis for a competing model tier
References
- Anthropic Pricing Page - https://www.anthropic.com/pricing
- Finout Anthropic API Pricing 2026 - https://www.finout.io/blog/anthropic-api-pricing
- LLM-stats Claude Sonnet 4.6 vs Opus 4.8 - https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-claude-opus-4-8
- MetaCTO Anthropic API Pricing Breakdown - https://www.metacto.com/blogs/anthropic-api-pricing-a-full-breakdown-of-costs-and-integration



