← 部落格

本文暫未提供您所選語言的版本,目前顯示英文版本。

Claude Opus 4.8: Release Date, Pricing, Benchmarks, and Builder Notes

Claude Opus 4.8 is live. See what Anthropic confirmed about release timing, API access, pricing, benchmarks, and what builders should evaluate before adoption.

By Dora 10 min read
Claude Opus 4.8: Release Date, Pricing, Benchmarks, and Builder Notes

Hello, guys. Dora here.

Anthropic shipped Opus 4.8 on May 28, 2026 — less than two months after 4.7. The Opus 4.8 release date matters less than what actually changed, but I want to write this down while it’s fresh, because the framing in the announcement and the framing builders actually need are not quite the same thing. The headline numbers are real. The thing they don’t tell you is which workloads benefit, which don’t, and how much of this you can verify yourself before you flip a switch in production.

This piece walks through ​release status, pricing, benchmarks​, what changed for builders, and the community signal that’s still forming. I’m pulling from the official announcement and the System Card, and treating anything else as secondary.

What Claude Opus 4.8 Is

Official release status and API model ID

The Opus 4.8 release date is May 28, 2026. Available immediately on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and GitHub Copilot. The API model ID is Claude-Opus-4-8. That’s the string you’ll put in your config.

Per Anthropic’s announcement, anthropic Opus 4.8 is described as “a modest but tangible improvement on its predecessor.” That phrasing is unusually restrained for a launch post and worth reading literally — Anthropic itself is not selling this as a generational jump.

What changed from Opus 4.7

Three changes that actually matter for builders:

First, honesty. Anthropic says Opus 4.8 is roughly four times less likely than 4.7 to let flaws in its own code pass without flagging them. Early testers report it surfaces uncertainty more readily. This is the kind of thing that doesn’t show up in a single demo but compounds across long-running agentic sessions.

Second, agentic reliability. Anthropic frames the model as “more reliable and sharper in its judgement” on agentic tasks. Cognition (Devin) specifically noted the comment-verbosity and tool-calling issues from 4.7 are fixed.

Third, a faster and cheaper fast mode. Same model, configurable for higher throughput at lower cost. Details in the pricing section.

There’s also a new feature layer launched alongside: Dynamic Workflows in Claude Code (research preview, Enterprise/Team/Max plans), effort control in Claude.ai and Cowork, and the Messages API now accepts system entries inside the messages array. The Messages API change matters more than it sounds — it lets you update Claude’s instructions mid-task without breaking the prompt cache. For long agent runs, that’s a quiet but real efficiency win.

Opus 4.8 Pricing and API Availability

Regular usage vs fast mode

Opus 4.8 pricing, regular usage: $5 per million input tokens, $25 per million output tokens​. Identical to Opus 4.7. Nothing to renegotiate, nothing to refactor.

Fast mode: $10 per million input, $50 per million output. Anthropic says fast mode is roughly 2.5× faster and three times cheaper than fast mode was for previous models. The math here is comparing fast-mode-to-fast-mode across versions, not fast-mode-to-regular within 4.8. Read it carefully.

A note on effort levels, because this is where most builders will spend time tuning: Opus 4.8 defaults to “high” effort. On coding tasks, that spends a similar number of tokens to Opus 4.7’s default — but with better results. “Extra” (xhigh in Claude Code) and “max” are available for harder work and longer asynchronous runs. Anthropic raised Claude Code rate limits to accommodate higher effort levels, which is the kind of operational detail that matters more than benchmark scores.

What teams should verify in Console and docs

I’d treat these as the only authoritative reference points that won’t drift:

Always confirm pricing and limits in Console before committing — model lineups update, fast-mode availability can vary by region or platform, and what’s true on the API may differ slightly on Bedrock or Vertex on launch week.

Opus 4.8 Benchmarks and What They Mean

Coding, agentic skills, reasoning, and knowledge work

The Opus 4.8 benchmark numbers Anthropic published, with 4.7 as comparison where given:

  • Agentic coding (SWE-Bench Pro): 64.3% → 69.2%
  • Multidisciplinary reasoning with tools: 54.7% → 57.9%
  • Misalignment behavior score (lower is better): 2.5 → 1.9 — putting Opus 4.8 effectively level with Mythos Preview on alignment
  • OSWorld-Verified (computer use): 84% reported by Browserbase, described as a “meaningful jump” over both 4.7 and GPT-5.5

On Terminal-Bench 2.1, Anthropic reports GPT-5.5 still leads at 83.4% under Codex CLI harness — Anthropic is forthright about this in the footnotes of the announcement. Worth noting because most launch posts skip past competitor advantages.

Partner-reported numbers are louder. Databricks said its Genie agent runs over PDFs and diagrams at 61% cheaper token cost than under 4.7. Browserbase reported 84% on Online-Mind2Web. Devin-maker Cognition called it a direct capability gain for engineers. These are not independent — they’re testimonials. But the kinds of improvements being cited (token efficiency on multimodal inputs, tool-calling cleanliness, agent end-to-end completion) are operationally specific in a way pure marketing claims usually aren’t.

Why official benchmarks still need workload testing

Benchmarks tell you what the model can do on a fixed eval. They don’t tell you what it does on your specific prompts, your specific tool harness, your specific retrieval setup. Two things to actually test before switching production traffic:

One — does the honesty improvement help or hurt your workflow? More flagging of uncertainty is great for code review, agent supervision, and analysis. It can be friction for low-stakes generation tasks where you want a confident answer.

Two — does the default high effort fit your latency budget? Anthropic says token spend is similar to 4.7 on coding tasks at the new default, but “similar” isn’t “identical” and your workload may diverge. Run a representative batch, measure cost and latency end-to-end, then decide.

I’d rather burn one afternoon on workload testing than ship a switch and discover the regression in week three.

Opus 4.8 vs Opus 4.7 for Builders

Better judgment, honesty, and agentic task reliability

Where I’d expect builders to feel a difference:

  • Long-running agent sessions. The honesty gain plus the tool-calling cleanup compound over many steps. Anthropic’s Dynamic Workflows feature is built on this — codebase-scale migrations, hundreds of parallel subagents, with verification before reporting back. That kind of workload only makes sense if the underlying model doesn’t fabricate progress.
  • Multimodal-heavy pipelines. Databricks’ 61% token cost reduction on PDFs and diagrams isn’t a small claim. If your workflow runs documents through the model, this is worth measuring on your own data.
  • Computer-use and browser agents. 84% on Online-Mind2Web is a real step forward in a category that’s been historically brittle.

Where I wouldn’t expect a meaningful shift: simple chat, single-shot prompts, short context tasks. The 4.7-to-4.8 delta is most visible at the high end of complexity, not the floor.

When to switch, test, or wait

A short decision frame, since this is the question most teams will ask:

  • Switch now​: if you’re running agentic workloads, computer-use agents, or any pipeline where Opus 4.7’s tool-calling or comment-verbosity issues bit you. Pricing is identical. Risk of regression is low.
  • Test first​: if your workflow depends on a specific behavior (response length, tone, particular agent patterns). The honesty change can shift outputs in ways that touch downstream prompts.
  • Wait​: if you’ve tuned hard to 4.7 for a stable workflow and don’t run agentic tasks. There’s no urgency, and Anthropic itself called this a “modest” improvement.

Practical move: shadow-test 4.8 on a sample of production traffic for a few days. Compare against 4.7 on the metrics you actually care about — cost, latency, completion rate, error rate. Then decide.

Reddit and Community Discussion: What to Treat Carefully

Opus 4.8 reddit threads will start landing in the days after launch. They always do. What they’re useful for: surfacing surprising failure modes, regression reports on specific workflows, prompt patterns that broke. What they’re not useful for: average performance assessments, statistical claims, or “is it worth switching” verdicts during week one.

Community signal in the first 72 hours of a launch skews toward the loud — people who hit problems post first, people whose workflows improved silently don’t post at all. The early enterprise partner quotes in Anthropic’s announcement are also testimonials. Both are data points, not conclusions.

I’d watch the community discussion specifically for: regression reports on previously-working prompts, real cost comparisons with workload context, and feedback on the new effort control behavior. Skip the takes. Wait two to three weeks for a steadier picture if your decision isn’t time-critical.

FAQ

What is Claude Opus 4.8?

Claude Opus 4.8 is anthropic Opus 4.8’s flagship model upgrade released May 28, 2026. Anthropic positions it as a “modest but tangible” improvement over Opus 4.7, with gains in agentic coding, reasoning, knowledge work, and honesty. The API model ID is Claude-Opus-4-8.

How do developers use Claude Opus 4.8?

Through the Claude API using the Claude-Opus-4-8 model ID, as well as via Amazon Bedrock, Google Cloud Vertex AI, and platforms like GitHub Copilot, where it’s now generally available. Effort levels — low, medium, high (default), extra/xhigh, max — are configurable. For full integration details, refer to Anthropic’s official documentation.

Is Opus 4.8 pricing different from Opus 4.7?

Regular pricing is unchanged: $5 per million input tokens, $25 per million output tokens. Fast mode pricing differs and is described separately in Anthropic’s announcement. Always confirm current pricing in Console or the official pricing page before committing — please refer to the official latest documentation for the most current numbers.

Should teams switch from Opus 4.7 to Opus 4.8?

Depends on workload. If you run agentic pipelines, computer-use agents, or workflows where 4.7’s tool-calling or verbosity issues hurt, switching makes sense at parity pricing. If you’ve tuned to 4.7 for a specific behavior, shadow-test first. There’s no general answer.

Conclusion

Claude 4.8 is what it claims to be — a real but measured upgrade that pays off most on agentic, long-running, and multimodal workloads. Same pricing as 4.7 for regular usage means the switching cost is operational, not financial. The honesty improvement is the change I’d watch most closely in Claude 4.8, because it affects how the model behaves in failure cases, not just in success cases.

If you’re a builder, test it on your own workload before deciding. Don’t trust the launch testimonials, don’t trust the early Reddit threads, don’t trust this post either. Run it yourself.

That’s it. More to come once I’ve had a week with it.

Previous Posts: