Claude Opus 4.6 and Sonnet 4.6: Everything You Need to Know

A deep dive into Anthropic's Claude Opus 4.6 and Sonnet 4.6 — the most capable Claude models yet, featuring 1M context windows, adaptive thinking, and state-of-the-art benchmarks.


Anthropic has raised the bar again. With the release of Claude Opus 4.6 (February 5, 2026) and Claude Sonnet 4.6 (February 17, 2026), the Claude model family delivers major gains in coding, agentic workflows, long-context reasoning, and computer use — all while keeping pricing unchanged from the previous generation.

Here’s what makes the 4.6 generation a significant leap forward.

Claude Opus 4.6: The Most Capable Claude Ever

Opus 4.6 is Anthropic’s flagship model, designed for the most demanding tasks in coding, research, and complex reasoning.

1M Context Window at Standard Pricing

For the first time, an Opus-class model ships with a 1 million token context window — and there are no long-context surcharges. This means you can feed entire codebases, lengthy legal documents, or massive datasets into a single prompt without worrying about extra costs.
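Before shipping an entire codebase in one prompt, it helps to sanity-check that it actually fits. The sketch below uses the common ~4 characters-per-token heuristic, which is only an approximation; real token counts depend on the tokenizer.

```python
# Rough check of whether a corpus fits in a 1M-token context window.
# The 4-chars-per-token ratio is a heuristic, not an exact tokenizer count.

CONTEXT_WINDOW = 1_000_000  # 1M-token context window
CHARS_PER_TOKEN = 4         # rough heuristic

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], reserve: int = 128_000) -> bool:
    """Check whether the combined texts fit, reserving room for the reply."""
    total = sum(estimated_tokens(t) for t in texts)
    return total <= CONTEXT_WINDOW - reserve

files = ["print('hello')\n" * 1000, "# docs\n" * 5000]
print(fits_in_context(files))  # True: a small corpus easily fits
```

Reserving output headroom up front avoids discovering mid-request that the reply has nowhere to go.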

128K Output Tokens

Opus 4.6 doubles the maximum output from 64K to 128K tokens, making it far more practical for generating long-form content, detailed code, or comprehensive analyses in a single response.

Adaptive Thinking

Gone are the days of manually tuning extended thinking budgets. Opus 4.6 introduces adaptive thinking, where Claude dynamically decides when and how deeply to reason. You can set one of four effort levels — low, medium, high (default), or max — and let the model allocate its reasoning budget accordingly.
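In practice, selecting an effort level might look like the sketch below. The `effort` field name, its placement under `thinking`, and the model id are assumptions based on the four levels described above, not a confirmed API schema; check the official docs before relying on them.

```python
# Sketch of assembling a request that selects a reasoning effort level.
# The "effort" field and "claude-opus-4-6" model id are assumptions,
# not confirmed API syntax.

VALID_EFFORT_LEVELS = {"low", "medium", "high", "max"}

def build_request(prompt: str, effort: str = "high") -> dict:
    """Assemble a hypothetical messages request with an effort level."""
    if effort not in VALID_EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {sorted(VALID_EFFORT_LEVELS)}")
    return {
        "model": "claude-opus-4-6",      # hypothetical model id
        "max_tokens": 4096,
        "thinking": {"effort": effort},  # assumed field name
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this design doc.", effort="max")
print(req["thinking"])  # {'effort': 'max'}
```

The point is the shape of the integration: one knob per request, defaulting to high, instead of a per-task token budget.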

Interleaved Thinking

In agentic workflows, Claude can now think between tool calls. Rather than planning everything upfront and then executing, the model reasons at each step, adjusting its approach based on intermediate results. This makes multi-step tasks significantly more reliable.
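A toy loop illustrates the difference: instead of committing to a full plan upfront, the agent picks its next tool after seeing each result. The tools and decision rule here are stand-ins, not the real model loop.

```python
# Toy illustration of "thinking between tool calls": after every tool
# result, the agent re-decides its next step rather than following a
# plan fixed at the start.

def run_tool(name: str, arg: int) -> int:
    tools = {"double": lambda x: x * 2, "increment": lambda x: x + 1}
    return tools[name](arg)

def agent(target: int) -> list:
    """Reach `target` from 1, choosing each tool from the latest result."""
    value, trace = 1, []
    while value < target:
        # "Interleaved thinking": the decision uses the newest intermediate value.
        name = "double" if value * 2 <= target else "increment"
        value = run_tool(name, value)
        trace.append((name, value))
    return trace

print(agent(10))  # doubles while it can, then increments to land exactly on 10
```

An upfront plan of "double four times" would overshoot; re-deciding at each step is what keeps multi-step runs on track.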

Context Compaction

When conversations approach the context limit, Opus 4.6 automatically summarizes and replaces older context instead of simply truncating. This enables longer sustained interactions — particularly valuable for coding sessions, debugging, and research workflows that span many turns.
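The mechanism can be sketched as follows. When the estimated size of a transcript exceeds a budget, older turns are replaced by a summary rather than dropped; the trivial placeholder summarizer here stands in for the model-generated summary.

```python
# Sketch of context compaction: summarize-and-replace older turns when
# a transcript nears its token budget, instead of truncating them.

CHARS_PER_TOKEN = 4  # rough heuristic

def estimate(messages: list[str]) -> int:
    return sum(len(m) for m in messages) // CHARS_PER_TOKEN

def compact(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Replace all but the most recent turns with a summary when over budget."""
    if estimate(messages) <= budget or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(older)} earlier turns]"  # placeholder summarizer
    return [summary] + recent

history = ["turn " + "x" * 400 for _ in range(10)]
print(len(compact(history, budget=500)))  # 3: one summary plus two recent turns
```

Keeping the most recent turns verbatim preserves the context the next reply actually depends on, while the summary retains the gist of everything earlier.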

Claude Sonnet 4.6: Closing the Gap

Sonnet 4.6 is now the default model for Free and Pro users on claude.ai. What’s remarkable about this release is how close Sonnet comes to Opus-level performance — the gap between the two is the narrowest it has ever been.

Sonnet 4.6 shares the same core improvements — 1M context window, adaptive thinking, extended thinking, and interleaved thinking — all at a significantly lower price point.

Benchmark Highlights

The numbers tell a compelling story:

| Benchmark | Opus 4.6 | Sonnet 4.6 |
|---|---|---|
| SWE-bench Verified (real GitHub issues) | ~80.8% | 79.6% |
| OSWorld-Verified (computer use) | 72.7% | 72.5% |
| Terminal-Bench 2.0 (agentic coding) | #1 overall | 59.1% |
| Humanity's Last Exam | #1 overall | — |
| ARC-AGI-2 | — | 58.3% (4.3x gain) |
| BigLaw Bench (legal reasoning) | 90.2% | — |
| MRCR v2 8-needle @ 1M (long-context) | 76% | — |

A few standouts worth noting:

  • SWE-bench Verified: Sonnet 4.6 scores 79.6%, nearly matching Opus at 80.8%. For most coding tasks, the difference is negligible.
  • OSWorld: Both models score above 72% on autonomous computer use — a massive jump from the previous generation and well ahead of competing models.
  • ARC-AGI-2: Sonnet 4.6 jumped from 13.6% to 58.3%, a 4.3x improvement — the largest single-generation gain in Claude history.
  • Long-context retrieval: Opus 4.6 scores 76% on the 8-needle retrieval task at 1M context, compared to just 18.5% for Sonnet 4.5 — a roughly 4x improvement in finding information buried deep in long documents.

Pricing

Both models maintain the same pricing as their 4.5 predecessors:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Opus 4.6 | $5 | $25 |
| Sonnet 4.6 | $3 | $15 |
| Haiku 4.5 | $1 | $5 |

The 1M context window is included at standard pricing for both Opus and Sonnet — no premium tiers or surcharges.
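At these rates, per-request cost is simple arithmetic. The sketch below uses the prices from the table above; the token counts are illustrative.

```python
# Dollar cost of a single request at the per-1M-token rates listed above.

PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "opus-4.6":   (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 200K-token input with a 10K-token reply:
print(round(request_cost("opus-4.6", 200_000, 10_000), 2))    # 1.25
print(round(request_cost("sonnet-4.6", 200_000, 10_000), 2))  # 0.75
```

The same request on Sonnet costs 40% less than on Opus, which is where the high-volume savings mentioned below come from.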

When to Use Which Model

Choose Opus 4.6 when you need:

  • Maximum accuracy on complex, multi-step reasoning
  • Long-context tasks requiring precise retrieval across massive documents
  • Agentic coding workflows where reliability is paramount
  • Legal, scientific, or financial analysis demanding the highest accuracy

Choose Sonnet 4.6 when you need:

  • Strong coding and reasoning at a lower cost
  • Computer use and agentic tasks (performance is nearly identical to Opus)
  • A great balance between capability and speed
  • High-volume workloads where the 40% cost savings add up

Choose Haiku 4.5 when you need:

  • Fast, lightweight tasks like classification, summarization, or simple Q&A
  • Budget-sensitive applications at scale
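The guidance above can be encoded as a simple routing helper: Opus for accuracy-critical work, Haiku for lightweight tasks, Sonnet as the default. The task categories here are illustrative, not an official taxonomy.

```python
# Model-routing sketch based on the selection guidance above.
# Task names are illustrative stand-ins, not an official taxonomy.

def pick_model(task: str, accuracy_critical: bool = False) -> str:
    """Route a task to a model tier."""
    lightweight = {"classification", "summarization", "simple-qa"}
    if accuracy_critical:
        return "opus-4.6"    # maximum accuracy for complex reasoning
    if task in lightweight:
        return "haiku-4.5"   # fast, budget-friendly tier
    return "sonnet-4.6"      # strong default for coding and agentic work

print(pick_model("coding"))                                  # sonnet-4.6
print(pick_model("legal-analysis", accuracy_critical=True))  # opus-4.6
print(pick_model("classification"))                          # haiku-4.5
```

Starting with Sonnet as the default and escalating to Opus only where accuracy is paramount keeps costs predictable without sacrificing quality where it matters.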

What This Means for Developers

The 4.6 generation represents a shift in how developers can build with Claude:

  1. Agentic workflows are now practical. Interleaved thinking and improved tool use mean Claude can handle complex, multi-step tasks with far fewer errors. Terminal-Bench and OSWorld scores confirm this.

  2. Context is no longer a bottleneck. With 1M tokens at standard pricing and automatic context compaction, you can build applications that reason over entire repositories, document collections, or conversation histories.

  3. The value tier is exceptionally strong. Sonnet 4.6 performs within 1-2% of Opus on most coding and computer use benchmarks. For many production workloads, it’s the smart default.

  4. Adaptive thinking simplifies integration. Instead of tuning thinking budgets per task, you set an effort level and let the model handle the rest. This reduces prompt engineering overhead and makes performance more consistent.

The Bottom Line

Claude Opus 4.6 and Sonnet 4.6 deliver the largest capability jump in a single Claude generation. The 1M context window, adaptive thinking, and interleaved reasoning aren’t just spec-sheet improvements — they fundamentally change what you can build.

Opus 4.6 sets new benchmarks across the board. Sonnet 4.6 gets remarkably close at 60% of the price. And with Haiku 4.5 still available for lightweight tasks, the full Claude lineup covers every use case from budget to frontier.

The models are available now through the Claude API, claude.ai, and partner platforms including Amazon Bedrock and Google Cloud Vertex AI.