MiniMax M2.7: The Self-Evolving AI Model That Rivals Claude and GPT at a Fraction of the Cost

What happens when you let an AI model participate in its own evolution? MiniMax just answered that question with M2.7 — a next-generation flagship text model that doesn’t just execute tasks, but actively improves itself through real-world interaction. Built on the OpenClaw (Agent Harness) framework, M2.7 autonomously ran over 100 rounds of scaffold optimization during training, achieving a 30% performance improvement on internal evaluations — without human intervention.

The result is a model that matches or approaches Claude Opus 4.6 and GPT-5 on the hardest coding and agent benchmarks, runs 3x faster, and costs a fraction of the price. Here’s everything you need to know.

What Makes M2.7 Different: Self-Improvement

Most AI models are trained, evaluated, and deployed as static artifacts. M2.7 breaks that pattern: it is MiniMax's first model to participate deeply in its own evolution, updating its own memory, building its own training skills, and improving its own learning process.

During development, M2.7 autonomously:

  • Executed 100+ iteration cycles optimizing its own scaffold performance
  • Managed 30–50% of reinforcement learning research workflows independently
  • Participated in 22 ML competitions, earning 9 gold medals across its best trials
  • Achieved a 66.6% medal rate on MLE-Bench Lite, tying with Google’s Gemini 3.1

This isn’t just a training technique — it’s a signal of where AI development is headed. Models that can evaluate and improve their own performance represent a fundamentally different paradigm from static train-and-deploy cycles.

Benchmark Performance: Punching Way Above Its Weight

M2.7 activates only 10 billion parameters — making it the smallest model in the Tier-1 performance class. Despite this efficiency, it competes head-to-head with models orders of magnitude larger.

Software Engineering

| Benchmark | M2.7 | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|
| SWE-Pro | 56.22% | ~57% | 56.2% |
| SWE-bench Verified | 78% | 55% | |
| VIBE-Pro (end-to-end delivery) | 55.6% | | |
| Terminal Bench 2 | 57.0% | | |

M2.7 nearly matches Opus on SWE-Pro and significantly outperforms it on SWE-bench Verified (78% vs 55%). On VIBE-Pro — which measures end-to-end project delivery rather than isolated patches — M2.7 scores 55.6%, demonstrating real-world engineering capability beyond benchmark-specific optimization.

Professional Productivity

| Benchmark | M2.7 | Notes |
|---|---|---|
| GDPval-AA (Office tasks) | ELO 1495 | Highest among open-source models |
| Skill Adherence (40+ complex tasks) | 97% | |
| MM Claw (Agent evaluation) | 62.7% | Approaching Sonnet 4.6 |

M2.7’s ELO score of 1495 on GDPval-AA — which evaluates real-world office productivity tasks across Excel, PowerPoint, Word, and complex document editing — is the highest among all open-source models. The 97% skill adherence rate across 40+ complex tasks (each exceeding 2,000 tokens) demonstrates reliable execution on the kind of intricate, multi-step workflows that trip up most models.

Machine Learning Research

| Benchmark | M2.7 | Gemini 3.1 | GPT-5.4 |
|---|---|---|---|
| MLE-Bench Lite (medal rate) | 66.6% | 66.6% | 71.2% |

M2.7 ties with Google’s Gemini 3.1 and approaches GPT-5.4’s state-of-the-art on machine learning competition benchmarks — a remarkable result for a model with only 10B activated parameters.

Speed and Pricing: The Real Disruption

Raw benchmark scores tell one story. Cost-adjusted performance tells a completely different one.

| Metric | M2.7 | Claude Opus 4.6 | GPT-5 |
|---|---|---|---|
| Speed | 100 TPS | ~33 TPS | ~40 TPS |
| Input cost | $0.30/M tokens | $15/M tokens | $10/M tokens |
| Output cost | $1.20/M tokens | $75/M tokens | $30/M tokens |
| Blended cost (with cache) | $0.06/M tokens | | |
| Activated parameters | 10B | | |

M2.7 is 50x cheaper than Opus on input and 60x cheaper on output — while matching it on SWE-Pro. At 100 tokens per second, it’s also 3x faster. With automatic cache optimization, the effective blended cost drops to just $0.06 per million tokens.

For teams running high-volume agent workloads, coding assistants, or document processing pipelines, this cost structure changes the economics of what’s feasible.
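To make those economics concrete, here is a back-of-envelope cost comparison using the published per-token rates. The workload size (50M input / 10M output tokens per month) is a hypothetical example, and caching discounts are ignored:

```python
# Back-of-envelope cost comparison from the published rates above.
# Prices are USD per million tokens; the workload size is hypothetical.
PRICES = {
    "MiniMax M2.7":    {"input": 0.30, "output": 1.20},
    "Claude Opus 4.6": {"input": 15.00, "output": 75.00},
    "GPT-5":           {"input": 10.00, "output": 30.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one workload, ignoring caching discounts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical agent workload: 50M input + 10M output tokens per month.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 50_000_000, 10_000_000):,.2f}/month")
# MiniMax M2.7: $27.00/month
# Claude Opus 4.6: $1,500.00/month
# GPT-5: $800.00/month
```

At this volume the monthly gap versus Opus is roughly 55x, consistent with the per-token ratios in the table.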

Core Capabilities

Agent-Centric Workflows

M2.7 is built from the ground up for agentic use cases. The OpenClaw framework enables:

  • Continuous self-improvement in real-world environments
  • Multi-agent collaboration with native capabilities in role boundaries, adversarial reasoning, and protocol adherence
  • Active participation in execution and decision-making rather than passive response generation
  • Complex environment interaction with 97% skill adherence on intricate multi-step tasks

Software Engineering

Beyond benchmarks, M2.7 handles real-world engineering workflows:

  • End-to-end project delivery (not just isolated code patches)
  • Log analysis and debugging
  • Code security review
  • Machine learning pipeline development

Office Suite Excellence

Enhanced capabilities for professional productivity:

  • Complex Excel operations and formula generation
  • PowerPoint creation and editing
  • Word document manipulation
  • Multi-turn modification support — iterate on documents through conversation

Character and Emotional Intelligence

M2.7 includes enhanced identity preservation and emotional intelligence capabilities, providing a foundation for interactive entertainment, roleplay, and character-driven applications.

Two API Variants

| Variant | Speed | Quality | Use case |
|---|---|---|---|
| M2.7 | Standard | Full quality | Production, complex tasks |
| M2.7-highspeed | Faster | Identical results | High-throughput, latency-sensitive |

Both variants produce identical results — the highspeed variant simply processes faster for latency-sensitive applications.
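A minimal sketch of what variant routing could look like in client code. The endpoint URL and the model IDs ("MiniMax-M2.7", "MiniMax-M2.7-highspeed") are illustrative assumptions, not confirmed identifiers; check the MiniMax API Platform docs for the real names:

```python
# Sketch only: the endpoint and model IDs below are assumed for illustration.
import json

API_URL = "https://api.minimax.io/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str, latency_sensitive: bool = False) -> dict:
    """Build a chat-completion payload, routing latency-sensitive traffic to
    the highspeed variant. Since quality is stated to be identical, only
    throughput requirements drive the choice."""
    model = "MiniMax-M2.7-highspeed" if latency_sensitive else "MiniMax-M2.7"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

print(json.dumps(build_request("Summarize this build log", latency_sensitive=True), indent=2))
```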

Developer Tool Compatibility

M2.7 integrates with the tools developers already use:

  • AI Coding: Claude Code, Cursor, Cline, Codex CLI, Roo Code, Kilo Code
  • Agents: OpenCode, Droid, TRAE, Grok CLI
  • Platforms: MiniMax Agent, MiniMax API Platform

OpenRoom: Interactive Agent Demo

MiniMax also open-sourced OpenRoom — an interactive agent demonstration that moves AI interaction beyond plain text into graphical environments. Most of the code was AI-generated, demonstrating M2.7’s practical coding capabilities.

M2.7 vs the Competition: Who Should Use What

| If you need… | Best choice |
|---|---|
| Maximum benchmark ceiling regardless of cost | Claude Opus 4.6 |
| Best cost-adjusted coding performance | MiniMax M2.7 |
| Fastest inference speed | MiniMax M2.7 (100 TPS) |
| High-volume agent workloads | MiniMax M2.7 (50x cheaper) |
| Office productivity automation | MiniMax M2.7 (highest GDPval-AA ELO) |
| Established ecosystem and integrations | Claude or GPT |
| Self-improving agent capabilities | MiniMax M2.7 (OpenClaw) |

Try M2.7 on WaveSpeedAI

WaveSpeedAI provides access to MiniMax M2.7 alongside hundreds of other AI models through a unified platform. Whether you’re building coding agents, document processing pipelines, or interactive applications, M2.7’s combination of Tier-1 performance and fraction-of-the-cost pricing makes it the most efficient choice for production workloads.

Try MiniMax M2.7 on WaveSpeedAI →

No subscriptions. No cold starts. Pay only for what you use.

The Bottom Line

MiniMax M2.7 isn’t just another model release — it’s a proof of concept for self-evolving AI. A model with only 10B activated parameters matching Opus and GPT-5 on the hardest engineering benchmarks, while running 3x faster at 50x lower cost, represents exactly the kind of disruption that reshapes how teams build with AI.

The question isn’t whether M2.7 is good enough. It’s whether you can justify paying 50x more for marginal gains.
