MiniMax M2.7: The Self-Evolving AI Model That Rivals Claude and GPT at a Fraction of the Cost
What happens when you let an AI model participate in its own evolution? MiniMax just answered that question with M2.7 — a next-generation flagship text model that doesn’t just execute tasks, but actively improves itself through real-world interaction. Built on the OpenClaw (Agent Harness) framework, M2.7 autonomously ran over 100 rounds of scaffold optimization during training, achieving a 30% performance improvement on internal evaluations — without human intervention.
The result is a model that matches or approaches Claude Opus 4.6 and GPT-5 on the hardest coding and agent benchmarks, runs 3x faster, and costs a fraction of the price. Here’s everything you need to know.
What Makes M2.7 Different: Self-Improvement
Most AI models are trained, evaluated, and deployed as static artifacts. M2.7 breaks that pattern. It is MiniMax's first model that deeply participates in its own evolution: updating its own memory, building its own training skills, and improving its own learning process.
During development, M2.7 autonomously:
- Executed 100+ iteration cycles optimizing its own scaffold performance
- Managed 30–50% of reinforcement learning research workflows independently
- Participated in 22 ML competitions, achieving 9 gold medals across its best trials
- Achieved a 66.6% medal rate on MLE-Bench Lite, tying with Google’s Gemini 3.1
This isn’t just a training technique — it’s a signal of where AI development is headed. Models that can evaluate and improve their own performance represent a fundamentally different paradigm from static train-and-deploy cycles.
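To make this concrete, here is a minimal sketch of what a propose-evaluate-keep scaffold-optimization loop looks like in principle. OpenClaw's internals are not public, so every name below (evaluate, propose_edit, tool_budget) is a placeholder, not the actual framework API.

```python
import random

def evaluate(scaffold: dict) -> float:
    """Stand-in for running an internal eval suite; returns a score in [0, 1]."""
    return random.random()  # replace with a real benchmark harness

def propose_edit(scaffold: dict) -> dict:
    """Stand-in for the model proposing a change to its own scaffold."""
    edited = dict(scaffold)
    edited["tool_budget"] = max(1, edited["tool_budget"] + random.choice([-1, 1]))
    return edited

scaffold = {"prompt_template": "v1", "tool_budget": 8, "max_retries": 2}
best_score = evaluate(scaffold)

# 100+ rounds of propose -> evaluate -> keep-if-better, as described above
for _ in range(100):
    candidate = propose_edit(scaffold)
    score = evaluate(candidate)
    if score > best_score:
        scaffold, best_score = candidate, score
```

The interesting part is not the hill-climbing itself but who runs it: in M2.7's case, the model both proposes the edits and judges the results.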
Benchmark Performance: Punching Way Above Its Weight
M2.7 activates only 10 billion parameters — making it the smallest model in the Tier-1 performance class. Despite this efficiency, it competes head-to-head with models orders of magnitude larger.
Software Engineering
| Benchmark | M2.7 | Claude Opus 4.6 | GPT-5.3 Codex |
|---|---|---|---|
| SWE-Pro | 56.22% | ~57% | 56.2% |
| SWE-bench Verified | 78% | 55% | — |
| VIBE-Pro (end-to-end delivery) | 55.6% | — | — |
| Terminal Bench 2 | 57.0% | — | — |
M2.7 nearly matches Opus on SWE-Pro and significantly outperforms it on SWE-bench Verified (78% vs 55%). On VIBE-Pro — which measures end-to-end project delivery rather than isolated patches — M2.7 scores 55.6%, demonstrating real-world engineering capability beyond benchmark-specific optimization.
Professional Productivity
| Benchmark | M2.7 | Notes |
|---|---|---|
| GDPval-AA (Office tasks) | ELO 1495 | Highest among open-source models |
| Skill Adherence (40+ complex tasks) | 97% | — |
| MM Claw (Agent evaluation) | 62.7% | Approaching Sonnet 4.6 |
M2.7’s ELO score of 1495 on GDPval-AA — which evaluates real-world office productivity tasks across Excel, PowerPoint, Word, and complex document editing — is the highest among all open-source models. The 97% skill adherence rate across 40+ complex tasks (each exceeding 2,000 tokens) demonstrates reliable execution on the kind of intricate, multi-step workflows that trip up most models.
Machine Learning Research
| Benchmark | M2.7 | Gemini 3.1 | GPT-5.4 |
|---|---|---|---|
| MLE-Bench Lite (medal rate) | 66.6% | 66.6% | 71.2% |
M2.7 ties with Google’s Gemini 3.1 and approaches GPT-5.4’s state-of-the-art on machine learning competition benchmarks — a remarkable result for a model with only 10B activated parameters.
Speed and Pricing: The Real Disruption
Raw benchmark scores tell one story. Cost-adjusted performance tells a completely different one.
| Metric | M2.7 | Claude Opus 4.6 | GPT-5 |
|---|---|---|---|
| Speed | 100 TPS | ~33 TPS | ~40 TPS |
| Input cost | $0.30/M tokens | $15/M tokens | $10/M tokens |
| Output cost | $1.20/M tokens | $75/M tokens | $30/M tokens |
| Blended cost (with cache) | $0.06/M tokens | — | — |
| Activated parameters | 10B | — | — |
M2.7 is 50x cheaper than Opus on input and 60x cheaper on output — while matching it on SWE-Pro. At 100 tokens per second, it’s also 3x faster. With automatic cache optimization, the effective blended cost drops to just $0.06 per million tokens.
For teams running high-volume agent workloads, coding assistants, or document processing pipelines, this cost structure changes the economics of what’s feasible.
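To see what the rate table means in practice, here is a back-of-envelope sketch pricing a hypothetical workload of 10 million input and 2 million output tokens at the list rates above (cache discounts excluded):

```python
# Rates from the table above, in $ per million tokens
PRICES = {
    "MiniMax M2.7":    {"input": 0.30,  "output": 1.20},
    "Claude Opus 4.6": {"input": 15.00, "output": 75.00},
    "GPT-5":           {"input": 10.00, "output": 30.00},
}

input_m, output_m = 10, 2  # millions of tokens in the hypothetical workload
for model, p in PRICES.items():
    total = input_m * p["input"] + output_m * p["output"]
    print(f"{model}: ${total:,.2f}")

# MiniMax M2.7: $5.40
# Claude Opus 4.6: $300.00
# GPT-5: $160.00
```

The same workload that costs $300 on Opus runs for under $6 on M2.7, before cache optimization brings it down further.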
Core Capabilities
Agent-Centric Workflows
M2.7 is built from the ground up for agentic use cases. The OpenClaw framework enables:
- Continuous self-improvement in real-world environments
- Multi-agent collaboration, with native support for role boundaries, adversarial reasoning, and protocol adherence (see the sketch after this list)
- Active participation in execution and decision-making rather than passive response generation
- Complex environment interaction with 97% skill adherence on intricate multi-step tasks
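As referenced in the list above, here is a minimal sketch of what a role-bounded multi-agent loop can look like. This is not the OpenClaw API; call_model is a stand-in for whatever chat-completions client you use, and the role prompts are illustrative.

```python
def call_model(system: str, messages: list[dict]) -> str:
    """Stand-in for a real chat-completions call; returns a canned reply here."""
    return f"[{system.split('.')[0]}] responding to: {messages[-1]['content'][:40]}"

PROPOSER = "You write code patches. You never review your own work."
CRITIC = "You review patches adversarially. You never write new code."

history: list[dict] = [{"role": "user", "content": "Fix the failing date parser."}]
for turn in range(4):
    # Alternate system prompts so each turn stays inside a hard role boundary
    system = PROPOSER if turn % 2 == 0 else CRITIC
    reply = call_model(system, history)
    history.append({"role": "assistant", "content": reply})
```

The point of the pattern is that role boundaries and turn protocol live in the harness, so the model's adversarial reasoning happens within explicit constraints.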
Software Engineering
Beyond benchmarks, M2.7 handles real-world engineering workflows:
- End-to-end project delivery (not just isolated code patches)
- Log analysis and debugging
- Code security review
- Machine learning pipeline development
Office Suite Excellence
Enhanced capabilities for professional productivity:
- Complex Excel operations and formula generation
- PowerPoint creation and editing
- Word document manipulation
- Multi-turn modification support: iterate on documents through conversation, as sketched below
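The multi-turn flow maps naturally onto a standard chat loop. This example assumes an OpenAI-compatible endpoint; the base URL and model identifier are placeholders, so check MiniMax's API docs for the real values.

```python
from openai import OpenAI

# Placeholder base URL and key; consult the provider's docs for real values
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

messages = [{"role": "user", "content": "Draft a one-page project status memo in Markdown."}]
draft = client.chat.completions.create(model="MiniMax-M2.7", messages=messages)
messages.append({"role": "assistant", "content": draft.choices[0].message.content})

# Multi-turn modification: refine the same document in a follow-up turn
messages.append({"role": "user", "content": "Tighten the risks section to three bullets."})
revised = client.chat.completions.create(model="MiniMax-M2.7", messages=messages)
print(revised.choices[0].message.content)
```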
Character and Emotional Intelligence
M2.7 includes enhanced identity preservation and emotional intelligence capabilities, providing a foundation for interactive entertainment, roleplay, and character-driven applications.
Two API Variants
| Variant | Speed | Quality | Use Case |
|---|---|---|---|
| M2.7 | Standard | Full quality | Production, complex tasks |
| M2.7-highspeed | Faster | Identical results | High-throughput, latency-sensitive |
Both variants produce identical results — the highspeed variant simply processes faster for latency-sensitive applications.
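In practice, switching between the two variants should be a one-line change. Here is a hedged sketch using the same OpenAI-compatible client as above; the model identifiers (MiniMax-M2.7, MiniMax-M2.7-highspeed) are assumptions, so verify them against the provider's model list.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")  # placeholders

def complete(prompt: str, latency_sensitive: bool = False) -> str:
    # Identical quality per the table above; only throughput differs
    model_id = "MiniMax-M2.7-highspeed" if latency_sensitive else "MiniMax-M2.7"
    resp = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```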
Developer Tool Compatibility
M2.7 integrates with the tools developers already use:
- AI Coding: Claude Code, Cursor, Cline, Codex CLI, Roo Code, Kilo Code
- Agents: OpenCode, Droid, TRAE, Grok CLI
- Platforms: MiniMax Agent, MiniMax API Platform
OpenRoom: Interactive Agent Demo
MiniMax also open-sourced OpenRoom — an interactive agent demonstration that moves AI interaction beyond plain text into graphical environments. Most of the code was AI-generated, demonstrating M2.7’s practical coding capabilities.
- Repository: github.com/MiniMax-AI/OpenRoom
- Live Demo: openroom.ai
M2.7 vs the Competition: Who Should Use What
| If you need… | Best choice |
|---|---|
| Maximum benchmark ceiling regardless of cost | Claude Opus 4.6 |
| Best cost-adjusted coding performance | MiniMax M2.7 |
| Fastest inference speed | MiniMax M2.7 (100 TPS) |
| High-volume agent workloads | MiniMax M2.7 (50x cheaper) |
| Office productivity automation | MiniMax M2.7 (highest GDPval-AA ELO) |
| Established ecosystem and integrations | Claude or GPT |
| Self-improving agent capabilities | MiniMax M2.7 (OpenClaw) |
Try M2.7 on WaveSpeedAI
WaveSpeedAI provides access to MiniMax M2.7 alongside hundreds of other AI models through a unified platform. Whether you’re building coding agents, document processing pipelines, or interactive applications, M2.7’s combination of Tier-1 performance and fraction-of-the-cost pricing makes it the most efficient choice for production workloads.
Try MiniMax M2.7 on WaveSpeedAI →
No subscriptions. No cold starts. Pay only for what you use.
The Bottom Line
MiniMax M2.7 isn’t just another model release — it’s a proof of concept for self-evolving AI. A model with only 10B activated parameters matching Opus and GPT-5 on the hardest engineering benchmarks, while running 3x faster at 50x lower cost, represents exactly the kind of disruption that reshapes how teams build with AI.
The question isn’t whether M2.7 is good enough. It’s whether you can justify paying 50x more for marginal gains.