On June 2, Mustafa Suleyman, CEO of Microsoft AI, took the Build 2026 stage in San Francisco and announced seven new models built entirely by Microsoft. The lineup includes MAI-Thinking-1, a reasoning model; MAI-Code-1-Flash, a coding model; MAI-Image-2.5 and its flash variant; MAI-Transcribe-1.5; and MAI-Voice-2 with a flash variant. None of them run on OpenAI.
What the numbers actually say
MAI-Thinking-1 is a mid-sized sparse Mixture of Experts model with 35 billion active parameters and a 256,000-token context window, trained entirely on commercially licensed data without distillation from any third-party model. It scores 97.0 percent on AIME 2025 and 94.5 percent on AIME 2026, benchmarks built from a competition mathematics exam that test multi-step mathematical reasoning. Microsoft said independent raters preferred MAI-Thinking-1 over Anthropic's Claude Sonnet 4.6 in blind testing, and that the model matches Claude Opus 4.6 on coding tasks in SWE-Bench Pro. Those claims come from evaluations run by Surge, Microsoft's independent human rating partner. Microsoft has published a preprint describing the evaluation methodology, but no independent lab has yet reproduced the results, so the benchmark claims remain open to challenge until they are confirmed externally.
MAI-Code-1-Flash launched June 2. Microsoft says it built the model end-to-end using clean and appropriately licensed data. The official figures are 5 billion active parameters and a score of 51 percent on SWE-Bench Pro. According to Microsoft, it outperforms Claude Haiku 4.5 across coding benchmarks with better price-to-performance. Under the new token-based billing in GitHub Copilot, MAI-Code-1-Flash is priced below Claude Haiku 4.5. The model is rolling out to approximately 10 percent of individual users as a starting point, and users who select Auto in the VS Code model picker may be routed to it.
After refining its models for the needs of consulting firm McKinsey, Microsoft was able to outperform OpenAI's GPT 5-5 with 10 times better cost efficiency, Suleyman said. That figure is projection-based and carries the usual caveats around serving-cost differentials at scale, but it puts a number on what Microsoft is building toward: a world where it routes fewer tokens to OpenAI and keeps the margin.
The rest of the family
MAI-Image-2.5 and its flash variant are Microsoft's first models to serve both text-to-image and image-to-image workloads. The flash variant ranks number two on the Arena AI leaderboard, surpassing Nano Banana 2 on image editing. MAI-Image-2.5 is already live in PowerPoint and rolling out to OneDrive. Microsoft says MAI-Transcribe-1.5 delivers state-of-the-art accuracy across 43 languages, and MAI-Voice-2 adds support for more than 15 additional languages along with new voice options.
Beyond Microsoft Foundry and its own products, the company announced that its MAI models will also become available through Fireworks AI, Baseten, and OpenRouter. MAI-Thinking-1 stays in private preview on Foundry for now. The coding model is live. Everything else is staged.
Why this matters outside the enterprise
The shift has a direct read-through for smaller operators. GitHub Copilot is the AI coding tool that most independent developers and small engineering teams actually pay for. The AI coding market has taken off of late, with developers and people without technical backgrounds using text-based prompts to produce sophisticated software. When Microsoft routes more of those sessions to its own models, which are cheaper to serve and trained with the same Copilot harnesses those users run in production, it can lower the per-token price without negotiating with OpenAI. A solo developer or two-person shop paying for Copilot seats will feel that in their bill before they read a press release about it.
MAI-Code-1-Flash was trained directly with the GitHub Copilot harnesses used in production, which let it learn how to interact with surrounding tools and systems in agentic coding tasks. Microsoft argues this makes it uniquely well suited to real-world Copilot workflows compared with other available models. That framing, a model trained on actual user behavior inside the tool rather than on curated benchmarks, is a different kind of claim from raw benchmark performance. It also means the model's behavior is shaped partly by how Copilot users actually write code, which is not something OpenAI controls.
Microsoft is attempting to play at more layers of the AI stack as OpenAI and Anthropic continue to record historic growth and push toward the public market. Seven models in one day is a specific answer to that pressure. Whether the benchmarks hold up under independent scrutiny is a different question, and one that matters more than the keynote.