One developer reported going from $29 a month to $750. Another reported $50 climbing to $3,000. Both were running normal agentic sessions against large repositories, the kind of work that looked free under Copilot's old flat-rate subscription and now costs between $0.10 and $0.30 per 1,000 tokens billed through the AI Credits system.

On June 1, 2026, Copilot completed the shift to usage-based billing, meaning every suggestion, chat completion, and code review now consumes credits against a monthly budget. For developers on the $10 Pro plan, Opus models are gone. For students, premium model access is gone. GitHub introduced a new $100-per-month Max tier with 20,000 credits — roughly $200 of usage — for sustained heavy workloads.

What the billing change actually means for your sprint budget

The bills aren't edge cases from unusual usage. They result from running normal agentic sessions with models like Claude 3.7 Sonnet or GPT-4o against large codebases, which cost between $0.10 and $0.30 per 1,000 tokens under Copilot's AI Credits system. Once baseline premium requests are exhausted, each subsequent agent action costs roughly $0.01. A heavy debugging session can execute dozens of hidden prompts, silently racking up large overage fees.

As of May 2026, GitHub paused new signups for individual Copilot plans to manage demand and serve existing customers. Existing subscribers retain access. The practical effect for someone trying to onboard a new developer today: they can't get a Pro plan at all.

The alternatives don't all solve the problem the same way. Claude Code's interactive plan has no per-token ceiling. Pro costs $20 per month, Max 5x costs $100, Max 20x costs $200 — flat rates for interactive sessions in the terminal. A billing split arriving June 15, 2026 creates a separate credit pool for programmatic and API use — the Agent SDK and pipeline runs — but manual terminal sessions remain covered under the flat rate.

Windsurf is now Devin Desktop as of June 2, 2026. Cognition retired the Windsurf brand, relaunching the IDE as Devin Desktop with the Agent Command Center as the default surface and support for the open Agent Client Protocol, so Codex, Claude Agent, OpenCode, and other ACP agents run inside it. The rename matters for teams that had Windsurf in their toolchain documentation — the product is the same binary, but the brand is gone.

Benchmark scores, speeds, and where each tool actually wins

Claude Opus 4.8, released May 29, 2026, scores 88.6% on SWE-bench Verified — the highest of any model available today. SWE-bench tests real GitHub issues against real codebases, meaning an 88.6% score represents the model correctly resolving nearly 9 in 10 benchmark software engineering tasks autonomously.

Cursor supplements Claude models with its own inference infrastructure running at 200-plus tokens per second for raw completion speed. Windsurf's headline number had been inference speed at 950 tokens per second — the fastest available before the Devin Desktop rebrand. Cursor restructured its Teams pricing in June 2026 into Standard seats at $32 per seat per month on an annual plan and new Premium seats at $96 per seat per month on an annual plan, offering 5x Standard usage.

The AGENTS.md convention turns the repo itself into the agent's onboarding guide, holding how to run tests, what style to follow, and where not to touch. Codex, Cursor, Copilot, and Windsurf all read it natively. OpenAI started it; Google, Cursor, and Sourcegraph joined; and since December 2025, it has sat under the Agentic AI Foundation at the Linux Foundation alongside MCP. Claude Code still reads its own CLAUDE.md, though the direction points toward a single instruction file that spans tools and makes agent behavior portable.

On SWE-bench Verified, leading scores now sit within a narrow band of each other as of mid-May 2026, and Cursor will happily run any of the models. The model gap that defined the tool wars in 2025 has largely closed. What hasn't closed is the billing structure gap.

What this costs a small shop running daily agents

For a small development team or solo founder using agentic sessions daily, the calculus is fairly direct. Developers who leverage continuous AI agents will inevitably push their real monthly costs into the $100 to $200 range regardless of which tool they pick at the power-user tier. The difference is whether that number is predictable. If budget predictability is the priority, Claude Code is the clearest option, using hard capacity walls instead of variable flex billing to eliminate invoice shock. It's increasingly common for developers to stack a cheap autocomplete tool with a heavy agentic engine to balance speed and deep reasoning while optimizing costs.

Annual costs for 10 developers vary significantly across tools: GitHub Copilot Business runs $2,280 per year; Kiro Pro $2,400 per year; Google Antigravity via AI Pro $2,400 per year; Cursor Teams Standard on an annual plan approximately $3,840 per year; and Devin Desktop Teams $4,800 per year. Claude Code Teams is negotiated per seat. None of those numbers include overage from agentic workloads — only Copilot's new model turns overage into an open variable.

The tool that billed developers $3,000 in June used to be the affordable one. The $20 tools are now the predictable ones. That's a meaningful inversion for any team managing a software budget by spreadsheet rather than by surprise.