Beating Opus 4.6 and coming within striking distance of gpt-5.4 is impressive! Particularly given that larger labs like Meta are struggling to catch up to OpenAI/Anthropic.
More competition among model vendors is great for developers!
Cursor is in a very tough situation right now. They don't have SOTA models (note the lack of benchmarks in the release), and they likely can't subsidize usage through cheap subscriptions the way Claude Code and OpenAI do.
I wonder what their plan is moving forward; they have been releasing a ton of random features lately.
We don't plan on reporting SWE-bench Verified, for similar reasons to OpenAI: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...