The best AI of 2025: we compare ChatGPT, Claude and Gemini

The best AI of 2025 is yet to be seen. That is the only safe way to navigate the maelstrom that the development of artificial intelligence models has become. Every few months, a contender announces a new model that is better than the previous ones, so the list of leading AI models on the market is constantly changing. The consolation is that the change is for the better. Or, at least, that is what the tests accompanying each new AI model suggest: standardised tests in which each company publishes its own results alongside those of its competitors.

We recently told you which was the best AI for creating images and the best AI for programming, and introduced you to the AI models coming out of China. Those lists may undergo slight changes following the most recent presentations by OpenAI, Google, and Anthropic, all three of which made announcements in May 2025. With the data provided by these three ‘giants’ of artificial intelligence, we can update the ranking.

The first to show its hand was OpenAI, announcing Codex, its cloud-based AI agent focused on software engineering and programming. To perform complex tasks, it uses codex-1, an optimised version of the OpenAI o3 model. Next came Google, which presented dozens of new features at its Google I/O 2025 event. One of them was an improved Gemini 2.5 Pro, its most comprehensive AI model to date, first announced in March. While Google’s announcements were still being digested, Anthropic unveiled Claude 4, the latest version of its AI model. So, which is the best AI of the first half of 2025?

The best AI of the first half of 2025

There are clearly many more rivals in the race to be the best AI. On this occasion, though, we will focus on these three, both because they are the frontrunners and because Anthropic and Google compare their models with the competition, although that comparison is increasingly limited, as we will see. OpenAI, for its part, has decided to focus on its own models and ignore the rest.

Let’s take this step by step. As we saw earlier, in mid-May, OpenAI unveiled its Codex tool, powered by its codex-1 AI model, a version of o3 trained to work with code and software. In the presentation, OpenAI showed the results of its tests, comparing codex-1 first with o3 and then with o4-mini, o3 and o1. These are among its most powerful AI models; the comparison leaves out GPT-4o, which remains ChatGPT’s default. Specifically, the test used is SWE-bench. And, of course, the winner is codex-1. It could therefore be said that, at least among OpenAI’s own models and for software tasks, it is currently the best AI of 2025.

For its part, Google presented new features in Gemini 2.5, its latest AI model, which it unveiled in an experimental version at the end of March this year. Two months later, we can now see the final Gemini 2.5 in two versions: Gemini 2.5 Flash and Gemini 2.5 Pro. While codex-1 excels in software development, Gemini 2.5 Pro performs well in mathematics, coding, and multimodal tasks, that is, tasks requiring the processing and interpretation of different types of data at the same time (text, image, video, data). To this we must add the Deep Think function, an enhanced reasoning mode for Gemini 2.5 Pro which, like OpenAI’s Deep Research mode, is aimed at more complex tasks.

No AI competition would be complete without a third contender

In the tests carried out and published on its official website, Gemini 2.5 Pro shows better results than OpenAI o3 and OpenAI o4-mini. Given that codex-1 is closely related to o3, we might infer that Gemini 2.5 Pro would also outperform codex-1, although Google’s tests do not compare the two directly. In search of more independent results, we took a look at the specialised portal LLM Stats. However, its data is not up to date, so we cannot compare Gemini 2.5 Pro with the latest improvements that have just been announced. Nevertheless, Google’s model sets its sights high in this battle to be the best AI of 2025.

Now that we have a clear picture of the OpenAI and Google tests, there is still Anthropic. It presented its new Claude 4 model shortly after Google, and its tests mention all three contenders. In software engineering, which is codex-1’s area of strength, Claude 4 shows better results in the standard SWE-bench Verified test. Specifically, the test compares Claude 4 (in its Opus and Sonnet versions, the two most powerful) with the previous Sonnet version and with the competition: codex-1, o3 and GPT-4.1 from OpenAI, and Gemini 2.5 Pro from Google.

In the rest of the tests carried out by Anthropic, Claude 4 also outperforms codex-1 and Gemini 2.5 Pro in agentic coding, that is, programming tasks carried out by an AI agent. This is the crucial point: automating the programming process and performing complex tasks without our intervention. However, Claude 4 shows weaker results in mathematics, graduate-level reasoning, and visual reasoning tests.

Determining which is the best AI of 2025 is therefore extremely difficult. This is not only because of the variety of use cases these models cover (one may be good at coding but not so good at image generation, for example), but also because of the speed at which the industry is advancing. The best AI of May 2025 may no longer hold that title in just a few weeks. What does seem clear is that, at least for now, the contest is between three contenders: Claude, Gemini, and ChatGPT.