Chinese Challenger DeepSeek AI Nears GPT-4o Performance with Efficient LLM
A formidable Chinese AI lab emerges, releasing an open-source model that challenges OpenAI and Google's top LLMs.
May 29, 2025

A new challenger has emerged in the rapidly evolving landscape of large language models, with China-based AI lab DeepSeek releasing an updated model, DeepSeek-R1-0528, that it claims rivals the performance of leading models from OpenAI and Google. The company announced that the new model, an update to its R1 series, has "significantly improved its depth of reasoning and inference capabilities."[1] This development signals intensifying competition among AI developers and highlights the increasing sophistication of models originating from China.
DeepSeek AI, founded in July 2023 and backed by Chinese hedge fund High-Flyer, has been making waves with its open-source approach and the efficiency of its model development.[2][3][4] The R1 series and its predecessors, such as DeepSeek-V2 and DeepSeek-V3, use a Mixture-of-Experts (MoE) architecture.[5][6][7] This design includes a large number of total parameters but activates only a fraction of them for any given token, leading to more efficient training and inference.[5][6][7] For instance, DeepSeek-R1-0528 has 671 billion total parameters, of which roughly 37 billion are activated per token during inference.[8][9] This efficiency is a key factor in DeepSeek's ability to develop powerful models at a reported fraction of the cost of some Western counterparts.[10][2][11] DeepSeek has also been transparent about its training data: DeepSeek-V2 was pretrained on 8.1 trillion tokens and DeepSeek-V3 on 14.8 trillion tokens.[12][5][6][11] The company has also released domain-focused models, such as DeepSeek Coder V2 for coding and DeepSeek-VL for vision-language tasks.[13][14][15][16][17][18][19]
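To make the MoE idea concrete, here is a minimal, illustrative PyTorch sketch of a top-k routed expert layer: a router picks a couple of experts per token, so only a small slice of the layer's total parameters does any work on a given input. The dimensions, expert count, and top-k value below are toy values chosen for illustration, not DeepSeek's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer: many experts, few active per token."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                              # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                 # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # so the parameters touched per token are a fraction of the total.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(4, 64)                # four tokens of width 64
print(layer(tokens).shape)                 # torch.Size([4, 64])

total = sum(p.numel() for p in layer.parameters())
active = layer.top_k * sum(p.numel() for p in layer.experts[0].parameters()) \
         + sum(p.numel() for p in layer.router.parameters())
print(f"total params: {total:,}, approx. active per token: {active:,}")
```

At scale, this same routing principle is what lets a model carry hundreds of billions of parameters while spending compute on only a few tens of billions per token.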
The central claim surrounding DeepSeek-R1-0528 is its near-parity with OpenAI's o3 reasoning model and Google's Gemini 2.5 Pro.[20][1] DeepSeek's Hugging Face page for R1-0528 states its overall performance is "approaching that of leading models, such as O3 and Gemini 2.5 Pro."[1] Benchmark data released by DeepSeek for R1-0528 shows significant improvements over the previous R1 version. For example, on the AIME 2025 math competition benchmark, accuracy reportedly increased from 70% to 87.5%, attributed to greater reasoning depth, with the average number of tokens used per question rising from 12K to 23K.[1] On other benchmarks, R1-0528 reportedly achieves 93.4% on MMLU-Redux (EM), 85.0% on MMLU-Pro (EM), and 81.0% on GPQA-Diamond (Pass@1).[1] In coding, it scored 73.3% on LiveCodeBench (2408-2505) (Pass@1) and earned a rating of 1930 on Codeforces-Div1.[1] Some initial community testing also suggests the updated R1 performs similarly to OpenAI's GPT-4o in style and performance, particularly in professional-style responses and its ability to self-correct through chains of reasoning.[20] One comparative analysis of scientific text categorization found that DeepSeek R1 provided more complete coverage than GPT-4o, categorizing more sentences, particularly those containing mathematical symbols, though overall category agreement between the two models was only 44.71%.[21] Another test involving a distilled 70B version of DeepSeek R1 found it outperformed Llama 3 70B and nearly matched GPT-4o in classifying error types in server logs, though GPT-4o had a slight edge in classifying severity levels.[22]
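For readers unfamiliar with the "Pass@1" notation used in several of these results: pass@k is commonly estimated with the unbiased formula introduced alongside the HumanEval benchmark, giving the probability that at least one of k sampled completions is correct. The short sketch below is a generic illustration of that metric, not DeepSeek's evaluation harness.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k completions,
    drawn from n generated samples of which c are correct, solves the task."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 16 samples per problem, 9 of them correct -> estimated pass@1 of 9/16.
print(pass_at_k(n=16, c=9, k=1))   # 0.5625
```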
The emergence of a highly capable model like DeepSeek-R1-0528 has several implications for the AI industry. First, it underscores the rapid advances being made by AI labs outside the dominant Western players, particularly in China.[2][4] This intensifies global competition, potentially accelerating the pace of innovation and driving down costs for end users. DeepSeek's emphasis on open-source models with publicly available weights can foster broader research and development within the AI community, allowing more developers and organizations to build on these advanced foundations.[3][11][23][24] The company's focus on efficient architectures like MoE also highlights a trend toward powerful AI that is less resource-intensive to train and deploy, which could democratize access to cutting-edge capabilities.[25][6][7][26] DeepSeek has also been distilling its large models into smaller yet highly performant dense models (illustrated below), which could be crucial for applications requiring lower latency and less compute.[27][28][24] The company is also actively developing vision-language models, indicating a broader ambition to compete across multiple AI modalities.[13][14][18][19]
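The article does not detail DeepSeek's distillation recipe; the snippet below is only a generic sketch of the core idea behind distilling a large "teacher" model into a smaller dense "student": the student is trained to match the teacher's output distribution via a temperature-scaled KL term blended with the usual hard-label loss. The model sizes, temperature, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for a large teacher and a small dense student (sizes are illustrative only).
vocab, d_teacher, d_student = 1000, 512, 128
teacher = nn.Sequential(nn.Embedding(vocab, d_teacher), nn.Linear(d_teacher, vocab))
student = nn.Sequential(nn.Embedding(vocab, d_student), nn.Linear(d_student, vocab))

def distillation_loss(student_logits, teacher_logits, targets, temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (match the teacher) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

tokens = torch.randint(0, vocab, (8,))      # a tiny batch of input token ids
targets = torch.randint(0, vocab, (8,))     # next-token labels
with torch.no_grad():                        # the teacher is frozen during distillation
    teacher_logits = teacher(tokens)
student_logits = student(tokens)
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()                              # gradients update only the student
print(float(loss))
```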
In conclusion, the introduction of DeepSeek-R1-0528 marks another significant milestone in the development of large language models. While claims of near-parity with industry-leading models like OpenAI's o3 and Google's Gemini 2.5 Pro will continue to be scrutinized through independent testing and real-world applications, the reported benchmark performance and the underlying efficient architecture signal that DeepSeek AI is a formidable new entrant.[20][1] The company's open-source philosophy and focus on reasoning capabilities could spur further innovation and broaden access to advanced AI technologies, ultimately reshaping the competitive dynamics of the global AI industry.[3][4][11] DeepSeek's "minor update" branding for such performance jumps also suggests a rapid iteration cycle and ambitious future developments.[29][1]
Research Queries Used
DeepSeek-R1-0528 model release
DeepSeek-R1 capabilities and benchmarks
DeepSeek-R1 vs GPT-4o performance
DeepSeek-R1 vs Gemini 1.5 Pro performance
DeepSeek AI company profile
DeepSeek Coder V2 Lite
DeepSeek LLM benchmarks HumanEval
DeepSeek LLM benchmarks MMLU
DeepSeek LLM benchmarks GPQA
DeepSeek MoE models
DeepSeek-V2 performance
DeepSeek-VL model
Sources
[3]
[5]
[6]
[7]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]