Alibaba's Qwen2.5-Max Enters Global Top 10

Published: February 12, 2025 16:33

On February 4, the latest rankings from Chatbot Arena, a widely followed evaluation platform for large AI models, showed that Alibaba's Qwen2.5-Max model has entered the global top ten for the first time, surpassing the recently popular DeepSeek-V3 and ranking ahead of top proprietary models such as o1-mini and Claude-3.5-Sonnet.

Specifically, Qwen2.5-Max ranks first in mathematics and programming, and second in handling hard prompts. The official evaluation from Chatbot Arena praises Qwen2.5-Max for its strong performance across multiple domains, particularly in specialized technical areas like programming, mathematics, and hard prompts.

The latest version, Qwen2.5-Max, uses a mixture-of-experts (MoE) architecture and was pre-trained on over 20 trillion tokens. It was then refined with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), and it excels in knowledge, programming, general abilities, and human alignment.

Both the language models and the multimodal models in the Qwen family are pre-trained on large-scale multilingual and multimodal data and fine-tuned with high-quality datasets to align more closely with human preferences. Qwen possesses a range of capabilities, including natural language understanding, text generation, visual understanding, audio processing, tool usage, role-playing, and interactive AI agent functions.

Key features of Qwen2.5 include:

- Easy-to-use decoder-based dense language models, available in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameter sizes, with both base and instruction-fine-tuned variants (where "B" stands for billion, with 72B referring to 72 billion parameters).

- Pre-trained on the latest large-scale dataset, comprising up to 18 trillion tokens.

- Significant improvements in instruction following, long-text generation (over 8K tokens), structured data comprehension (e.g., tables), and the generation of structured outputs, especially JSON.

- Enhanced adaptability to diverse system prompts, improving role-playing and background settings for chatbots.

- Supports a context length of up to 128K tokens and generates up to 8K tokens of text.

- Supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Over the past year, China's domestic large model industry has gone through several rounds of price cuts. For instance, Alibaba Cloud cut prices across its entire Tongyi Qianwen visual understanding model line by more than 80%, to as low as 0.0015 yuan per thousand tokens. ByteDance's Doubao visual understanding model charges just 0.003 yuan per thousand tokens, 85% below the prevailing industry price. Baidu has made two of Wenxin Yiyan's major models, ERNIE Speed and ERNIE Lite, available to users for free.
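To put per-thousand-token pricing in concrete terms, the following minimal sketch converts a token count into a cost in yuan. It uses only the 0.0015 yuan figure quoted above; the function name `cost_in_yuan` and the one-million-token workload are illustrative assumptions, not part of any vendor's API or billing rules.

```python
from decimal import Decimal

# Price quoted in the article for Alibaba Cloud's visual understanding model.
QWEN_VL_YUAN_PER_1K_TOKENS = Decimal("0.0015")


def cost_in_yuan(tokens: int, yuan_per_1k_tokens: Decimal) -> Decimal:
    """Return the cost of processing `tokens` tokens at a per-thousand-token price.

    Hypothetical helper for illustration; real billing may differ
    (for example, separate input and output rates).
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return Decimal(tokens) * yuan_per_1k_tokens / Decimal(1000)


if __name__ == "__main__":
    # Assumed workload of one million tokens, chosen only as an example.
    print(cost_in_yuan(1_000_000, QWEN_VL_YUAN_PER_1K_TOKENS), "yuan")
```

`Decimal` is used instead of floating point so that small per-token prices do not pick up rounding noise when scaled to large token counts.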

The rise of domestic models in China has made it clear that OpenAI is no longer the sole dominant force in the large model field. The technological capabilities of these models can now rival, and even exceed, those of international mainstream models. As noted by Chatbot Arena: "Chinese large models, represented by Qwen2.5-Max, are catching up fast." OpenAI CEO Sam Altman acknowledged the impact of China's AI rise after the launch of o3-mini, stating that it had weakened OpenAI's technological lead.