Humanity's Last Exam

February 3rd 2025 | Superintelligence Newsletter


Hey Superintelligence Fam! 👋

🚀 Humanity’s Last Exam has arrived - a groundbreaking challenge designed by 300+ organizations, spanning 100+ subjects with 3,000 expert-crafted questions that push AI to its absolute limits. As AI demolishes past benchmarks, this new test aims to measure something greater: Can AI outthink, outreason, and outperform human experts across the board? (more in the research spotlight section).

💡 This isn’t just a challenge - it’s a testament to the future. With every new leap in models, the line between human intelligence and machine capability blurs. Some call it progress, others call it inevitable - but one thing is clear: AI is racing towards a future that once belonged to science fiction.

Let’s dive into today’s edition. Reading time: 3 minutes

EU Officially Bans AI Systems with 'Unacceptable Risk' Under AI Act: The European Union is now enforcing the AI Act's prohibition on AI applications that pose "unacceptable risk," including social scoring and manipulative systems. Non-compliance may result in fines of up to €35 million or 7% of global annual turnover, whichever is higher.
Alibaba Launches Qwen2.5-Max AI Model: Alibaba introduces Qwen2.5-Max, a large-scale Mixture-of-Experts model pretrained on over 20 trillion tokens. Alibaba reports that it outperforms existing models on benchmarks such as MMLU-Pro and LiveCodeBench.
AI-Powered Phishing Attacks Target Gmail Users (2.5B Accounts at Risk): Sophisticated phishing attacks using AI-generated voices and spoofed emails are targeting Gmail's 2.5 billion users. Attackers pose as Google support to extract credentials. Users are advised to enable Google's Advanced Protection Program for enhanced security.

Humanity's Last Exam: A benchmark with 3,000 expert-crafted questions across 100+ subjects; current AI models achieve at most 9.4% accuracy, indicating substantial room for improvement. The accompanying image shows the breakdown of questions by category.
Kimi k1.5: A multimodal LLM trained with reinforcement learning, utilizing long context scaling up to 128k tokens and improved policy optimization, achieving state-of-the-art performance in reasoning tasks.
Chain-of-Agents: A framework where multiple LLM agents collaborate by processing text chunks sequentially, enhancing performance on long-context tasks like question answering and summarization.

o3-mini: OpenAI's cost-efficient reasoning model with low/medium/high modes for STEM tasks, offering 24% faster responses than o1-mini and free-tier access via ChatGPT.
Qwen2.5-Plus: Alibaba’s multimodal AI excelling in coding, math, and visual tasks, with 128K-token context and JSON data parsing. Part of the Qwen2.5 series with API compatibility.
Janus Pro 7B: DeepSeek’s open-source image generator outperforming DALL-E 3 on benchmarks like GenEval (80%). Unifies multimodal understanding/generation via split visual encoding.
OpenRouter.ai: Model aggregation platform hosting DeepSeek R1, Gemini 2.0 Flash, and Amazon Nova Micro for text/code tasks at competitive pricing.
Kimi.ai: Moonshot AI’s free multimodal model with reinforcement learning, 128K-token context, and real-time web search across 100+ sites. Outperforms GPT-4o in math/coding.

Learn How to Run DeepSeek R1 Locally – Free & GPT Level AI on Your PC (Mac, Windows & Linux)

Between December 28, 2024, and February 2, 2025, global AI ethics saw significant advancements. UNESCO’s South Asian Women4Ethical AI Chapter launched on January 19, prioritizing gender-sensitive AI frameworks and regional inclusivity. The Vatican released Antiqua et Nova on January 28, addressing AI’s ethical risks in healthcare, labor, and human relations. Regulatory cohesion efforts intensified, with the EU advancing its AI Act and Canada proposing the Artificial Intelligence and Data Act. Academic updates, like Johns Hopkins’ AI ethics curriculum revisions, highlighted breakthroughs in LLM interpretability.

AI: Who's Going To Win and Why... | Bruno Aziza

Thank you for tuning in to this week's edition of Superintelligence Newsletter! Stay connected for more groundbreaking insights and updates on the latest in AI and superintelligence.

For more in-depth articles and expert perspectives, visit our website | Have feedback? Provide feedback.

If you wish to partner with us, explore here.

Stay curious, stay informed, and keep pushing the boundaries of what's possible!

Until Next Time!

Superintelligence Team.
