When will we reach AGI?

Most AI lab leaders predict AGI within 2-5 years (by 2027-2030), though significant uncertainty remains about both timing and what "AGI" precisely means.

Timeline predictions vary widely among experts. OpenAI's Sam Altman has suggested AGI could arrive within "a few thousand days." Anthropic's Dario Amodei expects human-level AI capabilities within 2-3 years. Google DeepMind's Demis Hassabis says "a handful of years." These are the people building the systems, and they're not known for underestimating difficulty.

However, forecasting transformative AI has historically been unreliable. What researchers mean by "AGI" also varies - some focus on benchmark performance, others on economic impact, and others on autonomous capability. The consensus among frontier lab leadership points to the late 2020s, but breakthroughs or roadblocks could shift this significantly.

Our current clock position reflects both the rapid progress we're observing and the genuine uncertainty about remaining obstacles.

What are the biggest AI safety risks?

The primary concerns are the alignment problem (ensuring AI does what we intend), the race dynamics between labs, and compressed timelines that leave little room for safety research.

The alignment problem remains unsolved: current techniques for aligning AI rely on humans supervising AI behavior, but humans cannot reliably supervise systems smarter than themselves. A 2025 study found some AI models will break rules and disobey commands to avoid being shut down - behavior nobody explicitly programmed.

Race dynamics compound the problem. Multiple labs are competing to build AGI first, creating pressure to cut corners on safety. Game theory shows this creates a "race to the bottom" - everyone would benefit from more careful development, but no one wants to slow down while competitors push ahead.
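To see why, consider a stylized two-lab version of the dilemma. The sketch below is purely illustrative: the payoff numbers are our own assumptions, not estimates of any real lab's incentives, but they show how racing can be each lab's best response no matter what its competitor does.

```python
# A stylized two-lab "race to the bottom", modeled as a prisoner's dilemma.
# Payoff numbers are illustrative assumptions, not empirical estimates.

# payoffs[(my_choice, rival_choice)] -> my payoff
payoffs = {
    ("careful", "careful"): 3,  # both invest in safety: best joint outcome
    ("careful", "race"):    0,  # I slow down while my rival takes the lead
    ("race", "careful"):    4,  # I take the lead by cutting corners
    ("race", "race"):       1,  # everyone cuts corners: worst joint outcome
}

def best_response(rival_choice: str) -> str:
    """Return the payoff-maximizing choice against a fixed rival strategy."""
    return max(("careful", "race"), key=lambda mine: payoffs[(mine, rival_choice)])

# Racing is the best response whatever the rival does...
assert best_response("careful") == "race"
assert best_response("race") == "race"
# ...even though mutual caution pays each lab more than mutual racing (3 > 1).
```

The same structure explains why voluntary restraint is fragile: unless the payoffs themselves change, each lab's individual incentive keeps pointing toward racing.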

Anthropic's CEO puts the odds of "something going catastrophically wrong on the scale of human civilization" at 10-25%. These aren't fringe estimates - they come from people running major AI labs who have the most insight into current capabilities and trajectories.

Which company is closest to AGI?

OpenAI, Anthropic, and Google DeepMind are generally considered the frontrunners, with Chinese labs like DeepSeek rapidly closing the gap. No single lab has a decisive lead.

The frontier is remarkably competitive. OpenAI has consistently led in deploying capable models and has stated they "know how to build AGI." Anthropic's Claude models demonstrate strong reasoning and safety properties. Google DeepMind brings massive resources and research depth, particularly in areas like AlphaFold that show potential for scientific breakthroughs.

Chinese labs have made remarkable progress, with DeepSeek matching or exceeding Western models on many benchmarks at a fraction of the cost. Alibaba, ByteDance, and Baidu are also investing heavily.

xAI (Elon Musk's company) is moving fast with significant compute resources. Meta continues to push open-source frontiers. Microsoft's partnership with OpenAI gives it unique positioning.

We track 14 major labs and their flagship models. Capability leadership shifts frequently as new models are released, and the gap between the leaders remains narrow.

How fast is AI progress accelerating?

AI capabilities are improving faster than most predicted, with China catching up to the US on key benchmarks within months rather than years.

Stanford's 2025 AI Index documents rapid acceleration. Benchmarks that once looked years from saturation are being surpassed within months. Chain-of-thought reasoning, mathematical problem-solving, and coding capabilities have all improved dramatically.

Scaling laws - empirical relationships between compute, data, model size, and capability - continue to hold, suggesting predictable improvement with more resources. Major labs are investing billions in compute infrastructure, betting on continued scaling.
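To make the idea concrete, here is a minimal sketch of the functional form these laws take, using approximately the constants fitted in the Chinchilla paper (Hoffmann et al., 2022). The specific numbers are illustrative only and differ across model families; the point is that predicted loss falls smoothly as parameters and data grow.

```python
# A minimal sketch of a Chinchilla-style scaling law:
#   L(N, D) = E + A / N**alpha + B / D**beta
# where N is parameter count and D is training tokens. The constants below are
# roughly those fitted by Hoffmann et al. (2022); treat them as illustrative.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted training loss under a simple power-law fit."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling parameters and data by 10x each lowers the predicted loss:
print(predicted_loss(7e10, 1.4e12))   # ~70B params, ~1.4T tokens -> ~1.94
print(predicted_loss(7e11, 1.4e13))   # 10x more of each          -> ~1.81
```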

Emergent capabilities add uncertainty. As models grow, they develop unexpected abilities not explicitly trained for. This makes it difficult to predict what the next generation of models will be able to do.

The rate of progress has surprised even researchers. What looked like decade-long runways now look like years. Whether this continues or hits fundamental limits remains one of the most important open questions.

Is AI development dangerous?

Leading AI researchers, including those building these systems, consider advanced AI development one of the most significant risks facing humanity. Risk estimates vary but are non-trivial.

This isn't a fringe concern. Geoffrey Hinton, often called the "godfather of AI," left Google partly to speak more freely about AI risks. Yoshua Bengio, another deep learning pioneer, has called for international governance. Hundreds of AI researchers signed statements highlighting extinction-level risks.

The concern isn't that current AI is dangerous, but that the trajectory leads to systems we may not be able to control. Stuart Russell, co-author of the standard AI textbook, recently said: "We are spending hundreds of billions of dollars to create superintelligent AI systems over which we will inevitably lose control."

At the same time, AI could be enormously beneficial - accelerating scientific discovery, solving complex problems, and expanding human capability. The question isn't whether to develop AI, but whether we can do so safely.

Our goal at MidnightAI is not to tell you what to think, but to provide accurate information about where we are so you can form your own views.

What benchmarks measure AI progress?

We track performance across standardized benchmarks like MMLU (knowledge), HumanEval (coding), GSM8K (math), and others that indicate progress across seven capability domains.

Key benchmarks include:

**Reasoning**: MMLU (Massive Multitask Language Understanding), ARC (AI2 Reasoning Challenge), and competition mathematics problems.

**Coding**: HumanEval, MBPP, and SWE-bench for real-world software engineering tasks.

**Mathematics**: GSM8K, the MATH dataset, and competition-level problems.

**Science**: Scientific reasoning benchmarks, medical exams (MedQA), legal reasoning (LSAT).

**Agency**: Tool use, multi-step planning, and real-world task completion rates.

**Multimodal**: Image understanding, video comprehension, audio processing capabilities.

We weight these domains by estimated importance to AGI: Reasoning (25%), Agency (20%), Coding (15%), Language (10%), Multimodal (10%), Science (10%), Robotics (10%). This weighted average informs our clock position.
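As a concrete illustration of the weighting, the sketch below combines hypothetical per-domain scores into a single number. Only the weights come from the methodology above; the example scores (and any mapping from the result onto a clock position) are placeholders.

```python
# A minimal sketch of the weighted domain average described above.
# Weights are from our methodology; the example scores are hypothetical.

WEIGHTS = {
    "reasoning": 0.25, "agency": 0.20, "coding": 0.15, "language": 0.10,
    "multimodal": 0.10, "science": 0.10, "robotics": 0.10,
}

def overall_score(domain_scores: dict) -> float:
    """Weighted average of per-domain scores, each on a 0-1 scale."""
    return sum(WEIGHTS[d] * domain_scores.get(d, 0.0) for d in WEIGHTS)

# Hypothetical domain scores, purely for illustration:
scores = {"reasoning": 0.80, "agency": 0.55, "coding": 0.85, "language": 0.90,
          "multimodal": 0.70, "science": 0.65, "robotics": 0.30}
print(f"overall: {overall_score(scores):.2f}")  # 0.69 with these inputs
```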

Have more questions?

We track AI progress 24/7 and update our analysis as new developments emerge. Explore our research or subscribe to our newsletter for weekly updates.