A comprehensive timeline of major AI achievements and predictions for future developments in artificial intelligence
First mass production of AI-powered humanoid robots for commercial use
AI systems capable of conducting independent scientific research
Models achieve human-expert level on complex multi-step reasoning benchmarks
Honor's 'Lightning' robot completed the Beijing half-marathon in 50:26, faster than the human world record, marking a breakthrough in autonomous athletic performance.
Anthropic CEO meets with White House officials over national security concerns regarding Claude Mythos's autonomous vulnerability discovery capabilities.
New framework enables verification-aware speculative decoding with step-level error detection without external reward models, improving multi-step reasoning accuracy.
Breakthrough in 3D policy learning with transformer-based encoder solving training instabilities and overfitting issues.
Novel reasoning-driven framework treats sign language translation as cross-modal reasoning task with latent thought sequences.
OpenAI introduces multi-turn approach for precise GUI interaction with visual feedback and error correction.
System enabling autonomous AI agents to sustain coherent ML research progress across comprehension, implementation, and experimentation over days.
rDPO framework introduces instance-specific rubrics for fine-grained visual reasoning preference optimization in multimodal tasks.
Monthly-refreshed benchmark testing whether LLMs can find known security vulnerabilities in real repository codebases with sandboxed exploration.
Berkeley researchers demonstrate systematic ways to break top AI agent benchmarks, highlighting fundamental evaluation methodology issues.
Open-source visual web agent with transparent training data and methodology for autonomous web navigation tasks.
Anthropic introduces ClawBench, a comprehensive evaluation framework testing AI agents on 153 everyday online tasks across 144 live platforms.
Research breakthrough addressing agents' meta-cognitive deficits in arbitrating between internal knowledge and external tool usage.
Meta introduces Muse Spark, positioning it as a step toward personal superintelligence capabilities for individual users.
Research breakthrough allows full-precision training of 100+ billion parameter language models on a single GPU, dramatically reducing training costs.
Anthropic releases specialized Claude model variant focused on advanced cybersecurity capabilities with detailed system card documentation.
Google releases Gemma-4 series with any-to-any and image-text-to-text capabilities across multiple parameter sizes (4B-31B).
Claude successfully wrote a complete FreeBSD remote kernel RCE exploit with root shell, demonstrating advanced cybersecurity capabilities.
Original Alibaba Qwen technical lead publishes influential essay on transitioning from reasoning to agentic thinking paradigms.
New benchmark designed to measure artificial general intelligence through novel reasoning tasks, addressing limitations of previous AI evaluation methods.
First multimodal framework combining video, tactile sensing, and action prediction for contact-rich physical interactions.
OpenAI introduces framework to accelerate multimodal agent reasoning through speculative perception and planning.
First AI system confirmed to solve an open mathematical research problem, marking breakthrough in AI mathematical reasoning capabilities.
First demonstration of a 400 billion parameter language model running natively on a mobile device, showcasing dramatic advances in on-device AI.
Researchers discover discrete 3-4 layer 'reasoning circuits' in transformers that can be duplicated to dramatically improve logical deduction performance without training.
Research introduces framework enabling language models to continuously improve from real-world deployment experience rather than offline training only.
Nvidia introduces purpose-built CPU architecture specifically designed for agentic AI workloads, marking hardware specialization for autonomous agents.
Investment bank warns of imminent AI breakthrough driven by rapid computing expansion that could strain power grids and disrupt jobs globally.
Legendary programmer John Carmack publicly disputes OpenAI and other labs' aggressive AGI timelines, stating 'We Are Not on the Brink of AGI' with significant implications for industry investment.
Anthropic's Claude models now support 1 million token context windows in general availability, enabling processing of extremely long documents.
First desktop agent that learns tasks from single demonstrations across GUI apps, browsers, terminals, and messaging tools in unified sessions.
Nvidia announces major strategic shift with $26 billion investment in open-source AI models over five years, competing directly with OpenAI and other closed-source providers.
Research reveals how data correlations determine feature geometry in neural networks, extending beyond sparse uncorrelated settings.
Open-source inference engine achieves faster performance than llama.cpp, MLX, and Ollama on Apple Silicon using custom Metal shaders.
Research demonstrates that chain-of-thought reasoning substantially expands LLMs' ability to recall factual knowledge from parameters.
LLM trained on Python execution traces can predict line-by-line execution and function as a neural interpreter with debugging capabilities.
AI-powered age verification systems now estimate user ages to within 1-2 years, enabling widespread implementation of child safety laws across multiple jurisdictions.
DNA foundation model trained on 100,000+ species can identify genetic patterns across entire tree of life, published in Nature.
First trainable INT8 attention system that quantizes six of seven attention operations while preserving training performance.
Research shows GPT-5, Claude-4.5, and Qwen-3 can execute rare strategic actions while maintaining calibration, raising safety concerns.
China's AI model usage reached 4.12 trillion tokens in one week versus 2.94 trillion for the US, marking a historic shift.
Shanghai hospital launches world's first traceable AI agent system for rare disease diagnosis, published in Nature.
Department of Defense designates Anthropic as supply-chain risk amid clash over military AI partnerships, marking escalation in AI governance conflicts.
OpenAI secures record-breaking $110B funding round with major investors including SoftBank, Nvidia, and Amazon, highlighting massive AI investment scale.
Anthropic abandons a major safety commitment, marking a significant shift in AI safety policy approach from one of the leading safety-focused AI companies.
Anthropic alleges 16 Chinese AI entities systematically distilled Claude through API harvesting, raising IP protection concerns.
Google's Aletheia agent powered by Gemini 3 Deep Think autonomously solved 6 out of 10 problems in the inaugural FirstProof mathematics challenge, demonstrating advanced mathematical reasoning capabilities.
Milestones are identified through analysis of research publications, product announcements, and expert assessments. Predictions are based on current progress trajectories and capability assessments.