MidnightAI.org
Weekly Intelligence Report
Monday, January 26, 2026 - Sunday, February 1, 2026
Executive Summary
This week revealed significant cracks in the AI industry's facade of unstoppable progress. The most striking development was Apple's reported abandonment of its in-house LLM efforts in favor of partnering with Google for Siri's AI capabilities—a move that, if confirmed, signals the immense difficulty even tech giants face in developing competitive AI models. Equally sobering was the APEX benchmark's demonstration that leading AI agents succeed on only 24% of real white-collar tasks from banking, consulting, and law, providing hard evidence that workplace AI remains far from the transformative force many have claimed.
The week also highlighted critical infrastructure and safety concerns that are often overshadowed by capability announcements. A professor's loss of two years of research after disabling ChatGPT's data sharing feature exposed fundamental flaws in how AI systems handle user data persistence. Meanwhile, procurement industry data showed that while AI adoption is universal, only 11% of organizations feel ready to scale their implementations due to data quality and governance issues. These developments, combined with reports of science communication restrictions under the Trump administration, paint a picture of an AI landscape facing significant technical, organizational, and political headwinds that contrast sharply with the optimistic narratives often promoted by AI companies.
Key Developments
Apple Abandons In-House AI Development for Google Partnership
Apple has reportedly scrapped years of internal LLM development, opting instead to use Google's Gemini model for next-generation Siri features. Under the reported arrangement, the partnership would leverage Apple's private cloud infrastructure while relying on Google's AI capabilities.
If confirmed, this represents a major strategic shift for Apple and validates the extreme difficulty and cost of developing competitive LLMs, even for the world's most valuable company. It also concentrates AI power further among a small number of players.
AI Agents Achieve Only 24% Success Rate on Real Office Tasks
Mercor's APEX-Agents benchmark tested leading AI models on actual white-collar tasks from banking, consulting, and law firms, revealing a 76% failure rate. This contrasts sharply with high scores on academic benchmarks.
The result provides concrete evidence that current AI agents are far from replacing knowledge workers, despite industry claims. The gap between academic benchmark scores and real-world performance points to fundamental limitations in current AI systems.
Academic Loses Two Years of Research Due to ChatGPT Data Handling
A University of Cologne professor reports losing all research materials stored in ChatGPT after disabling the data authorization feature, including grant applications, paper revisions, and course materials accumulated over two years.
The incident exposes critical flaws in how AI systems handle user data and shows the risks of using consumer AI tools for professional work. It also highlights the lack of proper data persistence and user control in current AI interfaces.
Capability Progress
Language
+2 pts: Despite incremental improvements in benchmarks, real-world performance gaps and Apple's reported pivot suggest fundamental challenges remain in achieving reliable language AI.
- Apple reportedly abandoning internal LLM development (announced)
- AI agents demonstrate only 24% success on real office tasks (verified)
Science
+5 pts: Progress in medical imaging applications continues, but concerns about AI reliability in scientific domains grow with documented proof fabrication issues.
- AI-powered CT scan standardization project launches (announced)
- Research on AI-generated fake mathematical proofs (verified)
Multimodal
+1 pts: Solid research progress in evaluation frameworks and specific applications, though no major capability breakthroughs demonstrated.
- VisGym benchmark introduces 17 environments for testing (verified)
- SyncLight enables consistent multi-view relighting (verified)
Robotics
+1 pts: Multiple announcements but limited verified progress; focus remains on demonstrations rather than practical deployment.
- Chinese robots announced for Spring Festival performance (announced)
- Segway plans robotics integration in vehicles (announced)
Company Activity
Apple's reported abandonment of internal LLM development in favor of partnering with Google marks a potential strategic pivot. While unconfirmed, Bloomberg's reporting suggests the company struggled to match competitors' AI capabilities despite significant investment. The claimed iOS 26.4 beta timeline for new Siri features remains to be verified.
Google stands to gain a major strategic win if the Apple partnership materializes, extending Gemini's reach to iOS devices. However, neither company has confirmed the deal. Google continued its research output but made no major announcements this week.
OpenAI had a relatively quiet week with no major announcements. The company drew negative attention from the ChatGPT data loss incident affecting an academic user, which highlighted ongoing challenges with data persistence and user control in its consumer products.
Emerging Trends
1. Reality Check on AI Agent Capabilities (85% confidence)
  - APEX benchmark shows 76% failure rate on real tasks (verified)
  - Only 11% of companies ready to scale AI implementations (verified)
  - Growing documentation of AI limitations and failures (verified)
2. Consolidation of AI Development Among Tech Giants (75% confidence)
  - Apple reportedly abandoning internal LLM efforts (announced)
  - Smaller players struggling to compete on foundation models (observed)
  - Increasing partnership announcements (announced)
3. Growing Scrutiny of AI Reliability in Professional Contexts (80% confidence)
  - Academic loses research in ChatGPT (verified)
  - AI faking mathematical proofs documented (verified)
  - Procurement struggles with AI scaling (verified)
Looking Ahead
- Verification of Apple-Google AI partnership and its implications for the competitive landscape
- Whether iOS 26.4 beta actually includes new Gemini-powered Siri features as reported
- Impact of potential federal science communication restrictions on AI research transparency
- Real-world performance data from Chinese humanoid robots at Spring Festival broadcast
- Industry response to APEX benchmark revealing significant AI agent limitations