MidnightAI.org
Monday, January 26, 2026 - Sunday, February 1, 2026
This week revealed significant cracks in the AI industry's facade of unstoppable progress. The most striking development was Apple's reported abandonment of its in-house LLM efforts in favor of partnering with Google for Siri's AI capabilities, a move that, if confirmed, signals how difficult it is even for tech giants to develop competitive AI models. Equally sobering was the finding from Mercor's APEX-Agents benchmark that leading AI agents succeed on only 24% of real white-collar tasks from banking, consulting, and law, evidence that workplace AI remains far from the transformative force many have claimed.
The week also highlighted critical infrastructure and safety concerns that are often overshadowed by capability announcements. A professor's loss of two years of research after disabling ChatGPT's data sharing feature exposed fundamental flaws in how AI systems handle user data persistence. Meanwhile, procurement industry data showed that while AI adoption is universal, only 11% of organizations feel ready to scale their implementations due to data quality and governance issues. These developments, combined with reports of science communication restrictions under the Trump administration, paint a picture of an AI landscape facing significant technical, organizational, and political headwinds that contrast sharply with the optimistic narratives often promoted by AI companies.
Apple reportedly scraps years of internal LLM development efforts, opting instead to use Google's Gemini model for next-generation Siri features. The partnership will leverage Apple's private cloud infrastructure while relying on Google's AI capabilities.
If confirmed, this represents a major strategic shift for Apple and validates the extreme difficulty and cost of developing competitive LLMs, even for the world's most valuable company. It also concentrates AI power further among a small number of players.
Mercor's APEX-Agents benchmark tested leading AI models on actual white-collar tasks from banking, consulting, and law firms, revealing a 76% failure rate. This contrasts sharply with high scores on academic benchmarks.
The result provides concrete evidence that current AI agents are far from replacing knowledge workers, despite industry claims. The gap between academic benchmark scores and real-world performance highlights fundamental limitations in current AI systems.
A University of Cologne professor reports losing all research materials stored in ChatGPT after disabling the data authorization feature, including grant applications, paper revisions, and course materials accumulated over two years.
The incident exposes critical flaws in how AI systems handle user data and shows the risks of using consumer AI tools for professional work. It also highlights the lack of reliable data persistence and user control in current AI interfaces.
Despite incremental improvements in benchmarks, real-world performance gaps and Apple's reported pivot suggest fundamental challenges remain in achieving reliable language AI
Progress in medical imaging applications continues, but concerns about AI reliability in scientific domains grow with documented proof fabrication issues
Solid research progress in evaluation frameworks and specific applications, though no major capability breakthroughs demonstrated
Multiple announcements but limited verified progress; focus remains on demonstrations rather than practical deployment
Apple's reported abandonment of internal LLM development in favor of partnering with Google marks a potential strategic pivot. Bloomberg's reporting, though not officially confirmed, suggests the company struggled to match competitors' AI capabilities despite significant investment. The claimed iOS 26.4 beta timeline for new Siri features remains to be verified.
Google stands to gain a major strategic win if the Apple partnership materializes, extending Gemini's reach to iOS devices. However, the deal remains unconfirmed by either company. Google continues its research output but made no major announcements this week.
It was a relatively quiet week for OpenAI, with no major announcements. The company faced negative attention from the ChatGPT data loss incident affecting an academic user, which highlights ongoing challenges with data persistence and user control in its consumer products.