METR trendline of Long Tasks Agent Capabilities breaks down immediately after publication.
More: smol.ai
OpenAI releases GDPVal, which assesses frontier models as close to parity (50%) with human expert performance, particularly in Software Development but also in other domains. Later models like GPT 5.2 Pro report 74.1%, exceeding human experts.
More: smol.ai
Independent Austrian developer Pete Steinberger creates OpenClaw over Christmas using Codex; the project overtakes some of the most popular open source projects, and he joins OpenAI.
More: sama
SemiAnalysis estimates Claude Code makes up 4% of Public Commits on GitHub in Feb 2026, trending to 25-50% by year end.
More: latent.space
Google DeepMind trains an "advanced Gemini model with Deep Think" that is officially certified to achieve Gold performance at the International Math Olympiad, done purely in token space under human rules (whereas 2024's Silver needed the AlphaProof and AlphaGeometry models and over 60 hours). The same model later wins IOI, ICPC and IOAA gold.
More: deedy deedy