The Unsettling Pace of AI Progress: A Physics PhD’s Perspective
As a physicist and curious bystander to the artificial intelligence (AI) revolution, I find myself in an increasingly familiar state of awe and unease. While I am not an AI expert, recent developments have left me questioning my prior assumptions about the timeline of AI progress.
The jump from OpenAI's o1-mini, o1-preview, and o1-full models to o3 has been nothing short of staggering. We're witnessing performance improvements I hadn't expected to see for another 2-5 years, particularly the leap from roughly 25-30% to 76% on the ARC-AGI benchmark (and 88% at an exorbitant compute cost). This isn't incremental progress; it's like watching a toddler's cognitive abilities develop into a teenager's in a matter of months.
My perspective on AI capabilities changed with OpenAI's o1 series, released in mid-September 2024. As someone who solves mathematics and physics problems for fun, I found myself humbled by these models' abilities. Problems that would take me hours to work through were aced by o1 in minutes. The skeptic in me wanted to attribute this to sophisticated pattern matching, but that argument is starting to collapse.
Perhaps most shocking to me is o3's performance on the FrontierMath benchmark, released in November 2024. While previous state-of-the-art models struggled to reach 2%, o3 broke the grading curve at 25%. A tenfold improvement over the prior state of the art isn't just progress; it's a paradigm shift. These are mathematical problems that challenge the world's best mathematicians, problems that demand days of deep contemplation. Even Terence Tao, 2006 Fields Medalist and, in my opinion, the greatest living mathematician, said that these problems "are extremely challenging…I think they will resist AIs for several years at least." (So what is one to think when an AI makes this kind of progress in under two months?) These are no longer simple language models performing next-token prediction; this is new territory that even those of us with non-AI technical backgrounds struggle to fully comprehend.
What worries me most isn't just the capabilities these models demonstrate but their peculiar limitations. How can a model that impresses Nobel laureates and Fields Medalists simultaneously fail at tasks a child could solve? This inconsistency, along with the fundamental lack of interpretability in these models, raises serious concerns about their large-scale deployment.
When ChatGPT launched in November 2022 (powered by GPT-3.5), I talked with friends about how it was just the start of what was to come. Yet I doubt any of us truly grasped the speed at which these models would improve. As a recently minted (June 2023) Physics PhD, I've spent years learning and thinking about complex systems, but the rapid evolution of AI is something I had never conceived of outside of Terminator movies and Avengers: Age of Ultron.
The deployment of these AI systems feels inevitable. While I'll continue my AI testing and analysis on YouTube, I do so with growing uncertainty about my ability to fully grasp the implications. Perhaps my concerns are unfounded, and part of me hopes they are. But in an era of such unprecedented technological advancement, I believe voicing these concerns is not just justified but necessary.
The question is not whether these systems will reshape our world (I think they already are) but how prepared we are for the transformation they'll bring. How can we adapt to such a rapid pace of development when we humans evolved on timescales of tens of thousands to millions of years? I may not have all the answers, but I know enough to understand that this is a conversation we need to be having now, not later.