The International Mathematical Olympiad is not just a terrifying sequence of words for someone as maths-blind as me, but also a notoriously challenging world championship mathematics competition for high school students from over 100 different countries. Each year, students gather in a chosen host country to show off their mathematical prowess, each aiming to solve problems that would make the rest of us cower in fear.
Google DeepMind has announced that two of its AI systems, AlphaProof and AlphaGeometry 2, took on this year’s contest questions as a combined system. The AI’s solutions were scored by former gold medallists Professor Sir Timothy Gowers and Dr Joseph Myers, the latter of whom is Chair of the IMO 2024 Problem Selection Committee itself.
Not only did the AI chalk up a combined score of 28 out of 42, one point short of the 29 required for a gold medal, but it also achieved a perfect score on the competition’s hardest problem (via Ars Technica). Just as well really, as two combinatorics problems remained unsolved. Still, stick to what you’re good at, eh?
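For anyone wondering where those numbers come from: the IMO sets six problems, each worth seven points. With the two combinatorics problems unsolved and full marks on the other four, the arithmetic runs as follows.

```latex
6 \times 7 = 42 \ \text{points available}, \qquad
4 \times 7 = 28 \ \text{points scored}, \qquad
29 - 28 = 1 \ \text{point short of gold}
```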
There’s a slight fly in the ointment, however. In a Twitter thread, Prof Sir Timothy Gowers points out that while the AI did indeed score higher than most, it needed a lot longer than human competitors to do so. Human candidates submit their answers in two four-and-a-half-hour sessions. While the AI solved one problem within minutes, it took up to three days to solve the others.
“If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher,” wrote Gowers.
“Nevertheless, (i) this is well beyond what automatic theorem provers could do before, and (ii) these times are likely to come down as efficiency gains are made.”
“Google DeepMind have produced a program that in a certain sense has achieved a silver-medal performance at this year’s International Mathematical Olympiad. 🧵” https://t.co/DIcsYXUv97 (July 25, 2024)
It’s also not as if the AI sat down in front of a test paper and began chewing on its pencil. The problems were manually translated into Lean, a proof assistant and programming language, so the formalization of the questions, a step that would ideally be automated, was carried out by old-fashioned humans.
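If you’ve never seen Lean, here’s a toy sketch of what a formal statement and its machine-checked proof look like. To be clear, this is purely illustrative: the theorem name is made up, the statement is vastly simpler than anything at the IMO, and it is not one of the formalizations DeepMind actually used.

```lean
-- Toy example (hypothetical theorem name, not an IMO problem):
-- for all natural numbers a and b, a + b = b + a.
-- Lean accepts this only if the proof term type-checks against the statement.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Translating a real olympiad problem into this kind of precise statement is far harder, which is why that human step matters.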
Still, as the good Professor points out, what the AI has achieved here is a lot more involved and nuanced than simply brute-forcing the problems:
“We might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren’t *too* difficult—the kind of thing one can do in a couple of hours.”
“Are we close to the point where mathematicians are redundant? It’s hard to say. I would guess that we’re still a breakthrough or two short of that.”