Topic of the Month
AI's Dawn of Reason
OpenAI pulled the veil back on its latest AI model this month. It had long been rumored the company was working on a secret initiative, first called Q* and then Project Strawberry internally, to improve AI's reasoning abilities. The new o1 release—which scraps the GPT naming scheme in a product reset—is said to deliver on that promise.
In a blog post, the company wrote that o1 makes strides in reasoning-heavy areas where previous models, including its own GPT-4, have struggled. This includes marked improvement on benchmarks—often human exams repurposed for AI—that measure o1's ability to answer questions in math, science, and coding, some at a PhD level.
OpenAI achieved its breakthrough by combining reinforcement learning—an AI approach that’s yielded impressive results in game-playing—and chain-of-thought reasoning. The latter chops difficult problems into smaller, more manageable steps and follows them through to a solution. “Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses,” OpenAI wrote in its blog post. “It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working.”
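The chain-of-thought idea is easy to sketch. Here's a toy illustration only, with a made-up word problem; nothing here reflects OpenAI's actual training method. The point is simply that a hard problem gets decomposed into explicit intermediate steps, each easy to check, before committing to a final answer.

```python
# Toy sketch of chain-of-thought decomposition (illustrative only, not
# OpenAI's method): answer a word problem via explicit intermediate
# steps instead of one opaque jump to the final number.
def solve_with_steps():
    # Problem: a store has 23 apples, sells 9, then receives 2 crates of 6.
    steps = []
    remaining = 23 - 9
    steps.append(f"23 apples minus 9 sold leaves {remaining}")
    delivered = 2 * 6
    steps.append(f"2 crates of 6 adds {delivered} apples")
    total = remaining + delivered
    steps.append(f"{remaining} + {delivered} = {total}")
    return steps, total

steps, total = solve_with_steps()  # total is 26, reached in 3 checkable steps
```

Because each intermediate step can be verified on its own, this style of reasoning pairs naturally with reinforcement learning: a model can be rewarded when its steps check out and nudged to try another path when they don't.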
The model ranked in the top 11 percent of competitive coders, scored well enough to qualify for the Math Olympiad, a competition for high school students, and outperformed human PhDs on a benchmark measuring knowledge of advanced physics, biology, and chemistry. It also significantly outperformed GPT-4o in these areas. But notably, OpenAI wrote, it may not match its predecessor on tasks more strictly limited to language.
Exactly how much such benchmarks can tell us about AI's abilities, beyond showing how models compare to each other and prior generations of AI, is hotly debated. Critics say they fall short in some areas, like the quality of the test itself or whether the same or similar questions, answers, and knowledge exist online, and therefore in each model's training data. Further, if all models perform similarly on existing benchmarks, we'll need new ones. Fortunately, there are already efforts afoot to make harder, more illustrative AI tests.
Still, the ability to perform multi-step reasoning has long been a goal in the industry, and o1 appears to be a step in that direction. Google DeepMind is also going after AI that can reason. DeepMind's AlphaGeometry mashed together a large language model and a symbolic model—a more traditional, hard-coded approach—to match top high schoolers at geometry. DeepMind's CEO, Demis Hassabis, has also said they're looking to use reinforcement learning, which is their "bread and butter," to improve future models.
Crucially, o1 shows AI can progress without relying solely on scaling, in which developers improve models by making them bigger. That said, this month also showed scaling will continue, as players moved to secure cash and energy.
OpenAI’s release of o1 alongside its advanced voice mode coincided with reports the company is raising new funds from investors at an eye-opening $150 billion valuation, nearly twice the company’s valuation around this time last year. Anthropic is also said to be in the midst of its own funding round with a potential valuation of $40 billion.
While both companies are bringing in revenue and generative AI’s user base is growing fast—OpenAI’s has doubled in the last year—it’s not enough to keep up with operations and the ballooning costs of training next-generation AI models. In addition to new funding rounds, a coalition including Microsoft, BlackRock, Global Infrastructure Partners, and MGX announced efforts to raise an astonishing $100 billion to build out AI infrastructure—$30 billion in private equity capital and the rest via debt financing.
Assuming investment continues at this pace, a report from Epoch AI explored whether scaling is even technically feasible. Will we be able to find enough of the primary inputs—power, chips, and data—to maintain AI scaling at the current rate? The report found that, yes, it is technically possible to scale models by at least 10,000x over OpenAI's GPT-4 through 2030. The biggest sticking point: powering the coming wave of data centers.
It's no surprise, then, that Sam Altman reportedly pitched the US government on plans to build several five-gigawatt data centers around the country. Five gigawatts, Bloomberg writes, is roughly the output of five nuclear plants, enough to power three million homes. Speaking of which, Microsoft has also announced plans to reopen Pennsylvania's Three Mile Island nuclear plant.
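Bloomberg's comparison is easy to sanity-check with back-of-envelope arithmetic, assuming a typical large nuclear reactor produces roughly one gigawatt (the one-gigawatt-per-plant figure is our assumption, not from the report):

```python
# Back-of-envelope check on the five-gigawatt figure (assumes ~1 GW
# per nuclear plant, a reasonable output for a large reactor).
plants = 5
watts_per_plant = 1e9                      # ~1 GW each
homes = 3_000_000
total_watts = plants * watts_per_plant     # 5 GW total
kw_per_home = total_watts / homes / 1000   # average continuous draw
# ~1.7 kW per home, in line with average US household demand
```

The numbers hold up: spreading five gigawatts across three million homes works out to about 1.7 kilowatts of continuous draw per home.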
There's clearly financial will to forge ahead for the time being. The level of investment reflects the size of the opportunity big tech believes is possible. With cash plowed into AI running into the hundreds of billions, leaders think the return could be in the trillions.
Future investment will depend on how long scaling bears fruit. The next generation of models must show clear advances over this generation. It’s also possible new breakthroughs outside scaling, like o1, will show we can do more with less. For now, though, the path is laid out. Tech is going after big AI.
Know someone who might enjoy the Singularity Monthly?
Share this newsletter with them.