Topic of the Month
AI Agents Arrive
All year, AI companies have been saying agents are the next big step. Chatbots are great, but what we really want is a full-blown assistant taking care of business. Agents, they say, could handle tasks from data entry to travel planning at a prompt.
Anthropic recently stole a march on its competitors when it announced Claude can now view a user’s screen and control their computer with a mouse and keyboard. Using this new functionality, Claude can devise multi-step actions to complete requests.
In one example, the AI plans a trip to see the sun rise over the Golden Gate Bridge. It opens a web browser, sifts through web results, and chooses a prime viewing location. Then it looks up what time the sun rises and gets directions on Google Maps. Finally, it creates a calendar invite, complete with notes on when to leave to arrive on time. Claude does all this by taking screenshots of a user’s display after each step, planning out what it should do next, and using the computer to make it happen—rinse and repeat.
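The screenshot, plan, act cycle described above can be sketched in a few lines of Python. This is an illustrative toy, not Anthropic's implementation: `FakeDesktop`, `Action`, `run_agent`, and `scripted_planner` are all hypothetical names, the "screenshot" is a text summary rather than pixels, and a fixed script stands in for the model call that would normally decide the next step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical schema)."""
    kind: str          # e.g. "open", "search", "create", or "done"
    detail: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: records actions instead of performing them."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here we return a text summary.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.detail}")


def run_agent(
    desktop: FakeDesktop,
    plan_next: Callable[[str, int], Action],
    max_steps: int = 10,
) -> int:
    """Repeat screenshot -> plan -> act until the planner says "done"
    or the step budget runs out. Returns the number of actions performed."""
    for step in range(max_steps):
        observation = desktop.screenshot()      # look at the screen
        action = plan_next(observation, step)   # decide what to do next
        if action.kind == "done":
            return step
        desktop.perform(action)                 # carry out the step
    return max_steps


def scripted_planner(observation: str, step: int) -> Action:
    """Toy planner: replays a fixed script in place of a model call."""
    script = [
        Action("open", "web browser"),
        Action("search", "Golden Gate Bridge sunrise viewpoint"),
        Action("search", "sunrise time"),
        Action("open", "maps directions"),
        Action("create", "calendar invite"),
    ]
    return script[step] if step < len(script) else Action("done")


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop, scripted_planner)
    print(steps, desktop.log)
```

The `max_steps` cap is a design choice in this sketch: it bounds how long the loop can run unattended before a person gets a chance to check on it.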
“I think we're going to enter into a new era where a model can use all of the tools that you use as a person to get tasks done,” Jared Kaplan, chief science officer at Anthropic and an associate professor at Johns Hopkins University, told Wired.
Basic agents have been kicking around since soon after OpenAI launched ChatGPT. But Anthropic is the first big AI company to add this particular kind of agentic AI to its foundation model. The new abilities are not yet widely available, however, and Anthropic took care to note the functionality is imperfect and early in development. The company has released it through the Claude API so companies and developers can try it and give feedback.
Anthropic’s competitors won’t take long to catch up. Google is reportedly developing a browser-controlling agent called Jarvis and could preview it as soon as December. And OpenAI has been working on its own computer-controlling agent for nearly a year.
While such agents may become a crucial component of operating systems, perhaps supplanting apps or changing how we use them, companies are aiming for enterprises first, with tasks like data entry or form completion. “What would you do if you got rid of a bunch of hours of copy and pasting or whatever you end up doing?” Mike Krieger, Anthropic’s chief product officer, told Wired. “I'd go and play more guitar.”
Agents are still a long way from reaching their full potential, though. Claude does outperform the competition according to the OSWorld benchmark, a measure of an AI model's computer skills. But its scores are low compared to human abilities. The AI completes tasks 14.9 percent of the time versus 75 percent for the average person. (Other quirks, Anthropic said, include Claude losing interest and wandering down internet rabbit holes.)
Agents recording screenshots may also be a tough sell for companies and individuals unless Anthropic proves they’re very secure. Microsoft received considerable blowback about a screenshotting feature on its Copilot+ PCs earlier this year, with critics calling out both security and privacy issues. Also, letting agents run wild on more complicated tasks magnifies existing risks. Hallucinations are still common in these models, so allowing one control of a device to take a series of automatic actions without human oversight means it could do damage before anyone notices and can reel it back in.
While agents may be next year’s dish, companies are still looking to cash in on existing AI products. OpenAI recently launched its first search product to compete with Google and Perplexity and a new product for coding. Google, meanwhile, has a hit on its hands with NotebookLM and says 25 percent of its new code is now AI-generated. Additionally, cloud revenues at Amazon, Microsoft, and Google are growing thanks to AI adoption.
One way to make sense of all this is that developers are still playing with generative AI and finding new ways to mold, package, and improve existing models for different purposes. Meanwhile, behind the scenes, the next generation of models is in development. An article on The Verge claimed OpenAI is currently training a successor to GPT-4, reportedly code-named Orion, and could release it as early as December. OpenAI disputed the reporting, but there’s little reason to believe it isn’t going full tilt to finish its next big algorithm, or that Meta and Google aren’t doing the same.
If the next crop of even-bigger AI models proves to be another leap in capability, which isn’t guaranteed, they could function like a software update for existing applications—coding, research, information synthesis, agents, and chatbots—while enabling the discovery of more new ones. As investors and companies continue funneling funds into the sector, the pressure is on to spark more ChatGPT moments—and beyond that, massive hits to pay all debts.
Know someone who might enjoy the Singularity Monthly? Share this newsletter with them.