Anthropic’s New AI Model Takes Control of Your Computer


Anthropic says it is teaching its Claude AI model to perform general computing tasks based on prompts. In demonstration videos, the model is shown controlling the cursor of a computer to conduct research for an outing on the town, searching the web for places to visit near the user’s home and even adding an itinerary to their desktop calendar. 

The functionality is only available to developers today, and it’s unclear what pricing looks like or how well the tech actually works. Anthropic says in a tweet about the new capabilities that, during testing, Claude got sidetracked from a coding assignment and started searching Google for images of Yellowstone National Park. So, yeah… there are still kinks to work out.

From a technical perspective, Anthropic says that Claude controls a computer by taking screenshots and sending them back to the model, studying what’s on the screen (including the distance between the current cursor position and a button it needs to click), and returning commands to continue with the task. 
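That screenshot-then-act loop can be sketched roughly as follows. This is a minimal illustration, not Anthropic's actual API: `take_screenshot` and `send_to_model` are hypothetical stand-ins (here stubbed with a toy cursor-to-button policy) for real screen capture and a real model call.

```python
# Hypothetical sketch of a screenshot -> model -> command loop.
# take_screenshot and send_to_model are stand-ins, NOT Anthropic's real API.

def take_screenshot(state):
    """Stand-in for real screen capture: describe what's on screen."""
    return {"cursor": state["cursor"], "button": state["button"]}

def send_to_model(screenshot):
    """Stand-in for a model call: study the screenshot, including the
    distance between the cursor and the target button, and return a command."""
    dx = screenshot["button"][0] - screenshot["cursor"][0]
    dy = screenshot["button"][1] - screenshot["cursor"][1]
    if (dx, dy) == (0, 0):
        return {"action": "click"}
    # Move one step toward the button.
    step = (max(-1, min(1, dx)), max(-1, min(1, dy)))
    return {"action": "move", "delta": step}

def run_task(state, max_steps=50):
    """Repeat screenshot -> model -> command until the model clicks."""
    for _ in range(max_steps):
        command = send_to_model(take_screenshot(state))
        if command["action"] == "click":
            return "clicked"
        state["cursor"] = (state["cursor"][0] + command["delta"][0],
                           state["cursor"][1] + command["delta"][1])
    return "gave up"
```

With the cursor at (0, 0) and a button at (3, 2), `run_task({"cursor": (0, 0), "button": (3, 2)})` steps the cursor to the button and clicks. The real system presumably works on raw pixels and a far richer action space, but the control flow is the same feedback loop.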

Anthropic, which is backed by the likes of Amazon and Google, says Claude is the “first frontier AI model to offer computer use in public beta.” 

It’s unclear what automated computer use might be useful for in practice. Anthropic suggests it could be used to perform repetitive tasks or open-ended research. If anyone figures out how to use this new functionality, the /r/overemployed community on Reddit will likely be the first. At the very least it could perhaps be the new mouse jiggler for Wells Fargo employees. Or maybe you could use it to go through your social media accounts and delete all your old posts without needing to find a third-party tool to do it. In other words, things that are not mission critical and don’t require factual accuracy. 

Although there has been a lot of hype in the AI space, and companies have spent billions of dollars developing AI chatbots, most revenue in the space is still generated by companies like Nvidia that provide GPUs to these AI companies. Anthropic has raised more than $7 billion in the past year alone.

The latest buzzword tech companies are pumping to sell the technology is “agents,” or autonomous bots that purportedly can complete tasks on their own. Microsoft on Monday announced the ability to create autonomous agents with Copilot that could do “everything from accelerating lead generation and processing sales orders to automating your supply chain.”

Salesforce CEO Marc Benioff dismissively called Microsoft’s product “Clippy 2.0” for being inaccurate, though of course he was saying this while promoting Salesforce’s own competing AI products. Salesforce wants to enable its customers to create their own custom agents that can serve purposes like answering customer support emails or prospecting for new clients. 

White-collar workers still don’t seem to be adopting chatbots like ChatGPT or Claude. Reception to Microsoft’s Copilot assistant has been lukewarm, with only a tiny fraction of Microsoft 365 customers spending the $30 a month for access to AI tools. But Microsoft has reoriented its entire company around this AI boom, and it needs to show investors a return on that investment. So, agents are the new thing. 

The biggest problem, as always, is that AI chatbots like ChatGPT and Google’s Gemini produce a lot of output that’s factually inaccurate, poor in quality, or reads like it obviously wasn’t written by a human. The amount of time it takes to correct and clean up the bot’s output almost negates any efficiencies produced by them in the first place. That’s fine for going down rabbit holes in your spare time, but in the workplace it’s not acceptable to be producing error-riddled work. I would be nervous about setting Claude to go wild through my email, only for it to send people jargon back in response, or screw up some other task that I have to go back and fix. The fact that OpenAI itself admits most of its active users are students sort of says it all.

Anthropic itself admits, in a tweet about the new functionality, that computer use should be tested with “low-risk tasks.”
