OpenAI is releasing a “research preview” of an AI agent called Operator that can “go to the web to perform tasks for you,” according to a blog post. “Using its own browser, it can look at a webpage and interact with it by typing, clicking, and scrolling,” OpenAI says. It’s launching first in the US for subscribers of OpenAI’s $200 per month ChatGPT Pro tier.
Operator relies on a “Computer-Using Agent” model that combines GPT-4o’s vision capabilities with “advanced reasoning through reinforcement learning” to interact with GUIs, OpenAI says. “Operator can ‘see’ (through screenshots) and ‘interact’ (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations,” according to OpenAI.
Operator can use reasoning to “self-correct,” and if it gets stuck, it will hand control back to the user. It will also ask the user to take over when a website asks for sensitive information like login credentials, and it “should” ask the user to approve actions like sending an email. OpenAI also says that Operator has been designed to “refuse harmful requests and block disallowed content.”
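To make that description concrete, here is a minimal sketch of what a screenshot-driven agent loop with user handoff could look like. It is not OpenAI's implementation, which has not been published; every name in it (`Action`, `Kind`, `run_agent`, and the callback parameters) is hypothetical, and the "model" is just a scripted stand-in for whatever chooses the next action from a screenshot.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Kind(Enum):
    CLICK = auto()
    TYPE = auto()
    SCROLL = auto()
    DONE = auto()   # the task is finished
    STUCK = auto()  # the agent cannot make progress


@dataclass
class Action:
    kind: Kind
    detail: str = ""
    sensitive: bool = False       # assumption: e.g. a login form is on screen
    needs_approval: bool = False  # assumption: e.g. about to send an email


def run_agent(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes], Action],
    perform: Callable[[Action], None],
    approve: Callable[[Action], bool],
    max_steps: int = 20,
) -> str:
    """Observe the page, pick an action, and act, deferring to the user when needed."""
    for _ in range(max_steps):
        screenshot = take_screenshot()      # "see" the page
        action = choose_action(screenshot)  # hypothetical model call
        if action.kind is Kind.DONE:
            return "done"
        if action.kind is Kind.STUCK or action.sensitive:
            return "handoff"                # give the user control
        if action.needs_approval and not approve(action):
            return "declined"               # user rejected the action
        perform(action)                     # click, type, or scroll
    return "handoff"                        # step budget used up: treat as stuck


# Usage with a scripted stand-in for the model.
script = iter([
    Action(Kind.CLICK, "search box"),
    Action(Kind.TYPE, "pizza near me"),
    Action(Kind.TYPE, "password field", sensitive=True),
])
performed = []
result = run_agent(lambda: b"", lambda shot: next(script), performed.append, lambda a: True)
print(result, [a.detail for a in performed])
```

In this sketch the first two actions are carried out and the run stops with a handoff as soon as the scripted policy reaches the sensitive step, mirroring the behavior OpenAI describes for login credentials.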
OpenAI says that it’s collaborating with companies such as DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber so that Operator “addresses real-world needs while respecting established norms.” But the company cautions that not everything might work as you expect just yet; the tool currently has problems with “complex interfaces like creating slideshows or managing calendars.”
Down the line, OpenAI says it plans to bring Operator to Plus, Team, and Enterprise users and “integrate these capabilities into ChatGPT.”