Trying to break OpenAI's new o1 models? You might get banned


Even the smartest AI models are prone to hallucinations, which can be amusing to provoke. (Remember glue pizza?) However, if you try to induce hallucinations in OpenAI's advanced o1 reasoning models, you may lose access to the model altogether.

OpenAI unveiled its o1 models last week. They were trained to "think before they speak" and, as a result, can solve complex math, science, and coding problems using advanced reasoning. With a model touting such impressive capabilities, people naturally set out to break its chain of reasoning.


However, as Wired first reported, users who tried to do so received warnings in the chatbot interface stating that their actions violated OpenAI's terms of use and usage policies. The flagged actions included simply mentioning terms such as "reasoning trace" or "reasoning."

One user also shared on X an OpenAI "ChatGPT Policy Violation" email, which said the system had detected a policy violation for "attempting to circumvent safeguards or safety mitigations in our [OpenAI's] services." The email also asked the user to "halt" that activity. Although the screenshot of the email did not specify any consequences, OpenAI spells them out in its Terms of Use.

Per OpenAI's Terms of Use, last updated on January 31, 2024, the company reserves the right to "suspend or terminate your access to our Services or delete your account" if it determines that a user has breached the Terms or Usage Policies, could cause risk or harm to OpenAI or other users, or has failed to comply with the law.

Reactions to these policies have been mixed: some people complain that the limitations hinder legitimate red-teaming, while others are glad OpenAI is taking active precautions against loopholes in its newer models.

If you want to try the o1 models for yourself, you can create a free ChatGPT account, sign in, toggle "alpha models" in the model picker, and choose o1-mini. To try o1-preview, you'll need a ChatGPT Plus subscription, which costs $20 per month.
