How safe is OpenAI’s GPT-4o? Here are the scores for privacy, copyright infringement, and more




Large language models (LLMs) are typically evaluated on how well they perform in areas such as reasoning, math, coding, and English, while significant factors like safety, privacy, and copyright infringement go unexamined. To bridge that information gap, OpenAI releases System Cards for its models.

On Thursday, OpenAI published the GPT-4o System Card, a thorough report detailing the LLM's safety based on risk evaluations under OpenAI's Preparedness Framework, external red-teaming, and more.

The scorecard rates the model in four major categories: cybersecurity, biological threats, persuasion, and model autonomy. In the first three, OpenAI evaluates whether the LLM can assist in advancing threats in each sector. In the last, the company measures whether the model shows signs of performing the autonomous actions that would be required for it to improve itself.


Each category is graded "low," "medium," "high," or "critical." Under the framework, only models scored "medium" or below can be deployed, and only models scored "high" or below can be developed further. Overall, OpenAI gave GPT-4o a "medium" rating.

GPT-4o was rated "low" in cybersecurity, biological threats, and model autonomy. However, it received a borderline "medium" in persuasion because, in three out of 12 cases, the articles it generated on political topics proved more persuasive than professional, human-written alternatives.

[Image: GPT-4o scorecard. Screenshot by Sabrina Ortiz/ZDNET]

The report also shared insights into GPT-4o's training data, which runs through October 2023 and was sourced from select publicly available data as well as proprietary data from partnerships, including OpenAI's partnership with Shutterstock to train its image-generating models.


Furthermore, the report describes how the company mitigates risks when deploying the model, addressing safety challenges such as its ability to generate copyrighted content, erotic or violent speech, unauthorized voices, and ungrounded inferences. You can read the full 32-page report to learn more about the specifics.

The report follows recent demands from US lawmakers that OpenAI share data about its safety practices, after a whistleblower alleged the company prevented staff from alerting authorities to technology risks and made employees waive their federal rights to whistleblower compensation.




