OpenAI has released its GPT-4o System Card, a research document outlining the safety measures and risk evaluations the startup conducted before releasing its latest model.
GPT-4o launched publicly in May of this year. Before its debut, OpenAI used an external group of red teamers, security experts who try to find weaknesses in a system, to identify key risks in the model (a fairly standard practice). They examined risks such as the possibility that GPT-4o would create unauthorized clones of someone’s voice, produce pornographic and violent content, or reproduce chunks of copyrighted audio. The results are now being published.
According to OpenAI’s own framework, the researchers determined that GPT-4o poses a “medium” risk. The overall risk level was taken from the highest risk rating across four categories: cybersecurity, biological threats, persuasion, and model autonomy. All of these were judged low risk except persuasion, where the researchers found that some writing samples from GPT-4o could be more effective at swaying readers’ opinions than human-written text, though the model’s outputs were not more persuasive overall.
Lindsay McCallum Rémy, an OpenAI spokesperson, told The Verge that the system card includes preparedness evaluations created by an internal team, as well as external testers listed on OpenAI’s website as Model Evaluation and Threat Research (METR) and Apollo Research, both of which create evaluations for AI systems.
This isn’t the only system card OpenAI has released; GPT-4, GPT-4 with vision, and DALL-E 3 were similarly tested, and the research was published. But OpenAI is releasing this system card at a critical juncture. The company has faced constant criticism of its safety standards, from its own employees to state politicians. Only minutes before the release of GPT-4o’s system card, The Verge exclusively reported on an open letter from Sen. Elizabeth Warren (D-MA) and Rep. Lori Trahan (D-MA) requesting information about how OpenAI handles whistleblowers and safety reviews. That letter outlines the many safety issues that have been publicly called out, including CEO Sam Altman’s brief ouster from the company in 2023 over the board’s concerns and the departure of a safety executive, who claimed that “safety culture and processes have taken a backseat to shiny products.”
On top of that, the company is releasing a highly capable multimodal model just ahead of a US presidential election. There is a clear risk that the model could spread misinformation or be hijacked by malicious actors, even as OpenAI hopes to emphasize that it is testing real-world scenarios to prevent misuse.
There have been numerous calls for OpenAI to be more transparent, not only about the model’s training data (was it trained on YouTube?) but also about its safety testing. In California, where OpenAI and many other major AI labs are based, state Sen. Scott Wiener is working to pass a bill regulating large language models, including provisions that would hold companies legally liable if their AI is used in harmful ways. If the bill passes, OpenAI’s frontier models would have to comply with state-mandated risk assessments before being made available to the public. But the biggest takeaway from the GPT-4o System Card is that, despite the external red teamers and testers, much of this relies on OpenAI evaluating itself.