In the short-term, the most dangerous thing about AI language models may be their ability to emotionally manipulate humans if not carefully conditioned. The world saw its first taste of that danger in February 2023 with the launch of Bing Chat, now called Microsoft Copilot, which once attacked Ars Technica reporter Benj Edwards publicly after he revealed the bot’s secret name (Sydney) and instructions in a series of articles.
During its early testing period, the temperamental chatbot gave the world a preview of an “unhinged” version of OpenAI’s GPT-4 prior to its official release. Sydney’s sometimes uncensored and “emotional” nature (including use of emojis) arguably gave the world its first large-scale encounter with a truly manipulative AI system. The launch set off alarm bells in the AI alignment community and served as fuel for prominent warning letters about AI dangers.
On November 19 at 4 pm Eastern (1 pm Pacific), Ars Technica Senior AI Reporter Benj Edwards will host a livestream conversation on YouTube with independent AI researcher Simon Willison that will explore the impact and fallout of the 2023 fiasco. We’re calling it “Bing Chat: Our First Encounter with Manipulative AI.”
Willison, who co-invented the Django web framework and has served as an expert reference on the subject of AI for Ars Technica for years, regularly writes about AI on his personal blog and coined the term “prompt injection” in 2022 after pranksters discovered how to subvert the instructions and alter the behavior of a GPT-3-based automated bot that posted on Twitter at the time.
The “culprit and the enemy” speaks out
Each input fed into a large language model (LLM) like the one that powered Bing Chat is called a “prompt.” The key to a prompt injection is to manipulate the model’s responses by embedding new instructions within the input text, effectively redirecting or altering the AI’s intended behavior. By crafting cleverly phrased prompts, users can bypass the AI’s original instructions (often defined in something called a “system prompt”), causing it to perform tasks or respond in ways that were not part of its initial programming or expected behavior.
+ There are no comments
Add yours